Type Inference
Experimental
The optype.infer module is experimental and its API may change without notice.
optype.infer works out which optype protocols a function requires of its parameters.
It runs the function against recording proxies that trace every operation, then renders
the result as a PEP 695 signature.
For the Python API surface (the infer function and its exceptions), see the
optype.infer reference.
Usage
infer(func, *params) returns the inferred signature as a string:
>>> from optype.infer import infer
>>> infer(lambda x: x + 1)
'[R](x: CanAdd[Literal[1], R]) -> R'
>>> print(infer(list))
(tuple[()] = ...) -> list[Never]
[R](CanIter[CanNext[R]] & ~tuple[()]) -> list[R]
Pass parameter names or positions to report only those parameters:
The optype infer command takes a Python snippet whose final statement is
an expression or a def/class definition; any leading statements run as setup:
$ optype infer "lambda x: x * 2"
[R](x: CanMul[Literal[2], R]) -> R
$ optype infer "import math; math.sqrt"
(CanFloat | CanIndex) -> float
Overloads
A binary operator can dispatch to either operand, so it is reported as one overload per line:
$ optype infer "lambda x, y: x * y"
[T, R](x: CanMul[T, R], y: T) -> R
[T, R](x: T, y: CanRMul[T, R]) -> R
An operator applied to its own result yields a recursive bound, where the bound of T
refers back to T:
$ optype infer "lambda x: -x + x"
[T: CanNeg[CanAdd[T, R]], R](x: T) -> R
[T, R](x: CanNeg[T] & CanRAdd[T, R]) -> R
Intersections
Python has no intersection types, so the & is not valid Python: both requirements
apply at once. Since each optype protocol declares a single method, an intersection
of distinct protocols is expressible as a protocol that inherits from all members:
This turns the CanNeg[T] & CanRAdd[T, R] above into the valid CanNegRAdd[T, R].
Where optype already ships the combined protocol, infer reports it directly:
CanGetitem & CanLen merges into CanSequence, and a traced with statement renders
as CanWith rather than CanEnter & CanExit:
$ optype infer "lambda x, i: x[i] if len(x) else None"
[T, R](x: CanSequence[T, R], i: T) -> R | None
Only an intersection of the same protocol shares its method, which happens when an operation is traced at several arities:
$ optype infer "lambda x: (round(x), round(x, 2))"
[R, R2](x: CanRound[R] & CanRound[Literal[2], R2]) -> tuple[R, R2]
That takes @overloads instead, which optype ships as the three-parameter
CanRound: this intersection is CanRound[Literal[2], R, R2].
Operators
Every operation that dispatches through a dunder is traced. That includes augmented assignments, which map to the in-place protocols:
The same goes for builtins such as divmod, round, and reversed, which dispatch
through dunders too:
$ optype infer "lambda x: divmod(x, 2)"
[R](x: CanDivmod[Literal[2], R]) -> R
$ optype infer "lambda x: round(x, 2)"
[R](x: CanRound[Literal[2], R]) -> R
$ optype infer "lambda x: reversed(x)"
[R](x: CanReversed[R]) -> R
A call dispatches through __call__, but a parameter that is called renders in
signature syntax rather than as CanCall or Callable, with any keyword arguments
as named parameters:
Attributes
Attribute access dispatches through __getattr__, except for the few attributes
that every object already has, like __doc__, which require nothing. Reading a
special attribute that optype ships a single-member Has* protocol for reports
that protocol directly:
An attribute without a shipped protocol renders as the fictional inline form
Has['name', T]. Like & and ~, it is not valid Python, but it is expressible as
a protocol with a single member. The type argument carries a variance sign: a read
requires only the covariant +T, so a @property getter suffices:
This turns Has['spam', +R] into the valid HasSpam[R]:
Reading the +/- signs
The sign is the direction the value flows: a covariant +T is produced (read
out, like a getter's return type), and a contravariant -T is consumed (passed
in, like a setter's argument). So a read is +, a write is -.
When the attribute is called, the sign sinks into the callable's return type, where
the covariance applies: Has['spam', () -> +R] is the method def spam(self) -> R:
An assignment requires the contravariant -T: a mutable attribute that accepts the
assigned value's type, such as spam: T itself, or any wider type. A deletion, or a
read whose result is unused, requires only that the attribute exists (deletability
itself is not expressible):
$ optype infer "def f(x): x.spam = 1; return x"
[T: Has['spam', -Literal[1]]](x: T) -> T
$ optype infer "def f(x): del x.spam"
(x: Has['spam']) -> None
A builtin object renders by its importable types name, so a module is ModuleType, a
code object CodeType, etc.
Classes
type reads the class directly instead of dispatching through a dunder, but every
recording proxy has a unique class, so the result is still tied to its parameter:
$ optype infer "lambda x: type(x)"
[T](x: T) -> type[T]
$ optype infer "lambda x: type(next(x))"
[R](x: CanNext[R]) -> type[R]
That unique class also ties its instances back to the parameter:
An attribute access on the class itself renders inside ClassVar, mirroring a protocol
member declared as spam: ClassVar[...].
$ optype infer "lambda x: type(x).spam"
[R](x: Has['spam', ClassVar[+R]]) -> R
$ optype infer "def f(x): type(x).spam = 1"
(x: Has['spam', ClassVar[-Literal[1]]]) -> None
$ optype infer "def f(x): del type(x).spam"
(x: Has['spam', ClassVar]) -> None
A concrete class renders parameterized when it resolves by name (a local class stays
a bare type), and type is covariant, so a subclass is absorbed by its parent:
$ optype infer "lambda: bool"
() -> type[bool]
$ optype infer "lambda x: int if x else bool"
(x: CanBool) -> type[int]
A subscripted generic keeps its arguments, builtin or user-defined:
$ optype infer "lambda: list[int]"
() -> type[list[int]]
$ optype infer "lambda: dict[str, int]"
() -> type[dict[str, int]]
A union or Callable has no type[...] form, so it renders as a TypeForm
(PEP 747):
$ optype infer "lambda: int | str"
() -> TypeForm[int | str]
$ optype infer "from collections.abc import Callable
lambda: Callable[[int], str]"
() -> TypeForm[(int) -> str]
An origin or argument that doesn't resolve by name keeps the bare GenericAlias.
Branches
Both sides of a conditional are explored, so the parameter has to satisfy every branch (an intersection) and the return is the union of the branch results:
$ optype infer "lambda x: x if x > 0 else -x"
[T: CanGt[Literal[0], CanBool] & CanNeg[R], R](x: T) -> T | R
Branching and overloads combine:
$ optype infer "lambda x, y: (x + y) if x else y"
[T, R](x: CanBool & CanAdd[T, R], y: T) -> R | T
[T: CanBool, U: CanRAdd[T, R], R](x: T, y: U) -> R | U
A comparison chain short-circuits, so it branches too: a falsy first comparison is
returned as-is, and only a truthy one evaluates the second (0 < x reflects to
x.__gt__):
$ optype infer "lambda x: 0 < x < 10"
[R, R2: CanBool](x: CanGt[Literal[0], R2 & CanBool] & CanLt[Literal[10], R]) -> R | R2
A predicate is assumed stable within a single call, so repeating it on the same operand agrees rather than branching again. A self-contradicting guard is therefore never satisfiable, and its body is left untraced:
Distinct operands stay independent, so a in x and b in x still branch separately.
Variadic parameters
A *args parameter that is only passed around as a whole is inferred as a
PEP 646 variadic type parameter, *Ts:
$ optype infer "lambda *args: args"
[*Ts](*args: *Ts) -> tuple[*Ts]
$ optype infer "lambda *args: (1, *args)"
[*Ts](*args: *Ts) -> tuple[Literal[1], *Ts]
Operating on individual elements is not expressible with a variadic type parameter, so
the elements then share a single inferred element type, as does every value of
**kwargs:
$ optype infer "lambda *args: args[0] + args[1]"
[T: CanAdd[T, R], R](*args: T) -> R
[T: CanRAdd[T, R], R](*args: T) -> R
$ optype infer "lambda **kwargs: kwargs"
[T](**kwargs: T) -> dict[str, T]
A variadic spread into a callable collapses into a single *tuple[T, ...], so its
arity tracks the variadic rather than the placeholder count. This also covers element
spreads like map (whose signature and strict flag require Python 3.14):
$ optype infer "lambda f, *args: f(*args)"
[T, R](f: (*tuple[T, ...]) -> R, *args: T) -> R
$ optype infer "map"
[T, U, R]((T, *tuple[U, ...]) -> R, CanIter[CanNext[T]], *iterables: CanIter[CanNext[U]], strict: CanBool = False) -> map[R]
Parameter defaults
PEP 696 type parameter defaults are used when appropriate:
A parameter without its own typevar shows the default inline instead:
When omission behaves differently, such as when the function branches on the default, the call without the argument is reported as a separate overload:
$ optype infer "def f(x=None): return [] if x is None else x"
(x: None = None) -> list[Never]
[T: ~None](x: T) -> T
The ~None complement makes the overloads disjoint: the first one covers f() and
f(None), and the second one everything else. Like &, the ~ is not valid Python;
in practice it's fine to omit it, as overloads are matched in order anyway.
A Python 3.15+ sentinel is its own type per
PEP 661, spelled as its declared name, so the
common sentinel-default pattern renders just like the None default above:
$ optype infer "MISSING = sentinel('MISSING')
def f(x=MISSING): return [] if x is MISSING else x"
(x: MISSING = MISSING) -> list[Never]
[T: ~MISSING](x: T) -> T
Deprecation
A call that emits a PEP 702 DeprecationWarning is
reported with an @deprecated marker carrying the warning's message. This covers both the
@deprecated decorator and a plain warnings.warn(..., DeprecationWarning) in the body:
$ optype infer "from warnings import deprecated
@deprecated('Use bar instead')
def foo(x): return x + 1"
@deprecated('Use bar instead')
[R](x: CanAdd[Literal[1], R]) -> R
The warning has to actually fire, so only the overloads that raise it are marked. Omitting a default that takes a quiet branch leaves that overload unmarked:
$ optype infer "import warnings
def f(x, y=None):
if y is None: return x + 1
warnings.warn('y is deprecated', DeprecationWarning)
return x + y"
[R](x: CanAdd[Literal[1], R], y: None = None) -> R
@deprecated('y is deprecated')
[T: ~None, R](x: CanAdd[T, R], y: T) -> R
@deprecated('y is deprecated')
[T, R](x: T, y: CanRAdd[T, R] & ~None) -> R
Only DeprecationWarning is recognized (not PendingDeprecationWarning, nor a
@deprecated(..., category=...) override), and a numpy ufunc is never marked.
Methods
Anything callable can be inferred: not just functions, but also builtins (like
math.sqrt above), callable instances, and unbound method descriptors. A
positional-only parameter cannot be passed by keyword, so it renders as a bare type
without its name. A method descriptor's self requires a real instance of its
defining class, so it is reported as that concrete type:
$ optype infer "str.upper"
(str) -> str
$ optype infer "dict.get"
[T = None](dict, CanHash, T = None) -> T
When a builtin rejects the recording proxy for a defaulted parameter, its default is passed instead and the parameter is pinned to it, while the accepting parameters stay structural:
Context managers
A with statement requires __enter__ and __exit__ together, which optype combines
as CanWith; the __exit__ result is unused, so it stays object:
Async
Coroutine functions are run to completion, so await, async with, and async for are
traced like their synchronous counterparts:
$ optype infer "async def f(x): return await x"
[R](x: CanAwait[R]) -> R
$ optype infer "async def f(xs): return [x async for x in xs]"
[R](xs: CanAIter[CanANext[CanAwait[R]]]) -> list[R]
async with combines __aenter__ and __aexit__ as CanAsyncWith, whose parameters
are the awaited results:
>>> async def f(x):
... async with x as y:
... return y
>>> infer(f)
'[R](x: CanAsyncWith[R, object]) -> R'
Generators and lazy iterators
A generator is lazy, so it is iterated to collect the types it yields:
$ optype infer "def f(n): yield from range(n)"
(n: CanIndex) -> Generator[int]
$ optype infer "def f(): yield None; yield 1"
() -> Generator[None | int]
$ optype infer "async def f(xs): return (x async for x in xs)"
[R](xs: CanAIter[CanANext[CanAwait[R]]]) -> AsyncGenerator[R]
Lazy builtin iterators (map, filter, zip, and enumerate) are iterated the same
way, and a callable argument is traced right through them:
$ optype infer "lambda x: map(str, x)"
(x: CanIter[CanNext[CanStr]]) -> map[str]
$ optype infer "lambda f, x: map(f, x)"
[T, R](f: (T) -> R, x: CanIter[CanNext[T]]) -> map[R]
zip tuples its iterables together, so the element type tracks how many there are: a
fixed call is a fixed-length tuple, and the variadic builtin is a homogeneous
tuple[R, ...]:
$ optype infer "lambda x, y: zip(x, y)"
[R, R2](x: CanIter[CanNext[R]], y: CanIter[CanNext[R2]]) -> zip[tuple[R, R2]]
$ optype infer "zip"
[R](*iterables: CanIter[CanNext[R]], strict: CanBool = False) -> zip[tuple[R, ...]]
Iteration yields two elements, so a callable that compares them, like sorted or max,
traces the comparison too:
$ optype infer "lambda xs: sorted(xs)"
[R: CanLt[R, CanBool]](xs: CanIter[CanNext[R]]) -> list[R]
$ optype infer "lambda xs: max(xs)"
[R: CanGt[R, CanBool]](xs: CanIter[CanNext[R]]) -> R
Unpacking
An unpacking iterates the parameter, and every target shares one element type, like
*args; a starred target collects the rest into a list:
$ optype infer "def f(x): a, b = x; return a, b"
[R](x: CanIter[CanNext[R]]) -> tuple[R, R]
$ optype infer "def f(x): a, *b = x; return a, b"
[R](x: CanIter[CanNext[R]]) -> tuple[R, list[R]]
$ optype infer "lambda x: {k: v for k, v in x}"
[R: CanHash](x: CanIter[CanNext[CanIter[CanNext[R]]]]) -> dict[R, R]
A placeholder iterator yields two elements, so a two-target unpack (and a starred one) is
covered, but a fixed unpack of three or more targets cannot be satisfied and raises an
InferError.
Returned functions
A returned function is lazy too, so it is explored with placeholders of its own, and its type renders in the same signature syntax:
An operation inside the returned function is required of the closed-over parameter, and overloads reflect as usual:
$ optype infer "lambda x: lambda y: x + y"
[T, R](x: CanAdd[T, R]) -> (y: T) -> R
[T, R](x: T) -> (y: CanRAdd[T, R]) -> R
This applies to anything function-like: builtins, method descriptors, and a
functools.partial (explored with its bound arguments in place), also from within a
returned container:
$ optype infer "lambda: str.upper"
() -> (str) -> str
$ optype infer "import functools
def add(x, y): return x + y
lambda: functools.partial(add, 1)"
[R]() -> (y: CanRAdd[Literal[1], R]) -> R
$ optype infer "def counter(start): return (lambda: start), (lambda by: start + by)"
[T: CanAdd[U, R], U, R](start: T) -> tuple[() -> T, (by: U) -> R]
[T, R](start: T) -> tuple[() -> T, (by: CanRAdd[T, R]) -> R]
A recursive function (factory) has an inexpressible type, so it stays an opaque
function, as does one with variadic parameters:
Containers
Element types are tracked through the containers that hold them, so a typevar can surface at any depth, on either side of the signature:
$ optype infer "lambda x: (x + 1, x + 1)"
[R](x: CanAdd[Literal[1], R]) -> tuple[R, R]
$ optype infer "lambda x: {0: [v + 1 for v in x[0]]}"
[R](x: CanGetitem[Literal[0], CanIter[CanNext[CanAdd[Literal[1], R]]]]) -> dict[Literal[0], list[R]]
The Python 3.15+ frozendict parametrizes like dict does:
$ optype infer "lambda x: frozendict({'k': x + 1})"
[R](x: CanAdd[Literal[1], R]) -> frozendict[Literal['k'], R]
A container that holds itself is a recursive type. The cycle is detected by identity and tied off as a typevar bounded by its own structure:
Unions
A union member that is a subtype of another member is absorbed into it, following the
runtime subclass relations, such as bool <: int and FileNotFoundError <: OSError:
$ optype infer "def f(): yield True; yield 1"
() -> Generator[int]
$ optype infer "def f(): yield FileNotFoundError(); yield OSError()"
() -> Generator[OSError]
PEP 484's int <: float <: complex numeric tower is a static-typing fiction with no
runtime counterpart, so it is not applied:
Variance is respected: a covariant tuple simplifies, but an invariant list does not:
$ optype infer "lambda x: (OSError(), 'a') if x else (FileNotFoundError(), 'a')"
(x: CanBool) -> tuple[OSError, Literal['a']]
$ optype infer "lambda x: [FileNotFoundError()] if x else [OSError()]"
(x: CanBool) -> list[FileNotFoundError] | list[OSError]
NumPy
Info
NumPy is not a required dependency, and optype infer works fine without it
installed.
A ufunc requires each operand to
either override it (NEP 13's CanArrayUFunc) or be an array-like of its widest accepted
dtype (read from its .types):
A NEP 18 function such as
np.mean requires the CanArrayFunction override:
Limitations
infer calls the function, so it only works on functions that are safe to run with
placeholder arguments (no real side effects, no reliance on concrete values). This
extends to anything it returns: a returned function is called with placeholders of its
own, and a returned lazy iterator is iterated. A returned function that raises during
this exploration is not treated as an error: its type stays an opaque function.
A single-parameter function that dispatches on an attribute's presence (hasattr,
getattr with a default, or try/except AttributeError) is explored again with the
attribute forced absent, to reach the branch a placeholder otherwise hides. Because the
attribute is tolerated rather than required, the parameter widens past Has[...]. When
the return ignores the attribute's value, a single overload covers it, its return the
union of both branches (so a hasattr predicate accepts any object and returns bool);
when the present branch returns the value, that overload stays over an object fallback
(the only sound supertype of an arbitrary present return):
>>> print(infer(lambda x: hasattr(x, "spam")))
(x: object) -> bool
>>> print(infer(lambda x: 0 if hasattr(x, "spam") else 1))
(x: object) -> int
>>> print(infer(lambda x: getattr(x, "spam", None)))
[R](x: Has['spam', +R]) -> R
(x: object) -> object
Anything else keeps the strict baseline that requires the attribute: more than one
parameter, several presence-tests at once, a present branch that needs more than the
attribute itself, or an absent branch that can't run on placeholders (as with dict).
When infer can't handle the input, it raises InferError (a NotImplementedError
subclass). That's the case for operations without a matching protocol, such as an
attribute access whose name is not statically known (a computed getattr); for arguments
that aren't callable to begin with; and for a builtin whose parameters inspect.signature
cannot recover (such as iter, max, or type itself, though type(x) within a
function is fine). A function that never runs to completion, such as lambda: 0 / 0,
raises InferError chained from the triggering exception (__cause__).
Variadic parameters are explored with a few placeholders, retried with more after an
out-of-range index, a failed unpacking, or a missing **kwargs key, and reported as an
InferError once the budget runs out. The placeholders remain observable: len(args)
reports their count, and args and kwargs are never empty.
The number of explored branches is capped, so a function with many of them gets a
signature that only covers the explored ones, along with an InferWarning that names the
gap and the affected call form. The CLI prints it to standard error; strict=True raises
an InferError instead.
It can only observe operations that go through a dunder method. Anything that inspects a
parameter at the C level is invisible, so a parameter passed to id(), isinstance(),
or an identity check (is) is reported as object rather than its real requirement. The
same blind spot hides a branch on a derived integer: len, int, and index return a
small placeholder, so len(x) > 5 is never explored.
Generic bounds
An inferred typevar bound can itself be generic, such as [T: CanAdd[T, R], R] where
T's bound references T and R. Python's type system does not currently support
generic typevar bounds, so these signatures are not always valid Python.