Typing support for shapes #16544
Comments
This will take quite some work here and in mypy. At a discussion held several weeks ago, the agreement was on the plan in the README. So while this will happen, adding support for checking shapes will not happen soon. There is much to decide, such as how to define syntax related to declaring array shape, and how an actual type checking implementation should work. |
@shoyer: I'm not sure where this should be noted, but the framing of numpy/numpy-stubs#6 and python/typing#516 seems to indicate that you're favoring putting the full shape in the type specification: that is, for a two-dimensional array of size (M,N), the proposal seems to be that the type annotation would convey both the value of M and N. I think we should consider a more modest version in which only the number of dimensions is tracked via the type, but not the shape itself. That is, instead of |
Tracking only the rank (as opposed to the full shape) of the array through the type system is, for what it's worth, how all of the C++ tensor libraries work to the best of my knowledge. It's more consistent with the rest of the PEP484 container types as well, in the sense that the type |
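To make the rank-versus-shape distinction concrete, here is a minimal sketch using only the standard library. The `Array` class and the `annotated_rank` helper are hypothetical stand-ins invented for illustration; they are not NumPy APIs and not the syntax proposed in numpy-stubs.

```python
from typing import Generic, Literal, TypeVar, get_args

Shape = TypeVar("Shape", bound=tuple)


class Array(Generic[Shape]):
    """Hypothetical array type parameterised by a tuple type describing its shape."""

    def __init__(self, shape: tuple) -> None:
        self.shape = shape


# Rank-only annotation: two axes, sizes unknown to the type checker.
Matrix = Array[tuple[int, int]]

# Full-shape annotation: both axis sizes are part of the type.
Matrix3x4 = Array[tuple[Literal[3], Literal[4]]]


def annotated_rank(alias) -> int:
    """Return the number of axes recorded in an Array[...] alias."""
    (shape_type,) = get_args(alias)
    return len(get_args(shape_type))
```

Both aliases carry rank 2; only the second one also pins the sizes, which is the extra information under discussion here.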
@rmcgibbo Yes, this needs justification. My reasoning for favoring specifying shapes in types comes down to:
With regards to C++ libraries, Eigen at least does allow for either runtime or statically defined shapes. That said, I certainly would not bother with static/compile-time shapes on their own, because it is indeed rare that you know the exact shape of arrays before running a program. But with generics, we have the possibility of doing much more sophisticated pattern matching. This isn't feasible in C++, where you would need to use templates for every possible dimension size. In contrast, we can probably do something more sophisticated in Python to allow for typing shapes without enumerating all of them (similar to what you see in Haskell). |
Shape is clearly more expressive than rank. So in that sense, I agree with both of your bulleted points above (not mutually exclusive, and static shape checks, if they worked, would provide more value to users). There are two factors on the other side that come to mind.
Mypy tracks the type of M and N, but not their values. It can infer the rank of
Not going to work.
|
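A small illustration of the point about types versus values, as a sketch only. The names `Matrix`, `zeros`, and `parse_dims` are made up for this example, and nested lists stand in for arrays so that it runs without NumPy.

```python
from typing import List

# The annotation records two levels of nesting (rank 2) but no sizes.
Matrix = List[List[float]]


def zeros(rows: int, cols: int) -> Matrix:
    """Build a rows x cols matrix of zeros."""
    return [[0.0] * cols for _ in range(rows)]


def parse_dims(text: str) -> tuple[int, int]:
    """Parse a string such as '3x4' into a pair of sizes."""
    rows, cols = (int(part) for part in text.split("x"))
    return rows, cols


# The sizes only exist at runtime; a static checker sees two plain `int`s here.
M, N = parse_dims("3x4")
a = zeros(M, N)
```

A checker can conclude that `a` is a `Matrix`, so its rank is known, but nothing in the types says it is 3 by 4.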
Take another look at my doc on shape typing. You're right that functions like
I'm not certain that mypy will be unable to infer shapes in this case. If typing adds support for literals (which is almost certainly necessary regardless for NumPy), then I imagine it will also support literals in variables, in the same way that it supports type aliases. |
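As a hedged sketch of what "literals in variables" can look like with the literal types that were later standardised: a name declared `Final` and initialised with a literal is treated by checkers such as mypy as having that literal type in contexts that require one. The function `require_height` is hypothetical and exists only for this example.

```python
from typing import Final, Literal

# Declared Final, so a checker can treat HEIGHT as Literal[500] where needed.
HEIGHT: Final = 500

# A plain assignment is inferred as `int`; its value is not tracked.
WIDTH = 1000


def require_height(n: Literal[500]) -> int:
    """Accept only the literal value 500 as far as the type checker is concerned."""
    return n


rows = require_height(HEIGHT)
# require_height(WIDTH) would be rejected statically: `int` is not Literal[500].
```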
My suspicion is that shape tracking is (1) going to add a small marginal value, since it will rarely be possible to statically infer shapes (e.g. my two examples above) in real-world code, whereas it will often be possible to statically infer rank in real-world code; and (2) going to add a ton of complexity. That suggests to me it's not the best place to start for the "1.0". If there's a way to design the syntax so that it is initially rank-only, build a viable package around that, and add shape information afterwards in a second stage, then that would be my preference. |
Maybe for v1.0 we could have Edit: It might also be a good idea to have a warning if the shape contains anything other than |
Yes, I agree that it could make sense to add shape support incrementally. We'll probably still need upstream work in typing even to make that work well, though (e.g., to support integer values in types). |
It seems like there is some interesting work being done on variadic generics: Depending on how this progresses from here it is possible we could end up with either |
It seems like we now have a (draft) PEP for variadic generics, |
I think we're almost there now with PEP 646. One concern I have, though, is that a related PEP (PEP 637, indexing with keyword arguments) was just rejected, and in some discussion on python-dev, Guido van Rossum noted:
I therefore wanted to ask: What's the current feeling around how likely NumPy would be to use PEP 646 if it were accepted? A vote of support from NumPy folks would be a good sign that we're on the right track. (On the other hand, if the answer is "We're not sure enough yet about how shape-typing as a whole would work to want to commit to anything", that would be a very reasonable answer too - and would suggest that perhaps we need to do more prototyping first.) |
Realized the other day that most of what is wanted here can already be supported using For example, one could type-hint
from typing import Tuple, TypeVar, overload
import numpy as np
from numpy.typing import DType, NDArray
T1 = TypeVar("T1", bound=int)
T2 = TypeVar("T2", bound=int)
T3 = TypeVar("T3", bound=int)
DTypeVar = TypeVar("DTypeVar", bound=DType)
@overload
def matmul(x1: NDArray[Tuple[T1], DTypeVar], x2: NDArray[Tuple[T1], DTypeVar], /) -> DTypeVar:
    ...
@overload
def matmul(x1: NDArray[Tuple[T1], DTypeVar], x2: NDArray[Tuple[T1, T2], DTypeVar], /) -> NDArray[Tuple[T2], DTypeVar]:
    ...
@overload
def matmul(x1: NDArray[Tuple[T1, T2], DTypeVar], x2: NDArray[Tuple[T2], DTypeVar], /) -> NDArray[Tuple[T1], DTypeVar]:
    ...
@overload
def matmul(x1: NDArray[Tuple[T1, T2], DTypeVar], x2: NDArray[Tuple[T2, T3], DTypeVar], /) -> NDArray[Tuple[T1, T3], DTypeVar]:
    ...
If it works with
HEIGHT = Literal[500]
WIDTH = Literal[1000]
x: NDArray[Tuple[HEIGHT, WIDTH], int] = ...
A: NDArray[Tuple[WIDTH, WIDTH], int] = ...
y: NDArray[Tuple[HEIGHT, WIDTH], int] = matmul(x, A)
To make it more readable, one could also alias
x: NDArray[Shape[Len[500], Len[1000]], int]
Although a little clunkier, it avoids the issue of type-checkers failing to recognize constants like:
HEIGHT = 500
WIDTH = 1000
x: NDArray[[WIDTH, HEIGHT], DType]
It's also probably easier to create generics like this using
# dataset[index][row][col]
dataset: NDArray[Tuple[AnyLen, HEIGHT, WIDTH], DType]
Semantically I would also argue that it makes a lot of sense, since this is the literal type-hint for |
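The four `matmul` overloads above encode one shape rule for 1-D and 2-D operands. As a rough runtime counterpart (a sketch, with the hypothetical helper name `matmul_result_shape`, not a NumPy function), the same rule can be written over plain shape tuples:

```python
from typing import Tuple


def matmul_result_shape(s1: Tuple[int, ...], s2: Tuple[int, ...]) -> Tuple[int, ...]:
    """Result shape of a 1-D/2-D matrix product, mirroring the four overloads."""
    if len(s1) not in (1, 2) or len(s2) not in (1, 2):
        raise ValueError("only 1-D and 2-D operands are covered here")
    # The last axis of the left operand must match the first axis of the right one.
    if s1[-1] != s2[0]:
        raise ValueError(f"inner dimensions differ: {s1[-1]} != {s2[0]}")
    # Drop the contracted axes and keep whatever remains on each side.
    return s1[:-1] + s2[1:]
```

With the `HEIGHT = 500` and `WIDTH = 1000` example from the comment, `matmul_result_shape((500, 1000), (1000, 1000))` gives `(500, 1000)`, which matches the annotation on `y`. The static version does the same matching with `T1`, `T2`, `T3` instead of integers.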
Variadic generics (PEP 646) was accepted on 19 Jan 2022: https://mail.python.org/archives/list/python-dev@python.org/message/OR5RKV7GAVSGLVH3JAGQ6OXFAXIP5XDX/
Variadic generics support in typing_extensions seems to have been merged 7 days ago: python/typing#963
Does that mean it's possible for numpy to finally implement support for this now? 🤑 |
Is there an update on this, maybe an expected release date?
|
type NDArray[T: np.generic] = np.ndarray[Any, np.dtype[T]]
https://github.com/numpy/numpy/blob/v1.26.4/numpy/_typing/_array_like.py#L32 |
I've been thinking about this for a while now, and I believe that the following proposal could work. For the sake of brevity and readability, I'll be using the Python 3.12+ PEP 695 syntax. Additionally, I'll prefix type parameter declarations with either a The
|
@jorenham, good proposal; however, consider that annotating types should be simple, easy and fast:
def convert_image_rgb8_grayscale1[W, H](image: Array[H, W, 3, uint8]) -> Array[H, W, uint1]:
    return (image.sum(axis=2)/3).astype(uint1) |
Is it not possible to keep the parentheses around the shape dimensions? ( |
I just checked and that syntax is not allowed, so it should be |
I've checked and there is no syntax error. Perhaps you ran into restrictions of Array?
Array = ... # define type
def f[W,H](a: Array[(H,W,3),int]):... |
I did not mean a syntax error from Python (which does not care about annotations), but from MyPy. |
Ah, sure. We can discuss any interface here, but all of them run into the mypy wall. Tbh, I don't follow this closely and don't know the current way to achieve |
Edit: Forgot the new 3.12 syntax. FWIW, here's the current code in pre-3.12 syntax:
_ShapeType = TypeVar("_ShapeType", bound=Any)
_DType_co = TypeVar("_DType_co", covariant=True, bound=dtype[Any])
class ndarray(_ArrayOrScalarCommon, Generic[_ShapeType, _DType_co]):
If you're talking about modifying
I'd also add that stepping down this path could help illuminate the right way to resolve a lot of the "left for the future" stuff in PEP 646. EDIT: It would be nice to understand how/whether this applies to record arrays and masked arrays, which I don't have much experience with. |
There appears to be some confusion about the PEP 695 syntax, and variadic typing parameters (i.e.
@baterflyrity You use an integer directly in your example, that is not allowed. Instead, a
@vnmabus You correctly noticed that using a
@Jacob-Stevens-Haas I agree that this proposal highlights several missing features within Python typing. But I'd prefer to tackle one problem at a time (I know from experience how deep the typing PEP proposal rabbit hole goes). |
I fully agree - I meant only that taking these small steps would, as a bonus, help make future PEP considerations more clear, i.e. by providing examples of |
On older numpy versions, np.ndarray was less forgiving and did not allow passing 1 argument instead of the required 2. It also turned out that numpy doesn't yet have typing for shapes (numpy/numpy#16544), so all matrices and other shapes are specified as `npt.NDArray[np.float64]`. Fixed type discrepancies for `get_edges` and `get_faces`, and also had to fix `import_ifc`, as Blender apparently has problems with storing np.int32 in custom attributes (https://projects.blender.org/blender/blender/issues/121072). Tested that Blender is okay with np.int32 in the other cases we had (addressing BMesh.verts[i] where `i` is np.int32).
Following the example provided by @jorenham, I tried this:
Array2x2uint8: TypeAlias = np.ndarray[tuple[Literal[2], Literal[2]], np.dtype[np.uint8]]
It works well with
my_array: Array2x2uint8 = np.empty((2, 2), dtype=np.uint8)
But when I use
my_array: Array2x2uint8 = np.array([[1, 2], [3, 4]], dtype=np.uint8)
Type checkers are not happy:
Pyright:
MyPy:
Any chance that this will work in the future? Which workaround could I use for now? |
Typing support is currently experimental, so use it at your own risk.
We're currently actively working on implementing shape-typing support in numpy. |
Sorry if I missed it somewhere in this thread, but is there a summary of how much of this is expected to work with NumPy 2.2.x? Here's a program I was hoping would work, but with NumPy 2.2.4 and basedpyright the variable
import numpy as np
x = np.zeros(3)
reveal_type(x)
y = x[0]
reveal_type(y)
$ basedpyright test_type_check_ndarray.py
test_type_check_ndarray.py
test_type_check_ndarray.py:4:13 - information: Type of "x" is "ndarray[tuple[int], dtype[float64]]"
test_type_check_ndarray.py:5:1 - warning: Type of "y" is Any (reportAny)
test_type_check_ndarray.py:6:13 - information: Type of "y" is "Any"
0 errors, 1 warning, 2 notes
$ conda list numpy
numpy 2.2.4 py312h72c5963_0 conda-forge
$ conda list basedpyright
basedpyright 1.28.3 pyh29332c3_0 conda-forge |
From #16544 (comment):
... so in other words: Shape typing isn't expected to work with NumPy 2.2 :) To cheekily quote myself from that same post:
And that's still very much the case. You can follow the development at numpy/numtype |
See how the contracts package tries to provide support for something similar to shapes. It extends annotations in a different way than standard typing does, and maybe something like that could be done here too. So instead of a specific typing extension (PEP) for things like shapes, it might be more useful to settle on a syntax for general constraints on types and use that in addition to the standard types from typing.
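The standard library already has one general mechanism for attaching constraints to types: `typing.Annotated` (PEP 593). The following is only a sketch of how a shape constraint could ride along with an ordinary type and be enforced at runtime. `HasShape`, `shape_of`, and `check_shapes` are hypothetical names, it does not use the contracts package, and nested lists stand in for arrays.

```python
import functools
import inspect
from dataclasses import dataclass
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class HasShape:
    """Hypothetical constraint marker: the expected shape of a nested list."""
    dims: tuple


def shape_of(nested) -> tuple:
    """Shape of a regular nested list, read along the first element of each level."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(dims)


def check_shapes(func):
    """Enforce any HasShape metadata found in the function's annotations."""
    hints = get_type_hints(func, include_extras=True)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            for meta in getattr(hints.get(name), "__metadata__", ()):
                if isinstance(meta, HasShape) and shape_of(value) != meta.dims:
                    raise ValueError(f"{name}: expected {meta.dims}, got {shape_of(value)}")
        return func(*args, **kwargs)

    return wrapper


@check_shapes
def trace(m: Annotated[list, HasShape((2, 2))]) -> float:
    return m[0][0] + m[1][1]
```

A static checker still sees `m` as a plain `list`, so this approach trades static guarantees for runtime checks.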