Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

NEP: add NEP on a Python API cleanup for NumPy 2.0 #23537

Merged
merged 11 commits into from
May 29, 2023

Conversation

rgommers
Copy link
Member

@rgommers rgommers commented Apr 6, 2023

This is still draft, but close enough for now. It was discussed in the NumPy 2.0 developer meeting a few days ago, with mostly positive feedback.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this very much. Some comments in-line, as well as suggestions that perhaps go too far given the wish to ensure that most code that runs on 2.0 would also run on 1.2*

# Legacy (prefer not to use). Third grouping in the reference guide.
numpy.char
numpy.distutils
numpy.matrixlib # or deprecate?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We put in a PendingDeprecationWarning for 1.15 (#10142) - I'd say it could go to a real deprecation soon and then be removed for 2.0...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at this, it's np.matrix that has the PendingDeprecationWarning. That's a topic that may be unblocked soon, because there's work on shuffling around *_matrix and *_array in scipy.sparse. But it is unrelated to removing the matrixlib namespace, because that namespace is pretty pointless, its contents are already imported into the main namespace with from .matrixlib import *.

doc/neps/nep-0052-python-api-cleanup.rst Show resolved Hide resolved
numpy.exceptions
numpy.fft
numpy.linalg
numpy.ma
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Numpy 2.0 might be the last chance to really think through what MaskedArray should be like... If unchanged, perhaps putting it under legacy makes more sense?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I'd really like to see @ahaldane's rewrite included for 2.0. That needs a champion to move it forward though. The current numpy.ma isn't salvageable I'd say, and if we keep it then I agree with you that "legacy" makes sense. We're then effectively abandoning masked data in favor of proper missing data support as provided by dataframe libraries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe @ahaldane and I can have a joint look; my sense is that some of the API really has to be different. Note that I ended up rewriting masked data for astropy, as we really need to be able to have proper masked subclasses; see https://docs.astropy.org/en/latest/utils/masked/index.html. I think my scheme has a bit more hope of being useful also for different array implementations (say, masked dask), though I have not actually tested that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be great!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got some more feedback on legacy status from numpy.ma during the community meeting. After some more discussion on what legacy means (de-emphasize, move to a different place in the docs, suggest more appropriate alternatives - but no deprecation warnings) people were fine with that. Everyone agreed that numpy.ma is not in good shape, and it doesn't have a maintainer; a replacement is very much welcome (inside or outside NumPy).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved numpy.ma to the legacy section now.

doc/neps/nep-0052-python-api-cleanup.rst Outdated Show resolved Hide resolved

.. note::

TBD: will we preserve ``np.lib`` or not? It only has a couple of unique
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but it would help here to know where would things like arraysetops and nanfunctions would live (I realize this is not so important from a user perspective, since the functions are in the top namespace, but it is probably also good to think from the developer perspective).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say that they have to move somewhere, which is a single git mv plus use of git-blame-ignore-revs to not mess up the usefulness of git blame. Where it ends up exactly is less important to me, at least from the perspective of this NEP. It could be:

  • numpy/_lib/nanfunctions.py
  • numpy/lib/_nanfunctions.py
  • or something more different than adding a single underscore

doc/neps/nep-0052-python-api-cleanup.rst Outdated Show resolved Hide resolved
doc/neps/nep-0052-python-api-cleanup.rst Outdated Show resolved Hide resolved
doc/neps/nep-0052-python-api-cleanup.rst Outdated Show resolved Hide resolved
.. code:: python

>>> # np.intp is different, but compares equal too
>>> np.int64 == np.int_ == np.dtype('i8') == np.sctypeDict['i8']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've always disliked this mixing up of a data type with a scalar variable class of that data type. If we can rid of those... (and perhaps have definitions for the standard dtypes, i.e., f8 = np.dtype('f8')? see below too)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like __eq__ on dtypes to do the expected thing for Python objects indeed.

Less sure about f8 = np.dtype('f8') - I dislike the character codes a lot, they're too obscure. Scalars will hopefully go away completely at some point (they're not necessary for anything, and everyone seems to be in favor of them being removed eventually, it's just too big a lift for 2.0 it looks like).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'd like is clear principles, like:

  • We have two sets of names for things:
    • the "bit-suffixed names" like int32, float64 - with canonical names like bool, void, object added to that
    • the "C names" like short, int, double, clongdouble, etc.
  • Other aliases that don't fall into these categories should either be removed, or de-emphasized in case we need them for backwards compatibility
  • Avoid creating or using array scalars by instantiating them with np.dtype directly as much as possible, it's typically an anti-pattern

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm super-used to using the string variants, but definitely can see your point. The annoyance is that type(int32) is not actually a dtype subclass. Anyway, that's for #23358; agreed here that a minimal set is a good idea.

Copy link
Contributor

@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting stuff! Not much to comment ATM, just a typo.

doc/neps/nep-0052-python-api-cleanup.rst Outdated Show resolved Hide resolved
@rgommers
Copy link
Member Author

I propose to merge this with Draft status. It's in pretty good shape, and it'd be nice to make it more visible and have the rendered html version to point at.

@WarrenWeckesser
Copy link
Member

I propose to merge this with Draft status.

According to NEP 0, before merging, the next step is to announce the NEP on the mailing list.

@rgommers
Copy link
Member Author

Thanks Warren, good point - I thought I did that already last month, but it appears that my memory is faulty. Will write an email now.

doc/neps/nep-0052-python-api-cleanup.rst Outdated Show resolved Hide resolved
doc/neps/nep-0052-python-api-cleanup.rst Outdated Show resolved Hide resolved
numpy.ctypeslib
numpy.emath
numpy.f2py # only a couple of public functions, like `compile` and `get_include`
numpy.lib.stride_tricks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about moving stride_tricks to numpy.stride_tricks, and considering the rest of numpy.lib legacy stuff?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a reasonable option; there's a few more public things in numpy.lib though:

I think that's all, but hard to be 100% sure. It's hard to find good places for each of them, so I had in mind to keep np.lib as a grab bag of stuff that does not deserve its own namespace right below the main namespace.

We can probably get rid of Arrayterator; I'm not sure about the rest.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the list above to the note. I'd like to resolve this later (post-merge of this NEP as Draft). Leaning towards keeping numpy.lib, but either way is not ideal.

@rgommers
Copy link
Member Author

I addressed the few comments that came in after posting the draft NEP to the mailing list two weeks ago; I think it'd be good to merge this with Draft status now.

@mattip mattip merged commit fa284a5 into numpy:main May 29, 2023
@mattip
Copy link
Member

mattip commented May 29, 2023

Merged. It would be good to have a follow-on PR that

  • adds a link to the mailing list announcement/discussion
  • tighten up the language a bit in the "Cleaning up the main namespace" section, there is a mixture of "can be removed", "will be removed", and "New namespaces are introduced"

@mattip
Copy link
Member

mattip commented May 29, 2023

Thanks @rgommers

@rgommers rgommers deleted the nep-python-api branch June 13, 2023 13:06
@rgommers
Copy link
Member Author

Merged. It would be good to have a follow-on PR that

Thanks for the suggestions Matti. I'll take this along soon. It won't be the last update of this NEP.

@leofang
Copy link
Contributor

leofang commented Jun 14, 2023

Q: Is there a general announcement/milestone post/issue tracker that we can link to regarding NumPy 2.0? Would be useful to refer to it in this NEP and elsewhere 🙂

@rgommers
Copy link
Member Author

Q: Is there a general announcement/milestone post/issue tracker that we can link to regarding NumPy 2.0? Would be useful to refer to it in this NEP and elsewhere slightly_smiling_face

Good question @leofang. There isn't at this point - only a project board (https://github.com/orgs/numpy/projects/9, not useful for this purpose) and a scattered number of proposals, mailing list threads, meeting notes, etc.

I think we could use a pinned issue for this, since there's bound to be more and more questions like these. A problem of course is that it may also attract a lot of random questions, at which point it becomes too noisy. We could perhaps do it as a locked pinned issue, purely meant for maintainers to unlock, post update, and lock again. Pointing to relevant updates, docs, etc. as they get posted. WDYT?

@seberg
Copy link
Member

seberg commented Jun 19, 2023

Might as well do that for now just to have something brief somewhere.

@rgommers
Copy link
Member Author

Tracking issue created: #24300

@bsipocz
Copy link
Member

bsipocz commented Aug 30, 2023

I was recently linking to this NEP as a reference for the upcoming 2.0 changes, and noticed it's still in "Draft" state. I wonder whether it could be updated to accepted as the implementations are basically well underway?

(I suppose more NEPs in the "Draft" status could be updated, but that's a bigger scope than this comment, and I'm happy to open an issue for it).

@rgommers
Copy link
Member Author

xref gh-24620 to mark this NEP as Accepted - still under review but should get merged soon.

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.