APE 22: Public API #85

Open: wants to merge 10 commits into main

Member

@nstarman nstarman commented Apr 28, 2023

Up for discussion!

Very much a work in progress. Hopefully refined by discussion at the upcoming conference.

Signed-off-by: nstarman <nathanielstarkman@gmail.com>
Member

@astrofrog astrofrog left a comment

Just a few comments/questions so far - one of the main things that bothers me in general and that I don't think there is a solution to is that I can do e.g.:

from astropy.cosmology.realizations import ScienceState

Obviously this isn't public API, and ScienceState isn't in __all__, but there's no real way to prevent users from relying on it, and we can't rename all imports in a module to include a _ prefix. So I think that we should probably also make our API docs an authoritative source of public API and ensure that they are consistent with the other rules. We should also make sure that all public APIs are documented in the docs (we don't check that this is the case right now).
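A minimal sketch of the behavior described above, using only the standard library: a throwaway module (the name `demo_mod` is hypothetical, not Astropy code) is registered in `sys.modules`. An explicit import of a name that is not in `__all__` still succeeds, while `from demo_mod import *` binds only the listed names.

```python
import sys
import types

# Build a throwaway module in memory (hypothetical stand-in for a real module).
demo_mod = types.ModuleType("demo_mod")
exec(
    "__all__ = ['public_func']\n"
    "def public_func():\n"
    "    return 'public'\n"
    "class ScienceState:  # not listed in __all__\n"
    "    pass\n",
    demo_mod.__dict__,
)
sys.modules["demo_mod"] = demo_mod

# An explicit import of a name that is NOT in __all__ still works.
from demo_mod import ScienceState

# A star-import only binds the names listed in __all__.
star_namespace = {}
exec("from demo_mod import *", star_namespace)
star_names = sorted(k for k in star_namespace if k != "__builtins__")

print(ScienceState.__name__)  # ScienceState
print(star_names)             # ['public_func']
```

So `__all__` documents intent and shapes star-imports, but it does not stop anyone from importing the other names.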

APE_public.rst Outdated
public API." SciPy allows modules to lack an ``__all__`` attribute, meaning a
user and their tools must understand the nuances of the previous rules. Having
an ``__all__`` attribute in every module is simpler, unambiguous, and better for
introspection by both users and automated systems.
Member

It might be worth mentioning at this point how a user would know that a function/class is in __all__. Isn't it still possible to explicitly import functions/classes from a module even if they are not in __all__? If so, then listing something in __all__ does not stop e.g. Copilot from suggesting code that uses a function not in __all__.

APE_public.rst Outdated
2. All modules must have an ``__all__`` attribute, even if it is empty. The
``__all__`` attribute defines the public and private interface of the module
in which it is defined. Anything in ``__all__`` is public, including
underscore-prefixed symbols. Anything not in ``__all__`` is private in that
Member

although I think we should probably disallow underscore-prefixed symbols in __all__?

Member Author

@nstarman nstarman May 2, 2023

I agree that in practice there should not be, but I strongly think __all__ should be definitive, so if we have an underscore-prefixed object that we want to make public, the mandatory steps are (in this order):

  1. put it in __all__.
  2. update the docs to reflect __all__.
  3. super strongly encouraged to remove the underscore prefix (updating steps 1 & 2).

This is adopting the Scipy disambiguation of PEP 8, adding primarily the mandate that empty __all__ be included in modules with no Public API.
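The precedence rule above (SciPy's reading of PEP 8, with `__all__` definitive whenever it is present, even for underscore-prefixed names) might be written as a small predicate. This is a sketch under those assumptions: `is_public_name` and the in-memory modules are hypothetical, and the underscore fallback only matters for modules that still lack `__all__`, which this APE would disallow.

```python
import types


def is_public_name(module: types.ModuleType, name: str) -> bool:
    """Return True if ``name`` is public in ``module`` under SciPy-style rules.

    1. If the module defines ``__all__``, it is definitive: a name is public
       exactly when it is listed there (even if it starts with an underscore).
    2. Otherwise, fall back to the PEP 8 convention: names without a leading
       underscore are public.
    """
    exported = getattr(module, "__all__", None)
    if exported is not None:
        return name in exported
    return not name.startswith("_")


# Hypothetical modules used only to exercise the rule.
with_all = types.ModuleType("with_all")
with_all.__all__ = ["_legacy_helper"]   # underscore name deliberately exported
with_all._legacy_helper = object()
with_all.helper = object()              # not in __all__, so private

without_all = types.ModuleType("without_all")
without_all.helper = object()
without_all._hidden = object()

empty_all = types.ModuleType("empty_all")
empty_all.__all__ = []                  # explicitly: nothing public here
empty_all.anything = object()

print(is_public_name(with_all, "_legacy_helper"))  # True
print(is_public_name(with_all, "helper"))          # False
print(is_public_name(without_all, "helper"))       # True
print(is_public_name(without_all, "_hidden"))      # False
print(is_public_name(empty_all, "anything"))       # False
```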

APE_public.rst Outdated
- Clearly state if a documented object is actually private.
3. **Add prefixes**: 1. Add prefixes to all modules that are not public. 2. Add
prefixes to all classes, functions, and attributes that are not public.
:note:`I'm less enthusiastic on this point.`
Member

I'm in favor of this as long as - as described above - we don't need to add an underscore to symbols that are already in an underscore module, for instance (so e.g. nothing in astropy.io.fits._tiled_compression needs to have an underscore prefix).

Member Author

I am fine with that. I have reservations about going all in on underscore prefixing everything. Not needing to underscore symbols in private modules (which still have an __all__) SGTM.

Member Author

Done.

Member Author

@astrofrog, resolve?

Member

astrofrog commented Apr 28, 2023

After further thought, I think what is going to be really important here is to define what public API is from the perspective of a user - that is, a user won't know what __all__ is, so when we communicate with a user, should we tell them that public API is anything in the API docs, or e.g. anything that can be accessed through tab completion in IPython and which does not have _ prefixes anywhere?


saimn commented Apr 29, 2023

The problem with __all__ is that it only affects import * (and for a long time that was its only meaning; the PEP 8 section about public/private interfaces was added later, python/peps@7dba60c). So it doesn't prevent importing from a module, and autocompletion doesn't use it.
So the only way to enforce public/private API is by renaming private modules with an underscore. That's what Scipy did.


saimn commented Apr 29, 2023

What do you propose for submodules that currently define __all__ but from which people should not import directly, e.g. astropy.convolution.core (and many others). This is the scheme that is mostly used currently in Astropy.

Member

astrofrog commented Apr 29, 2023

Maybe in that case .core should technically be private? (._core)

Member Author

nstarman commented Apr 30, 2023

Maybe in that case .core should technically be private? (._core)

👍. That is the suggestion of PEP 8 -- and that .core should also have a blank __all__ = [].

Member Author

nstarman commented Apr 30, 2023

What do you propose for submodules that currently define __all__ but from which people should not import directly, e.g. astropy.convolution.core (and many others). This is the scheme that is mostly used currently in Astropy.

So this would be one of the largest changes to Astropy. PEP8 and thus Scipy and static typing all say that __all__ refers to what is public in that module. With this in mind, doing

# __init__.py
# No __all__ is defined
from .core import *

# core.py
__all__ = ["Foo"]

class Foo: ...

Means that Foo is public in astropy.convolution.core and private in astropy.convolution. This is contrary to Astropy's intent, where we are saying that __all__ in astropy.convolution.core means that Foo is actually private in astropy.convolution.core and public in astropy.convolution. This is confusing for many reasons. If we were to adopt PEP 8 (as suggested in this draft APE), then the previous example would look like

# __init__.py
__all__ = ["Foo"]  # Foo is public in this module, even if it is defined elsewhere.
from .core import Foo

# core.py
__all__ = []  # nothing is public in this module. Please look elsewhere.

class Foo: ...

Essentially we need to move the contents of various __all__ to where the code is actually public, leaving behind empty __all__ to indicate where no code is public.
Caveat: private modules defining a non-empty __all__ are fine. This enables *-imports in the public modules.
Thanks @astrofrog for the clarification, which is now detailed in the APE.
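To see the PEP 8 layout above working end to end, the sketch below writes the two files into a temporary directory and imports them. The package name `demo_conv` is a hypothetical stand-in for `astropy.convolution`. `Foo` is advertised only by the package's `__all__`, `core.__all__` is empty, and both names still refer to the same class.

```python
import importlib
import pathlib
import sys
import tempfile

# Write a tiny package mirroring the PEP 8-style example above.
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / "demo_conv"
pkg.mkdir()
(pkg / "__init__.py").write_text(
    '__all__ = ["Foo"]  # Foo is public here, even though defined in core\n'
    "from .core import Foo\n"
)
(pkg / "core.py").write_text(
    "__all__ = []  # nothing is public in this module\n"
    "class Foo: ...\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_conv = importlib.import_module("demo_conv")
core = importlib.import_module("demo_conv.core")

print(demo_conv.__all__)          # ['Foo']
print(core.__all__)               # []
print(demo_conv.Foo is core.Foo)  # True
```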

Update

The better option is to rename core.py to _core.py and add an __all__ to __init__ like so.

# __init__.py
from . import _core, ...
from ._core import *
...

__all__ = [] + _core.__all__ + ...

# _core.py (formerly core.py)
__all__ = ["Foo"]

class Foo: ...

This still supports * imports if you want them and retains 100% unambiguity about what is public and where.
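The updated layout can be exercised the same way. This sketch assumes a single private module (the `...` in the example elides others) and uses the hypothetical package name `demo_pkg`; it checks that the package's `__all__` is assembled from `_core.__all__` and that a star-import of the public package yields only `Foo`.

```python
import importlib
import pathlib
import sys
import tempfile

root = pathlib.Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
# Private module: keeps its own __all__, which the package re-exports.
(pkg / "_core.py").write_text('__all__ = ["Foo"]\nclass Foo: ...\n')
# Public package: star-imports the private module and aggregates __all__.
(pkg / "__init__.py").write_text(
    "from . import _core\n"
    "from ._core import *\n"
    "__all__ = [] + _core.__all__\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_pkg = importlib.import_module("demo_pkg")

print(demo_pkg.__all__)                    # ['Foo']
print(demo_pkg.Foo is demo_pkg._core.Foo)  # True

# Users star-importing the public package get exactly the public names.
ns = {}
exec("from demo_pkg import *", ns)
print(sorted(k for k in ns if k != "__builtins__"))  # ['Foo']
```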

Member Author

nstarman commented Apr 30, 2023

After further thought, I think what is going to be really important here is to define what public API is from the perspective of a user - that is, a user won't know what __all__ is, so when we communicate with a user, should we tell them that public API is anything in the API docs, or e.g. anything that can be accessed through tab completion in IPython and which does not have _ prefixes anywhere?

I think Yes. To make sure we're on the same page, I think communicating it this way to users should be the logical consequence of the deeper rules:

  • That __all__ is authoritative
  • That we follow PEP 8 public versus internal interfaces, e.g. with underscore prefixes
  • That the docs are up-to-date.

Having all these rules means that a user can only get to public symbols through other public symbols, so that "anything that can be accessed through tab completion in IPython and which does not have _ prefixes anywhere" and "anything in the API docs" (unless explicitly stated otherwise) are public. That's not the source of our definition of public API; it's the consequence.
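If `__all__` is authoritative at every level, the set of public symbols can be enumerated mechanically by following `__all__` from the top-level package, which is one way to test the claim that public symbols are reachable only through other public symbols. `walk_public` is a hypothetical helper, sketched here on in-memory modules; it treats a module without `__all__` as exporting nothing (the APE requires `__all__` everywhere) and does no cycle detection.

```python
import types
from collections.abc import Iterator


def walk_public(module: types.ModuleType, prefix: str = "") -> Iterator[str]:
    """Yield dotted paths of every symbol reachable through ``__all__`` alone."""
    prefix = prefix or module.__name__
    # Assumption: a module without __all__ exports nothing.
    for name in getattr(module, "__all__", []):
        obj = getattr(module, name)
        path = f"{prefix}.{name}"
        yield path
        if isinstance(obj, types.ModuleType):
            # Public submodules are walked through their own __all__.
            yield from walk_public(obj, path)


# Hypothetical package layout built in memory.
impl = types.ModuleType("demo_root._impl")
impl.__all__ = ["helper"]
impl.helper = lambda: None

sub = types.ModuleType("demo_root.sub")
sub.__all__ = ["Thing"]
sub.Thing = type("Thing", (), {})

root = types.ModuleType("demo_root")
root.__all__ = ["sub", "top_func"]
root.sub = sub
root._impl = impl            # reachable as an attribute, but not via __all__
root.top_func = lambda: None

public = sorted(walk_public(root))
print(public)  # ['demo_root.sub', 'demo_root.sub.Thing', 'demo_root.top_func']
```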

Member Author

nstarman commented Apr 30, 2023

The problem with __all__ is that it only affects import * (and for a long time that was its only meaning; the PEP 8 section about public/private interfaces was added later, python/peps@7dba60c). So it doesn't prevent importing from a module, and autocompletion doesn't use it.
So the only way to enforce public/private API is by renaming private modules with an underscore. That's what Scipy did.

I agree, __all__ is not enough to prevent autocomplete, though some autocomplete does use __all__: see https://ipython.readthedocs.io/en/stable/config/options/terminal.html#configtrait-IPCompleter.limit_to__all__.
I also agree we should rename modules with an underscore, as part of adhering to PEP 8.
It should be noted that adding underscores does not actually "enforce" public/private API, as Python does not have true language-level features for public vs internal interfaces. Like __all__, single underscores are a convention, and Scipy says that __all__ takes precedence over underscores. In this APE I propose that we adopt a Scipy-like rule set where __all__ takes precedence over underscores and we use both according to PEP 8.


saimn commented May 1, 2023

@astrofrog - Maybe in that case .core should technically be private? (._core)

If we want to control strictly what's public and private yes, basically all submodules should be private (renamed with underscore) and public API exported in the subpackages' __init__.py. That's what Scipy did.

@nstarman - So this would be one of the largest changes to Astropy.

With this solution yes, this would require a lot of changes, and may be painful to do. So I don't like this solution.
But you also seem to agree with renaming with underscores, though I think those are two different solutions.

To summarize:

  • Option 1: use __all__ = [] in submodules, and import explicitly public functions/classes in subpackages' __init__.py and list those in __all__.
  • Option 2: rename submodules with underscore, keep their list of public functions/classes in __all__ (a lot of them already have it) and just change the import in subpackages' __init__.py (from .module import * → from ._module import *). That's what Scipy did. [1]

I don't like option 1 because it moves the list of exported functions from the module itself to another place, and requires a lot of changes which can be prone to errors. Option 2 is more reasonable.

Then, as you say, the underscore prefix is also just a convention, but that's the closest thing to a private scope in Python land. And autocomplete respects it, so when users browse the functions in their shell they will see only public API. As a user I never checked the content of __all__ of a package; I use autocompletion and the docs.

[1] They also kept module.py with deprecation warnings because there is a lot of code importing from e.g. scipy.ndimage.morphology instead of scipy.ndimage. We may want to do that in specific cases, but I don't think we would need to do it systematically.

@pllim pllim changed the title from Public API to APE 22: Public API May 2, 2023
Member

eteq commented May 2, 2023

(Some of this developed from discussion at the coordination meeting (including @nstarman, @saimn, @astrofrog, @tepickering, @pllim, @nden, @WilliamJamieson), although I don't think I can say that all of my points above are the consensus of those folks; it's some mix of that and just straight up my own opinion.)

I very much agree with @astrofrog's point here:

After further thought, I think what is going to be really important here is to define what public API is from the perspective of a user - that is, a user won't know what __all__ is, so when we communicate with a user, should we tell them that public API is anything in the API docs, or e.g. anything that can be accessed through tab completion in IPython and which does not have _ prefixes anywhere?

I think the answer up until this point (in an uncodified way) has been that whatever the docs say is the public API. So I think it makes sense to codify that as the "true answer". @nstarman's point was that if we follow the rules here, the two are the same, and it's only an aberration if they differ. But I was/am concerned about the inevitable state when something isn't working right. So I think what we settled on is that we start by saying that, as of when this APE is accepted, the docs are the "true" public API, but this APE presents a plan to get to a state where the rules highlighted in this APE lead naturally to the docs reflecting the same thing as these rules.

I still personally think the final source of authority should be the documentation, because that's more of a user-facing contract, as @astrofrog says. But I think if we say that's the reality now, and we might revisit it after this APE's plan is implemented, that's a reasonable compromise.

Two more opinions to offer:

  • A consequence of this is that modules like astropy/quantity/quantity.py become astropy/quantity/_quantity.py. I think leading underscore module names are ugly. That's subjective, but still.
  • Another consequence is that the actual public API __all__s are in different files than the thing-to-be-made-public. E.g., Quantity would be in astropy/quantity/__init__.py instead of astropy/quantity/_quantity.py. I don't like that because it means a small change in one file requires one to understand the full API structure to know which __all__ to add it to. I'm not sure that's annoying enough to justify changing anything, but it's a complaint I want to register and think about how we might get around it.

And one question:

  • Does this apply to coordinated packages in addition to the core? In principle it should, but that might be signing up for a lot more work because they are probably more of a mess than the core...

Member

pllim commented May 3, 2023

Does this apply to coordinated packages

I would say not. Even for things like removing astropy-helpers, it was a tedious campaign with writing up a transition guide and opening some PRs downstream. For things like this that would break API, it is a non-starter.

nstarman added 3 commits July 30, 2023 17:26
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
@nstarman nstarman requested a review from astrofrog July 31, 2023 00:44
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
Member Author

@astrofrog @eteq @saimn @pllim, I've updated this APE based on the discussions we had at the conference and in this thread. LMK what you think!

nstarman added 2 commits July 30, 2023 21:32
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
Member Author

nstarman commented Aug 1, 2023

What do you propose for submodules that currently define __all__ but from which people should not import directly, e.g. astropy.convolution.core (and many others). This is the scheme that is mostly used currently in Astropy.

@saimn I've been working on this over at astropy.cosmology. We've successfully transitioned .utils -> ._utils and io -> ._io, and I'm working on the rest. The code is clearer from a user's perspective since there's only one obvious place to import things from and the hidden modules aren't discoverable by tab completion.

Member Author

nstarman commented Aug 1, 2023

  • Option 2: rename submodules with underscore, keep their list of public functions/classes in __all__ (a lot of them already have it) and just change the import in subpackages' __init__.py (from .module import * → from ._module import *). That's what Scipy did. [1]

I also like Option 2 a lot. It works because _module is not made public in __init__, so even though _module defines an __all__ and makes its contents locally / contextually public, that is within a private module. It's impossible to publicly navigate to the contents of _module, only to what is exported in __init__.

With this as the template, the example from #85 (comment) becomes

# __init__.py
__all__ = ["Foo"]  # Foo is public in this module, even if it is defined in `_core`, which is private.
from ._core import Foo

# _core.py
__all__ = ["Foo"]  # Foo is "public" in this module, but this module is private.

class Foo: ...

Another consequence is that the actual public API __all__s are in different files than the thing-to-be-made-public. E.g., Quantity would be in astropy/quantity/__init__.py instead of astropy/quantity/_quantity.py. I don't like that because it means a small change in one file requires one to understand the full API structure to know which __all__ to add it to. I'm not sure that's annoying enough to justify changing anything, but it's a complaint I want to register and think about how we might get around it.

@eteq, I believe @astrofrog's comment largely answers this question.

Member Author

nstarman commented Aug 1, 2023

So I think what we settled on is that we start by saying that, as of when this APE is accepted, the docs are the "true" public API, but this APE presents a plan to get to a state where the rules highlighted in this APE lead naturally to the docs reflecting the same thing as these rules.

@eteq, I agree.

But I was/am concerned about the inevitable state when something isn't working right.

I added a section on a pre-commit CI check. It actually looks to be fairly simple to check that a public module has corresponding documentation since we have docs/api that collects our documented objects. I believe we can go further and make a two-way check to also check that something in docs/api is also in __all__.
Given all this, we can make it 🤞 impossible for the docs to not reflect the public API as defined in the code.
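A two-way check like the one described could reduce to comparing two sets of names. This sketch assumes the documented names for a module have already been collected into a set (for example, by reading the files under `docs/api`; that parsing step and its format are assumptions and are not shown). `check_api_consistency` is a hypothetical helper, not an existing Astropy or pre-commit hook.

```python
import types


def check_api_consistency(module: types.ModuleType, documented: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means the two agree."""
    exported = set(getattr(module, "__all__", ()))
    problems = []
    # Direction 1: public in code but missing from the docs.
    for name in sorted(exported - documented):
        problems.append(f"{module.__name__}.{name} is in __all__ but not documented")
    # Direction 2: documented but not actually public in code.
    for name in sorted(documented - exported):
        problems.append(f"{module.__name__}.{name} is documented but not in __all__")
    return problems


demo = types.ModuleType("demo_api")  # hypothetical module
demo.__all__ = ["Foo", "bar"]

print(check_api_consistency(demo, {"Foo", "bar"}))  # []
for problem in check_api_consistency(demo, {"Foo", "baz"}):
    print(problem)
# demo_api.bar is in __all__ but not documented
# demo_api.baz is documented but not in __all__
```

In a pre-commit or CI hook, a non-empty result would be printed and turned into a non-zero exit status.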

Member Author

@nstarman nstarman left a comment

some comments.

not have a uniform and systematic approach to communicating what is public vs
internal, then we cannot expect users and especially their tools to know what is
public vs internal.

Member Author

From astropy/astropy#15169, I recall another source of issues that has bitten astropy in the past: I/O registration. When deep sub-modules look public, e.g. astropy.cosmology.flrw.lambdacdm.LambdaCDM, that path is generally used as the class pathway in I/O rather than the better astropy.cosmology.LambdaCDM. When the class is moved, this breaks the I/O and requires a whole backwards-compatibility layer to support files using the previous path. Cleaning up public vs internal, e.g. to astropy.cosmology._flrw.lambdacdm.LambdaCDM, would make this an obviously bad way to serialize the class, and code authors would get it "right" (astropy.cosmology.LambdaCDM) the first time. Clear public API makes for more stable code.

probably deserves a few sentences.

Contributor

This is slightly orthogonal to the issue here: really, one needs to set __module__ to avoid this - registration just followed that, not a choice made by coders. Note that numpy in fact does this - which partially is annoying, since one no longer knows where the function is defined.

Member Author

@nstarman nstarman Aug 14, 2023

For me the pain point was YAML serialization because I used f"{cls.__module__}.{cls.__qualname__}", which looks reasonable as code but led to the problems described above. If, when making the tests, I had noticed a private path, I would have fixed the problem from the start.

I believe this applies more generally, if something is defined in a private module but imported to a public location then it may be imported from there again, e.g. as part of I/O. Serializing using only the public import location is much better than what I did, which seemed reasonable at the time. Setting __module__ is one way to force public locations in serialization (which I agree has issues); a path map, or str regex / replace operation are other means.
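As a sketch of the "path map" alternative to setting `__module__`, the helper below (hypothetical name `public_path`) walks the prefixes of `cls.__module__` from shortest to longest, stops at the first underscore-prefixed component, and returns the first location that re-exports the very same object. It assumes importing each parent package is side-effect free, and it is demonstrated on the standard library, where `json.JSONDecodeError` is defined in `json.decoder` but re-exported from `json`.

```python
import importlib
import json


def public_path(cls: type) -> str:
    """Return the shortest non-underscored dotted path that re-exports ``cls``."""
    parts = cls.__module__.split(".")
    for end in range(1, len(parts) + 1):
        module_parts = parts[:end]
        if any(part.startswith("_") for part in module_parts):
            # Every longer prefix would also contain this private component.
            break
        module = importlib.import_module(".".join(module_parts))
        obj = module
        for attr in cls.__qualname__.split("."):
            obj = getattr(obj, attr, None)
            if obj is None:
                break
        if obj is cls:
            return f"{'.'.join(module_parts)}.{cls.__qualname__}"
    # Fall back to the definition site if no public location exposes the class.
    return f"{cls.__module__}.{cls.__qualname__}"


print(json.JSONDecodeError.__module__)        # json.decoder
print(public_path(json.JSONDecodeError))      # json.JSONDecodeError
print(public_path(json.decoder.JSONDecoder))  # json.JSONDecoder
```

Serializing with such a path (instead of `__module__` directly) means a later move of the class between private modules does not change what gets written to disk.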

Contributor

Yes, that one is an issue for examples and sphinx too - I really dislike that when we made representations a module (which was definitely a good idea!), we had to make so many updates to the docs. It would also be nice to use regex for things like __construct_mixin_classes

For me, this is still separate from the APE, since this has annoyed me amply already! But I can see that coming from an environment where files with leading underscores have more meaning, you could have avoided the issues.

Member Author

Agreed, we can leave it out of this APE. I think my point ("If, when making the tests, I had noticed a private path, I would have fixed the problem from the start") means that adopting this APE will automatically force most of these I/O issues to be fixed.


@neutrinoceros neutrinoceros left a comment

Overall this looks extremely solid and a well motivated solution to a real problem, so I 100% get behind this APE. Here are a couple, mostly superficial, remarks and suggestions.


author: Nathaniel Starkman

date-created: 2013 November 5 <replace with the date you submit the APE>

I guess this date can be set already?

Comment on lines +218 to +220
public. This means that a module can define an ``__all__`` attribute but if the
module itself is not public, then anything in the ``__all__`` attribute cannot
be publicly accessed from outside the module.

I'm not sure I understand this sentence. Does this imply that any private module should set __all__ = [], or does it mean that __all__, within a private module, defines an interface that's meant for internal use only? (Now that I've written it down, the latter seems far more likely, but I think there's room to improve the phrasing.)

Member Author

Yes, the latter. If you can't publicly navigate to a module, it doesn't matter what's "public" within that module. The full path must be public.
The issue with relying on this alone is what you discovered in that Issue (trying to find the one) where we have modules that look public (because they are not underscored) so it's possible to construct an import path that looks totally public, but actually isn't.
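One possible technical definition of "publicly accessed", offered as an assumption rather than the APE's wording: walk a dotted path from the top-level package and require each step to be public in the object reached so far, with `__all__` definitive when present and the underscore convention otherwise. `is_publicly_accessible` and the in-memory modules are hypothetical. The `legacy` example shows the problem raised above: a non-underscored module without `__all__` makes a path look public.

```python
import types

_MISSING = object()


def _public_in(namespace: object, name: str) -> bool:
    # __all__ is definitive when present; otherwise use the underscore convention.
    exported = getattr(namespace, "__all__", None)
    if exported is not None:
        return name in exported
    return not name.startswith("_")


def is_publicly_accessible(root: types.ModuleType, dotted: str) -> bool:
    """True if every step of ``dotted`` (relative to ``root``) is public."""
    obj: object = root
    for name in dotted.split("."):
        if not _public_in(obj, name):
            return False
        obj = getattr(obj, name, _MISSING)
        if obj is _MISSING:
            return False
    return True


# Hypothetical layout: Foo lives in a private module and is re-exported.
core = types.ModuleType("pkg._core")
core.__all__ = ["Foo"]         # "public" within a private module
core.Foo = type("Foo", (), {})
pkg = types.ModuleType("pkg")
pkg.__all__ = ["Foo"]          # the public re-export
pkg.Foo = core.Foo
pkg._core = core               # reachable, but not listed in pkg.__all__

# Hypothetical legacy layout: no __all__ at the top, non-underscored submodule.
legacy_core = types.ModuleType("legacy.core")
legacy_core.__all__ = ["Bar"]
legacy_core.Bar = type("Bar", (), {})
legacy = types.ModuleType("legacy")
legacy.core = legacy_core

print(is_publicly_accessible(pkg, "Foo"))          # True
print(is_publicly_accessible(pkg, "_core.Foo"))    # False
print(is_publicly_accessible(legacy, "core.Bar"))  # True, though never intended
```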


@neutrinoceros neutrinoceros Dec 7, 2023

If you can't publicly navigate to a module, it doesn't matter what's "public" within that module.

I think that's what confused me: since it doesn't matter, why should we care what's in a private module's __all__? Wouldn't it be simpler to recommend/require that __all__ be set to [] in private modules?

The issue with relying on this alone is what you discovered in that Issue (trying to find the one)

For reference, I believe you meant to link astropy/astropy#15666

Member

It might still be useful to be able to indicate in a private sub-module which things are meant to be importable in other parts of astropy, so __all__ could be useful in that way rather than always setting to [].

Member Author

Agreed! This is demonstrated in lines 281-287

Member Author

@nstarman nstarman Apr 16, 2024

In the examples of good module layouts.

Ok I think I understand and agree with everything said in this thread, but I still can't wrap my head around the phrasing. I think my main issue is that I don't know what's the technical definition for "publicly accessed"

Member Author

@astrofrog, see https://github.com/astropy/astropy/pull/15109/files as an example doing what you're talking about. I agree, this is super useful.

Member Author

@nstarman nstarman left a comment

Thanks @neutrinoceros!

Co-authored-by: Clément Robert <cr52@protonmail.com>
Member Author

nstarman commented Dec 6, 2023

@eerovaher @WilliamJamieson @taldcroft @eteq @pllim, I would appreciate some more eyes on this APE, if you have the time in this busy season. As we often follow the lead of NumPy I think this APE is very timely given their refactor to quite nearly follow this APE.

Member

pllim commented Dec 7, 2023

@nstarman , not sure if I have time to ponder this soon. Can this wait till the Coordination Meeting or is that too long to wait?

Member Author

nstarman commented Dec 8, 2023

@pllim, it can definitely wait to be approved. I'm not sure who would have the time to lead this effort if this APE were accepted sooner. But it would be great to have this be essentially finalized by the time of the Meeting.

Signed-off-by: nstarman <nstarman@users.noreply.github.com>
Member

pllim commented Jan 29, 2024

@nstarman , apparently APE 22 is taken by #87 . You will have to rename your file... but at this point, I am not sure to what. Maybe @eteq can advise. See #87 (comment)

Contributor

@mhvk mhvk left a comment

I looked at this again and must admit I find the prospect of yet another major refactoring rather off-putting, especially as it aims to solve what seems a minor problem; certainly, at the user level we have had few if any complaints as we (re)move code, suggesting that users have no issues finding the things they need and do not rely (overly) on private API.

We should really think carefully about the costs of these refactoring exercises. I'd estimate the implementation, including reviews, is going to be of order 10^2 hours (certainly more than 10^1, surely less than 10^3, though perhaps 10^1.5 is more realistic). Do we really want to spend the equivalent of $ 5...10k on this? (Though I guess we have already spent several 10k $ equivalent on ruff "rules"; has that been really worth it? What else could we have done with that amount of expensive developer time?)

I also think we (and thus the APE) should seriously consider alternatives that start from current practice, that anything under the basic subpackages like astropy.coordinates should be taken to be private unless explicitly stated otherwise, and just try to formalize that in a way that is good enough, without worrying too much about perfection (PEP 8 is explicit about why that is a bad idea). Can this not be made explicit without much effort (and no renaming)? E.g., how about the following alternative, with three nicely incremental steps:

  1. We explicitly document current practice that everything under subpackages is private and add a corresponding comment in all their top level __init__.py files (making appropriate exceptions in io and utils).
  2. We add __all__ to all submodule __init__.py files that include the public items, including public submodules of the subpackages.
  3. We slowly add __all__ to the rest of astropy, to indicate to ourselves which parts are meant to be used outside a given module.
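Step 2 of this alternative is mechanically checkable. The sketch below (hypothetical helper `submodules_missing_all`) uses `pkgutil.iter_modules` to list the direct submodules of a package whose namespace lacks `__all__`; run on a top-level package, it would report subpackages whose `__init__.py` has none. It assumes importing each submodule is safe and is demonstrated on a throwaway package in a temporary directory.

```python
import importlib
import pathlib
import pkgutil
import sys
import tempfile
import types


def submodules_missing_all(package: types.ModuleType) -> list[str]:
    """Return dotted names of direct submodules of ``package`` lacking __all__."""
    missing = []
    for info in pkgutil.iter_modules(package.__path__, prefix=package.__name__ + "."):
        module = importlib.import_module(info.name)
        if not hasattr(module, "__all__"):
            missing.append(info.name)
    return sorted(missing)


# Throwaway package: one submodule with __all__, one without.
root = pathlib.Path(tempfile.mkdtemp())
pkg_dir = root / "demo_sub"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("__all__ = []\n")
(pkg_dir / "good.py").write_text("__all__ = ['x']\nx = 1\n")
(pkg_dir / "bad.py").write_text("y = 2\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_sub = importlib.import_module("demo_sub")

print(submodules_missing_all(demo_sub))  # ['demo_sub.bad']
```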

``baz.__all__``, but what if ``baz`` is private?


Astropy partially implements all three of the conventions: some private classes
Contributor

I think this makes the problem sound much worse than it is: astropy actually has a clear, basic definition of what is public, which is everything directly in a submodule like astropy.coordinates. All imports should be from there, none from lower levels. In common usage the only exception for this is io.fits (all other io is accessed only through QTable.read/write), and some of the things in utils (which could certainly use clarification in their respective files).

Anyway, specifically for here, I think it would help the case if the text accurately described the current state, and if the example of things that were unclear actually referred to our code base rather than a hypothetical one, and gave some links to issues where our current layout was clearly the source of confusion/problems.

Member

astropy actually has a clear, basic definition of what is public, which is everything directly in a submodule like astropy.coordinates.

Do you know where this is written down?

Contributor

I think it is pretty obvious from the documentation (see the reference sections of time, coordinates, and timeseries), though of course the moment I look at others, I see that nddata is not quite as good, and modeling has definite public modules (though clearly indicated as such).

I do stick with my main point that we actually have had very few cases where users were confused about where to import things from.

Member

It was not too difficult to find counter-examples from astroquery, specutils, photutils and ccdproc, and these are all coordinated packages.

Contributor

Are you suggesting these should all be converted to a particular scheme too? Then we had better get more feedback on this APE 😺 I doubt I'm the only one who feels time is better spent elsewhere (and sadly this is not something that can be done without impacting everybody).

Member

I presented a few counter-examples from coordinated packages to the claim

...that we actually have had very few cases where users were confused about where to import things from.

We have more control over coordinated packages, so if even their developers are confused about where to import from, I'm guessing the confusion is even worse in non-coordinated packages.


Let's consider an example::

src/
Contributor

Again, I feel any example should be from actual astropy code, not something fake.

Member Author

@nstarman nstarman May 24, 2024

"truth is stranger than fiction. Fiction has to make sense." Mark Twain.
I'm happy to go find Astropy examples, but illustrative examples are useful pedagogy. When I update this with real examples I'll probably keep the illustrative one since it puts all scenarios in one grokkable example.

Member Author

Note to self: difficulties in public/private API definitions in astropy/astropy#16519


**We do nothing:**

This is the status quo. It is not a good option because it does not solve the
Contributor

This option has the quite large benefit that there won't be yet another large set of PRs to review that have nothing to do with science (plus the need to retrain with new file names).

Member Author

@nstarman nstarman May 24, 2024

IDK. It's a fair bit of work for whoever does the file renaming (not because renaming files is hard but because of CI and the docs). In my experience, reviewing a PR that's just a big pile of file renames (using git-mv so it doesn't even affect the git history) is SUPER easy.

Also, if you take a look at https://github.com/GalacticDynamics/galax/tree/main/src/galax/potential, I structured this submodule so there's the public API declared right at the top, then everything in https://github.com/GalacticDynamics/galax/tree/main/src/galax/potential/_potential looks exactly like it would were this Astropy. This boiled down to a folder rename, a __init__.py (and since I'm doing some things statically and with lazy loading, a __init__.pyi). This is easy to review.
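A minimal, self-contained sketch of the layout described here, assuming a hypothetical package `demo_pkg` (the names are illustrative and not taken from galax): the implementation lives in a private `_impl` subpackage, and the public `__init__.py` re-exports only what is public and declares it in `__all__`. The files are written to a temporary directory so the example runs on its own.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package: the public API is declared in the top-level __init__.py,
# while the implementation lives in the private ``_impl`` subpackage.
FILES = {
    "demo_pkg/__init__.py": (
        '"""Public API of demo_pkg."""\n'
        "from demo_pkg._impl.core import Potential, evaluate\n"
        "\n"
        '__all__ = ["Potential", "evaluate"]\n'
    ),
    "demo_pkg/_impl/__init__.py": "",
    "demo_pkg/_impl/core.py": (
        "class Potential:\n"
        '    """A public class, defined in a private module."""\n'
        "\n"
        "def evaluate(potential):\n"
        "    return _internal_helper(potential)\n"
        "\n"
        "def _internal_helper(potential):\n"
        "    return 0.0\n"
    ),
}

# Write the package to a temporary directory and make it importable.
root = Path(tempfile.mkdtemp())
for relpath, text in FILES.items():
    path = root / relpath
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

demo_pkg = importlib.import_module("demo_pkg")
print(demo_pkg.__all__)                         # ['Potential', 'evaluate']
print(demo_pkg.evaluate(demo_pkg.Potential()))  # 0.0
print(demo_pkg.Potential.__module__)            # demo_pkg._impl.core
```

Users import from `demo_pkg` only; the leading underscore on `_impl` signals to people and tools that nothing below it is supported API, even though `Potential.__module__` still reports the private path.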

APE22.rst
issue. The aforementioned problems of not knowing what is stable and
supported, and what is not, remain.

**We allow** ``__all__`` **to be optional:**
Contributor

There is another alternative, where we just formalize current practice: that anything under the basic submodules like astropy.coordinates should be taken to be private unless explicitly stated otherwise. And we can add __all__ incrementally.

Member Author

I'm very supportive of doing what you suggest in another APE! If such an APE existed I would make this APE a draft until the other APE were implemented. I still think this APE is the right way to go, but since the APE you are proposing is a subset of this one, it's 🆗 by me to get the ball rolling.

Member Author

Note to self: IDK when they did this, but scipy now has underscore-prefixed module names everywhere. They fixed their public/private problem. e.g. https://github.com/scipy/scipy/tree/main/scipy/fft

@nstarman
Member Author

nstarman commented May 31, 2024

Note: if we take cues from "upstream" packages, scipy now has underscore-prefixed names for modules.

@pllim
Member

pllim commented Jun 21, 2024

This was discussed as part of "State of APEs" at Coordination Meeting 2024. I think reactions were mixed and I cannot see any clear action items on how to move this forward (or if we should).

@astrofrog
Member

One idea I raised was that at the very least if we cannot reach consensus on changing current code, we should see if we can agree on rules for any new code?

@pllim
Member

pllim commented Jun 21, 2024

Re: #85 (comment)

For completeness, my response to that idea in the meeting was that if it is only a recommendation for new code, I do not think we need an APE; rather, we can modify the dev docs.

@mhvk
Contributor

mhvk commented Jun 22, 2024

There was indeed no consensus on the underscore prefixes, in large part because, contrary to what I thought at least, things like from astropy.units.quantity import Quantity were widespread in other github repositories. Hence, changing to underscores is guaranteed to break quite a bit of downstream code, and it is not clear this is worth it.

There also seemed to be consensus that in the end the documentation should be the ultimate arbiter, since that is what users would normally see (and the mistake of documenting the quantity submodule is probably at least partially to blame for the wrong usages...). So, it remains a good idea to ensure we document what is public and private, but start incrementally, as suggested in #85 (review):

  1. We explicitly document current practice that everything under subpackages is private and add a corresponding comment in all their top level __init__.py files (making appropriate exceptions in io and utils).
  2. We add __all__ to all subpackage __init__.py files that include the public items, including public submodules of the subpackages.
  3. We slowly add __all__ to the rest of astropy, to indicate to ourselves which parts are meant to be used outside a given module.

Finally, I'd say there was no consensus either on new vs old code, or subpackages doing different things, with the latter having the advantage of maintainers being able to set a policy they feel is best, but the disadvantage that then there is no package-wide logic anymore at all, while currently there is (with cosmology the only exception).

@nstarman
Member Author

Hence, changing to underscores is guaranteed to break quite a bit of downstream code, and it is not clear this is worth it.

from astropy.units.quantity is private. IMO we might as well break usage of private code in one fell swoop and then not (ever?) again rather than do it piecemeal as publicly-visible-private-code is changed over the years. I would find that less disruptive.

@mhvk
Contributor

mhvk commented Jun 29, 2024

from astropy.units.quantity is private. IMO we might as well break usage of private code in one fell swoop and then not (ever?) again rather than do it piecemeal as publicly-visible-private-code is changed over the years. I would find that less disruptive.

It is private, indeed. But the feeling at the coordination meeting was that breaking people's code for code style purity is too big a price to pay. And it is not likely we would ever define Quantity in another place than astropy.units.quantity. I also think the issue may be moot sooner or later, since I do think there will eventually be a general units/quantity package that we are going to be based on (hopefully by combining our units machinery with Quantity 2.0!).

But for astropy as a whole, continuity and consistency are important too. But there is nothing stopping us from making it clearer what is public and not, by defining appropriate __all__ and ensuring that, unlike for units, "private" modules do not appear in the documentation unless strictly necessary, and then with a clear docstring that states why they are included.

@nstarman
Member Author

nstarman commented Jun 29, 2024

since I do think there will eventually be a general units/quantity package that we are going to be based on (hopefully by combining our units machinery with Quantity 2.0!)

🎉. That would be excellent.

But there is nothing stopping us from making it clearer what is public and not, by defining appropriate __all__

A great thing to do, no matter the outcome of this APE.


When first proposed, one of the counter-arguments was that "upstream" libraries haven't done this. But now both numpy and scipy have basically done this (slightly different implementations).
And they did it in one fell swoop (numpy 2, recent scipy), so that users didn't suffer multiple falls from repeated swoops.
I'm just wondering why we're different. Same problem, similar solution?

When I first wrote this APE it was to make an argument "why we should do this". Now that our upstreams have done the same thing, IMO the argument shifts to "why aren't we doing this?". Prima facie we should.

Documentation is important, but it is most certainly not how any of our upstream libraries define their public API. The point of public-facing documentation is to document what is public, not to make it public. Just as we generate documentation from docstrings (so that the code contains its own documentation), Python, our upstream libraries, and almost everyone else make public/private a product of the code, not something imposed on it.
And this understanding is intrinsic to how we've built tooling for Astropy, like sphinx-automodapi: it looks at __all__.
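To illustrate what "public/private is a product of the code" means for tooling: an introspection or documentation tool can read a module's public names straight from `__all__`. The sketch below is a simplified assumption of such logic, not sphinx-automodapi's actual implementation. Its fallback for modules without `__all__` mirrors the star-import rule of skipping underscore-prefixed names, and shows why that fallback is ambiguous: imported helpers leak in.

```python
import types


def public_names(module):
    """Names a tool would treat as public for ``module``.

    Uses ``__all__`` when present; otherwise falls back to every name that
    does not start with an underscore (the rule ``from module import *``
    applies when ``__all__`` is absent).
    """
    explicit = getattr(module, "__all__", None)
    if explicit is not None:
        return sorted(explicit)
    return sorted(name for name in vars(module) if not name.startswith("_"))


# Two hypothetical modules with identical contents, only one declaring __all__.
source = "import math\nfrom os import path\n\ndef area(r):\n    return math.pi * r**2\n"

with_all = types.ModuleType("with_all")
exec(source + "__all__ = ['area']\n", with_all.__dict__)

without_all = types.ModuleType("without_all")
exec(source, without_all.__dict__)

print(public_names(with_all))     # ['area']
print(public_names(without_all))  # ['area', 'math', 'path']
```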

@neutrinoceros

And they did it in one fell swoop (numpy 2, recent scipy), so that users didn't suffer multiple falls from repeated swoops.
I'm just wondering why we're different. Same problem, similar solution?

I see three options here:
a) moving private code to private modules in one swoop
b) moving private code to private modules piecewise
c) do nothing¹

I agree with @nstarman that a>b. I also agree with Marten c>b. The remaining question is how to compare a VS c.

Since, as you guys pointed out, we may eventually have to move part of the private code from astropy.units in response to Quantity 2.0 becoming a dependency, why not use that event as a pivot to switch our strategy to (a), and keep the status quo (c) in the meantime?

Footnotes

  1. I'm only speaking about moving modules/members around here. Defining __all__ is a separate discussion, and one where there seems to be more consensus anyway.

@mhvk
Contributor

mhvk commented Jun 30, 2024

Numpy was in a different state, with, e.g., some parts of np.lib being public while other parts were not, so there was more urgency than we have. Even so, they held off until numpy 2.0, where a lot of other stuff was broken too.

Overall, @neutrinoceros made the right list, and I guess the conclusion in the coordination meeting was that c>a at the present time. At a time when there is a larger API change (as Quantity 2.0 would be), the conclusion may well be different.

In the meantime, there's nothing stopping us from incrementally ensuring that docstrings and __all__ are all consistent and clear.
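Keeping docstrings and `__all__` consistent can itself be checked incrementally, e.g. in the test suite. A minimal sketch of such a check; the function name and its exact rules are illustrative assumptions, not an existing astropy test. It reports names listed in `__all__` that do not exist, and public classes or functions that lack a docstring.

```python
import inspect
import types


def check_public_api(module):
    """Return a list of problems with ``module.__all__`` (empty if consistent)."""
    names = getattr(module, "__all__", None)
    if names is None:
        return [f"{module.__name__}: no __all__ defined"]
    problems = []
    for name in names:
        if not hasattr(module, name):
            problems.append(f"{module.__name__}.{name}: listed in __all__ but not defined")
            continue
        obj = getattr(module, name)
        # Only classes and functions are checked; constants carry no docstring.
        if (inspect.isclass(obj) or inspect.isfunction(obj)) and not (obj.__doc__ or "").strip():
            problems.append(f"{module.__name__}.{name}: public but has no docstring")
    return problems


# A hypothetical module with one documented, one undocumented and one missing name.
example = types.ModuleType("example")
exec(
    "def documented():\n"
    '    """Has a docstring."""\n'
    "\n"
    "def undocumented():\n"
    "    pass\n"
    "\n"
    "__all__ = ['documented', 'undocumented', 'missing']\n",
    example.__dict__,
)
for problem in check_public_api(example):
    print(problem)
# example.undocumented: public but has no docstring
# example.missing: listed in __all__ but not defined
```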

@pllim
Member

pllim commented Jul 1, 2024

Also keep in mind that NumPy has the backing of private industry (e.g., NVIDIA). Astronomy does not. I have started seeing pipelines pinning numpy<2 privately just because they have larger fish to fry and no time to deal with breaking API here and there. To them, calibration accuracy and stability is way more important than whether astropy.units.quantity is private or not. We have to keep our main "customers" in mind and they are not "big money".

Member

@astrofrog astrofrog left a comment

One thing we could consider doing in any case: for common cases of misuse (such as astropy.units.quantity), we could use batchpr to search through GitHub and open PRs that fix these incorrect imports.

It's also worth considering another option d, which is to do The Right Thing ™️ for new sub-packages and code, to at least not increase the problem.
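The automated-PR idea suggested here mostly comes down to a source rewrite. A minimal sketch under stated assumptions: `PUBLIC_HOME` is a hypothetical, hand-maintained mapping from private module paths to their public home, and only `from X import ...` statements are rewritten. It does not use batchpr's API, and a real tool would also need to handle forms such as `import astropy.units.quantity as q` and confirm that every imported name is actually re-exported by the public module.

```python
import re

# Hypothetical mapping from private module paths to their public import location.
PUBLIC_HOME = {
    "astropy.units.quantity": "astropy.units",
}

# Matches the start of a ``from <module> import`` statement on any line.
_FROM_IMPORT = re.compile(r"^([ \t]*)from[ \t]+([\w.]+)[ \t]+import[ \t]+", re.MULTILINE)


def fix_private_imports(source):
    """Point ``from <private module> import ...`` lines at the public module."""

    def repl(match):
        indent, module = match.group(1), match.group(2)
        if module not in PUBLIC_HOME:
            return match.group(0)  # leave every other import untouched
        return f"{indent}from {PUBLIC_HOME[module]} import "

    return _FROM_IMPORT.sub(repl, source)


before = (
    "from astropy.units.quantity import Quantity\n"
    "from astropy.units import Unit\n"
)
print(fix_private_imports(before), end="")
# from astropy.units import Quantity
# from astropy.units import Unit
```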

@pllim
Member

pllim commented Dec 31, 2024

xref astropy/astropy#17589

9 participants