-
-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
APE 22: Public API #85
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: nstarman <nathanielstarkman@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comments/questions so far - one of the main things that bothers me in general and that I don't think there is a solution to is that I can do e.g.:
from astropy.cosmology.realizations import ScienceState
Obviously this isn't public API, and ScienceState
isn't in __all__
, but there's not real way to prevent users from relying on it, and we can't rename all imports in a module to include a _
prefix. So I think that we should probably also make it so that our API docs are also an authoritative source of public API and ensure that it's consistent with the other rules. We should also make sure that all public APIs are documented in the docs (we don't check that this is the case right now).
APE_public.rst
Outdated
public API." SciPy allows modules to lack an ``__all__`` attribute, meaning a | ||
user and their tools must understand the nuances of the previous rules. Having | ||
an ``__all__`` attribute in every module is simpler, unambiguous, and better for | ||
introspection by both users and automated systems. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth mentioning at this point how a user would know that a function/class is in __all__
. Is it not possible to explicitly import functions/classes in a module that are not in __all__
? If so then having something in __all__
does not stop e.g. CoPilot from suggesting code that uses a function not in __all__
.
APE_public.rst
Outdated
2. All modules must have an ``__all__`` attribute, even if it is empty. The | ||
``__all__`` attribute defines the public and private interface of the module | ||
in which it is defined. Anything in ``__all__`` is public, including | ||
underscore-prefixed symbols. Anything not in ``__all__`` is private in that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
although I think we should probably disallow underscore-prefixed symbols in __all__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that practically there should not, but I strongly think __all__
should be definitive, so if we have an underscore-prefixed object that we want to make public the mandatory steps are (in this order):
- put it in
__all__
. - update the docs to reflect
__all__
. - super strongly encouraged to remove the underscore prefix (updating steps 1 & 2).
This is adopting the Scipy disambiguation of PEP 8, adding primarily the mandate that empty __all__
be included in modules with no Public API.
APE_public.rst
Outdated
- Clearly state if a documented object is actually private. | ||
3. **Add prefixes**: 1. Add prefixes to all modules that are not public. 2. Add | ||
prefixes to all classes, functions, and attributes that are not public. | ||
:note:`I'm less enthusiastic on this point.` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in favor of this as long as - as described above - we don't need to add underscore to symbols that are already in an underscore module for instance (so e.g. nothing in astropy.io.fits._tiled_compression
needs to have an underscore prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with that. I have reservations about going all in on underscore prefixing everything. Not needing to underscore symbols in private modules (which still have an __all__
) SGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@astrofrog, resolve?
After further thought, I think what is going to be really important here is to define what public API is from the perspective of a user - that is, a user won't know what |
The problem with |
What do you propose for submodules that currently define |
Maybe in that case .core should technically be private? (._core) |
👍. That is the suggestion of PEP 8 -- and that |
So this would be one of the largest changes to Astropy. PEP8 and thus Scipy and static typing all say that # __init__.py
# No __all__ is defined
from .core import * # core.py
__all__ = ["Foo"]
class Foo: ... Means that # __init__.py
__all__ = ["Foo"] # Foo is public in this module, even if it is defined elsewhere.
from .core import Foo # core.py
__all__ = [] # nothing is public in this module. Please look elsewhere.
class Foo: ... Essentially we need to move the contents of various Update The better option is to rename # __init__.py
from . import _core, ...
from ._core import *
...
__all__ = [] + _core.__all__ + ... # _core.py (formerly core.py)
__all__ = ["Foo"]
class Foo: ... This still supports |
I think Yes. To make sure we're on the same page, I think communicating it this way to users should be the logical consequence of the deeper rules:
Having all these rules means that a user can only get to public symbols though other public symbols -- that "anything that can be accessed through tab completion in IPython and which does not have _ prefixes anywhere" and "anything in the API docs" (unless explicitly stated) is public. It's not the source or our definition of public API, it's the consequence. |
I agree, |
If we want to control strictly what's public and private yes, basically all submodules should be private (renamed with underscore) and public API exported in the subpackages'
With this solution yes, this would require a lot of changes, and may be painful to do. So I don't like this solution. To summarize:
I don't like option 1 because it moves the list of exported functions from the module itself to another place, and requires a lot of changes which can be prone to errors. Option 2 is more reasonable. Then as you say, the underscore prefix is also just a convention, but that's the closest thing to a private scope in Python's land. And autocomplete respect it, so when users browse the functions in their shell they will see only public API. As a users I never checked the content of [1] They also kept |
(Some of this developed from discussion at the coordination meeting (including @nstarman, @saimn , @astrofrog , @tepickering, @pllim, @nden, @WilliamJamieson), although I don't think I can say that all of my points above are consensus of those folks, it's some mix of that and just straight up my own opinion.) I very much agree with @astrofrog's point here:
Which I think has up until this point (in an uncodified way) is that whatever the docs say is the public API. So I think it makes sense to codify that as the "true answer". @nstarman's point was that if we follow the rules here, it's the same between those, and that it's only an aberration if these are different. But I was/am concerned about the inevitable state when something isn't working right. So I think what we settled on is that we start by saying as of when this APE is accepted, the docs are the "true" public API, but this APE presents a plan to get to a state where the rules highlighted in this APE lead naturaly to the docs just reflecting the same thing as these rules. I still personally think we should have it true that the final source of authority is the documentation, because that's more user-facing of a contract, as @astrofrog says. But I think if we say that's the reality now, and we might re-visit it after this APEs plan is implemented, that's a reasonable compromise. Two more opinions to offer:
And one question:
|
I would say not. Even for things like removing astropy-helpers, it was a tedious campaign with writing up a transition guide and opening some PRs downstream. For things like this that would break API, it is a non-starter. |
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
@astrofrog @eteq @saimn @pllim, I've updated this APE based on the discussions we had at the conference an in this thread. LMK what you think! |
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
@saimn I've been working on this over at |
I also like Option 2 a lot. It works because With this as the template, the example from #85 (comment) becomes # __init__.py
__all__ = ["Foo"] # Foo is public in this module, even if it is defined in `_core`, which is private.
from ._core import Foo # _core.py
__all__ = ["Foo"] # Foo is "public" in this module, but this module is private.
class Foo: ...
@eteq, I believe @astrofrog's comment largely answers this question. |
@eteq, I agree.
I added a section on a pre-commit CI check. It actually looks to be fairly simple to check that a public module has corresponding documentation since we have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some comments.
not have a uniform and systematic approach to communicating what is public vs | ||
internal, then we cannot expect users and especially their tools to know what is | ||
public vs internal. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From astropy/astropy#15169, I recall another source of issues that has bitten astropy in the past: I/O registration. When deep sub-modules look public, e.g. astropy.cosmology.flrw.lambdacdm.LambdaCDM
then that is generally used as the class pathway in I/O rather than the better astropy.cosmology.LambdaCDM
. When the class is moved it breaks the I/O and needs a whole backwards-compatibility logic to support files using the previous path. Cleaning up public vs internal, e.g. to astropy.cosmology._flrw.lambdacdm.LambdaCDM
would make this an obviously bad way to serialize the class and code authors would get it "right" (astropy.cosmology.LambdaCDM
) the first time. Clear public API makes for more stable code.
probably deserves a few sentences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is slightly orthogonal to the issue here: really, one needs to set __module__
to avoid this - registration just followed that, not a choice made by coders. Note that numpy
in fact does this - which partially is annoying, since one no longer knows where the function is defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me the pain point was YAML serialization because I used f"{class.__module__}.{class.__qualname__}"
, which looks reasonable as code, but led to the problems described above. If, when making the tests, I had noticed a private path I would have fixed the problem from the start.
I believe this applies more generally, if something is defined in a private module but imported to a public location then it may be imported from there again, e.g. as part of I/O. Serializing using only the public import location is much better than what I did, which seemed reasonable at the time. Setting __module__
is one way to force public locations in serialization (which I agree has issues); a path map, or str regex / replace operation are other means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that one is an issue for examples and sphinx
too - I really dislike that when we made representations
a module (which was definitely a good idea!), we had to make so many updates to the docs. It would also be nice to use regex
for things like __construct_mixin_classes
For me, this is still separate from the APE, since this has annoyed me amply already! But I can see that coming from an environment where files with leading underscores have more meaning, you could have avoided the issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed we can leave it out of this APE. I think the point I made "If, when making the tests, I had noticed a private path I would have fixed the problem from the start." means adopting this APE will automatically force most of these I/O issues to be fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks extremely solid and a well motivated solution to a real problem, so I 100% get behind this APE. Here are a couple, mostly superficial, remarks and suggestions.
|
||
author: Nathaniel Starkman | ||
|
||
date-created: 2013 November 5 <replace with the date you submit the APE> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this date can be set already ?
public. This means that a module can define an ``__all__`` attribute but if the | ||
module itself is not public, then anything in the ``__all__`` attribute cannot | ||
be publicly accessed from outside the module. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand this sentence. Does this imply that any private module should set __all__ = []
, or does it mean that __all__
, within a private module, defines an interface that's meant for internal use only ? (now that I wrote it down, the latter seems way more likely, but I think there's room to improve the phrasing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the latter. If you can't publicly navigate to a module, it doesn't matter what's "public" within that module. The full path must be public.
The issue with relying on this alone is what you discovered in that Issue (trying to find the one) where we have modules that look public (because they are not underscored) so it's possible to construct an import path that looks totally public, but actually isn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you can't publicly navigate to a module, it doesn't matter what's "public" within that module.
I think that's what confused me; since it doesn't matter, why should we care what's in a private module's __all__
? Wouldn't it be simpler to recommend/rule that __all__
must be set to []
in private modules ?
The issue with relying on this alone is what you discovered in that Issue (trying to find the one)
For reference, I believe you meant to link astropy/astropy#15666
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might still be useful to be able to indicate in a private sub-module which things are meant to be importable in other parts of astropy, so __all__
could be useful in that way rather than always setting to []
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed! This is demonstrated in lines 281-287
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the examples of good module layouts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I think I understand and agree with everything said in this thread, but I still can't wrap my head around the phrasing. I think my main issue is that I don't know what's the technical definition for "publicly accessed"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@astrofrog, see https://github.com/astropy/astropy/pull/15109/files as an example doing what you're talking about. I agree, this is super useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @neutrinoceros!
Co-authored-by: Clément Robert <cr52@protonmail.com>
@eerovaher @WilliamJamieson @taldcroft @eteq @pllim, I would appreciate some more eyes on this APE, if you have the time in this busy season. As we often follow the lead of NumPy I think this APE is very timely given their refactor to quite nearly follow this APE. |
@nstarman , not sure if I have time to ponder this soon. Can this wait till the Coordination Meeting or is that too long to wait? |
@pllim, it can definitely wait to be approved. I'm not sure who would have the time to lead this effort if this APE were accepted sooner. But it would be great to have this be essentially finalized by the time of the Meeting. |
@nstarman , apparently APE 22 is taken by #87 . You will have to rename your file... but at this point, I am not sure to what. Maybe @eteq can advise. See #87 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at this again and must admit I find the prospect of yet another major refactoring rather off-putting, especially as it aims to solve what seems a minor problem; certainly, at the user level we have had few if any complaints as we (re)move code, suggesting that users have no issues finding things they need and not rely (overly) on private API.
We should really think carefully about the costs of these refactoring exercises. I'd estimate the implementation, including reviews, is going to be of order 10^2 hours (certainly more than 10^1, surely less than 10^3, though perhaps 10^1.5 is more realistic). Do we really want to spend the equivalent of $ 5...10k on this? (Though I guess we have already spent several 10k $ equivalent on ruff "rules"; has that been really worth it? What else could we have done with that amount of expensive developer time?)
I also think we (and thus the APE) should consider seriously alternatives that start from current practice, that anything under the basic subpackages like astropy.coordinates
should be taken to be private unless explicitly stated otherwise, and just try for formalize that in a way is good enough, without worrying too much about perfection (PEP 8 is explicit about why that is a bad idea). Can this not be made explicit without much effort (and no renaming)? E.g., how about the following alternative, with three nicely incremental steps:
- We explicitly document current practice that everything under subpackages is private and add a corresponding comment in all their top level
__init__.py
files (making appropriate exceptions inio
andutils
). - We add
__all__
to all submodule__init__.py
files that include the public items, including public submodules of the subpackages. - We slowly add
__all__
to the rest of astropy, to indicate to ourselves which parts are meant to be used outside a given module.
``baz.__all__``, but what if ``baz`` is private? | ||
|
||
|
||
Astropy partially implements all three of the conventions: some private classes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes the problem sound much worse than it is: astropy actually has a clear, basic definition of what is public, which is everything directly in a submodule like astropy.coordinates
. All imports should be from there, none from lower levels. In common usage the only exception for this is io.fits
(all other io is accessed only through QTable.read/write
), and some of the things in utils
(which could certainly use clarification in their respective files).
Anyway, specifically for here, I think it would help the case if the text accurately described the current state, and if the example of things that were unclear actually referred to our code base rather than a hypothetical one, and gave some links to issues where our current layout was clearly the source of confusion/problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
astropy actually has a clear, basic definition of what is public, which is everything directly in a submodule like astropy.coordinates.
Do you know where this is written down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it pretty obvious from the documentation - see the reference sections of time
, coordinates
, timeseries
- though of course the moment I look at others, I see that nddata
is not quite as good, and modeling
has definite public modules (though clearly indicated as such).
I do stick with my main point that we actually have had very few cases where users were confused about where to import things from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was not too difficult to find counter-examples from astroquery
, specutils
, photutils
and ccdproc
, and these are all coordinated packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting these should all be converted to a particular scheme too?Then we better get more feedback on this APE 😺 I doubt I'm the only one who feels time is better spent elsewhere (and sadly this is not something that can be done without impacting everybody).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I presented a few counter-examples from coordinated packages to the claim
...that we actually have had very few cases where users were confused about where to import things from.
We have more control over coordinated packages, so if those are being confused about importing then I'm guessing the confusion is even worse in non-coordinated packages.
|
||
Let's consider an example:: | ||
|
||
src/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I feel any example should be from actual astropy code, not something fake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"truth is stranger than fiction. Fiction has to make sense." Mark Twain.
I'm happy to go find Astropy examples, but illustrative examples are useful pedagogy. When I update this with real examples I'll probably keep the illustrative one since it puts all scenarios in one grokkable example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: difficulties in public/private API definitions in astropy/astropy#16519
|
||
**We do nothing:** | ||
|
||
This is the status quo. It is not a good option because it does not solve the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This option has the quite large benefit that there won't be yet another large set of PRs to review that have nothing to do with science (plus the need to retrain with new file names).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDK. It's a fair bit of work for whomever does the file renaming (not because renaming files is hard but because of CI and the docs). In my experience reviewing a PR that's just a big pile of file renames (using git-mv so it doesn't even affect the git history) is SUPER easy.
Also, if you take a look at https://github.com/GalacticDynamics/galax/tree/main/src/galax/potential, I structured this submodule so there's the public API declared right at the top, then everything in https://github.com/GalacticDynamics/galax/tree/main/src/galax/potential/_potential
looks exactly like it would were this Astropy. This boiled down to a folder rename, a __init__.py
(and since I'm doing some things statically and with lazy loading, a __init__.pyi
). This is easy to review.
issue. The aforementioned problems of not knowing what is stable and | ||
supported, and what is not, remain. | ||
|
||
**We allow** ``__all__`` **to be optional:** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is another alternative, where we just formalize current practice: that anything under the basic submodules like astropy.coordinates
should be taken to be private unless explicitly stated otherwise. And we can add __all__
incrementally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very supportive of doing what you suggest in another APE! If such an APE existed I would make this APE a draft until the other APE were implemented. I still think this APE is the right way to go, but since the APE you are proposing is a subset of this one, so it's 🆗 by me to get the ball rolling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: IDK when they did this, but scipy
now has underscore-prefixed module names everywhere. They fixed their public/private problem. e.g. https://github.com/scipy/scipy/tree/main/scipy/fft
Note: if we take cues from "upstream" packages, |
This was discussed as part of "State of APEs" at Coordination Meeting 2024. I think reactions were mixed and I cannot see any clear action items on how to move this forward (or if we should). |
One idea I raised was that at the very least if we cannot reach consensus on changing current code, we should see if we can agree on rules for any new code? |
Re: #85 (comment) For completeness, my response to that idea in the meeting was that if it is only recommendation for new code, I do not think we need an APE, but rather we can modify the dev docs. |
There was indeed no consensus on the underscore prefixes, in large part because, contrary to what I thought at least, things like There also seemed to be consensus that in the end the documentation should be the ultimate arbiter, since that is what users would normally see (and the mistake of documenting the
Finally, I'd say there was no consensus either on new vs old code, or subpackages doing different things, with the latter having the advantage of maintainers being able to set a policy they feel is best, but the disadvantage that then there is no package-wide logic anymore at all, while currently there is (with |
|
It is private, indeed. But the feeling at the coordination meeting was that breaking people's code for code style purity is too big a price to pay. And it is not likely we would ever define But for astropy as a whole, continuity and consistency are important too. But nothing stopping us from making it clearer what is public and not, by defining appropriate |
🎉. That would be excellent.
A great thing to do, no matter the outcome of this APE. When first proposed, one of the counter-arguments was that "upstream" libraries haven't done this. But now both When I first wrote this APE it was to make an argument "why we should do this". Now that our upstreams have done the same thing, IMO the argument shifts to "why aren't we doing this?". Prima facie we should. Documentation is important, but it is most certainly not how any of our upstream libraries define their public API. The point of public-facing documentation is to document what is public, not to make it public. Just like how we generate documentation from docstrings (prioritizing that the code contains its on documentation) so too does Python, our upstream libraries, and most everyone else makes it so that public/private is a product of the code, not imposed on it. |
I see 3 options here I agree with @nstarman that Since, as you guys pointed out, we may eventualy have to move part the private code from Footnotes
|
Numpy was in a different state, with, e.g., some parts of Overall, @neutrinoceros made the right list, and I guess the conclusion in the coordination meeting was that In the meantime, there's nothing stopping us from incrementally ensuring that docstrings and |
Also keep in mind that NumPy has the backing of private industry (e.g., NVidia). Astronomy does not. I have started seeing pipelines pinning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing we could consider doing in any case would be for common cases of misuse (such as astropy.units.quantity) we could use batchpr to search through GitHub and open PRs to fix these incorrect imports.
It's also worth considering another option d, which is to do The Right Thing ™️ for new sub-packages and code, to at least not increase the problem.
Up for discussion!
Very much a work in progress. Hopefully refined by discussion at the upcoming conference.