Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

APE23: Guidelines for required and optional dependencies #86

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

astrofrog
Copy link
Member

This APE has been in the pipeline for a long time and was inspired by astropy/astropy#12448 - I decided to try and revive it and update it. The main goal here was to try and provide a framework for discussing/thinking about optional dependencies since there are different kinds of such dependencies.

I don't want to be too prescriptive here about what to do specifically about scipy, as I think that should be discussed in the relevant issue, and this is meant to be a more general APE.

@astrofrog astrofrog marked this pull request as draft June 20, 2023 11:16
APE23.rst Outdated Show resolved Hide resolved
@eteq
Copy link
Member

eteq commented Jun 11, 2024

Prompted by some of the discussion at the 2024 coordination meeting:

  • I'm very concerned about this because for both "lean-and-mean" installs and certain CI/testing use cases, it should still be possible to run the tests without scipy/matplotlib to be present.
  • It was clarified in the discussion that this is not meant to imply the above: that is, even after this APE we would continue using # REQUIRES SCIPY, try/except guards around scipy imports, etc to allow astropy functionality to still work if scipy or matplotlib are not installed and also not required. That makes me much more comfortable with this BUT I think it's critical that be explicitly in this APE: that astropy should still be importanble and the tests should still complete even if scipy/matplotlib aren't present (by which I think I mean really all of the "unintuitive" dependencies).
  • I personally don't have a strong opinion on pip install --no-deps vs pi install astropy[all] as solutions to this.

Another thing that came up is the question of what to do with conda. Since conda has no concept of "recommended dependencies" the situation is even worse there. The feeling in the room at the 2024 coordination meeting seemed to be that it made sense to make all the recommended requirements be "requirements" for conda, although it isn't clear to me if the desire was to make this part of this APE vs just a thing we can do in the conda-forge recipe.

@neutrinoceros
Copy link

The feeling in the room at the 2024 coordination meeting seemed to be that it made sense to make all the recommended requirements be "requirements" for conda, although it isn't clear to me if the desire was to make this part of this APE vs just a thing we can do in the conda-forge recipe.

@Cadair started a PR in that direction at conda-forge/astropy-feedstock#137 if anyone's interested

@Cadair
Copy link
Member

Cadair commented Sep 11, 2024

I think we should 100% put all the recommended deps in the conda recipie now. Frankly, I think we should put all of the unintuitive deps in the conda recipie now. It's just user hostile not to, given the lack of [all] and also it's the generally done thing in conda packages.

@Cadair
Copy link
Member

Cadair commented Sep 11, 2024

To express an opinion on a more meta level:

Our current dependency policy, both in how they are specified in pyproject.toml and the conda recipe is actively user hostile for the majority of our users. We can love the purity (and occasional utility) of having it possible to install astropy with few dependencies, but given the tooling we are working with the upshot of supporting that use case is that we make things significantly worse for the majority of people.

We should move scipy and matplotlib for sure into required dependencies in pyproject and we should also add all the non-interface dependencies to the list on conda-forge, we should do the latter now, as there's no good reason I can see not to.

@taldcroft
Copy link
Member

and also it's the generally done thing in conda packages.

While I'm not entirely averse to the idea, I'm not sure I agree with that assertion. You would need to support that with data. For instance I just checked pandas and it does not include optional dependencies in conda.

@Cadair
Copy link
Member

Cadair commented Sep 11, 2024

oooh, well in trying to find a citation, I found this: https://conda-forge.org/docs/maintainer/knowledge_base/#multi-output-recipes

Distributing the same project with different sets of dependencies. For example: The project with the minimal dependencies to run, and a separate output that extends that list:

So clearly we should do that, have "astropy" be basically everything and astropy-base be nothing but recommended.

@bsipocz
Copy link
Member

bsipocz commented Sep 11, 2024

We should move scipy and matplotlib for sure

I understand scipy (though there is still a lot of functionality that doesn't require it, many still just use astropy for the fits and wcs modules), but why mpl, surely that is not at all required for most astropy functionality.

@mhvk
Copy link
Contributor

mhvk commented Sep 11, 2024

So clearly we should do that, have "astropy" be basically everything and astropy-base be nothing but recommended.

As the person who usually complains about all the extra unwanted baggage - this I am totally fine with. As far as I can tell, the only reason we ever have had disagreements about making all optional dependencies the default, is that pip does not allow one to do something similarly simple: one can have astropy and astropy[all] but not astropy[required] and astropy.

So, -1 on changing pyproject.toml, but +100 on having two conda streams.

@mhvk
Copy link
Contributor

mhvk commented Sep 11, 2024

It is stupid to do just because pip is so bad, but maybe we should rename astropy to astropy-base or astropy-core and having pypi install astropy be something that depends on it as well as on every optional dependency?

@Cadair
Copy link
Member

Cadair commented Sep 11, 2024

I still maintain the pain of pip install astropy not doing what most users expect outweighs the upsides of there not being an easy way to install astropy only with the bareset required deps (given it is still possible if you know what you are doing).

@bsipocz
Copy link
Member

bsipocz commented Sep 11, 2024

I still maintain the pain of pip install astropy not doing what most users expect

Users not knowing what their tools doing is not exactly a strong argument here. E.g. pip install pandas doesn't install everything but the kitchen sink either. Once a dependency is required for most functionality, I can be on the side of adding it to the mandatory ones, but I don't find the above a strong enough argument, especially when it would bring in things that are in fact not required for most things (e.g. as I said above I would not like adding mpl to the required dependencies as it's required for only a small set of things).
How about teaching and normalizing the use of pip install astropy[all] instead?

And peppering downstream configs with --no-deps and other workarounds for astropy installs and instead spelling out required dependencies manually is not at all a good alternative (I found the need to explicitly name liberfa in astroquery configs a not at all good experience).

@mhvk
Copy link
Contributor

mhvk commented Sep 11, 2024

Agree with @bsipocz here. Also, as @neutrinoceros noticed elsewhere, most users just have a single virtual environment, in which almost certainly they have installed scipy and matplotlib by hand, quite possibly before installing astropy. And it is not like they don't need to install other packages by hand (say, astroquery).

Anyway, let's get the conda thing in, that seems more of a no-brainer!

And perhaps it really does make sense to just have astropy and astropy-core packages on pypi?

@mhvk
Copy link
Contributor

mhvk commented Sep 11, 2024

And peppering downstream configs with --no-deps

Hadn't even thought of it. Please, don't force me to adjust my other packages (which, yes, all use astropy, but, no, do not require scipy or matplotlib, since really I just use the basics, units, SkyCoord, Time, QTable, io)!

@astrofrog
Copy link
Member Author

astrofrog commented Sep 11, 2024

Just to circle back to the APE, I want to make sure we don't lose sight of the fact that making scipy be installed by default is not just about convenience, it actually reflects that it really should be a required dependency as it is used in so many unexpected places (unlike matplotlib).

@astrofrog
Copy link
Member Author

I will try and update the APE shortly based on the coordination meeting feedback

@dhomeier
Copy link
Contributor

I understand scipy (though there is still a lot of functionality that doesn't require it, many still just use astropy for the fits and wcs modules), but why mpl, surely that is not at all required for most astropy functionality.

Could matplotlib then be put into a separate output in conda as well, like astropy-visualization?

@bsipocz
Copy link
Member

bsipocz commented Sep 11, 2024

Could matplotlib then be put into a separate output in conda as well, like astropy-visualization

For the record, I'm on board with adding dependencies for the conda package and have astropy-base for the minium set. The above resistance is for not adding non-required things to the mandatory list in pyproject.

Having more extras option is also a 👍 from me, even if there are overlaps between the various extras lists (quite like what pandas has on that front, I can just really quickly browse for the keyword parquet or s3 and install only what's needed for those)

@neutrinoceros
Copy link

I'm +1 on what @bsipocz said, especially this

How about teaching and normalizing the use of pip install astropy[all] instead?

and I'll add:
Conda generally aims to make it easier to install everything, while pip's default behavior is more minimalist. Both behaviors are desirable, but to different audiences. In my opinion it is best to adhere to both cultures, but keep them separate; If conda is for end users who don't mind the bloat and rarely ever switch envs, having astropy pulling it all the gang is fine in that context, but let's not try to make pip-world more like conda1: downstream maintainers, packagers, and other minimalists need a happy path too.

Footnotes

  1. with the possible exception of making scipy a hard requirement

@Cadair
Copy link
Member

Cadair commented Sep 24, 2024

This is probably the best place to bring this up but, I have got the conda-forge recipe in a state where we need to actually decide on the concrete list of optional dependencies to install for "astropy" (as opposed to "astropy-base") by default, input and review welcome here: https://github.com/conda-forge/astropy-feedstock/pull/137/files#diff-f3725a55bf339595bf865fec73bda8ac99f283b0810c205442021f29c06eea9aR83

@taldcroft
Copy link
Member

Conda generally aims to make it easier to install everything, while pip's default behavior is more minimalist.

Can you elaborate on that? Both package managers install all required dependencies by default, so to first order the footprint should be the same in both cases, right? Both package managers also include an option to ignore dependencies and allow a "minimal" installation that will likely be broken.

I'm not sure if this is still the case, but the biggest difference is that the conda dependency graph management is more robust, so for instance it won't let you downgrade or remove a package if that breaks any installed package. IIRC, pip will happily let you "corrupt" your environment with no warnings.

In practice, I'm finding that pip actually makes it easier to install the kitchen sink in pip install astropy[all].

@neutrinoceros
Copy link

Can you elaborate on that?

What I meant there was to reformulate how pip/PyPI have mechanisms to require extra dependencies, while conda doesn't. Nothing new really, I just wanted to take a step back and reframe the discussion a bit.

Both package managers install all required dependencies by default, so to first order the footprint should be the same in both cases, right?

Yes.

I'm not sure if this is still the case, but the biggest difference is that the conda dependency graph management is more robust, so for instance it won't let you downgrade or remove a package if that breaks any installed package. IIRC, pip will happily let you "corrupt" your environment with no warnings.

This was actually changed in 2020: modern versions of pip have a "true" dependency resolver and will warn you in case of conflicting requirements. It still lets you ignore them if you explicitly specify requirements that don't respect your existing environment, but there again it does tell you that you're doing it. This is the reason why in 2020 I switched from conda to pip as my installer, since pip resolved the only problem that made me prefer conda at that time.

In practice, I'm finding that pip actually makes it easier to install the kitchen sink in pip install astropy[all].

Related: I started a discussion on uv's bug tracker about the idea of a general tool to do this with any package, in case anyone's interested astral-sh/uv#7295

@taldcroft
Copy link
Member

taldcroft commented Sep 24, 2024

IIRC, pip will happily let you "corrupt" your environment with no warnings.
This was actually changed in 2020: modern versions of pip have a "true" dependency resolver and will warn you in case of conflicting requirements.

@neutrinoceros - Are you sure about that? That is not my experience:

(junk) ➜  ~ pip install astropy
Collecting astropy
  Using cached astropy-6.1.3-cp312-cp312-macosx_11_0_arm64.whl.metadata (10 kB)
Collecting numpy>=1.23 (from astropy)
  Using cached numpy-2.1.1-cp312-cp312-macosx_14_0_arm64.whl.metadata (60 kB)
Collecting pyerfa>=2.0.1.1 (from astropy)
  Using cached pyerfa-2.0.1.4-cp39-abi3-macosx_11_0_arm64.whl.metadata (5.7 kB)
Collecting astropy-iers-data>=0.2024.7.29.0.32.7 (from astropy)
  Downloading astropy_iers_data-0.2024.9.23.0.31.43-py3-none-any.whl.metadata (5.1 kB)
Collecting PyYAML>=3.13 (from astropy)
  Using cached PyYAML-6.0.2-cp312-cp312-macosx_11_0_arm64.whl.metadata (2.1 kB)
Collecting packaging>=19.0 (from astropy)
  Using cached packaging-24.1-py3-none-any.whl.metadata (3.2 kB)
Using cached astropy-6.1.3-cp312-cp312-macosx_11_0_arm64.whl (6.4 MB)
Downloading astropy_iers_data-0.2024.9.23.0.31.43-py3-none-any.whl (1.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.9/1.9 MB 687.2 kB/s eta 0:00:00
Using cached numpy-2.1.1-cp312-cp312-macosx_14_0_arm64.whl (5.1 MB)
Using cached packaging-24.1-py3-none-any.whl (53 kB)
Using cached pyerfa-2.0.1.4-cp39-abi3-macosx_11_0_arm64.whl (329 kB)
Using cached PyYAML-6.0.2-cp312-cp312-macosx_11_0_arm64.whl (173 kB)
Installing collected packages: PyYAML, packaging, numpy, astropy-iers-data, pyerfa, astropy
Successfully installed PyYAML-6.0.2 astropy-6.1.3 astropy-iers-data-0.2024.9.23.0.31.43 numpy-2.1.1 packaging-24.1 pyerfa-2.0.1.4
(junk) ➜  ~ 
(junk) ➜  ~ 
(junk) ➜  ~ 
(junk) ➜  ~ 
(junk) ➜  ~ pip uninstall pyyaml
Found existing installation: PyYAML 6.0.2
Uninstalling PyYAML-6.0.2:
  Would remove:
    /Users/aldcroft/miniconda3-arm/envs/junk/lib/python3.12/site-packages/PyYAML-6.0.2.dist-info/*
    /Users/aldcroft/miniconda3-arm/envs/junk/lib/python3.12/site-packages/_yaml/*
    /Users/aldcroft/miniconda3-arm/envs/junk/lib/python3.12/site-packages/yaml/*
Proceed (Y/n)? y
  Successfully uninstalled PyYAML-6.0.2
(junk) ➜  ~ pip uninstall numpy
Found existing installation: numpy 2.1.1
Uninstalling numpy-2.1.1:
  Would remove:
    /Users/aldcroft/miniconda3-arm/envs/junk/bin/f2py
    /Users/aldcroft/miniconda3-arm/envs/junk/bin/numpy-config
    /Users/aldcroft/miniconda3-arm/envs/junk/lib/python3.12/site-packages/numpy-2.1.1.dist-info/*
    /Users/aldcroft/miniconda3-arm/envs/junk/lib/python3.12/site-packages/numpy/*
Proceed (Y/n)? y
  Successfully uninstalled numpy-2.1.1
(junk) ➜  ~ python
Python 3.12.6 | packaged by conda-forge | (main, Sep 22 2024, 14:07:06) [Clang 17.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import astropy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/aldcroft/miniconda3-arm/envs/junk/lib/python3.12/site-packages/astropy/__init__.py", line 21, in <module>
    from . import config as _config
  File "/Users/aldcroft/miniconda3-arm/envs/junk/lib/python3.12/site-packages/astropy/config/__init__.py", line 8, in <module>
    from .configuration import *
  File "/Users/aldcroft/miniconda3-arm/envs/junk/lib/python3.12/site-packages/astropy/config/configuration.py", line 24, in <module>
    from astropy.utils import find_current_module, silence
  File "/Users/aldcroft/miniconda3-arm/envs/junk/lib/python3.12/site-packages/astropy/utils/__init__.py", line 21, in <module>
    from .shapes import *
  File "/Users/aldcroft/miniconda3-arm/envs/junk/lib/python3.12/site-packages/astropy/utils/shapes.py", line 8, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'

@neutrinoceros
Copy link

Well, uninstalling is not what I had in mind. Indeed it seems that conda outshines pip in this area.
I was thinking of scenarios with conflicting requirements, as, for instance

❯ pip install "astropy==6.0.*" "numpy>2"

outputs

Collecting astropy==6.0.*
  Using cached astropy-6.0.1-cp312-cp312-macosx_11_0_arm64.whl.metadata (9.8 kB)
Collecting numpy>2
  Using cached numpy-2.1.1-cp312-cp312-macosx_14_0_arm64.whl.metadata (60 kB)
INFO: pip is looking at multiple versions of astropy to determine which version is compatible with other requirements. This could take a while.
Collecting astropy==6.0.*
  Downloading astropy-6.0.0-cp312-cp312-macosx_11_0_arm64.whl.metadata (9.8 kB)
ERROR: Cannot install astropy==6.0.0, astropy==6.0.1 and numpy>2 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested numpy>2
    astropy 6.0.1 depends on numpy<2 and >=1.22
    The user requested numpy>2
    astropy 6.0.0 depends on numpy<2 and >=1.22

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip to attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

Or, to show that pip does keep track of what's already installed

❯ pip install "astropy==6.0.*"
Collecting astropy==6.0.*
  Using cached astropy-6.0.1-cp312-cp312-macosx_11_0_arm64.whl.metadata (9.8 kB)
Collecting numpy<2,>=1.22 (from astropy==6.0.*)
  Using cached numpy-1.26.4-cp312-cp312-macosx_11_0_arm64.whl.metadata (61 kB)
Collecting pyerfa>=2.0.1.1 (from astropy==6.0.*)
  Using cached pyerfa-2.0.1.4-cp39-abi3-macosx_11_0_arm64.whl.metadata (5.7 kB)
Collecting astropy-iers-data>=0.2024.2.26.0.28.55 (from astropy==6.0.*)
  Using cached astropy_iers_data-0.2024.9.23.0.31.43-py3-none-any.whl.metadata (5.1 kB)
Collecting PyYAML>=3.13 (from astropy==6.0.*)
  Using cached PyYAML-6.0.2-cp312-cp312-macosx_11_0_arm64.whl.metadata (2.1 kB)
Collecting packaging>=19.0 (from astropy==6.0.*)
  Using cached packaging-24.1-py3-none-any.whl.metadata (3.2 kB)
Using cached astropy-6.0.1-cp312-cp312-macosx_11_0_arm64.whl (6.4 MB)
Using cached astropy_iers_data-0.2024.9.23.0.31.43-py3-none-any.whl (1.9 MB)
Using cached numpy-1.26.4-cp312-cp312-macosx_11_0_arm64.whl (13.7 MB)
Using cached packaging-24.1-py3-none-any.whl (53 kB)
Using cached pyerfa-2.0.1.4-cp39-abi3-macosx_11_0_arm64.whl (329 kB)
Using cached PyYAML-6.0.2-cp312-cp312-macosx_11_0_arm64.whl (173 kB)
Installing collected packages: PyYAML, packaging, numpy, astropy-iers-data, pyerfa, astropy
Successfully installed PyYAML-6.0.2 astropy-6.0.1 astropy-iers-data-0.2024.9.23.0.31.43 numpy-1.26.4 packaging-24.1 pyerfa-2.0.1.4

then

❯ pip install "numpy>2"

outputs

Collecting numpy>2
  Using cached numpy-2.1.1-cp312-cp312-macosx_14_0_arm64.whl.metadata (60 kB)
Using cached numpy-2.1.1-cp312-cp312-macosx_14_0_arm64.whl (5.1 MB)
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 1.26.4
    Uninstalling numpy-1.26.4:
      Successfully uninstalled numpy-1.26.4
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
astropy 6.0.1 requires numpy<2,>=1.22, but you have numpy 2.1.1 which is incompatible.
Successfully installed numpy-2.1.1

as I was saying, it does let you install conflicting requirements but makes it harder (it won't do it if these requirements come from a single command), and warns about it.

@neutrinoceros
Copy link

And for completion, here are the same scenarios with uv pip

❯ uv pip install "astropy==6.0.*" "numpy>2"
  × No solution found when resolving dependencies:
  ╰─▶ Because only the following versions of astropy are available:
          astropy<6.0.dev0
          astropy>=6.0.0
      and astropy>=6.0.0,<=6.0.1 depends on numpy>=1.22,<2, we can conclude that astropy>=6.0.dev0,<6.1.dev0 depends on numpy>=1.22,<2.
      And because you require astropy>=6.0.dev0,<6.1.dev0 and numpy>2, we can conclude that your requirements are unsatisfiable.

      hint: astropy was requested with a pre-release marker (e.g., astropy>=6.0.dev0,<6.0.0), but pre-releases weren't enabled (try: `--prerelease=allow`)

-> similar to pip, with a more terse error message

❯ uv pip install "astropy==6.0.*"
Resolved 6 packages in 53ms
Prepared 2 packages in 2.95s
Installed 6 packages in 38ms
 + astropy==6.0.1
 + astropy-iers-data==0.2024.9.23.0.31.43
 + numpy==1.26.4
 + packaging==24.1
 + pyerfa==2.0.1.4
 + pyyaml==6.0.2

then

❯ uv pip install "numpy>2"
Resolved 1 package in 3ms
Uninstalled 1 package in 44ms
Installed 1 package in 25ms
 - numpy==1.26.4
 + numpy==2.1.1

-> no error message at all !

FTR this behavior is fine within the intended workflow: I believe pip (and uv pip) are designed around a workflow where you install an env from some requirement file once, and then sync it (using pip-sync from pip-tools or uv sync respectively), or just recreate it as needed.
Conda seems to be better suited for workflows that include a lot of env tinkering (for lack of a better word).

@taldcroft
Copy link
Member

where you install an env from some requirement file once

I would guess that most working researchers have one environment where they just install any package that seems useful. So if they had astropy 6.0 installed and then some other random package requires numpy>2, suddenly astropy is broken with no warning.

I'm a bit surprised that a new ground-up package like uv doesn't include the existing environment in the dependency resolution (or at least an option to do so).

@neutrinoceros
Copy link

scientists and data analysts are not the primary target demographic for uv, I suppose.

@Cadair
Copy link
Member

Cadair commented Sep 25, 2024

scientists and data analysts are not the primary target demographic for uv, I suppose.

cough conda. 😛

@taldcroft
Copy link
Member

I'm getting whiplash from trying to decide what is best to recommend for astropy users. I had come full circle to pip, but maybe with the new kitchen-sink-included astropy conda package then conda might be best. 🤯

Co-authored-by: Clément Robert <cr52@protonmail.com>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants