Pythia Foundations Environment #56

Open
jukent opened this issue May 27, 2021 · 13 comments
Labels
content Content related issue infrastructure Infrastructure related issue

Comments

@jukent
Contributor

jukent commented May 27, 2021

Should we create a .yaml file with all the packages necessary to run through the Foundations course?

@jukent jukent added the content Content related issue label May 27, 2021
@brian-rose
Member

We already have an environment.yml file in the root of the foundations repo that will have everything included.

Is the idea to create a simpler environment just containing run-time dependencies but excluding the jupyter-book stuff needed to render the book itself?

@brian-rose
Member

brian-rose commented Jun 11, 2021

Revisiting this... I just submitted a PR #70 to add jupyterlab to the environment, because I'm using this environment for authoring notebook content and I want to be able to run in the lab.

It raises a question that I guess I'm not clear on: is our environment.yml meant to be just a minimal environment for building the book only (as used on CI services)?

It's possible that we could maintain two different environments:

  • one for book-building (requires jupyter-book and all its dependencies)
  • another for interactive content use / content authoring (no jupyter-book, but should contain jupyterlab)

The list of Python packages would be the same in both envs.

I would default to putting everything together in a single environment, but I wonder if others have different opinions about this.
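A minimal sketch of the two-environment idea, assuming both files are generated from one shared package list so they cannot drift apart. The environment names, the `conda-forge` channel, and the contents of `SHARED_PACKAGES` are placeholders for illustration, not the actual Foundations dependencies; only `jupyter-book` and `jupyterlab` come from the discussion above.

```python
# Placeholder list standing in for the Python packages common to both environments.
SHARED_PACKAGES = ["numpy", "cartopy"]


def render_environment(name, channels, dependencies):
    """Return the text of a flat conda environment file as a string."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in dependencies]
    return "\n".join(lines) + "\n"


# Book-building environment: shared packages plus jupyter-book.
book_env = render_environment(
    "pythia-book-dev", ["conda-forge"], SHARED_PACKAGES + ["jupyter-book"]
)

# Authoring / interactive environment: shared packages plus jupyterlab.
authoring_env = render_environment(
    "pythia-authoring", ["conda-forge"], SHARED_PACKAGES + ["jupyterlab"]
)

print(book_env)
print(authoring_env)
```

Generating both files from one list keeps the Python packages identical by construction; the single-environment default would simply append both tools to the same list.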

@brian-rose brian-rose added the infrastructure Infrastructure related issue label Jun 11, 2021
@clyne
Contributor

clyne commented Jun 14, 2021

I'd vote for keeping the environments as consistent as possible across all sites, intended uses, etc. Simpler is better.

@brian-rose
Member

@clyne
Contributor

clyne commented Jun 14, 2021

Not sure how practical it is, but it would be nice if there were a single conda environment for all of Pythia, whether you are a user or a contributor.

@brian-rose
Member

brian-rose commented Jun 14, 2021

I agree, from a use perspective. However, I don't think we can get away from needing an environment.yml file in every repo, to be used by CI services etc.

Maybe it's possible to set up a dependabot service to automatically keep all the environment.yml files in sync (e.g. opening PRs to update the files in other repos whenever we change one of them).

EDIT: I have no idea how to do this, it just sounds plausible.
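One rough way to at least detect when the files fall out of sync is a small script that compares dependency lists. This is only a sketch: it assumes each environment.yml has a flat `dependencies:` list (no nested `pip:` section), and the file contents and the `portal` repo name below are made up for illustration. It does not open PRs; it only reports differences.

```python
def parse_dependencies(text):
    """Collect the entries of a flat `dependencies:` list from environment.yml text."""
    deps = set()
    in_deps = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if not raw.startswith((" ", "-")):
            # A top-level key: track whether we are entering the dependencies list.
            in_deps = line == "dependencies:"
            continue
        if in_deps and line.startswith("- "):
            deps.add(line[2:].strip())
    return deps


def find_drift(reference, others):
    """Map each repo name to the entries missing from or extra to the reference."""
    ref = parse_dependencies(reference)
    report = {}
    for repo, text in others.items():
        deps = parse_dependencies(text)
        if deps != ref:
            report[repo] = {
                "missing": sorted(ref - deps),
                "extra": sorted(deps - ref),
            }
    return report


# Made-up file contents, for illustration only.
foundations = """name: a
channels:
  - conda-forge
dependencies:
  - numpy
  - jupyter-book
"""
portal = """name: b
dependencies:
  - jupyter-book
  - sphinx
"""

drift = find_drift(foundations, {"portal": portal})
print(drift)
```

A scheduled CI job could run a check like this and fail when `drift` is non-empty, which is a weaker but simpler alternative to automated cross-repo PRs.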

@dopplershift
Contributor

dopplershift commented Jun 14, 2021

I mean, you could always download an environment.yml from anywhere to use, manually as a step in CI.

But as I look at things, I don't see a problem with having the 3 different ones. Those serve 3 different purposes:

  • Portal website
  • Actual tutorial content
  • Store Datasets

It's entirely wasteful, slows things down, and opens more opportunities for breakage to have the portal and dataset CI builds download and set up an entire NumPy, Pandas, SciPy, etc. environment every time we update some dataset or tweak the entirely non-Python portal site.

So I agree, favor simplicity--but I'd argue simplicity for keeping infrastructure working. You'd be amazed how often things break. Those environments are created a whole lot more often than (I hope) any of us are creating Pythia environments from scratch. Now, if there's not too much overhead, I'd be happy to see the environment.yml for this (foundations) repo kept up-to-date so that it has everything needed to contribute to any of the Pythia repos.
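The "download an environment.yml as a manual CI step" option mentioned at the top of this comment could look roughly like the following. It is a sketch using only the standard library; the function name `fetch_environment` and the URL in the commented usage line are hypothetical.

```python
import urllib.request
from pathlib import Path


def fetch_environment(url, dest="environment.yml"):
    """Download an environment file from `url` and save it to `dest`."""
    with urllib.request.urlopen(url) as response:
        data = response.read()
    path = Path(dest)
    path.write_bytes(data)
    return path


# Hypothetical usage in a CI step (placeholder URL, not a real location):
# fetch_environment("https://example.org/path/to/environment.yml")
```

The downloaded file can then be passed to the usual environment-creation step, at the cost of making the CI run depend on another repo being reachable.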

@dopplershift
Contributor

Regarding the original part of having a separate environment.yml that has only what the user needs (with jupyterlab) vs. the full documentation build stack, that certainly seems reasonable and again would reduce the support burden (i.e. picture debugging what could go wrong on user systems). One option would be to start CI by creating an env from environment.yml, then use a separate step to install our doc build dependencies (which could be in its own file).
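As a sketch of that two-step CI setup: the first command creates the environment from the user-facing environment.yml, and the second layers the doc build dependencies on top from their own file. The environment name `pythia` and the file name `docs-environment.yml` are assumptions for illustration; the script only prints the commands rather than running them.

```python
import shlex


def ci_commands(env_name, runtime_file="environment.yml",
                docs_file="docs-environment.yml"):
    """Return the two conda commands for a split runtime/docs setup."""
    return [
        # Step 1: create the environment users would also get.
        ["conda", "env", "create", "--name", env_name, "--file", runtime_file],
        # Step 2: add the doc build stack on top of it (CI only).
        ["conda", "env", "update", "--name", env_name, "--file", docs_file],
    ]


for cmd in ci_commands("pythia"):
    print(shlex.join(cmd))
```

With this split, a user following the tutorial runs only the first step, while CI runs both.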

@brian-rose
Member

Good points @dopplershift, although I would quibble that our portal site will not be entirely non-Python, as it is built with sphinx. But certainly won't need numpy, cartopy, etc.

I think landing on the foundations environment.yml as the "all-in" environment, while keeping the other repos more bare bones, is a good compromise. And I think that environment needs to contain the full doc build dependencies, because we are trying to build tutorial materials around making modifications to the docs themselves (i.e. the Foundations book), so we want to provide users with an environment for not just running the examples, but also building the book.

@clyne
Contributor

clyne commented Jun 14, 2021

Good points. The consistent environment.yml file between different repos is probably less important for users that won't be bouncing back and forth between repos as maintainers have to. Hopefully, the latter are more savvy and don't get tripped up by this (as much as I do:-)

@ktyle
Contributor

ktyle commented Jun 15, 2021

@dopplershift what's your advice on how best to keep an environment "up to date without breaking things"? When would we need to think about changing specific version requirements, such as =3.8 for python and <1.4 for sqlalchemy?

@dopplershift
Contributor

@ktyle If I'm keeping an environment specification unbroken and up-to-date, I'm using Dependabot and PyPI-style requirements.txt files (Dependabot doesn't yet support Conda 😢) with the version of every dependency explicitly listed ("pinned"). When there's an update, Dependabot issues a PR to update that version, which triggers CI to run whatever tests we have to validate. If it passes (e.g. all the notebooks run), the update is merged. If not, it can be examined further.

Unfortunately, for users to use such a file or files, the story is more complicated, since you no longer have a single file with environment name, conda channels, and dependencies.

I got really tired of PRs broken by unrelated changes from upstream package changes. Who knows, maybe this repo will be less sensitive than MetPy's suite of checks.
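To illustrate what "pinned" means for a requirements.txt, here is a small sketch that flags any requirement not fixed to an exact version with `==`. The helper name and the sample file contents (including the version numbers) are made up for the example, and the pattern is a simplification that does not cover every requirement syntax pip accepts.

```python
import re

# A name, optional extras in brackets, then an exact `==` version.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^\s;]+")


def unpinned_requirements(text):
    """Return the requirement lines that are not pinned to an exact version."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines, comments, and pip options such as `-r other.txt`.
        if not line or line.startswith("-"):
            continue
        if not PINNED.match(line):
            loose.append(line)
    return loose


# Made-up file contents, for illustration only.
sample = """# runtime dependencies
numpy==1.20.3
sqlalchemy<1.4
xarray
"""

print(unpinned_requirements(sample))
```

A check like this could run in CI so that an unpinned entry is caught before it lets an unrelated upstream release break a PR.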

@jukent
Contributor Author

jukent commented Oct 5, 2022

Should we close this?

Projects
Status: Backlog
Development

No branches or pull requests

6 participants