
Intervention to have new hubs not couple to the 2i2c-hubs-image, but something more up to date #2336

Closed
consideRatio opened this issue Mar 11, 2023 · 8 comments · Fixed by #2671

@consideRatio
Contributor

I want this issue to be focused around a swift intervention, not a final solution. The intervention I want to see is to not have new hubs be recommended to use the 2i2c-hubs-image, and instead use pangeo/pangeo-notebook.

I'm not sure if the 2i2c-hubs-image delivers something beyond pangeo/pangeo-notebook, but it comes at a cost, as we don't keep the 2i2c-hubs-image updated. Even if we tried to keep it updated, it's almost impossible to have multiple communities use a single image: some will get stuck relying on old versions, while others will end up needing new versions.

Right now, this image has a requirements.txt that, for example, pins jupyter-resource-usage to an exact patch version pinned roughly two years ago, even though newer versions of that package exist. The same is true for several other packages, including dask-gateway.
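To illustrate the kind of exact pinning described above, here is a minimal sketch that lists requirements pinned with `==` in a requirements.txt file. It uses a simplified regular expression (an assumption: it is not a full PEP 508 parser and ignores environment markers and `-r` includes), and the versions in the example are hypothetical, not the actual contents of the 2i2c-hubs-image file.

```python
import re

# Matches lines like "name==1.2.3" (optionally with extras, e.g. "pkg[extra]==1.0").
# Simplified pattern for illustration, not a full PEP 508 requirement parser.
EXACT_PIN = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)(?:\[[^\]]*\])?\s*==\s*([^\s;#]+)"
)


def find_exact_pins(requirements_text: str) -> dict[str, str]:
    """Return {package: version} for every requirement pinned with '=='."""
    pins: dict[str, str] = {}
    for line in requirements_text.splitlines():
        # Drop trailing comments; skip blank lines and option lines like "-r base.txt".
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        match = EXACT_PIN.match(line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


if __name__ == "__main__":
    # Hypothetical contents, not the real 2i2c-hubs-image requirements.txt.
    example = """
    jupyter-resource-usage==0.5.1
    dask-gateway==0.9.0
    otter-grader>=4
    datascience
    """
    for name, version in find_exact_pins(example).items():
        print(f"{name} is pinned to {version}")
```

Flagging only `==` pins keeps the check simple; a lower bound such as `>=4` lets the image pick up newer releases on rebuild, while an exact pin silently freezes the package until someone edits the file.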

In the spirit of helping open source software develop, we should strive towards using modern software and not get stuck with old versions. This way, we avoid projects getting reports about already-resolved issues, for example, and instead they get reports on new issues that need help being detected and resolved.

Hubs about to couple to a 2i2c official image

Related

@jmunroe
Contributor

jmunroe commented Mar 11, 2023

I recently created

that is on a similar theme. I am not sure that pangeo-notebook meets the needs of a good default (being Python only). Also, pangeo-notebook is a large image tailored to the earth sciences. I wonder if base-notebook could be a better default. I also liked that 2i2c-hubs-image includes RStudio by default.

The idea of using base-notebook as a reference does have the great benefit of being kept up to date by a larger community than just 2i2c. Maybe it's possible to track pangeo/base-notebook as a dependency for 2i2c-hubs-image.

Communities need to have a pathway to modify the default image we provide. So 2i2c-hubs-image needs to follow the same instructions that we give in our template hub image repo.

Let's discuss this further at our next Product/Engineer meeting.

@pnasrat
Contributor

pnasrat commented Mar 15, 2023

Thinking about #2360, I also believe that, whatever base image we settle on, it would be beneficial to have some tests that catch regressions in the common workflows and feature sets our communities use, so that more frequently updated images can be deployed safely.

These could be critical user journeys automated via Python/WebDriver or some other mechanism that works. Either way, we should be able to catch problems in common cases before they reach users.

@jmunroe
Contributor

jmunroe commented Mar 15, 2023

Test workflows sound like a great idea. I've been thinking for a while about a gallery of representative notebooks. I'll create an issue to track this.

@yuvipanda
Member

I can provide some historical context about this 'default' image that might help. I provided this to @jmunroe yesterday and it was helpful I think!

It was initially created as a data8-style image, providing two specific extra packages that were not available in any existing community-maintained image: datascience and otter-grader. This was driven by the cloudbank hubs, as well as by selling these hubs as an easy way to provide a 'data8-style teaching experience'. None of the existing images at that time provided these packages, so we made our own. So its original purpose was to be a data8 image. I wish I had called it '2i2c-data8-image' at the time, as that clearer name would have avoided scope creep. It was set as the default in basehub/values.yaml because, at that time, it was overridden in all research hubs and only useful for smaller, educational-style hubs.

It then expanded in scope to add R and RStudio, becoming a bit of a 'demo' hub demonstrating what our hubs can do. It actually does a poor job of this, as RStudio plots were broken for a while (and still may be).

So it kinda became 'default' by accident, and we should currently deem it to be unmaintained. It causes problems now, and will continue to do so.

I think the steps to retire it should focus on splitting out the different roles it plays, and provide different images to play specific roles.

  1. Provide a different image for all the cloudbank hubs. These are primarily 'data8-style' hubs, and are run in partnership with UC Berkeley (which runs data8). I think @sean-morris would like this too, as it allows for better experimentation with the datascience and otter-grader packages. https://github.com/berkeley-dsep-infra/datahub/tree/staging/deployments/data8/image is the actual image used by data8 that I built a while ago. We can take inspiration from that but make it easier to maintain too.
  2. Provide a different image for 'demo'ing stuff. Work with @jmunroe to see what this looks like. "Add an image with R in it" (pangeo-data/pangeo-docker-images#163) is an option to consider.
  3. Provide a different base image for primarily R-based users. My suggestion is something along the lines of https://rocker-project.org/images/versioned/binder.html; we can work with the rocker project here.
  4. Provide guidance on what kind of image choices are available, and how to make them. This is the intent behind https://jupyterhub-image.guide as well as my JupyterCon talk. The JupyterCon talk will force me to actually make the thing :) Primarily, we need to tell communities when to use a pre-existing image unmodified, when to make their own with repo2docker, when to base off a pre-existing image with changes, and when to start from scratch.
  5. Augment our 'new hub' process to include helping folks pick a base image. Have no 'default' base image, but help people go through (4).

I provided some guidance around this for the smithsonian hub to @jmunroe that might also have been helpful.

@yuvipanda
Member

I used one of @pnasrat's scripts to figure out which clusters are currently using this image, and I found it's just the 2i2c and cloudbank clusters. Nobody else.

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Jun 19, 2023
2i2c-org#2336 lists
issues with the current 'default' image, and the history of
how it came to be. The primary reason for its existence was
addressed in 2i2c-org#2435,
so it fundamentally does not really need to exist anymore.

However, there are still 4 hubs using that image, inherited
from the default. This PR does the following:

- Switches to jupyter/scipy-notebook as the 'default', when
  nothing else is specified. Note that because all our hubs
  have an image specified *somewhere* except for these 4
  (see the spreadsheet linked to from 2i2c-org#2582),
  this will actually have *no effect* for existing hubs at
  all! Just means that *future* hubs will get this image as
  the default.
- Explicitly sets the image for the 4 hubs still using the
  old 'default' image. These communities will need to be
  reached out to, and the image changed.

After this, I believe we can archive the old 'default' image!

Ref 2i2c-org#2336
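The commit's point that the new default has no effect on hubs that already specify an image follows from how values layering works: a hub's own values override the chart defaults, so the default only applies where nothing else is specified. Below is a minimal Python sketch of that precedence. It models a recursive map merge rather than calling Helm, and the `jupyterhub.singleuser.image.name` key path and hub names are assumptions for illustration.

```python
from copy import deepcopy
from typing import Any

# Hypothetical chart-level default, mirroring the commit's new fallback image.
# The "jupyterhub.singleuser.image.name" key path is an assumption for illustration.
BASE_DEFAULTS: dict[str, Any] = {
    "jupyterhub": {"singleuser": {"image": {"name": "jupyter/scipy-notebook"}}}
}


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def effective_image(hub_values: dict[str, Any]) -> str:
    """Return the image name a hub ends up with after layering its values on the defaults."""
    merged = deep_merge(BASE_DEFAULTS, hub_values)
    return merged["jupyterhub"]["singleuser"]["image"]["name"]


if __name__ == "__main__":
    # Hypothetical hubs: one sets its own image, one inherits the default.
    hubs = {
        "explicit-hub": {
            "jupyterhub": {"singleuser": {"image": {"name": "pangeo/pangeo-notebook"}}}
        },
        "inherits-default": {},
    }
    for hub, values in hubs.items():
        print(hub, "->", effective_image(values))
```

Running it prints `explicit-hub -> pangeo/pangeo-notebook` and `inherits-default -> jupyter/scipy-notebook`: only hubs that set nothing themselves pick up a changed default, which is why the few hubs still inheriting the old image had to be pinned explicitly.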
@yuvipanda
Member

#2671 removes this as the default, and we've done enough work (particularly in #2435) that this is mostly a noop! I think once that is merged, we should still reach out to the 4 communities still using this and move them to a different, more appropriate image.

@consideRatio
Contributor Author

> #2671 removes this as the default, and we've done enough work (particularly in #2435) that this is mostly a noop! I think once that is merged, we should still reach out to the 4 communities still using this and move them to a different, more appropriate image.

I consider this as fixed by #2671 and opened #2674 to help track remaining work under a more focused title - feel free to edit!

@github-project-automation github-project-automation bot moved this from Needs Shaping / Refinement to Complete in DEPRECATED Engineering and Product Backlog Jun 20, 2023
@yuvipanda
Member

Thanks, @consideRatio!


4 participants