Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Upgrade pytorch to 2.6, python to 3.12 #7847

Closed
wants to merge 18 commits into from
Closed

Conversation

ebr
Copy link
Member

@ebr ebr commented Mar 26, 2025

Summary

  • upgrade pytorch to 2.6.0 and unpin some of the strict dependency pins to facilitate upgrading co-dependencies. Some depenencies are now unpinned to help continuously stay up to date; uv.lock will be used for reproducibility.
  • allow python 3.12 as we're no longer constrained by the torch version
  • use uv.lock to pin dependencies - this will help with reproducible builds
  • refactor Dockerfile; get rid of multi-stage build; upgrade to python 3.12. This should make the images smaller, the builds faster, and could potentially allow us to utilize docker layer cache in GHA. Notably, changing any application code will only update only the very last layer, and will not result in rebuilding of the entire virtualenv, which means with proper caching changes between images will be tiny and builds will be super fast
  • unpin timm and controlnet-aux - see also Change timm and controlnet-aux pins to fix LLaVA model support #7846 - may need to rebase this PR if that one is merged first
  • add python 3.12 to the test matrix, update docs to recommend python 3.12 and use uv to manage it
  • remove some obsoleted depenencies that were used by the CLI
  • reintroduce GPU_DRIVER build arg in CI container build, as it has apparently been removed, meaning our "rocm" and "cpu" docker images were just cuda images for some time

QA Instructions

  • Usual manual testing - these changes have a large surface area, so we'll rely on the community to help test all edge cases.
  • special attention should be given to testing the Docker container.

Merge Plan

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

@github-actions github-actions bot added CI-CD Continuous integration / Continuous delivery docker Root frontend-deps PRs that change frontend dependencies frontend PRs that change frontend files docs PRs that change docs python-deps PRs that change python dependencies labels Mar 26, 2025
@github-actions github-actions bot added api python PRs that change python files backend PRs that change backend files services PRs that change app services labels Mar 26, 2025
Copy link
Collaborator

@jazzhaiku jazzhaiku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@github-actions github-actions bot added the python-tests PRs that change python tests label Mar 26, 2025
@psychedelicious
Copy link
Collaborator

How do we regenerate uv.lock and how does it interact with pins in pyproject.toml?

@psychedelicious
Copy link
Collaborator

We need to be careful about this change and its interaction w/ launcher.

Though the launcher supports "pinning" versions of invoke to different versions of python, that information is hardcoded and we'll need to do a launcher release.

I suspect attempting to install the version of Invoke that enforces python 3.12 without updating the launcher beforehand will result in errors.

Maybe we should figure out how to make the launcher query PyPI to get the python version required by a package?

@ebr
Copy link
Member Author

ebr commented Mar 27, 2025

How do we regenerate uv.lock and how does it interact with pins in pyproject.toml?

uv lock re-runs dependency resolution and updates the uv.lock file. pins in pyproject.toml still guide the resolution process and can be as restrictive or loose as we want them to be, but the lockfile pins dependencies to the actual hashes in the index. Works the same way as package-lock.json.

we do still need to test how uv.lock interacts with our use of different indexes, such as for rocm and cpu-only pytorch. if there's breakage, there's a "conflicting extras" syntax we might be able to use to work around it: https://docs.astral.sh/uv/reference/settings/#conflicts

@ebr
Copy link
Member Author

ebr commented Mar 27, 2025

We need to be careful about this change and its interaction w/ launcher.

100% agreed, we'd need to enable the launcher to install different python versions

Maybe we should figure out how to make the launcher query PyPI to get the python version required by a package?

depends on whether we want to continue supporting <=3.11 ... but yes, the launcher could easily query pypi for this info:

$ curl -s https://pypi.org/pypi/invokeai/json |jq '.info.requires_python'
"<3.12,>=3.10"

we can also query for any particular release | jq '.releases where releases is a mapping of vX.Y.Z : [ { <pkg_info> }, {...} ]

@psychedelicious
Copy link
Collaborator

I believe the remaining failing python tests CI failures will be resolved by #7826. Some magic pydantic bs.

@psychedelicious
Copy link
Collaborator

psychedelicious commented Mar 31, 2025

Rebased on main & realized I didn't update the uv lockfile correctly before, which caused the typegen checks CI workflow to fail. Fixed that now

@psychedelicious
Copy link
Collaborator

I believe this PR is ready. At least, all CI is passing. We need to check on how it might interact with the launcher and existing installs. So I've marked it as draft to prevent a premature merge.

@ebr ebr force-pushed the ebr/torch-260 branch from 2bdd3f2 to 5edcccc Compare April 2, 2025 13:28
ebr and others added 18 commits April 3, 2025 08:55
…itate upgrading co-dependencies.

we will use uv.lock to ensure reproducibility
… when loading them

In `ObjectSerializerDisk`, we use `torch.load` to load serialized objects from disk. With torch 2.6.0, torch defaults to `weights_only=True`. As a result, torch will raise when attempting to deserialize anything with an unrecognized class.

For example, our `ConditioningFieldData` class is untrusted. When we load conditioning from disk, we will get a runtime error.

Torch provides a method to add trusted classes to an allowlist. This change adds an arg to `ObjectSerializerDisk` to add a list of safe globals to the allowlist and uses it for both `ObjectSerializerDisk` instances.

Note: My first attempt inferred the class from the generic type arg that `ObjectSerializerDisk` accepts, and added that to the allowlist. Unfortunately, this doesn't work.

For example, `ConditioningFieldData` has a `conditionings` attribute that may be one some other untrusted classes representing model-specific conditioning data. So, even if we allowlist `ConditioningFieldData`, loading will fail when torch deserializes the `conditionings` attribute.
In pydantic/pydantic#10029, pydantic made an improvement to its generated JSON schemas (OpenAPI schemas). The previous and new generated schemas both meet the schema spec.

When we parse the OpenAPI schema to generate node templates, we use some typeguard to narrow schema components from generic OpenAPI schema objects to a node field schema objects. The narrower node field schema objects contain extra data.

For example, they contain a `field_kind` attribute that indicates it the field is an input field or output field. These extra attributes are not part of the OpenAPI spec (but the spec allows does allow for this extra data).

This typeguard relied on a pydantic implementation detail. This was changed in the linked pydantic PR, which released with v2.9.0. With the change, our typeguard rejects input field schema objects, causing parsing to fail with errors/warnings like `Unhandled input property` in the JS console.

In the UI, this causes many fields - mostly model fields - to not show up in the workflow editor.

The fix for this is very simple - instead of relying on an implementation detail for the typeguard, we can check if the incoming schema object has any of our invoke-specific extra attributes. Specifically, we now look for the presence of the `field_kind` attribute on the incoming schema object. If it is present, we know we are dealing with an invocation input field and can parse it appropriately.
Ran `uv lock --upgrade-package pydantic`
Doing a dev build so I can test the launcher.
The launcher will query this file to get the pins needed for installation
@psychedelicious
Copy link
Collaborator

Superseded by #7873, which I needed to create as a branch on the origin repo to do testing of CI changes. I did quite a bit of extra work on it.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
api backend PRs that change backend files CI-CD Continuous integration / Continuous delivery DO NOT MERGE docker docs PRs that change docs frontend PRs that change frontend files frontend-deps PRs that change frontend dependencies python PRs that change python files python-deps PRs that change python dependencies python-tests PRs that change python tests Root services PRs that change app services
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants