Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Install from lock file always updates when using private sources, regression from 1.1.13 #5360

Closed
3 tasks done
marshallbrekka opened this issue Mar 25, 2022 · 2 comments · Fixed by #5362
Closed
3 tasks done
Labels
kind/bug Something isn't working as expected

Comments

@marshallbrekka
Copy link

marshallbrekka commented Mar 25, 2022

  • I am on the latest Poetry version.
  • I have searched the issues of this repo and believe that this is not a duplicate.
  • If an exception occurs when executing a command, I executed it again in debug mode (-vvv option).
  • OS version and name: Linux RHEL7/MacOS 12.2
  • Poetry version: 1.2.0b1
  • Link of a Gist with the contents of your pyproject.toml file:

Issue

When using a custom source, installs from lock file always perform an "update". For projects with a very large number of dependencies, this can result in an installing taking multiple minutes, vs the seconds that are expected when there hasn't been any changes to the packages in the lock file.

Example custom source:

[[tool.poetry.source]]
default = true
name = "private"
url = "http://nginx/simple/"

What did I expect to see

After installing once, subsequent installs should "skip" since the lock file hasn't changed, and the packages on disk have not changed.

Example success output:

Package operations: 0 installs, 0 updates, 0 removals, 5 skipped

  • Installing certifi (2021.10.8): Skipped for the following reason: Already installed
  • Installing charset-normalizer (2.0.12): Skipped for the following reason: Already installed
  • Installing idna (3.3): Skipped for the following reason: Already installed
  • Installing requests (2.27.1): Skipped for the following reason: Already installed
  • Installing urllib3 (1.26.9): Skipped for the following reason: Already installed

What actually happened

When using a custom source, the packages are re-downloaded every time

Package operations: 0 installs, 5 updates, 0 removals

  • Updating certifi (2021.10.8 /root/.cache/pypoetry/artifacts/fe/14/9c/1e82a2bb063d37c1044bd5e5f6f3ffdfdd7426a29731be760a08a02cd5/certifi-2021.10.8-py2.py3-none-any.whl -> 2021.10.8)
  • Updating charset-normalizer (2.0.12 /root/.cache/pypoetry/artifacts/a6/df/95/76d66bc33680604ac281d41f481c318d56a95dcb5164624ee55c07061d/charset_normalizer-2.0.12-py3-none-any.whl -> 2.0.12)
  • Updating idna (3.3 /root/.cache/pypoetry/artifacts/eb/9a/99/d47fed155e3b06394a5c1de370c70b6c4ba317f0795211c84c2b29396d/idna-3.3-py3-none-any.whl -> 3.3)
  • Updating urllib3 (1.26.9 /root/.cache/pypoetry/artifacts/7f/26/b0/947c49ee27616e9ab638e086083558af9a2bec4d786620f7e45344c91e/urllib3-1.26.9-py2.py3-none-any.whl -> 1.26.9)
  • Updating requests (2.27.1 /root/.cache/pypoetry/artifacts/eb/c5/cf/ae069a7cbec4433e4118c7d593d349ee8fbbcff39437bc01d8e315a4cb/requests-2.27.1-py2.py3-none-any.whl -> 2.27.1)

What is causing this behavior in 1.2.0

I've narrowed this down to a difference in behavior when building the package information for the packages installed on disk, which is triggering an additional conditional in the logic for determining if two packages are the same.

In the is_same_package_as method, if one of the packages has a source_type configured, it runs some additional checks to ensure they are the same

        if self._source_type:
            if self._source_type != other.source_type:
                return False

            if (
                self._source_url or other.source_url
            ) and self._source_url != other.source_url:
                return False

this method is invoked as part of the solver/Transaction, where it is comparing an installed package to a package from the lock file. Notice how if installed_package.source_type is None, this condition would have failed, and the package would be correctly skipped (because the lock file package does have the type of legacy.

                    if result_package.version != installed_package.version or (
                        (
                            installed_package.source_type
                            or result_package.source_type != "legacy"
                        )
                        and not result_package.is_same_package_as(installed_package)
                    ):
                        operations.append(
                            Update(installed_package, result_package, priority=priority)
                        )
                    else:
                        operations.append(
                            Install(result_package).skip("Already installed")
                        )

I manually added some logs to see what these inputs were, and how they differed when using a private source, vs using pypi.
First, here is how the installed packages look, regardless of the source used to install (i trimmed for brevity).

[Package('certifi', '2021.10.8', source_type='file', source_url='/root/.cache/pypoetry/artifacts/fe/14/9c/1e82a2bb063d37c1044bd5e5f6f3ffdfdd7426a29731be760a08a02cd5/certifi-2021.10.8-py2.py3-none-any.whl'), Package('charset-normalizer', '2.0.12', source_type='file', source_url='/root/.cache/pypoetry/artifacts/a6/df/95/76d66bc33680604ac281d41f481c318d56a95dcb5164624ee55c07061d/charset_normalizer-2.0.12-py3-none-any.whl'), ...]

Notice the source_type=file, source_url=....

Here is the list of packages from the lock file, first from an install that used pypi:

[Package('certifi', '2021.10.8'), Package('charset-normalizer', '2.0.12'), Package('idna', '3.3'), Package('requests', '2.27.1'), Package('urllib3', '1.26.9')]

And now that same list of packages from the lock file, but this time installed using my custom source:

[(Package('requests', '2.27.1', source_type='legacy', source_url='http://nginx/simple', source_reference='foo'), 0), (Package('certifi', '2021.10.8', source_type='legacy', source_url='http://nginx/simple', source_reference='foo'), 1), ...]

Again, notice the source_type='legacy', source_url=....

So, when using pypi, the lock file packages don't have a source_type set, and so they therefore aren't triggering the if condition that is causes the package to not be skipped.

What does the behavior look like in 1.1.13, and what changed?

I added some similar logs into the source in 1.1.13, and there are some obvious differences with the data of the packages.

I added two log lines in the solver

                    print("LOCK PACKAGE")
                    print(vars(package))
                    print("INSTALLED PACKAGE")
                    print(vars(pkg))
LOCK PACKAGE
{'_dependency': <Dependency charset-normalizer (==2.0.12)>, '_package': Package('charset-normalizer', '2.0.12', source_type='legacy', source_url='http://nginx/simple', source_reference='foo')}
INSTALLED PACKAGE
{'_pretty_name': 'charset-normalizer', '_name': 'charset-normalizer', '_source_type': None, '_source_url': None, '_source_reference': None, '_source_resolved_reference': None, '_features': frozenset(), '_version': <Version 2.0.12>, '_pretty_version': '2.0.12', 'description': 'The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.', '_authors': [], '_maintainers': [], 'homepage': None, 'repository_url': None, 'documentation_url': None, 'keywords': [], '_license': None, 'readme': None, 'requires': [], 'dev_requires': [], 'extras': {}, 'requires_extras': [], 'category': 'main', 'files': [], 'optional': False, 'classifiers': [], '_python_versions': '*', '_python_constraint': <VersionRange (*)>, '_python_marker': <AnyMarker>, 'platform': None, 'marker': <AnyMarker>, 'root_dir': None, 'develop': True}

Notice how the lock file package still has source_type='legacy', source_url...
BUT, the installed package has _source_type:None, _source_url:None.

You can see later down in the _solve source that there is a similar condition for checking the source_type, where if it is empty it will bail early.

The check in 1.1.13

                    elif pkg.source_type and package.source_type != pkg.source_type:
                        operations.append(Update(pkg, package, priority=depths[i]))

The check in 1.2.0b1

                    if ... or (
                        (
                            installed_package.source_type
                            or result_package.source_type != "legacy"
                        )
                        and not result_package.is_same_package_as(installed_package)
                    ):

The conditions themselves are very similar, and should mostly behave the same, assuming they both exit as soon as they encounter the installed package with source_type None

In summary

I believe the most important change is the non-None source_type field on installed packages.
When None in the 1.1.3 branch, it causes a miss on this condition (pkg is the installed package, package is the lock file package)

                    elif pkg.source_type and package.source_type != pkg.source_type:
                        operations.append(Update(pkg, package, priority=depths[i]))

In the 1.2.0b1 branch, if the installed package source had been None, we would have failed the none-empty check for installed_package.source_type, and the subsequent check for result_package.source_type != "legacy" would have also failed, as the custom sources DO have their source_type set to legacy.

                    if result_package.version != installed_package.version or (
                        (
                            installed_package.source_type
                            or result_package.source_type != "legacy"
                        )
                        and not result_package.is_same_package_as(installed_package)
                    ):

You can see that in 1.1.13, the InstalledRepository class does not set source_type=file

I believe this is the commit that made that change in the 1.2.X branch

If you follow the invocation paths in 1.2.0b1, we can trace back to where the installed packages are constructed from (sorry for the long list, I hope its helpful)

        if "archive_info" in url_reference:
            # File or URL distribution
            if url_reference["url"].startswith("file:"):
                # File distribution
                source_type = "file"
                source_url = url_to_path(url_reference["url"]).as_posix()

Reproducing this

If you need help creating a fully reproducible test case let me know, I used bandersnatch to create a minimal pypi mirror with requests, and a few other libraries, and then used docker compose with an nginx container to serve the repo, along with an interactive python container for invoking poetry against the project.

@marshallbrekka marshallbrekka added kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Mar 25, 2022
@radoering
Copy link
Member

That would have been ugly, if we hadn't noticed before the release.
Thank you for reporting the regression and the detailed analysis. 👍

Copy link

github-actions bot commented Mar 1, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 1, 2024
# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
kind/bug Something isn't working as expected
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants