BUG: pypi dependencies are not consistently identified #1598
Comments
@DennisClark The
The results are loaded in the SCIO database as provided by the toolkit. All the 52 dependencies are generated from the To get those values, a dynamic resolution is required. This is not part of the The problem is that the SCIO codebase/app uses a setup.cfg file to declare dependencies and it is not supported by the python-inspector tool. I've entered #1313 a while ago to raise this problem. Currently running the @AyanSinhaMahapatra what's your take on this?
As a workaround, I've extracted the list of dependencies into a requirements.txt file, as it should be supported by the python-inspector tool. Using this file as the input for the
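The extraction step of this workaround can be sketched as follows. This is a minimal illustration, assuming the dependencies live under `install_requires` in the `[options]` section of setup.cfg; the function name and the sample package pins are hypothetical and are not taken from the SCIO codebase.

```python
import configparser


def setup_cfg_to_requirements(setup_cfg_text):
    """Return the install_requires entries of a setup.cfg as requirement lines."""
    # Interpolation is disabled so that a literal "%" in setup.cfg is kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(setup_cfg_text)
    raw_value = parser.get("options", "install_requires", fallback="")

    requirements = []
    for line in raw_value.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines inside the multi-line value.
        if line and not line.startswith("#"):
            requirements.append(line)
    return requirements


# Hypothetical sample content, for illustration only.
SAMPLE_SETUP_CFG = """
[metadata]
name = example

[options]
install_requires =
    Django==5.0.6
    requests==2.32.3
"""

if __name__ == "__main__":
    # Print the lines that would go into requirements.txt.
    print("\n".join(setup_cfg_to_requirements(SAMPLE_SETUP_CFG)))
```

Writing the returned lines to a requirements.txt file, one per line, gives an input in the format that python-inspector is expected to support.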
@TG1999 See the above issues with the
Similar issue for the DejaCode requirements.txt. We should not fail the whole pipeline for a single problematic entry in the resolution, but rather continue and log errors as ProjectMessage.
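The "continue and log" behavior suggested here could look roughly like the sketch below. It is not the SCIO pipeline code: `resolve_all` and the `resolve_one` callable are hypothetical names, and the collected error strings stand in for what would be stored as ProjectMessage entries.

```python
def resolve_all(requirements, resolve_one):
    """Resolve each requirement independently, collecting errors instead of failing.

    `resolve_one` is a hypothetical callable that resolves a single requirement
    and raises an exception when it cannot.
    """
    resolved = {}
    errors = []
    for requirement in requirements:
        try:
            resolved[requirement] = resolve_one(requirement)
        except Exception as error:
            # One problematic entry must not abort the whole run: record it
            # and move on to the next requirement.
            errors.append(f"{requirement}: {error}")
    return resolved, errors


def fake_resolver(requirement):
    """Stand-in resolver for the example: rejects unpinned requirements."""
    if "==" not in requirement:
        raise ValueError("requirement is not pinned")
    name, _, version = requirement.partition("==")
    return {"name": name, "version": version}


if __name__ == "__main__":
    resolved, errors = resolve_all(["requests==2.32.3", "broken-entry"], fake_resolver)
    print(resolved)
    print(errors)
```

With this shape, the caller decides what to do with the errors after the loop, so a single bad entry costs one logged message rather than the whole resolution.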
@tdruez if there is a valid name and a version, there should be a download URL for any pypi package. As you pointed out, the dependency object only has purl fields and a couple of other things.
Or we can use a static resolver if the dependencies are pinned, i.e. they have a version. I'm thinking now that maybe it is better to have an add-on pipeline for dependency resolution rather than a separate pipeline, so it can be plugged in optionally after all the pipelines that detect package/dependency info (and deprecate the
@AyanSinhaMahapatra could you expand on that? How do you generate a download URL for pypi package from a name/version?
@tdruez actually it's the source download URL for pypi, see https://github.com/aboutcode-org/scancode-toolkit/blob/develop/src/packagedcode/pypi.py#L2259 where we get the data API URL, the source download URL (repository_download_url) and the repo homepage URL. Note that it could also be possible to write something minimal to get this from the URL field of the API data for a project: https://pypi.org/pypi/scancode-toolkit/32.3.2/json, but this type of processing is currently only done in purldb so could be something we support with
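The "something minimal" mentioned above might look like the following sketch, which reads the source distribution URL out of the PyPI JSON API response. The function names are hypothetical; the sketch only assumes that the response has a `urls` list whose entries carry `packagetype` and `url` keys, and it is not the purldb or scancode-toolkit implementation.

```python
import json
from urllib.request import urlopen


def pypi_json_url(name, version):
    """Build the PyPI JSON API URL for one release of a project."""
    return f"https://pypi.org/pypi/{name}/{version}/json"


def sdist_download_url(release_data):
    """Return the source distribution URL from parsed API data, or None."""
    for file_entry in release_data.get("urls", []):
        if file_entry.get("packagetype") == "sdist":
            return file_entry.get("url")
    return None


def fetch_sdist_download_url(name, version):
    """Query the API (network access required) and extract the sdist URL."""
    with urlopen(pypi_json_url(name, version), timeout=30) as response:
        return sdist_download_url(json.load(response))


if __name__ == "__main__":
    print(fetch_sdist_download_url("scancode-toolkit", "32.3.2"))
```

The parsing step is kept separate from the network call so it can be exercised on saved API responses. Note that this still needs one API request per package, which is the limitation discussed in the following comments.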
Let's take the first entry of the discovered dependencies
I do not think we can generate reliable pypi download URLs from that static data.
Isn't it the whole purpose of the
Actually yeah, I think this code is obsolete because they have changed their URLs a little bit, using the hashes for the download URL. For example in https://files.pythonhosted.org/packages/b1/6a/d16cd64a786c3264266d740279af96113f849e9c5110bcc1891553fe5ae0/aboutcode_hashid-0.2.0.tar.gz they use: The BLAKE2b-256 hash for that file which is But there is no way to get the URL without querying the API so this is not great.
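To illustrate why the URL cannot be derived from a name and version alone, here is a small sketch of the path layout as it appears in the example URL above: the 64-character hex digest is split into 2, 2 and 60 characters. The function names are hypothetical and the layout is inferred from that single example, so treat it as an assumption rather than a documented contract.

```python
import hashlib


def blake2b_256_hex(content):
    """Compute the BLAKE2b digest with a 256-bit (32-byte) output, as hex."""
    return hashlib.blake2b(content, digest_size=32).hexdigest()


def pythonhosted_url(digest_hex, filename):
    """Build a files.pythonhosted.org URL from a file digest and file name.

    Assumed layout, inferred from the example in this thread:
    /packages/<first 2 hex chars>/<next 2>/<remaining 60>/<filename>
    """
    return (
        "https://files.pythonhosted.org/packages/"
        f"{digest_hex[:2]}/{digest_hex[2:4]}/{digest_hex[4:]}/{filename}"
    )


if __name__ == "__main__":
    # The digest is computed from the file content itself, so the archive must
    # already be available (or the API queried) before the URL can be built.
    digest = blake2b_256_hex(b"placeholder archive bytes")
    print(pythonhosted_url(digest, "example-0.0.0.tar.gz"))
```

Since the digest depends on the bytes of the archive, a static resolver that only knows `name==version` has nothing to hash, which matches the conclusion that the API has to be queried.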
Yeah, we do get all the metadata associated with Python packages after doing the package resolution, and that contains all the URLs. But we don't have inspectors for all the package ecosystems, and python-inspector is also quite limited in its support of Python package manifests, so that's probably not a great general solution either 😅
Signed-off-by: tdruez <tdruez@nexb.com>
I used the scan_single_package pipeline to scan the package (source code) available at
https://github.com/aboutcode-org/scancode.io/archive/refs/tags/v34.9.5.tar.gz
and SCIO v34.9.5 found 52 dependencies, but many of them are very incomplete: they have what appears to be a valid PURL but do not have any Download URL or License. The problem packages are all from pypi.
It seems very strange that SCIO is able to identify a specific, valid version of these problem packages, which can be found online, but it is not getting a Download URL, suggesting that there are special aspects of the pypi repo that it is not handling very well. Please see the attached scan results.
scancodeio_scio-v34.9.5.json