Improve identification of conda package files #4083

Open
mjherzog opened this issue Jan 9, 2025 · 4 comments

Comments

@mjherzog
Member

mjherzog commented Jan 9, 2025

Working with SCTK v32.3.1 (running in SCIO v34.9.3), SCTK does not currently identify the installed files for a conda package in the Resources for_packages field. This data seems to be readily available in a set of .json files located under the conda-meta/ directory where conda is installed, typically opt/conda. The file names are in the format <package name>-<package-version>.json.
This pattern is present for both Anaconda and Miniconda distributions.
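
To illustrate the idea, here is a minimal sketch of how the conda-meta manifests could be read to map installed files back to their package. It is not the ScanCode implementation: the function name `map_files_to_conda_packages` is hypothetical, and the sketch assumes each manifest carries `name`, `version` and a `files` list of paths relative to the conda root.

```python
import json
from pathlib import Path


def map_files_to_conda_packages(conda_root):
    """Return a dict of {installed file path: "name-version"}.

    Hypothetical helper: assumes every conda-meta/*.json manifest has
    "name", "version" and "files" keys, with "files" holding paths
    relative to the conda root (for example opt/conda).
    """
    conda_root = Path(conda_root)
    file_owner = {}
    for manifest in sorted((conda_root / "conda-meta").glob("*.json")):
        data = json.loads(manifest.read_text(encoding="utf-8"))
        name = data.get("name")
        version = data.get("version")
        if not name or not version:
            # Skip anything that does not look like a package manifest.
            continue
        package_id = f"{name}-{version}"
        for rel_path in data.get("files", []):
            file_owner[str(conda_root / rel_path)] = package_id
    return file_owner
```

Calling `map_files_to_conda_packages("/opt/conda")` on an image with conda installed there would give one entry per installed file, which is roughly the information the for_packages field needs.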

AyanSinhaMahapatra added a commit that referenced this issue Jan 13, 2025
Parse conda metadata JSON manifests and use the package data
and files information present to improve conda package assembly.

Reference: #4083
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
@AyanSinhaMahapatra
Member

Before this change, scanning the Docker image docker://continuumio/miniconda3 resulted in:

218 packages
388 dependencies
18355 files

  • 9213 files in a package
  • 9142 files not in a package

With #4089 above:

295 packages
388 dependencies
18355 files

  • 16469 files in a package
  • 1886 files not in a package

So conda resource assignment should be much better once this PR is merged and released to SCIO.
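
As a quick check of the numbers above, the share of files assigned to a package can be computed directly from the reported counts. The helper name `package_coverage` is only for illustration.

```python
def package_coverage(files_in_package, total_files):
    """Percentage of scanned files that are assigned to a package."""
    return 100.0 * files_in_package / total_files


# Counts reported for docker://continuumio/miniconda3 in this comment.
before = package_coverage(9213, 18355)
after = package_coverage(16469, 18355)

print(f"before: {before:.1f}% of files in a package")  # 50.2%
print(f"after:  {after:.1f}% of files in a package")   # 89.7%
```

So roughly half of the files were assigned to a package before, and close to nine in ten with the change.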

@mjherzog
Member Author

Excellent

@simrancharde

simrancharde commented Jan 18, 2025

How can issue #4083 be solved? Do we need to add for_packages handling in a scanner.py file and then parse the JSON files? Anything else?

AyanSinhaMahapatra self-assigned this Jan 20, 2025
@AyanSinhaMahapatra
Member

@simrancharde thanks for your interest, but this already has a fix at #4089. Could you check out our open good first issues instead? https://github.com/aboutcode-org/scancode-toolkit/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22good%20first%20issue%22 This is where we mostly need help.
