Add status or details for package #880

Open
pombredanne opened this issue Aug 20, 2023 · 7 comments

Comments

@pombredanne
Member

When I see a package in a scan, I would like to know where we got it from.

  • Was it obtained from a scan of the code?
  • Or was it matched to the purlDB? And which one?
  • Or was it found by mapping a file to its corresponding source code in a d2d pipeline?
  • Or does it come from some other procedure (such as dependency resolution)?
  • Is this a skinny, partial package (with only PURL data?) or a more comprehensive set of metadata?
  • Or does it come from a curated ABOUT file or some other curated data source?
@tdruez
Contributor

tdruez commented Aug 21, 2023

@pombredanne Could you provide the status values for each case?

@pombredanne
Member Author

pombredanne commented Aug 21, 2023

Could you provide the status values for each case?

This requires a bit of thinking: first, what would the data structure be?
@DennisClark ping: your help would be very welcome to design how we could track how, and based on what clues, we created a package during a ScanCode.io pipeline, such as a scan, a purldb match, or both, or a manifest or an SBOM.

@mjherzog
Member

We also need to plan ahead for when we may have status codes entered by a person.
The current status codes from SCIO are already confusing because they span multiple concepts:

  • application-package or system-package
  • file type or content - e.g. ignored-empty and other ignored codes
  • scan status - e.g. scanned

Now is the time to design the use of status codes for automated and for manual processing.
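One way to read the point above is that each concept deserves its own field rather than sharing a single status string, plus a marker for whether a pipeline or a person set it. A minimal Python sketch under that assumption; all class and field names (`PackageKind`, `FileStatus`, `ScanStatus`, `EnteredBy`, `ResourceStatus`) are hypothetical and not part of ScanCode.io:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PackageKind(Enum):
    # Hypothetical: which kind of package a resource belongs to.
    APPLICATION = "application-package"
    SYSTEM = "system-package"


class FileStatus(Enum):
    # Hypothetical: why a file is skipped, based on its type or content.
    IGNORED_EMPTY = "ignored-empty"


class ScanStatus(Enum):
    # Hypothetical: progress of automated scanning.
    SCANNED = "scanned"


class EnteredBy(Enum):
    # Hypothetical: whether a pipeline or a person set the status.
    AUTOMATED = "automated"
    MANUAL = "manual"


@dataclass
class ResourceStatus:
    # Each concept lives in its own optional field instead of one shared string.
    package_kind: Optional[PackageKind] = None
    file_status: Optional[FileStatus] = None
    scan_status: Optional[ScanStatus] = None
    entered_by: EnteredBy = EnteredBy.AUTOMATED


# Set by a pipeline.
status = ResourceStatus(package_kind=PackageKind.SYSTEM, scan_status=ScanStatus.SCANNED)
# Set by a reviewer.
reviewed = ResourceStatus(file_status=FileStatus.IGNORED_EMPTY, entered_by=EnteredBy.MANUAL)
```

Keeping the dimensions apart means a manual review can change one of them without overwriting what a pipeline recorded in another.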

@DennisClark
Member

DennisClark commented Aug 21, 2023

Two new fields, I think:

pkg_origin: list of values.
The database or process that identified the package.

  • code-scan
  • purlDB match
  • d2d-map
  • dependency
  • about-file

is_scanned: yes/no/unknown
Indicates if the package code was scanned by ScanCode Toolkit.
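
The two proposed fields could be modelled roughly as below. This is a sketch only: it reads "list of values" as a fixed set of choices, maps yes/no/unknown to `True`/`False`/`None`, and the slug `purldb-match` plus the names `PkgOrigin`, `PackageProvenance` and `is_scanned_label` are assumptions, not an existing API:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PkgOrigin(Enum):
    # The database or process that identified the package (slugs are assumed).
    CODE_SCAN = "code-scan"
    PURLDB_MATCH = "purldb-match"
    D2D_MAP = "d2d-map"
    DEPENDENCY = "dependency"
    ABOUT_FILE = "about-file"


@dataclass
class PackageProvenance:
    # One choice from the list of values; None until an origin is known.
    pkg_origin: Optional[PkgOrigin] = None
    # Tri-state: True = yes, False = no, None = unknown.
    is_scanned: Optional[bool] = None


def is_scanned_label(value: Optional[bool]) -> str:
    # Render the tri-state value back as yes/no/unknown.
    if value is None:
        return "unknown"
    return "yes" if value else "no"


provenance = PackageProvenance(pkg_origin=PkgOrigin.PURLDB_MATCH)
print(is_scanned_label(provenance.is_scanned))  # unknown
```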

@DennisClark
Member

DennisClark commented Aug 21, 2023

Perhaps one more:

sctk_version:
The version of ScanCode Toolkit used to scan the package.

tdruez added a commit that referenced this issue Oct 11, 2023
Signed-off-by: Thomas Druez <tdruez@nexb.com>
@pombredanne
Member Author

I think we need to revisit this, as we may have tried to pack too many things into one field:

  • An indication of how we obtained the package origin and metadata. There can be more than one way (matching, scanning, etc.), so this may be more of a "log/history"-like field.
  • Some indication, in the package metadata, of the completeness and quality of the data, and of possible issues/TODOs that need review. All this needs some design: this could be a score for the breadth, depth, and quality of the package data, or some kind of status.
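
For the second point, a very simple starting point for a "breadth" score is the fraction of a chosen set of metadata fields that are filled in, with the empty ones listed as review TODOs. The sketch below assumes a hypothetical field list (`SCORED_FIELDS`) and helper names; it does not measure depth or quality, which would need more design:

```python
from typing import Any, Dict, List, Sequence

# Hypothetical set of fields used to judge the breadth of package metadata.
SCORED_FIELDS: Sequence[str] = (
    "name",
    "version",
    "declared_license_expression",
    "download_url",
    "homepage_url",
    "copyright",
)


def _is_empty(value: Any) -> bool:
    # Treat missing values and empty strings/containers as "not filled in".
    return value is None or value == "" or value == [] or value == {}


def missing_fields(package: Dict[str, Any], fields: Sequence[str] = SCORED_FIELDS) -> List[str]:
    """List the scored fields that are still empty: candidates for a review TODO."""
    return [name for name in fields if _is_empty(package.get(name))]


def completeness_score(package: Dict[str, Any], fields: Sequence[str] = SCORED_FIELDS) -> float:
    """Return the fraction of scored fields that are filled in, from 0.0 to 1.0."""
    if not fields:
        return 0.0
    return 1 - len(missing_fields(package, fields)) / len(fields)


# A "skinny" package that only carries PURL data (hypothetical example values).
skinny = {"type": "pypi", "name": "example", "version": "1.0"}
print(missing_fields(skinny))
```

Here the skinny package scores about 0.33, and the four missing fields are what a reviewer would be asked to fill in.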

@AyanSinhaMahapatra
Member

AyanSinhaMahapatra commented May 22, 2024

From a discussion with @pombredanne:

This would best be implemented as a status/origin log, which is a list of status values (similar to the detection logs in LicenseDetection objects).

This is a list and not a single value because a package can have multiple data sources and origins, as in the following flow (keeping future plans in mind too):

  1. A package is created from a scan.
  2. The package is enriched with data from the purldb/full scans in the purldb.
  3. Package data is added for vulnerabilities/quality.

Suggested values (not exhaustive; please add and update), based on the list above by @DennisClark:

  • code-scan
  • purldb-match-archive
  • purldb-match-file
  • purldb-metadata (metadata fetched by purl)
  • static-resolved (resolved from lockfiles/metadata)
  • dynamic-resolved (resolved by inspectors like python inspector)
  • d2d-map
  • about-file

Suggestions for the attribute name: origin/origin_log
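
The origin log idea can be sketched as an append-only list of the suggested values, replaying the first two steps of the flow above. This assumes Python enums for the values and a hypothetical `PackageRecord` class with an `add_origin` helper and an example PURL; none of these are an existing ScanCode.io API:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Origin(Enum):
    # Suggested values from the comment above (not exhaustive).
    CODE_SCAN = "code-scan"
    PURLDB_MATCH_ARCHIVE = "purldb-match-archive"
    PURLDB_MATCH_FILE = "purldb-match-file"
    PURLDB_METADATA = "purldb-metadata"
    STATIC_RESOLVED = "static-resolved"
    DYNAMIC_RESOLVED = "dynamic-resolved"
    D2D_MAP = "d2d-map"
    ABOUT_FILE = "about-file"


@dataclass
class PackageRecord:
    # Hypothetical stand-in for a package with an origin_log attribute.
    purl: str
    # Append-only history of how the package data was obtained, in order.
    origin_log: List[Origin] = field(default_factory=list)

    def add_origin(self, origin: Origin) -> None:
        # Record a new source without discarding earlier ones.
        self.origin_log.append(origin)


package = PackageRecord(purl="pkg:pypi/example@1.0")
package.add_origin(Origin.CODE_SCAN)  # 1. created from a scan
package.add_origin(Origin.PURLDB_MATCH_ARCHIVE)  # 2. enriched from the purldb
print([origin.value for origin in package.origin_log])  # ['code-scan', 'purldb-match-archive']
```

Step 3 of the flow (vulnerability/quality data) has no matching value yet, which is consistent with the list being marked as not exhaustive.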
