Consider tracking provenance of build-time dependencies #15

Open
ogrisel opened this issue Feb 6, 2025 · 7 comments

Comments

@ogrisel

ogrisel commented Feb 6, 2025

The list of phantom dependencies at https://github.com/psf/sboms-for-python-packages?tab=readme-ov-file#phantom-dependencies does not explicitly mention build-time dependencies as candidates for inclusion in the SBOM. I am thinking of:

  • compiler toolchains (gcc, llvm, msvc, gfortran...) (including linkers);
  • wheel repair tools or their system dependencies (such as patchelf which is a system dependency of auditwheel);
  • cibuildwheel itself;
  • even the manylinux docker image;
  • GitHub Actions components (possibly including container images) or other versioned CI components used in CI workflows to build the wheel.

Tracking the name, digest and version number of those build-time tools could be useful even if the wheel does not directly include files coming from those tools. In particular, this would allow tracing the downstream impact of a tampered build tool that has been discovered to inject malware hidden into compiled extensions shipped in the generated Python package (see compiler backdoors).
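
To make "name, digest and version number" concrete, here is a minimal sketch of what recording one build tool could look like. The function names and the record shape are hypothetical, and taking the first line of `--version` output as the version is an assumption that will not hold for every tool (MSVC's `cl`, for instance, has no such flag).

```python
import hashlib
import shutil
import subprocess


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_tool(name, version_args=("--version",)):
    """Record path, digest and self-reported version of a tool found on PATH.

    Returns None when the tool cannot be found.
    """
    path = shutil.which(name)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, *version_args], capture_output=True, text=True, timeout=30
        )
        output = (result.stdout or result.stderr).strip()
        # Assumption: the first output line identifies the version.
        version = output.splitlines()[0] if output else None
    except (OSError, subprocess.SubprocessError):
        version = None
    return {
        "name": name,
        "path": path,
        "sha256": sha256_of_file(path),
        "version": version,
    }
```

Calling `describe_tool("gcc")` or `describe_tool("patchelf")` inside the build container would give one record per tool. The digest only covers the entry-point executable, not the shared libraries or compiler passes it loads, so a real implementation would need to go further.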

Furthermore, it would be helpful to track build tool versioning info to help automate independent reproducible build attempts as a proactive way to detect previously unreported tampered components in a software supply chain.
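
As a rough illustration of what an independent rebuild check could look like, the sketch below compares two wheels member by member. It is a diagnostic under stated assumptions, not a full reproducibility test: a strict bitwise check would hash the two `.whl` files as a whole, whereas this one ignores archive metadata such as timestamps and only reports which files differ in content. The function names are hypothetical.

```python
import hashlib
import zipfile


def member_digests(wheel_path):
    """Map each archive member name to the SHA-256 of its uncompressed bytes."""
    with zipfile.ZipFile(wheel_path) as zf:
        return {
            info.filename: hashlib.sha256(zf.read(info)).hexdigest()
            for info in zf.infolist()
            if not info.is_dir()
        }


def differing_members(wheel_a, wheel_b):
    """Return sorted names of members that are missing or differ between two wheels."""
    a = member_digests(wheel_a)
    b = member_digests(wheel_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty result means the two wheels ship identical file contents. A non-empty result points at the members (typically compiled extensions) worth inspecting, and recorded build tool versions would then help explain the difference.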

@jkowalleck
Contributor

jkowalleck commented Feb 6, 2025

like MBOM: https://en.wikipedia.org/wiki/Manufacturing_bill_of_materials ?
like CycloneDX formulation feature: https://cyclonedx.org/docs/1.6/json/#formulation ?
like SPDX's {sorry, no idea how they do it - but i know they have this somewhere.}

@sethmlarson
Member

This can certainly be done, it would be a feature request to projects which are building Python packages. Likely a good place to start would be cibuildwheel?

If I'm understanding the CycloneDX formulation correctly, this would take a snapshot of the entire environment of packages/tools for the cibuildwheel container and then list them all under:

  • top-level package -> formulations -> workflows -> {taskTypes: [build], resourceReferences: [component-ids...]}
  • and then add all the snapshotted packages/tools under top-level-package -> formulations -> workflows -> components

Does that match your expectations?
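
For discussion, the layout described above could be sketched roughly as follows. This assumes my reading of the CycloneDX 1.6 schema: `formulation` is a list of formulas, each with its own `components` and `workflows`, and each workflow carries `bom-ref`, `uid`, `taskTypes` and `resourceReferences`. The input record shape, the `bom-ref` naming and the function name are hypothetical, and the output has not been validated against the schema.

```python
import json
import uuid


def build_formulation(tools):
    """Build a CycloneDX-style `formulation` list from build tool records.

    `tools` is an iterable of dicts with "name", "version" and "sha256" keys
    (a hypothetical record shape, not something cibuildwheel emits today).
    """
    components = []
    for tool in tools:
        components.append({
            "type": "application",
            "bom-ref": f"build-tool:{tool['name']}@{tool['version']}",
            "name": tool["name"],
            "version": tool["version"],
            "hashes": [{"alg": "SHA-256", "content": tool["sha256"]}],
        })
    workflow = {
        "bom-ref": "workflow:build-wheel",
        "uid": str(uuid.uuid4()),
        "taskTypes": ["build"],
        # Each reference points at a snapshotted tool listed in `components`.
        "resourceReferences": [{"ref": c["bom-ref"]} for c in components],
    }
    return [{
        "bom-ref": "formula:build-wheel",
        "components": components,
        "workflows": [workflow],
    }]


if __name__ == "__main__":
    example = build_formulation(
        [{"name": "patchelf", "version": "0.17.2", "sha256": "0" * 64}]
    )
    print(json.dumps({"formulation": example}, indent=2))
```

The version and digest in the example are placeholders, not real values.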

@ogrisel
Author

ogrisel commented Feb 7, 2025

> This can certainly be done, it would be a feature request to projects which are building Python packages. Likely a good place to start would be cibuildwheel?

I believe so. Do you want me to do it? Or do you prefer to handle that discussion yourself?

For context, I am exploring the current state of tooling and specs to see what's doable to achieve traceable and bitwise reproducible builds for scikit-learn, its dependencies and other scientific Python packages that ship native extensions in their wheels:

> If I'm understanding the CycloneDX formulation correctly, this would take a snapshot of the entire environment of packages/tools for the cibuildwheel container and then list them all under [...].

I am no CycloneDX expert, but that sounds reasonable. Maybe we can try to find other software communities who are more mature in that respect and see how they use the CycloneDX formulation field in practice (assuming some do).

EDIT: I could not find any usage example for the "formulation" field of CycloneDX on GitHub: https://github.com/search?q=path%3A*.cdx.json+formulation&type=code

@ogrisel
Author

ogrisel commented Feb 7, 2025

@jkowalleck I find the MBOM concept as described in Wikipedia a bit fuzzy. CycloneDX's formulation, on the other hand, seems relevant. I am not familiar with the SPDX spec either, so I am not sure if what I described above can be naturally mapped to it with an explicit distinction between runtime and build time dependencies.

@sethmlarson
Member

@ogrisel The PEP itself isn't yet provisional, it will hopefully be sometime next week. But we can certainly get the ball rolling. I can create a few top-level issues there.

@jkowalleck
Contributor

jkowalleck commented Feb 7, 2025

here is a guide on MBOM: https://github.com/CycloneDX/guides/tree/main/MBOM/en
It is pretty fresh, so if you have any concerns, feel free to open a ticket for that.

There are also guides on SBOM and OBOM.

> I am not sure if what I described above can be naturally mapped to it with an explicit distinction between runtime and build time dependencies.

Runtime dependencies -> SBOM/OBOM
BuildTime -> MBOM
Connect an MBOM to an SBOM: Bom-Link
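
A minimal sketch of that last step, assuming the BOM-Link syntax `urn:cdx:<serialNumber>/<version>[#<bom-ref>]` and an `externalReferences` entry of type `bom` on the SBOM side. The helper names are hypothetical, and both documents are treated as plain dicts loaded from CycloneDX JSON.

```python
from urllib.parse import quote


def bom_link(serial_number, version, bom_ref=None):
    """Build a BOM-Link URN from a BOM's serialNumber and version."""
    prefix = "urn:uuid:"
    if not serial_number.startswith(prefix):
        raise ValueError("serialNumber is expected to look like 'urn:uuid:<uuid>'")
    link = f"urn:cdx:{serial_number[len(prefix):]}/{version}"
    if bom_ref is not None:
        # Assumption: the bom-ref fragment has to be URL-encoded.
        link += "#" + quote(bom_ref, safe="")
    return link


def link_mbom_from_sbom(sbom, mbom):
    """Add an external reference in `sbom` that points at `mbom`."""
    reference = {
        "type": "bom",
        "url": bom_link(mbom["serialNumber"], mbom.get("version", 1)),
    }
    sbom.setdefault("externalReferences", []).append(reference)
    return sbom
```

With this split, the wheel would only need to carry the SBOM, and the build-time details could be published separately as the MBOM.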

@ogrisel
Author

ogrisel commented Feb 19, 2025

Note that SPDX also has a build profile that could serve this purpose:

https://github.com/spdx/spdx-3-model/blob/8caa3fb87d53b356b211a64aca5f6901e14be441/model/Build/Build.md

But the interpretation of the environment and parameter fields depends on the specific buildType in use, and I couldn't find a meaningful usage example on GitHub.


3 participants