Docker base images should be included in the BOM #1199

Open
captn3m0 opened this issue Sep 9, 2022 · 15 comments
Labels
blocked Progress is being stopped by something enhancement New feature or request needs-investigation

Comments

@captn3m0

captn3m0 commented Sep 9, 2022

What would you like to be added: A simple docker image with the following Dockerfile:

FROM php:7.4-cli

COPY scan.php /

should result in an SBOM that includes the base image as a component:

pkg:docker/library/php@7.4-cli

Why is this needed: A container image base image is also a "dependency". For popular base-images, this carries a lot of information, and this can be used to recursively look up other dependencies (that might have been included in the build process, but might not be part of the final image).

I'm not sure how feasible this is, considering docker doesn't seem to store the base image names, but this would be a great addition.

@captn3m0 captn3m0 added the enhancement New feature or request label Sep 9, 2022
@spiffcs spiffcs added this to OSS Sep 19, 2022
@kzantow
Contributor

kzantow commented Oct 6, 2022

Hi @captn3m0 -- are you really looking to get php:7.4 properly cataloged with this request, which would make it a duplicate of #1197? Or is this actually a request to get the base image container added as a component?

@captn3m0
Author

captn3m0 commented Oct 7, 2022

PHP here is just an example - this is a request for the latter (base images are ingredients, and should be included in an SBOM).

@captn3m0
Author

captn3m0 commented Oct 7, 2022

Investigated this a bit. Docker does not return the base image ID, only the relevant build statements inherited from the base image. For example, the amazoncorretto:8u342-alpine3.16-jre image includes the following information about its upstream:

ADD file:2a949686d9886ac7c10582a6c29116fd29d3077d02755e87e111870d63607725 in /

The corresponding dockerfile has:

FROM alpine:3.16

And the hash can actually be found in the alpine:3.16 image: https://github.com/docker-library/repo-info/blob/master/repos/alpine/remote/3.16.md#alpine316---linux-amd64

I'm thinking about generating such common hashes, and publishing them on Rekor so this would get picked up via #1159.

The intended mapping here would be

2a949686d9886ac7c10582a6c29116fd29d3077d02755e87e111870d63607725 ->
  pkg:docker/library/alpine@3.16

Which files to look up could be left to Syft, or perhaps I could publish a Bloom filter that helps evaluate quickly, and locally, whether a given file is a "relevant" base image file.
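A minimal sketch of the lookup proposed above, assuming the image history is available as a list of "created by" strings and that a table of known file hashes has been published somewhere. The table, the regular expression, and the function name are hypothetical illustrations, not part of Syft or Rekor; the only real entry is the mapping given in this comment.

```python
import re

# Hypothetical lookup table. The single entry is the mapping proposed above;
# a real table would have to be generated and published separately.
KNOWN_BASE_FILES = {
    "2a949686d9886ac7c10582a6c29116fd29d3077d02755e87e111870d63607725":
        "pkg:docker/library/alpine@3.16",
}

# Matches history statements of the form "ADD file:<sha256 hex> in /".
ADD_FILE_RE = re.compile(r"ADD file:([0-9a-f]{64}) in ")


def base_image_purls(history, known=KNOWN_BASE_FILES):
    """Return PURLs for every known base-image file hash found in the history."""
    purls = []
    for created_by in history:
        match = ADD_FILE_RE.search(created_by)
        if match is None:
            continue
        purl = known.get(match.group(1))
        if purl is not None and purl not in purls:
            purls.append(purl)
    return purls


history = [
    "ADD file:2a949686d9886ac7c10582a6c29116fd29d3077d02755e87e111870d63607725 in /",
    'CMD ["/bin/sh"]',
]
print(base_image_purls(history))
```

Unknown hashes are simply skipped, so the lookup can only ever add base image components, never remove or change other results.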

@spiffcs spiffcs moved this to Parking Lot (Comments or Progress) in OSS Oct 13, 2022
@khan-a1

khan-a1 commented Feb 3, 2023

Hi team, any update on this feature request? It would be great if Docker images could be added to the SBOM.

@tgerla tgerla removed the status in OSS Sep 14, 2023
@tgerla
Contributor

tgerla commented Sep 14, 2023

Hi @khan-a1 and @captn3m0, sorry for the very long delay in replying. We would like to better understand your use case for including a reference to a Docker image in the SBOM itself. Are you familiar with the different scoping options you can specify with --scope?

We also have an open issue discussing ideas to expand the different scoping selections: #15

Happy to re-engage on this issue and figure out how to move forward. Would you be able to join our community meeting at some point? It might be easier to talk things over live. https://github.com/anchore/syft/#join-our-community-meetings

@tgerla tgerla moved this to Awaiting Response in OSS Sep 14, 2023
@captn3m0
Author

captn3m0 commented Sep 14, 2023 via email

@captn3m0
Author

I've looked at the scoping options, and the various feature requests for that, and that doesn't fit this use-case.

An SBOM should be an actual artifact of all the components that went in building the final image. Docker base images are a relevant artifact imo.

The primary use case for this comes from current limitations in Syft's binary matching capabilities, which mean that not everything in a base image is detected. If anything is installed in the base image outside a "package" (very common for official base images), Syft cannot detect it easily.

In such cases, the name of the base image itself is a huge helper in the SBOM. At endoflife.date, we provide EOL information for various products alongside their PURLs. These include PURLs for docker images. See these search results. For example, for composer, we provide the following PURLs:

-   purl: pkg:composer/composer/composer
-   repology: php:composer # this expands to various packages listed at https://repology.org/project/php:composer/versions
-   purl: pkg:docker/library/composer
-   purl: pkg:github/composer/composer

Of these, the pkg:docker one is the relevant one. Say I have a PHP application that uses the official composer base image:

FROM composer:2.6.2
ADD . /src

If you were to build such a Dockerfile, Syft would not include the version of composer in the SBOM, because Syft currently does not detect composer. The official composer Dockerfile relies on a bash installer for composer, which drops a few binaries in the image. I've reported such issues in the past, but I believe the binary classifier can only get us so far.

In such a scenario, since the SBOM doesn't include it, the usage (potentially EOL) goes unnoticed and undetected.

However, if Syft were to report the base image used here (pkg:docker/library/composer@2.6.2), it would provide a secondary means of such detection.

tl;dr: Providing base images in the SBOM acts as a decent fallback, and includes important information (such as repository names, organization name, image version/tag) that is relevant to security teams.
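As a rough illustration of the fallback described above, here is a sketch that turns the image reference from a FROM line into a pkg:docker PURL of the form used in this issue. It assumes a plain Docker Hub reference of the shape name:tag with no registry host and no digest; the function name is hypothetical, and placing official images under the "library" namespace follows the examples in this thread.

```python
def docker_purl(image_ref: str) -> str:
    """Build a pkg:docker PURL from a Docker Hub reference such as 'composer:2.6.2'.

    Assumption: the reference has no registry host and no digest, only
    an optional namespace, a name, and a tag.
    """
    name, sep, tag = image_ref.partition(":")
    if not name or not sep or not tag:
        raise ValueError(f"expected 'name:tag', got {image_ref!r}")
    if "/" not in name:
        # Official images live under the implicit 'library' namespace.
        name = "library/" + name
    return f"pkg:docker/{name}@{tag}"


print(docker_purl("composer:2.6.2"))
print(docker_purl("php:7.4-cli"))
```

A consumer holding such a PURL could then match it against a data set that lists pkg:docker/library/composer, even when nothing inside the image was detected as a package.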

@noqcks
Contributor

noqcks commented Nov 9, 2023

@captn3m0 can this issue be closed now after #2267 has been merged, or did you have more in mind for this issue?

@spiffcs
Contributor

spiffcs commented Jan 18, 2024

@captn3m0 What else do you have in mind? Now that we have the annotations, do we want to try to build the base image "package" into the other formats? What's your ideal end state for Syft in how it surfaces base images, now that #2267 has been merged?

For best results, so that consumers of the document can find the base image via relationships, we should use:

SPDX: https://spdx.github.io/spdx-spec/v2.3/relationships-between-SPDX-elements/

CycloneDX: https://cyclonedx.org/docs/1.5/json/#metadata_component

The other outstanding question is whether the annotations are the best source of truth for discovering this information. Can there be multiple images that would build the full chain from image:primary -> image:base1 -> image:base2 -> scratch?

The properties of the annotations also need more information to properly identify the image. ubuntu:xx.xx today can be different from ubuntu:xx.xx one month ago. We need both the digest and the version to pin down the exact image used.

@captn3m0
Author

What's your ideal end state for Syft in how it surfaces base images

A PURL that points to the correct base image. While #2294 is great, those are not components. Anything that is outside of the "components" part of the BOM will not get picked up by any other tooling.

Ideally, this would use the OCI PURL type, with the optional tag attribute (https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#oci).

I like the idea of using relationships to document this better, but I'm not sure which of the available relationships will work best here. Base images can be counted as build dependencies, composition primitives, or even ancestors. Hard to pick something that works best for all cases.

Can there be multiple images that would build the full chain from image:primary -> image:base1 -> image:base2 -> scratch

Yes, this is another reason I'd prefer using components: there, the BOM could list all of the known base images (although finding them is a much harder problem).
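To make the "chain as components" idea concrete, here is a sketch that emits CycloneDX-style component and dependency entries for a chain of base images. The PURLs in the usage example are hypothetical placeholders standing in for image:base1 and image:base2, the function name is made up, and expressing the chain through "dependsOn" is only one possible choice, given that the right relationship type is still an open question here. The name parsing assumes simple pkg:docker PURLs without qualifiers.

```python
def base_image_components(chain):
    """Build CycloneDX-style entries for a chain of base images.

    chain: PURLs ordered from the immediate base image to the oldest known
    ancestor. Assumption: simple PURLs of the form pkg:docker/ns/name@version.
    """
    components = []
    dependencies = []
    for index, purl in enumerate(chain):
        name, _, version = purl.split("/")[-1].partition("@")
        components.append({
            "type": "container",
            "bom-ref": purl,
            "name": name,
            "version": version,
            "purl": purl,
        })
        # Each image points at the next image in the chain, if one is known.
        dependencies.append({"ref": purl, "dependsOn": chain[index + 1:index + 2]})
    return {"components": components, "dependencies": dependencies}


# Hypothetical placeholders for image:base1 -> image:base2.
bom_fragment = base_image_components([
    "pkg:docker/example/base1@1.0",
    "pkg:docker/example/base2@2.0",
])
print(bom_fragment["dependencies"])
```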

We need both the digest and the version to pin down the exact image used.

This should be solvable with oci PURLs. Sample PURL from the spec, that includes both digest and tag: pkg:oci/static@sha256%3A244fd47e07d10?repository_url=gcr.io/distroless/static&tag=latest
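A small sketch of how such a PURL could be assembled from a repository URL, a digest, and an optional tag. The function name is hypothetical and the encoding rules are simplified (the digest is fully percent-encoded, qualifier values keep their slashes), so treat it as an illustration of the shape rather than a complete PURL implementation.

```python
from urllib.parse import quote


def oci_purl(repository_url, digest, tag=None):
    """Build a pkg:oci PURL carrying both the digest and an optional tag."""
    # The PURL name is the last path segment of the repository, lowercased.
    name = repository_url.rsplit("/", 1)[-1].lower()
    qualifiers = {"repository_url": repository_url}
    if tag:
        qualifiers["tag"] = tag
    # Qualifiers are emitted in key order; slashes in values are left as-is.
    query = "&".join(
        f"{key}={quote(value, safe='/')}" for key, value in sorted(qualifiers.items())
    )
    return f"pkg:oci/{name}@{quote(digest, safe='')}?{query}"


# Reproduces the sample from the spec quoted above (truncated digest as given there).
print(oci_purl("gcr.io/distroless/static", "sha256:244fd47e07d10", tag="latest"))
```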

@wagoodman wagoodman removed the status in OSS Feb 7, 2024
@spiffcs spiffcs added the blocked Progress is being stopped by something label Jul 29, 2024
@spiffcs
Contributor

spiffcs commented Jul 29, 2024

I've added the blocked label to this. There is still no agreed-upon, trusted place from which a base image's SBOM or package information can be accessed.

Annotations are not where the Syft project wants to pull this data from, because that relies too heavily on user input when it comes to "trusting" what the contents of a given base image are.

I've added needs discussion to this for our livestream this week so that the team can discuss the future of this:

https://youtube.com/live/T9OkSGu23j4?feature=share

@wagoodman
Contributor

Note for later: are there any OCI attestations for base images on Docker Hub that we could leverage here?

@willmurphyscode
Contributor

What we need is a mechanism for going from the layer digest of an image to the tag or tags that point at it. If such a data source existed, we'd be open to making Syft query it at runtime, similar to querying Maven Central to identify a JAR by its digest. However, right now, we don't know of such a data source.

The needs-investigation label means someone should go and look for a mechanism to reverse the lookup that Docker does when it sees FROM node:lts-alpine3.19 and decides which bytes to download. It might be possible that this dataset exists somewhere, or that we can compute it.
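If such a dataset could be computed, one way the reverse lookup might work is a layer-prefix match: a base image's layers should appear, in order, at the start of any image built on top of it. The sketch below assumes a precomputed mapping from tags to ordered layer digests; that mapping, the placeholder digests, and the function name are all hypothetical, and no such data source is known to exist today.

```python
def find_base_tags(image_layers, known_images):
    """Return tags whose complete layer list is a strict prefix of image_layers.

    image_layers: ordered layer digests of the image being scanned.
    known_images: hypothetical mapping of tag -> ordered layer digests.
    Results are ordered from the closest base (longest prefix) to the most distant.
    """
    matches = []
    for tag, layers in known_images.items():
        count = len(layers)
        # A strict prefix excludes the scanned image itself.
        if 0 < count < len(image_layers) and image_layers[:count] == layers:
            matches.append((count, tag))
    return [tag for _, tag in sorted(matches, key=lambda m: (-m[0], m[1]))]


# Placeholder digests, for illustration only.
known = {
    "alpine:3.16": ["sha256:aaa"],
    "amazoncorretto:8u342-alpine3.16-jre": ["sha256:aaa", "sha256:bbb"],
    "node:lts-alpine3.19": ["sha256:ddd", "sha256:eee"],
}
scanned = ["sha256:aaa", "sha256:bbb", "sha256:ccc"]
print(find_base_tags(scanned, known))
```

Because every matching prefix is returned, the same lookup would also recover the chain of base images, not just the immediate parent. Tags move over time, though, so the mapping would need to record digests per point in time to stay accurate.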

@bureado
Contributor

bureado commented Aug 5, 2024

In case it helps the ongoing research:

  1. https://stackoverflow.com/a/67927907
  2. https://docs.docker.com/build/attestations/

@willmurphyscode willmurphyscode moved this to Backlog in OSS Oct 14, 2024
@willmurphyscode
Contributor

I'm adding this to the backlog column with a needs-investigation label, to reflect the state of this issue:

  1. We would like to add this
  2. We do not know a good way to add this

If you have ideas or want to try something, let us know!

Projects
Status: Backlog
Development

No branches or pull requests

10 participants