
Support to stack multiple Syft SBOM files into a single one #617

Open
hectorj2f opened this issue Nov 5, 2021 · 18 comments
Labels
blocked (Progress is being stopped by something), enhancement (New feature or request), multiple-sources (Issues that are dependent on supporting multiple sources)

Comments

@hectorj2f
Contributor

What would you like to be added:
CycloneDX supports merging multiple SBOM files into a single one. However Syft SBOM does not support merging multiple Syft SBOM files. One way could be done by manual merging the artifacts however that is limited to SBOM files whose distro is the same for all the different files.

We would like Syft SBOMs to be mergeable into a single document without losing accuracy when identifying vulnerabilities.

Why is this needed:

We generate a bundle and store it in a registry, but the artifacts linked to that bundle are composed of multiple SBOMs, one from each component that forms the bundle. When using CycloneDX, those SBOM files can be merged into one without losing accuracy in vulnerability identification, because the purl has the arch (or distro) injected into its value. We would like a solution that satisfies the same scenario with the Syft SBOM format.

Additional context:

@hectorj2f hectorj2f added the enhancement New feature or request label Nov 5, 2021
@hectorj2f hectorj2f changed the title Support for stacks of Syft SBOM files merged into a single one Support to stack multiple Syft SBOM files into a single one Nov 6, 2021
@luhring
Contributor

luhring commented Nov 7, 2021

Would JSON Lines work for this situation? The idea would be to keep the change pretty simple, and support processing concatenated instances of the existing Syft JSON output.
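To illustrate the idea, here is a minimal sketch (in Python, purely to show the consuming side; not an implementation proposal) of parsing concatenated JSON documents, which handles both newline-delimited JSON Lines and back-to-back concatenation:

```python
import json

def parse_concatenated_sboms(text: str):
    """Parse zero or more concatenated JSON documents (JSONL-compatible).

    Returns a list of decoded SBOM dicts, whether the documents are
    newline-delimited or simply concatenated back to back.
    """
    decoder = json.JSONDecoder()
    docs, idx = [], 0
    text = text.lstrip()
    while idx < len(text):
        doc, end = decoder.raw_decode(text, idx)
        docs.append(doc)
        # skip any whitespace between documents
        while end < len(text) and text[end].isspace():
            end += 1
        idx = end
    return docs

# Two syft-style documents concatenated on separate lines (field names illustrative)
stream = '{"artifacts": [{"name": "a"}]}\n{"artifacts": [{"name": "b"}]}\n'
sboms = parse_concatenated_sboms(stream)
```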

One workaround is to merge the artifacts manually, but that only works when all of the SBOM files share the same distro.

❤️ I like the idea of applying this constraint, especially to the first iteration.

Would we have to support differing schema versions among the received JSON objects? Perhaps a constraint similar to the above could be applied here (i.e. require homogeneous JSON objects).

Would Syft's existing explicit sbom: scheme be how users also specify this new merged SBOM file format? Or should a new scheme be introduced? In a somewhat analogous situation, the jq tool requires the user's command to be adapted to handle JSON Lines by adding the --slurp flag. I'm not proposing that Syft add a CLI flag for this, but we should think about how broadly sbom: can be used to set Syft's expectations about input format.

@wagoodman
Contributor

wagoodman commented Nov 8, 2021

@hectorj2f I 100% agree with the observations on the distro section and how we could leverage reading of pURLs here to help (there could be a lot of overlap with anchore/grype#481 (comment) here). This helps grype match with pURLs generically on each package, and further supports input from other SBOM tools in the future.

I think source is the last field that is a problem (both on input SBOMs and output grype JSON documents).

  1. Do we require a single input document to be a "merge" of multiple documents? Or can it be a concatenation of multiple documents? (This is essentially the same question @luhring raised above about JSONL being a possible direction.)
  2. Do we want to preserve multiple source input fields? Or is it OK for this to be lossy?
  3. Do we want the output to be able to reflect the matches that belong to specific sources? Or more generically, if we have an input that represents multiple inputs, do we want a "merged" output? Or multiple separated outputs?

My take on possible answers here:

  1. Could we support both? One observation for the case of merging documents into one: the input the caller crafts already takes on the burden of being lossy (where there may be no source section, or only one of those values is kept).
  2. I would prefer that grype not be the decision maker for which input fields are lossy when it comes to the source section. This implies possibly changing the grype output JSON shape to preserve those fields (or grouping together multiple grype output JSON documents).
  3. I don't have enough information to answer here beyond a "what might be easier for a consumer" perspective. It is always easier to deal with a single document than with multiple output documents. However, in the case of JSONL as input, I can also see it being reasonable to output as many documents as were accepted as input and have that be "expected".

@spiffcs
Contributor

spiffcs commented Nov 8, 2021

If we're getting into the business of merging .syft.json SBOMs, I have a question regarding our new ID generation process.

Example:

Here is the stereoscope artifact as generated in the grype SBOM:

  {
   "id": "17857980119146230574",
   "name": "github.com/anchore/stereoscope",
   "version": "v0.0.0-20211024152658-003132a67c10",
   "type": "go-module",
   "foundBy": "",
   "locations": [
    {
     "path": "/grype"
    }
   ],
   "licenses": [],
   "language": "go",
   "cpes": [
    "cpe:2.3:a:anchore:stereoscope:v0.0.0-20211024152658-003132a67c10:*:*:*:*:*:*:*"
   ],
   "purl": "pkg:golang/github.com/anchore/stereoscope@v0.0.0-20211024152658-003132a67c10",
   "metadataType": "GolangBinMetadata",
   "metadata": {
    "GoCompiledVersion": "go1.16.9",
    "H1Digest": "h1:BmK/CgNlu+X9foWK2ZAIehxzYws760AZSGVNamQZpiw="
   }
  },

Here is the same artifact as generated in the syft SBOM:

  {
   "id": "9509475418339205315",
   "name": "github.com/anchore/stereoscope",
   "version": "v0.0.0-20211024152658-003132a67c10",
   "type": "go-module",
   "foundBy": "",
   "locations": [
    {
     "path": "/syft"
    }
   ],
   "licenses": [],
   "language": "go",
   "cpes": [
    "cpe:2.3:a:anchore:stereoscope:v0.0.0-20211024152658-003132a67c10:*:*:*:*:*:*:*"
   ],
   "purl": "pkg:golang/github.com/anchore/stereoscope@v0.0.0-20211024152658-003132a67c10",
   "metadataType": "GolangBinMetadata",
   "metadata": {
    "GoCompiledVersion": "go1.16.9",
    "H1Digest": "h1:BmK/CgNlu+X9foWK2ZAIehxzYws760AZSGVNamQZpiw="
   }
  },

The IDs are different in this case because the locations field contains different paths for where the artifact was discovered.

In the merged SBOM, do we want to try to account for these being basically identical and merge the locations field under a single artifact, or do we want to keep things totally separate?

This also adds more weight to the mergeByLocations functionality we considered adding in a previous PR, where an Artifact can be discovered at multiple locations, making the merged document much more efficient to consume.

#595

Or... more generally. How could we merge artifacts that have the same ID?
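One possible answer, sketched below: key on the identity-bearing fields (name, version, purl) instead of the location-sensitive id, and union the locations lists. The field names follow the syft JSON shown above, but the choice of identity key is an assumption, not settled behavior:

```python
def merge_artifacts(artifacts):
    """Merge artifacts that differ only in where they were found.

    Artifacts with the same (name, version, purl) are collapsed into one
    entry whose locations list is the union of all observed locations.
    """
    merged = {}
    for a in artifacts:
        key = (a["name"], a["version"], a["purl"])
        if key not in merged:
            merged[key] = {**a, "locations": list(a["locations"])}
        else:
            existing = merged[key]["locations"]
            for loc in a["locations"]:
                if loc not in existing:
                    existing.append(loc)
    return list(merged.values())

# The two stereoscope entries from the comment above, abbreviated
grype_pkg = {"id": "17857980119146230574", "name": "github.com/anchore/stereoscope",
             "version": "v0.0.0-20211024152658-003132a67c10",
             "purl": "pkg:golang/github.com/anchore/stereoscope@v0.0.0-20211024152658-003132a67c10",
             "locations": [{"path": "/grype"}]}
syft_pkg = {**grype_pkg, "id": "9509475418339205315", "locations": [{"path": "/syft"}]}
result = merge_artifacts([grype_pkg, syft_pkg])
```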

@spiffcs
Contributor

spiffcs commented Nov 8, 2021

Also just validated: removing the meta top-level fields from *.syft.json still produces a usable JSON structure.

grype only needs:

{
  "artifacts": []
}

Valid input to grype: grype test.syft.json

// test.syft.json
{
 "artifacts": [
  {
   "id": "11941904915510831158",
   "name": "github.com/docker/cli",
   "version": "v0.0.0-20191017083524-a8ff7f821017",
   "type": "go-module",
   "foundBy": "",
   "locations": [
    {
     "path": "/syft",
     "layerID": "sha256:f935f03ffddb44eccea2457b11b6ffc04a336a76a57648dc78518b79b1c523b4"
    }
   ],
   "licenses": [],
   "language": "go",
   "cpes": [
    "cpe:2.3:a:docker:cli:v0.0.0-20191017083524-a8ff7f821017:*:*:*:*:*:*:*"
   ],
   "purl": "pkg:golang/github.com/docker/cli@v0.0.0-20191017083524-a8ff7f821017",
   "metadataType": "GolangBinMetadata",
   "metadata": {
    "GoCompiledVersion": "go1.16.9",
    "H1Digest": "h1:2HQmlpI3yI9deH18Q6xiSOIjXD4sLI55Y/gfpa8/558="
   }
  },
  {
   "id": "3512196908125928147",
   "name": "github.com/docker/distribution",
   "version": "v2.7.1+incompatible",
   "type": "go-module",
   "foundBy": "",
   "locations": [
    {
     "path": "/syft",
     "layerID": "sha256:f935f03ffddb44eccea2457b11b6ffc04a336a76a57648dc78518b79b1c523b4"
    }
   ],
   "licenses": [],
   "language": "go",
   "cpes": [
    "cpe:2.3:a:docker:distribution:v2.7.1+incompatible:*:*:*:*:*:*:*"
   ],
   "purl": "pkg:golang/github.com/docker/distribution@v2.7.1+incompatible",
   "metadataType": "GolangBinMetadata",
   "metadata": {
    "GoCompiledVersion": "go1.16.9",
    "H1Digest": "h1:a5mlkVzth6W5A4fOsS3D2EO5BUmsJpcB+cRlLU7cSug="
   }
  },
  {
   "id": "4268275803743278876",
   "name": "github.com/docker/docker",
   "version": "v17.12.0-ce-rc1.0.20200309214505-aa6a9891b09c+incompatible",
   "type": "go-module",
   "foundBy": "",
   "locations": [
    {
     "path": "/syft",
     "layerID": "sha256:f935f03ffddb44eccea2457b11b6ffc04a336a76a57648dc78518b79b1c523b4"
    }
   ],
   "licenses": [],
   "language": "go",
   "cpes": [
    "cpe:2.3:a:docker:docker:v17.12.0-ce-rc1.0.20200309214505-aa6a9891b09c+incompatible:*:*:*:*:*:*:*"
   ],
   "purl": "pkg:golang/github.com/docker/docker@v17.12.0-ce-rc1.0.20200309214505-aa6a9891b09c+incompatible",
   "metadataType": "GolangBinMetadata",
   "metadata": {
    "GoCompiledVersion": "go1.16.9",
    "H1Digest": "h1:G2hY8RD7jB9QaSmcb8mYEIg8QbEvVAB7se8+lXHZHfg="
   }
  }
 ]
}

I'll follow up here with a PR so the JSON presenter can parse this as well. Currently it only works for table output.
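The stripping step described above takes only a couple of lines; a sketch (the wrapper field names mirror the syft JSON shown here, and the input dict is illustrative):

```python
def strip_to_artifacts(sbom: dict) -> dict:
    """Reduce a full syft JSON document to the minimal shape grype reads."""
    return {"artifacts": sbom.get("artifacts", [])}

# An abbreviated, illustrative syft document with the meta top-level fields present
full = {
    "schema": {"version": "1.1.0"},
    "source": {"type": "image"},
    "artifacts": [{"name": "github.com/docker/cli"}],
}
minimal = strip_to_artifacts(full)
```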

@luhring
Contributor

luhring commented Nov 8, 2021

In the merged SBOM, do we want to try to account for these being basically identical and merge the locations field under a single artifact, or do we want to keep things totally separate?

Are we asking this from the perspective of Grype? Does Grype have a need to see a given package only a single time? My two cents would be not to introduce any deduplication — particularly in a first iteration — unless we need to.

@luhring
Contributor

luhring commented Nov 8, 2021

Open question: Should this issue move to anchore/grype?

I know the implementation might involve Syft's decoders, but it seems like a Grype feature that we're talking about from the user's perspective. Is that a correct read on this?

@spiffcs
Contributor

spiffcs commented Nov 9, 2021

Another question I found: is schema a required field for merge, or can/should we update the validator code so that, if schema is not detected, we still select off the existing artifacts field?

https://github.com/anchore/syft/blob/main/internal/formats/syftjson/validator.go

This would make the JSON API a little simpler, since both syft and grype would then need just artifacts: []

We're at 1.1.0, so until we decide to move to 2.0.0, all changes should be backward compatible.
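The relaxed check being suggested might look roughly like this (a sketch in Python for brevity; the actual validator linked above is Go, and its real rules may differ):

```python
def looks_like_syft_json(doc: dict) -> bool:
    """Accept a document when it carries a schema marker OR an artifacts list.

    This relaxes a schema-required check so that stripped-down
    {"artifacts": [...]} documents still validate.
    """
    schema = doc.get("schema")
    if isinstance(schema, dict) and "version" in schema:
        return True
    return isinstance(doc.get("artifacts"), list)
```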

@hectorj2f
Contributor Author

I know the implementation might involve Syft's decoders, but it seems like a Grype feature that we're talking about from the user's perspective. Is that a correct read on this?

In my opinion, it wouldn't be bad to have a CLI command to merge or append multiple SBOM files into one.

@Dentrax

Dentrax commented Mar 28, 2022

We (@developer-guy) have been trying to figure out how we can merge multiple SBOM results into a single one, and finally ended up here.

We're generating two SBOMs by issuing the following commands:

$ syft packages dir:"$SCAN_DIRECTORY" -o cyclonedx-json > sbom-results-build-time.json
$ syft packages docker:"$DOCKER_IMAGE" -o cyclonedx-json > sbom-results-container-image.json

We found the following workaround:

      jq -s 'def deepmerge(a;b):
      reduce b[] as $item (a;
        reduce ($item | keys_unsorted[]) as $key (.;
          $item[$key] as $val | ($val | type) as $type | .[$key] = if ($type == "object") then
            deepmerge({}; [if .[$key] == null then {} else .[$key] end, $val])
          elif ($type == "array") then
            (.[$key] + $val | unique)
          else
            $val
          end)
        );
      deepmerge({}; .)' sbom-results-build-time.json sbom-results-container-image.json > sbom-results-merged.json

Ref: https://e.printstacktrace.blog/merging-json-files-recursively-in-the-command-line

It just works!

It would be nice to have a merge subcommand, for example:

$ syft merge sbom-results-build-time.json sbom-results-container-image.json -o sbom-results-merged.json

We assumed here that both SBOMs are the same type. In case one of them is different, we should probably convert it first: #563

cc @luhring

@spiffcs
Contributor

spiffcs commented Mar 30, 2022

@Dentrax thanks so much for the follow-up here with the CLI version of merging SBOMs.

I like the suggestion of syft taking in two files and doing the merge itself.

Is there another interaction we want to explore here, where syft generates the SBOMs from the artifacts and merges them in the same command?

syft --merge -o json build-time container-image > sbom-results-merged.json

@spiffcs spiffcs added this to OSS May 31, 2022
@spiffcs spiffcs moved this to Triage (Comments or Progress Made) in OSS May 31, 2022
@Dentrax

Dentrax commented Jun 16, 2022

How should we proceed here? There is another way to handle this: accept multiple args in the packages subcommand instead of introducing a new merge command:

$ syft packages dir:"$SCAN_DIRECTORY" docker:"$DOCKER_IMAGE" ... -o cyclonedx-json

A merge subcommand would also be nice to have:

$ syft merge foo.json bar.json baz.json ... -o json > merged.json

@Dentrax

Dentrax commented Jun 22, 2022

Kind ping 🤞 @luhring @spiffcs

@wagoodman
Contributor

wagoodman commented Jun 22, 2022

...pong!

[is] there another way to handle this by accepting multiple args in the packages subcommand instead of introducing a new merge command?

This is something we're interested in doing (see #562).

The issue isn't really how to scan multiple targets at a time, or how to specify that on the CLI; this is the easy part. The real problem is: how should an SBOM describe multiple sources? Today we have a singular source block that describes what was cataloged, such that all other parts of the SBOM can be assumed to have been found within what's described in the source block.

If we scan multiple targets then the SBOM will need to additionally convey:
a. all sources scanned (maybe change source to a list?)
b. which packages/files belong to which sources

Item b can be done in a few ways:

  1. keep all artifacts (packages, files, etc.) from all sources in a single container, and add individual package-to-source relationships to track which artifact came from which source

Pros:

  • artifacts and sources are highly decoupled

Cons:

  • this will add a considerable number of relations to even "simple" SBOMs
  • consumers wanting to ask questions about packages from source X would need to sift through all of the relationship data to answer even simple questions, which seems really complicated (and worse, that complication is pushed onto the consumer... ideally syft would handle this if possible)
  2. partition artifacts based on which source they are from.

Pros:

  • scales better than relationships

Cons:

  • cannot easily ask questions about artifacts across source boundaries. I feel this is a common use case for merged SBOMs... or to put it another way, if we are separating SBOMs out logically by which document they reside in, then what's the difference between this option and tarring up multiple documents?
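To make the relationship-based option (1) concrete, a merged document might carry a sources list plus package-to-source relationships; a hypothetical sketch of the shape and a lookup helper (every field name and value here is invented for illustration, not a proposed schema):

```python
# Hypothetical merged-SBOM shape: source becomes a list, and relationships
# record which source each artifact was found in.
merged = {
    "sources": [
        {"id": "src-1", "type": "directory", "target": "/build"},
        {"id": "src-2", "type": "image", "target": "app:latest"},
    ],
    "artifacts": [
        {"id": "pkg-a", "name": "libfoo"},
        {"id": "pkg-b", "name": "libbar"},
    ],
    "relationships": [
        {"parent": "src-1", "child": "pkg-a", "type": "contains"},
        {"parent": "src-2", "child": "pkg-a", "type": "contains"},
        {"parent": "src-2", "child": "pkg-b", "type": "contains"},
    ],
}

def packages_for_source(doc: dict, source_id: str):
    """Answer 'which packages came from source X' by walking relationships,
    illustrating the per-query sifting cost described in the cons above."""
    child_ids = {r["child"] for r in doc["relationships"]
                 if r["parent"] == source_id and r["type"] == "contains"}
    return [a for a in doc["artifacts"] if a["id"] in child_ids]

pkgs = packages_for_source(merged, "src-2")
```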

Open questions:

  • how do we support SPDX/CycloneDX in these cases? Potentially output multiple files (which I feel would defeat the point)?

Previous questions that have been answered by this point:

  • at one point, when addressing this for OS packages that depend on distro information, there wasn't a path to scan multiple targets and still retain the correct distro info for the set of packages it would affect during vulnerability matching in grype (e.g. the distro version, and which namespace to look up for RHEL packages). We've since solved this by including that extra information in the pURL, and grype extracts it from pURLs while doing vulnerability matching.
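For reference, that distro/arch information lives in pURL qualifiers (the query component of the package URL); a rough sketch of extracting it, with an illustrative apk pURL:

```python
from urllib.parse import parse_qs, urlparse

def purl_qualifiers(purl: str) -> dict:
    """Extract qualifier key/value pairs from a package URL.

    A purl has the shape pkg:type/namespace/name@version?qualifiers#subpath,
    so the qualifiers sit in the query component.
    """
    return {k: v[0] for k, v in parse_qs(urlparse(purl).query).items()}

# Illustrative apk purl carrying arch/distro qualifiers
purl = "pkg:apk/alpine/musl@1.2.2-r3?arch=x86_64&distro=alpine-3.14.2"
quals = purl_qualifiers(purl)
```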

I think we need to answer how the format will address these problems / use cases before diving into how the command will look / feel.

@kzantow
Contributor

kzantow commented Aug 30, 2022

FYI -- we are actively discussing this and for anyone interested it would be a great topic for the next community meeting, which will be Sep. 1 at 12 noon ET: https://calendar.google.com/calendar/u/0/r?cid=Y182OTM4dGt0MjRtajI0NnNzOThiaGtnM29qNEBncm91cC5jYWxlbmRhci5nb29nbGUuY29t

@Dentrax

Dentrax commented Sep 30, 2022

Kind ping 🤞

@kzantow
Contributor

kzantow commented Sep 30, 2022

@Dentrax we are actively working on this; it's a complicated change and will need support implemented in Grype, too, before we get everything merged 👍

@kzantow kzantow assigned kzantow and unassigned spiffcs Nov 8, 2022
@kzantow kzantow moved this from Parking Lot (Comments or Progress) to In Progress (Actively Resolving) in OSS Nov 8, 2022
@kzantow
Contributor

kzantow commented Dec 6, 2022

An update here: we are currently putting this on hold, as a number of discussions have highlighted what makes this particularly challenging to implement properly. We plan to revisit this in the future when some of the roadblocks are lifted.

@kzantow kzantow moved this from In Progress (Actively Resolving) to Backlog (Pulled Forward for Priority) in OSS Dec 6, 2022
@kzantow kzantow removed their assignment Dec 6, 2022
@kzantow kzantow added the multiple-sources Issues that are dependent on supporting multiple sources label Dec 6, 2022
@spiffcs spiffcs moved this from Backlog (Pulled Forward for Priority) to Parking Lot (Comments or Progress) in OSS Dec 13, 2022
@tgerla tgerla removed the status in OSS Jan 31, 2023
@tgerla tgerla moved this to Backlog in OSS Feb 2, 2023
@tgerla tgerla removed the status in OSS Feb 2, 2023
@spiffcs spiffcs added the blocked Progress is being stopped by something label Feb 9, 2023
@spiffcs
Contributor

spiffcs commented Feb 9, 2023

Just a small update on this issue: we've marked this as blocked, since there is not a clear way forward on the representation of multiple SBOMs across all the formats syft supports.

There is currently some work being done on representing SBOM connections via relationships that are surfaced when running the SBOM cataloger. There is a clear way forward to represent this internally with the syft SBOM format, but larger questions remain about how to do this in the different standard formats, SPDX and CycloneDX.

If the SBOMs all have the same source node, there is a straightforward path for all three formats, given that we can associate all packages with the same source node via the correct relationships.

The path forward via the SBOM cataloger currently combines all packages with NO source relationships, which we believe is not the desired end state of how the data should be represented, nor what is being asked for in this thread.

I'm dropping this into the backlog of our team's board while we work through the details surrounding the graph representation of multiple sources across multiple SBOMs.
