Supply "depth" information when including relationships #3010
Related to #572. We want to be able to describe the topology and limitations of any dependency graph that an SBOM is producing. This isn't based on the SBOM as a whole, or on a language or packaging ecosystem, but on one package at a time, based on the evidence we found and what we know about the kind of files that make up that evidence (e.g. package.json and package-lock.json provide different answers here, and the answers differ again when there is a populated node_modules dir from a previously run npm install command). I feel that on a per-package basis we're looking for the following description:
So how should we start expressing these topologies? I have an early/incomplete thought about a new field onto the
One question that comes to mind: what about cases where we can partition nodes into direct/indirect dependencies but it is still a flat list (like go.mod)? We can only say While I'm not sold on the specifics of the field, I think I'm becoming more convinced that describing the node and edge qualities separately is more valuable than attempting to combine them into a single enum field. Another consideration is that some nodes in the graph cross ecosystems, joining the dependency graph of one ecosystem to the dependency graph of another. One example of this is binary packages: these may relate to any number of other ecosystems based on file ownership overlap and dynamic imports (and soon dlopen descriptions) from that binary. So it may not be as simple as having an ecosystem cataloger make a claim on a package about its node/edge/capability conclusion... this may additionally be a post-cataloging analysis that further annotates these qualities based on the final graph captured. Thoughts to be continued in another post soon... |
From a discussion with the team on this one, we nudged this in a different direction. The conclusive point of discussion was: when asking a single package node for information about dependencies, it shouldn't attempt to answer anything outside its immediate dependencies. That is, asking a node to describe the graph isn't really correct. We should instead limit the answer to only the immediate part of the graph that the node is privy to. This somewhat eliminates the need to describe edges in such depth. The current suggestion from the team is to have a single
Furthermore, to open back up a conversation from #572, we should be qualifying edges that are known direct dependencies vs are known transitive (indirect) dependencies. In the common case of direct dependencies, using the edit: see the final names used in the PR description #3402 (comment) |
I'm not sure why I hadn't looked this up before, but I should also note the related SPDX 3 field: https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Vocabularies/RelationshipCompleteness/. This is defined on a one-to-many relationship element and isn't exactly the same thing as we were talking about but is very closely related, I think. |
I've been thinking about this a bit and discussing it with some folks offline, and I don't think we can get one-word enum names to carry all the info. The proposal from the weekend discussion is:
What do you all think of these field names @wagoodman @kzantow ? |
This is proposing a little more than a rename -- this is introducing new states. I'm trying to understand how a user would use the information of On your other point, single word enums, yeah I get the same sense that single words won't quite cut it. Ideally there would be a way to indicate with a name that
Maybe leave |
I disagree. It's not much more typing, and it makes the distinction that other types might mix direct and indirect dependencies more obvious.
Is there a compelling reason not to represent the complete state space of direct-only/mixed X complete/incomplete? We don't need to know what exactly downstream users want with each thing. We should just say true things like, "this dependency list is incomplete and mixes direct and indirect dependencies." It seems like the only reason for excluding it is that we don't have any examples of where we'd say it yet (though I think python |
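The state space described in this comment (direct-only/mixed crossed with complete/incomplete) could be sketched roughly as two independent fields. This is only an illustration: the type names, constant names, and string values below are hypothetical and are not the names Syft uses (see the PR referenced earlier in the thread for those).

```go
package main

// DependencyScope is a hypothetical type for illustration only; it says
// whether a package's dependency list holds only direct dependencies or
// mixes direct and indirect ones.
type DependencyScope string

const (
	ScopeDirectOnly DependencyScope = "direct-only"
	ScopeMixed      DependencyScope = "mixed"
)

// DependencyCompleteness is likewise hypothetical; it says whether the
// list is known to be complete.
type DependencyCompleteness string

const (
	Complete   DependencyCompleteness = "complete"
	Incomplete DependencyCompleteness = "incomplete"
)

// DependencyListing pairs the two qualities so that all four
// combinations can be represented, including ones with no known
// real-world example yet.
type DependencyListing struct {
	Scope        DependencyScope
	Completeness DependencyCompleteness
}

// Describe renders the listing as a plain statement of fact.
func (d DependencyListing) Describe() string {
	scope := "contains only direct dependencies"
	if d.Scope == ScopeMixed {
		scope = "mixes direct and indirect dependencies"
	}
	return "this dependency list is " + string(d.Completeness) + " and " + scope
}
```

Keeping the two axes as separate fields avoids having to invent a single-word enum name for each combination, which is the difficulty raised earlier in the thread.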
@wagoodman and I talked offline, and we think these values will work:
What would you like to be added:
Relationship depth information, when Syft is unable to provide a full transitive dependency graph.
Why is this needed:
One of the data elements mentioned in the NTIA minimum requirements is the depth of relationships. If Syft is able to build an accurate SBOM with a full transitive-dependency graph, that would be ideal, but different scenarios prevent this information from being included or from accurately depicting the transitive graph. Some examples are Python requirements.txt and Go binary mod information, which only provide a flat list of dependencies. Another example is binaries that are identified directly, without information about their dependent components.

One solution is to provide an "unknown" indicator stating that Syft was unable to determine a full transitive dependency graph, or that Syft stopped after resolving online parent references five levels deep. These can be returned as "unknowns" from catalogers where appropriate, to be associated with the file(s) where the package graph information originated.
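As a rough sketch of the "stopped after five levels" case described above: the `Unknown` type, the `resolveParents` function, and the in-memory `parents` map (standing in for online lookups) are all hypothetical and do not reflect Syft's actual API. The idea is only that resolution stops at a fixed depth and, when more parents remain, reports an unknown tied to the originating file.

```go
package main

import "fmt"

// maxParentDepth mirrors the five-level limit mentioned in the issue.
const maxParentDepth = 5

// Unknown is a hypothetical record tying a reason to the file where the
// package graph information originated.
type Unknown struct {
	Path   string
	Reason string
}

// resolveParents follows parent references from start for at most
// maxParentDepth levels. The parents map is a stand-in for whatever
// lookup would really be used. If more parents remain after the limit
// is reached, an Unknown is returned alongside the partial chain.
func resolveParents(start string, parents map[string]string, originPath string) ([]string, []Unknown) {
	var chain []string
	current := start
	for depth := 0; depth < maxParentDepth; depth++ {
		parent, ok := parents[current]
		if !ok {
			return chain, nil // reached the root: the chain is complete
		}
		chain = append(chain, parent)
		current = parent
	}
	if _, more := parents[current]; more {
		reason := fmt.Sprintf("stopped resolving parent references after %d levels", maxParentDepth)
		return chain, []Unknown{{Path: originPath, Reason: reason}}
	}
	return chain, nil
}
```

A caller would attach any returned unknowns to the file at `originPath`, so a consumer of the SBOM can tell that the graph for that package may be incomplete.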
Additional context:
This is likely to be dependent on the PR for known unknowns getting merged.
This is a part of #632