Support generating sbom for specific pnpm workspace packages #2574

Open
mc-alt opened this issue Feb 1, 2024 · 4 comments
Labels
ecosystem:javascript relating to the javascript ecosystem enhancement New feature or request needs-investigation

Comments

@mc-alt

mc-alt commented Feb 1, 2024

What would you like to be added:

(Hi, loving the tool, thanks for all your efforts)

We have a pnpm monorepo using the workspaces functionality.

I am able to prepare an sbom at the top level, but I need to be able to limit the sbom contents to only those materials that go into a specific sub package (this would include materials for the other workspace packages that get included in this package, but not anything belonging to packages that don't get included).

Why is this needed:

I am not currently able to generate an accurate sbom for the different sub packages in my project

Additional context:

# pnpm-workspace.yaml

packages:
  - "packages/*"

Folder structure:

project-root/
|-- node_modules/
|-- package.json
|-- pnpm-lock.yaml
|-- pnpm-workspace.yaml
|-- packages/
    |-- shared-sub-package/
    |   |-- package.json
    |   |-- node_modules/
    |
    |-- sub-package-1/
    |   |-- package.json
    |   |-- node_modules/
    |
    |-- sub-package-2/
    |   |-- package.json
    |   |-- node_modules/

I would like to be able to create separate sboms for sub-package-1 and sub-package-2

Sorry if there is already a way to do this, or some post-processing of the top level sbom I could be doing - I haven't been able to find anything

@mc-alt mc-alt added the enhancement New feature or request label Feb 1, 2024
@kzantow
Contributor

kzantow commented Feb 21, 2024

Hi, @mc-alt, there are a few things to mention here, so let me start by suggesting a few options with what is available today: are you able to scan the subdirectories directly? If you wanted separate SBOMs, I'd think just scanning like syft project-root/packages/sub-package-1 could do the trick. If you tried this, I suspect the challenge you ran into is that, since this is a directory scan, it doesn't pick up any package.json information by default; you'd need to enable the javascript-package-cataloger (by using the flag --select-catalogers +javascript-package-cataloger). This isn't perfect, though, as Syft won't resolve transitive dependencies; it only reads what it finds on the filesystem. This is one reason Syft prefers lock files, but what I understand about this setup is that the only lock file is the top-level pnpm-lock.yaml, so it wouldn't be read when scanning a subdirectory, and Syft wouldn't necessarily know how to determine which packages to exclude anyway. If, however, you had all the appropriate dependencies installed in node_modules, these would show up as you expect using the javascript-package-cataloger, with another caveat that it will also include build-time dependencies that are downloaded into node_modules. I don't claim to have a lot of familiarity with PNPM; does this option get you close to having something usable?
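
A minimal sketch of the direct subdirectory scan suggested above, run once per workspace package. The syft invocation and the --select-catalogers value come from this comment; the cyclonedx-json output format, the output file names, and the helper names syft_command and scan_sub_packages are assumptions made for illustration.

```python
import subprocess
from pathlib import Path
from typing import List


def syft_command(package_dir: str, output_file: str) -> List[str]:
    """Build the syft command line for one sub-package directory."""
    return [
        "syft", package_dir,
        # Enable the cataloger that reads package.json files in a directory scan.
        "--select-catalogers", "+javascript-package-cataloger",
        # Output format and path are illustrative choices, not from the thread.
        "-o", f"cyclonedx-json={output_file}",
    ]


def scan_sub_packages(packages_root: str, out_dir: str) -> None:
    """Run one scan per directory under packages_root that has a package.json."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for pkg in sorted(Path(packages_root).iterdir()):
        if (pkg / "package.json").is_file():
            out_file = str(Path(out_dir) / f"{pkg.name}.cdx.json")
            subprocess.run(syft_command(str(pkg), out_file), check=True)


if __name__ == "__main__":
    scan_sub_packages("project-root/packages", "sboms")
```

The caveats above still apply: the result only reflects what is present on disk under each subdirectory, including build-time dependencies in node_modules.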

I could definitely see some sort of enhancements we could implement, namely looking outside the requested directory to attempt to find an additional pnpm-lock.yaml, node_modules, or other pertinent files. But we haven't done a lot of this, and it's a little unclear to me whether this should be the default behavior. In other words: if I scan a directory, did I mean to treat it as a directory or as part of a larger workspace? Another option to explore is to add some sort of --workspace or similar flag that can be used by catalogers that have knowledge of workspaces. PNPM certainly isn't the only one that does something like this, and perhaps we can find some commonality across different package managers. The last thing I'd note is that we've also done some investigation into shelling out to tools (such as mvn dependency:tree or a similar pnpm call), but would very much like Syft to avoid doing this as much as possible.
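
To make the "look outside the requested directory" idea concrete, here is a small sketch of an upward search for a lock file. It is not Syft code; the function name find_upwards and the stop_at parameter are hypothetical, and the explicit boundary is one possible answer to the "directory or workspace?" question raised above.

```python
from pathlib import Path
from typing import Optional


def find_upwards(start: Path, filename: str,
                 stop_at: Optional[Path] = None) -> Optional[Path]:
    """Return the nearest `filename` in `start` or one of its ancestors.

    The search stops after checking `stop_at` (if given) or the filesystem root.
    """
    current = start.resolve()
    limit = stop_at.resolve() if stop_at is not None else None
    while True:
        candidate = current / filename
        if candidate.is_file():
            return candidate
        # Stop at the caller-supplied boundary or at the filesystem root.
        if current == limit or current.parent == current:
            return None
        current = current.parent


if __name__ == "__main__":
    target = Path("project-root/packages/sub-package-1")
    print(find_upwards(target, "pnpm-lock.yaml", stop_at=Path("project-root")))
```

Passing stop_at keeps the search from wandering into unrelated parent directories, which is the main risk of making this behavior the default.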

That said, would you be able to provide some public repo(s) with a similar setup that we could have a look at?

@mc-alt
Author

mc-alt commented Mar 4, 2024

Thanks for the response

I ended up creating a script that

  • parses the selected package
  • walks its dependencies (including packages from the workspace, and their dependencies etc.)
  • creates a new temp package which represents those dependencies, but flattened
  • copies in the pnpm lock file from the top level of the monorepo
  • performs a pnpm install (which is quite fast because it's just setting up links, and the copy of the lock file makes it use the already established versions)
  • generates the SBOM

Hacky script here:
https://gist.github.com/mc-alt/b0c27dd7621b3ea2f984b43a619877c2

This seems to work for us

Note: this would not be performant if not for the way pnpm's cache and linking approach works
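
The dependency walk and flattening steps listed above could look roughly like the sketch below. This is not the linked gist; it is an illustration under stated assumptions: manifests is a hypothetical mapping from workspace package name to its parsed package.json, only the "dependencies" field is followed, and a dependency counts as a workspace package when its name appears in that mapping.

```python
from typing import Dict, Set, Tuple

Manifest = Dict[str, Dict[str, str]]


def flatten_dependencies(
    target: str, manifests: Dict[str, Manifest]
) -> Tuple[Set[str], Dict[str, str]]:
    """Walk `target` and every workspace package it reaches.

    Returns the set of workspace packages visited and a flattened mapping of
    external dependency name -> version spec.
    """
    visited: Set[str] = set()
    external: Dict[str, str] = {}
    stack = [target]
    while stack:
        name = stack.pop()
        if name in visited:
            continue
        visited.add(name)
        for dep, spec in manifests[name].get("dependencies", {}).items():
            if dep in manifests:
                # Workspace package: follow it instead of recording it.
                stack.append(dep)
            else:
                # First spec seen wins; conflicting ranges are not reconciled.
                external.setdefault(dep, spec)
    return visited, external


if __name__ == "__main__":
    manifests = {
        "sub-package-1": {"dependencies": {"shared-sub-package": "workspace:*",
                                           "left-pad": "^1.3.0"}},
        "sub-package-2": {"dependencies": {"express": "^4.18.2"}},
        "shared-sub-package": {"dependencies": {"lodash": "^4.17.21"}},
    }
    print(flatten_dependencies("sub-package-1", manifests))
```

The flattened mapping would become the "dependencies" of the temporary package.json; the copied pnpm-lock.yaml, pnpm install, and SBOM generation steps are left out here.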

Unfortunately I've been pulled onto other things, but I will try to find time to prepare a public example repository for a setup like ours

@tgerla tgerla added the ecosystem:javascript relating to the javascript ecosystem label May 9, 2024
@wagoodman
Contributor

@mc-alt glad you got this figured out with your script!

I think the interesting thing to take out of this is that there may be something missing in the syft ecosystem in terms of "scanning 1 thing and generating N many SBOMs", which is outside of the scope of syft, but may be hinting at a separate tool that wraps syft. This is similar to (but not the same as) #562 .

The new use case highlighted here is "what is the prescription for using syft in a monorepo setting?". This probably warrants some discussion.

@kzantow
Contributor

kzantow commented Aug 15, 2024

Another example where a user might want to perform a similar scan is a Maven multi-module project, where a subdirectory contains something like a deployable web application and the user wants to include parent and sibling directories to properly resolve modules and parent poms with relative paths.

This seems to boil down to separating the set of files included in the source from the target directory to catalog. Today, for example, a user running a directory scan uses syft my/dir, and Syft indexes and scans everything within that directory only. If there were a way to specify a different directory to scan while retaining the larger set of files for context, it could be possible to accomplish what's asked for here, with some work in the catalogers to follow relative links. For example: syft /some/root/path --only-catalog sub/dir or syft /some/root/path/sub/dir --root /some/root/path to select a subset of files for the cataloging functions when an alternate root is provided.
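
The separation described above can be sketched as two scopes over one file index: a narrow set of files to catalog and a wider set available for resolving relative references. The code is purely illustrative and does not call Syft; --only-catalog and --root are proposals from this comment, not existing flags, and the function names files_to_catalog and resolve_relative are hypothetical.

```python
import posixpath
from typing import Optional, Set


def files_to_catalog(indexed: Set[str], only_catalog: str) -> Set[str]:
    """Select the indexed paths that fall under the catalog subdirectory."""
    prefix = only_catalog.rstrip("/") + "/"
    return {p for p in indexed if p.startswith(prefix)}


def resolve_relative(from_file: str, rel: str, indexed: Set[str]) -> Optional[str]:
    """Resolve `rel` against the directory of `from_file` using the full index.

    Returns None when the target is not indexed, which includes any path that
    escapes the root.
    """
    joined = posixpath.join(posixpath.dirname(from_file), rel)
    target = posixpath.normpath(joined)
    return target if target in indexed else None


if __name__ == "__main__":
    # Paths are relative to the wider root, e.g. /some/root/path.
    indexed = {"pom.xml", "web/pom.xml", "lib/pom.xml"}
    print(files_to_catalog(indexed, "web"))
    print(resolve_relative("web/pom.xml", "../pom.xml", indexed))
```

With this split, a cataloger would only report packages found in the narrow set but could still follow a relative parent pom or a top-level lock file through the wider index.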

It seems there may be a path forward for this, but certainly more investigation is needed.

Projects
Status: Backlog

5 participants