Support scanning filesystems without building an index #3145
Comments
It turns out that the unindexed approach is much slower on large filesystems than indexing the filesystem first. This is because using globs like
That being said, there are probably ways forward here. Here are some high-level thoughts:
Assuming that we'll land option 3 in the long term and option 4 in the near term, I think option 4 has the most promise. It really comes down to one question: how can we provide that functionality while keeping it simple, flexible and safe? |
Thanks for your input! 4a seems the most promising start, since the OS catalogers already have parts of their paths to
If I want the 4a feature implemented, what's the best course of action? Should I implement it myself and send a PR, or is someone else interested in working on it?
Btw, what's the use case for searching in every path with |
We discussed this issue on our recent team live stream.
syft scan dir:/var/lib/dpkg/
✔ Indexed file system /var/lib/dpkg
✔ Cataloged contents 7b17bf219a79afea6fe9e2246f855b09a89f50e773a06e231d2eb91ecad88359
├── ✔ Packages [5,259 packages]
└── ✔ Executables [0 executables]
[0000] WARN no explicit name and version provided for directory source, deriving artifact ID from the given path (which is not ideal)
NAME VERSION TYPE
7zip 23.01+dfsg-11 deb
aardvark-dns 1.4.0-5 deb
accountsservice 23.13.9-2ubuntu6 deb
acl 2.3.2-1build1 deb
acpi 1.7-1.3build1 deb
acpid 1:2.0.34-1ubuntu2 deb
adb 1:34.0.4-1build3 deb
adduser 3.137ubuntu1 deb
adwaita-icon-theme 46.0-1 deb
afnix 3.8.0-1 deb
aglfn 1.7+git20191031.4036a9c-2 deb
aha 0.5.1-3build1 deb
aisleriot 1:3.22.31-1build2 deb
algol68g 3.1.2-1 deb
etc.
If you know which distro (and, better, which package manager, and the index location), perhaps simply using
This isn't to say we're not looking at the other options in an earlier post, just wanted to check. |
Well, I would have hoped that syft would do the distro identification for me, and also the package manager identification and the related databases. Otherwise, I end up duplicating the logic that syft already has, which is not ideal.
Maybe there's an issue that
So I thought I'll just scan
So it seems that I need to scan the entire filesystem if I want to get information about the distro. Ideally I would do a |
To expand on option 1 a bit: the suggestion is instead of indexing the entire filesystem, we could potentially only index/catalog a specific subset of paths as described in this comment. Why would we need this? Because today, Syft catalogers look in specific locations for certain things like linux OS distro info, but if a subdirectory is used, this is considered the "root" of the file scan. So, let's say we scan |
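To make the "root" problem concrete, here is a minimal Go sketch. It is not Syft's resolver code: `resolveUnderRoot` is a hypothetical helper, and `/var/lib/dpkg` is just the directory scanned earlier in this thread. It only illustrates that a lookup for a fixed location such as `/etc/os-release` is resolved relative to whatever directory was given as the scan root.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolveUnderRoot is a hypothetical helper (not part of Syft). It maps a
// path requested by a cataloger onto the real filesystem, treating scanRoot
// as if it were "/".
func resolveUnderRoot(scanRoot, requested string) string {
	return filepath.Join(scanRoot, requested)
}

func main() {
	// The fixed location an OS-release lookup would ask for.
	const requested = "/etc/os-release"

	// With "/" as the scan root the lookup reaches the real file. With a
	// subdirectory as the root, it points inside that subdirectory, where
	// no os-release file normally exists.
	for _, root := range []string{"/", "/var/lib/dpkg"} {
		fmt.Println(root, "->", resolveUnderRoot(root, requested))
	}
}
```

This is why scanning only the package database directory yields packages but no distro information: the distro files live outside the scanned root.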
Yes, I've also noticed this problem and I thought the
and get the distro identification because the files in |
How do you feel about exposing a
Then I could do:
to identify the Linux release without building the index of the entire filesystem. |
any updates on this feature request? |
What would you like to be added:
My use case is to scan my host filesystem and get the Linux distro information along with the host packages and their versions. Unfortunately, indexing the entire filesystem takes too much time (~7 minutes to call directorySource.FileResolver from directory_source.go).
I would like to avoid the indexing step and directly read the necessary files required by the os cataloger. I noticed there is an unindexed_directory.go file in the fileresolver package, but that's an internal package and the functions there cannot be used.
Is there a plan to expose the functionality from unindexed_directory.go as a public API? Or is there another way to speed up the scanning process / avoid indexing the entire filesystem beforehand?
Why is this needed:
This would bring a significant performance improvement to the filesystem scanning, especially when the filesystem is very large and thus building the entire index would take too much time.
Additional context: