Extract VM images #16

Open
pombredanne opened this issue Feb 11, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@pombredanne
Member

We should be able to extract VMDK, VDI, QCOW and similar VM disk images, as well as ext2, ext3 and ext4 filesystem images (and ideally some SquashFS too?).
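Before any of these can be extracted, they have to be recognized. Below is a minimal sketch of sniffing these formats from their leading bytes. `sniff_image_format` is a hypothetical helper, not part of extractcode. The magic values are the published ones: QCOW `QFI\xfb`, VMDK sparse extent `KDMV` or a text descriptor, VDI signature `0xbeda107f` at offset 64, SquashFS `hsqs`, and the ext superblock magic `0xEF53` at byte 1080. It cannot tell ext2, ext3 and ext4 apart, since that would require reading the feature flags.

```python
from typing import Optional


def sniff_image_format(header: bytes) -> Optional[str]:
    """Guess a VM disk or filesystem image type from its first bytes.

    Pass at least 1082 bytes so the ext superblock magic can be checked.
    Returns None when no known signature matches.
    """
    if header.startswith(b"QFI\xfb"):
        return "qcow"          # QEMU copy-on-write (qcow/qcow2)
    if header.startswith(b"KDMV"):
        return "vmdk"          # VMware sparse extent header
    if header.startswith(b"# Disk DescriptorFile"):
        return "vmdk"          # VMware text descriptor file
    if header[64:68] == b"\x7f\x10\xda\xbe":
        return "vdi"           # VirtualBox signature 0xbeda107f, little-endian
    if header.startswith(b"hsqs"):
        return "squashfs"      # SquashFS magic 0x73717368, little-endian
    if header[1080:1082] == b"\x53\xef":
        return "ext2/3/4"      # superblock at 1024, s_magic (0xEF53) at +0x38
    return None
```

In practice you would read the first few KiB of a file (`f.read(4096)`) and pass them in.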

@pombredanne pombredanne added the enhancement New feature or request label Feb 11, 2021
@pombredanne
Member Author

Note that one immediate application would be in the scancode.io rootfs pipeline.

@pombredanne
Member Author

I tried a few tools, and one by @rwmjones shines brightly: https://libguestfs.org/
It works beautifully with tar-out, which exports a guest filesystem as a tar archive.
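As a rough illustration, here is a sketch of driving libguestfs's `guestfish` shell from Python to export an image's root filesystem with `tar-out`. `--ro` opens the image read-only, and `-i` asks libguestfs to inspect the image and mount the guest's filesystems. The code assumes the image contains an OS that inspection can recognize; a bare filesystem image would need explicit mounts instead. The helper names are hypothetical.

```python
import subprocess
from typing import List


def guestfish_tar_out_argv(image: str, tarball: str, directory: str = "/") -> List[str]:
    """Build a guestfish command that exports `directory` of a VM image to `tarball`."""
    # --ro: never write to the image; -a: add the disk image;
    # -i: inspect the image and mount the guest filesystems (assumes a detectable OS).
    return ["guestfish", "--ro", "-a", image, "-i", "tar-out", directory, tarball]


def export_to_tarball(image: str, tarball: str) -> None:
    """Run guestfish; requires libguestfs to be installed. Raises CalledProcessError on failure."""
    subprocess.run(guestfish_tar_out_argv(image, tarball), check=True)
```

For example, `export_to_tarball("disk.vmdk", "rootfs.tar")` would produce a tarball that an ordinary archive extractor can then process.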

pombredanne added a commit that referenced this issue Apr 6, 2021
This is a two-step extraction: libguestfs first exports a filesystem to a tarball,
which extractcode then extracts normally (hence dealing with links, device
files and other permission oddities as a side effect).

We support VDI (VirtualBox), VMDK (VMware) and QCOW2 (QEMU).

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
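To show why going through a tarball helps with "links, device files and other permission oddities", here is a much simplified stand-in for the second step using only Python's `tarfile`. This is not how extractcode does it. `extract_regular_entries` is a hypothetical helper. It writes only directories and regular files, with default permissions. It skips symlinks, hard links, device nodes and FIFOs, and it refuses paths that would escape the target directory. It returns the names it skipped.

```python
import os
import tarfile


def extract_regular_entries(tarball: str, target_dir: str) -> list:
    """Extract only directories and regular files; return names of skipped members."""
    skipped = []
    target_root = os.path.realpath(target_dir)
    with tarfile.open(tarball) as tar:  # mode "r" auto-detects compression
        for member in tar.getmembers():
            dest = os.path.realpath(os.path.join(target_root, member.name))
            # Refuse entries that would land outside target_dir (e.g. "../x").
            if os.path.commonpath([target_root, dest]) != target_root:
                skipped.append(member.name)
                continue
            if member.isdir():
                os.makedirs(dest, exist_ok=True)
            elif member.isfile():
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                src = tar.extractfile(member)
                # Content only: owner, mode and timestamps are not restored.
                with src, open(dest, "wb") as out:
                    out.write(src.read())
            else:
                # Symlinks, hard links, device files and FIFOs are not materialized.
                skipped.append(member.name)
    return skipped
```

Skipping rather than recreating special files means extraction never needs root (`mknod`) and never follows links planted in the image.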
pombredanne added a commit that referenced this issue Apr 22, 2021
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jun 1, 2021
@tdruez

tdruez commented Jun 1, 2021

The --all-formats option is required for the new extraction features but is not documented.

More importantly, all_formats=False was added as an argument to the extract_file function but is not used; see https://github.com/nexB/extractcode/blob/main/src/extractcode/extract.py#L230

Also, why would we want such an option in the first place?

pombredanne added a commit that referenced this issue Jun 2, 2021
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jun 2, 2021
- This is to extract a single archive file of any supported format
  non-recursively.
- Also apply minor formatting and refactoring for readability
- Improve docstrings
- Add tests

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
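To make the "non-recursively" distinction concrete, here is a sketch using Python's standard `shutil` archive support rather than extractcode. Both helper names are hypothetical, as is the `-extract` suffix used for nested output. A non-recursive extraction unpacks just the given archive and leaves any archives inside it packed. A recursive extraction keeps descending into them.

```python
import os
import shutil


def _archive_exts() -> tuple:
    # e.g. ('.zip', '.tar', '.tar.gz', '.tgz', ...), depending on the Python build
    return tuple(ext for _, exts, _ in shutil.get_unpack_formats() for ext in exts)


def extract_single(archive: str, target: str) -> None:
    """Non-recursive: unpack only `archive`; nested archives stay packed."""
    shutil.unpack_archive(archive, target)


def extract_recursive(archive: str, target: str) -> None:
    """Recursive: also unpack every archive found inside, into '<name>-extract'."""
    shutil.unpack_archive(archive, target)
    for root, _dirs, files in os.walk(target):
        for name in files:
            if name.lower().endswith(_archive_exts()):
                nested = os.path.join(root, name)
                extract_recursive(nested, nested + "-extract")
```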
pombredanne added a commit that referenced this issue Jun 2, 2021
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>