Proposal: Read/calculate and store uncompressed size of compressed datasets #19280

Open
natefoo opened this issue Dec 6, 2024 · 0 comments

natefoo commented Dec 6, 2024

A lot of TPV rules are based on input size, but input size is often a lie due to compression. This is likely a significant cause of over- or underallocation of memory for jobs. I opened an issue over on the TPV repo, but I suspect this should be done internally in Galaxy, especially if we don't want TPV to expect data to exist on disk.

As noted in that issue, the uncompressed size of gzipped data can be stored in the last 4 bytes of the file, but that field is sometimes empty, in which case the only way to get the size is to decompress the data.
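
For illustration, here is a rough sketch of both approaches in Python. The function names are hypothetical and this is not existing Galaxy or TPV code. Per the gzip format, the 4-byte trailer (ISIZE) is little-endian and holds the uncompressed size modulo 2**32, and it only covers the last member of the file, so it will be wrong for data over 4 GiB and for multi-member files such as concatenated gzip streams. The streaming fallback is always correct but has to read the whole file.

```python
import gzip
import os
import struct


def read_gzip_isize(path):
    """Return the ISIZE trailer field of a gzip file.

    This is the uncompressed size modulo 2**32 of the LAST gzip member,
    so treat it as a cheap estimate rather than a guaranteed size.
    """
    with open(path, "rb") as fh:
        # The trailer is the final 4 bytes; raises OSError on files
        # shorter than 4 bytes.
        fh.seek(-4, os.SEEK_END)
        return struct.unpack("<I", fh.read(4))[0]


def count_uncompressed_bytes(path, chunk_size=1 << 20):
    """Stream-decompress the file and count the bytes produced.

    Slow for large inputs, but handles multi-member files and sizes
    over 4 GiB without holding the data in memory.
    """
    total = 0
    with gzip.open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            total += len(chunk)
    return total
```

One possible policy would be to trust `read_gzip_isize` only when it is nonzero and plausible relative to the on-disk size, and fall back to `count_uncompressed_bytes` otherwise, storing the result so it is only computed once per dataset.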
