Remove hardcoded compression type and add python-snappy package #7

Open
wants to merge 2 commits into
base: main

@shchen-idmod shchen-idmod commented Feb 12, 2025

This PR fixes the following two issues:

  1. The write function fails when the dtk content is too large for LZ4 compression:
    def write(self, output_file: str = "my_sp_file.dtk"):

    Traceback (most recent call last):
      File "C:\Program Files\JetBrains\PyCharm 2023.3.5\plugins\python\helpers\pydev\pydevd.py", line 1534, in _exec
        pydev_imports.execfile(file, globals, locals) # execute the script
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Program Files\JetBrains\PyCharm 2023.3.5\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
        exec(compile(contents+"\n", file, 'exec'), glob, loc)
      File "C:\git_emodhub\emodpy-malaria\josh_post_process\Assets\python\dtk_post_process.py", line 837, in <module>
        application(output_folder="output")
      File "C:\git_emodhub\emodpy-malaria\josh_post_process\Assets\python\dtk_post_process.py", line 829, in application
        modify_serialized_files()
      File "C:\git_emodhub\emodpy-malaria\josh_post_process\Assets\python\dtk_post_process.py", line 552, in modify_serialized_files
        adjust_genomes_to_balance_per_locus(input_file, output_file, node1_AF=node1_AF, node2_AF=node2_AF)
      File "C:\git_emodhub\emodpy-malaria\josh_post_process\Assets\python\dtk_post_process.py", line 803, in adjust_genomes_to_balance_per_locus
        pop.write(output_file)
      File "C:\emodhub-malaria_2.0.2\Lib\site-packages\emod_api\serialization\SerializedPopulation.py", line 86, in write
        self.dtk.compression = dft.LZ4
        ^^^^^^^^^^^^^^^^^^^^
      File "C:\emodhub-malaria_2.0.2\Lib\site-packages\emod_api\serialization\dtkFileTools.py", line 158, in compression
        self.__set_compression__(engine.upper())
      File "C:\emodhub-malaria_2.0.2\Lib\site-packages\emod_api\serialization\dtkFileTools.py", line 226, in __set_compression__
        chunk = compress(self.contents[index], engine)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\emodhub-malaria_2.0.2\Lib\site-packages\emod_api\serialization\dtkFileTools.py", line 35, in compress
        return __engines__[engine].compress(data)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\emodhub-malaria_2.0.2\Lib\site-packages\emod_api\serialization\dtkFileSupport.py", line 28, in compress
        return lz4.block.compress(data if type(data) is bytes else data.encode())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    OverflowError: Input too large for LZ4 API
  2. The required python-snappy library is missing, which shows up when the dtk file is larger than a certain size (reproduced with a SIF file that contains emod-api):
    /etc/singularity/ exists; cleanup by system administrator is not complete (see https://apptainer.org/docs/admin/latest/singularity_migration.html)
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/emod_api/serialization/dtkFileTools.py", line 109, in __getitem__
        contents = self.__parent__.contents[index]
      File "/usr/local/lib/python3.9/site-packages/emod_api/serialization/dtkFileTools.py", line 80, in __getitem__
        data = str(uncompress(self.__parent__.chunks[index], self.__parent__.compression), 'utf-8')
      File "/usr/local/lib/python3.9/site-packages/emod_api/serialization/dtkFileTools.py", line 28, in uncompress
        return __engines__[engine].uncompress(data)
      File "/usr/local/lib/python3.9/site-packages/emod_api/serialization/dtkFileSupport.py", line 47, in uncompress
        raise UserWarning("Snappy [de]compression not available.")
    UserWarning: Snappy [de]compression not available.
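
For the second issue, one way to confirm whether an environment (for example, inside the SIF image) actually has the compression packages is a quick import check. This is only a sketch: it assumes python-snappy is importable as `snappy` and the LZ4 bindings as `lz4`, and the helper name `compression_backend_available` is hypothetical, not part of emod-api.

```python
import importlib.util


def compression_backend_available(module_name: str) -> bool:
    """Return True if the named top-level module can be found in this environment."""
    # find_spec returns None when the module is not installed.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumed module names: "lz4" for the LZ4 bindings, "snappy" for python-snappy.
    for name in ("lz4", "snappy"):
        status = "available" if compression_backend_available(name) else "missing"
        print(f"{name}: {status}")
```

Running this inside the container should show whether `snappy` is missing before a large dtk file triggers the `UserWarning` above.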

I think the compression type switches between lz4 and snappy because of this:
https://github.com/InstituteforDiseaseModeling/DtkTrunk/blob/f8dc7f8417927e3e6543facb124f9a193096e313/Eradication/SerializedPopulation.cpp#L313

With the hardcoded compression type removed, the write function respects the original compression type of the DTK content.
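
To illustrate the idea (this is not the emod-api code), here is a minimal, self-contained sketch. `ToyFile`, `write_hardcoded` and `write_preserving` are hypothetical names, and the standard-library `zlib` and `lzma` modules stand in for LZ4 and Snappy so that it runs without extra packages.

```python
import lzma
import zlib
from dataclasses import dataclass

# Stand-in engines: zlib and lzma play the roles of LZ4 and Snappy here.
ENGINES = {
    "ZLIB": (zlib.compress, zlib.decompress),
    "LZMA": (lzma.compress, lzma.decompress),
}


@dataclass
class ToyFile:
    """Hypothetical stand-in for a serialized-population file."""
    compression: str  # engine name recorded when the file was read
    chunks: list      # compressed payloads (bytes)

    def contents(self):
        """Decompress every chunk with the file's own engine."""
        _, decompress = ENGINES[self.compression]
        return [decompress(chunk) for chunk in self.chunks]


def write_hardcoded(source: ToyFile) -> ToyFile:
    """Old behaviour: always re-compress with one fixed engine."""
    compress, _ = ENGINES["ZLIB"]
    return ToyFile("ZLIB", [compress(data) for data in source.contents()])


def write_preserving(source: ToyFile) -> ToyFile:
    """Proposed behaviour: re-compress with the engine the file already uses."""
    compress, _ = ENGINES[source.compression]
    return ToyFile(source.compression, [compress(data) for data in source.contents()])


if __name__ == "__main__":
    original = ToyFile("LZMA", [lzma.compress(b"population data")])
    print(write_hardcoded(original).compression)
    print(write_preserving(original).compression)
```

The point of the second function is that a file which arrived with one engine is never forced through another engine whose limits it may exceed.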
