Remove hardcoded compression type and add python-snappy package #7
This PR fixes the following two issues:
emod-api/emod_api/serialization/SerializedPopulation.py
Line 75 in 99896c1
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2023.3.5\plugins\python\helpers\pydev\pydevd.py", line 1534, in _exec
    pydev_imports.execfile(file, globals, locals) # execute the script
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\JetBrains\PyCharm 2023.3.5\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:\git_emodhub\emodpy-malaria\josh_post_process\Assets\python\dtk_post_process.py", line 837, in <module>
    application(output_folder="output")
  File "C:\git_emodhub\emodpy-malaria\josh_post_process\Assets\python\dtk_post_process.py", line 829, in application
    modify_serialized_files()
  File "C:\git_emodhub\emodpy-malaria\josh_post_process\Assets\python\dtk_post_process.py", line 552, in modify_serialized_files
    adjust_genomes_to_balance_per_locus(input_file, output_file, node1_AF=node1_AF, node2_AF=node2_AF)
  File "C:\git_emodhub\emodpy-malaria\josh_post_process\Assets\python\dtk_post_process.py", line 803, in adjust_genomes_to_balance_per_locus
    pop.write(output_file)
  File "C:\emodhub-malaria_2.0.2\Lib\site-packages\emod_api\serialization\SerializedPopulation.py", line 86, in write
    self.dtk.compression = dft.LZ4
    ^^^^^^^^^^^^^^^^^^^^
  File "C:\emodhub-malaria_2.0.2\Lib\site-packages\emod_api\serialization\dtkFileTools.py", line 158, in compression
    self.__set_compression__(engine.upper())
  File "C:\emodhub-malaria_2.0.2\Lib\site-packages\emod_api\serialization\dtkFileTools.py", line 226, in __set_compression__
    chunk = compress(self.contents[index], engine)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\emodhub-malaria_2.0.2\Lib\site-packages\emod_api\serialization\dtkFileTools.py", line 35, in compress
    return __engines__[engine].compress(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\emodhub-malaria_2.0.2\Lib\site-packages\emod_api\serialization\dtkFileSupport.py", line 28, in compress
    return lz4.block.compress(data if type(data) is bytes else data.encode())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OverflowError: Input too large for LZ4 API
/etc/singularity/ exists; cleanup by system administrator is not complete (see https://apptainer.org/docs/admin/latest/singularity_migration.html)
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/emod_api/serialization/dtkFileTools.py", line 109, in __getitem__
    contents = self.__parent__.contents[index]
  File "/usr/local/lib/python3.9/site-packages/emod_api/serialization/dtkFileTools.py", line 80, in __getitem__
    data = str(uncompress(self.__parent__.chunks[index], self.__parent__.compression), 'utf-8')
  File "/usr/local/lib/python3.9/site-packages/emod_api/serialization/dtkFileTools.py", line 28, in uncompress
    return __engines__[engine].uncompress(data)
  File "/usr/local/lib/python3.9/site-packages/emod_api/serialization/dtkFileSupport.py", line 47, in uncompress
    raise UserWarning("Snappy [de]compression not available.")
UserWarning: Snappy [de]compression not available.
I think the compression type switches between LZ4 and Snappy because of this:
https://github.com/InstituteforDiseaseModeling/DtkTrunk/blob/f8dc7f8417927e3e6543facb124f9a193096e313/Eradication/SerializedPopulation.cpp#L313
With the hardcoded compression type removed from the write function, writing respects the compression type already recorded in the DTK content.
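To illustrate the difference, here is a minimal, self-contained sketch. It does not use the emod_api classes: `ToyDtkFile`, `write_hardcoded`, and `write_preserving` are hypothetical names, and the stdlib `zlib` and `lzma` codecs stand in for LZ4 and Snappy so the example runs without extra packages.

```python
import lzma
import zlib

# Hypothetical stand-ins for the real engines (LZ4, Snappy). Stdlib codecs are
# used only so this sketch runs anywhere; they are not what emod_api uses.
ENGINES = {
    "ZLIB": (zlib.compress, zlib.decompress),
    "LZMA": (lzma.compress, lzma.decompress),
}


class ToyDtkFile:
    """Toy model of a serialized file: one compression tag plus compressed chunks."""

    def __init__(self, compression, chunks):
        self.compression = compression
        self.chunks = chunks

    @classmethod
    def from_texts(cls, compression, texts):
        compress, _ = ENGINES[compression]
        return cls(compression, [compress(t.encode()) for t in texts])

    def texts(self):
        _, decompress = ENGINES[self.compression]
        return [decompress(c).decode() for c in self.chunks]


def write_hardcoded(source, texts):
    # Behaviour before this PR, in spirit: always force one engine on write,
    # regardless of what the source file declared.
    return ToyDtkFile.from_texts("ZLIB", texts)


def write_preserving(source, texts):
    # Behaviour after this PR, in spirit: reuse the engine the source declared.
    return ToyDtkFile.from_texts(source.compression, texts)


original = ToyDtkFile.from_texts("LZMA", ["node-1", "node-2"])
modified = [t.upper() for t in original.texts()]

forced = write_hardcoded(original, modified)
kept = write_preserving(original, modified)
print(forced.compression, kept.compression)
```

In this toy model, the preserving writer never re-encodes the content with an engine the input did not already use, which is the property the change above relies on to avoid the LZ4 size limit for files that were not LZ4-compressed to begin with.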