Option to write uncompressed catalogs #419

aulemahal · 2021-12-16T18:18:32Z

Is your feature request related to a problem? Please describe.
The default CSV file written by cat.serialize is compressed with gzip. This is quite cool for large catalogs, but it loses some interoperability with low-level editing tools (text editors), since the file is now binary.

Describe the solution you'd like
An option to control the compression of the catalog file. Some users might even want to choose the codec. This could be a compression argument like the one in pd.read_csv, but without the "infer" option, since the catalog filename is not passed explicitly. So one of {'gzip', 'bz2', 'zip', 'xz', None}.
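To make the request concrete, here is a minimal sketch of such an option using plain pandas. The helper name `write_catalog_csv` and the suffix mapping are hypothetical and are not part of intake-esm; only `DataFrame.to_csv` and its `compression` and `index` parameters are real pandas API.

```python
import tempfile
from pathlib import Path
from typing import Optional

import pandas as pd

# Codecs listed in the request; "infer" is deliberately left out.
ALLOWED_COMPRESSION = {"gzip", "bz2", "zip", "xz", None}
# Hypothetical mapping from codec to the suffix appended after ".csv".
SUFFIXES = {"gzip": ".gz", "bz2": ".bz2", "zip": ".zip", "xz": ".xz", None: ""}


def write_catalog_csv(
    df: pd.DataFrame, directory: str, name: str, compression: Optional[str] = "gzip"
) -> Path:
    """Write df as CSV with an explicit codec (hypothetical helper)."""
    if compression not in ALLOWED_COMPRESSION:
        raise ValueError(f"unsupported compression: {compression!r}")
    path = Path(directory) / f"{name}.csv{SUFFIXES[compression]}"
    # The codec is passed explicitly, so nothing is inferred from the filename.
    df.to_csv(path, index=False, compression=compression)
    return path
```

With `compression=None` the result is a plain text file that any editor can open; with the default it stays gzip-compressed.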

Describe alternatives you've considered
Serializing the catalog myself with cat.df.to_csv, but that's just not ideal.

aulemahal · 2021-12-16T18:22:01Z

EDIT: In fact, there could simply be a **csv_kwargs passed directly to df.to_csv?
I am ok with changing the default value of compression and injecting index=False, I just think it could be interesting to have finer control.

andersy005 · 2021-12-16T18:52:57Z

> EDIT: In fact, there could simply be a **csv_kwargs passed directly to df.to_csv?
> I just think it could be interesting to have finer control.

👍🏽

How about adding a to_csv_kwargs argument to

intake-esm/intake_esm/core.py

Lines 362 to 367 in 40fe3a7

    @pydantic.validate_arguments
    def serialize(
        self,
        name: pydantic.StrictStr,
        directory: typing.Union[pydantic.DirectoryPath, pydantic.StrictStr] = None,
        catalog_type: str = 'dict',

and passing these keyword arguments down to

intake-esm/intake_esm/cat.py

Line 118 in 40fe3a7

    def save(self, name: str, *, directory: str = None, catalog_type: str = 'dict') -> None:

?
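A rough sketch of what forwarding these keyword arguments could look like, using a standalone function rather than the real `serialize`/`save` methods. The function name `save_catalog_csv` and the file-naming rule are assumptions for illustration; the defaults (gzip compression, `index=False`) are the ones mentioned earlier in this thread, and how #421 actually implements this may differ.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional

import pandas as pd


def save_catalog_csv(
    df: pd.DataFrame,
    name: str,
    directory: str,
    to_csv_kwargs: Optional[Dict[str, Any]] = None,
) -> Path:
    """Write df to CSV, forwarding user options to DataFrame.to_csv."""
    # Defaults discussed in the thread; user-supplied values override them.
    csv_kwargs: Dict[str, Any] = {"compression": "gzip", "index": False}
    csv_kwargs.update(to_csv_kwargs or {})
    # Hypothetical naming rule: add ".gz" only when gzip is in effect.
    suffix = ".csv.gz" if csv_kwargs["compression"] == "gzip" else ".csv"
    path = Path(directory) / f"{name}{suffix}"
    df.to_csv(path, **csv_kwargs)
    return path
```

Calling it with `to_csv_kwargs={"compression": None}` would then produce an uncompressed, editable file, while omitting the argument keeps the current gzip behaviour.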

andersy005 added the enhancement Issues that are found to be a reasonable candidate feature additions label Dec 16, 2021

andersy005 added this to Xdev Dec 17, 2021

andersy005 moved this to 🌳 Todo in Xdev Dec 17, 2021

andersy005 mentioned this issue Dec 17, 2021

Expose pd.DataFrame.to_csv and json.dump keyword arguments #421

Merged

3 tasks

andersy005 moved this from 🌳 Todo to ▶ In Progress in Xdev Dec 17, 2021

andersy005 closed this as completed in #421 Dec 17, 2021

Repository owner moved this from ▶ In Progress to ✅ Done in Xdev Dec 17, 2021
