Option to write uncompressed catalogs #419

Closed
aulemahal opened this issue Dec 16, 2021 · 2 comments · Fixed by #421
Labels
enhancement Issues that are found to be a reasonable candidate feature additions

Comments

@aulemahal
Contributor

Is your feature request related to a problem? Please describe.
The default CSV file written by `cat.serialize` is compressed with gzip. This is handy for large catalogs, but it loses interoperability with low-level editing tools (e.g. text editors), since the file is now binary.
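To make the interoperability point concrete, here is a small pandas sketch that writes the same toy table with and without gzip. The data and column names are illustrative only and are not intake-esm's catalog schema. The compressed file starts with the gzip magic bytes `\x1f\x8b` and is unreadable in a text editor, while the uncompressed one is plain CSV.

```python
import gzip
import tempfile
from pathlib import Path

import pandas as pd

# Toy catalog table; column names are illustrative, not a real catalog schema.
df = pd.DataFrame({"path": ["a.nc", "b.nc"], "variable": ["tas", "pr"]})

with tempfile.TemporaryDirectory() as tmp:
    gz_path = Path(tmp) / "catalog.csv.gz"
    csv_path = Path(tmp) / "catalog.csv"

    # Gzip-compressed output, analogous to the current default behaviour.
    df.to_csv(gz_path, index=False, compression="gzip")
    # Uncompressed output: plain text that any editor can open.
    df.to_csv(csv_path, index=False, compression=None)

    gz_head = gz_path.read_bytes()[:2]  # gzip magic number, not text
    csv_text = csv_path.read_text()
    # Decompressing the gzip file recovers the same CSV text.
    roundtrip = gzip.decompress(gz_path.read_bytes()).decode()

print(gz_head)
print(csv_text)
```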

Describe the solution you'd like
An option to control the compression of the catalog file. Some users might even want to choose the codec, e.g. through a `compression` argument like the one in `pd.read_csv`, but without the `"infer"` option, since the catalog filename is not passed explicitly. The accepted values would then be one of `{'gzip', 'bz2', 'zip', 'xz', None}`.
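A minimal sketch of what such an option could look like, assuming a hypothetical helper `write_catalog_csv` (not part of intake-esm) that builds the filename from a catalog name and picks a suffix per codec. Because only a name is passed, `"infer"` has nothing to infer from, so it is rejected; the suffix mapping is an illustrative assumption.

```python
import tempfile
from pathlib import Path
from typing import Optional

import pandas as pd

# Hypothetical mapping from the proposed `compression` values to file suffixes.
# "infer" is deliberately absent: the caller passes a catalog *name*, not a
# filename, so there is no extension to infer the codec from.
CSV_SUFFIXES = {"gzip": ".gz", "bz2": ".bz2", "zip": ".zip", "xz": ".xz", None: ""}


def write_catalog_csv(
    df: pd.DataFrame,
    name: str,
    directory: str,
    compression: Optional[str] = "gzip",
) -> Path:
    """Hypothetical helper: write `df` to `<directory>/<name>.csv[<suffix>]`."""
    if compression not in CSV_SUFFIXES:
        allowed = ", ".join(repr(k) for k in CSV_SUFFIXES)
        raise ValueError(f"compression must be one of {allowed}; got {compression!r}")
    path = Path(directory) / f"{name}.csv{CSV_SUFFIXES[compression]}"
    # The DataFrame index is not part of the catalog, so it is not written.
    df.to_csv(path, index=False, compression=compression)
    return path


# Usage: an uncompressed catalog that any text editor can open.
df = pd.DataFrame({"path": ["a.nc"], "variable": ["tas"]})
with tempfile.TemporaryDirectory() as tmp:
    out = write_catalog_csv(df, "my-catalog", tmp, compression=None)
    print(out.name)  # my-catalog.csv
    print(out.read_text())
```

Restricting the argument to an explicit set keeps the error close to the call site instead of surfacing later inside pandas.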

Describe alternatives you've considered
Serializing the catalog myself with `cat.df.to_csv`, but that's just not ideal.

@aulemahal
Contributor Author

aulemahal commented Dec 16, 2021

EDIT: In fact, there could simply be a `**csv_kwargs` argument passed directly to `df.to_csv`?
I am OK with changing the default value of `compression` and injecting `index=False`; I just think it could be interesting to have finer control.
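One way to read this suggestion: the serializer keeps its own defaults, lets the caller override them through `**csv_kwargs`, and always injects `index=False`. The sketch below is hypothetical (the function name is made up) and only builds the keyword dictionary that would be handed to `DataFrame.to_csv`. Whether `index=False` should be forced or merely defaulted is a design choice; here it is assumed to be forced.

```python
from typing import Any, Dict


def build_to_csv_kwargs(**csv_kwargs: Any) -> Dict[str, Any]:
    """Hypothetical sketch: merge caller-supplied options with serializer defaults."""
    # Default codec; a caller can override it, e.g. compression=None for plain text.
    kwargs: Dict[str, Any] = {"compression": "gzip"}
    kwargs.update(csv_kwargs)
    # Always injected (assumption): the DataFrame index is not a catalog column.
    kwargs["index"] = False
    return kwargs


print(build_to_csv_kwargs(compression=None, sep=";"))
# {'compression': None, 'sep': ';', 'index': False}
```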

@andersy005 andersy005 added the enhancement Issues that are found to be a reasonable candidate feature additions label Dec 16, 2021
@andersy005
Member

> EDIT: In fact, there could simply be a `**csv_kwargs` passed directly to `df.to_csv`?
> I just think it could be interesting to have finer control.

👍🏽

How about adding a `to_csv_kwargs` argument to `serialize`

```python
@pydantic.validate_arguments
def serialize(
    self,
    name: pydantic.StrictStr,
    directory: typing.Union[pydantic.DirectoryPath, pydantic.StrictStr] = None,
    catalog_type: str = 'dict',
    # ... (excerpt; the rest of the signature is not shown)
```

and passing these keyword arguments down to `save`?

```python
def save(self, name: str, *, directory: str = None, catalog_type: str = 'dict') -> None:
```
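A minimal sketch of that plumbing, assuming a toy `SketchCatalog` class (hypothetical; not intake-esm's implementation, and without the `pydantic` validation). Its `serialize` forwards `to_csv_kwargs` unchanged to `save`, which merges them with defaults before calling `DataFrame.to_csv`. Only the CSV part is written; `catalog_type` and the JSON side of a real catalog are ignored.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional

import pandas as pd

# Hypothetical codec-to-suffix map; assumes `compression` is a string or None.
SUFFIXES = {"gzip": ".gz", "bz2": ".bz2", "zip": ".zip", "xz": ".xz"}


class SketchCatalog:
    """Hypothetical stand-in for a catalog object; not intake-esm's actual class."""

    def __init__(self, df: pd.DataFrame) -> None:
        self.df = df

    def serialize(
        self,
        name: str,
        directory: Optional[str] = None,
        catalog_type: str = "dict",
        to_csv_kwargs: Optional[Dict[str, Any]] = None,
    ) -> Path:
        # Public entry point: forward the extra options unchanged to save().
        return self.save(
            name,
            directory=directory,
            catalog_type=catalog_type,
            to_csv_kwargs=to_csv_kwargs,
        )

    def save(
        self,
        name: str,
        *,
        directory: Optional[str] = None,
        catalog_type: str = "dict",
        to_csv_kwargs: Optional[Dict[str, Any]] = None,
    ) -> Path:
        # catalog_type is accepted only to mirror the quoted signature; unused here.
        csv_kwargs: Dict[str, Any] = {"compression": "gzip"}
        csv_kwargs.update(to_csv_kwargs or {})
        csv_kwargs["index"] = False  # the index is not a catalog column
        suffix = SUFFIXES.get(csv_kwargs["compression"], "")
        path = Path(directory or ".") / f"{name}.csv{suffix}"
        self.df.to_csv(path, **csv_kwargs)
        return path


# Usage: write a plain-text catalog by overriding the default codec.
cat = SketchCatalog(pd.DataFrame({"path": ["a.nc"], "variable": ["tas"]}))
with tempfile.TemporaryDirectory() as tmp:
    out = cat.serialize("my-catalog", directory=tmp, to_csv_kwargs={"compression": None})
    print(out.read_text())
```

Using `None` rather than `{}` as the default for `to_csv_kwargs` avoids a shared mutable default argument.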

@andersy005 andersy005 added this to Xdev Dec 17, 2021
@andersy005 andersy005 moved this to 🌳 Todo in Xdev Dec 17, 2021
@andersy005 andersy005 moved this from 🌳 Todo to ▶ In Progress in Xdev Dec 17, 2021
Repository owner moved this from ▶ In Progress to ✅ Done in Xdev Dec 17, 2021
2 participants