[Parquet][Python] Read and write file/column metadata using pandas attrs #28558

Open
asfimport opened this issue May 18, 2021 · 1 comment

Related: pandas-dev/pandas#20521

What are the general thoughts on using DataFrame.attrs and Series.attrs for reading and writing metadata to/from Parquet?

For example, here is how the metadata would be written:

import pandas

pdf = pandas.DataFrame({"a": [1]})
# Dataset-level metadata
pdf.attrs = {"name": "my custom dataset"}
# Column-level metadata
pdf.a.attrs = {"long_name": "Description about data", "nodata": -1, "units": "metre"}
pdf.to_parquet("file.parquet")

Then, when loading in the data:

pdf = pandas.read_parquet("file.parquet")
pdf.attrs

{"name": "my custom dataset"}

pdf.a.attrs

{"long_name": "Description about data", "nodata": -1, "units": "metre"}

Reporter: Alan Snow

Note: This issue was originally created as ARROW-12823. Please see the migration documentation for further details.

Alan Snow:
It seems like the metadata could be written in get_column_metadata, possibly as a separate "attrs" item so that it doesn't conflict with "metadata".
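
As a rough illustration of that idea, here is a minimal sketch using only the standard library. The helper `column_metadata_with_attrs` and the dictionary layout are hypothetical and chosen for illustration; this is not how pyarrow's get_column_metadata is actually implemented. It only shows that a separate "attrs" key can sit next to "metadata" and survive a JSON round trip.

```python
import json


def column_metadata_with_attrs(name, pandas_type, numpy_type, metadata=None, attrs=None):
    """Hypothetical helper: build a per-column metadata entry.

    The "attrs" key is kept separate from "metadata" so that user-supplied
    attributes cannot collide with type-specific metadata.
    """
    entry = {
        "name": name,
        "pandas_type": pandas_type,
        "numpy_type": numpy_type,
        "metadata": metadata,
    }
    if attrs:
        # Copy so later changes to the caller's dict do not leak in.
        entry["attrs"] = dict(attrs)
    return entry


column = column_metadata_with_attrs(
    "a",
    "int64",
    "int64",
    attrs={"long_name": "Description about data", "nodata": -1, "units": "metre"},
)

# Assumed layout: dataset-level attrs at the top, column entries below.
payload = json.dumps({"attrs": {"name": "my custom dataset"}, "columns": [column]})

# On read, the attrs would be recovered from the decoded payload.
restored = json.loads(payload)
```

One consequence of serialising this way is that attrs values would need to be JSON-compatible (strings, numbers, lists, dicts); anything else would require an explicit encoding step.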
