[C++][Parquet] Encryption test files are generated with invalid repetition levels #45073

Closed
adamreeve opened this issue Dec 19, 2024 · 1 comment

@adamreeve
Contributor

Describe the bug, including details regarding any error messages, version, and platform.

As part of adding Parquet encryption to arrow-rs (apache/arrow-rs#6637), @rok and I found that arrow-rs could not read the example files in parquet-testing due to invalid repetition levels. arrow-rs complains that:

Parquet error: first repetition level of batch must be 0

The cause is that the int64 list column data is written with the repetition levels flipped. A repetition level of 0 should mark the start of a new list, but the generator uses 1:

repetition_level = 1; // start of a new record
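
To make the expected convention concrete, here is a minimal sketch of how a strict reader might group leaf values into lists from repetition levels. It assumes a single repeated int64 field with no nulls or empty lists (so definition levels are ignored) and treats the whole input as one batch. `AssembleLists` is a hypothetical helper, not an Arrow or arrow-rs API; only the error text is taken from the arrow-rs message above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of Arrow or arrow-rs): groups leaf values
// into lists for a column with a single repeated field.
// Repetition level 0 starts a new list; level 1 continues the current list.
std::vector<std::vector<int64_t>> AssembleLists(
    const std::vector<int16_t>& rep_levels,
    const std::vector<int64_t>& values) {
  // Assumption: one repetition level per leaf value (no nulls, no empty lists).
  assert(rep_levels.size() == values.size());

  std::vector<std::vector<int64_t>> lists;
  for (std::size_t i = 0; i < values.size(); ++i) {
    // A strict reader rejects input whose first value does not start a list.
    if (i == 0 && rep_levels[i] != 0) {
      throw std::invalid_argument("first repetition level of batch must be 0");
    }
    if (rep_levels[i] == 0) {
      lists.emplace_back();  // level 0: begin a new list
    }
    lists.back().push_back(values[i]);  // append to the current list
  }
  return lists;
}
```

Under these assumptions, levels `{0, 1, 0, 1}` with values `{10, 11, 20, 21}` yield two lists of two values each, while the flipped levels `{1, 0, 1, 0}` are rejected on the first value.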

Relatedly, is it also a bug that Arrow reads these files without complaining? When I read one of these files into Arrow format with PyArrow, the first leaf value is skipped.

Component(s)

C++, Parquet

adamreeve self-assigned this Dec 19, 2024
pitrou pushed a commit that referenced this issue Jan 6, 2025
…yption test data (#45074)

### Rationale for this change

This makes the test data readable by other Parquet implementations that validate the repetition levels.

### What changes are included in this PR?

* Corrects the generation of encryption test files so that the int64 list columns correctly start lists with repetition level 0.
* Updates the parquet-testing submodule to use the corrected files.
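
As a rough illustration of the first point, a generator for fixed-size lists could compute repetition levels as in the sketch below. `MakeRepetitionLevels` and its parameters are hypothetical names used only for illustration and are not taken from the Arrow code base. The fixed list size is also an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds repetition levels for `num_lists` lists that
// each hold `list_size` values in a column with a single repeated field.
std::vector<int16_t> MakeRepetitionLevels(std::size_t num_lists,
                                          std::size_t list_size) {
  std::vector<int16_t> levels;
  levels.reserve(num_lists * list_size);
  for (std::size_t i = 0; i < num_lists; ++i) {
    for (std::size_t j = 0; j < list_size; ++j) {
      // The first value of each list gets level 0 (start of a new list);
      // every later value gets level 1 (continuation of the same list).
      levels.push_back(static_cast<int16_t>(j == 0 ? 0 : 1));
    }
  }
  return levels;
}
```

For two lists of three values each, this produces `0, 1, 1, 0, 1, 1`, so the first level in the column is 0, which is what validating readers expect.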

### Are these changes tested?

Yes, covered by existing tests.

### Are there any user-facing changes?

No
* GitHub Issue: #45073

Authored-by: Adam Reeve <adreeve@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
pitrou added this to the 19.0.0 milestone Jan 6, 2025
@pitrou
Member

pitrou commented Jan 6, 2025

Issue resolved by pull request #45074.
