Skip to content

parquet: Add tests for pruning on Int8/Int16/Int64 columns #9778

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 3 commits into from
Mar 31, 2024

Conversation

progval
Copy link
Contributor

@progval progval commented Mar 24, 2024

[Do not merge this PR, it highlights a bug in Int8 and Int16 columns, through correct_bloom_filters: false. See #9779 for a discussion]

Which issue does this PR close?

Closes #9777.

Rationale for this change

What changes are included in this PR?

Generalizes the Int32 tests, using a macro.

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Mar 24, 2024
@progval progval changed the title parquet: Add tests for Bloom filters on Int8/Int16/Int64 columns parquet: Add tests for pruning on Int8/Int16/Int64 columns Mar 24, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @progval

[Do not merge this PR, it highlights a bug in Int8 and Int16 columns, through correct_bloom_filters: false. See https://github.com//issues/9779 for a discussion]

What would you think about merging this PR (but keeping #9779 open)?

Then the fix for #9779 could just update the tests to set correct_bloom_filters?

@progval
Copy link
Contributor Author

progval commented Mar 24, 2024

What would you think about merging this PR

Makes sense, I guess. I was hoping you would find how to solve #9779, because I can't :)

(but keeping #9779 open)?

Well, it should be removed when fixed, there is no point to keeping it.

@alamb
Copy link
Contributor

alamb commented Mar 24, 2024

What would you think about merging this PR

Makes sense, I guess. I was hoping you would find how to solve #9779, because I can't :)

(but keeping #9779 open)?

Well, it should be removed when fixed, there is no point to keeping it.

Agreed.

For others following along, the issue is upstream in parquet-rs: #9779 (comment)

I think what we should do with this PR is to update the comments explaining that the tests now demonstrate there is a bug in DataFusion and link to the upstream issue.

Once the upstream issue is fixed, we can then update the tests / close the false nagatives bug #9779

@alamb
Copy link
Contributor

alamb commented Mar 28, 2024

As a follow up here, I plan to add the comments explaining what is going on here (and that there are tests that show incorrect results, that are tracked by a ticket) and then merge it in.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @progval -- I somehow missed that you had updated this PR. It looks really good to me.

Thanks again. Very much appreciated

@alamb
Copy link
Contributor

alamb commented Mar 30, 2024

I merged up from main to make sure we have a clean CI run and then I think we can merge this one in

@alamb alamb merged commit 2cb6f73 into apache:main Mar 31, 2024
@alamb
Copy link
Contributor

alamb commented Mar 31, 2024

Thanks again @progval

@progval progval deleted the int-bloom-tests branch March 31, 2024 09:12
Lordworms pushed a commit to Lordworms/arrow-datafusion that referenced this pull request Apr 1, 2024
* parquet: Add tests for Bloom filters on Int8/Int16/Int64 columns

* Document int_tests macro

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

core/tests/parquet/row_group_pruning.rs is missing tests for Int8/Int16/Int64 columns
2 participants