fix: Validate column names in `unique()` for empty DataFrames #20411
Codecov Report
All modified and coverable lines are covered by tests ✅

@@           Coverage Diff           @@
##             main   #20411   +/-   ##
=======================================
  Coverage   79.02%   79.03%
=======================================
  Files        1563     1563
  Lines      220587   220594     +7
  Branches     2502     2502
=======================================
+ Hits       174324   174336    +12
+ Misses      45689    45684     -5
  Partials      574      574
These errors should be raised during conversion to IR, not at the implementation level.
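The review point is that validation belongs at plan conversion, where the schema is already known even when the frame has zero rows. A minimal sketch of that idea follows; `Unique`, `UniqueIr`, `to_ir` and `PlanError` are hypothetical, simplified names for illustration, not Polars' actual plan types.

```rust
use std::collections::HashSet;
use std::fmt;

// Hypothetical stand-in for a ColumnNotFound error.
#[derive(Debug, PartialEq)]
enum PlanError {
    ColumnNotFound(String),
}

impl fmt::Display for PlanError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PlanError::ColumnNotFound(name) => write!(f, "column not found: {name}"),
        }
    }
}

// Hypothetical logical plan node: the schema is known even when there are no rows.
struct Unique {
    schema: Vec<String>,
    subset: Option<Vec<String>>,
}

// Hypothetical IR node produced only after validation succeeds.
#[derive(Debug)]
struct UniqueIr {
    subset: Option<Vec<String>>,
}

// Validate during conversion to IR: this depends only on the schema, never on the data.
fn to_ir(plan: Unique) -> Result<UniqueIr, PlanError> {
    if let Some(subset) = &plan.subset {
        let known: HashSet<&str> = plan.schema.iter().map(String::as_str).collect();
        for name in subset {
            if !known.contains(name.as_str()) {
                return Err(PlanError::ColumnNotFound(name.clone()));
            }
        }
    }
    Ok(UniqueIr { subset: plan.subset })
}

fn main() {
    let plan = Unique { schema: vec!["a".into()], subset: Some(vec!["b".into()]) };
    match to_ir(plan) {
        Ok(ir) => println!("ok: {:?}", ir),
        Err(e) => println!("error: {e}"),
    }
}
```

Running it prints `error: column not found: b`. Because the check looks only at the schema, an empty and a non-empty input with the same columns fail in the same way.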
@ritchie46 Thanks for having a look. I am new to contributing to
Ensures that column names in the subset parameter are validated even when the dataframe is empty, maintaining consistent behavior with non-empty dataframes.
Add test cases to verify that unique() properly handles invalid column names in subset parameter for both empty and non-empty dataframes.
c982905 to 7c62814
Refactored column validation to use a `for` loop with `map_err` instead of `try_for_each`. This enhances readability and maintains consistent error handling when checking if subset columns exist in the dataframe during conversion.
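The two styles named in the commit message can be compared in a small standalone sketch. `index_of`, `LookupError` and `PlanError` are hypothetical stand-ins, not the code in this PR; both functions stop at the first missing column and return the same error.

```rust
// Hypothetical low-level error, e.g. from a schema lookup.
#[derive(Debug)]
struct LookupError;

// Hypothetical user-facing error mirroring a ColumnNotFound error.
#[derive(Debug, PartialEq)]
enum PlanError {
    ColumnNotFound(String),
}

// Hypothetical schema lookup that fails with a low-level error.
fn index_of(schema: &[&str], name: &str) -> Result<usize, LookupError> {
    schema.iter().position(|c| *c == name).ok_or(LookupError)
}

// Style 1: iterator adapter with try_for_each; the closure must return Result<(), _>.
fn validate_try_for_each(schema: &[&str], subset: &[&str]) -> Result<(), PlanError> {
    subset.iter().try_for_each(|name| {
        index_of(schema, name)
            .map(|_| ())
            .map_err(|_| PlanError::ColumnNotFound(name.to_string()))
    })
}

// Style 2: plain for loop, converting each error with map_err and returning early with `?`.
fn validate_for_loop(schema: &[&str], subset: &[&str]) -> Result<(), PlanError> {
    for name in subset {
        index_of(schema, name).map_err(|_| PlanError::ColumnNotFound(name.to_string()))?;
    }
    Ok(())
}

fn main() {
    let schema = ["a", "b"];
    let subset = ["a", "c"];
    // Both print Err(ColumnNotFound("c")).
    println!("{:?}", validate_try_for_each(&schema, &subset));
    println!("{:?}", validate_for_loop(&schema, &subset));
}
```

The `for` version keeps the error conversion and the early return on a single line, which is arguably easier to scan than threading a closure through `try_for_each`; the behavior is identical.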
Thanks!
This PR addresses an issue where the `unique()` function in Polars does not raise a `ColumnNotFoundError` when called on an empty DataFrame with a subset containing unknown column names. The changes ensure that the column names in the subset are validated before proceeding, so the appropriate exception is raised.

Changes Made:
Rust: the `UniqueExec` executor now checks that the provided subset column names exist, even when the DataFrame is empty.
Python Tests: `test_unique_with_bad_subset` in `test_unique.py` covers scenarios where subset column name(s) do not exist, expecting a `ColumnNotFoundError` with an appropriate message.

Linked Issue:
Closes #20209
Checklist:
… `main` branch.
… `pytest` for Python tests.
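The failure mode described in this PR can be reproduced with a toy executor, under an assumption the PR does not confirm: that an empty-input fast path returned before the subset was checked. `Frame`, `check_subset`, `dedup`, `unique_buggy` and `unique_fixed` are hypothetical names; this is a sketch of the ordering problem, not Polars' `UniqueExec`.

```rust
use std::collections::HashSet;

// Hypothetical, heavily simplified frame: column names plus row-major integer data.
struct Frame {
    columns: Vec<String>,
    rows: Vec<Vec<i64>>,
}

// Hypothetical stand-in for Polars' ColumnNotFoundError.
#[derive(Debug, PartialEq)]
enum UniqueError {
    ColumnNotFound(String),
}

// Resolve each subset name to a column index, failing on the first unknown name.
fn check_subset(frame: &Frame, subset: &[&str]) -> Result<Vec<usize>, UniqueError> {
    subset
        .iter()
        .map(|name| {
            frame
                .columns
                .iter()
                .position(|c| c == name)
                .ok_or_else(|| UniqueError::ColumnNotFound(name.to_string()))
        })
        .collect()
}

// Keep the first row for each distinct key built from the subset columns.
fn dedup(rows: &[Vec<i64>], idx: &[usize]) -> Vec<Vec<i64>> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for row in rows {
        let key: Vec<i64> = idx.iter().map(|&i| row[i]).collect();
        if seen.insert(key) {
            out.push(row.clone());
        }
    }
    out
}

// Assumed buggy ordering: the empty fast path returns before validation runs.
fn unique_buggy(frame: &Frame, subset: &[&str]) -> Result<Vec<Vec<i64>>, UniqueError> {
    if frame.rows.is_empty() {
        return Ok(Vec::new());
    }
    let idx = check_subset(frame, subset)?;
    Ok(dedup(&frame.rows, &idx))
}

// Fixed ordering: validate the subset first, then take the empty fast path.
fn unique_fixed(frame: &Frame, subset: &[&str]) -> Result<Vec<Vec<i64>>, UniqueError> {
    let idx = check_subset(frame, subset)?;
    if frame.rows.is_empty() {
        return Ok(Vec::new());
    }
    Ok(dedup(&frame.rows, &idx))
}

fn main() {
    let empty = Frame { columns: vec!["a".to_string()], rows: vec![] };
    println!("{:?}", unique_buggy(&empty, &["b"])); // Ok([])
    println!("{:?}", unique_fixed(&empty, &["b"])); // Err(ColumnNotFound("b"))
}
```

Moving the check ahead of the fast path is what makes empty and non-empty inputs behave consistently, which is the same property the Python tests assert.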