fix: Validate column names in unique() for empty DataFrames #20411

Merged: 6 commits, Dec 29, 2024

Biswas-N
Contributor

This PR addresses an issue where the unique() function in Polars does not raise a ColumnNotFoundError when called on an empty DataFrame with a subset containing unknown column names. The change validates the column names in the subset before proceeding, so the appropriate exception is raised.

Changes Made:

  1. Rust:

    • Added validation logic in the UniqueExec executor to check that the provided subset of column names exists, even when the DataFrame is empty.
  2. Python Tests:

    • Introduced a new test, test_unique_with_bad_subset, in test_unique.py to cover cases where subset column name(s) do not exist.
    • Ensured invalid subset(s) raise a ColumnNotFoundError with an appropriate message.
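
The executor-level check listed above can be illustrated without Polars. This is a toy model, not the code in this PR: `Frame`, `ColumnNotFound`, and `unique_rows` are hypothetical names, and the empty-frame shortcut is only an assumption about how a missing column could previously go unreported.

```rust
use std::collections::HashSet;

/// Hypothetical error type standing in for Polars' ColumnNotFoundError.
#[derive(Debug, PartialEq)]
pub struct ColumnNotFound(pub String);

/// Toy frame: column names plus rows of integer values.
pub struct Frame {
    pub columns: Vec<String>,
    pub rows: Vec<Vec<i64>>,
}

/// Keep the first row for each distinct key built from the `subset` columns.
pub fn unique_rows(frame: &Frame, subset: &[&str]) -> Result<Vec<Vec<i64>>, ColumnNotFound> {
    // Validate first, so an unknown name is reported even when there are no rows.
    let mut indices = Vec::with_capacity(subset.len());
    for name in subset {
        let idx = frame
            .columns
            .iter()
            .position(|c| c.as_str() == *name)
            .ok_or_else(|| ColumnNotFound((*name).to_string()))?;
        indices.push(idx);
    }

    // Assumed shortcut: an empty frame has nothing to deduplicate.
    if frame.rows.is_empty() {
        return Ok(Vec::new());
    }

    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for row in &frame.rows {
        let key: Vec<i64> = indices.iter().map(|&i| row[i]).collect();
        // `insert` returns true only the first time a key is seen.
        if seen.insert(key) {
            out.push(row.clone());
        }
    }
    Ok(out)
}
```

If the validation loop came after the shortcut, an empty frame would return `Ok` for any subset, which is the kind of inconsistency the linked issue describes.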

Linked Issue:

Closes #20209

Checklist:

  • Changes rebased against the latest main branch.
  • All new and existing tests pass.
  • Verified using pytest for Python tests.
  • Code adheres to the repository's contribution guidelines.

@github-actions github-actions bot added fix Bug fix rust Related to Rust Polars labels Dec 23, 2024
@Biswas-N Biswas-N marked this pull request as ready for review December 23, 2024 06:33

codecov bot commented Dec 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.03%. Comparing base (ef32c9a) to head (263b12b).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #20411   +/-   ##
=======================================
  Coverage   79.02%   79.03%           
=======================================
  Files        1563     1563           
  Lines      220587   220594    +7     
  Branches     2502     2502           
=======================================
+ Hits       174324   174336   +12     
+ Misses      45689    45684    -5     
  Partials      574      574           


@ritchie46
Member

These errors should be raised during conversion to IR, not at the implementation level.
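
A rough illustration of the distinction, under stated assumptions: the `Dsl` and `Ir` enums and `to_ir` below are hypothetical toy types, not Polars' actual plan representation. The point is only that name resolution at conversion time depends on the schema, never on the rows, so an empty frame cannot bypass it.

```rust
/// Hypothetical error type standing in for Polars' ColumnNotFoundError.
#[derive(Debug, PartialEq)]
pub struct ColumnNotFound(pub String);

/// Toy user-facing plan node: column names are unchecked strings.
pub enum Dsl {
    Scan { schema: Vec<String> },
    Unique { input: Box<Dsl>, subset: Vec<String> },
}

/// Toy resolved plan node: every name has been checked and turned into an index.
#[derive(Debug, PartialEq)]
pub enum Ir {
    Scan { schema: Vec<String> },
    Unique { input: Box<Ir>, subset: Vec<usize> },
}

impl Ir {
    /// Output schema of a node; `Unique` passes its input's schema through.
    pub fn schema(&self) -> &[String] {
        match self {
            Ir::Scan { schema } => schema.as_slice(),
            Ir::Unique { input, .. } => input.schema(),
        }
    }
}

/// Convert the unchecked plan into the resolved one, failing on unknown columns.
/// No data is touched here, so the row count cannot influence the outcome.
pub fn to_ir(dsl: Dsl) -> Result<Ir, ColumnNotFound> {
    match dsl {
        Dsl::Scan { schema } => Ok(Ir::Scan { schema }),
        Dsl::Unique { input, subset } => {
            let input = to_ir(*input)?;
            let mut resolved = Vec::with_capacity(subset.len());
            for name in subset {
                match input.schema().iter().position(|c| *c == name) {
                    Some(idx) => resolved.push(idx),
                    None => return Err(ColumnNotFound(name)),
                }
            }
            Ok(Ir::Unique { input: Box::new(input), subset: resolved })
        }
    }
}
```

With this shape, an executor that later runs `Ir::Unique` can assume every index is valid and needs no check of its own.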

@Biswas-N
Contributor Author

@ritchie46 thanks for having a look. I am new to contributing to pola-rs. Could you point me to some code that raises errors during conversion to IR? It would help me understand the pola-rs way of doing things.

@Biswas-N Biswas-N requested a review from ritchie46 December 24, 2024 14:49
Ensures that column names in the subset parameter are validated even
when the dataframe is empty, maintaining consistent behavior with
non-empty dataframes.

Add test cases to verify that unique() properly handles invalid column
names in subset parameter for both empty and non-empty dataframes.
@Biswas-N Biswas-N force-pushed the fix/unique_raises_for_bad_subset branch from c982905 to 7c62814 Compare December 27, 2024 21:05
Refactored column validation to use a `for` loop with `map_err` instead
of `try_for_each`. This enhances readability and maintains consistent
error handling when checking if subset columns exist in the dataframe
during conversion.
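
For readers unfamiliar with the two styles named in this commit message, here is a minimal std-only sketch of both. All names (`index_of`, `MissingField`, `ColumnNotFound`, and the two `validate_*` functions) are hypothetical and do not come from the Polars code base.

```rust
/// Hypothetical low-level lookup error produced by the toy schema below.
#[derive(Debug, PartialEq)]
pub struct MissingField;

/// Hypothetical user-facing error, standing in for ColumnNotFoundError.
#[derive(Debug, PartialEq)]
pub struct ColumnNotFound(pub String);

/// Toy schema lookup that reports failure with the low-level error.
fn index_of(schema: &[&str], name: &str) -> Result<usize, MissingField> {
    schema.iter().position(|c| *c == name).ok_or(MissingField)
}

/// Iterator-adapter form: `try_for_each` stops at the first `Err`.
pub fn validate_try_for_each(schema: &[&str], subset: &[&str]) -> Result<(), ColumnNotFound> {
    subset.iter().try_for_each(|name| {
        index_of(schema, name)
            .map(|_| ())
            .map_err(|_| ColumnNotFound(name.to_string()))
    })
}

/// Loop form: `map_err` converts the error and `?` returns early.
pub fn validate_for_loop(schema: &[&str], subset: &[&str]) -> Result<(), ColumnNotFound> {
    for name in subset {
        index_of(schema, name).map_err(|_| ColumnNotFound(name.to_string()))?;
    }
    Ok(())
}
```

Both functions return the error for the first missing name and accept an empty subset; the choice between them is one of readability rather than behavior.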
Member

@ritchie46 ritchie46 left a comment

Thanks!

@ritchie46 ritchie46 merged commit b430f64 into pola-rs:main Dec 29, 2024
26 checks passed
@ritchie46 ritchie46 changed the title fix(rust): Validate column names in unique() for empty DataFrames fix: Validate column names in unique() for empty DataFrames Dec 29, 2024
@github-actions github-actions bot added the python Related to Python Polars label Dec 29, 2024
Labels
fix Bug fix python Related to Python Polars rust Related to Rust Polars
Development

Successfully merging this pull request may close these issues.

DataFrame.unique() should raise if any subset column doesn't exist on empty frame.
2 participants