Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

fix: Fix more global categorical issues #20547

Merged
merged 3 commits into from
Jan 3, 2025
Merged

fix: Fix more global categorical issues #20547

merged 3 commits into from
Jan 3, 2025

Conversation

ritchie46
Copy link
Member

@@ -114,7 +114,7 @@ jobs:
const previousSizeMB = previousSize !== 'Unknown' ? (previousSize / 1024 / 1024).toFixed(4) : 'Unknown';
const currentSizeMB = currentSize !== 'Unknown' ? (currentSize / 1024 / 1024).toFixed(4) : 'Unknown';

let commentBody = `The previous wheel size was **${previousSizeMB} MB**.\nThe current wheel size after this PR is **${currentSizeMB} MB**.`;
let commentBody = `The uncompressed binary size was **${previousSizeMB} MB**.\nThe uncompressed binary size after this PR is **${currentSizeMB} MB**.`;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

driveby

@github-actions github-actions bot added fix Bug fix python Related to Python Polars rust Related to Rust Polars labels Jan 3, 2025
@mcrumiller
Copy link
Contributor

@ritche the two failing tests in #20514 are fixed with this. Let me do a temp merge with that branch to do a full test.

Copy link
Contributor

github-actions bot commented Jan 3, 2025

The uncompressed binary size was Unknown MB.
The uncompressed binary size after this PR is 1119.4055 MB.

@mcrumiller
Copy link
Contributor

mcrumiller commented Jan 3, 2025

Almost there, test_fast_unique_flag_from_arrow is still failing on global:

import polars as pl
pl.enable_string_cache()
pl.Series(["aaa", "bbb", "ccc"], dtype=pl.Categorical)  # pre-fill cache

# test_fast_unique_flag_from_arrow
df = pl.DataFrame({
    "colB": pl.Series(["1", "2", "3", "4", "5", "5", "5", "5"], dtype=pl.Categorical)
})
filtered = df.to_arrow().filter([True, False, True, True, False, True, True, True])
pl.from_arrow(filtered).select(pl.col("colB").n_unique())
# pyo3_runtime.PanicException: assertion failed: TotalOrd::tot_le(v, self.range.end())

@mcrumiller
Copy link
Contributor

mcrumiller commented Jan 3, 2025

@ritchie46 here is a simpler repro:

import polars as pl
pl.enable_string_cache()
pl.Series(["aaa", "bbb", "ccc"], dtype=pl.Categorical)  # pre-fill cache

s = pl.Series(["1", "2", "3"], dtype=pl.Categorical)
s = s.filter([True, False, True])
s.n_unique()
# pyo3_runtime.PanicException: assertion failed: TotalOrd::tot_le(v, self.range.end())

A few notes:

  • pre-filling the cache with only two items doesn't cause the error.
  • if s has only two items with a True/False mask it doesn't cause error.

Copy link

codecov bot commented Jan 3, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.00%. Comparing base (ca36b66) to head (bd4deaa).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #20547      +/-   ##
==========================================
- Coverage   79.05%   79.00%   -0.06%     
==========================================
  Files        1564     1564              
  Lines      220627   220618       -9     
  Branches     2502     2502              
==========================================
- Hits       174413   174291     -122     
- Misses      45640    45753     +113     
  Partials      574      574              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ritchie46
Copy link
Member Author

Got it

@mcrumiller
Copy link
Contributor

@ritchie46 all pass now! Nice.

I will note that I don't think we have any tests that cover the Seen::Large case, which requires > 128 categories. Should I try to whip up a test for that, or deal with it later?

Copy link
Contributor

github-actions bot commented Jan 3, 2025

The uncompressed binary size was Unknown MB.
The uncompressed binary size after this PR is 1119.5188 MB.

@ritchie46
Copy link
Member Author

I removed the uses of that kernel, so I think that's the reason it isn't hit.

@ritchie46 ritchie46 merged commit 1517599 into main Jan 3, 2025
26 checks passed
@ritchie46 ritchie46 deleted the cats branch January 3, 2025 17:32
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
fix Bug fix python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants