[refurb] Count codepoints not bytes for slice-to-remove-prefix-or-suffix (FURB188)
#13631
Another subtlety worth testing is strings with surrogates. In Python, each surrogate counts as 1 and surrogate pairs are not special so they count as 2; for example, |
TIL @dscorbett - neat! Added a test for this, and it appears to be handled correctly (I think this happens in the guts of the parser, so by the time I'm looking at |
I think the reason it works is that Ruff’s representation of a Python string as a Rust string replaces surrogates with replacement characters. That is fine for counting the code points but could be a problem for other rules.
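To illustrate the point about surrogates, here is a minimal sketch. It assumes a Python string is modelled as a sequence of code points and that each surrogate becomes U+FFFD when converted to a Rust string. `to_lossy_string` is a hypothetical helper written for this example, not Ruff's actual conversion code.

```rust
/// Hypothetical helper (not part of Ruff): converts a sequence of Python-style
/// code points, which may include lone surrogates, into a Rust `String`.
/// Rust's `char` cannot hold a surrogate, so `char::from_u32` returns `None`
/// for those values and each one is replaced with U+FFFD.
fn to_lossy_string(code_points: &[u32]) -> String {
    code_points
        .iter()
        .map(|&cp| char::from_u32(cp).unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect()
}

/// Counts code points the way Python's `len` does, using the lossy string.
fn code_point_len(code_points: &[u32]) -> usize {
    to_lossy_string(code_points).chars().count()
}
```

Under this assumption, the two surrogates `0xD83D, 0xDE00` yield two replacement characters, so the count matches Python's `len` (2) even though the original code points are lost.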
Nice, thanks. I only have two nit comments.
.and_then(ast::Int::as_u32)
.and_then(|x| usize::try_from(x).ok())
I suggest converting to a `u64`, considering that you have to use `usize::try_from` anyway (for 32-bit platforms).
- .and_then(ast::Int::as_u32)
- .and_then(|x| usize::try_from(x).ok())
+ .and_then(ast::Int::as_u64)
+ .and_then(|x| usize::try_from(x).ok())
Or you could consider adding an `as_usize` method to `ast::Int`.
went with the latter
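For readers following along, here is a rough sketch of what an `as_usize` accessor could look like. `IntLiteral` is a hypothetical stand-in defined only for this example; its variants and methods are assumptions and do not reflect the actual definition of Ruff's `ast::Int`.

```rust
/// Hypothetical stand-in for an integer literal (NOT Ruff's `ast::Int`):
/// small values are stored as `u64`, larger ones keep their source text.
enum IntLiteral {
    Small(u64),
    Big(String),
}

impl IntLiteral {
    /// Returns the value as `u64` if it fits.
    fn as_u64(&self) -> Option<u64> {
        match self {
            IntLiteral::Small(value) => Some(*value),
            IntLiteral::Big(_) => None,
        }
    }

    /// Returns the value as `usize` if it fits on the current platform.
    /// `usize::try_from` fails on 32-bit targets when the value exceeds
    /// `u32::MAX`, which is why the conversion is fallible.
    fn as_usize(&self) -> Option<usize> {
        self.as_u64().and_then(|value| usize::try_from(value).ok())
    }
}
```

With such a method, the two-step chain `.and_then(as_u64).and_then(|x| usize::try_from(x).ok())` at the call site collapses into a single `.and_then(as_usize)`.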
// Only support prefix removal for size at most `u32::MAX`
.and_then(ast::Int::as_u32)
.and_then(|x| usize::try_from(x).ok())
.is_some_and(|x| x == string_val.to_str().chars().count()),
- .is_some_and(|x| x == string_val.to_str().chars().count()),
+ .is_some_and(|x| x == string_val.chars().count()),
@@ -370,7 +372,8 @@ fn affix_matches_slice_bound(data: &RemoveAffixData, semantic: &SemanticModel) -
  value
  .as_int()
  .and_then(ast::Int::as_u32)
- .is_some_and(|x| x == string_val.to_str().text_len().to_u32())
+ .and_then(|x| usize::try_from(x).ok())
+ .is_some_and(|x| x == string_val.to_str().chars().count())
- .is_some_and(|x| x == string_val.to_str().chars().count())
+ .is_some_and(|x| x == string_val.chars().count())
This PR fixes the calculation of string length for the purposes of verifying when to suggest `removeprefix`/`removesuffix` (FURB188). Before, we used `text_len`, which counts bytes rather than codepoints (chars) and therefore disagreed with Python's `len` for non-ASCII text.

Closes #13620
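The byte/codepoint mismatch can be seen in a small standalone sketch. `bound_matches_affix` is a hypothetical function written for this example; it only mirrors the idea of the check and is not the rule's actual implementation.

```rust
/// Hypothetical check (not Ruff's code): does an integer slice bound equal
/// the length of the affix as Python's `len` would report it?
/// `chars().count()` counts Unicode scalar values, whereas `str::len`
/// returns the number of UTF-8 bytes.
fn bound_matches_affix(bound: u64, affix: &str) -> bool {
    usize::try_from(bound)
        .ok()
        .is_some_and(|n| n == affix.chars().count())
}

/// The byte-based comparison, shown only for contrast with the fixed version.
fn bound_matches_affix_bytes(bound: u64, affix: &str) -> bool {
    usize::try_from(bound).ok().is_some_and(|n| n == affix.len())
}
```

For Python code such as `text[1:] if text.startswith("é") else text`, the bound `1` equals `len("é")` in Python. The codepoint-based check accepts it, while the byte-based one rejects it because `"é"` occupies two bytes in UTF-8.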