[`flake8-implicit-str-concat`] Normalize octals before merging concatenated strings in `single-line-implicit-string-concatenation` (`ISC001`) #13118

dylwil3 · 2024-08-26T22:16:47Z

This PR pads the last octal (if there is one) to three digits before concatenating strings in the fix for ISC001.

For example: `"\12""0"` is fixed to `"\0120"` rather than `"\120"`, which would be a different string.
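To make the problem concrete, here is a minimal sketch of how a Python-style octal escape is read: a backslash followed by one to three octal digits becomes one character. The function name `decode_octal_escapes` is hypothetical and is not part of the PR; it handles octal escapes only and leaves every other backslash sequence untouched.

```rust
/// Hypothetical illustration (not from the PR): decode only the octal
/// escapes in a string body, the way Python reads `\o`, `\oo`, `\ooo`.
fn decode_octal_escapes(body: &str) -> String {
    let mut out = String::new();
    let mut chars = body.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // Greedily collect up to three octal digits after the backslash.
        let mut value = 0u32;
        let mut digits = 0;
        while digits < 3 {
            match chars.peek().and_then(|d| d.to_digit(8)) {
                Some(d) => {
                    value = value * 8 + d;
                    digits += 1;
                    chars.next();
                }
                None => break,
            }
        }
        if digits > 0 {
            out.push(char::from_u32(value).unwrap_or('\u{FFFD}'));
        } else {
            // Not an octal escape: keep the backslash and the next character as-is.
            out.push('\\');
            if let Some(next) = chars.next() {
                out.push(next);
            }
        }
    }
    out
}
```

With this sketch, the naive merge `\120` decodes to the single character `P`, while the padded merge `\0120` decodes to a newline followed by `0`, which matches the two original strings decoded separately.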

Closes #12936.

github-actions · 2024-08-26T22:30:42Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

AlexWaygood

This is very clever. Good job! Two small points:

The .to_string() call on line 175 will clone the underlying data, which could be quite costly -- and is unnecessary in the happy path, since most strings don't have octal escapes in them. We can avoid this by using a Cow -- something like this? (The diff is relative to your branch)

--- a/crates/ruff_linter/src/rules/flake8_implicit_str_concat/rules/implicit.rs
+++ b/crates/ruff_linter/src/rules/flake8_implicit_str_concat/rules/implicit.rs
@@ -1,3 +1,5 @@
+use std::borrow::Cow;
+
 use itertools::Itertools;
 
 use ruff_diagnostics::{Diagnostic, Edit, Fix, FixAvailability, Violation};
@@ -173,11 +175,11 @@ fn concatenate_strings(a_range: TextRange, b_range: TextRange, locator: &Locator
     }
 
     let mut a_body =
-        a_text[a_leading_quote.len()..a_text.len() - a_trailing_quote.len()].to_string();
+        Cow::Borrowed(&a_text[a_leading_quote.len()..a_text.len() - a_trailing_quote.len()]);
     let b_body = &b_text[b_leading_quote.len()..b_text.len() - b_trailing_quote.len()];
 
     if a_leading_quote.find(['r', 'R']).is_none() {
-        a_body = normalize_ending_octal(&a_body);
+        normalize_ending_octal(&mut a_body);
     }
 
     let concatenation = format!("{a_leading_quote}{a_body}{b_body}{a_trailing_quote}");
@@ -191,10 +193,10 @@ fn concatenate_strings(a_range: TextRange, b_range: TextRange, locator: &Locator
 
 /// Pads an octal at the end of the string
 /// to three digits, if necessary.
-fn normalize_ending_octal(text: &str) -> String {
+fn normalize_ending_octal(text: &mut Cow<'_, str>) {
     // Early return for short strings
     if text.len() < 2 {
-        return text.to_string();
+        return;
     }
 
     let mut rev_bytes = text.bytes().rev();
@@ -202,20 +204,19 @@ fn normalize_ending_octal(text: &str) -> String {
         // "\y" -> "\00y"
         if has_odd_consecutive_backslashes(&rev_bytes) {
             let prefix = &text[..text.len() - 2];
-            return format!("{prefix}\\00{}", last_byte as char);
+            *text = Cow::Owned(format!("{prefix}\\00{}", last_byte as char));
         }
         // "\xy" -> "\0xy"
-        if let Some(penultimate_byte @ b'0'..=b'7') = rev_bytes.next() {
+        else if let Some(penultimate_byte @ b'0'..=b'7') = rev_bytes.next() {
             if has_odd_consecutive_backslashes(&rev_bytes) {
                 let prefix = &text[..text.len() - 3];
-                return format!(
+                *text = Cow::Owned(format!(
                     "{prefix}\\0{}{}",
                     penultimate_byte as char, last_byte as char
-                );
+                ));
             }
         }
     }
-    text.to_string()
 }

I wonder if it's necessary to normalize the ending octal in the first string if the second string doesn't start with an octal digit. E.g. if I understand correctly, something like `"\12" "foo"` will be fixed as `"\012foo"` according to the logic in your PR -- but I think it's safe to not apply the normalization logic in this case, and instead fix it as `"\12foo"`? What do you think?
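A sketch of the check this suggestion implies, assuming the padding should run only when the second body begins with an octal digit (`0` through `7`). The helper name `starts_with_octal_digit` is hypothetical; the code that actually landed is not shown in this conversation.

```rust
/// Hypothetical helper (not from the PR): padding the trailing octal of the
/// first string is only needed when the second body could extend that escape.
fn starts_with_octal_digit(b_body: &str) -> bool {
    matches!(b_body.bytes().next(), Some(b'0'..=b'7'))
}
```

Under this check, `"\12" "foo"` and `"\12" "8"` would be merged without padding, while `"\12" "0"` would still be padded to `"\0120"`.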

crates/ruff_linter/src/rules/flake8_implicit_str_concat/rules/implicit.rs

dylwil3 · 2024-08-27T17:17:27Z

Re putting the Cows to pasture:

The borrow checker complained when I tried to implement the fix you suggested (assuming I did it correctly), because I borrow text here:

let mut rev_bytes = text.bytes().rev();

and then try to modify it with *text = Cow::Owned(...). I can try to mess around, but it's not immediately obvious how to avoid a clone/copy/some other memory allocation.

More fundamentally, I'm hoping you could help with a Rust confusion here. I would've thought that the .to_string wouldn't be much added cost because we eventually convert everything to a String (even on the happy path) here:

let concatenation = format!("{a_leading_quote}{a_body}{b_body}{a_trailing_quote}");

Maybe a more minimal version of my question is: Do you gain much in terms of memory/performance by doing

let a = &"xyz";
let s = format!("{a}");
// then a drops out of scope

vs

let a = "xyz".to_string();
let s = format!("{a}");
// then a drops out of scope

?

Or does the compiler end up making those roughly the same?

(Obviously the first code is better in this contrived example, but the question remains.)

Sorry for the long-ish question, and thanks for the very helpful review!

AlexWaygood · 2024-08-27T17:48:09Z

> Or does the compiler end up making those roughly the same?

Ummmm... I'm not entirely sure exactly what optimisations are permitted here! You may well be right that the compiler is smart enough to "see through" the allocation and optimise it away -- but I'm not sure if that's a permitted optimisation, or if it's one the compiler's smart enough to make. I could go and try to investigate exactly whether it does make this optimisation or not -- but I'm not sure the exact optimisations the compiler is likely to make here are things we should be relying on anyway ;) So I think it's better to use a Cow here!
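A small sketch of the difference under discussion, with hypothetical function names. As written, `to_string()` on a `&str` copies the slice into a new `String`, and `format!` then builds its own separate output `String`; `Cow::Borrowed` only wraps the existing slice. Whether an optimizer removes the extra copy in the owned version is not something this sketch demonstrates.

```rust
use std::borrow::Cow;

/// Copies the slice into a fresh `String` on every call.
fn body_owned(text: &str) -> String {
    text.to_string()
}

/// Wraps the existing slice without copying it.
fn body_borrowed(text: &str) -> Cow<'_, str> {
    Cow::Borrowed(text)
}

/// `format!` builds its own output `String` regardless of how the
/// inputs are stored, so both variants produce the same result.
fn concatenate(a_body: &str, b_body: &str) -> String {
    format!("{a_body}{b_body}")
}
```

Both `concatenate(&body_owned("xyz"), "0")` and `concatenate(&body_borrowed("xyz"), "0")` return `"xyz0"`; only the first creates an intermediate `String` in the source as written.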

AlexWaygood · 2024-08-27T17:50:16Z

> The borrow checker complained when I tried to implement the fix you suggested (assuming I did it correctly), because I borrow text here:

I pushed the change I was suggesting to your PR branch in 25805a2 :-) the key to avoiding the borrow-checker complaints is to have normalize_ending_octal() mutate the existing value in-place rather than returning a new value
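For reference, a self-contained sketch of the in-place pattern, assembled from the diff earlier in this thread. The body of `has_odd_consecutive_backslashes` and its by-value iterator signature are assumptions made for illustration, since the real helper is not shown in this conversation. The point is that each shared borrow (`rev_bytes`, `prefix`) is last used before the assignment to `*text` on its control-flow path.

```rust
use std::borrow::Cow;

/// Assumed helper: true if the iterator starts with an odd-length run of
/// backslashes, meaning the digits just consumed belong to an escape.
fn has_odd_consecutive_backslashes(rev_bytes: impl Iterator<Item = u8>) -> bool {
    rev_bytes.take_while(|&b| b == b'\\').count() % 2 == 1
}

/// Pads an octal escape at the end of the string to three digits,
/// allocating only when a change is actually needed.
fn normalize_ending_octal(text: &mut Cow<'_, str>) {
    // Early return for short strings
    if text.len() < 2 {
        return;
    }

    let mut rev_bytes = text.bytes().rev();
    if let Some(last_byte @ b'0'..=b'7') = rev_bytes.next() {
        // "\y" -> "\00y"
        if has_odd_consecutive_backslashes(rev_bytes.clone()) {
            let prefix = &text[..text.len() - 2];
            *text = Cow::Owned(format!("{prefix}\\00{}", last_byte as char));
        }
        // "\xy" -> "\0xy"
        else if let Some(penultimate_byte @ b'0'..=b'7') = rev_bytes.next() {
            if has_odd_consecutive_backslashes(rev_bytes.clone()) {
                let prefix = &text[..text.len() - 3];
                *text = Cow::Owned(format!(
                    "{prefix}\\0{}{}",
                    penultimate_byte as char, last_byte as char
                ));
            }
        }
    }
}
```

A body without a short trailing octal escape stays `Cow::Borrowed`, so the common case performs no allocation before the final `format!`.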

AlexWaygood

Thanks again!

dylwil3 · 2024-08-27T17:51:52Z

Makes sense, and thank you that was very helpful!

dylwil3 added 4 commits August 26, 2024 17:09

add test fixture

90a99ad

update rule

40856ad

update snapshots

5b23d3d

Merge branch 'main' into octal-escape

551ff7a

AlexWaygood reviewed Aug 27, 2024

dylwil3 added 4 commits August 27, 2024 11:40

update test fixture

c8b059b

only pad octal when second string starts with octal digit

1e3bdf7

update snapshot

426ff17

change signature for odd backslash check

30cce39

Avoid an allocation and add another edge-case test

25805a2

AlexWaygood approved these changes Aug 27, 2024

AlexWaygood added bug Something isn't working fixes Related to suggested fixes for violations labels Aug 27, 2024

AlexWaygood merged commit 483748c into astral-sh:main Aug 27, 2024
19 checks passed

dylwil3 deleted the octal-escape branch August 27, 2024 17:53

BrewTestBot mentioned this pull request Aug 29, 2024

ruff 0.6.3 Homebrew/homebrew-core#182869

Merged
