Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[flake8-implicit-str-concat] Normalize octals before merging concatenated strings in single-line-implicit-string-concatenation (ISC001) #13118

Merged
merged 9 commits into from
Aug 27, 2024

Conversation

dylwil3
Copy link
Collaborator

@dylwil3 dylwil3 commented Aug 26, 2024

This PR pads the last octal (if there is one) to three digits before concatenating strings in the fix for ISC001.

For example: "\12""0" is fixed to "\0120".

Closes #12936.

Copy link
Contributor

github-actions bot commented Aug 26, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very clever. Good job! Two small points:

  1. The .to_string() call on line 175 will clone the underlying data, which could be quite costly -- and is unnecessary in the happy path, since most strings don't have octal escapes in them. We can avoid this by using a Cow -- something like this? (The diff is relative to your branch)
--- a/crates/ruff_linter/src/rules/flake8_implicit_str_concat/rules/implicit.rs
+++ b/crates/ruff_linter/src/rules/flake8_implicit_str_concat/rules/implicit.rs
@@ -1,3 +1,5 @@
+use std::borrow::Cow;
+
 use itertools::Itertools;
 
 use ruff_diagnostics::{Diagnostic, Edit, Fix, FixAvailability, Violation};
@@ -173,11 +175,11 @@ fn concatenate_strings(a_range: TextRange, b_range: TextRange, locator: &Locator
     }
 
     let mut a_body =
-        a_text[a_leading_quote.len()..a_text.len() - a_trailing_quote.len()].to_string();
+        Cow::Borrowed(&a_text[a_leading_quote.len()..a_text.len() - a_trailing_quote.len()]);
     let b_body = &b_text[b_leading_quote.len()..b_text.len() - b_trailing_quote.len()];
 
     if a_leading_quote.find(['r', 'R']).is_none() {
-        a_body = normalize_ending_octal(&a_body);
+        normalize_ending_octal(&mut a_body);
     }
 
     let concatenation = format!("{a_leading_quote}{a_body}{b_body}{a_trailing_quote}");
@@ -191,10 +193,10 @@ fn concatenate_strings(a_range: TextRange, b_range: TextRange, locator: &Locator
 
 /// Pads an octal at the end of the string
 /// to three digits, if necessary.
-fn normalize_ending_octal(text: &str) -> String {
+fn normalize_ending_octal(text: &mut Cow<'_, str>) {
     // Early return for short strings
     if text.len() < 2 {
-        return text.to_string();
+        return;
     }
 
     let mut rev_bytes = text.bytes().rev();
@@ -202,20 +204,19 @@ fn normalize_ending_octal(text: &str) -> String {
         // "\y" -> "\00y"
         if has_odd_consecutive_backslashes(&rev_bytes) {
             let prefix = &text[..text.len() - 2];
-            return format!("{prefix}\\00{}", last_byte as char);
+            *text = Cow::Owned(format!("{prefix}\\00{}", last_byte as char));
         }
         // "\xy" -> "\0xy"
-        if let Some(penultimate_byte @ b'0'..=b'7') = rev_bytes.next() {
+        else if let Some(penultimate_byte @ b'0'..=b'7') = rev_bytes.next() {
             if has_odd_consecutive_backslashes(&rev_bytes) {
                 let prefix = &text[..text.len() - 3];
-                return format!(
+                *text = Cow::Owned(format!(
                     "{prefix}\\0{}{}",
                     penultimate_byte as char, last_byte as char
-                );
+                ));
             }
         }
     }
-    text.to_string()
 }
  1. I wonder if it's necessary to normalize the ending octal in the first string if the second string doesn't start with a digit. E.g. if I understand correctly, something like "\12" "foo" will be fixed as "\012foo" according to the logic in your PR -- but I think it's safe to not apply the normalization logic in this case, and instead fix it as \12foo? What do you think?

@dylwil3
Copy link
Collaborator Author

dylwil3 commented Aug 27, 2024

Re putting the Cows to pasture:

The borrow checker complained when I tried to implement the fix you suggested (assuming I did it correctly), because I borrow text here:

let mut rev_bytes = text.bytes().rev();

and then try to modify it with *text = Cow::Owned(...). I can try to mess around, but it's not immediately obvious how to avoid a clone/copy/some other memory allocation.

More fundamentally, I'm hoping you could help with a Rust confusion here. I would've thought that the .to_string wouldn't be much added cost because we eventually convert everything to a String (even on the happy path) here:

let concatenation = format!("{a_leading_quote}{a_body}{b_body}{a_trailing_quote}");

Maybe a more minimal version of my question is: Do you gain much in terms or memory/performance by doing

let a = &"xyz";
let s = format!("{a}")
// then a drops out of scope

vs

let a = "xyz".to_string();
let s = format!("{a}")
// then a drops out of scope

?

Or does the compiler end up making those roughly the same?

(Obviously the first code is better in this contrived example, but the question remains.)

Sorry for the long-ish question, and thanks for the very helpful review!

@AlexWaygood
Copy link
Member

Or does the compiler end up making those roughly the same?

Ummmm... I'm not entirely sure exactly what optimisations are permitted here! You may well be right that the compiler is smart enough to "see through" the allocation and optimise it away -- but I'm not sure if that's a permitted optimisation, or if it's one the compiler's smart enough to make. I could go and try to investigate exactly whether it does make this optimisation or not -- but I'm not sure the exact optimisations the compiler is likely to make here are things we should be relying on anyway ;) So I think it's better to use a Cow here!

@AlexWaygood
Copy link
Member

AlexWaygood commented Aug 27, 2024

The borrow checker complained when I tried to implement the fix you suggested (assuming I did it correctly), because I borrow text here:

I pushed the change I was suggesting to your PR branch in 25805a2 :-) the key to avoiding the borrow-checker complaints is to have normalize_ending_octal() mutate the existing value in-place rather than returning a new value

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again!

@dylwil3
Copy link
Collaborator Author

dylwil3 commented Aug 27, 2024

Makes sense, and thank you that was very helpful!

@AlexWaygood AlexWaygood changed the title [flake8-implicit-str-concat] Normalize octals before concatenating strings in single-line-implicit-string-concatenation (ISC001) [flake8-implicit-str-concat] Normalize octals before merging concatenated strings in single-line-implicit-string-concatenation (ISC001) Aug 27, 2024
@AlexWaygood AlexWaygood added bug Something isn't working fixes Related to suggested fixes for violations labels Aug 27, 2024
@AlexWaygood AlexWaygood merged commit 483748c into astral-sh:main Aug 27, 2024
19 checks passed
@dylwil3 dylwil3 deleted the octal-escape branch August 27, 2024 17:53
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working fixes Related to suggested fixes for violations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ISC001 fix can modify octal escape sequences
2 participants