
Enable token-based rules on source with syntax errors #11950

Merged · 1 commit · Jul 2, 2024

Conversation

@dhruvmanila (Member) commented Jun 20, 2024

Summary

This PR updates the linter, specifically the token-based rules, to work on the tokens that come after a syntax error.

For context, the token-based rules previously only diagnosed tokens up to the first lexical error. This PR builds in error resilience by introducing a `TokenIterWithContext`, which tracks the nesting level and tries to mirror what the lexer sees. This isn't 100% accurate: if the parser recovers from an unclosed parenthesis in the middle of a line, the context won't reduce the nesting level until it sees the newline token at the end of that line.
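The nesting-tracking idea can be sketched as follows. This is a hypothetical, simplified stand-in for `TokenIterWithContext` — the names mirror the PR, but the types and token kinds here are illustrative, not Ruff's actual API:

```rust
// Hypothetical sketch: an iterator over tokens that tracks parenthesis
// nesting and resets it when a logical newline appears while nested,
// which signals that the parser recovered from an unclosed parenthesis.
// All types are simplified stand-ins, not Ruff's real implementation.

#[derive(Debug, Clone, Copy, PartialEq)]
enum TokenKind {
    Lpar,
    Rpar,
    // Logical newline; the lexer only emits these at nesting level 0, so
    // seeing one while we believe we're nested means error recovery.
    Newline,
    Name,
}

struct TokenIterWithContext<'a> {
    tokens: std::slice::Iter<'a, TokenKind>,
    nesting: u32,
}

impl<'a> TokenIterWithContext<'a> {
    fn new(tokens: &'a [TokenKind]) -> Self {
        Self {
            tokens: tokens.iter(),
            nesting: 0,
        }
    }

    fn in_parenthesized_context(&self) -> bool {
        self.nesting > 0
    }
}

impl<'a> Iterator for TokenIterWithContext<'a> {
    type Item = &'a TokenKind;

    fn next(&mut self) -> Option<Self::Item> {
        let token = self.tokens.next()?;
        match token {
            TokenKind::Lpar => self.nesting += 1,
            TokenKind::Rpar => self.nesting = self.nesting.saturating_sub(1),
            // A logical newline while nested: the parser recovered from an
            // unclosed parenthesis, so reset the context.
            TokenKind::Newline if self.nesting > 0 => self.nesting = 0,
            _ => {}
        }
        Some(token)
    }
}
```

As the summary notes, this is only an approximation: the reset happens at the newline, not at the exact point of recovery mid-line.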

resolves: #11915

Test Plan

  • Add test cases for a bunch of rules that are affected by this change.
  • Run the fuzzer for an extended period and fix any bugs it surfaces.

@dhruvmanila dhruvmanila added the rule Implementing or modifying a lint rule label Jun 20, 2024
@dhruvmanila dhruvmanila force-pushed the dhruv/token-rules-with-syntax-errors branch from b7134c9 to 7db979b Compare June 20, 2024 12:16
Comment on lines -96 to 97

```diff
-for token in tokens.up_to_first_unknown() {
+for token in tokens {
     pylint::rules::invalid_string_characters(
```
dhruvmanila (Member, Author):

This rule looks at string tokens, and the lexer doesn't emit a string token if it's unterminated. So we might get away with not doing anything in this case for now.

Comment on lines 24 to 29

```diff
 impl<'a> DocLines<'a> {
     fn new(tokens: &'a Tokens) -> Self {
         Self {
-            inner: tokens.up_to_first_unknown().iter(),
+            inner: tokens.iter(),
             prev: TextSize::default(),
         }
```
dhruvmanila (Member, Author):

This extracts a specific set of comments, so it doesn't require any update.

Comment on lines 256 to 260

```diff
-            _ => {
+            kind => {
+                if matches!(kind, TokenKind::Newline if fstrings > 0) {
+                    // The parser recovered from an unterminated f-string.
+                    fstrings = 0;
+                }
```
dhruvmanila (Member, Author):

I think this should work because the newline tokens within f-strings are actually `NonLogicalNewline`. I'll move this into `TokenIterWithContext` and test it thoroughly, since f-strings are complex.

@dhruvmanila dhruvmanila force-pushed the dhruv/token-rules-with-syntax-errors branch from 7db979b to 1961406 Compare June 28, 2024 05:58
Comment on lines +110 to -115

```diff
 for token in tokens {
-    match token.kind() {
-        TokenKind::EndOfFile => {
-            break;
-        }
```

dhruvmanila (Member, Author):

The token stream doesn't contain the EndOfFile token.

github-actions bot commented Jun 28, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check encountered linter errors. (no lint changes; 1 project error)

demisto/content (error)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

warning: The top-level linter settings are deprecated in favour of their counterparts in the `lint` section. Please update the following options in `pyproject.toml`:
  - 'ignore' -> 'lint.ignore'
  - 'select' -> 'lint.select'
  - 'unfixable' -> 'lint.unfixable'
  - 'per-file-ignores' -> 'lint.per-file-ignores'
warning: `PGH001` has been remapped to `S307`.
warning: `PGH002` has been remapped to `G010`.
warning: `PLR1701` has been remapped to `SIM101`.
ruff failed
  Cause: Selection of deprecated rule `E999` is not allowed when preview is enabled.

Formatter (stable)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

warning: Detected debug build without --no-cache.
error: Failed to parse examples/gpt_actions_library/.gpt_action_getting_started.ipynb:11:1:1: Expected an expression
error: Failed to parse examples/gpt_actions_library/gpt_action_bigquery.ipynb:13:1:1: Expected an expression

Formatter (preview)

ℹ️ ecosystem check encountered format errors. (no format changes; 2 project errors)

demisto/content (error)

ruff format --preview --exclude Packs/ThreatQ/Integrations/ThreatQ/ThreatQ.py

warning: The top-level linter settings are deprecated in favour of their counterparts in the `lint` section. Please update the following options in `pyproject.toml`:
  - 'ignore' -> 'lint.ignore'
  - 'select' -> 'lint.select'
  - 'unfixable' -> 'lint.unfixable'
  - 'per-file-ignores' -> 'lint.per-file-ignores'
warning: `PGH001` has been remapped to `S307`.
warning: `PGH002` has been remapped to `G010`.
warning: `PLR1701` has been remapped to `SIM101`.
ruff failed
  Cause: Selection of deprecated rule `E999` is not allowed when preview is enabled.

openai/openai-cookbook (error)

ruff format --preview

warning: Detected debug build without --no-cache.
error: Failed to parse examples/gpt_actions_library/.gpt_action_getting_started.ipynb:11:1:1: Expected an expression
error: Failed to parse examples/gpt_actions_library/gpt_action_bigquery.ipynb:13:1:1: Expected an expression

@dhruvmanila dhruvmanila force-pushed the dhruv/token-rules-with-syntax-errors branch from 6e96839 to f3bbacd Compare June 28, 2024 11:26
@dhruvmanila dhruvmanila changed the base branch from main to dhruv/revert June 28, 2024 11:26
@dhruvmanila dhruvmanila force-pushed the dhruv/token-rules-with-syntax-errors branch from f3bbacd to 7b42997 Compare June 28, 2024 11:27
dhruvmanila added a commit that referenced this pull request Jun 28, 2024
This PR reverts #12016 with a
small change where the error location points to the continuation
character only. Earlier, it would also highlight the whitespace that
came before it.

The motivation for this change is to avoid a panic in
#11950. For example:

```py
\)
```

Playground: https://play.ruff.rs/87711071-1b54-45a3-b45a-81a336a1ea61

The ranges of the `Unknown` token and the `Rpar` token are the same. Once #11950 is
enabled, the indexer would panic. It doesn't panic on stable
because we stop at the first `Unknown` token.
Base automatically changed from dhruv/revert to main June 28, 2024 12:40
@dhruvmanila dhruvmanila force-pushed the dhruv/token-rules-with-syntax-errors branch from 7b42997 to eeb24b1 Compare June 28, 2024 13:51
@dhruvmanila dhruvmanila changed the base branch from main to dhruv/disable-autofix July 1, 2024 11:18
@dhruvmanila dhruvmanila force-pushed the dhruv/token-rules-with-syntax-errors branch from eeb24b1 to 4019ca4 Compare July 1, 2024 11:18
codspeed-hq bot commented Jul 1, 2024

CodSpeed Performance Report

Merging #11950 will not alter performance

Comparing dhruv/token-rules-with-syntax-errors (2e932e3) with main (88a4cc4)

Summary

✅ 30 untouched benchmarks

@dhruvmanila dhruvmanila force-pushed the dhruv/token-rules-with-syntax-errors branch from 4019ca4 to 27f494e Compare July 1, 2024 11:40
@dhruvmanila dhruvmanila force-pushed the dhruv/token-rules-with-syntax-errors branch 2 times, most recently from 6bb916f to 85baab7 Compare July 1, 2024 14:38
@dhruvmanila dhruvmanila marked this pull request as ready for review July 1, 2024 14:39
@MichaReiser (Member) left a comment:

This is nice!

```rust
if !line_is_comment_only {
    self.max_preceding_blank_lines = BlankLines::Zero;
}
if kind.is_any_newline() && !self.tokens.in_parenthesized_context() {
```
MichaReiser (Member):

This is an improvement even without the error recoverability :)

Comment on lines +562 to +564

```diff
+TokenKind::Newline if self.nesting > 0 => {
+    self.nesting = 0;
+}
```
MichaReiser (Member):

That's simpler than I expected. Nice

dhruvmanila added a commit that referenced this pull request Jul 2, 2024
## Summary

This PR updates Ruff to **not** generate auto-fixes if the source code
contains syntax errors as determined by the parser.

The main motivation is to avoid an infinite autofix loop when
the token-based rules run over source with syntax errors in
#11950.

Even after this change, an infinite autofix loop is still possible if a
fix's logic is incorrect; see
#12094 and
#12136 for examples.

This requires updating the test infrastructure to skip validating fix
availability when the source contains syntax errors; otherwise the
fuzzer, which uses the test function to run the linter and validate the
source code, might fail.

resolves: #11455 

## Test Plan

`cargo insta test`
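The guard described in the commit message above can be sketched minimally. All types here are simplified stand-ins for illustration, not Ruff's actual API:

```rust
// Hypothetical sketch: when the parse produced syntax errors, keep the
// diagnostics but drop their fixes, so the autofix loop never rewrites
// source based on recovered (possibly incorrect) tokens.

struct ParseError;

struct Fix;

struct Diagnostic {
    message: String,
    fix: Option<Fix>,
}

fn strip_fixes_on_syntax_errors(
    mut diagnostics: Vec<Diagnostic>,
    parse_errors: &[ParseError],
) -> Vec<Diagnostic> {
    if !parse_errors.is_empty() {
        // Source has syntax errors: report diagnostics, but without fixes.
        for diagnostic in &mut diagnostics {
            diagnostic.fix = None;
        }
    }
    diagnostics
}
```

The design choice is deliberately conservative: diagnostics still surface on broken source, but no edits are applied until the file parses cleanly.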
Base automatically changed from dhruv/disable-autofix to main July 2, 2024 08:52
@dhruvmanila dhruvmanila force-pushed the dhruv/token-rules-with-syntax-errors branch from 85baab7 to 2e932e3 Compare July 2, 2024 08:53
@dhruvmanila dhruvmanila enabled auto-merge (squash) July 2, 2024 08:56
@dhruvmanila dhruvmanila merged commit 8f40928 into main Jul 2, 2024
19 checks passed
@dhruvmanila dhruvmanila deleted the dhruv/token-rules-with-syntax-errors branch July 2, 2024 08:57
Successfully merging this pull request may close these issues.

Allow token-based rules to work on source code with syntax errors