Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

UP037 fix crashes when a terminal comment follows a literal #15816

Closed
dscorbett opened this issue Jan 29, 2025 · 4 comments · Fixed by #15824
Closed

UP037 fix crashes when a terminal comment follows a literal #15816

dscorbett opened this issue Jan 29, 2025 · 4 comments · Fixed by #15824
Assignees
Labels
bug Something isn't working fixes Related to suggested fixes for violations

Comments

@dscorbett
Copy link

Description

The fix for quoted-annotation (UP037) crashes Ruff 0.9.3 when a quoted annotation ends with a comment and contains a string or number literal, which can occur with Annotated or Literal.

$ cat >up037.py <<'# EOF'
from __future__ import annotations
from typing import Literal
def f() -> "Literal[0]#":
    return 0
# EOF

$ ruff check --isolated --select UP037 up037.py --diff

error: Fix introduced a syntax error. Reverting all changes.

This indicates a bug in Ruff. If you could open an issue at:

    https://github.com/astral-sh/ruff/issues/new?title=%5BFix%20error%5D

...quoting the contents of `up037.py`, the rule codes UP037, along with the `pyproject.toml` settings and executed command, we'd be very appreciative!

@charliermarsh
Copy link
Member

Why is "contains a string or number literal" required?

@ntBre
Copy link
Contributor

ntBre commented Jan 29, 2025

It looks like a bug with SimpleTokenizer. These are the tokens when the subscript is an identifier (annotation like "Literal[X]#"):

[
    SimpleToken {
        kind: Name,
        range: 0..7,
    },
    SimpleToken {
        kind: LBracket,
        range: 7..8,
    },
    SimpleToken {
        kind: Name,
        range: 8..9,
    },
    SimpleToken {
        kind: RBracket,
        range: 9..10,
    },
    SimpleToken {
        kind: Comment,
        range: 10..11,
    },
]

and this is for the original example:

[
    SimpleToken {
        kind: Name,
        range: 0..7,
    },
    SimpleToken {
        kind: LBracket,
        range: 7..8,
    },
    SimpleToken {
        kind: Other,
        range: 8..9,
    },
    SimpleToken {
        kind: Bogus,
        range: 9..11,
    },
]

@dylwil3 dylwil3 added the bug Something isn't working label Jan 29, 2025
@dylwil3
Copy link
Collaborator

dylwil3 commented Jan 29, 2025

It looks like a bug with SimpleTokenizer. These are the tokens when the subscript is an identifier (annotation like "Literal[X]#"):

Not sure if that's a bug or the intended scope for SimpleTokenizer.

In any event: we might be able to avoid tokenizing anything, I think, because the tokens for the annotation are available as checker.tokens(). In the above example you get:

Tokens {
    raw: [
        Name 76..83,
        Lsqb 83..84,
        Int 84..85,
        Rsqb 85..86,
        Comment 86..87,
        Newline 87..87,
    ],
}

(not sure why there's a newline but otherwise looks correct)

@AlexWaygood AlexWaygood added the fixes Related to suggested fixes for violations label Jan 29, 2025
@InSyncWithFoo
Copy link
Contributor

InSyncWithFoo commented Jan 29, 2025

not sure why there's a newline but otherwise looks correct

I think that's a logical newline:

// https://github.com/astral-sh/ruff/blob/3125332ec1/crates/ruff_python_parser/src/token.rs#L130-L134
pub enum TokenKind {
    // ...

    /// Token kind for a newline.
    Newline,
    /// Token kind for a newline that is not a logical line break. These are filtered out of
    /// the token stream prior to parsing.
    NonLogicalNewline,

    // ...
}

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working fixes Related to suggested fixes for violations
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants