The lines in tokens from tokenize.generate_tokens incorrectly indicate multiple lines. #104972

Closed
nedbat opened this issue May 26, 2023 · 6 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

@nedbat
Member

nedbat commented May 26, 2023

The line attribute in tokens returned by tokenize.generate_tokens incorrectly indicates multiple lines. Tokens should satisfy an invariant: using the .start and .end attributes to index into the .line attribute produces the .string attribute.

tokbug.py:

import io
import sys
import tokenize

SOURCE = r"""
a + \
b
"""

print(sys.version)
readline = io.StringIO(SOURCE).readline
for tok in tokenize.generate_tokens(readline):
    correct = (tok.string) == (tok.line[tok.start[1]: tok.end[1]])
    print(tok, "" if correct else "<*****!!!")

Run with 3.12.0a7:

% /usr/local/pyenv/pyenv/versions/3.12.0a7/bin/python tokbug.py
3.12.0a7 (main, Apr  5 2023, 05:51:58) [Clang 14.0.3 (clang-1403.0.22.14.1)]
TokenInfo(type=62 (NL), string='\n', start=(1, 0), end=(1, 1), line='\n')
TokenInfo(type=1 (NAME), string='a', start=(2, 0), end=(2, 1), line='a + \\\n')
TokenInfo(type=54 (OP), string='+', start=(2, 2), end=(2, 3), line='a + \\\n')
TokenInfo(type=1 (NAME), string='b', start=(3, 0), end=(3, 1), line='b\n')
TokenInfo(type=4 (NEWLINE), string='\n', start=(3, 1), end=(3, 2), line='b\n')
TokenInfo(type=0 (ENDMARKER), string='', start=(4, 0), end=(4, 0), line='')

Run with 3.12.0b1:

% /usr/local/pyenv/pyenv/versions/3.12.0b1/bin/python tokbug.py
3.12.0b1 (main, May 23 2023, 16:19:59) [Clang 14.0.3 (clang-1403.0.22.14.1)]
TokenInfo(type=65 (NL), string='\n', start=(1, 0), end=(1, 1), line='\n')
TokenInfo(type=1 (NAME), string='a', start=(2, 0), end=(2, 1), line='a + \\\n')
TokenInfo(type=55 (OP), string='+', start=(2, 2), end=(2, 3), line='a + \\\n')
TokenInfo(type=1 (NAME), string='b', start=(3, 0), end=(3, 1), line='a + \\\nb\n') <*****!!!
TokenInfo(type=4 (NEWLINE), string='\n', start=(3, 1), end=(3, 2), line='a + \\\nb\n') <*****!!!
TokenInfo(type=0 (ENDMARKER), string='', start=(4, 0), end=(4, 0), line='')

Related to #104825? cc @pablogsal
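The invariant described above can be sketched as a reusable check. This is only an illustration: the helper name line_invariant_violations is hypothetical and not part of tokenize, and the sketch assumes that tokens spanning several rows (such as triple-quoted strings) should be skipped, since column offsets alone cannot recover them from .line.

```python
import io
import tokenize


def line_invariant_violations(source):
    """Return the single-row tokens whose .line slice differs from .string.

    Hypothetical helper for illustration; not part of the tokenize module.
    """
    readline = io.StringIO(source).readline
    violations = []
    for tok in tokenize.generate_tokens(readline):
        # Assumption: a token that starts and ends on different rows cannot
        # be checked with column offsets only, so it is skipped here.
        if tok.start[0] != tok.end[0]:
            continue
        if tok.line[tok.start[1]:tok.end[1]] != tok.string:
            violations.append(tok)
    return violations


if __name__ == "__main__":
    # Same kind of input as tokbug.py: a backslash continuation.
    for tok in line_invariant_violations("a + \\\nb\n"):
        print(tok)
```

On an affected build the loop prints the offending tokens; on a build where the invariant holds it prints nothing.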


nedbat added the type-bug label May 26, 2023
@nedbat
Member Author

nedbat commented May 26, 2023

The tip of 3.12 shows one more style of change, due to omitting the newlines from .line:

3.12.0b1+ (heads/3.12:6324458bef, May 26 2023, 06:25:21) [Clang 14.0.3 (clang-1403.0.22.14.1)]
TokenInfo(type=65 (NL), string='\n', start=(1, 0), end=(1, 1), line='') <*****!!!
TokenInfo(type=1 (NAME), string='a', start=(2, 0), end=(2, 1), line='a + \\')
TokenInfo(type=55 (OP), string='+', start=(2, 2), end=(2, 3), line='a + \\')
TokenInfo(type=1 (NAME), string='b', start=(3, 0), end=(3, 1), line='a + \\\nb') <*****!!!
TokenInfo(type=4 (NEWLINE), string='\n', start=(3, 1), end=(3, 2), line='a + \\\nb') <*****!!!
TokenInfo(type=0 (ENDMARKER), string='', start=(4, 0), end=(4, 0), line='')

@pablogsal
Member

CC: @mgmacias95

pablogsal added a commit to pablogsal/cpython that referenced this issue May 26, 2023
pablogsal added a commit to pablogsal/cpython that referenced this issue May 26, 2023
…e module are correct

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 26, 2023
…e module are correct (pythonGH-104975)

(cherry picked from commit 3fdb55c)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
@pablogsal
Member

@nedbat can you check with main now?

terryjreedy pushed a commit that referenced this issue May 26, 2023
…ze module are correct (GH-104975) (#104982)

gh-104972: Ensure that line attributes in tokens in the tokenize module are correct (GH-104975)
(cherry picked from commit 3fdb55c)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
@terryjreedy
Member

If you set automerge on the backport, the usual test-hypothesis failure disabled it. I merged it.

@nedbat
Member Author

nedbat commented May 26, 2023

@pablogsal Gorgeous! Thanks for the quick turnaround.

@pablogsal
Member

> If you set automerge on the backport, the usual test-hypothesis failure disabled it. I merged it.

Thanks a lot @terryjreedy !
