Black fails to tokenise files ending with a backslash #1012

Closed
Zac-HD opened this issue Sep 10, 2019 · 8 comments
Labels
C: parser How we parse code. Or fail to parse it.

Comments

@Zac-HD
Contributor

Zac-HD commented Sep 10, 2019

Given a file containing a backslash preceded and followed by any number of newlines, Black ae5588 and 19.3b0 raise blib2to3.pgen2.tokenize.TokenError: 'EOF in multi-line statement', (2, 0).

I consider this a bug because Python is perfectly happy to execute such files, doing nothing, and compile("\\", "<string>", "exec") also works:

>>> code = compile("\\", "<string>", "exec")  # or "\\\n", or "\n\\\n", etc.
>>> import dis; dis.dis(code)
  1           0 LOAD_CONST               0 (None)
              2 RETURN_VALUE

Like #970, I found this with Hypothesmith.

@Zac-HD Zac-HD changed the title from "Black fails to tokenise files containing a lone backslash" to "Black fails to tokenise files ending with a backslash" on Oct 29, 2019
@Zac-HD
Contributor Author

Zac-HD commented Oct 29, 2019

This is still present in Black 19.10b0 - it's a different bug to #922/#948; Python ignores a trailing backslash but Black chokes on it.

@jayaddison
Contributor

It looks like Python's built-in compile behaviour became stricter between py37 and py38; a trailing line-continuation backslash is no longer accepted.

Python 3.7.7 (default, Apr  1 2020, 13:48:52) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> compile('\\', '<STRING>', 'exec')
<code object <module> at 0x7f60bd565270, file "<STRING>", line 1>
>>>
Python 3.8.7 (default, Dec 22 2020, 10:37:26) 
[GCC 10.2.1 20201207] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> compile('\\', '<STRING>', 'exec')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<STRING>", line 1
    \
    ^
SyntaxError: unexpected EOF while parsing
>>> 

This could still be addressed; there's some work-in-progress included in #1961. Does anyone have suggestions on how best to proceed?
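
A minimal sketch for checking how a given interpreter treats these inputs. The helper name `compiles` is hypothetical, and the sample list simply collects the variants mentioned in this thread. The result depends on the Python version running it; the sessions above show 3.7.7 accepting `'\\'` and 3.8.7 rejecting it.

```python
import sys


def compiles(source: str) -> bool:
    """Return True if the running interpreter accepts `source` in exec mode.

    Hypothetical helper for illustration; not part of Black.
    """
    try:
        compile(source, "<string>", "exec")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    # Trailing-backslash variants taken from the comments in this issue.
    samples = ["\\", "\\\n", "\n\\\n", "\n\x0c\\\r\n"]
    print("Python", sys.version.split()[0])
    for src in samples:
        verdict = "accepted" if compiles(src) else "SyntaxError"
        print(f"{src!r}: {verdict}")
```

Running it under both py37 and py38 should make the behaviour change visible without involving Black at all.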

@Zac-HD
Contributor Author

Zac-HD commented Feb 15, 2021

"Ignore this until py37 reaches end of life" seems like a reasonable plan to me, and it's easy enough to adjust the tests accordingly.

@jayaddison
Contributor

Another example where this has surfaced during fuzzer testing, after merging #1991:

https://github.com/psf/black/pull/1958/checks?check_run_id=1945936278

Falsifying example: test_idempotent_any_syntatically_valid_python(
    src_contents='\n\x0c\\\r\n',
    mode=Mode(target_versions=set(), line_length=88, string_normalization=False, magic_trailing_comma=True, experimental_string_processing=False, is_pyi=False),
)

It might be possible to adjust the special case regular expression in the exception handler to permit this too. Perhaps we should also be a bit wary of getting into an attempt to detect a universe of valid-ish programs via a regex, though.

@Zac-HD
Contributor Author

Zac-HD commented Feb 21, 2021

Aw, heck. Form-feed (\x0c) is always tricky... see e.g. Instagram/LibCST#446.

I think we should just check "\\" in src_contents instead of using regex 😅

@jayaddison
Contributor

I think we should just check "\\" in src_contents instead of using regex

That's possible.. it seems like that might be quite permissive, though. That said, I suppose the EOF-in-multiline exception should be quite rare and selective.
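
To make the trade-off concrete, here is a rough sketch of the substring check suggested above. The helper name `is_trailing_backslash_failure` and the way it receives the error text are assumptions for illustration, not Black's actual test code; only the message 'EOF in multi-line statement' comes from this issue.

```python
def is_trailing_backslash_failure(src_contents: str, error_message: str) -> bool:
    """Hypothetical helper: decide whether a tokenizer failure may be ignored.

    The failure is tolerated only if the error is the EOF-in-multi-line one
    AND the source contains a backslash anywhere. No regex is involved, so
    form feeds and mixed newlines around the backslash do not matter.
    """
    return (
        "EOF in multi-line statement" in error_message
        and "\\" in src_contents
    )


if __name__ == "__main__":
    message = "('EOF in multi-line statement', (2, 0))"
    # The falsifying example reported by the fuzzer is covered.
    print(is_trailing_backslash_failure("\n\x0c\\\r\n", message))
    # The permissive side: a backslash inside a string literal also matches.
    print(is_trailing_backslash_failure('x = ("\\n"\n', message))
```

The second call shows why the check is permissive: the source has an unclosed parenthesis rather than a trailing backslash, yet it still passes because a backslash appears somewhere in the text.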

@jayaddison
Contributor

"Ignore this until py37 reaches end of life"

Just digging back through some old issue threads.. Py3.7 is EOL nowadays, so perhaps this issue can be closed? (backslash at end-of-file causes a black parser error -- and since Py3.8, the Python parser considers that invalid too)

@JelleZijlstra
Collaborator

I like it when the universe fixes the bug for you.

6 participants