AST parsing fails on non-utf8 setup.py files #205

Closed
techalchemy opened this issue Mar 6, 2020 · 0 comments · Fixed by #210 or #211
Labels
bug Something isn't working

Comments

@techalchemy
Member

The AST parser fails with a SyntaxError on setup.py files that are not plain utf-8. One package I've encountered with this issue is azure-storage:

>>> ast_parse_setup_py("/tmp/pkgs/azure-storage-0.36.0/setup.py")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/techalchemy/.virtualenvs/tempenv-74f7279776e1c/lib/python3.8/site-packages/requirementslib/models/setup_info.py", line 827, in ast_parse_setup_py
    ast_analyzer = ast_parse_file(path)
  File "/home/techalchemy/.virtualenvs/tempenv-74f7279776e1c/lib/python3.8/site-packages/requirementslib/models/setup_info.py", line 819, in ast_parse_file
    tree = ast.parse(read_source(path))
  File "/home/techalchemy/.pyenv/versions/3.8.1/lib/python3.8/ast.py", line 47, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 1
    #!/usr/bin/env python
    ^
SyntaxError: invalid character in identifier

invalid character in identifier (<unknown>, line 1)

I checked the encoding of that file with chardet, which reports:

>>> import chardet
>>> with open("/tmp/pkgs/azure-storage-0.36.0/setup.py", "rb") as fh:
...     encoding = chardet.detect(fh.read())
...
>>> encoding
{'encoding': 'UTF-8-SIG', 'confidence': 1.0, 'language': ''}
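
For context, UTF-8-SIG is UTF-8 preceded by a byte order mark (BOM). A small illustration, using made-up bytes rather than the real azure-storage file, of how the two codecs treat that mark:

```python
import codecs

# Illustrative bytes only: a BOM followed by a shebang line.
raw = codecs.BOM_UTF8 + b"#!/usr/bin/env python\n"

as_utf8 = raw.decode("utf-8")          # BOM survives as U+FEFF
as_utf8_sig = raw.decode("utf-8-sig")  # BOM is stripped

print(repr(as_utf8[0]))      # '\ufeff'
print(repr(as_utf8_sig[0]))  # '#'
```

If the source is decoded as plain utf-8, the leading U+FEFF ends up in front of the shebang, which would be consistent with the caret pointing at line 1 in the traceback above.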

So it's not safe to assume these files are plain utf-8 (this one carries a BOM), and we should build in a fallback.
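
One possible shape for such a fallback, as a rough sketch only: the actual fix landed in #210/#211 and is not shown in this thread. The helper names `read_source_with_fallback` and `parse_setup_py`, and the list of fallback encodings, are assumptions made for illustration. It uses only the standard library (`tokenize.open`, which honours a BOM or a PEP 263 coding cookie) rather than chardet.

```python
import ast
import tokenize


def read_source_with_fallback(path, fallback_encodings=("utf-8-sig", "latin-1")):
    """Hypothetical helper: return the decoded text of a Python source file."""
    try:
        # tokenize.open() detects the encoding from a BOM or coding cookie
        # and falls back to utf-8 when neither is present.
        with tokenize.open(path) as fh:
            return fh.read()
    except (SyntaxError, UnicodeDecodeError):
        # Detection failed or the content did not decode; try explicit codecs.
        pass
    with open(path, "rb") as fh:
        raw = fh.read()
    for encoding in fallback_encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort (assumption): keep going with replacement characters.
    return raw.decode("utf-8", errors="replace")


def parse_setup_py(path):
    """Hypothetical wrapper: parse a setup.py into an AST."""
    return ast.parse(read_source_with_fallback(path), filename=path)
```

A chardet-based guess could be slotted in before the hard-coded fallback encodings; the stdlib route is shown here only because it needs no extra dependency.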

This is a blocker on pypa/pipenv#3369

techalchemy added the bug label Mar 6, 2020
techalchemy added a commit that referenced this issue Mar 11, 2020
- Add encoding detection fallback for ast parsing of setup.py
- Fixes #205

Signed-off-by: Dan Ryan <dan.ryan@canonical.com>
techalchemy added a commit that referenced this issue Mar 11, 2020
- Add encoding detection fallback for ast parsing of setup.py
- Fixes #205

Signed-off-by: Dan Ryan <dan.ryan@canonical.com>
techalchemy added a commit that referenced this issue Mar 31, 2020
- Add encoding detection fallback for ast parsing of setup.py
- Fixes #205

Signed-off-by: Dan Ryan <dan.ryan@canonical.com>