vmprofshow breaks on multiline dictionary comprehensions #118

Closed
jogo opened this issue Feb 17, 2017 · 5 comments · Fixed by #119

Comments

jogo (Contributor) commented Feb 17, 2017

Using CPython 2.7

test.py:

def foo():
    a = {
        i: i**i
        for i in range(10000)}
    print "END"

foo()
$ python -m vmprof --lines -o test.prof test.py
END
$ vmprofshow --lines test.prof
Total hits: 1480 s
File: test.py
Function: <dictcomp> at line 3
Traceback (most recent call last):
  File "/usr/local/bin/vmprofshow", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/vmprof/show.py", line 159, in main
    pp.show(args.profile)
  File "/usr/local/lib/python2.7/dist-packages/vmprof/show.py", line 182, in show
    self.show_func(filename, funline, funname, line_stats)
  File "/usr/local/lib/python2.7/dist-packages/vmprof/show.py", line 234, in show_func
    sublines = inspect.getblock(all_lines[start_lineno-1:])
  File "/usr/lib/python2.7/inspect.py", line 677, in getblock
    tokenize.tokenize(iter(lines).next, blockfinder.tokeneater)
  File "/usr/lib/python2.7/tokenize.py", line 169, in tokenize
    tokenize_loop(readline, tokeneater)
  File "/usr/lib/python2.7/tokenize.py", line 175, in tokenize_loop
    for token_info in generate_tokens(readline):
  File "/usr/lib/python2.7/tokenize.py", line 357, in generate_tokens
    raise TokenError, ("EOF in multi-line statement", (lnum, 0))
tokenize.TokenError: ('EOF in multi-line statement', (6, 0))

Specifically, the line i: i**i takes the most time. But if I handle the tokenize error the same way as a missing file, I get:

Total hits: 1480 s
File: test.py
Function: <dictcomp> at line 3

Line #     Hits   % Hits  Line Contents
=======================================
     3        1      0.1
     4     1479     99.9

Total hits: 1481 s
File: test.py
Function: <module> at line 1

Line #     Hits   % Hits  Line Contents
=======================================
     1                    def foo():
     2                        a = {
     3                            i: i**i
     4                            for i in range(10000)}
     5                        print "END"

Total hits: 2 s

Could not find file -
Are you sure you are running this program from the same directory
that you ran the profiler from?
Continuing without the function's contents.

Line #     Hits   % Hits  Line Contents
=======================================
     0        2    100.0

Total hits: 1480 s
File: test.py
Function: foo at line 1

Line #     Hits   % Hits  Line Contents
=======================================
     1                    def foo():
     2                        a = {
     3                            i: i**i
     4     1480    100.0          for i in range(10000)}
     5                        print "END"

Total hits: 1 s

I tried playing around with inspect.getblock on the dictionary comprehension by hand and was unable to get it to work properly.

So I am not sure that multiline dictionary comprehensions are measured correctly per line.

At the very least we can prevent vmprofshow --lines test.prof from crashing by handling the tokenize error in vmprof.show.
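A minimal sketch of that fallback, as a hypothetical helper around the inspect.getblock call in show_func (names here are illustrative, not vmprof's actual code):

import inspect
import tokenize

def get_block_or_none(all_lines, start_lineno):
    # inspect.getblock tokenizes from start_lineno onward; for a
    # <dictcomp> the opening "{" sits on an earlier line, so tokenize
    # hits EOF inside the unbalanced braces and raises TokenError.
    try:
        return inspect.getblock(all_lines[start_lineno - 1:])
    except tokenize.TokenError:
        # Fall back to "no source available", the same way show_func
        # handles a missing file, instead of crashing.
        return None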

jogo (Contributor, Author) commented Feb 22, 2017

diff --git a/vmprof/show.py b/vmprof/show.py
index 68d1701..e35a456 100644
--- a/vmprof/show.py
+++ b/vmprof/show.py
@@ -224,7 +224,7 @@ class LinesPrinter(object):
             return

         stream.write("Total hits: %g s\n" % total_hits)
-        if os.path.exists(filename) or filename.startswith("<ipython-input-"):
+        if (os.path.exists(filename) or filename.startswith("<ipython-input-")) and func_name != "<dictcomp>":
             stream.write("File: %s\n" % filename)
             stream.write("Function: %s at line %s\n" % (func_name, start_lineno))
             if os.path.exists(filename):

Fixes the issue for me, although there may be a better way.

planrich (Contributor) commented:

I think your patch just filters out dict comprehensions; the error points to the underlying issue:

inspect.getblock rightfully raises a tokenize error because, in the example you provided, it cannot parse a full Python block: it is missing the opening brace { of the dict comprehension.

This means that multiline list comprehensions are also broken.

I think a better fix would be to parse the whole file, iterate over each syntax element, and check whether startline <= line <= endline, where line is the line you want to show.
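A minimal sketch of that idea, as a hypothetical helper that approximates a statement's end line as the largest lineno among its descendants:

import ast

def find_enclosing_statement(filename, line):
    # Parse the whole file and return the (startline, endline) span of
    # the deepest statement containing the given line, or None.
    with open(filename) as f:
        tree = ast.parse(f.read(), filename)
    best = None
    for node in ast.walk(tree):  # breadth-first, so deeper nodes come later
        if not isinstance(node, ast.stmt):
            continue
        start = node.lineno
        # There is no endlineno attribute on these nodes, so approximate
        # it as the largest lineno among the statement's descendants.
        end = max(n.lineno for n in ast.walk(node) if hasattr(n, 'lineno'))
        if start <= line <= end:
            best = (start, end)
    return best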

jogo (Contributor, Author) commented Feb 22, 2017

I tested a multi-line list comprehension and it appears to work:

Total hits: 747 s
File: test.py
Function: foo at line 7

Line #     Hits   % Hits  Line Contents
=======================================
     7                    def foo():
     8                        a = [
     9                            i**i
    10      747    100.0          for i in range(10000)]
    11                        print "END"
Playing with inspect.getblock by hand on the dict-comprehension version:

>>> import linecache, inspect
>>> all_lines = linecache.getlines('test.py')
>>> all_lines[5:]
['\n', 'def foo():\n', '    a = {\n', '        i: i**i\n', '        for i in range(10000)}\n', '    print "END"\n', '\n', 'foo()\n', 'bar()\n']
>>> all_lines[7:]
['    a = {\n', '        i: i**i\n', '        for i in range(10000)}\n', '    print "END"\n', '\n', 'foo()\n', 'bar()\n']
>>> inspect.getblock(all_lines[7:])
['    a = {\n']
>>> inspect.getblock(all_lines[8:])
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    inspect.getblock(all_lines[8:])
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/inspect.py", line 677, in getblock
    tokenize.tokenize(iter(lines).next, blockfinder.tokeneater)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tokenize.py", line 170, in tokenize
    tokenize_loop(readline, tokeneater)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tokenize.py", line 176, in tokenize_loop
    for token_info in generate_tokens(readline):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tokenize.py", line 363, in generate_tokens
    raise TokenError, ("EOF in multi-line statement", (lnum, 0))
TokenError: ('EOF in multi-line statement', (7, 0))

I am not sure how to get the endline, but here is a pull request that I think is close to what you describe: #119

planrich (Contributor) commented:

Yes, that looks better. I tried to use the ast module to find the lines by walking the tree nodes. The reason I failed: there is no endlineno attribute on those nodes, and I don't see an easy way to compute it.

As I see it now, this gives a better result than before, so let's merge it.
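(For reference, since Python 3.8 ast nodes do carry an end_lineno attribute, which makes the span check direct; a minimal sketch:)

import ast

with open('test.py') as f:
    tree = ast.parse(f.read(), 'test.py')
for node in ast.walk(tree):
    if isinstance(node, ast.stmt):
        # end_lineno is available on Python 3.8+ only
        print(node.lineno, node.end_lineno, type(node).__name__)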

jogo (Contributor, Author) commented Feb 23, 2017

@planrich thanks!
