vmprofshow breaks on multiline dictionary comprehensions #118

Closed
jogo opened this issue Feb 17, 2017 · 5 comments · Fixed by #119

Comments

jogo (Contributor) commented Feb 17, 2017

Using CPython 2.7

test.py:

def foo():
    a = {
        i: i**i
        for i in range(10000)}
    print "END"

foo()
$ python -m vmprof --lines -o test.prof test.py
END
$ vmprofshow --lines test.prof
Total hits: 1480 s
File: test.py
Function: <dictcomp> at line 3
Traceback (most recent call last):
  File "/usr/local/bin/vmprofshow", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/vmprof/show.py", line 159, in main
    pp.show(args.profile)
  File "/usr/local/lib/python2.7/dist-packages/vmprof/show.py", line 182, in show
    self.show_func(filename, funline, funname, line_stats)
  File "/usr/local/lib/python2.7/dist-packages/vmprof/show.py", line 234, in show_func
    sublines = inspect.getblock(all_lines[start_lineno-1:])
  File "/usr/lib/python2.7/inspect.py", line 677, in getblock
    tokenize.tokenize(iter(lines).next, blockfinder.tokeneater)
  File "/usr/lib/python2.7/tokenize.py", line 169, in tokenize
    tokenize_loop(readline, tokeneater)
  File "/usr/lib/python2.7/tokenize.py", line 175, in tokenize_loop
    for token_info in generate_tokens(readline):
  File "/usr/lib/python2.7/tokenize.py", line 357, in generate_tokens
    raise TokenError, ("EOF in multi-line statement", (lnum, 0))
tokenize.TokenError: ('EOF in multi-line statement', (6, 0))

Specifically, the line i: i**i takes the most time. But if I handle the tokenize error the same way as a missing file, I get:

Total hits: 1480 s
File: test.py
Function: <dictcomp> at line 3

Line #     Hits   % Hits  Line Contents
=======================================
     3        1      0.1
     4     1479     99.9

Total hits: 1481 s
File: test.py
Function: <module> at line 1

Line #     Hits   % Hits  Line Contents
=======================================
     1                    def foo():
     2                        a = {
     3                            i: i**i
     4                            for i in range(10000)}
     5                        print "END"

Total hits: 2 s

Could not find file -
Are you sure you are running this program from the same directory
that you ran the profiler from?
Continuing without the function's contents.

Line #     Hits   % Hits  Line Contents
=======================================
     0        2    100.0

Total hits: 1480 s
File: test.py
Function: foo at line 1

Line #     Hits   % Hits  Line Contents
=======================================
     1                    def foo():
     2                        a = {
     3                            i: i**i
     4     1480    100.0          for i in range(10000)}
     5                        print "END"

Total hits: 1 s

I tried playing around with inspect.getblock on the dictionary comprehension by hand and was unable to get it to work properly.

So I am not sure that multiline dictionary comprehensions are measured correctly per line.

At the very least we can prevent vmprofshow --lines test.prof from crashing by handling the tokenize error in vmprof.show.
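A minimal sketch of that fallback, as a hypothetical helper around the inspect.getblock call in show_func (names here are illustrative, not vmprof's actual code):

import inspect
import tokenize

def get_block_or_none(all_lines, start_lineno):
    # inspect.getblock tokenizes from start_lineno onward; for a
    # <dictcomp> the opening "{" sits on an earlier line, so tokenize
    # hits EOF inside the unbalanced braces and raises TokenError.
    try:
        return inspect.getblock(all_lines[start_lineno - 1:])
    except tokenize.TokenError:
        # Fall back to "no source available", the same way show_func
        # handles a missing file, instead of crashing.
        return None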

jogo (Contributor, Author) commented Feb 22, 2017

diff --git a/vmprof/show.py b/vmprof/show.py
index 68d1701..e35a456 100644
--- a/vmprof/show.py
+++ b/vmprof/show.py
@@ -224,7 +224,7 @@ class LinesPrinter(object):
             return

         stream.write("Total hits: %g s\n" % total_hits)
-        if os.path.exists(filename) or filename.startswith("<ipython-input-"):
+        if (os.path.exists(filename) or filename.startswith("<ipython-input-")) and func_name != "<dictcomp>":
             stream.write("File: %s\n" % filename)
             stream.write("Function: %s at line %s\n" % (func_name, start_lineno))
             if os.path.exists(filename):

Fixes the issue for me, although there may be a better way.

planrich (Contributor) commented:

I think your patch just filters out dict comprehensions; the error points to the underlying issue:

inspect.getblock rightfully raises a tokenize error because, in the example you provided, it cannot parse a full Python block: it is missing the opening brace { of the dict comprehension.

This means that multiline list comprehensions are also broken.

I think a better fix would be to parse the whole file, iterate over each syntax element, and check whether startline <= line <= endline, where line is the line you want to show.
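A minimal sketch of that idea, as a hypothetical helper that approximates a statement's end line as the largest lineno among its descendants:

import ast

def find_enclosing_statement(filename, line):
    # Parse the whole file and return the (startline, endline) span of
    # the deepest statement containing the given line, or None.
    with open(filename) as f:
        tree = ast.parse(f.read(), filename)
    best = None
    for node in ast.walk(tree):  # breadth-first, so deeper nodes come later
        if not isinstance(node, ast.stmt):
            continue
        start = node.lineno
        # There is no endlineno attribute on these nodes, so approximate
        # it as the largest lineno among the statement's descendants.
        end = max(n.lineno for n in ast.walk(node) if hasattr(n, 'lineno'))
        if start <= line <= end:
            best = (start, end)
    return best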

jogo (Contributor, Author) commented Feb 22, 2017

I tested a multi-line list comprehension and it appears to work:

Total hits: 747 s
File: test.py
Function: foo at line 7

Line #     Hits   % Hits  Line Contents
=======================================
     7                    def foo():
     8                        a = [
     9                            i**i
    10      747    100.0          for i in range(10000)]
    11                        print "END"
Playing with inspect.getblock by hand on the dict-comprehension version:

>>> import linecache, inspect
>>> all_lines = linecache.getlines('test.py')
>>> all_lines[5:]
['\n', 'def foo():\n', '    a = {\n', '        i: i**i\n', '        for i in range(10000)}\n', '    print "END"\n', '\n', 'foo()\n', 'bar()\n']
>>> all_lines[7:]
['    a = {\n', '        i: i**i\n', '        for i in range(10000)}\n', '    print "END"\n', '\n', 'foo()\n', 'bar()\n']
>>> inspect.getblock(all_lines[7:])
['    a = {\n']
>>> inspect.getblock(all_lines[8:])
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    inspect.getblock(all_lines[8:])
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/inspect.py", line 677, in getblock
    tokenize.tokenize(iter(lines).next, blockfinder.tokeneater)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tokenize.py", line 170, in tokenize
    tokenize_loop(readline, tokeneater)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tokenize.py", line 176, in tokenize_loop
    for token_info in generate_tokens(readline):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tokenize.py", line 363, in generate_tokens
    raise TokenError, ("EOF in multi-line statement", (lnum, 0))
TokenError: ('EOF in multi-line statement', (7, 0))

I am not sure how to get the endline, but here is a pull request that I think is close to what you describe: #119

planrich (Contributor) commented:

Yes, that looks better. I tried to use the ast module to find the lines by walking the tree nodes. The reason I failed: there is no endlineno attribute on those nodes, and I don't see an easy way to compute it.

As I see it now, this gives a better result than before, so let's merge it.
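(For reference, since Python 3.8 ast nodes do carry an end_lineno attribute, which makes the span check direct; a minimal sketch:)

import ast

with open('test.py') as f:
    tree = ast.parse(f.read(), 'test.py')
for node in ast.walk(tree):
    if isinstance(node, ast.stmt):
        # end_lineno is available on Python 3.8+ only
        print(node.lineno, node.end_lineno, type(node).__name__)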

jogo (Contributor, Author) commented Feb 23, 2017

@planrich thanks!
