Implement an alternative approach to resolving inline frames
The idea is to find (and cache) the Dwarf_Die's with DW_TAG_subprogram and then look for inline frames in them directly. Previously, we relied on `dwarf_getscopes{,_die}` for this, but that potentially needs to traverse the full DWARF DIE tree and can thus be quite time-consuming.

The following shows the performance impact of this patch for a perf.data file with about 6M samples. Many frames in the callstacks point to a self-compiled libclang.so with debug symbols. That library alone is roughly 600MB in size, which makes finding inline frames quite slow.

Before:

```
 Performance counter stats for '/home/milian/projects/kdab/rnd/hotspot/build/lib/libexec/hotspot-perfparser --input perf.data.in --output /dev/null':

         80.159,75 msec task-clock                #    0,984 CPUs utilized
             4.075      context-switches          #    0,051 K/sec
                 1      cpu-migrations            #    0,000 K/sec
           152.257      page-faults               #    0,002 M/sec
   346.071.892.881      cycles                    #    4,317 GHz                      (83,33%)
     1.940.060.936      stalled-cycles-frontend   #    0,56% frontend cycles idle     (83,33%)
    38.399.679.774      stalled-cycles-backend    #   11,10% backend cycles idle      (83,34%)
   999.298.133.335      instructions              #    2,89  insn per cycle
                                                  #    0,04  stalled cycles per insn  (83,31%)
   239.561.868.424      branches                  # 2988,556 M/sec                    (83,34%)
     1.163.589.915      branch-misses             #    0,49% of all branches          (83,34%)

      81,497496973 seconds time elapsed

      79,554970000 seconds user
       0,404933000 seconds sys
```

After:

```
 Performance counter stats for '/home/milian/projects/kdab/rnd/hotspot/build/lib/libexec/hotspot-perfparser --input perf.data.in --output /dev/null':

         24.283,03 msec task-clock                #    1,000 CPUs utilized
                94      context-switches          #    0,004 K/sec
                 0      cpu-migrations            #    0,000 K/sec
           168.432      page-faults               #    0,007 M/sec
   105.147.091.148      cycles                    #    4,330 GHz                      (83,33%)
       957.491.830      stalled-cycles-frontend   #    0,91% frontend cycles idle     (83,33%)
    10.502.770.200      stalled-cycles-backend    #    9,99% backend cycles idle      (83,34%)
   295.684.447.780      instructions              #    2,81  insn per cycle
                                                  #    0,04  stalled cycles per insn  (83,33%)
    71.359.066.393      branches                  # 2938,639 M/sec                    (83,34%)
       337.918.049      branch-misses             #    0,47% of all branches          (83,33%)

      24,285108368 seconds time elapsed

      23,901378000 seconds user
       0,318950000 seconds sys
```

So this is looking quite promising: elapsed time drops from about 81.5s to about 24.3s, roughly a 3.4x speedup. The biggest offender after this patch is the dwarf_getscopes_die call in prependScopeNames. We can probably add better caching there too, i.e. we should be able to cache the scope name while building the subprogram range mapping. A follow-up patch will look into that.

Relates-To: KDAB/hotspot#192
Change-Id: I0669cc3aad886b22165eaf1d0836a56e5183898d
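The caching idea described above can be illustrated with a minimal sketch. This is not the code from the patch and it does not use libdw: `Die`, `Tag`, `SubprogramCache`, `inlineFrames` and `buildExampleUnit` are hypothetical names, and the mock `Die` struct merely stands in for a `Dwarf_Die`. It also assumes that every subprogram covers one contiguous address range and that subprogram ranges do not overlap, which real DWARF data does not guarantee.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a DWARF DIE; this is NOT the libdw Dwarf_Die type.
enum class Tag { CompileUnit, Subprogram, InlinedSubroutine, LexicalBlock };

struct Die {
    Tag tag;
    std::string name;
    std::uint64_t low;   // first address covered (inclusive)
    std::uint64_t high;  // end of the covered range (exclusive)
    std::vector<Die> children;

    bool contains(std::uint64_t addr) const { return low <= addr && addr < high; }
};

// One cache entry per subprogram; assumes a single contiguous range each.
struct SubprogramRange {
    std::uint64_t low;
    std::uint64_t high;
    const Die *die;
};

// Built once per compile unit; the Die tree must outlive the cache.
class SubprogramCache {
public:
    explicit SubprogramCache(const Die &cu) {
        collect(cu);
        std::sort(m_ranges.begin(), m_ranges.end(),
                  [](const SubprogramRange &a, const SubprogramRange &b) { return a.low < b.low; });
    }

    // Binary search over the cached ranges instead of walking the whole tree.
    const Die *find(std::uint64_t addr) const {
        auto it = std::upper_bound(m_ranges.begin(), m_ranges.end(), addr,
                                   [](std::uint64_t a, const SubprogramRange &r) { return a < r.low; });
        if (it == m_ranges.begin())
            return nullptr;
        --it;
        return addr < it->high ? it->die : nullptr;
    }

private:
    void collect(const Die &die) {
        if (die.tag == Tag::Subprogram) {
            assert(die.low <= die.high);
            m_ranges.push_back({die.low, die.high, &die});
            return; // inline frames are resolved later, inside this subtree only
        }
        for (const auto &child : die.children)
            collect(child);
    }

    std::vector<SubprogramRange> m_ranges;
};

// Walk only the subtree of one subprogram; returns inlined names, innermost first.
std::vector<std::string> inlineFrames(const Die &subprogram, std::uint64_t addr) {
    std::vector<std::string> frames;
    const Die *current = &subprogram;
    bool descended = true;
    while (descended) {
        descended = false;
        for (const auto &child : current->children) {
            if (!child.contains(addr))
                continue;
            if (child.tag == Tag::InlinedSubroutine)
                frames.push_back(child.name);
            current = &child;
            descended = true;
            break;
        }
    }
    std::reverse(frames.begin(), frames.end());
    return frames;
}

// Small made-up compile unit used for illustration only.
Die buildExampleUnit() {
    Die peek{Tag::InlinedSubroutine, "peekChar", 0x1150, 0x1160, {}};
    Die read{Tag::InlinedSubroutine, "readToken", 0x1140, 0x1180, {peek}};
    Die block{Tag::LexicalBlock, "", 0x1120, 0x1200, {read}};
    Die parse{Tag::Subprogram, "parse", 0x1100, 0x1300, {block}};
    Die mainFn{Tag::Subprogram, "main", 0x1000, 0x1100, {}};
    return Die{Tag::CompileUnit, "example.cpp", 0x1000, 0x3000, {mainFn, parse}};
}
```

With the example unit, looking up address `0x1150` first finds the cached `parse` subprogram and then descends only through its children to collect the inlined calls. The point of the design is that the expensive tree traversal happens once, when the cache is built, while each lookup afterwards costs a binary search plus a walk of a single subprogram's subtree.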