Add return const instruction #101632

Closed
penguin-wwy opened this issue Feb 7, 2023 · 2 comments

Labels: interpreter-core (Objects, Python, Grammar, and Parser dirs); performance (Performance or resource usage)

Comments

penguin-wwy commented Feb 7, 2023

From the pystats doc (pystats-2023-02-05-python-5a2b984.md), I found that the pair LOAD_CONST + RETURN_VALUE occurs very frequently (because a function returns None by default).

Successors for LOAD_CONST

| Successors | Count | Percentage |
|---|---|---|
| RETURN_VALUE | 969,173,651 | 21.8% |
| BINARY_OP_ADD_INT | 418,647,997 | 9.4% |
| LOAD_CONST | 403,185,774 | 9.1% |
| COMPARE_AND_BRANCH_INT | 314,633,792 | 7.1% |
| STORE_FAST | 295,563,626 | 6.6% |

And predecessors for RETURN_VALUE

| Predecessors | Count | Percentage |
|---|---|---|
| LOAD_CONST | 969,173,651 | 29.9% |
| LOAD_FAST | 505,933,343 | 15.6% |
| RETURN_VALUE | 382,698,373 | 11.8% |
| BUILD_TUPLE | 328,532,240 | 10.1% |
| COMPARE_OP | 107,210,803 | 3.3% |

This means that if we add a RETURN_CONST instruction, we can reduce the number of executed RETURN_VALUE instructions by about 30% and LOAD_CONST instructions by about 20%.
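To see which instructions a given interpreter actually emits for a constant return, the bytecode can be inspected with the standard `dis` module. The following is only a sketch: the helper names `opnames` and `return_sequence` are made up for illustration, and the result depends on the Python version (builds without RETURN_CONST show the LOAD_CONST + RETURN_VALUE pair instead).

```python
import dis


def return_none():
    return None


def return_int():
    return 10000


def opnames(func):
    """Return the opcode names of func's bytecode, in order (hypothetical helper)."""
    return [ins.opname for ins in dis.get_instructions(func)]


def return_sequence(func):
    """Return the trailing opcode names that implement the final `return`.

    If the interpreter has a RETURN_CONST instruction, that single opcode is
    returned; otherwise the last two opcodes are returned, which for a constant
    return are expected to be the LOAD_CONST + RETURN_VALUE pair.
    """
    names = opnames(func)
    if names[-1] == "RETURN_CONST":
        return names[-1:]
    return names[-2:]


if __name__ == "__main__":
    for f in (return_none, return_int):
        print(f.__name__, return_sequence(f))
```

On a build that includes the proposed instruction, both functions should end in a single RETURN_CONST; on a build without it, the same two functions end in LOAD_CONST followed by RETURN_VALUE.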

./bin/python3 -m pyperf timeit -w 3 --compare-to ../python-3.12/bin/python3 -s "
def test():
    return 10000
" "test()"

/python-3.12/bin/python3: ..................... 27.0 ns +- 0.3 ns
/cpython/bin/python3: ..................... 25.0 ns +- 0.5 ns
Mean +- std dev: [/python-3.12/bin/python3] 27.0 ns +- 0.3 ns -> [/cpython/bin/python3] 25.0 ns +- 0.5 ns: 1.08x faster

./bin/python3 -m pyperf timeit -w 3 --compare-to ../python-3.12/bin/python3 -s "
def test():
    return None
" "test()"

/python-3.12/bin/python3: ..................... 27.2 ns +- 1.3 ns
/cpython/bin/python3: ..................... 25.1 ns +- 0.6 ns
Mean +- std dev: [/python-3.12/bin/python3] 27.2 ns +- 1.3 ns -> [/cpython/bin/python3] 25.1 ns +- 0.6 ns: 1.08x faster

The microbenchmarks show an improvement of close to 10% (1.08x faster; since the measurement includes function call overhead, I think the gain on the return itself is around 10%). That is not very high, but it should be an optimization without adverse effects.

@penguin-wwy

Execution counts for all instructions in the main branch

| Name | Count | Self | Cumulative | Miss ratio |
|---|---|---|---|---|
| LOAD_CONST | 4,447,532,233 | 4.7% | 24.4% | |
| RETURN_VALUE | 3,241,585,933 | 3.4% | 43.2% | |

And execution counts for all instructions in my branch

| Name | Count | Self | Cumulative | Miss ratio |
|---|---|---|---|---|
| LOAD_CONST | 3,477,437,602 | 3.7% | 31.5% | |
| RETURN_VALUE | 2,268,077,678 | 2.4% | 47.3% | |
| RETURN_CONST | 967,592,289 | 1.0% | 69.3% | |
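As a quick check of the two sets of execution counts, the relative reduction per instruction can be computed directly from the numbers quoted above. This is a minimal sketch; the dictionary names and the `reduction_percent` helper are illustrative only, and the counts are copied from the tables in this thread.

```python
# Execution counts copied from the pystats tables above.
main_counts = {
    "LOAD_CONST": 4_447_532_233,
    "RETURN_VALUE": 3_241_585_933,
}
branch_counts = {
    "LOAD_CONST": 3_477_437_602,
    "RETURN_VALUE": 2_268_077_678,
    "RETURN_CONST": 967_592_289,
}


def reduction_percent(before, after):
    """Percentage by which a count dropped going from `before` to `after`."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for name in ("LOAD_CONST", "RETURN_VALUE"):
        pct = reduction_percent(main_counts[name], branch_counts[name])
        print(f"{name}: {pct:.1f}% fewer executions")
    print(f"RETURN_CONST executions on the branch: {branch_counts['RETURN_CONST']:,}")
```

The reductions work out to roughly 22% for LOAD_CONST and 30% for RETURN_VALUE, in line with the estimate from the successor and predecessor tables. The absolute drops are close to, but not exactly equal to, the RETURN_CONST count, which is expected if the two stats runs are not perfectly identical.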

penguin-wwy commented Feb 7, 2023

I ran pyperformance on my server, comparing against commit d3e2dd6:

python3 -m pyperf compare_to --table --min-speed 5 /python-3.12.0/results.json results.json

+---------------------+---------------------------------------+-----------------------+
| Benchmark           | python-3.12.0/results.json            | results.json          |
+=====================+=======================================+=======================+
| coverage            | 378 ms                                | 360 ms: 1.05x faster  |
+---------------------+---------------------------------------+-----------------------+
| crypto_pyaes        | 87.2 ms                               | 82.5 ms: 1.06x faster |
+---------------------+---------------------------------------+-----------------------+
| logging_silent      | 108 ns                                | 94.3 ns: 1.15x faster |
+---------------------+---------------------------------------+-----------------------+
| pickle_pure_python  | 352 us                                | 330 us: 1.07x faster  |
+---------------------+---------------------------------------+-----------------------+
| regex_v8            | 26.8 ms                               | 23.8 ms: 1.12x faster |
+---------------------+---------------------------------------+-----------------------+
| richards            | 54.6 ms                               | 51.7 ms: 1.06x faster |
+---------------------+---------------------------------------+-----------------------+
| scimark_lu          | 125 ms                                | 133 ms: 1.07x slower  |
+---------------------+---------------------------------------+-----------------------+
| scimark_monte_carlo | 83.3 ms                               | 74.3 ms: 1.12x faster |
+---------------------+---------------------------------------+-----------------------+
| spectral_norm       | 118 ms                                | 112 ms: 1.06x faster  |
+---------------------+---------------------------------------+-----------------------+
| sqlglot_parse       | 1.71 ms                               | 1.61 ms: 1.06x faster |
+---------------------+---------------------------------------+-----------------------+
| sqlglot_transpile   | 2.00 ms                               | 1.89 ms: 1.06x faster |
+---------------------+---------------------------------------+-----------------------+
| unpickle_list       | 4.73 us                               | 5.06 us: 1.07x slower |
+---------------------+---------------------------------------+-----------------------+
| Geometric mean      | (ref)                                 | 1.01x faster          |
+---------------------+---------------------------------------+-----------------------+

Benchmark hidden because not significant (68): 2to3, async_generators, async_tree_none, async_tree_cpu_io_mixed, async_tree_io, async_tree_memoization, chameleon, chaos, bench_mp_pool, bench_thread_pool, coroutines, deepcopy, deepcopy_reduce, deepcopy_memo, deltablue, django_template, docutils, dulwich_log, fannkuch, float, generators, genshi_text, genshi_xml, go, hexiom, html5lib, json_dumps, json_loads, logging_format, logging_simple, mako, mdp, meteor_contest, nbody, nqueens, pathlib, pickle, pickle_dict, pickle_list, pidigits, pprint_safe_repr, pprint_pformat, pyflate, python_startup, python_startup_no_site, raytrace, regex_compile, regex_dna, regex_effbot, scimark_fft, scimark_sor, scimark_sparse_mat_mult, sqlglot_optimize, sqlglot_normalize, sqlite_synth, sympy_expand, sympy_integrate, sympy_sum, sympy_str, telco, tornado_http, unpack_sequence, unpickle, unpickle_pure_python, xml_etree_parse, xml_etree_iterparse, xml_etree_generate, xml_etree_process

Although the performance gain is modest, the side effects are minimal, so this should be a net-positive optimization.
