Add return const instruction #101632

penguin-wwy · 2023-02-07T06:53:47Z

From the pystats doc (pystats-2023-02-05-python-5a2b984.md), I found that the LOAD_CONST + RETURN_VALUE pair occurs very frequently (because a function returns None by default).

Successors for LOAD_CONST

Successors	Count	Percentage
RETURN_VALUE	969,173,651	21.8%
BINARY_OP_ADD_INT	418,647,997	9.4%
LOAD_CONST	403,185,774	9.1%
COMPARE_AND_BRANCH_INT	314,633,792	7.1%
STORE_FAST	295,563,626	6.6%

And predecessors for RETURN_VALUE

Predecessors	Count	Percentage
LOAD_CONST	969,173,651	29.9%
LOAD_FAST	505,933,343	15.6%
RETURN_VALUE	382,698,373	11.8%
BUILD_TUPLE	328,532,240	10.1%
COMPARE_OP	107,210,803	3.3%

This means that if we add a RETURN_CONST instruction, we can reduce the number of RETURN_VALUE executions by about 30% and the number of LOAD_CONST executions by about 20%.
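To see the pair being discussed, here is a minimal sketch that uses the standard `dis` module to check whether a function ends by returning a constant. The helper names (`opnames`, `returns_via_constant`) and the sample functions are made up for illustration. The check accepts either form, since the exact opcodes depend on the interpreter version: builds with the proposed instruction show a single RETURN_CONST, others show LOAD_CONST followed by RETURN_VALUE.

```python
import dis


def returns_none():
    pass  # implicit "return None"


def returns_constant():
    return 10000


def returns_argument(x):
    return x  # returns a local variable, not a constant


def opnames(func):
    """List the opcode names of func in bytecode order."""
    return [ins.opname for ins in dis.get_instructions(func)]


def returns_via_constant(func):
    """True if func ends by returning a constant.

    Accepts the fused form (RETURN_CONST) and the unfused
    pair (LOAD_CONST followed by RETURN_VALUE).
    """
    names = opnames(func)
    if names[-1] == "RETURN_CONST":
        return True
    return names[-2:] == ["LOAD_CONST", "RETURN_VALUE"]


if __name__ == "__main__":
    for f in (returns_none, returns_constant, returns_argument):
        print(f.__name__, opnames(f)[-2:], returns_via_constant(f))
```

Running it on a build with and without the change should show the last two opcodes collapsing into one for the first two functions, while `returns_argument` is unaffected.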

./bin/python3 -m pyperf timeit -w 3 --compare-to ../python-3.12/bin/python3 -s "
def test():
    return 10000
" "test()"

/python-3.12/bin/python3: ..................... 27.0 ns +- 0.3 ns
/cpython/bin/python3: ..................... 25.0 ns +- 0.5 ns
Mean +- std dev: [/python-3.12/bin/python3] 27.0 ns +- 0.3 ns -> [/cpython/bin/python3] 25.0 ns +- 0.5 ns: 1.08x faster

./bin/python3 -m pyperf timeit -w 3 --compare-to ../python-3.12/bin/python3 -s "
def test():
    return None
" "test()"

/python-3.12/bin/python3: ..................... 27.2 ns +- 1.3 ns
/cpython/bin/python3: ..................... 25.1 ns +- 0.6 ns
Mean +- std dev: [/python-3.12/bin/python3] 27.2 ns +- 1.3 ns -> [/cpython/bin/python3] 25.1 ns +- 0.6 ns: 1.08x faster

The microbenchmarks show an improvement of roughly 8-10% (allowing for the function call overhead included in the measurement, I think 10% is a fair estimate). That is not very high, but it should be an optimization without adverse effects.

Linked PRs

gh-101632: Add a RETURN_CONST instruction #101633

penguin-wwy · 2023-02-07T07:03:59Z

Execution counts for all instructions in the main branch

Name	Count	Self	Cumulative
LOAD_CONST	4,447,532,233	4.7%	24.4%
RETURN_VALUE	3,241,585,933	3.4%	43.2%

And execution counts for all instructions in my branch

Name	Count	Self	Cumulative
LOAD_CONST	3,477,437,602	3.7%	31.5%
RETURN_VALUE	2,268,077,678	2.4%	47.3%
RETURN_CONST	967,592,289	1.0%	69.3%
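As a quick sanity check on the two tables above, the following sketch computes the relative drop in executions for each instruction. The counts are copied from the tables; the dictionary and function names are made up for illustration. The two runs are separate pystats collections, so small differences in totals are expected.

```python
# Execution counts copied from the two tables above.
main_counts = {
    "LOAD_CONST": 4_447_532_233,
    "RETURN_VALUE": 3_241_585_933,
}
branch_counts = {
    "LOAD_CONST": 3_477_437_602,
    "RETURN_VALUE": 2_268_077_678,
    "RETURN_CONST": 967_592_289,
}


def relative_reduction(name):
    """Fraction of executions of `name` removed in the branch."""
    before = main_counts[name]
    after = branch_counts[name]
    return (before - after) / before


if __name__ == "__main__":
    for name in main_counts:
        print(f"{name}: {relative_reduction(name):.1%} fewer executions")
    # Each RETURN_CONST stands in for one LOAD_CONST plus one RETURN_VALUE,
    # so the total dispatch count should fall by about the RETURN_CONST count.
    saved = sum(main_counts.values()) - sum(branch_counts.values())
    print(f"net instructions saved: {saved:,}")
```

The reductions come out close to the 20% and 30% estimated from the successor and predecessor tables.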

penguin-wwy · 2023-02-07T08:06:19Z

I ran pyperformance on my server, comparing against commit d3e2dd6.

python3 -m pyperf compare_to --table --min-speed 5 /python-3.12.0/results.json results.json

+---------------------+---------------------------------------+-----------------------+
| Benchmark           | python-3.12.0/results.json            | results.json          |
+=====================+=======================================+=======================+
| coverage            | 378 ms                                | 360 ms: 1.05x faster  |
+---------------------+---------------------------------------+-----------------------+
| crypto_pyaes        | 87.2 ms                               | 82.5 ms: 1.06x faster |
+---------------------+---------------------------------------+-----------------------+
| logging_silent      | 108 ns                                | 94.3 ns: 1.15x faster |
+---------------------+---------------------------------------+-----------------------+
| pickle_pure_python  | 352 us                                | 330 us: 1.07x faster  |
+---------------------+---------------------------------------+-----------------------+
| regex_v8            | 26.8 ms                               | 23.8 ms: 1.12x faster |
+---------------------+---------------------------------------+-----------------------+
| richards            | 54.6 ms                               | 51.7 ms: 1.06x faster |
+---------------------+---------------------------------------+-----------------------+
| scimark_lu          | 125 ms                                | 133 ms: 1.07x slower  |
+---------------------+---------------------------------------+-----------------------+
| scimark_monte_carlo | 83.3 ms                               | 74.3 ms: 1.12x faster |
+---------------------+---------------------------------------+-----------------------+
| spectral_norm       | 118 ms                                | 112 ms: 1.06x faster  |
+---------------------+---------------------------------------+-----------------------+
| sqlglot_parse       | 1.71 ms                               | 1.61 ms: 1.06x faster |
+---------------------+---------------------------------------+-----------------------+
| sqlglot_transpile   | 2.00 ms                               | 1.89 ms: 1.06x faster |
+---------------------+---------------------------------------+-----------------------+
| unpickle_list       | 4.73 us                               | 5.06 us: 1.07x slower |
+---------------------+---------------------------------------+-----------------------+
| Geometric mean      | (ref)                                 | 1.01x faster          |
+---------------------+---------------------------------------+-----------------------+

Benchmark hidden because not significant (68): 2to3, async_generators, async_tree_none, async_tree_cpu_io_mixed, async_tree_io, async_tree_memoization, chameleon, chaos, bench_mp_pool, bench_thread_pool, coroutines, deepcopy, deepcopy_reduce, deepcopy_memo, deltablue, django_template, docutils, dulwich_log, fannkuch, float, generators, genshi_text, genshi_xml, go, hexiom, html5lib, json_dumps, json_loads, logging_format, logging_simple, mako, mdp, meteor_contest, nbody, nqueens, pathlib, pickle, pickle_dict, pickle_list, pidigits, pprint_safe_repr, pprint_pformat, pyflate, python_startup, python_startup_no_site, raytrace, regex_compile, regex_dna, regex_effbot, scimark_fft, scimark_sor, scimark_sparse_mat_mult, sqlglot_optimize, sqlglot_normalize, sqlite_synth, sympy_expand, sympy_integrate, sympy_sum, sympy_str, telco, tornado_http, unpack_sequence, unpickle, unpickle_pure_python, xml_etree_parse, xml_etree_iterparse, xml_etree_generate, xml_etree_process

Although the performance gain is small, the side effects are minimal, so this should be a net-positive optimization.

bedevere-bot mentioned this issue Feb 7, 2023

gh-101632: Add a RETURN_CONST instruction #101633

Merged

arhadthedev added the performance and interpreter-core labels Feb 7, 2023

iritkatriel pushed a commit that referenced this issue Feb 7, 2023

gh-101632: Add the new RETURN_CONST opcode (#101633)

753fc8a

penguin-wwy closed this as completed Feb 8, 2023

hugovk mentioned this issue Sep 27, 2023

GH-109190: Copyedit 3.12 What's New: Bytecode #109821

Merged

FranklinLiang mentioned this issue Jul 1, 2024

Case where LOAD_CONST and RETURN_VALUE are not combined into RETURN_CONST #121246

Closed
