cpython 3.13 installed with UV slow and not compiled with --enable-experimental-jit=yes-off #535

Closed
paugier opened this issue Feb 20, 2025 · 17 comments

paugier commented Feb 20, 2025

I tried a very simple pure Python benchmark (see https://gricad-gitlab.univ-grenoble-alpes.fr/augierpi/augierpi.gricad-pages.univ-grenoble-alpes.fr/-/tree/branch/default/content/docs/2025/about-py-jit) and found that CPython 3.13 installed with uv is slower than CPython 3.13 installed from conda-forge.

The benchmark is very simple (the goal was to be able to observe an effect of the new JIT in CPython 3.13):

def short_calcul(n):
    result = 0
    for i in range(1, n+1):
        result += i
    return result

def long_calcul(num):
    result = 0
    for i in range(num):
        result += short_calcul(i) - short_calcul(i)
    return result
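
The full script is only linked, not reproduced here. A minimal sketch of a harness that would print output in the same shape as the results in this report, assuming a hypothetical `calls_per_second` helper and an arbitrary argument of 200 (the value used in the original script is not stated in this thread):

```python
import sys
import time


def short_calcul(n):
    result = 0
    for i in range(1, n + 1):
        result += i
    return result


def long_calcul(num):
    result = 0
    for i in range(num):
        result += short_calcul(i) - short_calcul(i)
    return result


def calls_per_second(func, arg, min_duration=1.0):
    """Call func(arg) repeatedly for at least min_duration seconds.

    Hypothetical helper: returns the number of completed calls per second.
    """
    count = 0
    start = time.perf_counter()
    while True:
        func(arg)
        count += 1
        elapsed = time.perf_counter() - start
        if elapsed >= min_duration:
            return count / elapsed


if __name__ == "__main__":
    print(sys.version)
    # 200 is an assumed workload size, chosen only for illustration.
    rate = calls_per_second(long_calcul, 200)
    print(f"Number of long_calcul per second: {rate:.2f}")
```

A single timing loop like this is noisy; a tool such as pyperf or hyperfine is likely a better choice when comparing builds.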

The results:

$ /usr/bin/python3 bench_loops_sum.py
3.11.2 (main, Sep 14 2024, 03:00:30) [GCC 12.2.0]
Number of long_calcul per second: 56.10

$ pypy bench_loops_sum.py
3.11.11 (b38de282cead, Feb 05 2025, 16:26:37)
[PyPy 7.3.18 with GCC 10.2.1 20210130 (Red Hat 10.2.1-11)]
Number of long_calcul per second: 1992.83

$ python bench_loops_sum.py
3.13.2 | packaged by conda-forge | (main, Feb 17 2025, 14:10:22) [GCC 13.3.0]
Number of long_calcul per second: 51.12

$ PYTHON_JIT=1 python bench_loops_sum.py
3.13.2 | packaged by conda-forge | (main, Feb 17 2025, 14:10:22) [GCC 13.3.0]
Number of long_calcul per second: 60.39

$ python bench_loops_sum.py
3.13.2 (main, Feb 12 2025, 14:51:17) [Clang 19.1.6 ]
Number of long_calcul per second: 40.91

$ python bench_loops_sum.py
3.14.0a5 (main, Feb 12 2025, 14:51:40) [Clang 19.1.6 ]
Number of long_calcul per second: 51.41

This is bad:

  • CPython 3.13 installed with uv is slow
  • CPython builds installed with uv are not compiled with --enable-experimental-jit=yes-off
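
One way to check how a given interpreter was configured is to look at the recorded configure arguments. This is a sketch, assuming a POSIX build where `sysconfig` exposes `CONFIG_ARGS` (it may be missing on Windows, and a distributor could in principle rewrite it); `jit_mode_from_config_args` is a hypothetical helper, not part of any library:

```python
import shlex
import sysconfig
from typing import Optional

FLAG = "--enable-experimental-jit"


def jit_mode_from_config_args(config_args: Optional[str]) -> Optional[str]:
    """Return the JIT configure mode (e.g. "yes", "yes-off"), or None if absent."""
    if not config_args:
        return None
    mode = None
    for arg in shlex.split(config_args):
        if arg == FLAG:
            # A bare flag is equivalent to "=yes" in CPython's configure script.
            mode = "yes"
        elif arg.startswith(FLAG + "="):
            mode = arg.split("=", 1)[1]
    return mode


if __name__ == "__main__":
    recorded = sysconfig.get_config_var("CONFIG_ARGS")
    print(jit_mode_from_config_args(recorded))
```

A result of `None` suggests the build was configured without the JIT, in which case setting `PYTHON_JIT=1` should have no effect.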
@zanieb
Member

zanieb commented Feb 24, 2025

Thanks for the report. Do you know if Conda builds with any particular performance flags?

@zanieb
Member

zanieb commented Feb 25, 2025

I did an actual performance run (Linux x86-64) and it looks like there's a significant difference here

❯ uvx pyperf compare_to conda-forge-313.json pbs-313.json
All benchmarks:
===============

2to3: Mean +- std dev: [conda-forge-313] 230 ms +- 1 ms -> [pbs-313] 279 ms +- 2 ms: 1.21x slower
async_generators: Mean +- std dev: [conda-forge-313] 347 ms +- 4 ms -> [pbs-313] 451 ms +- 4 ms: 1.30x slower
async_tree_none: Mean +- std dev: [conda-forge-313] 325 ms +- 9 ms -> [pbs-313] 402 ms +- 10 ms: 1.24x slower
async_tree_cpu_io_mixed: Mean +- std dev: [conda-forge-313] 516 ms +- 6 ms -> [pbs-313] 620 ms +- 6 ms: 1.20x slower
async_tree_cpu_io_mixed_tg: Mean +- std dev: [conda-forge-313] 518 ms +- 33 ms -> [pbs-313] 635 ms +- 34 ms: 1.23x slower
async_tree_eager: Mean +- std dev: [conda-forge-313] 104 ms +- 2 ms -> [pbs-313] 138 ms +- 2 ms: 1.33x slower
async_tree_eager_cpu_io_mixed: Mean +- std dev: [conda-forge-313] 369 ms +- 9 ms -> [pbs-313] 427 ms +- 9 ms: 1.16x slower
async_tree_eager_cpu_io_mixed_tg: Mean +- std dev: [conda-forge-313] 471 ms +- 23 ms -> [pbs-313] 552 ms +- 21 ms: 1.17x slower
async_tree_eager_io: Mean +- std dev: [conda-forge-313] 780 ms +- 37 ms -> [pbs-313] 926 ms +- 41 ms: 1.19x slower
async_tree_eager_io_tg: Mean +- std dev: [conda-forge-313] 785 ms +- 51 ms -> [pbs-313] 917 ms +- 47 ms: 1.17x slower
async_tree_eager_memoization: Mean +- std dev: [conda-forge-313] 235 ms +- 13 ms -> [pbs-313] 283 ms +- 12 ms: 1.21x slower
async_tree_eager_memoization_tg: Mean +- std dev: [conda-forge-313] 321 ms +- 19 ms -> [pbs-313] 396 ms +- 19 ms: 1.23x slower
async_tree_eager_tg: Mean +- std dev: [conda-forge-313] 247 ms +- 11 ms -> [pbs-313] 302 ms +- 12 ms: 1.23x slower
async_tree_io: Mean +- std dev: [conda-forge-313] 744 ms +- 34 ms -> [pbs-313] 907 ms +- 36 ms: 1.22x slower
async_tree_io_tg: Mean +- std dev: [conda-forge-313] 746 ms +- 37 ms -> [pbs-313] 910 ms +- 37 ms: 1.22x slower
async_tree_memoization: Mean +- std dev: [conda-forge-313] 400 ms +- 43 ms -> [pbs-313] 494 ms +- 47 ms: 1.24x slower
async_tree_memoization_tg: Mean +- std dev: [conda-forge-313] 404 ms +- 5 ms -> [pbs-313] 500 ms +- 3 ms: 1.24x slower
async_tree_none_tg: Mean +- std dev: [conda-forge-313] 295 ms +- 7 ms -> [pbs-313] 372 ms +- 6 ms: 1.26x slower
asyncio_tcp: Mean +- std dev: [conda-forge-313] 365 ms +- 4 ms -> [pbs-313] 359 ms +- 3 ms: 1.02x faster
asyncio_tcp_ssl: Mean +- std dev: [conda-forge-313] 1.34 sec +- 0.01 sec -> [pbs-313] 1.35 sec +- 0.00 sec: 1.01x slower
asyncio_websockets: Mean +- std dev: [conda-forge-313] 519 ms +- 7 ms -> [pbs-313] 1.54 sec +- 0.00 sec: 2.96x slower
chameleon: Mean +- std dev: [conda-forge-313] 6.17 ms +- 0.07 ms -> [pbs-313] 8.54 ms +- 0.06 ms: 1.38x slower
chaos: Mean +- std dev: [conda-forge-313] 54.5 ms +- 1.5 ms -> [pbs-313] 72.7 ms +- 0.7 ms: 1.33x slower
comprehensions: Mean +- std dev: [conda-forge-313] 15.0 us +- 0.2 us -> [pbs-313] 19.7 us +- 0.2 us: 1.31x slower
bench_mp_pool: Mean +- std dev: [conda-forge-313] 15.4 ms +- 9.2 ms -> [pbs-313] 7.15 ms +- 1.30 ms: 2.15x faster
bench_thread_pool: Mean +- std dev: [conda-forge-313] 913 us +- 39 us -> [pbs-313] 980 us +- 30 us: 1.07x slower
coroutines: Mean +- std dev: [conda-forge-313] 21.9 ms +- 0.2 ms -> [pbs-313] 27.8 ms +- 0.2 ms: 1.27x slower
coverage: Mean +- std dev: [conda-forge-313] 74.7 ms +- 0.9 ms -> [pbs-313] 88.8 ms +- 1.3 ms: 1.19x slower
crypto_pyaes: Mean +- std dev: [conda-forge-313] 62.5 ms +- 0.5 ms -> [pbs-313] 82.2 ms +- 0.8 ms: 1.31x slower
dask: Mean +- std dev: [conda-forge-313] 301 ms +- 15 ms -> [pbs-313] 351 ms +- 14 ms: 1.17x slower
deepcopy: Mean +- std dev: [conda-forge-313] 333 us +- 5 us -> [pbs-313] 437 us +- 4 us: 1.31x slower
deepcopy_reduce: Mean +- std dev: [conda-forge-313] 3.04 us +- 0.06 us -> [pbs-313] 4.08 us +- 0.03 us: 1.35x slower
deepcopy_memo: Mean +- std dev: [conda-forge-313] 36.9 us +- 0.4 us -> [pbs-313] 47.3 us +- 0.6 us: 1.28x slower
deltablue: Mean +- std dev: [conda-forge-313] 2.74 ms +- 0.02 ms -> [pbs-313] 3.95 ms +- 0.03 ms: 1.44x slower
django_template: Mean +- std dev: [conda-forge-313] 31.3 ms +- 0.4 ms -> [pbs-313] 43.4 ms +- 0.4 ms: 1.39x slower
docutils: Mean +- std dev: [conda-forge-313] 2.05 sec +- 0.01 sec -> [pbs-313] 2.39 sec +- 0.02 sec: 1.16x slower
dulwich_log: Mean +- std dev: [conda-forge-313] 59.2 ms +- 0.5 ms -> [pbs-313] 76.3 ms +- 0.4 ms: 1.29x slower
fannkuch: Mean +- std dev: [conda-forge-313] 354 ms +- 2 ms -> [pbs-313] 475 ms +- 4 ms: 1.34x slower
float: Mean +- std dev: [conda-forge-313] 71.7 ms +- 0.9 ms -> [pbs-313] 94.7 ms +- 1.3 ms: 1.32x slower
create_gc_cycles: Mean +- std dev: [conda-forge-313] 921 us +- 6 us -> [pbs-313] 1.07 ms +- 0.00 ms: 1.16x slower
gc_traversal: Mean +- std dev: [conda-forge-313] 3.15 ms +- 0.28 ms -> [pbs-313] 3.51 ms +- 0.09 ms: 1.12x slower
generators: Mean +- std dev: [conda-forge-313] 28.2 ms +- 0.5 ms -> [pbs-313] 35.0 ms +- 0.2 ms: 1.24x slower
genshi_text: Mean +- std dev: [conda-forge-313] 20.2 ms +- 0.3 ms -> [pbs-313] 27.8 ms +- 0.3 ms: 1.37x slower
genshi_xml: Mean +- std dev: [conda-forge-313] 48.4 ms +- 0.8 ms -> [pbs-313] 67.1 ms +- 0.6 ms: 1.39x slower
go: Mean +- std dev: [conda-forge-313] 130 ms +- 1 ms -> [pbs-313] 158 ms +- 1 ms: 1.21x slower
hexiom: Mean +- std dev: [conda-forge-313] 5.57 ms +- 0.04 ms -> [pbs-313] 7.56 ms +- 0.05 ms: 1.36x slower
html5lib: Mean +- std dev: [conda-forge-313] 62.8 ms +- 0.9 ms -> [pbs-313] 69.0 ms +- 0.5 ms: 1.10x slower
json_dumps: Mean +- std dev: [conda-forge-313] 9.08 ms +- 0.16 ms -> [pbs-313] 10.4 ms +- 0.1 ms: 1.14x slower
json_loads: Mean +- std dev: [conda-forge-313] 21.2 us +- 0.2 us -> [pbs-313] 24.6 us +- 0.2 us: 1.16x slower
logging_format: Mean +- std dev: [conda-forge-313] 5.91 us +- 0.20 us -> [pbs-313] 8.46 us +- 0.12 us: 1.43x slower
logging_silent: Mean +- std dev: [conda-forge-313] 92.0 ns +- 2.3 ns -> [pbs-313] 114 ns +- 2 ns: 1.24x slower
logging_simple: Mean +- std dev: [conda-forge-313] 5.29 us +- 0.08 us -> [pbs-313] 7.55 us +- 0.09 us: 1.43x slower
mako: Mean +- std dev: [conda-forge-313] 9.36 ms +- 0.20 ms -> [pbs-313] 12.0 ms +- 0.1 ms: 1.28x slower
mdp: Mean +- std dev: [conda-forge-313] 2.27 sec +- 0.03 sec -> [pbs-313] 2.34 sec +- 0.02 sec: 1.03x slower
meteor_contest: Mean +- std dev: [conda-forge-313] 87.1 ms +- 0.6 ms -> [pbs-313] 103 ms +- 1 ms: 1.18x slower
nbody: Mean +- std dev: [conda-forge-313] 82.5 ms +- 1.3 ms -> [pbs-313] 138 ms +- 4 ms: 1.67x slower
nqueens: Mean +- std dev: [conda-forge-313] 70.7 ms +- 0.9 ms -> [pbs-313] 98.3 ms +- 0.6 ms: 1.39x slower
pathlib: Mean +- std dev: [conda-forge-313] 19.7 ms +- 0.1 ms -> [pbs-313] 22.1 ms +- 0.1 ms: 1.12x slower
pickle: Mean +- std dev: [conda-forge-313] 10.6 us +- 0.1 us -> [pbs-313] 10.7 us +- 0.1 us: 1.01x slower
pickle_dict: Mean +- std dev: [conda-forge-313] 25.6 us +- 0.3 us -> [pbs-313] 18.9 us +- 0.6 us: 1.36x faster
pickle_list: Mean +- std dev: [conda-forge-313] 3.96 us +- 0.06 us -> [pbs-313] 3.61 us +- 0.08 us: 1.10x faster
pickle_pure_python: Mean +- std dev: [conda-forge-313] 267 us +- 2 us -> [pbs-313] 362 us +- 4 us: 1.35x slower
pidigits: Mean +- std dev: [conda-forge-313] 166 ms +- 1 ms -> [pbs-313] 180 ms +- 0 ms: 1.08x slower
pprint_safe_repr: Mean +- std dev: [conda-forge-313] 667 ms +- 12 ms -> [pbs-313] 953 ms +- 5 ms: 1.43x slower
pprint_pformat: Mean +- std dev: [conda-forge-313] 1.37 sec +- 0.02 sec -> [pbs-313] 1.95 sec +- 0.02 sec: 1.43x slower
pyflate: Mean +- std dev: [conda-forge-313] 403 ms +- 2 ms -> [pbs-313] 498 ms +- 2 ms: 1.24x slower
python_startup: Mean +- std dev: [conda-forge-313] 9.68 ms +- 0.03 ms -> [pbs-313] 13.6 ms +- 0.1 ms: 1.40x slower
python_startup_no_site: Mean +- std dev: [conda-forge-313] 6.76 ms +- 0.03 ms -> [pbs-313] 10.5 ms +- 0.1 ms: 1.55x slower
raytrace: Mean +- std dev: [conda-forge-313] 241 ms +- 3 ms -> [pbs-313] 300 ms +- 4 ms: 1.24x slower
regex_compile: Mean +- std dev: [conda-forge-313] 117 ms +- 1 ms -> [pbs-313] 157 ms +- 1 ms: 1.34x slower
regex_dna: Mean +- std dev: [conda-forge-313] 153 ms +- 3 ms -> [pbs-313] 151 ms +- 1 ms: 1.01x faster
regex_effbot: Mean +- std dev: [conda-forge-313] 2.43 ms +- 0.06 ms -> [pbs-313] 2.49 ms +- 0.05 ms: 1.02x slower
regex_v8: Mean +- std dev: [conda-forge-313] 21.7 ms +- 0.6 ms -> [pbs-313] 23.0 ms +- 0.2 ms: 1.06x slower
richards: Mean +- std dev: [conda-forge-313] 46.0 ms +- 0.6 ms -> [pbs-313] 57.4 ms +- 0.4 ms: 1.25x slower
richards_super: Mean +- std dev: [conda-forge-313] 52.3 ms +- 0.9 ms -> [pbs-313] 62.9 ms +- 0.4 ms: 1.20x slower
scimark_fft: Mean +- std dev: [conda-forge-313] 325 ms +- 5 ms -> [pbs-313] 426 ms +- 17 ms: 1.31x slower
scimark_lu: Mean +- std dev: [conda-forge-313] 109 ms +- 1 ms -> [pbs-313] 119 ms +- 1 ms: 1.09x slower
scimark_monte_carlo: Mean +- std dev: [conda-forge-313] 61.2 ms +- 0.5 ms -> [pbs-313] 75.0 ms +- 2.6 ms: 1.23x slower
scimark_sor: Mean +- std dev: [conda-forge-313] 123 ms +- 1 ms -> [pbs-313] 161 ms +- 1 ms: 1.31x slower
scimark_sparse_mat_mult: Mean +- std dev: [conda-forge-313] 3.89 ms +- 0.12 ms -> [pbs-313] 5.78 ms +- 0.40 ms: 1.49x slower
spectral_norm: Mean +- std dev: [conda-forge-313] 104 ms +- 1 ms -> [pbs-313] 138 ms +- 3 ms: 1.33x slower
sqlglot_normalize: Mean +- std dev: [conda-forge-313] 261 ms +- 3 ms -> [pbs-313] 131 ms +- 1 ms: 1.99x faster
sqlglot_optimize: Mean +- std dev: [conda-forge-313] 47.5 ms +- 0.4 ms -> [pbs-313] 61.6 ms +- 0.4 ms: 1.30x slower
sqlglot_parse: Mean +- std dev: [conda-forge-313] 1.12 ms +- 0.01 ms -> [pbs-313] 1.47 ms +- 0.01 ms: 1.31x slower
sqlglot_transpile: Mean +- std dev: [conda-forge-313] 1.37 ms +- 0.01 ms -> [pbs-313] 1.77 ms +- 0.01 ms: 1.29x slower
sqlite_synth: Mean +- std dev: [conda-forge-313] 2.09 us +- 0.03 us -> [pbs-313] 3.42 us +- 0.01 us: 1.63x slower
sympy_expand: Mean +- std dev: [conda-forge-313] 409 ms +- 3 ms -> [pbs-313] 514 ms +- 3 ms: 1.26x slower
sympy_integrate: Mean +- std dev: [conda-forge-313] 16.2 ms +- 0.1 ms -> [pbs-313] 19.1 ms +- 0.1 ms: 1.18x slower
sympy_sum: Mean +- std dev: [conda-forge-313] 122 ms +- 1 ms -> [pbs-313] 149 ms +- 1 ms: 1.22x slower
sympy_str: Mean +- std dev: [conda-forge-313] 238 ms +- 3 ms -> [pbs-313] 291 ms +- 2 ms: 1.22x slower
telco: Mean +- std dev: [conda-forge-313] 7.37 ms +- 0.18 ms -> [pbs-313] 9.37 ms +- 0.27 ms: 1.27x slower
tomli_loads: Mean +- std dev: [conda-forge-313] 1.94 sec +- 0.03 sec -> [pbs-313] 2.84 sec +- 0.07 sec: 1.47x slower
tornado_http: Mean +- std dev: [conda-forge-313] 91.0 ms +- 1.0 ms -> [pbs-313] 107 ms +- 1 ms: 1.18x slower
typing_runtime_protocols: Mean +- std dev: [conda-forge-313] 146 us +- 4 us -> [pbs-313] 185 us +- 3 us: 1.27x slower
unpack_sequence: Mean +- std dev: [conda-forge-313] 35.7 ns +- 0.4 ns -> [pbs-313] 48.1 ns +- 1.8 ns: 1.35x slower
unpickle: Mean +- std dev: [conda-forge-313] 11.6 us +- 0.2 us -> [pbs-313] 14.1 us +- 0.2 us: 1.22x slower
unpickle_list: Mean +- std dev: [conda-forge-313] 4.41 us +- 0.06 us -> [pbs-313] 4.80 us +- 0.05 us: 1.09x slower
unpickle_pure_python: Mean +- std dev: [conda-forge-313] 193 us +- 1 us -> [pbs-313] 247 us +- 2 us: 1.28x slower
xml_etree_parse: Mean +- std dev: [conda-forge-313] 129 ms +- 2 ms -> [pbs-313] 254 ms +- 2 ms: 1.96x slower
xml_etree_iterparse: Mean +- std dev: [conda-forge-313] 85.0 ms +- 1.1 ms -> [pbs-313] 141 ms +- 2 ms: 1.66x slower
xml_etree_generate: Mean +- std dev: [conda-forge-313] 77.0 ms +- 0.7 ms -> [pbs-313] 98.0 ms +- 0.7 ms: 1.27x slower
xml_etree_process: Mean +- std dev: [conda-forge-313] 53.0 ms +- 0.6 ms -> [pbs-313] 69.5 ms +- 0.6 ms: 1.31x slower

Geometric mean: 1.24x slower

I consider this fairly high priority, but I don't know what the source of the difference is.
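
For readers unfamiliar with the summary line: as far as I understand it, the geometric mean is taken over the per-benchmark ratios of the two means. A sketch using three rows copied from the table above (so the result differs from the 1.24x computed over the full suite); `slowdown_ratios` and `summarize` are hypothetical names:

```python
from statistics import geometric_mean

# (conda-forge mean, pbs mean) in ms, copied from three rows of the table above.
samples = {
    "2to3": (230.0, 279.0),
    "asyncio_tcp": (365.0, 359.0),
    "nbody": (82.5, 138.0),
}


def slowdown_ratios(pairs):
    """Ratio > 1 means the second build is slower than the first."""
    return [changed / base for base, changed in pairs]


def summarize(pairs):
    """Geometric mean of the per-benchmark ratios."""
    return geometric_mean(slowdown_ratios(pairs))


if __name__ == "__main__":
    gm = summarize(samples.values())
    if gm >= 1:
        print(f"Geometric mean: {gm:.2f}x slower")
    else:
        print(f"Geometric mean: {1 / gm:.2f}x faster")
```

The geometric mean is used rather than the arithmetic mean so that a 2x slowdown and a 2x speedup cancel out.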

@zanieb
Member

zanieb commented Feb 25, 2025

Looking at https://github.com/conda-forge/python-feedstock/blob/main/recipe/build_base.sh and not seeing anything obvious.

@zanieb
Member

zanieb commented Feb 25, 2025

@paugier What platform and architecture did you run your benchmarks on?

@zanieb
Member

zanieb commented Feb 25, 2025

@zanieb
Member

zanieb commented Feb 25, 2025

Our v3 builds are a bit better, but that's not the bulk of it (geometric mean: 1.20x slower)

@Fidget-Spinner

FWIW, I don't see any slowdown on conda-forge 3.13 vs uv 3.14.0a5 on my machine (AMD64 Linux) on this benchmark:

(py313-conda-forge) ken@ken-Legion-5-Pro-16IAH7H:~/Documents/GitHub/cpython$ time python ./bm_calc.py

real	0m1.492s
user	0m1.488s
sys	0m0.003s

time uv run --python 3.14.0a5 python ./bm_calc.py

real	0m1.267s
user	0m1.255s
sys	0m0.011s

In fact, the conda-forge build is significantly slower.

@zanieb
Member

zanieb commented Feb 25, 2025

3.14 <-> 3.13 doesn't feel like a fair comparison, since our 3.14 builds use the tail-calling interpreter.

@Fidget-Spinner

Oh wow, I do see a significant slowdown on 3.13 (compare the previous comment)

(cpython) ken@ken-Legion-5-Pro-16IAH7H:~/Documents/GitHub/cpython$ time uv run --python 3.13 python ./bm_calc.py

real	0m1.939s
user	0m1.897s
sys	0m0.025s

zanieb added a commit that referenced this issue Feb 27, 2025
See #535 

This builds the JIT, but disables it by default. Users can opt-in to
enable it at runtime.

3.14 and macOS support will follow; there are some hiccups there.
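
With a build configured this way, the runtime opt-in is the `PYTHON_JIT=1` environment variable, as used in the original report. A sketch for launching a child interpreter with the JIT toggled either way; `run_with_jit` is a hypothetical helper, and the probe command only echoes the variable rather than proving the JIT is active:

```python
import os
import subprocess
import sys


def run_with_jit(argv, enabled, python=sys.executable):
    """Run `python argv...` with PYTHON_JIT set to 1 or 0 and return its stdout."""
    env = dict(os.environ, PYTHON_JIT="1" if enabled else "0")
    result = subprocess.run(
        [python, *argv],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Replace the probe with e.g. ["bench_loops_sum.py"] to compare timings.
    probe = ["-c", "import os; print(os.environ['PYTHON_JIT'])"]
    for enabled in (False, True):
        print(enabled, run_with_jit(probe, enabled).strip())
```

On an interpreter built without the JIT, the variable is expected to be ignored, so identical timings in both runs are a hint that the build lacks it.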
@paugier
Author

paugier commented Mar 3, 2025

> @paugier What platform and architecture did you run your benchmarks on?

linux-x86_64-gnu. Sorry for the late answer. I was offline skiing with the family.

@zanieb
Member

zanieb commented Mar 11, 2025

Should be fixed in the next release

❯ uv run -p 3.13.2 bench.py
3.13.2 (main, Feb 12 2025, 14:59:08) [Clang 19.1.6 ]
Number of long_calcul per second: 54.34

❯ uv run -p 3.13.2 bench.py
3.13.2 (main, Mar 11 2025, 17:30:09) [Clang 20.1.0 ]
Number of long_calcul per second: 77.16

❯ /Users/zb/workspace/conda/my-env/bin/python ../uv/example/bench.py
3.13.2 | packaged by conda-forge | (main, Feb 17 2025, 14:02:48) [Clang 18.1.8 ]
Number of long_calcul per second: 69.25

zanieb closed this as completed Mar 11, 2025
@zanieb
Member

zanieb commented Mar 12, 2025

And using hyperfine

Benchmark 1: uv 3.13.2
  Time (mean ± σ):     279.1 ms ±   2.8 ms    [User: 275.3 ms, System: 2.8 ms]
  Range (min … max):   274.9 ms … 283.4 ms    10 runs
 
Benchmark 2: conda-forge 3.13.2
  Time (mean ± σ):     331.7 ms ±   6.3 ms    [User: 328.3 ms, System: 2.5 ms]
  Range (min … max):   324.8 ms … 347.7 ms    10 runs
 
Summary
  uv 3.13.2 ran
    1.19 ± 0.03 times faster than conda-forge 3.13.2
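
The summary line can be reproduced from the two means and standard deviations above. This is a sketch assuming first-order propagation of uncorrelated errors, which I believe matches what hyperfine reports but have not verified against its source; `speed_ratio` is a hypothetical helper:

```python
import math


def speed_ratio(fast_mean, fast_sd, slow_mean, slow_sd):
    """Return (ratio, uncertainty) for slow_mean / fast_mean.

    Relative errors of the two measurements are combined in quadrature.
    """
    ratio = slow_mean / fast_mean
    relative_error = math.hypot(fast_sd / fast_mean, slow_sd / slow_mean)
    return ratio, ratio * relative_error


# Values in ms, copied from the two hyperfine benchmarks above.
ratio, err = speed_ratio(279.1, 2.8, 331.7, 6.3)
print(f"{ratio:.2f} ± {err:.2f} times faster")  # prints "1.19 ± 0.03 times faster"
```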

@paugier
Author

paugier commented Mar 12, 2025

That's awesome but could you also tell us what has been done to get this result?

@zanieb
Member

zanieb commented Mar 16, 2025

Yeah, it was the LLVM 20 upgrade with the backport for the LLVM 19 regression, llvm/llvm-project#114990 (conda-forge is on LLVM 18 on macOS and GCC on Linux). This was merged here in #553.

This is briefly discussed in https://github.com/astral-sh/python-build-standalone/releases/tag/20250311

@paugier
Author

paugier commented Mar 20, 2025

Note that on Linux, I still get a notable difference: Python 3.13 from conda-forge compiled with GCC is approximately 35% faster than Python 3.13 from uv (PBS) compiled with Clang 20.

Details of the results here and code here.

Good news: one can now enable the JIT with PBS Python 3.13 and 3.14! But the effect is tiny, much less than with conda-forge Python 3.13.

@zanieb
Member

zanieb commented Mar 20, 2025

I did a comprehensive benchmark and I'm seeing about a 6% difference

❯ uvx pyperf compare_to conda-forge-313.json pbs-313.json
Benchmarks with tag 'apps':
===========================

2to3: Mean +- std dev: [conda-forge-313] 230 ms +- 1 ms -> [pbs-313] 243 ms +- 1 ms: 1.05x slower
chameleon: Mean +- std dev: [conda-forge-313] 6.17 ms +- 0.07 ms -> [pbs-313] 6.35 ms +- 0.04 ms: 1.03x slower
docutils: Mean +- std dev: [conda-forge-313] 2.05 sec +- 0.01 sec -> [pbs-313] 2.15 sec +- 0.02 sec: 1.05x slower
html5lib: Mean +- std dev: [conda-forge-313] 62.8 ms +- 0.9 ms -> [pbs-313] 58.7 ms +- 0.3 ms: 1.07x faster
tornado_http: Mean +- std dev: [conda-forge-313] 91.0 ms +- 1.0 ms -> [pbs-313] 94.4 ms +- 1.0 ms: 1.04x slower

Geometric mean: 1.02x slower

Benchmarks with tag 'asyncio':
==============================

async_tree_none: Mean +- std dev: [conda-forge-313] 325 ms +- 9 ms -> [pbs-313] 349 ms +- 9 ms: 1.07x slower
async_tree_cpu_io_mixed: Mean +- std dev: [conda-forge-313] 516 ms +- 6 ms -> [pbs-313] 557 ms +- 6 ms: 1.08x slower
async_tree_cpu_io_mixed_tg: Mean +- std dev: [conda-forge-313] 518 ms +- 33 ms -> [pbs-313] 568 ms +- 34 ms: 1.10x slower
async_tree_eager: Mean +- std dev: [conda-forge-313] 104 ms +- 2 ms -> [pbs-313] 112 ms +- 1 ms: 1.08x slower
async_tree_eager_cpu_io_mixed: Mean +- std dev: [conda-forge-313] 369 ms +- 9 ms -> [pbs-313] 395 ms +- 9 ms: 1.07x slower
async_tree_eager_cpu_io_mixed_tg: Mean +- std dev: [conda-forge-313] 471 ms +- 23 ms -> [pbs-313] 508 ms +- 22 ms: 1.08x slower
async_tree_eager_io: Mean +- std dev: [conda-forge-313] 780 ms +- 37 ms -> [pbs-313] 834 ms +- 34 ms: 1.07x slower
async_tree_eager_io_tg: Mean +- std dev: [conda-forge-313] 785 ms +- 51 ms -> [pbs-313] 833 ms +- 49 ms: 1.06x slower
async_tree_eager_memoization: Mean +- std dev: [conda-forge-313] 235 ms +- 13 ms -> [pbs-313] 245 ms +- 13 ms: 1.05x slower
async_tree_eager_memoization_tg: Mean +- std dev: [conda-forge-313] 321 ms +- 19 ms -> [pbs-313] 342 ms +- 20 ms: 1.06x slower
async_tree_eager_tg: Mean +- std dev: [conda-forge-313] 247 ms +- 11 ms -> [pbs-313] 264 ms +- 12 ms: 1.07x slower
async_tree_io: Mean +- std dev: [conda-forge-313] 744 ms +- 34 ms -> [pbs-313] 800 ms +- 35 ms: 1.08x slower
async_tree_io_tg: Mean +- std dev: [conda-forge-313] 746 ms +- 37 ms -> [pbs-313] 799 ms +- 38 ms: 1.07x slower
async_tree_memoization: Mean +- std dev: [conda-forge-313] 400 ms +- 43 ms -> [pbs-313] 429 ms +- 45 ms: 1.07x slower
async_tree_memoization_tg: Mean +- std dev: [conda-forge-313] 404 ms +- 5 ms -> [pbs-313] 429 ms +- 2 ms: 1.06x slower
async_tree_none_tg: Mean +- std dev: [conda-forge-313] 295 ms +- 7 ms -> [pbs-313] 315 ms +- 7 ms: 1.07x slower

Geometric mean: 1.07x slower

Benchmarks with tag 'math':
===========================

float: Mean +- std dev: [conda-forge-313] 71.7 ms +- 0.9 ms -> [pbs-313] 80.4 ms +- 0.8 ms: 1.12x slower
nbody: Mean +- std dev: [conda-forge-313] 82.5 ms +- 1.3 ms -> [pbs-313] 81.7 ms +- 3.0 ms: 1.01x faster
pidigits: Mean +- std dev: [conda-forge-313] 166 ms +- 1 ms -> [pbs-313] 178 ms +- 0 ms: 1.07x slower

Geometric mean: 1.06x slower

Benchmarks with tag 'regex':
============================

regex_compile: Mean +- std dev: [conda-forge-313] 117 ms +- 1 ms -> [pbs-313] 121 ms +- 1 ms: 1.04x slower
regex_dna: Mean +- std dev: [conda-forge-313] 153 ms +- 3 ms -> [pbs-313] 139 ms +- 1 ms: 1.10x faster
regex_effbot: Mean +- std dev: [conda-forge-313] 2.43 ms +- 0.06 ms -> [pbs-313] 2.56 ms +- 0.06 ms: 1.05x slower
regex_v8: Mean +- std dev: [conda-forge-313] 21.7 ms +- 0.6 ms -> [pbs-313] 21.1 ms +- 0.3 ms: 1.03x faster

Geometric mean: 1.01x faster

Benchmarks with tag 'serialize':
================================

json_dumps: Mean +- std dev: [conda-forge-313] 9.08 ms +- 0.16 ms -> [pbs-313] 9.46 ms +- 0.13 ms: 1.04x slower
json_loads: Mean +- std dev: [conda-forge-313] 21.2 us +- 0.2 us -> [pbs-313] 24.5 us +- 0.3 us: 1.16x slower
pickle: Mean +- std dev: [conda-forge-313] 10.6 us +- 0.1 us -> [pbs-313] 11.1 us +- 0.2 us: 1.05x slower
pickle_dict: Mean +- std dev: [conda-forge-313] 25.6 us +- 0.3 us -> [pbs-313] 19.3 us +- 0.5 us: 1.32x faster
pickle_list: Mean +- std dev: [conda-forge-313] 3.96 us +- 0.06 us -> [pbs-313] 3.61 us +- 0.03 us: 1.10x faster
pickle_pure_python: Mean +- std dev: [conda-forge-313] 267 us +- 2 us -> [pbs-313] 280 us +- 3 us: 1.05x slower
tomli_loads: Mean +- std dev: [conda-forge-313] 1.94 sec +- 0.03 sec -> [pbs-313] 2.03 sec +- 0.02 sec: 1.05x slower
unpickle: Mean +- std dev: [conda-forge-313] 11.6 us +- 0.2 us -> [pbs-313] 14.1 us +- 0.2 us: 1.22x slower
unpickle_list: Mean +- std dev: [conda-forge-313] 4.41 us +- 0.06 us -> [pbs-313] 4.63 us +- 0.12 us: 1.05x slower
unpickle_pure_python: Mean +- std dev: [conda-forge-313] 193 us +- 1 us -> [pbs-313] 189 us +- 2 us: 1.02x faster
xml_etree_parse: Mean +- std dev: [conda-forge-313] 129 ms +- 2 ms -> [pbs-313] 248 ms +- 1 ms: 1.92x slower
xml_etree_iterparse: Mean +- std dev: [conda-forge-313] 85.0 ms +- 1.1 ms -> [pbs-313] 135 ms +- 1 ms: 1.59x slower
xml_etree_generate: Mean +- std dev: [conda-forge-313] 77.0 ms +- 0.7 ms -> [pbs-313] 83.2 ms +- 0.6 ms: 1.08x slower
xml_etree_process: Mean +- std dev: [conda-forge-313] 53.0 ms +- 0.6 ms -> [pbs-313] 57.8 ms +- 0.6 ms: 1.09x slower

Geometric mean: 1.11x slower

Benchmarks with tag 'startup':
==============================

python_startup: Mean +- std dev: [conda-forge-313] 9.68 ms +- 0.03 ms -> [pbs-313] 13.5 ms +- 0.1 ms: 1.39x slower
python_startup_no_site: Mean +- std dev: [conda-forge-313] 6.76 ms +- 0.03 ms -> [pbs-313] 10.5 ms +- 0.1 ms: 1.55x slower

Geometric mean: 1.47x slower

Benchmarks with tag 'template':
===============================

django_template: Mean +- std dev: [conda-forge-313] 31.3 ms +- 0.4 ms -> [pbs-313] 35.0 ms +- 0.3 ms: 1.12x slower
genshi_text: Mean +- std dev: [conda-forge-313] 20.2 ms +- 0.3 ms -> [pbs-313] 21.3 ms +- 0.2 ms: 1.05x slower
genshi_xml: Mean +- std dev: [conda-forge-313] 48.4 ms +- 0.8 ms -> [pbs-313] 50.8 ms +- 0.6 ms: 1.05x slower
mako: Mean +- std dev: [conda-forge-313] 9.36 ms +- 0.20 ms -> [pbs-313] 9.87 ms +- 0.08 ms: 1.05x slower

Geometric mean: 1.07x slower

All benchmarks:
===============

2to3: Mean +- std dev: [conda-forge-313] 230 ms +- 1 ms -> [pbs-313] 243 ms +- 1 ms: 1.05x slower
async_generators: Mean +- std dev: [conda-forge-313] 347 ms +- 4 ms -> [pbs-313] 410 ms +- 3 ms: 1.18x slower
async_tree_none: Mean +- std dev: [conda-forge-313] 325 ms +- 9 ms -> [pbs-313] 349 ms +- 9 ms: 1.07x slower
async_tree_cpu_io_mixed: Mean +- std dev: [conda-forge-313] 516 ms +- 6 ms -> [pbs-313] 557 ms +- 6 ms: 1.08x slower
async_tree_cpu_io_mixed_tg: Mean +- std dev: [conda-forge-313] 518 ms +- 33 ms -> [pbs-313] 568 ms +- 34 ms: 1.10x slower
async_tree_eager: Mean +- std dev: [conda-forge-313] 104 ms +- 2 ms -> [pbs-313] 112 ms +- 1 ms: 1.08x slower
async_tree_eager_cpu_io_mixed: Mean +- std dev: [conda-forge-313] 369 ms +- 9 ms -> [pbs-313] 395 ms +- 9 ms: 1.07x slower
async_tree_eager_cpu_io_mixed_tg: Mean +- std dev: [conda-forge-313] 471 ms +- 23 ms -> [pbs-313] 508 ms +- 22 ms: 1.08x slower
async_tree_eager_io: Mean +- std dev: [conda-forge-313] 780 ms +- 37 ms -> [pbs-313] 834 ms +- 34 ms: 1.07x slower
async_tree_eager_io_tg: Mean +- std dev: [conda-forge-313] 785 ms +- 51 ms -> [pbs-313] 833 ms +- 49 ms: 1.06x slower
async_tree_eager_memoization: Mean +- std dev: [conda-forge-313] 235 ms +- 13 ms -> [pbs-313] 245 ms +- 13 ms: 1.05x slower
async_tree_eager_memoization_tg: Mean +- std dev: [conda-forge-313] 321 ms +- 19 ms -> [pbs-313] 342 ms +- 20 ms: 1.06x slower
async_tree_eager_tg: Mean +- std dev: [conda-forge-313] 247 ms +- 11 ms -> [pbs-313] 264 ms +- 12 ms: 1.07x slower
async_tree_io: Mean +- std dev: [conda-forge-313] 744 ms +- 34 ms -> [pbs-313] 800 ms +- 35 ms: 1.08x slower
async_tree_io_tg: Mean +- std dev: [conda-forge-313] 746 ms +- 37 ms -> [pbs-313] 799 ms +- 38 ms: 1.07x slower
async_tree_memoization: Mean +- std dev: [conda-forge-313] 400 ms +- 43 ms -> [pbs-313] 429 ms +- 45 ms: 1.07x slower
async_tree_memoization_tg: Mean +- std dev: [conda-forge-313] 404 ms +- 5 ms -> [pbs-313] 429 ms +- 2 ms: 1.06x slower
async_tree_none_tg: Mean +- std dev: [conda-forge-313] 295 ms +- 7 ms -> [pbs-313] 315 ms +- 7 ms: 1.07x slower
asyncio_tcp: Mean +- std dev: [conda-forge-313] 365 ms +- 4 ms -> [pbs-313] 349 ms +- 4 ms: 1.04x faster
asyncio_websockets: Mean +- std dev: [conda-forge-313] 519 ms +- 7 ms -> [pbs-313] 1.52 sec +- 0.01 sec: 2.94x slower
chameleon: Mean +- std dev: [conda-forge-313] 6.17 ms +- 0.07 ms -> [pbs-313] 6.35 ms +- 0.04 ms: 1.03x slower
chaos: Mean +- std dev: [conda-forge-313] 54.5 ms +- 1.5 ms -> [pbs-313] 55.7 ms +- 0.5 ms: 1.02x slower
comprehensions: Mean +- std dev: [conda-forge-313] 15.0 us +- 0.2 us -> [pbs-313] 15.3 us +- 0.1 us: 1.02x slower
bench_mp_pool: Mean +- std dev: [conda-forge-313] 15.4 ms +- 9.2 ms -> [pbs-313] 8.87 ms +- 2.77 ms: 1.74x faster
coroutines: Mean +- std dev: [conda-forge-313] 21.9 ms +- 0.2 ms -> [pbs-313] 22.9 ms +- 0.2 ms: 1.04x slower
coverage: Mean +- std dev: [conda-forge-313] 74.7 ms +- 0.9 ms -> [pbs-313] 76.8 ms +- 2.1 ms: 1.03x slower
crypto_pyaes: Mean +- std dev: [conda-forge-313] 62.5 ms +- 0.5 ms -> [pbs-313] 66.9 ms +- 0.6 ms: 1.07x slower
dask: Mean +- std dev: [conda-forge-313] 301 ms +- 15 ms -> [pbs-313] 317 ms +- 15 ms: 1.05x slower
deepcopy: Mean +- std dev: [conda-forge-313] 333 us +- 5 us -> [pbs-313] 336 us +- 2 us: 1.01x slower
deepcopy_reduce: Mean +- std dev: [conda-forge-313] 3.04 us +- 0.06 us -> [pbs-313] 3.18 us +- 0.02 us: 1.05x slower
deepcopy_memo: Mean +- std dev: [conda-forge-313] 36.9 us +- 0.4 us -> [pbs-313] 35.8 us +- 0.2 us: 1.03x faster
deltablue: Mean +- std dev: [conda-forge-313] 2.74 ms +- 0.02 ms -> [pbs-313] 2.76 ms +- 0.02 ms: 1.01x slower
django_template: Mean +- std dev: [conda-forge-313] 31.3 ms +- 0.4 ms -> [pbs-313] 35.0 ms +- 0.3 ms: 1.12x slower
docutils: Mean +- std dev: [conda-forge-313] 2.05 sec +- 0.01 sec -> [pbs-313] 2.15 sec +- 0.02 sec: 1.05x slower
dulwich_log: Mean +- std dev: [conda-forge-313] 59.2 ms +- 0.5 ms -> [pbs-313] 68.7 ms +- 0.2 ms: 1.16x slower
fannkuch: Mean +- std dev: [conda-forge-313] 354 ms +- 2 ms -> [pbs-313] 364 ms +- 3 ms: 1.03x slower
float: Mean +- std dev: [conda-forge-313] 71.7 ms +- 0.9 ms -> [pbs-313] 80.4 ms +- 0.8 ms: 1.12x slower
create_gc_cycles: Mean +- std dev: [conda-forge-313] 921 us +- 6 us -> [pbs-313] 1.08 ms +- 0.00 ms: 1.17x slower
gc_traversal: Mean +- std dev: [conda-forge-313] 3.15 ms +- 0.28 ms -> [pbs-313] 3.37 ms +- 0.27 ms: 1.07x slower
generators: Mean +- std dev: [conda-forge-313] 28.2 ms +- 0.5 ms -> [pbs-313] 29.2 ms +- 0.2 ms: 1.04x slower
genshi_text: Mean +- std dev: [conda-forge-313] 20.2 ms +- 0.3 ms -> [pbs-313] 21.3 ms +- 0.2 ms: 1.05x slower
genshi_xml: Mean +- std dev: [conda-forge-313] 48.4 ms +- 0.8 ms -> [pbs-313] 50.8 ms +- 0.6 ms: 1.05x slower
go: Mean +- std dev: [conda-forge-313] 130 ms +- 1 ms -> [pbs-313] 126 ms +- 1 ms: 1.03x faster
hexiom: Mean +- std dev: [conda-forge-313] 5.57 ms +- 0.04 ms -> [pbs-313] 5.76 ms +- 0.04 ms: 1.03x slower
html5lib: Mean +- std dev: [conda-forge-313] 62.8 ms +- 0.9 ms -> [pbs-313] 58.7 ms +- 0.3 ms: 1.07x faster
json_dumps: Mean +- std dev: [conda-forge-313] 9.08 ms +- 0.16 ms -> [pbs-313] 9.46 ms +- 0.13 ms: 1.04x slower
json_loads: Mean +- std dev: [conda-forge-313] 21.2 us +- 0.2 us -> [pbs-313] 24.5 us +- 0.3 us: 1.16x slower
logging_format: Mean +- std dev: [conda-forge-313] 5.91 us +- 0.20 us -> [pbs-313] 6.23 us +- 0.14 us: 1.05x slower
logging_silent: Mean +- std dev: [conda-forge-313] 92.0 ns +- 2.3 ns -> [pbs-313] 84.3 ns +- 1.4 ns: 1.09x faster
logging_simple: Mean +- std dev: [conda-forge-313] 5.29 us +- 0.08 us -> [pbs-313] 5.50 us +- 0.09 us: 1.04x slower
mako: Mean +- std dev: [conda-forge-313] 9.36 ms +- 0.20 ms -> [pbs-313] 9.87 ms +- 0.08 ms: 1.05x slower
mdp: Mean +- std dev: [conda-forge-313] 2.27 sec +- 0.03 sec -> [pbs-313] 2.34 sec +- 0.03 sec: 1.03x slower
meteor_contest: Mean +- std dev: [conda-forge-313] 87.1 ms +- 0.6 ms -> [pbs-313] 93.6 ms +- 0.6 ms: 1.08x slower
nbody: Mean +- std dev: [conda-forge-313] 82.5 ms +- 1.3 ms -> [pbs-313] 81.7 ms +- 3.0 ms: 1.01x faster
nqueens: Mean +- std dev: [conda-forge-313] 70.7 ms +- 0.9 ms -> [pbs-313] 78.1 ms +- 0.4 ms: 1.10x slower
pathlib: Mean +- std dev: [conda-forge-313] 19.7 ms +- 0.1 ms -> [pbs-313] 21.0 ms +- 0.1 ms: 1.07x slower
pickle: Mean +- std dev: [conda-forge-313] 10.6 us +- 0.1 us -> [pbs-313] 11.1 us +- 0.2 us: 1.05x slower
pickle_dict: Mean +- std dev: [conda-forge-313] 25.6 us +- 0.3 us -> [pbs-313] 19.3 us +- 0.5 us: 1.32x faster
pickle_list: Mean +- std dev: [conda-forge-313] 3.96 us +- 0.06 us -> [pbs-313] 3.61 us +- 0.03 us: 1.10x faster
pickle_pure_python: Mean +- std dev: [conda-forge-313] 267 us +- 2 us -> [pbs-313] 280 us +- 3 us: 1.05x slower
pidigits: Mean +- std dev: [conda-forge-313] 166 ms +- 1 ms -> [pbs-313] 178 ms +- 0 ms: 1.07x slower
pprint_safe_repr: Mean +- std dev: [conda-forge-313] 667 ms +- 12 ms -> [pbs-313] 757 ms +- 5 ms: 1.14x slower
pprint_pformat: Mean +- std dev: [conda-forge-313] 1.37 sec +- 0.02 sec -> [pbs-313] 1.54 sec +- 0.01 sec: 1.13x slower
pyflate: Mean +- std dev: [conda-forge-313] 403 ms +- 2 ms -> [pbs-313] 399 ms +- 3 ms: 1.01x faster
python_startup: Mean +- std dev: [conda-forge-313] 9.68 ms +- 0.03 ms -> [pbs-313] 13.5 ms +- 0.1 ms: 1.39x slower
python_startup_no_site: Mean +- std dev: [conda-forge-313] 6.76 ms +- 0.03 ms -> [pbs-313] 10.5 ms +- 0.1 ms: 1.55x slower
raytrace: Mean +- std dev: [conda-forge-313] 241 ms +- 3 ms -> [pbs-313] 252 ms +- 2 ms: 1.05x slower
regex_compile: Mean +- std dev: [conda-forge-313] 117 ms +- 1 ms -> [pbs-313] 121 ms +- 1 ms: 1.04x slower
regex_dna: Mean +- std dev: [conda-forge-313] 153 ms +- 3 ms -> [pbs-313] 139 ms +- 1 ms: 1.10x faster
regex_effbot: Mean +- std dev: [conda-forge-313] 2.43 ms +- 0.06 ms -> [pbs-313] 2.56 ms +- 0.06 ms: 1.05x slower
regex_v8: Mean +- std dev: [conda-forge-313] 21.7 ms +- 0.6 ms -> [pbs-313] 21.1 ms +- 0.3 ms: 1.03x faster
richards: Mean +- std dev: [conda-forge-313] 46.0 ms +- 0.6 ms -> [pbs-313] 42.8 ms +- 0.4 ms: 1.07x faster
richards_super: Mean +- std dev: [conda-forge-313] 52.3 ms +- 0.9 ms -> [pbs-313] 48.8 ms +- 0.4 ms: 1.07x faster
scimark_fft: Mean +- std dev: [conda-forge-313] 325 ms +- 5 ms -> [pbs-313] 316 ms +- 9 ms: 1.03x faster
scimark_lu: Mean +- std dev: [conda-forge-313] 109 ms +- 1 ms -> [pbs-313] 100 ms +- 1 ms: 1.09x faster
scimark_monte_carlo: Mean +- std dev: [conda-forge-313] 61.2 ms +- 0.5 ms -> [pbs-313] 54.5 ms +- 1.4 ms: 1.12x faster
scimark_sor: Mean +- std dev: [conda-forge-313] 123 ms +- 1 ms -> [pbs-313] 116 ms +- 1 ms: 1.06x faster
scimark_sparse_mat_mult: Mean +- std dev: [conda-forge-313] 3.89 ms +- 0.12 ms -> [pbs-313] 4.40 ms +- 0.44 ms: 1.13x slower
spectral_norm: Mean +- std dev: [conda-forge-313] 104 ms +- 1 ms -> [pbs-313] 105 ms +- 1 ms: 1.01x slower
sqlglot_normalize: Mean +- std dev: [conda-forge-313] 261 ms +- 3 ms -> [pbs-313] 110 ms +- 1 ms: 2.36x faster
sqlglot_optimize: Mean +- std dev: [conda-forge-313] 47.5 ms +- 0.4 ms -> [pbs-313] 53.4 ms +- 0.3 ms: 1.12x slower
sqlglot_parse: Mean +- std dev: [conda-forge-313] 1.12 ms +- 0.01 ms -> [pbs-313] 1.14 ms +- 0.01 ms: 1.02x slower
sqlglot_transpile: Mean +- std dev: [conda-forge-313] 1.37 ms +- 0.01 ms -> [pbs-313] 1.40 ms +- 0.01 ms: 1.03x slower
sqlite_synth: Mean +- std dev: [conda-forge-313] 2.09 us +- 0.03 us -> [pbs-313] 3.25 us +- 0.02 us: 1.56x slower
sympy_expand: Mean +- std dev: [conda-forge-313] 409 ms +- 3 ms -> [pbs-313] 448 ms +- 2 ms: 1.10x slower
sympy_integrate: Mean +- std dev: [conda-forge-313] 16.2 ms +- 0.1 ms -> [pbs-313] 16.8 ms +- 0.1 ms: 1.04x slower
sympy_sum: Mean +- std dev: [conda-forge-313] 122 ms +- 1 ms -> [pbs-313] 130 ms +- 1 ms: 1.06x slower
sympy_str: Mean +- std dev: [conda-forge-313] 238 ms +- 3 ms -> [pbs-313] 252 ms +- 1 ms: 1.06x slower
telco: Mean +- std dev: [conda-forge-313] 7.37 ms +- 0.18 ms -> [pbs-313] 8.27 ms +- 0.12 ms: 1.12x slower
tomli_loads: Mean +- std dev: [conda-forge-313] 1.94 sec +- 0.03 sec -> [pbs-313] 2.03 sec +- 0.02 sec: 1.05x slower
tornado_http: Mean +- std dev: [conda-forge-313] 91.0 ms +- 1.0 ms -> [pbs-313] 94.4 ms +- 1.0 ms: 1.04x slower
typing_runtime_protocols: Mean +- std dev: [conda-forge-313] 146 us +- 4 us -> [pbs-313] 154 us +- 3 us: 1.05x slower
unpack_sequence: Mean +- std dev: [conda-forge-313] 35.7 ns +- 0.4 ns -> [pbs-313] 39.9 ns +- 0.7 ns: 1.12x slower
unpickle: Mean +- std dev: [conda-forge-313] 11.6 us +- 0.2 us -> [pbs-313] 14.1 us +- 0.2 us: 1.22x slower
unpickle_list: Mean +- std dev: [conda-forge-313] 4.41 us +- 0.06 us -> [pbs-313] 4.63 us +- 0.12 us: 1.05x slower
unpickle_pure_python: Mean +- std dev: [conda-forge-313] 193 us +- 1 us -> [pbs-313] 189 us +- 2 us: 1.02x faster
xml_etree_parse: Mean +- std dev: [conda-forge-313] 129 ms +- 2 ms -> [pbs-313] 248 ms +- 1 ms: 1.92x slower
xml_etree_iterparse: Mean +- std dev: [conda-forge-313] 85.0 ms +- 1.1 ms -> [pbs-313] 135 ms +- 1 ms: 1.59x slower
xml_etree_generate: Mean +- std dev: [conda-forge-313] 77.0 ms +- 0.7 ms -> [pbs-313] 83.2 ms +- 0.6 ms: 1.08x slower
xml_etree_process: Mean +- std dev: [conda-forge-313] 53.0 ms +- 0.6 ms -> [pbs-313] 57.8 ms +- 0.6 ms: 1.09x slower

Benchmark hidden because not significant (2): asyncio_tcp_ssl, bench_thread_pool

Geometric mean: 1.06x slower

This seems far more in line with what I'd expect. The statically linked libpython probably accounts for the additional difference.
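For anyone wanting to sanity-check a summary line like "Geometric mean: 1.06x slower", here is a minimal sketch of how such a figure can be derived from per-benchmark means. It assumes the summary is the geometric mean of the changed/reference ratios; the helper names `geometric_mean_ratio` and `describe` are made up for illustration and are not part of pyperf. Only five of the benchmarks listed above are used, so the result is not expected to match 1.06.

```python
from statistics import geometric_mean

# (reference mean, changed mean) in ms, copied from five rows of the
# comparison above: [conda-forge-313] -> [pbs-313].
SAMPLE_MS = {
    "nqueens": (70.7, 78.1),
    "richards": (46.0, 42.8),
    "sqlglot_normalize": (261.0, 110.0),
    "xml_etree_parse": (129.0, 248.0),
    "python_startup": (9.68, 13.5),
}


def geometric_mean_ratio(pairs):
    """Geometric mean of changed/reference ratios (hypothetical helper)."""
    return geometric_mean(changed / ref for ref, changed in pairs.values())


def describe(ratio):
    """Format a ratio as 'N.NNx slower' or 'N.NNx faster' (hypothetical helper)."""
    if ratio >= 1.0:
        return f"{ratio:.2f}x slower"
    return f"{1.0 / ratio:.2f}x faster"


gm = geometric_mean_ratio(SAMPLE_MS)
print("Geometric mean over the sample:", describe(gm))
```

A geometric mean is used rather than an arithmetic one so that a 2x speedup and a 2x slowdown cancel out, which is why a single large outlier such as `sqlglot_normalize` does not dominate the summary.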

@zanieb
Member

zanieb commented Mar 20, 2025

I can reproduce some of the difference you're reporting for that microbenchmark:

❯ hyperfine "/home/zb/.local/share/uv/python/cpython-3.13.2-linux-x86_64-gnu/bin/python3.13 bench.py" "/home/zb/miniforge3/envs/env-py3.13/bin/python bench.py"
Benchmark 1: /home/zb/.local/share/uv/python/cpython-3.13.2-linux-x86_64-gnu/bin/python3.13 bench.py
  Time (mean ± σ):      2.992 s ±  0.012 s    [User: 2.986 s, System: 0.006 s]
  Range (min … max):    2.975 s …  3.008 s    10 runs
 
Benchmark 2: /home/zb/miniforge3/envs/env-py3.13/bin/python bench.py
  Time (mean ± σ):      2.550 s ±  0.052 s    [User: 2.546 s, System: 0.004 s]
  Range (min … max):    2.505 s …  2.679 s    10 runs
 
Summary
  '/home/zb/miniforge3/envs/env-py3.13/bin/python bench.py' ran
    1.17 ± 0.02 times faster than '/home/zb/.local/share/uv/python/cpython-3.13.2-linux-x86_64-gnu/bin/python3.13 bench.py'

(but I also expect this to be related to libpython)
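The "1.17 ± 0.02 times faster" summary can be reproduced from the two means and standard deviations in the hyperfine output above. This is a rough sketch that assumes standard first-order error propagation for a quotient of two independent measurements; whether hyperfine computes it exactly this way is an assumption on my part, and `speed_ratio` is a hypothetical helper name.

```python
import math


def speed_ratio(mean_slow, std_slow, mean_fast, std_fast):
    """Return (ratio, uncertainty) for mean_slow / mean_fast.

    Relative errors are added in quadrature, which assumes the two
    measurements are independent.
    """
    ratio = mean_slow / mean_fast
    rel_err = math.sqrt((std_slow / mean_slow) ** 2 + (std_fast / mean_fast) ** 2)
    return ratio, ratio * rel_err


# Numbers from the hyperfine run above (seconds):
# uv-managed CPython 3.13.2 vs. the conda-forge environment.
ratio, err = speed_ratio(2.992, 0.012, 2.550, 0.052)
print(f"{ratio:.2f} ± {err:.2f} times faster")
```

Most of the uncertainty comes from the conda-forge run, whose standard deviation (0.052 s) is about four times larger than the other one.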
