
Introduce benchmarks for npy/npz readers #122

Draft · wants to merge 1 commit into base: master
Conversation

garyttierney

This is a WIP: some groundwork to get concrete figures for the recent conversations around improving backtest performance.

`cargo bench` will produce a report on the command line, with a comparison against the previous run if one is available:

```
format-throughput/npz   time:   [40.665 ms 40.878 ms 41.108 ms]
                        thrpt:  [24.326 Melem/s 24.463 Melem/s 24.591 Melem/s]
                 change:
                        time:   [+0.1910% +1.0585% +1.8781%] (p = 0.02 < 0.05)
                        thrpt:  [-1.8434% -1.0475% -0.1906%]
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
format-throughput/npy   time:   [28.188 ms 28.509 ms 28.846 ms]
                        thrpt:  [34.667 Melem/s 35.077 Melem/s 35.476 Melem/s]
                 change:
                        time:   [-2.0433% -0.3669% +1.4170%] (p = 0.68 > 0.05)
                        thrpt:  [-1.3972% +0.3682% +2.0859%]
                        No change in performance detected.
```
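
For context, a criterion throughput benchmark of this shape might look roughly like the sketch below; the fixture path, element count, and the `decode` placeholder are illustrative only, not the actual reader code in this PR.

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion, Throughput};

// Placeholder for the reader under test (the real benchmark would call the
// npy/npz decoder instead).
fn decode(bytes: &[u8]) -> usize {
    bytes.len()
}

fn format_throughput(c: &mut Criterion) {
    // Assumed pre-generated fixture; the element count must match its contents
    // for the Melem/s figures to be meaningful.
    let bytes = std::fs::read("fixtures/events.npy").expect("benchmark fixture");
    let num_elems = 1_000_000u64;

    let mut group = c.benchmark_group("format-throughput");
    group.throughput(Throughput::Elements(num_elems));
    group.bench_function("npy", |b| b.iter(|| decode(black_box(&bytes))));
    group.finish();
}

criterion_group!(benches, format_throughput);
criterion_main!(benches);
```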

nkaz001 (Owner) commented Aug 16, 2024

At this point, I'm unsure if the `write_npz` method needs to be included in the library. Would it be sufficient to implement this method within the benchmark code, or could you extend it to handle more general cases, such as accepting a dataset name and other parameters?

Additionally, I'm curious about the proper compression options for npz files. Will using custom options cause any compatibility issues? This option is currently used.

garyttierney (Author)

> At this point, I'm unsure if the `write_npz` method needs to be included in the library. Would it be sufficient to implement this method within the benchmark code, or could you extend it to handle more general cases, such as accepting a dataset name and other parameters?

Yeah, I think it would make more sense to move it to the benchmark code, although we could make the to_npz code in the recorder call this too and make sure it can handle the options.

> Additionally, I'm curious about the proper compression options for npz files. Will using custom options cause any compatibility issues? This option is currently used.

Those should be the same options as the defaults, but I'll set them explicitly in the benchmark code.
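
Spelling those options out when writing an npz entry could look roughly like this, assuming the `zip` crate's 0.6-style API; the function name, dataset-name handling, and payload are placeholders rather than this PR's code:

```rust
use std::io::{Cursor, Write};
use zip::write::{FileOptions, ZipWriter};
use zip::CompressionMethod;

fn write_npz_entry(npy_bytes: &[u8], dataset_name: &str) -> zip::result::ZipResult<Vec<u8>> {
    let mut zip = ZipWriter::new(Cursor::new(Vec::new()));

    // State the compression method explicitly instead of relying on
    // `FileOptions::default()`, so the benchmark and the recorder stay in sync.
    // Deflate matches what numpy's `savez_compressed` produces, so the output
    // should remain readable from Python.
    let options = FileOptions::default().compression_method(CompressionMethod::Deflated);

    // npz is just a zip archive containing one .npy file per dataset.
    zip.start_file(format!("{dataset_name}.npy"), options)?;
    zip.write_all(npy_bytes)?;
    Ok(zip.finish()?.into_inner())
}
```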

One thing I'd like to get in before this is ready is a memory-mapped reader, so we can compare its throughput against the buffered version's additional syscalls and buffer copy.
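
A rough sketch of what that memory-mapped path might look like, assuming the `memmap2` crate; header parsing and the actual decode step are elided:

```rust
use memmap2::Mmap;
use std::fs::File;

fn read_npy_mmap(path: &str) -> std::io::Result<usize> {
    let file = File::open(path)?;
    // Safety: the fixture file must not be modified while the map is alive.
    let mmap = unsafe { Mmap::map(&file)? };

    // The reader would parse the npy header and reinterpret the data section
    // in place here, avoiding the read syscalls and the intermediate copy that
    // the buffered version pays for.
    Ok(mmap.len())
}
```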
