reader: Improve performance of read_all #441

wcampbell0x2a · 2024-05-19T06:15:40Z

Add Leftover::{Byte, Bits}, so that instead of conversion straight away
into a Bitvec, we keep it as a slice instead. In the case of the read_all
attribute this improves the speed, as reading until EOF doesn't keep
causing our reader to convert to bits and back again all the time.
This does result in some slowdown, but with some #[inline(never)]
we can keep it to a minimum. The total gain to read_all speed is
worth any .8 ms slowdown I saw in testing.

Closes #439

github-actions · 2024-05-19T06:20:27Z

Benchmark for `b49a65c`

Click to view benchmark

Test	Base	PR	%
count	3.8±0.02µs	N/A	N/A
deku_read_bits	1230.7±22.08ns	1250.3±20.97ns	+1.59%
deku_read_byte	3.5±0.05ns	3.4±0.13ns	-2.86%
deku_read_enum	3.0±0.05ns	2.6±0.04ns	-13.33%
deku_read_vec	34.1±0.44ns	37.6±0.89ns	+10.26%
deku_write_bits	161.9±3.24ns	162.7±17.05ns	+0.49%
deku_write_byte	21.8±0.20ns	22.7±0.34ns	+4.13%
deku_write_enum	21.1±0.41ns	21.7±0.38ns	+2.84%
deku_write_vec	422.7±5.49ns	362.6±7.05ns	-14.22%
read_all	4.1±0.06µs	N/A	N/A
read_all_bytes	3.6±0.03µs	N/A	N/A

github-actions · 2024-05-19T06:22:55Z

Benchmark for `7f81f7b`

Click to view benchmark

Test	Base	PR	%
count	3.6±0.05µs	N/A	N/A
deku_read_bits	1187.3±15.54ns	1238.2±13.65ns	+4.29%
deku_read_byte	3.3±0.03ns	3.4±0.03ns	+3.03%
deku_read_enum	2.6±0.05ns	2.6±0.05ns	0.00%
deku_read_vec	35.8±0.55ns	37.5±0.68ns	+4.75%
deku_write_bits	150.3±1.46ns	153.7±3.39ns	+2.26%
deku_write_byte	22.7±0.20ns	22.7±0.69ns	0.00%
deku_write_enum	21.5±0.31ns	21.7±0.45ns	+0.93%
deku_write_vec	351.2±5.78ns	406.7±4.96ns	+15.80%
read_all	5.2±0.04µs	N/A	N/A
read_all_bytes	4.8±0.04µs	N/A	N/A

github-actions · 2024-05-19T06:24:12Z

Benchmark for `42291cc`

Click to view benchmark

Test	Base	PR	%
count	3.8±0.03µs	N/A	N/A
deku_read_bits	1232.7±21.71ns	1283.2±13.27ns	+4.10%
deku_read_byte	3.3±0.02ns	3.4±0.25ns	+3.03%
deku_read_enum	2.6±0.04ns	2.6±0.16ns	0.00%
deku_read_vec	34.0±0.50ns	38.0±0.73ns	+11.76%
deku_write_bits	148.0±2.23ns	155.2±12.15ns	+4.86%
deku_write_byte	21.8±0.35ns	22.7±0.33ns	+4.13%
deku_write_enum	21.1±0.35ns	21.8±0.46ns	+3.32%
deku_write_vec	353.8±28.27ns	370.1±11.25ns	+4.61%
read_all	4.1±0.10µs	N/A	N/A
read_all_bytes	3.6±0.04µs	N/A	N/A

github-actions · 2024-05-19T06:25:33Z

Benchmark for `a473082`

Click to view benchmark

Test	Base	PR	%
count	3.8±0.03µs	N/A	N/A
deku_read_bits	1253.0±20.87ns	1198.3±18.86ns	-4.37%
deku_read_byte	3.3±0.07ns	3.6±0.07ns	+9.09%
deku_read_enum	2.6±0.05ns	3.0±0.07ns	+15.38%
deku_read_vec	33.9±0.53ns	37.7±0.69ns	+11.21%
deku_write_bits	152.9±3.45ns	153.9±3.13ns	+0.65%
deku_write_byte	21.8±0.51ns	22.4±0.40ns	+2.75%
deku_write_enum	21.1±0.33ns	21.7±0.37ns	+2.84%
deku_write_vec	357.3±38.06ns	421.8±4.87ns	+18.05%
read_all	5.0±0.04µs	N/A	N/A
read_all_bytes	4.6±0.18µs	N/A	N/A

github-actions · 2024-05-19T15:10:00Z

Benchmark for `e1f913a`

Click to view benchmark

Test	Base	PR	%
count	3.8±0.04µs	N/A	N/A
deku_read_bits	1413.7±27.29ns	1298.0±20.27ns	-8.18%
deku_read_byte	3.3±0.08ns	3.4±0.08ns	+3.03%
deku_read_enum	2.6±0.04ns	2.6±0.11ns	0.00%
deku_read_vec	34.0±0.56ns	43.0±1.08ns	+26.47%
deku_write_bits	170.9±8.62ns	165.2±3.57ns	-3.34%
deku_write_byte	21.9±0.39ns	23.6±0.62ns	+7.76%
deku_write_enum	21.4±0.53ns	22.2±0.60ns	+3.74%
deku_write_vec	351.9±9.65ns	355.1±4.87ns	+0.91%
read_all	4.1±0.07µs	N/A	N/A
read_all_bytes	5.0±0.03µs	N/A	N/A

github-actions · 2024-05-19T15:58:30Z

Benchmark for `8c98ec9`

Click to view benchmark

Test	Base	PR	%
count	3.8±0.03µs	N/A	N/A
deku_read_bits	1183.6±16.25ns	1325.2±11.72ns	+11.96%
deku_read_byte	3.3±0.05ns	3.4±0.05ns	+3.03%
deku_read_enum	2.6±0.06ns	2.6±0.08ns	0.00%
deku_read_vec	33.9±0.53ns	39.9±0.74ns	+17.70%
deku_write_bits	150.7±2.83ns	167.3±12.46ns	+11.02%
deku_write_byte	21.7±0.36ns	22.4±0.39ns	+3.23%
deku_write_enum	21.1±0.33ns	21.7±0.33ns	+2.84%
deku_write_vec	350.3±4.98ns	350.9±3.68ns	+0.17%
read_all	4.1±0.05µs	N/A	N/A
read_all_bytes	5.0±0.04µs	N/A	N/A

github-actions · 2024-05-22T03:27:16Z

Benchmark for `d2397b4`

Click to view benchmark

Test	Base	PR	%
count	4.6±0.04µs	N/A	N/A
deku_read_bits	1234.3±15.55ns	1211.2±12.58ns	-1.87%
deku_read_byte	3.4±0.07ns	3.4±0.11ns	0.00%
deku_read_enum	2.6±0.41ns	2.6±0.06ns	0.00%
deku_read_vec	33.5±0.56ns	38.9±0.46ns	+16.12%
deku_write_bits	153.5±21.23ns	153.5±8.61ns	0.00%
deku_write_byte	21.8±0.35ns	21.9±0.37ns	+0.46%
deku_write_enum	21.1±0.27ns	21.7±0.22ns	+2.84%
deku_write_vec	404.6±5.31ns	406.5±5.66ns	+0.47%
read_all	17.6±0.01µs	N/A	N/A
read_all_bytes	17.6±0.03µs	N/A	N/A

github-actions · 2024-05-22T03:32:03Z

Benchmark for `9ff94ec`

Click to view benchmark

Test	Base	PR	%
count	3.3±0.05µs	N/A	N/A
deku_read_bits	1197.1±17.14ns	1299.6±18.08ns	+8.56%
deku_read_byte	3.4±0.04ns	3.4±0.11ns	0.00%
deku_read_enum	2.5±0.04ns	2.6±0.04ns	+4.00%
deku_read_vec	34.2±0.47ns	37.9±0.63ns	+10.82%
deku_write_bits	148.0±3.18ns	166.4±5.02ns	+12.43%
deku_write_byte	21.9±0.21ns	21.8±0.38ns	-0.46%
deku_write_enum	21.1±0.44ns	21.8±0.54ns	+3.32%
deku_write_vec	347.1±6.39ns	351.0±6.85ns	+1.12%
read_all	17.5±0.09µs	N/A	N/A
read_all_bytes	16.5±0.10µs	N/A	N/A

wcampbell0x2a · 2024-05-22T03:47:03Z

For reference, not as perf as it was ;)

+ critcmp stable nightly stable-perf nightly-perf
group                 nightly                                nightly-perf                            stable                                 stable-perf
-----                 -------                                ------------                            ------                                 -----------
Deserialize/binrw     1.38      2.8±0.06ms        ? ?/sec    1.00      2.0±0.04ms        ? ?/sec     1.37      2.8±0.10ms        ? ?/sec    1.01      2.1±0.07ms        ? ?/sec
Deserialize/custom    1.21      2.2±0.06ms        ? ?/sec    1.00  1826.9±30.32µs        ? ?/sec     1.21      2.2±0.07ms        ? ?/sec    1.03  1873.1±56.09µs        ? ?/sec
Deserialize/deku      1.11      3.1±0.08ms        ? ?/sec    1.00      2.8±0.05ms        ? ?/sec     1.05      3.0±0.06ms        ? ?/sec    1.01      2.8±0.13ms        ? ?/sec
Serialize/binrw       2.09      5.5±0.21ms        ? ?/sec    1.00      2.7±0.06ms        ? ?/sec     2.19      5.8±0.21ms        ? ?/sec    1.00      2.7±0.10ms        ? ?/sec
Serialize/custom      1.11  1943.4±49.78µs        ? ?/sec    1.02  1790.1±137.87µs        ? ?/sec    1.14      2.0±0.04ms        ? ?/sec    1.00  1752.8±44.16µs        ? ?/sec
Serialize/deku        1.12      3.1±0.17ms        ? ?/sec    1.00      2.8±0.04ms        ? ?/sec     1.10      3.1±0.08ms        ? ?/sec    1.01      2.8±0.14ms        ? ?/sec

before:

github-actions · 2024-05-22T03:47:54Z

Benchmark for `bd2a83c`

Click to view benchmark

Test	Base	PR	%
count	3.4±0.05µs	N/A	N/A
deku_read_bits	1241.8±15.85ns	1185.6±38.67ns	-4.53%
deku_read_byte	15.4±0.02ns	3.4±0.10ns	-77.92%
deku_read_enum	15.5±0.06ns	2.6±0.05ns	-83.23%
deku_read_vec	41.3±0.42ns	36.9±0.61ns	-10.65%
deku_write_bits	161.8±2.92ns	169.1±3.97ns	+4.51%
deku_write_byte	22.1±0.32ns	22.1±0.34ns	0.00%
deku_write_enum	21.4±0.30ns	22.0±0.30ns	+2.80%
deku_write_vec	420.7±3.94ns	342.9±4.97ns	-18.49%
read_all	16.6±0.11µs	N/A	N/A
read_all_bytes	17.1±0.24µs	N/A	N/A

wcampbell0x2a · 2024-05-22T04:15:37Z

Hmmm, my inline(never) hurt the performance of the count vs read_all. I might have to balance this..

read_all_bytes          time:   [10.605 µs 10.623 µs 10.649 µs]
read_all                time:   [10.715 µs 10.727 µs 10.741 µs]
count                   time:   [2.5363 µs 2.5426 µs 2.5526 µs]

wcampbell0x2a · 2024-05-22T04:20:40Z

Hmmm, my inline(never) hurt the performance of the count vs read_all. I might have to balance this..
read_all_bytes          time:   [10.605 µs 10.623 µs 10.649 µs]
read_all                time:   [10.715 µs 10.727 µs 10.741 µs]
count                   time:   [2.5363 µs 2.5426 µs 2.5526 µs]

Or! Just take the improvements and go to sleep ;)

github-actions · 2024-05-25T16:21:30Z

Benchmark for `5f329ba`

Click to view benchmark

Test	Base	PR	%
count	3.7±0.05µs	N/A	N/A
deku_read_bits	1304.6±18.95ns	1274.5±11.97ns	-2.31%
deku_read_byte	3.3±0.10ns	3.4±0.07ns	+3.03%
deku_read_enum	2.6±0.05ns	3.0±0.05ns	+15.38%
deku_read_vec	35.2±0.58ns	35.6±0.53ns	+1.14%
deku_write_bits	149.8±2.65ns	170.6±3.72ns	+13.89%
deku_write_byte	22.1±0.30ns	22.4±0.22ns	+1.36%
deku_write_enum	21.4±0.43ns	21.7±0.20ns	+1.40%
deku_write_vec	398.3±3.11ns	333.4±7.79ns	-16.29%
read_all	4.4±0.07µs	N/A	N/A
read_all_bytes	3.9±0.03µs	N/A	N/A

* Add Leftover::{Byte, Bits}, so that instead of conversion straight away into a Bitvec, we keep it as a slice instead. In the case of the read_all attribute this improves the speed, as reading until EOF doesn't keep causing our reader to convert to bits and back again all the time. * This _does_ result in some slowdown, but with some #[inline(never)] we can keep it to a minimum. The total gain to read_all speed is worth any .8 ms slowdown I saw in testing.

* This was causing issues with bit_size of buf, and BitSize already takes care of this

* reader: Improve performance of read_all * Add Leftover::{Byte, Bits}, so that instead of conversion straight away into a Bitvec, we keep it as a slice instead. In the case of the read_all attribute this improves the speed, as reading until EOF doesn't keep causing our reader to convert to bits and back again all the time. * This _does_ result in some slowdown, but with some #[inline(never)] we can keep it to a minimum. The total gain to read_all speed is worth any .8 ms slowdown I saw in testing. * Always use BitSize for bit read * This was causing issues with bit_size of buf, and BitSize already takes care of this

wcampbell0x2a mentioned this pull request May 19, 2024

Performance of read_all and count #439

Closed

wcampbell0x2a force-pushed the 439-improve-read-all-perf branch 3 times, most recently from c9d336a to 43dbdfb Compare May 19, 2024 06:19

wcampbell0x2a force-pushed the 439-improve-read-all-perf branch from 43dbdfb to 696adb2 Compare May 19, 2024 15:04

wcampbell0x2a force-pushed the 439-improve-read-all-perf branch 2 times, most recently from 833f696 to 558ed33 Compare May 22, 2024 03:27

wcampbell0x2a force-pushed the 439-improve-read-all-perf branch from 558ed33 to 0b22526 Compare May 22, 2024 03:42

wcampbell0x2a changed the title ~~DRAFT: reader: Improve performance of read_all~~ reader: Improve performance of read_all May 22, 2024

wcampbell0x2a force-pushed the 439-improve-read-all-perf branch from 0b22526 to 03ff770 Compare May 25, 2024 16:16

wcampbell0x2a added this to the 0.18.0 milestone May 31, 2024

wcampbell0x2a mentioned this pull request Jul 31, 2024

Is there a faster way to read large Vec<u8>? #462

Closed

wcampbell0x2a added 2 commits August 10, 2024 13:48

Always use BitSize for bit read

312cbad

* This was causing issues with bit_size of buf, and BitSize already takes care of this

wcampbell0x2a force-pushed the 439-improve-read-all-perf branch from 03ff770 to 312cbad Compare August 10, 2024 17:59

wcampbell0x2a requested a review from sharksforarms August 11, 2024 15:30

sharksforarms approved these changes Sep 3, 2024

View reviewed changes

wcampbell0x2a merged commit fb7fd83 into master Sep 4, 2024
6 of 13 checks passed

wcampbell0x2a deleted the 439-improve-read-all-perf branch September 4, 2024 02:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reader: Improve performance of read_all #441

reader: Improve performance of read_all #441

wcampbell0x2a commented May 19, 2024 •

edited

Loading

github-actions bot commented May 19, 2024

github-actions bot commented May 19, 2024

github-actions bot commented May 19, 2024

github-actions bot commented May 19, 2024

github-actions bot commented May 19, 2024

github-actions bot commented May 19, 2024

github-actions bot commented May 22, 2024

github-actions bot commented May 22, 2024

wcampbell0x2a commented May 22, 2024

github-actions bot commented May 22, 2024

wcampbell0x2a commented May 22, 2024

wcampbell0x2a commented May 22, 2024

github-actions bot commented May 25, 2024

reader: Improve performance of read_all #441

reader: Improve performance of read_all #441

Conversation

wcampbell0x2a commented May 19, 2024 • edited Loading

github-actions bot commented May 19, 2024

Benchmark for b49a65c

github-actions bot commented May 19, 2024

Benchmark for 7f81f7b

github-actions bot commented May 19, 2024

Benchmark for 42291cc

github-actions bot commented May 19, 2024

Benchmark for a473082

github-actions bot commented May 19, 2024

Benchmark for e1f913a

github-actions bot commented May 19, 2024

Benchmark for 8c98ec9

github-actions bot commented May 22, 2024

Benchmark for d2397b4

github-actions bot commented May 22, 2024

Benchmark for 9ff94ec

wcampbell0x2a commented May 22, 2024

github-actions bot commented May 22, 2024

Benchmark for bd2a83c

wcampbell0x2a commented May 22, 2024

wcampbell0x2a commented May 22, 2024

github-actions bot commented May 25, 2024

Benchmark for 5f329ba

wcampbell0x2a commented May 19, 2024 •

edited

Loading

Benchmark for `b49a65c`

Benchmark for `7f81f7b`

Benchmark for `42291cc`

Benchmark for `a473082`

Benchmark for `e1f913a`

Benchmark for `8c98ec9`

Benchmark for `d2397b4`

Benchmark for `9ff94ec`

Benchmark for `bd2a83c`

Benchmark for `5f329ba`