Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

reader: Improve performance of read_all #441

Merged
merged 2 commits into from
Sep 4, 2024

Conversation

wcampbell0x2a
Copy link
Collaborator

@wcampbell0x2a wcampbell0x2a commented May 19, 2024

  • Add Leftover::{Byte, Bits}, so that instead of conversion straight away
    into a Bitvec, we keep it as a slice instead. In the case of the read_all
    attribute this improves the speed, as reading until EOF doesn't keep
    causing our reader to convert to bits and back again all the time.

  • This does result in some slowdown, but with some #[inline(never)]
    we can keep it to a minimum. The total gain to read_all speed is
    worth any .8 ms slowdown I saw in testing.

Closes #439

@wcampbell0x2a wcampbell0x2a force-pushed the 439-improve-read-all-perf branch 3 times, most recently from c9d336a to 43dbdfb Compare May 19, 2024 06:19
Copy link

Benchmark for b49a65c

Click to view benchmark
Test Base PR %
count 3.8±0.02µs N/A N/A
deku_read_bits 1230.7±22.08ns 1250.3±20.97ns +1.59%
deku_read_byte 3.5±0.05ns 3.4±0.13ns -2.86%
deku_read_enum 3.0±0.05ns 2.6±0.04ns -13.33%
deku_read_vec 34.1±0.44ns 37.6±0.89ns +10.26%
deku_write_bits 161.9±3.24ns 162.7±17.05ns +0.49%
deku_write_byte 21.8±0.20ns 22.7±0.34ns +4.13%
deku_write_enum 21.1±0.41ns 21.7±0.38ns +2.84%
deku_write_vec 422.7±5.49ns 362.6±7.05ns -14.22%
read_all 4.1±0.06µs N/A N/A
read_all_bytes 3.6±0.03µs N/A N/A

Copy link

Benchmark for 7f81f7b

Click to view benchmark
Test Base PR %
count 3.6±0.05µs N/A N/A
deku_read_bits 1187.3±15.54ns 1238.2±13.65ns +4.29%
deku_read_byte 3.3±0.03ns 3.4±0.03ns +3.03%
deku_read_enum 2.6±0.05ns 2.6±0.05ns 0.00%
deku_read_vec 35.8±0.55ns 37.5±0.68ns +4.75%
deku_write_bits 150.3±1.46ns 153.7±3.39ns +2.26%
deku_write_byte 22.7±0.20ns 22.7±0.69ns 0.00%
deku_write_enum 21.5±0.31ns 21.7±0.45ns +0.93%
deku_write_vec 351.2±5.78ns 406.7±4.96ns +15.80%
read_all 5.2±0.04µs N/A N/A
read_all_bytes 4.8±0.04µs N/A N/A

Copy link

Benchmark for 42291cc

Click to view benchmark
Test Base PR %
count 3.8±0.03µs N/A N/A
deku_read_bits 1232.7±21.71ns 1283.2±13.27ns +4.10%
deku_read_byte 3.3±0.02ns 3.4±0.25ns +3.03%
deku_read_enum 2.6±0.04ns 2.6±0.16ns 0.00%
deku_read_vec 34.0±0.50ns 38.0±0.73ns +11.76%
deku_write_bits 148.0±2.23ns 155.2±12.15ns +4.86%
deku_write_byte 21.8±0.35ns 22.7±0.33ns +4.13%
deku_write_enum 21.1±0.35ns 21.8±0.46ns +3.32%
deku_write_vec 353.8±28.27ns 370.1±11.25ns +4.61%
read_all 4.1±0.10µs N/A N/A
read_all_bytes 3.6±0.04µs N/A N/A

Copy link

Benchmark for a473082

Click to view benchmark
Test Base PR %
count 3.8±0.03µs N/A N/A
deku_read_bits 1253.0±20.87ns 1198.3±18.86ns -4.37%
deku_read_byte 3.3±0.07ns 3.6±0.07ns +9.09%
deku_read_enum 2.6±0.05ns 3.0±0.07ns +15.38%
deku_read_vec 33.9±0.53ns 37.7±0.69ns +11.21%
deku_write_bits 152.9±3.45ns 153.9±3.13ns +0.65%
deku_write_byte 21.8±0.51ns 22.4±0.40ns +2.75%
deku_write_enum 21.1±0.33ns 21.7±0.37ns +2.84%
deku_write_vec 357.3±38.06ns 421.8±4.87ns +18.05%
read_all 5.0±0.04µs N/A N/A
read_all_bytes 4.6±0.18µs N/A N/A

@wcampbell0x2a wcampbell0x2a force-pushed the 439-improve-read-all-perf branch from 43dbdfb to 696adb2 Compare May 19, 2024 15:04
Copy link

Benchmark for e1f913a

Click to view benchmark
Test Base PR %
count 3.8±0.04µs N/A N/A
deku_read_bits 1413.7±27.29ns 1298.0±20.27ns -8.18%
deku_read_byte 3.3±0.08ns 3.4±0.08ns +3.03%
deku_read_enum 2.6±0.04ns 2.6±0.11ns 0.00%
deku_read_vec 34.0±0.56ns 43.0±1.08ns +26.47%
deku_write_bits 170.9±8.62ns 165.2±3.57ns -3.34%
deku_write_byte 21.9±0.39ns 23.6±0.62ns +7.76%
deku_write_enum 21.4±0.53ns 22.2±0.60ns +3.74%
deku_write_vec 351.9±9.65ns 355.1±4.87ns +0.91%
read_all 4.1±0.07µs N/A N/A
read_all_bytes 5.0±0.03µs N/A N/A

Copy link

Benchmark for 8c98ec9

Click to view benchmark
Test Base PR %
count 3.8±0.03µs N/A N/A
deku_read_bits 1183.6±16.25ns 1325.2±11.72ns +11.96%
deku_read_byte 3.3±0.05ns 3.4±0.05ns +3.03%
deku_read_enum 2.6±0.06ns 2.6±0.08ns 0.00%
deku_read_vec 33.9±0.53ns 39.9±0.74ns +17.70%
deku_write_bits 150.7±2.83ns 167.3±12.46ns +11.02%
deku_write_byte 21.7±0.36ns 22.4±0.39ns +3.23%
deku_write_enum 21.1±0.33ns 21.7±0.33ns +2.84%
deku_write_vec 350.3±4.98ns 350.9±3.68ns +0.17%
read_all 4.1±0.05µs N/A N/A
read_all_bytes 5.0±0.04µs N/A N/A

@wcampbell0x2a wcampbell0x2a force-pushed the 439-improve-read-all-perf branch 2 times, most recently from 833f696 to 558ed33 Compare May 22, 2024 03:27
Copy link

Benchmark for d2397b4

Click to view benchmark
Test Base PR %
count 4.6±0.04µs N/A N/A
deku_read_bits 1234.3±15.55ns 1211.2±12.58ns -1.87%
deku_read_byte 3.4±0.07ns 3.4±0.11ns 0.00%
deku_read_enum 2.6±0.41ns 2.6±0.06ns 0.00%
deku_read_vec 33.5±0.56ns 38.9±0.46ns +16.12%
deku_write_bits 153.5±21.23ns 153.5±8.61ns 0.00%
deku_write_byte 21.8±0.35ns 21.9±0.37ns +0.46%
deku_write_enum 21.1±0.27ns 21.7±0.22ns +2.84%
deku_write_vec 404.6±5.31ns 406.5±5.66ns +0.47%
read_all 17.6±0.01µs N/A N/A
read_all_bytes 17.6±0.03µs N/A N/A

Copy link

Benchmark for 9ff94ec

Click to view benchmark
Test Base PR %
count 3.3±0.05µs N/A N/A
deku_read_bits 1197.1±17.14ns 1299.6±18.08ns +8.56%
deku_read_byte 3.4±0.04ns 3.4±0.11ns 0.00%
deku_read_enum 2.5±0.04ns 2.6±0.04ns +4.00%
deku_read_vec 34.2±0.47ns 37.9±0.63ns +10.82%
deku_write_bits 148.0±3.18ns 166.4±5.02ns +12.43%
deku_write_byte 21.9±0.21ns 21.8±0.38ns -0.46%
deku_write_enum 21.1±0.44ns 21.8±0.54ns +3.32%
deku_write_vec 347.1±6.39ns 351.0±6.85ns +1.12%
read_all 17.5±0.09µs N/A N/A
read_all_bytes 16.5±0.10µs N/A N/A

@wcampbell0x2a wcampbell0x2a force-pushed the 439-improve-read-all-perf branch from 558ed33 to 0b22526 Compare May 22, 2024 03:42
@wcampbell0x2a wcampbell0x2a changed the title DRAFT: reader: Improve performance of read_all reader: Improve performance of read_all May 22, 2024
@wcampbell0x2a
Copy link
Collaborator Author

For reference, not as perf as it was ;)

+ critcmp stable nightly stable-perf nightly-perf
group                 nightly                                nightly-perf                            stable                                 stable-perf
-----                 -------                                ------------                            ------                                 -----------
Deserialize/binrw     1.38      2.8±0.06ms        ? ?/sec    1.00      2.0±0.04ms        ? ?/sec     1.37      2.8±0.10ms        ? ?/sec    1.01      2.1±0.07ms        ? ?/sec
Deserialize/custom    1.21      2.2±0.06ms        ? ?/sec    1.00  1826.9±30.32µs        ? ?/sec     1.21      2.2±0.07ms        ? ?/sec    1.03  1873.1±56.09µs        ? ?/sec
Deserialize/deku      1.11      3.1±0.08ms        ? ?/sec    1.00      2.8±0.05ms        ? ?/sec     1.05      3.0±0.06ms        ? ?/sec    1.01      2.8±0.13ms        ? ?/sec
Serialize/binrw       2.09      5.5±0.21ms        ? ?/sec    1.00      2.7±0.06ms        ? ?/sec     2.19      5.8±0.21ms        ? ?/sec    1.00      2.7±0.10ms        ? ?/sec
Serialize/custom      1.11  1943.4±49.78µs        ? ?/sec    1.02  1790.1±137.87µs        ? ?/sec    1.14      2.0±0.04ms        ? ?/sec    1.00  1752.8±44.16µs        ? ?/sec
Serialize/deku        1.12      3.1±0.17ms        ? ?/sec    1.00      2.8±0.04ms        ? ?/sec     1.10      3.1±0.08ms        ? ?/sec    1.01      2.8±0.14ms        ? ?/sec

before:
image

Copy link

Benchmark for bd2a83c

Click to view benchmark
Test Base PR %
count 3.4±0.05µs N/A N/A
deku_read_bits 1241.8±15.85ns 1185.6±38.67ns -4.53%
deku_read_byte 15.4±0.02ns 3.4±0.10ns -77.92%
deku_read_enum 15.5±0.06ns 2.6±0.05ns -83.23%
deku_read_vec 41.3±0.42ns 36.9±0.61ns -10.65%
deku_write_bits 161.8±2.92ns 169.1±3.97ns +4.51%
deku_write_byte 22.1±0.32ns 22.1±0.34ns 0.00%
deku_write_enum 21.4±0.30ns 22.0±0.30ns +2.80%
deku_write_vec 420.7±3.94ns 342.9±4.97ns -18.49%
read_all 16.6±0.11µs N/A N/A
read_all_bytes 17.1±0.24µs N/A N/A

@wcampbell0x2a
Copy link
Collaborator Author

Hmmm, my inline(never) hurt the performance of the count vs read_all. I might have to balance this..

read_all_bytes          time:   [10.605 µs 10.623 µs 10.649 µs]
read_all                time:   [10.715 µs 10.727 µs 10.741 µs]
count                   time:   [2.5363 µs 2.5426 µs 2.5526 µs]

@wcampbell0x2a
Copy link
Collaborator Author

Hmmm, my inline(never) hurt the performance of the count vs read_all. I might have to balance this..

read_all_bytes          time:   [10.605 µs 10.623 µs 10.649 µs]
read_all                time:   [10.715 µs 10.727 µs 10.741 µs]
count                   time:   [2.5363 µs 2.5426 µs 2.5526 µs]

Or! Just take the improvements and go to sleep ;)

@wcampbell0x2a wcampbell0x2a force-pushed the 439-improve-read-all-perf branch from 0b22526 to 03ff770 Compare May 25, 2024 16:16
Copy link

Benchmark for 5f329ba

Click to view benchmark
Test Base PR %
count 3.7±0.05µs N/A N/A
deku_read_bits 1304.6±18.95ns 1274.5±11.97ns -2.31%
deku_read_byte 3.3±0.10ns 3.4±0.07ns +3.03%
deku_read_enum 2.6±0.05ns 3.0±0.05ns +15.38%
deku_read_vec 35.2±0.58ns 35.6±0.53ns +1.14%
deku_write_bits 149.8±2.65ns 170.6±3.72ns +13.89%
deku_write_byte 22.1±0.30ns 22.4±0.22ns +1.36%
deku_write_enum 21.4±0.43ns 21.7±0.20ns +1.40%
deku_write_vec 398.3±3.11ns 333.4±7.79ns -16.29%
read_all 4.4±0.07µs N/A N/A
read_all_bytes 3.9±0.03µs N/A N/A

* Add Leftover::{Byte, Bits}, so that instead of conversion straight away
  into a Bitvec, we keep it as a slice instead. In the case of the read_all
  attribute this improves the speed, as reading until EOF doesn't keep
  causing our reader to convert to bits and back again all the time.

* This _does_ result in some slowdown, but with some #[inline(never)]
  we can keep it to a minimum. The total gain to read_all speed is
  worth any .8 ms slowdown I saw in testing.
* This was causing issues with bit_size of buf, and BitSize already takes
  care of this
@wcampbell0x2a wcampbell0x2a merged commit fb7fd83 into master Sep 4, 2024
6 of 13 checks passed
@wcampbell0x2a wcampbell0x2a deleted the 439-improve-read-all-perf branch September 4, 2024 02:50
wcampbell0x2a added a commit that referenced this pull request Sep 4, 2024
* reader: Improve performance of read_all

* Add Leftover::{Byte, Bits}, so that instead of conversion straight away
  into a Bitvec, we keep it as a slice instead. In the case of the read_all
  attribute this improves the speed, as reading until EOF doesn't keep
  causing our reader to convert to bits and back again all the time.

* This _does_ result in some slowdown, but with some #[inline(never)]
  we can keep it to a minimum. The total gain to read_all speed is
  worth any .8 ms slowdown I saw in testing.

* Always use BitSize for bit read

* This was causing issues with bit_size of buf, and BitSize already takes
  care of this
wcampbell0x2a added a commit that referenced this pull request Sep 4, 2024
* reader: Improve performance of read_all

* Add Leftover::{Byte, Bits}, so that instead of conversion straight away
  into a Bitvec, we keep it as a slice instead. In the case of the read_all
  attribute this improves the speed, as reading until EOF doesn't keep
  causing our reader to convert to bits and back again all the time.

* This _does_ result in some slowdown, but with some #[inline(never)]
  we can keep it to a minimum. The total gain to read_all speed is
  worth any .8 ms slowdown I saw in testing.

* Always use BitSize for bit read

* This was causing issues with bit_size of buf, and BitSize already takes
  care of this
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance of read_all and count
2 participants