Optimize zero reads #34

Merged
merged 1 commit into from
Oct 20, 2024

nirs commented Oct 17, 2024

Zero the buffer with a range loop[1] that the compiler optimizes into a
single memclr call. This dramatically speeds up zero reads.

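| format | compression | utilization | speedup |
|--------|-------------|-------------|---------|
| qcow2  | -           |          0% |   28.72 |
| qcow2  | zlib        |          0% |   28.04 |
| qcow2  | -           |         50% |    4.54 |
| qcow2  | zlib        |         50% |    1.03 |
| qcow2  | -           |        100% |    1.01 |
| qcow2  | zlib        |        100% |    1.00 |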
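
For readers unfamiliar with the idiom referenced in [1], here is a minimal sketch of a range loop in the shape the Go compiler can recognize as a memory-clearing pattern. The function name `zeroFill` and the sample buffer are illustrative assumptions, not the code from this change.

```go
package main

import "fmt"

// zeroFill is a hypothetical helper, not the function changed in this PR.
// The loop ranges over the slice and assigns the zero value to every
// element, which is the clearing idiom described in [1].
func zeroFill(b []byte) {
	for i := range b {
		b[i] = 0
	}
}

func main() {
	buf := []byte{1, 2, 3, 4}
	zeroFill(buf[1:3]) // clear only the middle of the buffer
	fmt.Println(buf)   // prints [1 0 0 4]
}
```

On Go 1.21 and later, the built-in `clear(b)` has the same effect on a slice; whether the project can rely on that Go version is not stated here.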

Before:

    % go test -bench Read
    BenchmarkRead0p/qcow2-12           14      77515735 ns/op     3462.98 MB/s      1050518 B/op        39 allocs/op
    BenchmarkRead0p/qcow2_zlib-12      14      77823402 ns/op     3449.29 MB/s      1050504 B/op        39 allocs/op
    BenchmarkRead50p/qcow2-12          24      48812158 ns/op     5499.36 MB/s      1181856 B/op        45 allocs/op
    BenchmarkRead50p/qcow2_zlib-12      2     899659187 ns/op      298.37 MB/s    184996316 B/op     43247 allocs/op
    BenchmarkRead100p/qcow2-12         61      19306020 ns/op    13904.24 MB/s      1181854 B/op        45 allocs/op
    BenchmarkRead100p/qcow2_zlib-12     1    1732168542 ns/op      154.97 MB/s    368850952 B/op     86460 allocs/op

After:

    % go test -bench Read
    BenchmarkRead0p/qcow2-12          471       2698377 ns/op    99480.34 MB/s      1050514 B/op        39 allocs/op
    BenchmarkRead0p/qcow2_zlib-12     468       2774952 ns/op    96735.15 MB/s      1050511 B/op        39 allocs/op
    BenchmarkRead50p/qcow2-12         100      10735870 ns/op    25003.61 MB/s      1181854 B/op        45 allocs/op
    BenchmarkRead50p/qcow2_zlib-12      2     868310583 ns/op      309.15 MB/s    185038456 B/op     43263 allocs/op
    BenchmarkRead100p/qcow2-12         63      18977718 ns/op    14144.77 MB/s      1181851 B/op        45 allocs/op
    BenchmarkRead100p/qcow2_zlib-12     1    1727832917 ns/op      155.36 MB/s    368886656 B/op     86471 allocs/op
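
The speedup column can be reproduced from the ns/op figures by dividing each "before" time by the matching "after" time. The sketch below does this for the numbers printed above. The `speedup` helper is hypothetical, and truncating (rather than rounding) to two decimals is an assumption, chosen only because it reproduces the values in the table.

```go
package main

import (
	"fmt"
	"math"
)

// speedup returns beforeNs/afterNs truncated to two decimals.
// Truncation is an assumption; it happens to match the table values.
func speedup(beforeNs, afterNs float64) float64 {
	return math.Floor(beforeNs/afterNs*100) / 100
}

func main() {
	// ns/op values copied from the benchmark output above.
	rows := []struct {
		name   string
		before float64
		after  float64
	}{
		{"Read0p/qcow2", 77515735, 2698377},
		{"Read0p/qcow2_zlib", 77823402, 2774952},
		{"Read50p/qcow2", 48812158, 10735870},
		{"Read50p/qcow2_zlib", 899659187, 868310583},
		{"Read100p/qcow2", 19306020, 18977718},
		{"Read100p/qcow2_zlib", 1732168542, 1727832917},
	}
	for _, r := range rows {
		fmt.Printf("%-20s %6.2f\n", r.name, speedup(r.before, r.after))
	}
}
```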

Comparing with qemu-img shows that we roughly match its performance for
the uncompressed version of the lima default image:

    % time ./go-qcow2reader-example /tmp/test.qcow2 > /tmp/tmp.img
    ./go-qcow2reader-example /tmp/test.qcow2 > /tmp/tmp.img  0.06s user 0.73s system 93% cpu 0.854 total

    % time qemu-img convert -O raw /tmp/test.qcow2 /tmp/tmp.img
    qemu-img convert -O raw /tmp/test.qcow2 /tmp/tmp.img  0.04s user 0.70s system 98% cpu 0.756 total

[1] https://go-review.googlesource.com/c/go/+/2520

Part-of #32

Based on #35 for more correct benchmarks.

nirs force-pushed the zero-reads branch 2 times, most recently from d8933ae to 82f0931 (October 18, 2024 06:05)
nirs requested a review from AkihiroSuda (October 18, 2024 06:10)
AkihiroSuda requested a review from a team (October 18, 2024 06:43)

nirs commented Oct 19, 2024

@AkihiroSuda the current version is simpler, needs no temporary buffer, and is much faster.

Signed-off-by: Nir Soffer <nirsof@gmail.com>

AkihiroSuda left a comment

Thanks

AkihiroSuda merged commit 5f33c4a into lima-vm:master Oct 20, 2024
2 checks passed
nirs deleted the zero-reads branch October 20, 2024 21:38