Improve .NET 8 stream decompression perf #84
Merged
Motivation
.NET 8 shows a significant performance regression when decompressing streams. Block decompression is not affected.
Modifications
Remove overly aggressive inlining from some cold paths, which may let the JIT enregister more values on the hot paths. On .NET 8, also use an inline array so that the scratch buffer is no longer allocated on the heap.
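For readers unfamiliar with the two techniques, here is a minimal sketch, not the code from this PR. All names (`ScratchBuffer`, `SketchDecoder`, `Sum`, `ThrowTooLong`) and the 16-byte size are hypothetical, and it assumes .NET 8 with C# 12. It also marks the cold path `NoInlining` explicitly, which is a stronger step than just removing `AggressiveInlining`.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical fixed-size scratch buffer. The InlineArray attribute (.NET 8)
// repeats the single field 16 times inside the struct itself, so no separate
// byte[] is allocated on the heap.
[InlineArray(16)]
internal struct ScratchBuffer
{
    private byte _element0;
}

internal static class SketchDecoder
{
    // Hot path: kept small so the JIT has a better chance of keeping
    // its locals in registers.
    public static int Sum(ReadOnlySpan<byte> input)
    {
        if (input.Length > 16)
        {
            ThrowTooLong(); // cold path, deliberately kept out of line
        }

        ScratchBuffer scratch = default; // zero-initialized, lives in the local frame
        Span<byte> span = scratch;       // C# 12 implicit inline-array-to-span conversion
        input.CopyTo(span);

        int sum = 0;
        foreach (byte b in span)
        {
            sum += b;
        }
        return sum;
    }

    // Illustrative cold path: the throw helper is never inlined into Sum.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void ThrowTooLong() =>
        throw new ArgumentException("Input exceeds the scratch buffer size.");
}
```

When the inline array is a field of a class instead of a local, it is embedded in the containing object, so the saving is the extra array allocation and the indirection through it.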
Results
The regression has been effectively eliminated.
BenchmarkDotNet v0.13.10, Windows 11 (10.0.22621.2861/22H2/2022Update/SunValley2)
12th Gen Intel Core i7-1270P, 1 CPU, 16 logical and 12 physical cores
.NET SDK 8.0.100
[Host] : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
Job-EHUKKX : .NET Framework 4.8.1 (4.8.9181.0), X64 RyuJIT VectorSize=256
Job-SIHOZS : .NET Framework 4.8.1 (4.8.9181.0), X64 RyuJIT VectorSize=256
Job-VAGBBP : .NET 6.0.25 (6.0.2523.51912), X64 RyuJIT AVX2
Job-ZACCID : .NET 6.0.25 (6.0.2523.51912), X64 RyuJIT AVX2
Job-ACXPKE : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
Job-JFCQYT : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
Job-CWWXAC : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
Job-IDGNPX : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
Note: Benchmarks include other improvements since 1.1.3