
Stagger Stepping in Negative Levels #2921

Merged (6 commits) on Dec 14, 2021

Conversation

@felixhandte (Contributor) commented Dec 10, 2021

#2749 returned the negative levels to pre-#1562 behavior, i.e., skipping `stepSize` positions every iteration. This PR emulates #1562-like stepping by applying the `stepSize` skip only every other position.

This PR addresses #2827 ("Compression ratio for --fast=2 and higher became significantly worse. Expected?").
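For illustration, here is a minimal sketch of the two stepping strategies. The names (`tryMatchAt`, `istart`, `ilimit`) are hypothetical, and the real loop in zstd's fast compressor also hashes, matches, and grows `stepSize` as it goes; this only shows which positions each strategy visits:

```c
#include <stddef.h>

/* Hypothetical stand-in for the real match-search work at a position. */
static void tryMatchAt(const unsigned char* ip) { (void)ip; }

/* Pre-#1562 / post-#2749 stepping: apply the skip on every iteration,
 * visiting positions 0, s, 2s, 3s, ... for stepSize == s. */
static void searchCoarse(const unsigned char* istart,
                         const unsigned char* ilimit,
                         size_t stepSize)
{
    const unsigned char* ip;
    for (ip = istart; ip < ilimit; ip += stepSize)
        tryMatchAt(ip);
}

/* This PR's staggered stepping: visit adjacent pairs of positions and
 * apply the skip only every other position, visiting
 * 0, 1, s, s+1, 2s, 2s+1, ... -- the "01__23__" pattern for s == 4. */
static void searchStaggered(const unsigned char* istart,
                            const unsigned char* ilimit,
                            size_t stepSize)
{
    const unsigned char* ip;
    for (ip = istart; ip + 1 < ilimit; ip += stepSize) {
        tryMatchAt(ip);
        tryMatchAt(ip + 1);
    }
}
```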

Benchmarks

silesia.tar on gcc-10:

[screenshot: benchmark results, 2021-12-13 17-14-03]

silesia.tar on clang-13:

[screenshot: benchmark results, 2021-12-13 17-14-15]

Status

I believe this PR is ready for merge.

This replicates the behavior of @terrelln's `ZSTD_fast` implementation. That is, it always looks at adjacent pairs of positions, and only applies the acceleration every other position. This produces more fine-grained acceleration.

This avoids an additional addition, at the cost of an additional variable.
@felixhandte (Author) commented Dec 10, 2021

Old benchmark: `silesia.tar` on GCC-10:

[screenshot: benchmark results, 2021-12-10 16-52-30]

The position updates are rewritten from `ip[N] = ip[N-1] + step` to
`ip[N] = ip[N-2] + step`. This lets us deal with the asymmetric spacing of
the gaps only once at setup, after which we only have to keep a single
`step` variable.

This seems to work quite well on GCC and Clang!
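A hedged sketch of that rewrite, with assumed names (`ip0`/`ip1` for the pair of search positions; this is not the exact zstd code):

```c
#include <stddef.h>

/* Illustrative only: names (ip0, ip1, step) are assumptions, and the
 * real loop searches for matches between these pointer updates. */
static void scanPairs(const unsigned char* istart,
                      const unsigned char* iend,
                      size_t step)
{
    /* The +1 asymmetry between the pair is established once, at setup. */
    const unsigned char* ip0 = istart;
    const unsigned char* ip1 = istart + 1;

    while (ip1 < iend) {
        /* ... match search at ip0 and ip1 would happen here ... */

        /* ip[N] = ip[N-2] + step: each position advances from the one
         * two back, so every update uses the same single `step`
         * (vs. ip[N] = ip[N-1] + step, which alternates +1 and +step). */
        ip0 += step;
        ip1 += step;
    }
}
```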
@felixhandte (Author) commented Dec 13, 2021

The new version shows the following performance:

gcc-10:

[screenshot: benchmark results, 2021-12-13 15-26-17]

clang-13:

[screenshot: benchmark results, 2021-12-13 15-26-35]

The templating adds ~10 KB to the library size.

I couldn't find a good way to spread `ip0` and `ip1` apart when we accelerate
due to incompressible inputs. (The methods I tried slowed things down quite a
bit.)

Since we aren't splaying `ip0` and `ip1` apart (which would give a `0_1_2_3_`
pattern, as opposed to the `01__23__` we were actually doing), it's a bit
ambitious to increment `step` by 2. Instead, let's increment it by 1, which
has the benefit of sliiightly improving compression. Speed remains pretty
much unchanged. A sketch of that choice follows.
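The name and trigger condition here are assumptions, not the actual code; this only illustrates the `+= 1` vs. `+= 2` choice:

```c
#include <stddef.h>

/* Hypothetical acceleration update for incompressible regions. Because
 * ip0/ip1 stay adjacent ("01__23__") rather than being splayed apart
 * ("0_1_2_3_"), growing the skip by 2 per event over-accelerates;
 * growing it by 1 keeps speed about the same and slightly helps ratio. */
static size_t accelerateStep(size_t step)
{
    return step + 1;   /* this PR: += 1 per acceleration event, not += 2 */
}
```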