
[v24.2.x] compression: correct endianness in snappy_java_compressor (Manual backport) #25112

Merged


@WillemKauf (Contributor) commented Feb 19, 2025

Manual backport: the cherry-pick conflicted in `setup.py`.

Also removed `test_upgrade_java_compression` from `java_compression_test.py` in the backports.

Closes issue #25107

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Bug Fixes

  • Fix the endianness of snappy_java_compressor headers to match that of snappy-java.

The version fields in the snappy header are written in big-endian byte order by
the `snappy-java` library used by Kafka.

`redpanda` mistakenly wrote them in little-endian byte order in our
`snappy_java_compressor` implementation.

Correct this by encoding and decoding the `version` and `compatible_version`
headers using big-endian format in `snappy_java_compressor`.
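As a minimal sketch of the byte-order difference (Python's `struct` module, not the C++ in `snappy_java_compressor.cc`), assuming the snappy-java stream header is the 8-byte magic `\x82SNAPPY\x00` followed by two 4-byte signed integers, `version` and `compatible_version`, both set to 1. The helper names below are hypothetical:

```python
import struct

# Assumed snappy-java stream header layout:
#   8-byte magic | int32 version | int32 compatible_version
SNAPPY_JAVA_MAGIC = b"\x82SNAPPY\x00"
HEADER_SIZE = len(SNAPPY_JAVA_MAGIC) + 4 + 4  # 16 bytes


def encode_header(version: int = 1, compatible_version: int = 1) -> bytes:
    """Encode the header as snappy-java does: big-endian ('>') int32s."""
    return SNAPPY_JAVA_MAGIC + struct.pack(">ii", version, compatible_version)


def encode_header_buggy(version: int = 1, compatible_version: int = 1) -> bytes:
    """Pre-fix behaviour: the same fields, but little-endian ('<')."""
    return SNAPPY_JAVA_MAGIC + struct.pack("<ii", version, compatible_version)


def decode_header(buf: bytes) -> tuple[int, int]:
    """Decode (version, compatible_version) as big-endian, like snappy-java."""
    if buf[: len(SNAPPY_JAVA_MAGIC)] != SNAPPY_JAVA_MAGIC:
        raise ValueError("missing snappy-java magic header")
    return struct.unpack_from(">ii", buf, len(SNAPPY_JAVA_MAGIC))


if __name__ == "__main__":
    print(encode_header()[8:].hex())        # 0000000100000001
    print(encode_header_buggy()[8:].hex())  # 0100000001000000
    # A big-endian reader sees the buggy header's 1 as 0x01000000.
    print(decode_header(encode_header_buggy()))  # (16777216, 16777216)
```

The practical consequence is that a big-endian reader interprets a little-endian `1` as 16777216, so any reader that checks these fields sees a version it does not recognize.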

For references to `snappy-java`'s big-endian implementation, see:

* https://github.com/xerial/snappy-java/blob/65e1ec3de1a0d447b137c6dd6393629aa3d75b8b/src/main/java/org/xerial/snappy/SnappyOutputStream.java#L343-L349
* https://github.com/xerial/snappy-java/blob/65e1ec3de1a0d447b137c6dd6393629aa3d75b8b/src/main/java/org/xerial/snappy/SnappyCodec.java#L78-L81

(cherry picked from commit 1c1b006)

Most `snappy` clients do not perform this version check, and, furthermore,
it is implemented incorrectly here.

(cherry picked from commit 5723eb4)
(cherry picked from commit 72d02ee)

The two committed files in `snappy_payload` are a raw uncompressed data file
and a `snappy`-compressed data file generated by `redpanda` using the
incorrect little-endian encoding for the version fields in the `snappy`
header.

They are used in a unit test to verify that, after the big-endian fix, we can
still decompress the buffer and obtain the same decompressed data as before the fix.
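A minimal sketch of why a pre-fix buffer can remain readable, assuming the decoder checks only the magic and skips the two version fields instead of validating them, and that each block is a 4-byte big-endian length followed by that many bytes of snappy data. `split_blocks` and `frame` are hypothetical helpers; blocks are returned still compressed, so no snappy library is needed:

```python
import struct

SNAPPY_JAVA_MAGIC = b"\x82SNAPPY\x00"
HEADER_SIZE = 16  # magic (8) + version (4) + compatible_version (4)


def split_blocks(buf: bytes) -> list[bytes]:
    """Return the (still compressed) blocks of a snappy-java framed buffer.

    The version fields are skipped rather than validated, so a header
    written in either byte order is accepted.
    """
    if not buf.startswith(SNAPPY_JAVA_MAGIC):
        raise ValueError("missing snappy-java magic header")
    blocks = []
    pos = HEADER_SIZE
    while pos < len(buf):
        if pos + 4 > len(buf):
            raise ValueError("truncated block length")
        (length,) = struct.unpack_from(">i", buf, pos)  # big-endian length
        pos += 4
        if length < 0 or pos + length > len(buf):
            raise ValueError("truncated block payload")
        blocks.append(buf[pos : pos + length])
        pos += length
    return blocks


def frame(header_fmt: str, payloads: list[bytes]) -> bytes:
    """Build a test buffer; header_fmt is '>ii' (fixed) or '<ii' (pre-fix)."""
    out = SNAPPY_JAVA_MAGIC + struct.pack(header_fmt, 1, 1)
    for p in payloads:
        out += struct.pack(">i", len(p)) + p
    return out
```

With this approach, `split_blocks(frame("<ii", payloads))` and `split_blocks(frame(">ii", payloads))` yield the same blocks, which mirrors the property the committed `snappy_payload` files check against the real decompressor.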

(cherry picked from commit a84252d)

To allow `kafka-python` to use these compression types, the respective
modules must be importable.

(cherry picked from commit 17a2e55)
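A small sketch for checking which compression types have an importable backing module, using `importlib.util.find_spec`. The codec-to-module mapping is an assumption (python-snappy as `snappy`, `lz4`, `zstandard`), not taken from `setup.py` or from `kafka-python` itself:

```python
import importlib.util

# Hypothetical mapping from Kafka compression type to the Python module
# assumed to back it; only top-level names are used so that find_spec
# returns None for a missing module instead of raising.
CODEC_MODULES = {
    "gzip": "gzip",  # standard library, always available
    "snappy": "snappy",
    "lz4": "lz4",
    "zstd": "zstandard",
}


def available_codecs() -> dict[str, bool]:
    """Report which compression types have an importable backing module."""
    return {
        codec: importlib.util.find_spec(module) is not None
        for codec, module in CODEC_MODULES.items()
    }


if __name__ == "__main__":
    for codec, ok in available_codecs().items():
        print(f"{codec}: {'ok' if ok else 'missing module'}")
```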

These tests check compression compatibility with Java-based Kafka consumers
and producers.

They are parameterized over all compression types, but most notably they
serve as reproducers for an outstanding header-field encoding bug in
`snappy_java_compressor.cc`.

(cherry picked from commit 379f380)

@vbotbuildovich (Collaborator)

CI test results

test results on build#61991

| test_id | test_kind | job_url | test_status | passed |
|---|---|---|---|---|
| gtest_raft_rpunit.gtest_raft_rpunit | unit | https://buildkite.com/redpanda/redpanda/builds/61991#01951bf8-3c6e-448e-bb6b-3847c823e188 | FLAKY | 1/2 |

@WillemKauf enabled auto-merge February 19, 2025 17:55
@lf-rep disabled auto-merge February 20, 2025 00:20
@lf-rep merged commit 855abeb into redpanda-data:v24.2.x Feb 20, 2025
17 of 20 checks passed