[Feature Request]: Support Zstd codec in SerializableAvroCodecFactory #32349

Closed
1 of 17 tasks
clairemcginty opened this issue Aug 28, 2024 · 0 comments · Fixed by #32352

Comments

@clairemcginty
Contributor

clairemcginty commented Aug 28, 2024

What would you like to happen?

Avro 1.9+ supports the Zstandard (zstd) compression codec. I tried to use org.apache.avro.file.CodecFactory.zstandardCodec(3) in my Beam GenericRecord write, but hit the following exception from SerializableAvroCodecFactory:

Exception in thread "main" java.lang.IllegalStateException: zstandard[3] is not supported
	at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:601)
	at org.apache.beam.sdk.extensions.avro.io.SerializableAvroCodecFactory.<init>(SerializableAvroCodecFactory.java:60)
	at org.apache.beam.sdk.extensions.avro.io.AvroIO$TypedWrite.withCodec(AvroIO.java:1695)
	at org.apache.beam.sdk.extensions.avro.io.AvroIO$Write.withCodec(AvroIO.java:1923)

Is there any reason that zstd isn't supported? If not, I can add it to the list of allowed formats in SerializableAvroCodecFactory.

My guess is that it's due to compatibility across Avro versions, since DataFileConstants.ZSTANDARD_CODEC doesn't exist in Avro 1.8. We could hardcode the codec name as a String instead of referencing the Avro constant, which would preserve compatibility with Avro 1.8.
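
To illustrate the hardcoded-String idea, here is a minimal JDK-only sketch. It is not the actual SerializableAvroCodecFactory code: the class name ZstdCodecNameSketch and the helper parseZstdLevel are hypothetical, and the "zstandard[level]" format is assumed from the exception message above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Beam or Avro.
final class ZstdCodecNameSketch {
    // Hard-coded rather than read from DataFileConstants.ZSTANDARD_CODEC,
    // so this compiles even against Avro 1.8, where that constant is absent.
    static final String ZSTANDARD_CODEC = "zstandard";

    // Assumed string form of the codec, taken from the exception text "zstandard[3]".
    static final Pattern ZSTD_PATTERN =
        Pattern.compile(Pattern.quote(ZSTANDARD_CODEC) + "\\[(?<level>-?\\d+)\\]");

    // Returns the compression level if the string names a zstd codec, else empty.
    static Optional<Integer> parseZstdLevel(String codecString) {
        Matcher m = ZSTD_PATTERN.matcher(codecString);
        return m.matches()
            ? Optional.of(Integer.parseInt(m.group("level")))
            : Optional.empty();
    }
}
```

A check like this could sit next to the existing allowed-codec checks, so a zstd codec string is recognized without any compile-time dependency on Avro 1.9+ symbols.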

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner