[Bug]: Beam YAML WriteToJson fails on Beam 2.55 #30776

Closed
2 of 16 tasks
Polber opened this issue Mar 27, 2024 · 2 comments · Fixed by #30779 or #30780

Comments

@Polber
Contributor

Polber commented Mar 27, 2024

What happened?

Since Beam 2.55 was released, the cross-language transform for JsonWrite does not work on Beam YAML (or on Beam Python when using ExternalTransform).

A change to https://github.com/apache/beam/blob/master/sdks/java/io/json/build.gradle removed a dependency on everit -
implementation library.java.everit_json_schema
PR: #29924

As a side effect, the library is no longer packaged into the beam-sdks-java-extensions-sql-expansion-service-2.55.0.jar (sdks:java:extensions:schemaio-expansion-service:shadowJar).

So, when using xlang JsonWrite (https://github.com/apache/beam/blob/master/sdks/java/io/json/src/main/java/org/apache/beam/sdk/io/json/providers/JsonWriteTransformProvider.java),
the expansion fails with java.lang.ClassNotFoundException: org.everit.json.schema.Schema$Builder
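
One way to confirm whether a given expansion service jar still bundles the class is to look for its entry directly. The following is a minimal sketch, not part of Beam: the class name `JarClassCheck` and its helper methods are hypothetical, and the jar path is passed in by the caller. It also assumes the shadow jar does not relocate the `org.everit` package; if it does, the entry would live under a different path.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper (not a Beam class): checks whether a jar bundles a given class.
public class JarClassCheck {

  // Converts a binary class name such as "a.b.C$D" into its jar entry path "a/b/C$D.class".
  static String entryNameFor(String className) {
    return className.replace('.', '/') + ".class";
  }

  // Returns true if the jar at jarPath contains an entry for className.
  static boolean jarContainsClass(String jarPath, String className) throws IOException {
    try (JarFile jar = new JarFile(jarPath)) {
      return jar.getEntry(entryNameFor(className)) != null;
    }
  }

  public static void main(String[] args) throws IOException {
    // args[0] is the path to the expansion service jar to inspect.
    String className = "org.everit.json.schema.Schema$Builder";
    boolean bundled = jarContainsClass(args[0], className);
    System.out.println(className + (bundled ? " is bundled" : " is missing"));
  }
}
```

Running it against the 2.55.0 jar named above would be expected to report the class as missing if the diagnosis in this issue is right.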

Issue Priority

Priority: 1 (data loss / total loss of function)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@kennknowles
Member

Seems like there's an issue in how dependencies are specified. Searched for uses of it: https://github.com/search?q=repo%3Aapache%2Fbeam+org.everit+language%3AJava&type=code

It looks like the core SDK depends on it but requires users to add it as a dependency:

provided library.java.everit_json_schema

You were getting lucky that it was also added as a firm dependency, despite sdks/java/io/json/ not actually depending on it. I bet the reason I removed it was that I got an IWYU error. There are two good fixes: (1) add a dep directly at the point of bundling the expansion service jar or (2) just add the dep to the core SDK. And I guess there is fix (3), which is a kludge: check whether it is present and skip validation if it is not available.
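
For illustration, fix (3) could look roughly like the sketch below. This is not Beam code: `OptionalValidation`, `isClassAvailable`, and `validateIfPossible` are hypothetical names, and the actual validation call is left as a placeholder comment. It only shows the presence check, done without initializing the class.

```java
// Hypothetical sketch of fix (3): skip validation when the optional library is absent.
public class OptionalValidation {

  // Returns true if the named class can be loaded from this class's classloader.
  // The 'false' argument avoids running the class's static initializers.
  static boolean isClassAvailable(String className) {
    try {
      Class.forName(className, false, OptionalValidation.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  // Reports whether validation would run or be skipped for the given validator class.
  static String validateIfPossible(String validatorClass) {
    if (!isClassAvailable(validatorClass)) {
      return "skipped";
    }
    // Placeholder: the real schema validation would be invoked here.
    return "validated";
  }

  public static void main(String[] args) {
    System.out.println(validateIfPossible("org.everit.json.schema.Schema$Builder"));
  }
}
```

The trade-off is that a missing dependency silently disables validation rather than failing fast, which is why it counts as the least principled of the three options.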

@kennknowles
Copy link
Member

I notice that sdks/java/extensions/schemaio-expansion-service/build.gradle has suppressed all dependency configuration warnings.

I presume this is because it does not directly depend on any of those things, but wants them in the uber jar. I have to believe there is a more principled way of achieving that, for example a runtime scope or something to do with shadow jar configuration?
