bijection-avro sometimes deserializes objects to GenericData.Record instead of the requested type #265

Open
rabejens opened this issue Jun 12, 2017 · 1 comment

Comments

@rabejens

I defined an Avro schema and used SBT Avrohugger to generate the Scala code. Serialization and deserialization work so far on my local machine. I am doing something like this:

import com.twitter.bijection.avro.SpecificAvroCodecs

val x: Array[Byte] = ... // Get the serialized data
val myThing = SpecificAvroCodecs.toBinary[MyAvroThing](MyAvroThing.SCHEMA$).invert(x)

When I run this locally, it works perfectly. I have now created a Spark job that is packaged with the SBT Assembly plugin and submitted to Spark. When I "submit" this job locally (using spark-submit --master local[*]), deserialization works. However, when I submit it to a "real" Spark installation, I get a ClassCastException:

Exception in thread "main" java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to com.example.avro.MyAvroThing

So the deserializer does not produce the requested type and falls back to a generic Avro record instead. I double-checked that all necessary Avro libraries and Twitter's Bijection-Avro are correctly embedded in my resulting JAR.

As a next investigation step, I analyzed the GenericData.Record I get by doing:

import scala.collection.JavaConverters._ // for .asScala on the java.util.List of fields
import scala.util.Try
import org.apache.avro.generic.GenericData

val mystery = SpecificAvroCodecs.toBinary[MyAvroThing](MyAvroThing.SCHEMA$).invert(x).asInstanceOf[Try[Any]]
mystery.get match {
  case _: MyAvroThing => println("ok!")
  case r: GenericData.Record => println("Got a generic record with schema: " + r.getSchema.getFields.asScala.map(_.name()).mkString(", "))
  case _ => println("Got something completely different")
}

When I run this locally, it prints out ok! as it correctly gets the MyAvroThing. When I run this on the Spark cluster, I get:

Got a generic record with schema: foo, bar, quux

This means my schema IS honored by the deserializer and the data is deserialized correctly; only the conversion to the target class somehow does not happen.

When I query the record's fields by name, I get the correct data I expect in my MyAvroThing.

What is going wrong here?

@johnynek
Collaborator

I wonder if this could be a classpath issue: locally you have one version of Avro, on the cluster you have another, and the mismatch shows up as a runtime error.
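One way to test the classpath hypothesis is to log where the relevant classes are actually loaded from, both locally and on the cluster, and compare the output. The following is a minimal sketch using only the JDK; the object name `ClasspathProbe` and its two methods are made up for illustration and are not part of Avro or Bijection.

```scala
// Hypothetical helper (not part of Avro or Bijection): reports which JAR and
// which classloader a class came from, using only JDK reflection calls.
object ClasspathProbe {

  // Describe the code source (JAR or directory) and classloader of a loaded class.
  def describe(cls: Class[_]): String = {
    // getCodeSource and getLocation may be null, e.g. for JDK bootstrap classes.
    val source = Option(cls.getProtectionDomain.getCodeSource)
      .flatMap(cs => Option(cs.getLocation))
      .map(_.toString)
      .getOrElse("<bootstrap or unknown>")
    // getClassLoader returns null for classes loaded by the bootstrap loader.
    val loader = Option(cls.getClassLoader)
      .map(_.toString)
      .getOrElse("<bootstrap>")
    s"${cls.getName} loaded from $source by $loader"
  }

  // Same as describe, but looks the class up by name so the probe itself
  // has no compile-time dependency on the class being inspected.
  def describeByName(name: String): String =
    try describe(Class.forName(name))
    catch {
      case _: ClassNotFoundException => s"$name not found on the classpath"
    }
}
```

Calling, for example, `println(ClasspathProbe.describeByName("org.apache.avro.specific.SpecificData"))` and `println(ClasspathProbe.describeByName("com.example.avro.MyAvroThing"))` at the start of the Spark job would show whether Avro comes from the assembly JAR or from the cluster's own installation, and whether both classes share a classloader. If the two locations or loaders differ between the local run and the cluster run, that would be consistent with the classpath explanation, though it would not prove it.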
