CoreNLP 3.8 fails in Apache Spark #556

Closed
maziyarpanahi opened this issue Oct 27, 2017 · 5 comments

@maziyarpanahi

Hi,

I can use CoreNLP 3.6 and 3.7 simply by passing these jars to my Spark app (Spark 1.6 and 2.2):

spark-shell --master yarn --deploy-mode client --queue multivac --driver-cores 5 --driver-memory 8g --executor-cores 5 --executor-memory 4g --num-executors 30 --jars /home/jars/stanford-corenlp-3.7.0/ejml-0.23.jar,/home/jars/stanford-corenlp-3.7.0/stanford-corenlp-3.7.0.jar,/home/jars/stanford-corenlp-3.7.0/stanford-corenlp-3.7.0-models.jar,/home/jars/stanford-corenlp-3.7.0/protobuf.jar,/home/jars/stanford-corenlp-3.7.0/jollyday.jar

But if I try the same set of jars from CoreNLP 3.8, it always fails with this error:

scala> import edu.stanford.nlp.simple._
scala> new Sentence(document).words()

java.lang.VerifyError: Bad type on operand stack
Exception Details:
  Location:
    com/google/protobuf/GeneratedMessageV3$ExtendableMessage.getExtension(Lcom/google/protobuf/GeneratedMessage$GeneratedExtension;I)Ljava/lang/Object; @3: invokevirtual
  Reason:
    Type 'com/google/protobuf/GeneratedMessage$GeneratedExtension' (current frame, stack[1]) is not assignable to 'com/google/protobuf/ExtensionLite'
  Current Frame:
    bci: @3
    flags: { }
    locals: { 'com/google/protobuf/GeneratedMessageV3$ExtendableMessage', 'com/google/protobuf/GeneratedMessage$GeneratedExtension', integer }
    stack: { 'com/google/protobuf/GeneratedMessageV3$ExtendableMessage', 'com/google/protobuf/GeneratedMessage$GeneratedExtension', integer }
  Bytecode:
    0x0000000: 2a2b 1cb6 0024 b0

  at edu.stanford.nlp.simple.Document.<init>(Document.java:433)
  at edu.stanford.nlp.simple.Sentence.<init>(Sentence.java:118)
  at edu.stanford.nlp.simple.Sentence.<init>(Sentence.java:126)
  ... 56 elided

Any help is appreciated,

Cheers,
Maziyar

@J38
Contributor

J38 commented Oct 27, 2017

I have a theory which may be incorrect.

Look at this page on Maven Central:

https://search.maven.org/#artifactdetails%7Corg.apache.spark%7Cspark-parent%7C1.2.2%7Cpom

You'll notice that this project relies on protobuf 2.4.1. Stanford CoreNLP uses protobuf 3.2.0.

I think the mismatch is causing this problem. The bad news is I'm not sure how to resolve this. This page also claims Spark doesn't directly use protobuf, so you could look at your Spark installation and the jars it uses and see if you can manually upgrade to the protobuf 3.2.0 jar.

@J38
Contributor

J38 commented Oct 27, 2017

Even more evidence of the protobuf conflict:

https://github.com/apache/spark/blob/master/pom.xml

@J38
Contributor

J38 commented Oct 27, 2017

My advice is to figure out where Spark's jar dependencies are, manually change the protobuf dependency to 3.2.0, and see if that fixes things.
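
To locate the protobuf jars that a Spark installation ships, a small search like the following may help. This is a minimal sketch, not something taken from the thread: the function name `find_protobuf_jars` and the `/opt/spark` fallback path are hypothetical, and on Spark 1.6 the dependency may be bundled inside the assembly jar, in which case the search finds nothing.

```shell
#!/bin/sh
# List every protobuf-java jar under a directory tree, one path per line.
# The directory is supplied by the caller; no particular Spark layout is assumed.
find_protobuf_jars() {
    find "$1" -type f -name 'protobuf-java-*.jar' | sort
}

# Example call. SPARK_HOME and the /opt/spark fallback are assumptions;
# substitute the directory your cluster actually uses.
# find_protobuf_jars "${SPARK_HOME:-/opt/spark}"
```

The version number in each file name printed shows which protobuf release that location provides.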

@maziyarpanahi
Author

Hi @J38
Good catch! You are right: Spark pulls in the older version 2.5.0 of protobuf-java. When I remove the current protobuf-java-2.5.0.jar and replace it with the latest version (linked below), it works without any problem.
http://central.maven.org/maven2/com/google/protobuf/protobuf-java/3.4.0/protobuf-java-3.4.0.jar

I opened an issue to see if it is possible to bump the version of protobuf to a newer version:
https://issues.apache.org/jira/browse/SPARK-22380

Many thanks @J38 for your catch. I am going to manually use 3.4 instead of 2.5, and I hope the next release of Spark already ships the newest version of this dependency.

Cheers,
Maziyar
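
The manual swap described above could be scripted roughly as follows. This is a sketch under stated assumptions rather than the exact steps used in the thread: the function name `swap_protobuf_jar`, its argument order, and the backup directory are hypothetical, and the jars directory must be whichever one your Spark installation actually loads from.

```shell
#!/bin/sh
# Move the old protobuf jar out of the jars directory and copy the new one in.
# Usage: swap_protobuf_jar JARS_DIR OLD_JAR_NAME NEW_JAR_PATH BACKUP_DIR
swap_protobuf_jar() {
    jars_dir=$1; old_name=$2; new_jar=$3; backup_dir=$4
    # Refuse to proceed unless both the old jar and the replacement exist.
    [ -f "$jars_dir/$old_name" ] || { echo "missing $jars_dir/$old_name" >&2; return 1; }
    [ -f "$new_jar" ] || { echo "missing $new_jar" >&2; return 1; }
    mkdir -p "$backup_dir" || return 1
    # Keep the old jar outside the jars directory so the change can be reverted.
    mv "$jars_dir/$old_name" "$backup_dir/" || return 1
    cp "$new_jar" "$jars_dir/"
}

# Example call with hypothetical paths:
# swap_protobuf_jar /opt/spark/jars protobuf-java-2.5.0.jar \
#     /home/jars/protobuf-java-3.4.0.jar /opt/spark/jars-backup
```

The old jar is moved to a separate directory instead of being deleted, so the original setup can be restored if other components turn out to need protobuf 2.5.0.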

@sseveran

I believe that Hadoop itself has the dependency on protobuf. This is going to be fixed in Hadoop 3.0. See https://issues.apache.org/jira/browse/HADOOP-11804
