How to set language for CoreNLP Simple API #551

Closed
maziyarpanahi opened this issue Oct 24, 2017 · 10 comments

@maziyarpanahi

Hi,

I am using the Simple API in my Spark applications. It is very fast compared to the normal pipeline/annotation approach. I was wondering how to set a different language for my POS tagger.
Here is how I use it for the default English in Scala:

val newSentences = new Document(document).sentences().asScala.map(_.text())
val wordsArray = new Sentence(finalSentence).words().asScala
val posTagsArray = new Sentence(finalSentence).posTags().asScala

That said, I saw in the code that there is an option to pass properties to the Simple API:

public Sentence(String text, Properties props) {
    // Set document
    this.document = new Document(text);
    // Set sentence
    if (props.containsKey("ssplit.isOneSentence")) {
      this.impl = this.document.sentence(0, props).impl;
    } else {
      Properties modProps = new Properties(props);
      modProps.setProperty("ssplit.isOneSentence", "true");
      this.impl = this.document.sentence(0, modProps).impl;
    }
    // Set tokens
    this.tokensBuilders = document.sentence(0).tokensBuilders;
    // Asserts
    assert (this.document.sentence(0).impl == this.impl);
    assert (this.document.sentence(0).tokensBuilders == this.tokensBuilders);
  }

Even posTags accepts a properties argument:

public List<String> posTags(Properties props) {
    document.runPOS(props);
    synchronized (impl) {
      return lazyList(tokensBuilders, CoreNLPProtos.Token.Builder::getPos);
    }
  }

But neither works when I set fr as the language:

val props = new Properties()
props.setProperty("tokenize.language", "fr")
val posTagsArray = new Sentence("Au fond, les choses sont assez simples.", props).posTags(props)

Does anyone know how to change language for Simple API?

Many thanks.

@gangeli
Member

gangeli commented Oct 24, 2017

So, setting the properties manually should work (I'll look into why it doesn't), but it is easier to use the classes FrenchSentence and FrenchDocument (or the equivalents for other languages). These should have the properties pre-set on the object, so they should do the right thing when you call the associated functions (e.g., posTags()).
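
One Java detail that may matter when properties are passed this way: the Sentence(String, Properties) constructor quoted above builds modProps with new Properties(props), which stores props as defaults rather than copying its entries. Defaults are visible to getProperty but not to containsKey, get, or keySet. Whether this is what makes the Simple API ignore the language setting is not established in this thread; the sketch below only demonstrates the java.util.Properties behaviour, and the class name PropertiesDefaultsDemo is made up for illustration.

```java
import java.util.Properties;

final class PropertiesDefaultsDemo {
    /** Wraps props the same way the quoted constructor does: as defaults of a new Properties object. */
    static Properties wrapAsDefaults(Properties props) {
        Properties modProps = new Properties(props);
        modProps.setProperty("ssplit.isOneSentence", "true");
        return modProps;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("tokenize.language", "fr");
        Properties modProps = wrapAsDefaults(props);

        // getProperty falls back to the defaults, so the value is visible.
        System.out.println(modProps.getProperty("tokenize.language"));                    // prints "fr"
        // containsKey is inherited from Hashtable and does not consult the defaults.
        System.out.println(modProps.containsKey("tokenize.language"));                    // prints "false"
        // stringPropertyNames() includes default keys; keySet() does not.
        System.out.println(modProps.stringPropertyNames().contains("tokenize.language")); // prints "true"
        System.out.println(modProps.keySet().contains("tokenize.language"));              // prints "false"
    }
}
```

Any code that reads such an object with containsKey or by iterating keySet would not see tokenize.language, even though getProperty returns it.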

@maziyarpanahi
Author

Hi @gangeli, and thanks for your response. I saw FrenchSentence and FrenchDocument, but unfortunately it says:

import edu.stanford.nlp.simple._
new FrenchSentence(document).posTags()

error: not found: type FrenchSentence

Maybe I am missing another import?

@gangeli
Member

gangeli commented Oct 26, 2017

What version of CoreNLP are you using? I believe it should be in 3.8.0, but I do know it's a relatively recent addition.

@maziyarpanahi
Author

maziyarpanahi commented Oct 27, 2017

@gangeli unfortunately I can't get CoreNLP 3.8 working with Spark 2.2. I have tested 3.6 and 3.7, but when I try to use 3.8 it always complains about Google protobuf:

Type 'com/google/protobuf/GeneratedMessage$GeneratedExtension' (current frame, stack[1]) is not assignable to 'com/google/protobuf/ExtensionLite'

So I am on CoreNLP 3.7. I guess FrenchSentence is in 3.8?

PS: I created a separate issue for Spark and 3.8: #556
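
A verification error like the one above, where one protobuf class is not assignable to another, is often a sign that two different protobuf-java versions are visible on the classpath (for example, one pulled in by CoreNLP and another bundled with the Spark runtime). That is an assumption here, not something confirmed in this thread. A minimal Java sketch for listing every location that provides a given class; ClasspathDuplicates and locationsOf are hypothetical names:

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

final class ClasspathDuplicates {
    /** Returns every classpath location that provides the given class file. */
    static List<URL> locationsOf(String className) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        // Both class names are taken from the error message in this thread.
        String[] names = {
            "com.google.protobuf.GeneratedMessage",
            "com.google.protobuf.ExtensionLite"
        };
        for (String name : names) {
            List<URL> hits = locationsOf(name);
            System.out.println(name + ": " + hits.size() + " location(s)");
            for (URL url : hits) {
                System.out.println("  " + url);
            }
        }
    }
}
```

If more than one location is printed for the same class, the JVM typically uses the first one, which may not be the version CoreNLP was compiled against. On Spark, run the check inside the job, since the driver and executors can have a different classpath from a local shell.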

@J38
Contributor

J38 commented Oct 31, 2017

@maziyarpanahi were you able to get FrenchSentence and FrenchDocument working?

J38 closed this as completed Oct 31, 2017

@kiru

kiru commented Dec 27, 2017

@gangeli I couldn't find the FrenchSentence and FrenchDocument classes in the 3.8.0 release. I see them in the latest version (https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/simple/FrenchDocument.java).
I tried to set the same properties to get it working with French, but in version 3.8.0 the properties given to this constructor are ignored:
edu.stanford.nlp.simple.Document#Document(java.util.Properties, java.lang.String)

Is there another way to set the properties (or just change the language)?

Thanks,
Kiru

@maziyarpanahi
Author

Hi @J38
Sorry for the delay, but the first question is still unanswered: I can't change the language for the Simple API.
Is it at all possible to use a different language or model with the Simple API?
Many thanks.

@ernesto-butto

ernesto-butto commented Jun 12, 2018

Hello, I'm having the same issue. I'm using 3.9. I don't think this issue should be closed: I see the code in the repository but can't load edu.stanford.nlp.simple.SpanishDocument with these dependencies:

<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.9.1</version>
</dependency>
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.9.1</version>
    <classifier>models</classifier>
</dependency>
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.9.1</version>
    <classifier>models-spanish</classifier>
</dependency>

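Since the language-specific classes are reported above as present in the repository but not loadable from a released jar, one way to narrow this down is to check at runtime whether each class can be loaded and where it comes from. The following is only a diagnostic sketch in plain Java; SimpleApiClassCheck and describe are hypothetical names, while the three class names are the ones mentioned in this thread.

```java
import java.security.CodeSource;

final class SimpleApiClassCheck {
    /** Reports where a class is loaded from, or that it could not be loaded. */
    static String describe(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class<?> cls = Class.forName(className, false, SimpleApiClassCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            String where = (src == null) ? "(JDK or unknown source)" : String.valueOf(src.getLocation());
            return className + " <- " + where;
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " could not be loaded (" + e + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("edu.stanford.nlp.simple.Sentence"));
        System.out.println(describe("edu.stanford.nlp.simple.FrenchDocument"));
        System.out.println(describe("edu.stanford.nlp.simple.SpanishDocument"));
    }
}
```

If Sentence resolves to the stanford-corenlp jar but SpanishDocument cannot be loaded, the class is simply absent from that artifact, and adding a models jar would not change that.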
@jeigrei

jeigrei commented Jan 23, 2019

I agree that this issue should be reopened -- I'm having the same issue as @poolebu

@phuongnm94

phuongnm94 commented Apr 27, 2021

I tried the Simple API with the French language on both Stanford library versions, 3.9.1 and 4.2.0.

It works successfully:

import edu.stanford.nlp.simple.{Document, Sentence}
import edu.stanford.nlp.util.StringUtils

// Load the French defaults from the properties file, then override the annotators.
val props = StringUtils.argsToProperties("-props", "StanfordCoreNLP-french.properties")
props.setProperty("annotators", "tokenize,ssplit,parse")
val x = "Das ist auch für die Bedingungen des Binnenmarktes von Wichtigkeit ."
val s = new Sentence(x, props)

s.posTags()

s.nerTags()

try { s.parse(props)   } catch{ case _:  Throwable => }  finally { } // The first call to parse always fails because the graph is null, so this line is a workaround: call it once and discard the error.
s.parse(props).pennString()

and output:


scala> s.posTags() 
res1: java.util.List[String] = [PROPN, AUX, VERB, ADP, DET, PROPN, DET, PROPN, ADP, PROPN, PUNCT]

scala> 

scala> s.nerTags() 
res2: java.util.List[String] = [O, O, O, O, O, I-LOC, O, I-PER, I-PER, I-PER, O]

scala> 

scala> try { s.parse(props)   } catch{ case _:  Throwable => }  finally { } // The first call to parse always fails because the graph is null, so this line is a workaround: call it once and discard the error.
res3: Any = ()

scala> s.parse(props).pennString() 
res4: String =
"(ROOT
  (SENT
    (NP (PROPN Das))
    (VN (AUX ist) (VERB auch))
    (PP (ADP für)
      (NP (DET die) (PROPN Bedingungen)))
    (NP (DET des) (PROPN Binnenmarktes)
      (PP (ADP von)
        (NP (PROPN Wichtigkeit))))
    (PUNCT .)))
"

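A plausible reason the properties file works where setting only tokenize.language did not is that the file configures each annotator's model, not just the tokenizer language. The file's contents are not shown in this thread, so that is an assumption; it is also an assumption that the relevant keys end in ".model" (pos.model is the key the POS tagger reads for its model path). A small Java sketch to print those settings from whichever properties file is on the classpath; LanguagePropsInspector and its methods are hypothetical names:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

final class LanguagePropsInspector {
    /** Loads a properties file from the classpath; returns an empty object if it is not found. */
    static Properties loadFromClasspath(String resourceName) throws IOException {
        Properties props = new Properties();
        try (InputStream in = LanguagePropsInspector.class.getClassLoader().getResourceAsStream(resourceName)) {
            if (in != null) {
                props.load(in);
            }
        }
        return props;
    }

    /** Keeps the annotator list, the tokenizer language, and any key ending in ".model", sorted by key. */
    static Map<String, String> modelSettings(Properties props) {
        Map<String, String> out = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.equals("annotators") || key.equals("tokenize.language") || key.endsWith(".model")) {
                out.put(key, props.getProperty(key));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // File name as used in the Scala snippet above.
        Properties props = loadFromClasspath("StanfordCoreNLP-french.properties");
        if (props.isEmpty()) {
            System.out.println("Properties file not found on the classpath.");
        }
        modelSettings(props).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Comparing this output with a Properties object that only sets tokenize.language shows which model keys would otherwise be left at their English defaults.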