java.lang.NumberFormatException: Bad number put into wordToNumber #547

Closed
dexception opened this issue Oct 16, 2017 · 1 comment

Comments

@dexception

Same exceptions over and over again.

2017-10-15 15:36:02 WARN NumberNormalizer:81 - java.lang.NumberFormatException: Bad number put into wordToNumber. Word is: "2.7million", originally part of "2.7million", piece # 0
edu.stanford.nlp.ie.NumberNormalizer.wordToNumber(NumberNormalizer.java:294)
edu.stanford.nlp.ie.NumberNormalizer.findNumbers(NumberNormalizer.java:636)
edu.stanford.nlp.ie.NumberNormalizer.findAndMergeNumbers(NumberNormalizer.java:725)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressions(TimeExpressionExtractorImpl.java:189)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressions(TimeExpressionExtractorImpl.java:183)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressionCoreMaps(TimeExpressionExtractorImpl.java:114)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressionCoreMaps(TimeExpressionExtractorImpl.java:104)
edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.runSUTime(NumberSequenceClassifier.java:345)
edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.classifyWithSUTime(NumberSequenceClassifier.java:143)
edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.classifyWithGlobalInformation(NumberSequenceClassifier.java:106)
edu.stanford.nlp.ie.NERClassifierCombiner.recognizeNumberSequences(NERClassifierCombiner.java:369)
edu.stanford.nlp.ie.NERClassifierCombiner.classifyWithGlobalInformation(NERClassifierCombiner.java:312)
edu.stanford.nlp.ie.NERClassifierCombiner.classify(NERClassifierCombiner.java:299)
edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyToCharacterOffsets(AbstractSequenceClassifier.java:618)

2017-10-15 15:38:22 WARN NumberNormalizer:81 - java.lang.NumberFormatException: Bad number put into wordToNumber. Word is: "2.5million", originally part of "2.5million", piece # 0
edu.stanford.nlp.ie.NumberNormalizer.wordToNumber(NumberNormalizer.java:294)
edu.stanford.nlp.ie.NumberNormalizer.findNumbers(NumberNormalizer.java:636)
edu.stanford.nlp.ie.NumberNormalizer.findAndMergeNumbers(NumberNormalizer.java:725)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressions(TimeExpressionExtractorImpl.java:189)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressions(TimeExpressionExtractorImpl.java:183)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressionCoreMaps(TimeExpressionExtractorImpl.java:114)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressionCoreMaps(TimeExpressionExtractorImpl.java:104)
edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.runSUTime(NumberSequenceClassifier.java:345)
edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.classifyWithSUTime(NumberSequenceClassifier.java:143)
edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.classifyWithGlobalInformation(NumberSequenceClassifier.java:106)
edu.stanford.nlp.ie.NERClassifierCombiner.recognizeNumberSequences(NERClassifierCombiner.java:369)
edu.stanford.nlp.ie.NERClassifierCombiner.classifyWithGlobalInformation(NERClassifierCombiner.java:312)
edu.stanford.nlp.ie.NERClassifierCombiner.classify(NERClassifierCombiner.java:299)
edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyToCharacterOffsets(AbstractSequenceClassifier.java:618)

2017-10-15 15:42:44 WARN NumberNormalizer:81 - java.lang.NumberFormatException: Bad number put into wordToNumber. Word is: "3.5million", originally part of "3.5million", piece # 0
edu.stanford.nlp.ie.NumberNormalizer.wordToNumber(NumberNormalizer.java:294)
edu.stanford.nlp.ie.NumberNormalizer.findNumbers(NumberNormalizer.java:636)
edu.stanford.nlp.ie.NumberNormalizer.findAndMergeNumbers(NumberNormalizer.java:725)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressions(TimeExpressionExtractorImpl.java:189)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressions(TimeExpressionExtractorImpl.java:183)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressionCoreMaps(TimeExpressionExtractorImpl.java:114)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressionCoreMaps(TimeExpressionExtractorImpl.java:104)
edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.runSUTime(NumberSequenceClassifier.java:345)
edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.classifyWithSUTime(NumberSequenceClassifier.java:143)
edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.classifyWithGlobalInformation(NumberSequenceClassifier.java:106)
edu.stanford.nlp.ie.NERClassifierCombiner.recognizeNumberSequences(NERClassifierCombiner.java:369)
edu.stanford.nlp.ie.NERClassifierCombiner.classifyWithGlobalInformation(NERClassifierCombiner.java:312)
edu.stanford.nlp.ie.NERClassifierCombiner.classify(NERClassifierCombiner.java:299)
edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyToCharacterOffsets(AbstractSequenceClassifier.java:618)

2017-10-15 16:32:36 WARN NumberNormalizer:81 - java.lang.NumberFormatException: Bad number put into wordToNumber. Word is: "1783.9million", originally part of "1,783.9million", piece # 0
edu.stanford.nlp.ie.NumberNormalizer.wordToNumber(NumberNormalizer.java:294)
edu.stanford.nlp.ie.NumberNormalizer.findNumbers(NumberNormalizer.java:636)
edu.stanford.nlp.ie.NumberNormalizer.findAndMergeNumbers(NumberNormalizer.java:725)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressions(TimeExpressionExtractorImpl.java:189)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressions(TimeExpressionExtractorImpl.java:183)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressionCoreMaps(TimeExpressionExtractorImpl.java:114)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressionCoreMaps(TimeExpressionExtractorImpl.java:104)
edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.runSUTime(NumberSequenceClassifier.java:345)
edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.classifyWithSUTime(NumberSequenceClassifier.java:143)
edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.classifyWithGlobalInformation(NumberSequenceClassifier.java:106)
edu.stanford.nlp.ie.NERClassifierCombiner.recognizeNumberSequences(NERClassifierCombiner.java:369)
edu.stanford.nlp.ie.NERClassifierCombiner.classifyWithGlobalInformation(NERClassifierCombiner.java:312)
edu.stanford.nlp.ie.NERClassifierCombiner.classify(NERClassifierCombiner.java:299)
edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyToCharacterOffsets(AbstractSequenceClassifier.java:618)
com.innefu.util.NERSentimentUtil.getStanford(NERSentimentUtil.java:382)

2017-10-15 16:32:36 WARN NumberNormalizer:81 - java.lang.NumberFormatException: Bad number put into wordToNumber. Word is: "356.8million", originally part of "356.8million", piece # 0
edu.stanford.nlp.ie.NumberNormalizer.wordToNumber(NumberNormalizer.java:294)
edu.stanford.nlp.ie.NumberNormalizer.findNumbers(NumberNormalizer.java:636)
edu.stanford.nlp.ie.NumberNormalizer.findAndMergeNumbers(NumberNormalizer.java:725)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressions(TimeExpressionExtractorImpl.java:189)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressions(TimeExpressionExtractorImpl.java:183)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressionCoreMaps(TimeExpressionExtractorImpl.java:114)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressionCoreMaps(TimeExpressionExtractorImpl.java:104)
edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.runSUTime(NumberSequenceClassifier.java:345)
edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.classifyWithSUTime(NumberSequenceClassifier.java:143)
edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.classifyWithGlobalInformation(NumberSequenceClassifier.java:106)
edu.stanford.nlp.ie.NERClassifierCombiner.recognizeNumberSequences(NERClassifierCombiner.java:369)
edu.stanford.nlp.ie.NERClassifierCombiner.classifyWithGlobalInformation(NERClassifierCombiner.java:312)
edu.stanford.nlp.ie.NERClassifierCombiner.classify(NERClassifierCombiner.java:299)
edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyToCharacterOffsets(AbstractSequenceClassifier.java:618)

2017-10-16 00:20:12 WARN NumberNormalizer:81 - java.lang.NumberFormatException: Bad number put into wordToNumber. Word is: "1.7billion", originally part of "1.7billion", piece # 0
edu.stanford.nlp.ie.NumberNormalizer.wordToNumber(NumberNormalizer.java:294)
edu.stanford.nlp.ie.NumberNormalizer.findNumbers(NumberNormalizer.java:636)
edu.stanford.nlp.ie.NumberNormalizer.findAndMergeNumbers(NumberNormalizer.java:725)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressions(TimeExpressionExtractorImpl.java:189)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressions(TimeExpressionExtractorImpl.java:183)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressionCoreMaps(TimeExpressionExtractorImpl.java:114)
edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressionCoreMaps(TimeExpressionExtractorImpl.java:104)
edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.runSUTime(NumberSequenceClassifier.java:345)
edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.classifyWithSUTime(NumberSequenceClassifier.java:143)
edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.classifyWithGlobalInformation(NumberSequenceClassifier.java:106)
edu.stanford.nlp.ie.NERClassifierCombiner.recognizeNumberSequences(NERClassifierCombiner.java:369)
edu.stanford.nlp.ie.NERClassifierCombiner.classifyWithGlobalInformation(NERClassifierCombiner.java:312)
edu.stanford.nlp.ie.NERClassifierCombiner.classify(NERClassifierCombiner.java:299)
edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyToCharacterOffsets(AbstractSequenceClassifier.java:618)

@demongolem

demongolem commented Jan 23, 2018

I see the same thing with Word is: ".6billion", originally part of ".6billion", piece # 0. Is any work planned on this?
It looks like the consecutive tokens are .6 and billion, and getTokenText in ChunkAnnotationUtils builds the text with a StringBuilder that has no delimiter, so .6billion gets passed to wordToNumber. That is probably the same cause as in the original issue.
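
To illustrate the suspected cause, here is a minimal, self-contained sketch. It does not use CoreNLP at all: the class FusedNumberSketch and the methods joinWithoutDelimiter and separateMagnitude are hypothetical names made up for this example, and the regex is only an assumption about which fused forms matter (the ones seen in the logs above). The first method mimics appending token texts with no delimiter; the second shows one possible pre-processing step that puts a space between the number and the magnitude word before the text reaches the pipeline. Whether that actually silences the warning has not been verified here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names exist in CoreNLP.
class FusedNumberSketch {

    // A number (optional thousands commas, optional decimals, or a bare ".6")
    // immediately followed by a magnitude word, with nothing in between.
    private static final Pattern FUSED = Pattern.compile(
            "^(\\d[\\d,]*(?:\\.\\d+)?|\\.\\d+)(thousand|million|billion|trillion)$",
            Pattern.CASE_INSENSITIVE);

    // Mimics the suspected behaviour: token texts appended with no delimiter.
    static String joinWithoutDelimiter(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokens) {
            sb.append(token);
        }
        return sb.toString();
    }

    // Inserts a single space between the numeric part and the magnitude word.
    // Words that do not match the fused pattern are returned unchanged.
    static String separateMagnitude(String word) {
        Matcher m = FUSED.matcher(word);
        if (!m.matches()) {
            return word;
        }
        return m.group(1) + " " + m.group(2);
    }

    public static void main(String[] args) {
        String fused = joinWithoutDelimiter(Arrays.asList(".6", "billion"));
        System.out.println(fused);                               // .6billion
        System.out.println(separateMagnitude(fused));            // .6 billion
        System.out.println(separateMagnitude("1,783.9million")); // 1,783.9 million
    }
}
```

This is a workaround sketch on the caller's side, not a fix for NumberNormalizer itself; a real fix would presumably have to happen where the tokens are joined or where wordToNumber parses its input.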
