Emoji with multiple code units not detected #99
I have been doing some digging and testing (within my limited skill set). Could the problem be in the polarity_scores function? Its loop only looks at single characters, so it would miss emojis made of multiple code units. I have to admit, though, that after reading the code I do not understand how any emojis are found. That loop builds a string without emojis (replacing each one with its description), and this string is passed to the sentiment_valence function, which only checks words against the lexicon, and the lexicon does not contain the emoji descriptions. Clearly I am missing something here.
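A minimal sketch of the suspected failure mode, assuming the emoji lookup is a dict keyed by emoji strings and the loop visits one character at a time. `EMOJI_DESCRIPTIONS` and `replace_per_character` are hypothetical stand-ins, not the project's actual code, and the descriptions are illustrative. Iterating a Python `str` yields one code point at a time, so ❤️ arrives as two separate items:

```python
# Hypothetical toy table standing in for the emoji lexicon; the real
# lexicon file and its descriptions are not reproduced here.
EMOJI_DESCRIPTIONS = {
    "\U0001F498": "heart with arrow",  # 💘, a single code point
    "\u2764\ufe0f": "red heart",       # ❤️, two code points
}

def replace_per_character(text: str) -> str:
    """Hypothetical loop that looks up one character at a time."""
    pieces = []
    for ch in text:
        # Only single code points are ever looked up, so the two-code-point
        # key "\u2764\ufe0f" can never match.
        pieces.append(EMOJI_DESCRIPTIONS.get(ch, ch))
    return "".join(pieces)

print(replace_per_character("love \U0001F498"))   # love heart with arrow
print(replace_per_character("love \u2764\ufe0f"))  # love ❤️ (unchanged)
```

A lone U+2764 is not a key in the toy table and the two-code-point key is never tried, so the heart passes through unchanged and contributes nothing to the score.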
Found the problem, but I'm not sure about a fix. I now understand that the sentiment_valence function derives the scores from the words of the emoji description, just like normal text. So, in case of
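One possible direction for a fix, sketched under stated assumptions rather than taken from the thread: scan the text with a longest-match lookup so multi-code-point keys are tried before single characters, then score the description words like ordinary text. The tables, the valence value, and the names `replace_longest_match` and `toy_score` are hypothetical placeholders, not the project's lexicon or API.

```python
# Hypothetical toy tables; values are illustrative only.
EMOJI_DESCRIPTIONS = {
    "\U0001F498": "heart with arrow",  # 💘
    "\u2764\ufe0f": "red heart",       # ❤️ with variation selector
    "\u2764": "red heart",             # bare U+2764
}
WORD_VALENCE = {"heart": 2.0}

MAX_KEY_LEN = max(len(k) for k in EMOJI_DESCRIPTIONS)

def replace_longest_match(text: str) -> str:
    """Replace emojis with descriptions, trying the longest key first."""
    out = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_KEY_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in EMOJI_DESCRIPTIONS:
                # Pad with spaces so the description becomes separate words.
                out.append(" " + EMOJI_DESCRIPTIONS[chunk] + " ")
                i += size
                break
        else:
            # No key matched at this position: keep the character as-is.
            out.append(text[i])
            i += 1
    # Collapse the padding into single spaces.
    return " ".join("".join(out).split())

def toy_score(text: str) -> float:
    """Sum word valences after emoji replacement (toy scoring only)."""
    words = replace_longest_match(text).lower().split()
    return sum(WORD_VALENCE.get(w, 0.0) for w in words)

print(replace_longest_match("4 years old today \u2764\ufe0f"))  # 4 years old today red heart
print(toy_score("4 years old today \u2764\ufe0f"))              # 2.0
```

Trying longer keys first matters when a table holds both U+2764 and U+2764 U+FE0F: a shortest-first match would consume U+2764 and leave a stray variation selector behind.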
Thank you for elucidating what is happening here with emoji interpretation. We are running tests and were surprised that :red heart: and similar emojis consistently returned neutral scores. I will try your fix for now, but I agree that it is merely a stopgap until the logic is addressed more substantially.
Thanks for getting back to me about this. After a hiatus in research, I plan to return to sentiment analysis soon, so I'll watch this space and see whether there is anything I can contribute.
Hi, I am working on my project and have concluded that it completely skips emojis. It does not replace them with their text: I ran it in debugger mode, and if the sentence is just an emoji, it skips any evaluation entirely. Are there any updates on this? It is really crucial for my research.
@rehovicova
First: apologies if I provide insufficient information or use the wrong terminology. This is my first GitHub issue ever, so please be kind.
The demo code works fine for me, including the emojis. However, each demo emoji consists of a single code unit. Emojis with more than one, e.g. "red heart" (U+2764 U+FE0F), are not detected, despite being in the lexicon.
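To check how many pieces an emoji is made of, Python's standard `unicodedata` module can list its code points. Note that `len()` on a `str` counts code points while UTF-16 counts code units, so the two terms can disagree: 💘 is one code point but two UTF-16 code units. The helper names `describe` and `utf16_units` are illustrative, not part of any library.

```python
import unicodedata

def describe(s: str) -> list[str]:
    """List each code point in s with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}" for ch in s]

def utf16_units(s: str) -> int:
    """Count UTF-16 code units (2 bytes each; utf-16-le adds no BOM)."""
    return len(s.encode("utf-16-le")) // 2

red_heart = "\u2764\ufe0f"  # ❤️ from the issue
cupid = "\U0001F498"        # 💘 from the demo sentence

print(len(red_heart), describe(red_heart))
# 2 ['U+2764 HEAVY BLACK HEART', 'U+FE0F VARIATION SELECTOR-16']
print(len(cupid), utf16_units(cupid))
# 1 2
```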
Scoring these sentences returns:
Catch utf-8 emoji such as such as 💘 and 💋 and 😁------------------ {'neg': 0.0, 'neu': 0.615, 'pos': 0.385, 'compound': 0.875}
Not bad at all--------------------------------------------------- {'neg': 0.0, 'neu': 0.513, 'pos': 0.487, 'compound': 0.431}
Me and Fay are 4 years old today ❤️ (ft Grumio)… {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}