Exception: Encountered <C_ESC_STRING>, Expected <STRING> #136
The error comes from this character: https://www.codetable.net/decimal/8729
This is the RTF I can see in the MsgViewer:
The problem should be somewhere around those three lines with the 8729 Bullet Operator:
I'm trying to get closer to the culprit, but my RTF knowledge only goes so far...
Changing RTFParser.jj like this seems to do the trick:
(Just removed the <STRING>.) This will change the file RTFParser.java from this:
    final public void unicode_char() throws ParseException {
        Token code;
        code = jj_consume_token(C_UNICODE);
        jj_consume_token(STRING);
        current_group.addUnicodeChar( code.image );
    }
to this:
    final public void unicode_char() throws ParseException {
        Token code;
        code = jj_consume_token(C_UNICODE);
        current_group.addUnicodeChar( code.image );
    }
I think the jj_consume_token(STRING) isn't needed in this case, but I may be way off here! Outcome: no exception and the Unicode char is shown, but I'm not sure this change is entirely correct.
Some more info from here: https://www.zopatista.com/python/2012/06/06/rtf-and-unicode/
So this sequence is actually the same character twice. I'm not sure why that is the case...
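As background for the "same character twice" observation: in RTF, a `\uN` control word is conventionally followed by a fallback character for readers that do not understand Unicode, so the second item is the ANSI stand-in for the first. The small sketch below only names the two values from this thread (decimal 8729 and the Windows-1252 byte B7); the class and method names are hypothetical and not part of the project. Strictly speaking the two are distinct code points that merely look alike.

```java
final class BulletPairSketch {

    /** Decimal value carried by the 'u' control word in this thread. */
    static final int UNICODE_VALUE = 8729;

    /**
     * Byte carried by the hex escape that follows it. For B7, Windows-1252
     * and Latin-1 agree, so the byte value is also the code point.
     */
    static final int FALLBACK_BYTE = 0xB7;

    /** Formats a code point as "U+XXXX NAME" using the JDK's Unicode name table. */
    static String describe(int codePoint) {
        return String.format("U+%04X %s", codePoint, Character.getName(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(describe(UNICODE_VALUE));
        System.out.println(describe(FALLBACK_BYTE));
    }
}
```

Running `main` prints the official Unicode names of both values, which makes it easy to see that the fallback is an approximation of the Unicode character rather than an exact duplicate.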
Some information on how the parser works: https://stackoverflow.com/questions/17310377/what-does-consume-mean-in-javacc
Okay, so at the moment I have two different cases for the parser. Number 1:
This is Unicode 8729 followed by WIN-1252 B7. Both are the "bullet point" character. BUT a few lines further down in my mail I got the following:
And THIS one seems to be F0A7 (65536 - 3929 = 61607, which is F0A7 in hex, because RTF wants Unicode values above 32767 to be written as negative numbers), which lies in the Unicode Private Use Area. I have no clue what this character means or what it is doing there, but it seems we're not alone. So this sequence seems to be some weird character (which we don't care about) followed by the replacement character '?', which is NOT a C_ESC_STRING: it's a normal STRING. It seems our parser needs to handle both of those cases, which it does not at the moment.
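The two cases above can be sketched in plain Java. This is only an illustration under stated assumptions, not the project's JavaCC grammar and not the fix from the pull request: the class `RtfUnicodeSketch` and its methods are hypothetical, it understands nothing except `\uN` plus one fallback unit, and it assumes exactly one fallback unit follows each `\uN` (the RTF default). It shows the signed-to-unsigned conversion (negative values wrap by 65536) and that the fallback may be either a `\'hh` escape or a single plain character such as `?`.

```java
final class RtfUnicodeSketch {

    /** Maps the signed 16-bit value of the 'u' control word to a code point. */
    static int toCodePoint(int signedValue) {
        return signedValue < 0 ? signedValue + 65536 : signedValue;
    }

    /** True if the code point lies in the BMP Private Use Area (E000..F8FF). */
    static boolean isPrivateUse(int codePoint) {
        return codePoint >= 0xE000 && codePoint <= 0xF8FF;
    }

    /**
     * Decodes a tiny RTF text fragment. Only the 'u' control word is handled;
     * its single fallback unit (hex escape or plain character) is skipped.
     */
    static String decode(String rtf) {
        StringBuilder out = new StringBuilder();
        int len = rtf.length();
        int i = 0;
        while (i < len) {
            int end = unicodeWordEnd(rtf, i);
            if (end < 0) {                      // ordinary text: copy through
                out.append(rtf.charAt(i));
                i++;
                continue;
            }
            int value = Integer.parseInt(rtf.substring(i + 2, end));
            out.append((char) toCodePoint(value));
            i = end;
            // One space directly after a control word is a delimiter, not text.
            if (i < len && rtf.charAt(i) == ' ') {
                i++;
            }
            if (rtf.startsWith("\\'", i) && i + 4 <= len) {
                i += 4;                         // case 1: hex-escape fallback
            } else if (i < len && rtf.charAt(i) != '\\') {
                i++;                            // case 2: plain-character fallback
            }
        }
        return out.toString();
    }

    /** Returns the index just past the digits of a 'u' control word at i, or -1. */
    static int unicodeWordEnd(String rtf, int i) {
        if (!rtf.startsWith("\\u", i)) {
            return -1;
        }
        int p = i + 2;
        if (p < rtf.length() && rtf.charAt(p) == '-') {
            p++;
        }
        int digitsStart = p;
        while (p < rtf.length() && rtf.charAt(p) >= '0' && rtf.charAt(p) <= '9') {
            p++;
        }
        return p > digitsStart ? p : -1;
    }
}
```

For example, `RtfUnicodeSketch.decode("\\u8729\\'b7")` yields the single character U+2219, and `RtfUnicodeSketch.decode("\\u-3929?")` yields U+F0A7 with the `?` dropped. A real grammar would also need to honor `\ucN`, which changes how many fallback units follow each `\uN`.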
The pull request will fix the problem.
Hi @ThomasChr and thank you very much for your engagement 😃 That's right, the <STRING> you want to get rid of is the placeholder (the character that should be displayed when the parser does not support Unicode). However, it seems that in your case the placeholder is not a <STRING> but a <C_ESC_STRING>, which is unexpected 🙂 I'll take a look at the PR as soon as I have some quiet time.
Take your time, no stress!
I've got a mail which throws the following error:
I can't provide the mail because it's private. Just wanted to open the issue so that we can track it. Maybe I'll find it myself.
Any ideas off the top of your head?