Wrong text encoding assumption #212
Comments
I was in this section of the code hacking on issue #237. I can see two things:
Disabling text_is_big5 and reordering enc_list to put utf8_cd at the front yields a version that gives your desired result. |
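To illustrate why the order of the candidate list matters, here is a minimal Python sketch of a "first encoding that decodes cleanly wins" strategy. It is not the logic in qrdectxt.c, which I have not reproduced here; `decode_first_match` and the candidate lists are hypothetical names chosen for the illustration.

```python
from __future__ import annotations


def decode_first_match(payload: bytes, candidates: list[str]) -> tuple[str, str]:
    """Return (encoding, text) for the first candidate that decodes without error.

    Hypothetical helper: it only mimics the idea of an ordered encoding list.
    """
    for name in candidates:
        try:
            return name, payload.decode(name)
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so this fallback cannot fail.
    return "latin-1", payload.decode("latin-1")


payload = "über".encode("utf-8")

# Big5 tried first: the UTF-8 bytes of "ü" happen to form a valid Big5 pair.
big5_first = decode_first_match(payload, ["big5", "utf-8"])

# UTF-8 tried first: the payload is valid UTF-8, so it is returned unchanged.
utf8_first = decode_first_match(payload, ["utf-8", "big5"])

print(big5_first)
print(utf8_first)
```

The point of the sketch is only that many UTF-8 byte sequences are also valid in a permissive multi-byte encoding, so whichever candidate is tried first tends to win.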
It looks like the same bug was already reported several years ago in https://sourceforge.net/p/zbar/bugs/73/ for the special case of accented characters. I came across this bug after checking a QR code created from a vCard that used the German character ß twice. The QR code had been created this way:
The created QR code was OK (according to my mobile phone's app Barcode Scanner version 4.7.8 and other code scanner apps).
however displayed the German letter ß as the Chinese letters テ歹. The OP already showed the bug for some French accented vowels and ligatures and for the German lowercase umlaut ü. I would not be surprised if many or even all country-specific characters, e.g.
all go wrong. For some strange reason, |
i just ran into this as well, trying to verify QR codes i had created myself. they contain PGP signed meta data, and the non-UTF-8 decoding of umlaut characters now invalidates these signatures. a barcode reader app on my smartphone correctly decodes the QR codes, these signatures are valid as expected. i've noticed that this issue also affects the GUI QtQR, as it relies on the python library. i'm not sure autodetection of encodings can be done reliably at all. at least, UTF-8 should probably be the default, and there should be an encoding parameter to manually set the desired encoding (e.g., anything from |
I'm glad someone is looking at this. Y'all are probably the "someone who knows more about character encoding than me" mentioned above :-) This brings up a key point: are the project owners still around, so someone can actually accept merge requests into master? |
This is (probably) the same issue in gnome Decoder. Summary:
This Python session reproduces the mistake:
So, the issue seems to be that it prefers BIG-5 over UTF-8. (I haven't understood the logic in qrdectxt.c yet.) I'm not sure I like that assumption, but as per the link above, it's possible that it's the correct order in some parts of the world. (Certainly not in Zürich, though.) |
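The effect of reading UTF-8 bytes as Big5 can be shown without ZBar at all. The following is a minimal sketch using only Python's built-in codecs; it is not the original session from this comment, and the sample string is taken from the test image text quoted elsewhere in this thread.

```python
original = "Manchmal sind wir über freundlich."

# What the QR code actually stores: the UTF-8 bytes of the text.
raw = original.encode("utf-8")

# What happens if those bytes are interpreted as Big5 instead.
misread = raw.decode("big5")

# Reversing the wrong step recovers the text, as long as every
# misread character maps back to the same bytes.
repaired = misread.encode("big5").decode("utf-8")

print(misread)
print(repaired)
```

The reverse step is only a workaround for already garbled output. It fails as soon as the wrong decoder has dropped or replaced a byte.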
It is a good workaround. I implemented the binary decoding option by bypassing the built-in character encoding conversion. It just returns the data as-is so it can be decoded separately. Care must be taken to decode every QR code individually though. Otherwise, you won't be able to tell where each QR code begins or ends. |
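As a rough sketch of the "decode every QR code individually" point: assuming the scanner hands back one raw byte string per symbol, each one can be decoded on its own with an explicit encoding, defaulting to UTF-8. The function name and the `(raw, encoding)` pair layout are assumptions made for this example, not part of ZBar's API.

```python
def decode_symbols(symbols, default_encoding="utf-8"):
    """Decode each symbol's raw bytes separately.

    `symbols` is an iterable of (raw_bytes, encoding_or_None) pairs.
    Hypothetical layout: None means "use the default encoding".
    """
    texts = []
    for raw, encoding in symbols:
        texts.append(raw.decode(encoding or default_encoding))
    return texts


# Two symbols with different encodings. Concatenating their bytes first
# would lose the boundary and make correct decoding impossible.
symbols = [
    ("Zürich".encode("utf-8"), None),
    ("Straße".encode("latin-1"), "latin-1"),
]

texts = decode_symbols(symbols)
print(texts)
```

Keeping the payloads separate is what makes a per-symbol encoding choice possible in the first place.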
My concern is that it may be worse for the case where the QR code actually has an encoding set. In that case, it would be possible to convert it to text correctly no matter what, provided the library does the conversion to text. |
This image
contains the text "Il était une fois, un noël radieiux et un gros test. Manchmal sind wir über freundlich."
but ZBar returns "Il 矇tait une fois, un no禱l radieiux et un gros test. Manchmal sind wir 羹ber freundlich.".