UTF-8 support #244

idea--list · 2019-08-26T10:06:52Z

Wanted to display glyphs from font.h files generated from UTF-8 font.ttf files.
Found some discussions in #185 suggesting the use of iso-8859-x chars, which IMHO is not a good idea (one needs to find a proper tool to convert UTF-8 .ttf files to iso first, then generate the font.h files, that will be still larger than the ASCII font.h files boundled with the library. Even in that case you might end up not beeing able to display a glyph if your iso-8859-x coded font did contain it. Of course a font.h generated from a UTF-8 .ttf will need more memory, but it's 2019 and newer boards tend to have a right amount of memory).

While looking for a solution i also found #200 from Bodmer who also provided me some help. He forked V1.3.6 and modified it but that version does not compile on my board... and since that version even the code has been reordered here in master branch...

So i ended up comparing Bodmer's modified version with the original master and as result here is my PR that enables UTF-8 with custom made fonts, while not changing anything with the default fonts.
UTF-8 support is turned off by default ensure not breaking anything on boards with few ram, you only have to enable it by adding display.utf8() to the sketch.

Tested with a MAX32630FTHR and 2.13" Flexible Monochrome eInk / ePaper Display.
For generating UTF-8 font.h files #185 is also needed

Please test on other board/display combinations and merge ;-)

ladyada · 2019-08-26T16:52:09Z

hiya - this PR looks good, it does need a little doxygening
please read https://learn.adafruit.com/the-well-automated-arduino-library/doxygen

idea--list · 2019-08-27T12:06:21Z

Hi @ladyada i never used Doxygen and for me using any unix/linux tools is a challange.
If there is anyone who knows how to use Doxygen i am happy if i do not need to touch it :-)

By the way i also ported the whole GFX and EPD libraries to Mbed-os. Am not sure if you are interested and if so where shall i post it?

ladyada · 2019-08-27T19:34:02Z

hiya doxygen runs on mac, windows, linux - anything. its important to have all PR's with doxygen so that there is documentation to help support the code, we won't accept a PR without it. its very easy to doxygen, please check the guide we wrote

BillyDonahue · 2019-08-27T19:48:41Z

Adafruit_GFX.h

+    @param  x  true = enable (new behavior), false = disable (old behavior)
+  */
+  /**********************************************************************/
+  void utf8(boolean x=true) { _utf8 = x; }


x could have a better name. Maybe enable?

Also, giving it a default of true makes it a little ambiguous if no argument is given, because it looks like a getter. That is, gfx->utf8() can be read as a getter of the utf8 state of gfx. Better to just make them write true if they mean true.

But see above because I don't think utf8 should be a runtime switch like this.

Seems i was not paying enough attention while copy&pasting code...
Of course this should default to false to prevent creating headache on lower-end boards after the update.

BillyDonahue · 2019-08-27T19:49:39Z

Adafruit_GFX.cpp

+
+    return (uint16_t)c; // fall-back to extended ASCII
+}
+


There's a lot of new blank lines in this change. Seems like they should be omitted.

BillyDonahue · 2019-08-27T19:50:33Z

Adafruit_GFX.cpp

+
+    decoderState = 0;
+
+    return (uint16_t)c; // fall-back to extended ASCII


This conversion would happen without the C cast, and the cast could suppress warnings as the code evolves. Less is more when it comes to casts.

BillyDonahue · 2019-08-27T19:51:10Z

Adafruit_GFX.cpp

+        return (uint16_t)c;
+    }
+
+    if (decoderState == 0)


This is a departure in bracing style from the rest of the library, which is more vertically compact. When in Rome...

We don't have to treat decoderState 2 and 1 differently, we can shift bits up as we consume them and that's a simpler algorithm.

We need to error out if non-leading bytes start with the 10 bitpattern.

There should be some provision for the result of the decode to be the NUL code point.
Returning 0 as an in-band "no-result" indicator prevents this. The decoder should return a status indicator and a separate result payload, or pick some other value as the indicator.

If I can suggest an alternate implementation:
https://gcc.godbolt.org/z/E0sEcZ
or https://gist.github.com/BillyDonahue/232420eb6eeeee4130c7803c4d59f1bd

#include <cstdint> struct Utf8Decoder { Utf8Decoder() : state(0), value(0) {} /// \return /// -1 : decoding error occurred. /// 0 : success but no value emitted yet. /// 1 : success and `value` member holds a Unicode16 payload. int decode(uint8_t c); int state; uint16_t value; }; inline int Utf8Decoder::decode(uint8_t c) { if(state > 0){ if((c>>6) == 0x2){ // 10XX'XXXX value = (value<<6) | (c & ((1<<6)-1)); --state; return (state == 0) ? 1 : 0; } }else{ if((c>>7) == 0x0){ // 0XXX'XXXX value = c; return 1; } if((c>>5) == 0x6){ // 110X'XXXX value = (c & ((1<<5)-1)); state = 1; return 0; } if((c>>4) == 0xE){ // 1110'XXXX value = (c & ((1<<4)-1)); state = 2; return 0; } } // error, but retain value 'c' state = 0; value = c; return -1; }

Test:

#include <cassert> #include <cctype> #include <cstdint> #include <cstdio> #include <iomanip> #include <iostream> #include <sstream> #include <string> int main() { // little test for the decoder using namespace std::literals; auto test = [](const std::string& s) { std::ostringstream report; report << std::hex; Utf8Decoder d; for(uint8_t c : s){ switch (d.decode(c)) { case -1: report << "!"; break; case 0: report << '.'; break; case 1: report << "(" << d.value << ")"; break; } } fprintf(stderr, "[%s] => [%s]\n", s.c_str(), report.str().c_str()); return report.str(); }; assert(test("") == ""); assert(test("hello"s) == "(68)(65)(6c)(6c)(6f)"); assert(test(u8"Привет"s) == ".(41f).(440).(438).(432).(435).(442)"); assert(test(u8"Γειά σου"s) == ".(393).(3b5).(3b9).(3ac)(20).(3c3).(3bf).(3c5)"); assert(test(u8"¿Qué?"s) == ".(bf)(51)(75).(e9)(3f)"); assert(test(u8"värld"s) == "(76).(e4)(72)(6c)(64)"); assert(test("\0"s) == "(0)"); assert(test("\x80"s) == "!"); assert(test("\xc0\x80"s) == ".(0)"); assert(test("\xc0\x01"s) == ".!"); assert(test("\xe0\x80\x80"s) == "..(0)"); assert(test("\xe0\xe0\x80\x80\x41"s) == ".!!!(41)"); return 0; }

BillyDonahue · 2019-08-27T20:11:59Z

Adafruit_GFX.cpp

 /**************************************************************************/
 /*!
    @brief  Print one byte/character of data, used to support print()
-    @param  c  The 8-bit ascii character to write
+    @param  utf8  The 8-bit UTF-8 or ascii code


utf8 isn't a parameter of this function, though. Is this left over from an earlier draft?

BillyDonahue · 2019-08-27T20:12:57Z

Adafruit_GFX.cpp

+size_t Adafruit_GFX::write(uint8_t data) {
+
+    uint16_t c = (uint16_t)data;
+    if (_utf8) c = decodeUTF8(data);


Consistency: if( not if (.

BillyDonahue · 2019-08-27T21:32:58Z

Adafruit_GFX.h

@@ -107,6 +108,10 @@ class Adafruit_GFX : public Print {
    setTextSize(uint8_t sx, uint8_t sy),
    setFont(const GFXfont *f = NULL);

+  // Serial UTF-8 decoder
+  uint16_t decodeUTF8(uint8_t c);


I'd argue that this should be a free function or class, not a member of Adafruit_GFX. It only modifies decoderState and decoderBuffer and doesn't interact with the rest of Adafruit_GFX. It's also reusable outside the context of graphics so there's some reuse potential if it's factored out.

So maybe what we really have here is a minimal Utf8Decoder API and some new member function variants. We'd still upgrade drawChar as you have done, and write would expand its input type.

size_t Adafruit_GFX::write(uint16_t ch);

Maybe provide a new write adaptor member function to make things easy to use?

size_t Adafruit_GFX::writeUtf8(uint8_t ch, Utf8Decoder& decoder);

This would feed ch to decoder and call gfx.write whenever this results in emission of a unicode character. It's more flexible to leave that to the users to do for themselves. The important thing is that the statefulness of the UTF8 decoder isn't part of the GFX object. The GFX object should always be ready to write() characters (say alerts from a timer interrupt), even if it's in the middle of writing other payload strings. If UTF8 decoder state is part of GFX you can get unlucky and have your alert strings corrupt your payload strings.

If writeUtf8 or Utf8Decoder::decode aren't called, their code can be optimized out of the build image. Such a footprint reduction would not be possible if utf8 support is configured at runtime as a GFX internal. We might reasonably say that it's not the responsibility of Adafruit_GFX to handle character decoding. It just knows how to draw and write Unicode16 characters, and users have to get those in whatever way is appropriate. They might just want to store their Unicode strings as arrays of uint16_t in ROM or something and skip UTF-8 altogether. Who knows?

BillyDonahue · 2019-08-27T21:35:34Z

Adafruit_GFX.h

@@ -211,6 +228,10 @@ class Adafruit_GFX : public Print {
  /************************************************************************/
  int16_t getCursorY(void) const { return cursor_y; };

+  // Set or get an arbitrary library attribute or configuration option
+  void    setAttribute(uint8_t id = 0, uint8_t a = 0);


This appears not to have been implemented. Vestigial?

idea--list · 2019-08-28T01:30:47Z

@BillyDonahue Thanks for the code review and comments i learn a lot from it though i can not understand everything. I began learning C/C++ just a few months ago. Was just happy to find a working fork and then updated the master branch based on that as meanwhile the code got rearranged. That way it compiled on my board and did not run into issues, so i thought the code was ok.

Please feel free to modify the PR, i think the result will be much better as if i would do that :-)

idea--list · 2019-08-28T10:42:27Z

@BillyDonahue just added you as collaborator to my branch in case you can not change the code directly in this PR.

idea--list added 4 commits August 26, 2019 11:23

Update Adafruit_GFX.h

bd286d0

Update Adafruit_GFX.cpp

b4ad664

Update Adafruit_GFX.cpp

4260514

Update Adafruit_GFX.h

506acdd

idea--list mentioned this pull request Aug 26, 2019

Add capability to use custom fonts that include characters from the Unicode Basic Multilingual Plane #200

Open

BillyDonahue reviewed Aug 28, 2019

View reviewed changes

adafruit deleted a comment from Jarvis785 Dec 6, 2020

DoomHammer mentioned this pull request Feb 3, 2023

Add option to use UTF-8 glyphs #414

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UTF-8 support #244

UTF-8 support #244

idea--list commented Aug 26, 2019 •

edited

Loading

ladyada commented Aug 26, 2019

idea--list commented Aug 27, 2019

ladyada commented Aug 27, 2019

BillyDonahue Aug 27, 2019

idea--list Aug 28, 2019

BillyDonahue Aug 27, 2019

BillyDonahue Aug 27, 2019

BillyDonahue Aug 27, 2019 •

edited

Loading

BillyDonahue Aug 27, 2019

BillyDonahue Aug 27, 2019

BillyDonahue Aug 27, 2019

BillyDonahue Aug 27, 2019

idea--list commented Aug 28, 2019

idea--list commented Aug 28, 2019


		decoderState = 0;

		return (uint16_t)c; // fall-back to extended ASCII

UTF-8 support #244

Are you sure you want to change the base?

UTF-8 support #244

Conversation

idea--list commented Aug 26, 2019 • edited Loading

ladyada commented Aug 26, 2019

idea--list commented Aug 27, 2019

ladyada commented Aug 27, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

BillyDonahue Aug 27, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

idea--list commented Aug 28, 2019

idea--list commented Aug 28, 2019

idea--list commented Aug 26, 2019 •

edited

Loading

BillyDonahue Aug 27, 2019 •

edited

Loading