Emoji and variation selectors (VS15 / VS16) #997

Closed
christianparpart opened this issue Aug 3, 2021 · 14 comments
Labels
enhancement New feature or request fixed-in-nightly This is (or is assumed to be) fixed in the nightly builds.

Comments

@christianparpart

I see you also support ZWJ emoji, or so it seems.

Maybe you also want to support U+FE0E and U+FE0F VS15/VS16 variation selectors?

  • U+FE0E overrides width to Narrow and changes emoji presentation to text (standard glyph rendering)
  • U+FE0F overrides width to Wide and changes emoji presentation to emoji (square colored bitmaps)

I just checked: Kitty also seems to do that, so I'll now allow width changes too.

Only a small number of terminal emulators (TEs) do proper text shaping, but I think the list is slowly growing, so maybe you want to consider supporting VS15/16 too.
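
As a rough illustration of the two bullets above, here is a minimal Python sketch that maps a grapheme cluster to the presentation and cell width its selector asks for. The function name `selector_override` is hypothetical and not taken from any terminal's source; the sketch only looks for the selector and does not check whether the base character actually admits one.

```python
VS15 = "\ufe0e"  # VARIATION SELECTOR-15: request text presentation
VS16 = "\ufe0f"  # VARIATION SELECTOR-16: request emoji presentation


def selector_override(cluster: str):
    """Return (presentation, cell_width) requested by a selector, or None.

    Hypothetical helper: it mirrors the rule described in this issue
    (VS15 -> text/narrow, VS16 -> emoji/wide) and nothing more.
    """
    if VS16 in cluster:
        return ("emoji", 2)
    if VS15 in cluster:
        return ("text", 1)
    return None


if __name__ == "__main__":
    for cluster in ("\u00a9", "\u00a9\ufe0f", "\U0001f600\ufe0e"):
        print([f"U+{ord(c):04X}" for c in cluster], selector_override(cluster))
```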

@christianparpart christianparpart added the enhancement New feature or request label Aug 3, 2021
@wez
Member

wez commented Aug 3, 2021

wezterm defers to harfbuzz for shaping, so I'm not sure what more is needed.
Is there a test case that contrasts these behaviors?

@christianparpart
Author

I am sorry, I do not know your source code internals. But do you forward all the necessary codepoints? I am sorry if I'm giving superfluous information here (apologizing ahead of time :) ), but maybe U+FE0F is not landing in the same grid cell as U+00A9 for example?

echo -ne "\u00A9\uFE0F"

On top of that, U+00A9 on its own should be rendered using the regular font, but U+00A9 followed by U+FE0F forms one grapheme cluster whose presentation is "emoji" (i.e. not "text"), and it should therefore use the emoji font instead of the non-emoji (e.g. regular) font.

For a hopefully self-explanatory illustration, have a look here:

https://github.com/contour-terminal/libunicode/blob/master/src/unicode/emoji_segmenter_test.cpp#L136-L145

wez added a commit that referenced this issue Aug 4, 2021
These modifiers have the effect of forcing us to consider the grapheme
as being either a single cell (VS15) or two cells (VS16) in the
terminal model.

These don't affect font choice as wezterm doesn't know whether a given
font in the fallback has a textual vs. an emoji version of a given
glyph, or whether a later font in the fallback has one or the other
because we can't know until we fall back, and that has a very high
cost--we perform fallback asynchronously in another thread because
of its high cost.

Depending on the selected glyph, it may or may not render as double
wide.

refs: #997
@khaledhosny

If the font supports both text and emoji presentation and can switch between them internally (e.g. using ligatures), then HarfBuzz would help. I haven't seen any fonts do this, though; most systems handle it by switching fonts based on what presentation is required.

@christianparpart
Author

> most systems handle it by switching fonts based on what presentation is required.

Exactly. This is what I was trying to say. And it looks like wezterm isn't doing that. Or did I miss something?

@wez
Member

wez commented Aug 4, 2021

From the commit message in 76b72e4:

These don't affect font choice as wezterm doesn't know whether a given
font in the fallback has a textual vs. an emoji version of a given
glyph, or whether a later font in the fallback has one or the other
because we can't know until we fall back, and that has a very high
cost--we perform fallback asynchronously in another thread because
of its high cost.

AFAIK, there isn't a concept of "this font is a text font" or "this font is an emoji presentation font" which means that a dedicated emoji font fallback list option would need to be introduced for this in order for wezterm to choose a font to match the selected presentation. I'm not sure how I feel about that at the moment, but I'll give it some thought.

@khaledhosny

I think heuristics can be used to detect color fonts, e.g. using https://harfbuzz.github.io/harfbuzz-hb-ot-color.html#hb-ot-color-has-layers, https://harfbuzz.github.io/harfbuzz-hb-ot-color.html#hb-ot-color-has-png, etc., and assume these have emoji presentation by default. Since there are not that many emoji fonts, a list of known emoji font names can be used in addition to, or instead of, such checks.
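
To make the heuristic concrete without binding to HarfBuzz, here is a minimal Python sketch that reads the sfnt table directory directly and reports whether any color table is present. It is a stand-in for the HarfBuzz calls linked above, not a use of them: `hb_ot_color_has_layers` corresponds to the `COLR` table and `hb_ot_color_has_png` to `CBDT` or `sbix`. The function names are hypothetical, and the sketch assumes a single-font file (it does not handle `ttcf` collections or WOFF wrappers).

```python
import struct

# Tables that suggest color glyphs: layered (COLR), bitmap (CBDT, sbix), SVG.
COLOR_TABLES = frozenset({b"COLR", b"CBDT", b"sbix", b"SVG "})


def sfnt_table_tags(data: bytes) -> set:
    """Return the set of 4-byte table tags listed in an sfnt table directory."""
    # Offset table: uint32 sfntVersion, uint16 numTables, then three uint16
    # search fields (12 bytes total), followed by 16-byte table records.
    (num_tables,) = struct.unpack_from(">H", data, 4)
    tags = set()
    for index in range(num_tables):
        (tag,) = struct.unpack_from(">4s", data, 12 + 16 * index)
        tags.add(tag)
    return tags


def looks_like_color_font(data: bytes) -> bool:
    """Heuristic only: True if the font carries at least one color table."""
    return bool(sfnt_table_tags(data) & COLOR_TABLES)
```

A font flagged this way would then be assumed to have emoji presentation by default; a list of known emoji font names could be consulted as well.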

wez added a commit that referenced this issue Aug 6, 2021
Make a distinction between default and selected presentation,
and account for that in the cell width.

Add a method to the cell that returns the effective presentation.

refs: #997
wez added a commit that referenced this issue Aug 6, 2021
This should give the shaper a better chance at using text
presentation in a run that mixes emoji with text and/or
uses presentation selectors.

It also exposes the presentation property to the shaper
so that it could potentially adjust its fallback strategy.
However, it doesn't do that here.

refs: #997
wez added a commit that referenced this issue Aug 11, 2021
This commit annotates fonts with a boolean that indicates whether
we think it contains glyphs with emoji presentation, and then
passes the cluster.presentation field down to the shaper.

If the presentation doesn't match the current font in the fallback,
then it will be skipped until we exhaust its options.

`wezterm ls-fonts` also shows whether we think a font has emoji
presentation.

refs: #997
@wez
Member

wez commented Aug 11, 2021

As of 0866e5d, the behavior is now:

; wezterm ls-fonts
Primary font:
wezterm.font_with_fallback({
  -- /home/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
  {family="Operator Mono SSm Lig", weight="DemiLight"},

  -- /home/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontConfig
  {family="Operator Mono SSm Lig", weight="DemiLight"},

  -- /home/wez/.fonts/MaterialDesignIconsDesktop.ttf, FontDirs
  "Material Design Icons Desktop",

  -- /home/wez/.fonts/MaterialDesignIconsDesktop.ttf, FontConfig
  "Material Design Icons Desktop",

  -- /usr/share/fonts/google-noto-emoji/NotoColorEmoji.ttf, FontConfig
  -- Assumed to have Emoji Presentation
  "Noto Color Emoji",

  -- <built-in>, BuiltIn
  "JetBrains Mono",

  -- <built-in>, BuiltIn
  "Last Resort High-Efficiency",

})

image

@wez wez added the fixed-in-nightly This is (or is assumed to be) fixed in the nightly builds. label Aug 11, 2021
@wez wez closed this as completed Aug 12, 2021
@christianparpart
Author

Hey @wez

Thanks for also taking care of VS15/VS16, but I have some findings.

While looking (again) for terminals that support VS15/VS16, I also tried your TE (again :) ).

echo -e "\U0001f600\ufe0e"

yields the following notification:

No fonts contain glyphs for these codepoints: \u{fe0e}.
Placeholder glyphs are being displayed instead.

and that visual:

image

but should look like (gnome-terminal screenshot):

image

I did not look into your source code, but maybe there is no grapheme cluster segmentation happening, and that's why this grapheme cluster is split and passed into two separate hb_shape() calls, which it shouldn't be?

(P.S.: I actually came here to see what cell width you apply to VS15 overrides, as I think there is no consensus in the TE dev community.)

@wez
Member

wez commented Aug 16, 2021

I think that notification is showing the wrong information (it infers the grapheme from something that's lost information on its journey to harfbuzz). I think what's really happening is that the presentation mode is preventing wezterm from finding any matching font. I'll dig into that.

Regarding your question about widths, the logic I settled on is:

  • If VS specifies Emoji, use 2
  • If VS specifies Text, use 1
  • If default presentation is Emoji, use 2
  • If default presentation is Text, use the unicode width algorithm on the grapheme but clamp its maximum value to 2, because the crate that implements that algorithm is unaware of emoji combining sequences (ugh!)
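
The four rules above can be sketched in Python as follows. This is an illustration of the described logic, not wezterm's code: `cell_width` and `text_width` are hypothetical names, the caller is assumed to supply the default presentation, and `text_width` is a simplified stand-in for a real Unicode width algorithm built on the standard `unicodedata` module.

```python
import unicodedata

VS15 = "\ufe0e"  # text presentation selector
VS16 = "\ufe0f"  # emoji presentation selector


def text_width(grapheme: str) -> int:
    """Simplified width: skip marks and format characters, count W/F as 2."""
    total = 0
    for ch in grapheme:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # combining marks, variation selectors, ZWJ
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total


def cell_width(grapheme: str, default_presentation: str) -> int:
    """Apply the rules in order; default_presentation is 'emoji' or 'text'."""
    if VS16 in grapheme:
        return 2  # selector asks for emoji presentation
    if VS15 in grapheme:
        return 1  # selector asks for text presentation
    if default_presentation == "emoji":
        return 2
    # Clamp, since summing code point widths overcounts combining sequences.
    return min(text_width(grapheme), 2)
```

The clamp matters for sequences joined with ZWJ: adding up the widths of the individual code points can give 4 or more, while the sequence is meant to occupy at most two cells.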

@4cm4k1
Contributor

4cm4k1 commented Aug 18, 2021

Here are two test cases, in case they help. I found them while running zplug update and yarn test (using jest), both of which have emoji in their output. Since these examples include the emoji presentation variation selector, I think they relate to this issue and to the fact that these particular symbols default to text presentation:

  • ✔️ "\U2714\UFE0F"
  • ✖️ "\U2716\UFE0F"

Here is a screenshot illustrating various echo statements of the first example listed:
Terminal output for `echo -e '\U2714\UFE0F'`

Version: wezterm 20210816-234431-24d971c8

wez added a commit that referenced this issue Aug 18, 2021
The introduction of the Emoji vs Text VS processing means that we might
in some cases not find a glyph with the requested presentation.

In that case, we'd rather show the emoji presentation glyph than none at
all, so we'll retry fallback processing with unspecified presentation.

refs: #997
@wez
Member

wez commented Aug 18, 2021

@christianparpart FWIW, I get the same results as gnome terminal when I echo -e "\U0001f600\ufe0e" on my system. I think there might be something a bit wonky about the font resolution for you (wezterm ls-fonts --text "😀︎" might be interesting to see)

I pushed a commit that avoids claiming that VS15 or VS16 are missing glyphs; that was just a simplistic oversight in the error reporting path.

That same commit will make a second pass through the fallback list and ignore the selected presentation so that in the case where we couldn't find the glyph in the requested presentation, we'll accept any presentation. We'll still proceed to try loading fallbacks from the system and re-evaluate once they load, so hopefully that makes things look a bit better for @4cm4k1
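
The two-pass fallback described above can be sketched as below. This is a simplified Python model under stated assumptions, not wezterm's implementation: `Font`, its `emoji_presentation` flag and `pick_font` are hypothetical names, glyph coverage is modelled as a plain set of code points, and the asynchronous loading of system fallbacks is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Font:
    name: str
    emoji_presentation: bool  # "assumed to have Emoji Presentation"
    codepoints: set = field(default_factory=set)


def pick_font(fonts, codepoint: int, wanted: Optional[str]) -> Optional[Font]:
    """Walk the fallback list; wanted is 'emoji', 'text' or None."""
    # First pass honours the requested presentation; the second accepts any.
    passes = [wanted, None] if wanted is not None else [None]
    for presentation in passes:
        for font in fonts:
            if codepoint not in font.codepoints:
                continue
            if presentation is None:
                return font
            if font.emoji_presentation == (presentation == "emoji"):
                return font
    return None  # nothing in the list has the glyph at all
```

With a list such as a text font followed by a color emoji font, requesting text presentation for a code point that only the emoji font covers returns the emoji font on the second pass, rather than a placeholder glyph.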

@4cm4k1
Contributor

4cm4k1 commented Aug 18, 2021

@wez Pulled down the newly built nightly; LGTM! Thanks.

@SlySven

SlySven commented Nov 4, 2021

Might I make an observation: whether a colour emoji presentation is narrow or wide does depend on (I think) the version of Unicode in which it was introduced. You may find https://github.com/ridiculousfish/widecharwidth/ of interest, which is kept up to date (unlike Markus Kuhn's mk_wcwidth()/mk_wcwidth_cjk(), which have not been updated since Unicode 5.0!)

wez added a commit that referenced this issue Nov 25, 2021
This is a fairly far-reaching commit. The idea is:

* Introduce a unicode_version config that specifies the default level
  of unicode conformance for each newly created Terminal (each Pane)
* The unicode_version is passed down to the `grapheme_column_width`
  function which interprets the width based on the version
* `Cell` records the width so that later calculations don't need to
  know the unicode version

In a subsequent diff, I will introduce an escape sequence that allows
setting/pushing/popping the unicode version so that it can be overridden
via eg: a shell alias prior to launching an application that uses a
different version of unicode from the default.

This approach allows output from multiple applications with differing
understanding of unicode to coexist on the same screen a little more
sanely.

Note that the default `unicode_version` is set to 9, which means that
emoji presentation selectors are now by-default ignored.  This was
selected to better match the level of support in widely deployed
applications.

I expect to raise that default version in the future.

Also worth noting: there are a number of callers of
`unicode_column_width` in things like overlays and lua helper functions
that pass `None` for the unicode version: these will assume the latest
known-to-wezterm/termwiz version of unicode to be desired. If those
overlays do things with emoji presentation selectors, then there may be
some alignment artifacts. That can be tackled in a follow up commit.

refs: #1231
refs: #997
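
A minimal Python sketch of the idea in that commit message: the width function takes a Unicode version, and the cell stores the result so later code never needs the version again. The names echo the commit message, but the bodies are illustrative only. In particular, `SELECTORS_MIN_VERSION = 14` is an assumption made for this sketch (the message only says that version 9 ignores presentation selectors), and `base_width` stands for whatever the version-specific width tables would return.

```python
from dataclasses import dataclass
from typing import Optional

VS15 = "\ufe0e"
VS16 = "\ufe0f"
LATEST_KNOWN_VERSION = 14   # assumption for this sketch
SELECTORS_MIN_VERSION = 14  # assumption: first version honouring selectors


def grapheme_column_width(grapheme: str, base_width: int,
                          unicode_version: Optional[int] = None) -> int:
    """Width of a grapheme under the given Unicode conformance level."""
    # None means "latest known", as for the overlay and Lua helper callers.
    version = LATEST_KNOWN_VERSION if unicode_version is None else unicode_version
    if version >= SELECTORS_MIN_VERSION:
        if VS16 in grapheme:
            return 2
        if VS15 in grapheme:
            return 1
    return base_width  # selectors ignored at older versions


@dataclass(frozen=True)
class Cell:
    text: str
    width: int  # recorded once, so later layout does not need the version


def make_cell(text: str, base_width: int, unicode_version: Optional[int]) -> Cell:
    return Cell(text, grapheme_column_width(text, base_width, unicode_version))
```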
wez added a commit that referenced this issue Nov 25, 2021
As promised in the previous commit, this one implements an escape
sequence to control the unicode version.

Unknown to me in the previous commit, iTerm2 already defines such
an escape sequence, so we simply implement it here with the same
semantics.

refs: #1231
refs: #997
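
For illustration, here is a small Python helper that builds the escape sequence. The commit message does not spell out the syntax, so the form used here, `OSC 1337 ; UnicodeVersion=<arg> ST` with `<arg>` being a version number, `push` or `pop`, is my understanding of the iTerm2 convention and should be checked against the iTerm2 and wezterm documentation. The helper name is hypothetical.

```python
OSC = "\x1b]"   # Operating System Command introducer
ST = "\x1b\\"   # String Terminator


def unicode_version_sequence(arg) -> str:
    """Build the (assumed) iTerm2-style UnicodeVersion escape sequence.

    arg is a version number such as 9 or 14, or the strings "push" / "pop".
    """
    return f"{OSC}1337;UnicodeVersion={arg}{ST}"


if __name__ == "__main__":
    import sys
    # Save the current version, switch to 14, and restore it afterwards.
    sys.stdout.write(unicode_version_sequence("push"))
    sys.stdout.write(unicode_version_sequence(14))
    sys.stdout.write(unicode_version_sequence("pop"))
```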
wez added a commit that referenced this issue Nov 25, 2021
Not all codepoints are valid when combined with a presentation
selector.

This commit ensures that we respect the valid sequences defined
by the current version of unicode (version 14).

refs: #1231
refs: #997
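
The check described in that commit can be sketched like this in Python. The set below is a small illustrative subset of the base characters that have standardized emoji variation sequences; a real implementation would generate the full set from Unicode's emoji-variation-sequences.txt data file. The function name is hypothetical.

```python
VS15 = "\ufe0e"
VS16 = "\ufe0f"

# Illustrative subset only; the authoritative list is in
# emoji-variation-sequences.txt from the Unicode Character Database.
VARIATION_BASES = frozenset({
    0x0023,  # NUMBER SIGN
    0x00A9,  # COPYRIGHT SIGN
    0x00AE,  # REGISTERED SIGN
    0x203C,  # DOUBLE EXCLAMATION MARK
    0x2122,  # TRADE MARK SIGN
    0x2714,  # HEAVY CHECK MARK
    0x2716,  # HEAVY MULTIPLICATION X
    0x2764,  # HEAVY BLACK HEART
})


def effective_selector(grapheme: str):
    """Return 'emoji' or 'text' only if base + selector is a valid sequence."""
    if len(grapheme) < 2 or ord(grapheme[0]) not in VARIATION_BASES:
        return None  # selector (if any) is ignored for this base
    if grapheme[1] == VS16:
        return "emoji"
    if grapheme[1] == VS15:
        return "text"
    return None
```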
@github-actions
Contributor

github-actions bot commented Feb 4, 2023

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 4, 2023

5 participants