-
Notifications
You must be signed in to change notification settings - Fork 5
/
Copy pathworking-with-harfbuzz-clusters.html
233 lines (233 loc) · 10 KB
/
working-with-harfbuzz-clusters.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Working with HarfBuzz clusters: HarfBuzz Manual</title>
<meta name="generator" content="DocBook XSL Stylesheets Vsnapshot">
<link rel="home" href="index.html" title="HarfBuzz Manual">
<link rel="up" href="clusters.html" title="Clusters">
<link rel="prev" href="clusters.html" title="Clusters">
<link rel="next" href="a-clustering-example-for-levels-0-and-1.html" title="A clustering example for levels 0 and 1">
<meta name="generator" content="GTK-Doc V1.34.0 (XML mode)">
<link rel="stylesheet" href="style.css" type="text/css">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table class="navigation" id="top" width="100%" summary="Navigation header" cellpadding="2" cellspacing="5"><tr valign="middle">
<td width="100%" align="left" class="shortcuts"></td>
<td><a accesskey="h" href="index.html"><img src="home.png" width="16" height="16" border="0" alt="Home"></a></td>
<td><a accesskey="u" href="clusters.html"><img src="up.png" width="16" height="16" border="0" alt="Up"></a></td>
<td><a accesskey="p" href="clusters.html"><img src="left.png" width="16" height="16" border="0" alt="Prev"></a></td>
<td><a accesskey="n" href="a-clustering-example-for-levels-0-and-1.html"><img src="right.png" width="16" height="16" border="0" alt="Next"></a></td>
</tr></table>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="working-with-harfbuzz-clusters"></a>Working with HarfBuzz clusters</h2></div></div></div>
<p>
When you add text to a HarfBuzz buffer, each code point must be
assigned a <span class="emphasis"><em>cluster value</em></span>.
</p>
<p>
This cluster value is an arbitrary number; HarfBuzz uses it only
to distinguish between clusters. Many client programs will use
the index of each code point in the input text stream as the
cluster value. This is for the sake of convenience; the actual
value does not matter.
</p>
<p>
Some of the shaping operations performed by HarfBuzz —
such as reordering, composition, decomposition, and substitution
— may alter the cluster values of some characters. The
final cluster values in the buffer at the end of the shaping
process will indicate to client programs which subsequences of
glyphs represent a cluster and, therefore, must not be
separated.
</p>
<p>
In addition, client programs can query the final cluster values
to discern other potentially important information about the
glyphs in the output buffer (such as whether or not a ligature
was formed).
</p>
<p>
For example, if the initial sequence of cluster values was:
</p>
<pre class="programlisting">
0,1,2,3,4
</pre>
<p>
and the final sequence of cluster values is:
</p>
<pre class="programlisting">
0,0,3,3
</pre>
<p>
then there are two clusters in the output buffer: the first
cluster includes the first two glyphs, and the second cluster
includes the third and fourth glyphs. It is also evident that a
ligature or conjunct has been formed, because there are fewer
glyphs in the output buffer (four) than there were code points
in the input buffer (five).
</p>
<p>
Although client programs using HarfBuzz are free to assign
initial cluster values in any manner they choose to, HarfBuzz
does offer some useful guarantees if the cluster values are
assigned in a monotonic (either non-decreasing or non-increasing)
order.
</p>
<p>
For buffers in the left-to-right (LTR)
or top-to-bottom (TTB) text flow direction,
HarfBuzz will preserve the monotonic property: client programs
are guaranteed that monotonically increasing initial cluster
values will be returned as monotonically increasing final
cluster values.
</p>
<p>
For buffers in the right-to-left (RTL)
or bottom-to-top (BTT) text flow direction,
the directionality of the buffer itself is reversed for final
output as a matter of design. Therefore, HarfBuzz inverts the
monotonic property: client programs are guaranteed that
monotonically increasing initial cluster values will be
returned as monotonically <span class="emphasis"><em>decreasing</em></span> final
cluster values.
</p>
<p>
Client programs can adjust how HarfBuzz handles clusters during
shaping by setting the
<code class="literal">cluster_level</code> of the
buffer. HarfBuzz offers three <span class="emphasis"><em>levels</em></span> of
clustering support for this property:
</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
<p><span class="emphasis"><em>Level 0</em></span> is the default.
</p>
<p>
The distinguishing feature of level 0 behavior is that, at
the beginning of processing the buffer, all code points that
are categorized as <span class="emphasis"><em>marks</em></span>,
<span class="emphasis"><em>modifier symbols</em></span>, or
<span class="emphasis"><em>Emoji extended pictographic</em></span> modifiers,
as well as the <span class="emphasis"><em>Zero Width Joiner</em></span> and
<span class="emphasis"><em>Zero Width Non-Joiner</em></span> code points, are
assigned the cluster value of the closest preceding code
point from <span class="emphasis"><em>different</em></span> category.
</p>
<p>
In essence, whenever a base character is followed by a mark
character or a sequence of mark characters, those marks are
reassigned to the same initial cluster value as the base
character. This reassignment is referred to as
"merging" the affected clusters. This behavior is based on
the Grapheme Cluster Boundary specification in <a class="ulink" href="https://www.unicode.org/reports/tr29/#Regex_Definitions" target="_top">Unicode
Technical Report 29</a>.
</p>
<p>
This cluster level is suitable for code that likes to use
HarfBuzz cluster values as an approximation of the Unicode
Grapheme Cluster Boundaries as well.
</p>
<p>
Client programs can specify level 0 behavior for a buffer by
setting its <code class="literal">cluster_level</code> to
<code class="literal">HB_BUFFER_CLUSTER_LEVEL_MONOTONE_GRAPHEMES</code>.
</p>
</li>
<li class="listitem">
<p>
<span class="emphasis"><em>Level 1</em></span> tweaks the old behavior
slightly to produce better results. Therefore, level 1
clustering is recommended for code that is not required to
implement backward compatibility with the old HarfBuzz.
</p>
<p>
<span class="emphasis"><em>Level 1</em></span> differs from level 0 by not merging the
clusters of marks and other modifier code points with the
preceding "base" code point's cluster. By preserving the
separate cluster values of these marks and modifier code
points, script shapers can perform additional operations
that might lead to improved results (for example, coloring
mark glyphs differently than their base).
</p>
<p>
Client programs can specify level 1 behavior for a buffer by
setting its <code class="literal">cluster_level</code> to
<code class="literal">HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS</code>.
</p>
</li>
<li class="listitem">
<p>
<span class="emphasis"><em>Level 2</em></span> differs significantly in how it
treats cluster values. In level 2, HarfBuzz never merges
clusters.
</p>
<p>
This difference can be seen most clearly when HarfBuzz processes
ligature substitutions and glyph decompositions. In level 0
and level 1, ligatures and glyph decomposition both involve
merging clusters; in level 2, neither of these operations
triggers a merge.
</p>
<p>
Client programs can specify level 2 behavior for a buffer by
setting its <code class="literal">cluster_level</code> to
<code class="literal">HB_BUFFER_CLUSTER_LEVEL_CHARACTERS</code>.
</p>
</li>
</ul></div>
<p>
As mentioned earlier, client programs using HarfBuzz often
as#itial cluster values in a buffer by reusing the indices
of the code points in the input text. This gives a sequence of
cluster values that is monotonically increasing (for example,
0,1,2,3,4).
</p>
<p>
It is not <span class="emphasis"><em>required</em></span> that the cluster values
in a buffer be monotonically increasing. However, if the initial
cluster values in a buffer are monotonic and the buffer is
configured to use cluster level 0 or 1, then HarfBuzz
guarantees that the final cluster values in the shaped buffer
will also be monotonic. No such guarantee is made for cluster
level 2.
</p>
<p>
In levels 0 and 1, HarfBuzz implements the following conceptual
model for cluster values:
</p>
<div class="itemizedlist"><ul class="itemizedlist compact" style="list-style-type: disc; ">
<li class="listitem"><p>
If the sequence of input cluster values is monotonic, the
sequence of cluster values will remain monotonic.
</p></li>
<li class="listitem"><p>
Each cluster value represents a single cluster.
</p></li>
<li class="listitem"><p>
Each cluster contains one or more glyphs and one or more
characters.
</p></li>
</ul></div>
<p>
In practice, this model offers several benefits. Assuming that
the initial cluster values were monotonically increasing
and distinct before shaping began, then, in the final output:
</p>
<div class="itemizedlist"><ul class="itemizedlist compact" style="list-style-type: disc; ">
<li class="listitem"><p>
All adjacent glyphs having the same final cluster
value belong to the same cluster.
</p></li>
<li class="listitem"><p>
Each character belongs to the cluster that has the highest
cluster value <span class="emphasis"><em>not larger than</em></span> its
initial cluster value.
</p></li>
</ul></div>
</div>
<div class="footer">
<hr>Generated by GTK-Doc V1.34.0</div>
</body>
</html>