<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Working with HarfBuzz clusters: HarfBuzz Manual</title>
<meta name="generator" content="DocBook XSL Stylesheets Vsnapshot">
<link rel="home" href="index.html" title="HarfBuzz Manual">
<link rel="up" href="clusters.html" title="Clusters">
<link rel="prev" href="clusters.html" title="Clusters">
<link rel="next" href="a-clustering-example-for-levels-0-and-1.html" title="A clustering example for levels 0 and 1">
<meta name="generator" content="GTK-Doc V1.29 (XML mode)">
<link rel="stylesheet" href="style.css" type="text/css">
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="working-with-harfbuzz-clusters"></a>Working with HarfBuzz clusters</h2></div></div></div>
<p>
When you add text to a HarfBuzz buffer, each code point must be
assigned a <span class="emphasis"><em>cluster value</em></span>.
</p>
<p>
This cluster value is an arbitrary number; HarfBuzz uses it only
to distinguish between clusters. Many client programs use
the index of each code point in the input text stream as the
cluster value. This is purely a matter of convenience; the actual
value does not matter.
</p>
<p>
Some of the shaping operations performed by HarfBuzz (such as
reordering, composition, decomposition, and substitution) may
alter the cluster values of some characters. The final cluster
values in the buffer at the end of the shaping process indicate
to client programs which subsequences of glyphs represent a
cluster and, therefore, must not be separated.
</p>
<p>
In addition, client programs can query the final cluster values
to discern other potentially important information about the
glyphs in the output buffer (such as whether or not a ligature
has been formed).
</p>
<p>
For example, if the initial sequence of cluster values was:
</p>
<pre class="programlisting">
0,1,2,3,4
</pre>
<p>
and the final sequence of cluster values is:
</p>
<pre class="programlisting">
0,0,3,3
</pre>
<p>
then there are two clusters in the output buffer: the first
cluster includes the first two glyphs, and the second cluster
includes the third and fourth glyphs. It is also evident that a
ligature or conjunct has been formed, because there are fewer
glyphs in the output buffer (four) than there were code points
in the input buffer (five).
</p>
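<p>
As a minimal sketch of how a client might read such a result, the
following C helpers treat each run of equal, adjacent cluster values
as one cluster. The function names are hypothetical and are not part
of the HarfBuzz API; in a real program the values would typically be
copied from the <code class="literal">cluster</code> field of each
<code class="literal">hb_glyph_info_t</code> returned by
<code class="literal">hb_buffer_get_glyph_infos()</code>.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count the clusters in a shaped buffer by
 * counting runs of equal, adjacent cluster values. */
size_t count_clusters(const uint32_t *clusters, size_t glyph_count)
{
    assert(clusters != NULL || glyph_count == 0);
    size_t runs = 0;
    for (size_t i = 0; i < glyph_count; i++) {
        /* A new cluster starts wherever the value changes. */
        if (i == 0 || clusters[i] != clusters[i - 1])
            runs++;
    }
    return runs;
}

/* Hypothetical helper: number of glyphs in the run that begins at
 * index `start`. Returns 0 when `start` is past the end. */
size_t cluster_run_length(const uint32_t *clusters, size_t glyph_count,
                          size_t start)
{
    assert(clusters != NULL || glyph_count == 0);
    size_t end = start;
    while (end < glyph_count && clusters[end] == clusters[start])
        end++;
    return end - start;
}
```

<p>
For a final sequence such as 0,0,3,3, <code class="literal">count_clusters()</code>
reports two clusters of two glyphs each. Comparing the glyph count
with the number of input code points then shows whether any code
points were combined.
</p>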
<p>
Although client programs using HarfBuzz are free to assign
initial cluster values in any manner they choose, HarfBuzz
does offer some useful guarantees if the cluster values are
assigned in a monotonic (either non-decreasing or non-increasing)
order.
</p>
<p>
For left-to-right scripts (LTR) and top-to-bottom scripts (TTB),
HarfBuzz will preserve the monotonic property: client programs
are guaranteed that monotonically increasing initial cluster
values will be returned as monotonically increasing final
cluster values.
</p>
<p>
For right-to-left scripts (RTL) and bottom-to-top scripts (BTT),
the directionality of the buffer itself is reversed for final
output as a matter of design. Therefore, HarfBuzz inverts the
monotonic property: client programs are guaranteed that
monotonically increasing initial cluster values will be
returned as monotonically <span class="emphasis"><em>decreasing</em></span> final
cluster values.
</p>
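<p>
A client that relies on this guarantee may want to verify it while
debugging. The sketch below checks an array of cluster values for
either ordering. The type and function names are assumptions made
for illustration only; they are not HarfBuzz identifiers.
</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: the ordering a caller expects to see. */
typedef enum {
    ORDER_NON_DECREASING, /* expected for LTR and TTB output */
    ORDER_NON_INCREASING  /* expected for RTL and BTT output */
} cluster_order_t;

/* Hypothetical helper: true if the cluster values follow `order`.
 * Equal neighbours are allowed, since several glyphs may share
 * one cluster value. */
bool clusters_are_monotonic(const uint32_t *clusters, size_t count,
                            cluster_order_t order)
{
    assert(clusters != NULL || count == 0);
    for (size_t i = 1; i < count; i++) {
        if (order == ORDER_NON_DECREASING && clusters[i] < clusters[i - 1])
            return false;
        if (order == ORDER_NON_INCREASING && clusters[i] > clusters[i - 1])
            return false;
    }
    return true;
}
```

<p>
An empty or single-element array passes either check, which matches
the usual definition of a monotonic sequence.
</p>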
<p>
Client programs can adjust how HarfBuzz handles clusters during
shaping by setting the
<code class="literal">cluster_level</code> of the
buffer. HarfBuzz offers three <span class="emphasis"><em>levels</em></span> of
clustering support for this property:
</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
<p><span class="emphasis"><em>Level 0</em></span> is the default and
reproduces the behavior of the old HarfBuzz library.
</p>
<p>
The distinguishing feature of level 0 behavior is that, at
the beginning of processing the buffer, all code points that
are categorized as <span class="emphasis"><em>marks</em></span>,
<span class="emphasis"><em>modifier symbols</em></span>, or
<span class="emphasis"><em>Emoji extended pictographic</em></span> modifiers,
as well as the <span class="emphasis"><em>Zero Width Joiner</em></span> and
<span class="emphasis"><em>Zero Width Non-Joiner</em></span> code points, are
assigned the cluster value of the closest preceding code
point from a <span class="emphasis"><em>different</em></span> category.
</p>
<p>
In essence, whenever a base character is followed by a mark
character or a sequence of mark characters, those marks are
reassigned to the same initial cluster value as the base
character. This reassignment is referred to as
"merging" the affected clusters. This behavior is based on
the Grapheme Cluster Boundary specification in <a class="ulink" href="https://www.unicode.org/reports/tr29/#Regex_Definitions" target="_top">Unicode
Standard Annex #29</a>.
</p>
<p>
Client programs can specify level 0 behavior for a buffer by
setting its <code class="literal">cluster_level</code> to
<code class="literal">HB_BUFFER_CLUSTER_LEVEL_MONOTONE_GRAPHEMES</code>.
</p>
</li>
<li class="listitem">
<p>
<span class="emphasis"><em>Level 1</em></span> tweaks the old behavior
slightly to produce better results. Level 1 clustering is
therefore recommended for code that does not need to remain
backward compatible with the old HarfBuzz.
</p>
<p>
Level 1 differs from level 0 by not merging the
clusters of marks and other modifier code points with the
preceding "base" code point's cluster. By preserving the
separate cluster values of these marks and modifier code
points, script shapers can perform additional operations
that might lead to improved results (for example, reordering
a sequence of marks).
</p>
<p>
Client programs can specify level 1 behavior for a buffer by
setting its <code class="literal">cluster_level</code> to
<code class="literal">HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS</code>.
</p>
</li>
<li class="listitem">
<p>
<span class="emphasis"><em>Level 2</em></span> differs significantly in how it
treats cluster values. In level 2, HarfBuzz never merges
clusters.
</p>
<p>
This difference can be seen most clearly when HarfBuzz processes
ligature substitutions and glyph decompositions. In level 0
and level 1, ligatures and glyph decomposition both involve
merging clusters; in level 2, neither of these operations
merges clusters.
</p>
<p>
Client programs can specify level 2 behavior for a buffer by
setting its <code class="literal">cluster_level</code> to
<code class="literal">HB_BUFFER_CLUSTER_LEVEL_CHARACTERS</code>.
</p>
</li>
</ul></div>
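<p>
To make the difference between level 0 and level 1 concrete, here
is a deliberately simplified toy model of the level 0 pre-shaping
merge. It is not HarfBuzz's implementation: the two-way
<code class="literal">toy_category_t</code> classification and the
function name are assumptions made for illustration, and every code
point that level 0 would attach to its predecessor (marks, modifiers,
joiners) is lumped under <code class="literal">CAT_MARK</code>. In a real
program the level is selected on the buffer itself, for instance with
<code class="literal">hb_buffer_set_cluster_level()</code>, and HarfBuzz
performs the merge internally.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified classification of input code points. */
typedef enum { CAT_BASE, CAT_MARK } toy_category_t;

/* Toy model of the level 0 merge: each mark takes the cluster value
 * of the closest preceding code point that is not a mark. Walking
 * left to right lets a sequence of marks inherit the base's value
 * one step at a time. A mark at index 0 has no base and is left
 * unchanged. Skipping this step corresponds to level 1 in this model. */
void toy_merge_marks_into_base(const toy_category_t *cats,
                               uint32_t *clusters, size_t count)
{
    assert((cats != NULL && clusters != NULL) || count == 0);
    for (size_t i = 1; i < count; i++) {
        if (cats[i] == CAT_MARK)
            clusters[i] = clusters[i - 1];
    }
}
```

<p>
With the categories base, mark, mark, base, mark and initial cluster
values 0,1,2,3,4, the toy merge yields 0,0,0,3,3, whereas the
level 1 analogue would leave the values at 0,1,2,3,4.
</p>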
<p>
As mentioned earlier, client programs using HarfBuzz often
assign initial cluster values in a buffer by reusing the indices
of the code points in the input text. This gives a sequence of
cluster values that is monotonically increasing (for example,
0,1,2,3,4).
</p>
<p>
It is not <span class="emphasis"><em>required</em></span> that the cluster values
in a buffer be monotonically increasing. However, if the initial
cluster values in a buffer are monotonic and the buffer is
configured to use cluster level 0 or 1, then HarfBuzz
guarantees that the final cluster values in the shaped buffer
will also be monotonic. No such guarantee is made for cluster
level 2.
</p>
<p>
In levels 0 and 1, HarfBuzz implements the following conceptual
model for cluster values:
</p>
<div class="itemizedlist"><ul class="itemizedlist compact" style="list-style-type: disc; ">
<li class="listitem"><p>
If the sequence of input cluster values is monotonic, the
sequence of output cluster values will remain monotonic.
</p></li>
<li class="listitem"><p>
Each cluster value represents a single cluster.
</p></li>
<li class="listitem"><p>
Each cluster contains one or more glyphs and one or more
characters.
</p></li>
</ul></div>
<p>
In practice, this model offers several benefits. Assuming that
the initial cluster values were monotonically increasing
and distinct before shaping began, then, in the final output:
</p>
<div class="itemizedlist"><ul class="itemizedlist compact" style="list-style-type: disc; ">
<li class="listitem"><p>
All adjacent glyphs having the same final cluster
value belong to the same cluster.
</p></li>
<li class="listitem"><p>
Each character belongs to the cluster that has the highest
cluster value <span class="emphasis"><em>not larger than</em></span> its
initial cluster value.
</p></li>
</ul></div>
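<p>
The second property gives a simple way to map an input character
back to its cluster after shaping. The following sketch assumes the
initial cluster values were distinct and monotonically increasing, as
stated above; the function name and signature are hypothetical and
not part of the HarfBuzz API.
</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: find the cluster that a character belongs to,
 * given the character's initial cluster value. The answer is the
 * highest final cluster value that is not larger than that initial
 * value. All glyphs are scanned, so the same code works whether the
 * final values are increasing (LTR) or decreasing (RTL).
 * Returns false if no final cluster value qualifies. */
bool cluster_for_character(const uint32_t *final_clusters,
                           size_t glyph_count,
                           uint32_t initial_value,
                           uint32_t *out_cluster)
{
    assert(out_cluster != NULL);
    assert(final_clusters != NULL || glyph_count == 0);
    bool found = false;
    uint32_t best = 0;
    for (size_t i = 0; i < glyph_count; i++) {
        uint32_t c = final_clusters[i];
        if (c <= initial_value && (!found || c > best)) {
            best = c;
            found = true;
        }
    }
    if (found)
        *out_cluster = best;
    return found;
}
```

<p>
For final cluster values 0,0,3,3, characters with initial values 0, 1
and 2 map to cluster 0, while characters 3 and 4 map to cluster 3.
</p>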
<hr>Generated by GTK-Doc V1.29</div>