<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
-<title>: HarfBuzz Manual</title>
+<title>Clusters: HarfBuzz Manual</title>
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
<link rel="home" href="index.html" title="HarfBuzz Manual">
<link rel="up" href="pt01.html" title="Part I. User's manual">
<link rel="prev" href="using-your-own-font-functions.html" title="Using your own font functions">
-<link rel="next" href="a-clustering-example-for-levels-0-and-1.html" title="A clustering example for levels 0 and 1">
-<meta name="generator" content="GTK-Doc V1.27.1 (XML mode)">
+<link rel="next" href="working-with-harfbuzz-clusters.html" title="Working with HarfBuzz clusters">
+<meta name="generator" content="GTK-Doc V1.25 (XML mode)">
<link rel="stylesheet" href="style.css" type="text/css">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<td><a accesskey="h" href="index.html"><img src="home.png" width="16" height="16" border="0" alt="Home"></a></td>
<td><a accesskey="u" href="pt01.html"><img src="up.png" width="16" height="16" border="0" alt="Up"></a></td>
<td><a accesskey="p" href="using-your-own-font-functions.html"><img src="left.png" width="16" height="16" border="0" alt="Prev"></a></td>
-<td><a accesskey="n" href="a-clustering-example-for-levels-0-and-1.html"><img src="right.png" width="16" height="16" border="0" alt="Next"></a></td>
+<td><a accesskey="n" href="working-with-harfbuzz-clusters.html"><img src="right.png" width="16" height="16" border="0" alt="Next"></a></td>
</tr></table>
<div class="chapter">
-<div class="titlepage"></div>
+<div class="titlepage"><div><div><h2 class="title">
+<a name="clusters"></a>Clusters</h2></div></div></div>
<div class="toc"><dl class="toc">
-<dt><span class="sect1"><a href="clusters.html#clusters">Clusters</a></span></dt>
-<dt><span class="sect1"><a href="a-clustering-example-for-levels-0-and-1.html">A clustering example for levels 0 and 1</a></span></dt>
-<dt><span class="sect1"><a href="reordering-in-levels-0-and-1.html">Reordering in levels 0 and 1</a></span></dt>
-<dt><span class="sect1"><a href="the-distinction-between-levels-0-and-1.html">The distinction between levels 0 and 1</a></span></dt>
-<dt><span class="sect1"><a href="level-2.html">Level 2</a></span></dt>
+<dt><span class="section"><a href="clusters.html#clusters-and-shaping">Clusters and shaping</a></span></dt>
+<dt><span class="section"><a href="working-with-harfbuzz-clusters.html">Working with HarfBuzz clusters</a></span></dt>
+<dt><span class="section"><a href="a-clustering-example-for-levels-0-and-1.html">A clustering example for levels 0 and 1</a></span></dt>
+<dt><span class="section"><a href="reordering-in-levels-0-and-1.html">Reordering in levels 0 and 1</a></span></dt>
+<dt><span class="section"><a href="the-distinction-between-levels-0-and-1.html">The distinction between levels 0 and 1</a></span></dt>
+<dt><span class="section"><a href="level-2.html">Level 2</a></span></dt>
<dd><dl>
-<dt><span class="sect2"><a href="level-2.html#ligatures-with-combining-marks">Ligatures with combining marks</a></span></dt>
-<dt><span class="sect2"><a href="level-2.html#reordering">Reordering</a></span></dt>
+<dt><span class="section"><a href="level-2.html#ligatures-with-combining-marks-in-level-2">Ligatures with combining marks in level 2</a></span></dt>
+<dt><span class="section"><a href="level-2.html#reordering-in-level-2">Reordering in level 2</a></span></dt>
+<dt><span class="section"><a href="level-2.html#other-considerations-in-level-2">Other considerations in level 2</a></span></dt>
</dl></dd>
</dl></div>
-<div class="sect1">
+<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
-<a name="clusters"></a>Clusters</h2></div></div></div>
+<a name="clusters-and-shaping"></a>Clusters and shaping</h2></div></div></div>
<p>
- In shaping text, a <span class="emphasis"><em>cluster</em></span> is a sequence of
- code points that needs to be treated as a single, indivisible unit.
- </p>
+ In text shaping, a <span class="emphasis"><em>cluster</em></span> is a sequence of
+ characters that needs to be treated as a single, indivisible
+ unit. A single letter or symbol can be a cluster of its
+ own. Other clusters correspond to longer subsequences of the
+ input code points — such as a ligature or conjunct form
+ — and require the shaper to ensure that the cluster is not
+ broken during the shaping process.
+ </p>
<p>
- When you add text to a HB buffer, each character is associated with
- a <span class="emphasis"><em>cluster value</em></span>. This is an arbitrary number as
- far as HB is concerned.
- </p>
+ A cluster is distinct from a <span class="emphasis"><em>grapheme</em></span>,
+ which is the smallest unit of meaning in a writing system or
+ script.
+ </p>
<p>
- Most clients will use UTF-8, UTF-16, or UTF-32 indices, but the
- actual number does not matter. Moreover, it is not required for the
- cluster values to be monotonically increasing, but pretty much all
- of HB's tests are performed on monotonically increasing cluster
- numbers. Nevertheless, there is no such assumption in the code
- itself. With that in mind, let's examine what happens with cluster
- values during shaping under each cluster-level.
- </p>
+ The definitions of the two terms are similar. However, clusters
+ are only relevant for script shaping and glyph layout. In
+ contrast, graphemes are a property of the underlying script, and
+ are of interest when client programs implement orthographic
+ or linguistic functionality.
+ </p>
<p>
- HarfBuzz provides three <span class="emphasis"><em>levels</em></span> of clustering
- support. Level 0 is the default behavior and reproduces the behavior
- of the old HarfBuzz library. Level 1 tweaks this behavior slightly
- to produce better results, so level 1 clustering is recommended for
- code that is not required to implement backward compatibility with
- the old HarfBuzz.
- </p>
+ For example, two individual letters are often two separate
+ graphemes. When two letters form a ligature, however, they
+ combine into a single glyph. They are then part of the same
+ cluster and are treated as a unit by the shaping engine —
+ even though the two original, underlying letters remain separate
+ graphemes.
+ </p>
<p>
- Level 2 differs significantly in how it treats cluster values.
- Levels 0 and 1 both process ligatures and glyph decomposition by
- merging clusters; level 2 does not.
- </p>
+ HarfBuzz is concerned with clusters, <span class="emphasis"><em>not</em></span>
+ with graphemes — although client programs using HarfBuzz
+ may still care about graphemes for other reasons from time to time.
+ </p>
<p>
- The conceptual model for what the cluster values mean, in levels 0
- and 1, is this:
- </p>
-<div class="itemizedlist"><ul class="itemizedlist compact" style="list-style-type: disc; ">
+ During the shaping process, there are several shaping operations
+ that may merge adjacent characters (for example, when two code
+ points form a ligature or a conjunct form and are replaced by a
+ single glyph) or split one character into several (for example,
+ when decomposing a code point through the
+ <code class="literal">ccmp</code> feature). Operations like these alter
+ clusters; HarfBuzz tracks the changes to ensure that no clusters
+ get lost or broken during shaping.
+ </p>
+<p>
+ HarfBuzz records cluster information independently from how
+ shaping operations affect the individual glyphs returned in an
+ output buffer. Consequently, a client program using HarfBuzz can
+ utilize the cluster information to implement features such as:
+ </p>
+<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
+<li class="listitem"><p>
+ Correctly positioning the cursor within a shaped text run,
+ even when characters have formed ligatures, composed or
+ decomposed, reordered, or undergone other shaping operations.
+ </p></li>
<li class="listitem"><p>
- the sequence of cluster values will always remain monotone
- </p></li>
+ Correctly highlighting a text selection that includes some,
+ but not all, of the characters in a word.
+ </p></li>
<li class="listitem"><p>
- each value represents a single cluster
- </p></li>
+ Applying text attributes (such as color or underlining) to
+ part, but not all, of a word.
+ </p></li>
<li class="listitem"><p>
- each cluster contains one or more glyphs and one or more
- characters
- </p></li>
+ Generating output document formats (such as PDF) with
+ embedded text that can be fully extracted.
+ </p></li>
+<li class="listitem"><p>
+ Determining the mapping between input characters and output
+ glyphs, such as which glyphs are ligatures.
+ </p></li>
+<li class="listitem"><p>
+ Performing line-breaking, justification, and other
+ line-level or paragraph-level operations that must be done
+ after shaping is complete, but which require examining
+ character-level properties.
+ </p></li>
</ul></div>
-<p>
- Assuming that initial cluster numbers were monotonically increasing
- and distinct, then all adjacent glyphs having the same cluster
- number belong to the same cluster, and all characters belong to the
- cluster that has the highest number not larger than their initial
- cluster number. This will become clearer with an example.
- </p>
</div>
</div>
<div class="footer">
-<hr>Generated by GTK-Doc V1.27.1</div>
+<hr>Generated by GTK-Doc V1.25</div>
</body>
</html>
\ No newline at end of file