<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
<!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'">
<!ENTITY version SYSTEM "version.xml">
]>
<chapter id="getting-started">
<title>Getting started with HarfBuzz</title>
<section id="an-overview-of-the-harfbuzz-shaping-api">
<title>An overview of the HarfBuzz shaping API</title>
<para>
The core of the HarfBuzz shaping API is the function
<function>hb_shape()</function>. This function takes a font, a
buffer containing a string of Unicode codepoints and
(optionally) a list of font features as its input. It replaces
the codepoints in the buffer with the corresponding glyphs from
the font, correctly ordered and positioned, and with any of the
optional font features applied.
</para>
<para>
In addition to holding the pre-shaping input (the Unicode
codepoints that comprise the input string) and the post-shaping
output (the glyphs and positions), a HarfBuzz buffer has several
properties that affect shaping. The most important are the
text-flow direction (e.g., left-to-right, right-to-left,
top-to-bottom, or bottom-to-top), the script tag, and the
language tag.
</para>
<para>
For input string buffers, flags are available to denote when the
buffer represents the beginning or end of a paragraph, to
indicate whether or not to visibly render Unicode <literal>Default
Ignorable</literal> codepoints, and to modify the cluster-merging
behavior for the buffer. For shaped output buffers, the
individual X and Y offsets and <literal>advances</literal>
(the logical dimensions) of each glyph are
accessible. HarfBuzz also flags glyphs as
<literal>UNSAFE_TO_BREAK</literal> if breaking the string at
that glyph (e.g., in a line-breaking or hyphenation process)
would require re-shaping the text.
</para>
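<para>
As a rough illustration of how the unsafe-to-break flag might be used
during line breaking, the sketch below searches backward from a wanted
break position for the nearest glyph that is not flagged. It does not
call HarfBuzz: the flag array and every name prefixed with
<literal>sketch_</literal> or <literal>SKETCH_</literal> are
hypothetical stand-ins. In HarfBuzz itself the flag is
<literal>HB_GLYPH_FLAG_UNSAFE_TO_BREAK</literal>, read per glyph with
<function>hb_glyph_info_get_glyph_flags()</function>. The sketch also
assumes the glyphs are in logical order.
</para>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical local stand-in for HarfBuzz's HB_GLYPH_FLAG_UNSAFE_TO_BREAK. */
#define SKETCH_UNSAFE_TO_BREAK 0x1u

/*
 * Returns the largest index i <= wanted such that breaking the run just
 * before glyph i would not require re-shaping. Index 0 (start of the run)
 * and index count (end of the run) are always treated as safe.
 */
size_t sketch_safe_break_at_or_before(const unsigned int *flags,
                                      size_t count, size_t wanted)
{
    assert(flags != NULL || count == 0);
    if (count == 0)
        return 0;
    if (wanted >= count)
        return count;               /* breaking after the last glyph */

    size_t i = wanted;
    /* Walk backward while the candidate glyph is flagged as unsafe. */
    while (i > 0 && (flags[i] & SKETCH_UNSAFE_TO_BREAK))
        i--;
    return i;
}
```

<para>
A caller that cannot find an acceptable safe position this way can
still break at the wanted position and re-shape the text on both sides
of the break.
</para>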
<para>
HarfBuzz also provides methods to compare the contents of
buffers, join buffers, normalize buffer contents, and handle
invalid codepoints, as well as to determine the state of a
buffer (e.g., input codepoints or output glyphs). All buffers
are reference-counted, and functions are provided to manage
their lifecycles.
</para>
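<para>
To make the reference-counting pattern concrete, here is a minimal,
self-contained sketch of a reference-counted object. It is not
HarfBuzz's implementation: the type and the functions prefixed with
<literal>sketch_</literal> are hypothetical. They only mirror the
create, reference, and destroy pattern that HarfBuzz exposes for buffers
through <function>hb_buffer_create()</function>,
<function>hb_buffer_reference()</function>, and
<function>hb_buffer_destroy()</function>.
</para>

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in object; not HarfBuzz's actual buffer type. */
typedef struct sketch_obj {
    unsigned int ref_count;
} sketch_obj_t;

/* Number of sketch objects that have been created but not yet freed. */
static int sketch_live_objects = 0;

/* Creates an object whose single initial reference belongs to the caller. */
sketch_obj_t *sketch_obj_create(void)
{
    sketch_obj_t *obj = malloc(sizeof *obj);
    if (!obj)
        return NULL;
    obj->ref_count = 1;
    sketch_live_objects++;
    return obj;
}

/* Takes an additional reference and returns the same object. */
sketch_obj_t *sketch_obj_reference(sketch_obj_t *obj)
{
    assert(obj != NULL && obj->ref_count > 0);
    obj->ref_count++;
    return obj;
}

/* Drops one reference; the object is freed when the last one is dropped. */
void sketch_obj_destroy(sketch_obj_t *obj)
{
    if (!obj)
        return;
    assert(obj->ref_count > 0);
    if (--obj->ref_count == 0) {
        sketch_live_objects--;
        free(obj);
    }
}
```

<para>
In this pattern, every holder of a reference calls the destroy function
exactly once, and the object goes away only after the last holder has
done so.
</para>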
<para>
Although the default <function>hb_shape()</function> function is
sufficient for most use cases, a variant is also provided that
lets you specify which of HarfBuzz's shapers to use on a buffer.
</para>
<para>
HarfBuzz can read TrueType fonts, TrueType collections, OpenType
fonts, and OpenType collections. Functions are provided to query
font objects about metrics, Unicode coverage, available tables and
features, and variation selectors. Individual glyphs can also be
queried for metrics, variations, and glyph names. OpenType
variable fonts are supported, and HarfBuzz allows you to set
variation-axis coordinates on font objects.
</para>
<para>
HarfBuzz provides glue code to integrate with various other
libraries, including FreeType, GObject, and CoreText. Support
for integrating with Uniscribe and DirectWrite is experimental.
</para>
<section id="terminology">
<title>Terminology</title>
<?dbfo list-presentation="blocks"?>
<para>
In text shaping, a <emphasis>script</emphasis> is a
writing system: a set of symbols, rules, and conventions
that is used to represent a language or multiple
languages.
</para>
<para>
In general computing lingo, the word "script" can also
be used to mean an executable program (usually one
written in a human-readable programming language). For
the sake of clarity, HarfBuzz documents will always use
more specific terminology when referring to this
meaning, such as "Python script" or "shell script." In
all other instances, "script" refers to a writing system.
</para>
<para>
For developers using HarfBuzz, it is important to note
the distinction between a script and a language. Most
scripts are used to write a variety of different
languages, and many languages may be written in more
than one script.
</para>
<para>
In HarfBuzz, a <emphasis>shaper</emphasis> is a
handler for a specific script-shaping model. HarfBuzz
implements separate shapers for Indic, Arabic, Thai and
Lao, Khmer, Myanmar, Tibetan, Hangul, Hebrew, the
Universal Shaping Engine (USE), and a default shaper for
scripts with no script-specific shaping model.
</para>
<para>
In text shaping, a <emphasis>cluster</emphasis> is a
sequence of codepoints that must be treated as an
indivisible unit. Clusters can include codepoint
sequences that form a ligature or base-and-mark
sequences. Tracking and preserving clusters is important
when shaping operations might separate or reorder
codepoints.
</para>
<para>
HarfBuzz provides three cluster
<emphasis>levels</emphasis> that implement different
approaches to the problem of preserving clusters during
shaping operations.
</para>
<term>grapheme</term>
<para>
In linguistics, a <emphasis>grapheme</emphasis> is one
of the indivisible units that make up a writing system or
script. Often, graphemes are individual symbols (letters,
numbers, punctuation marks, logograms, etc.) but,
depending on the writing system, a particular grapheme
might correspond to a sequence of several Unicode code
points.
</para>
<para>
In practice, HarfBuzz and other text-shaping engines
are not generally concerned with graphemes. However, it
is important for developers using HarfBuzz to recognize
that there is a difference between graphemes and shaping
clusters (see above). The two concepts may overlap
frequently, but there is no guarantee that they will be
equivalent.
</para>
<term>syllable</term>
<para>
In linguistics, a <emphasis>syllable</emphasis> is
a sequence of sounds that makes up a building block of a
particular language. Every language has its own set of
rules describing what constitutes a valid syllable.
</para>
<para>
For text-shaping purposes, the various definitions of
"syllable" are important because script-specific shaping
operations may be applied at the syllable level. For
example, a reordering rule might specify that a vowel
mark be reordered to the beginning of the syllable.
</para>
<para>
A syllable consists of one or more Unicode code
points. The definition of a syllable for a particular
writing system might correspond to how HarfBuzz
identifies clusters (see above) for the same writing
system. However, it is important for developers using
HarfBuzz to recognize that there is a difference between
syllables and shaping clusters. The two concepts may
overlap frequently, but there is no guarantee that they
will be equivalent.
</para>
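<para>
The notion of a cluster defined above can be made concrete with a small
sketch. After shaping, each output glyph carries a cluster value, and
glyphs that belong to the same cluster share that value. The code below
counts clusters by counting the places where the value changes between
neighboring glyphs. It is a sketch under stated assumptions, not
HarfBuzz code: <literal>sketch_glyph_t</literal> is a hypothetical
stand-in for the <literal>codepoint</literal> and
<literal>cluster</literal> fields of
<literal>hb_glyph_info_t</literal>, and glyphs of one cluster are
assumed to be adjacent in the array.
</para>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant fields of hb_glyph_info_t. */
typedef struct sketch_glyph {
    unsigned int glyph_id;  /* glyph index after shaping */
    unsigned int cluster;   /* cluster value shared by glyphs of one cluster */
} sketch_glyph_t;

/*
 * Counts clusters in a shaped run by counting runs of equal cluster
 * values among adjacent glyphs.
 */
size_t sketch_count_clusters(const sketch_glyph_t *glyphs, size_t count)
{
    assert(glyphs != NULL || count == 0);
    size_t clusters = 0;
    for (size_t i = 0; i < count; i++) {
        /* A new cluster starts at the first glyph and at every change. */
        if (i == 0 || glyphs[i].cluster != glyphs[i - 1].cluster)
            clusters++;
    }
    return clusters;
}
```

<para>
For a run of five glyphs whose cluster values are 0, 0, 2, 3, 3, the
function reports three clusters, even though a user might perceive a
different number of graphemes or syllables in the same text.
</para>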
<section id="a-simple-shaping-example">
<title>A simple shaping example</title>
<para>
Below is the simplest HarfBuzz shaping example possible.
</para>
<orderedlist numeration="arabic">
<listitem>
<para>
Create a buffer and put your text in it.
</para>
</listitem>
</orderedlist>
<programlisting language="C">
#include &lt;hb.h&gt;

/* text is a NUL-terminated UTF-8 string */
hb_buffer_t *buf = hb_buffer_create();
hb_buffer_add_utf8(buf, text, -1, 0, -1);
</programlisting>
<orderedlist numeration="arabic">
<listitem override="2">
<para>
Set the script, language and direction of the buffer.
</para>
</listitem>
</orderedlist>
<programlisting language="C">
// If you know the direction, script, and language
hb_buffer_set_direction(buf, HB_DIRECTION_LTR);
hb_buffer_set_script(buf, HB_SCRIPT_LATIN);
hb_buffer_set_language(buf, hb_language_from_string("en", -1));

// If you don't know the direction, script, and language
hb_buffer_guess_segment_properties(buf);
</programlisting>
<orderedlist numeration="arabic">
<listitem override="3">
<para>
Create a face and a font from a font file.
</para>
</listitem>
</orderedlist>
<programlisting language="C">
hb_blob_t *blob = hb_blob_create_from_file(filename); /* or hb_blob_create_from_file_or_fail() */
hb_face_t *face = hb_face_create(blob, 0);
hb_font_t *font = hb_font_create(face);
</programlisting>
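<orderedlist numeration="arabic">
<listitem override="4">
<para>
Shape the buffer.
</para>
</listitem>
</orderedlist>
<programlisting language="C">
/* NULL, 0: no optional font features are passed */
hb_shape(font, buf, NULL, 0);
</programlisting>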
<orderedlist numeration="arabic">
<listitem override="5">
<para>
Get the glyph and position information.
</para>
</listitem>
</orderedlist>
<programlisting language="C">
unsigned int glyph_count;
hb_glyph_info_t *glyph_info = hb_buffer_get_glyph_infos(buf, &amp;glyph_count);
hb_glyph_position_t *glyph_pos = hb_buffer_get_glyph_positions(buf, &amp;glyph_count);
</programlisting>
<orderedlist numeration="arabic">
<listitem override="6">
<para>
Iterate over each glyph.
</para>
</listitem>
</orderedlist>
<programlisting language="C">
hb_position_t cursor_x = 0;
hb_position_t cursor_y = 0;
for (unsigned int i = 0; i &lt; glyph_count; i++) {
    /* After shaping, the codepoint field holds the glyph ID. */
    hb_codepoint_t glyphid  = glyph_info[i].codepoint;
    hb_position_t x_offset  = glyph_pos[i].x_offset;
    hb_position_t y_offset  = glyph_pos[i].y_offset;
    hb_position_t x_advance = glyph_pos[i].x_advance;
    hb_position_t y_advance = glyph_pos[i].y_advance;
    /* draw_glyph(glyphid, cursor_x + x_offset, cursor_y + y_offset); */
    cursor_x += x_advance;
    cursor_y += y_advance;
}
</programlisting>
<orderedlist numeration="arabic">
<listitem override="7">
<para>
Tidy up.
</para>
</listitem>
</orderedlist>
<programlisting language="C">
hb_buffer_destroy(buf);
hb_font_destroy(font);
hb_face_destroy(face);
hb_blob_destroy(blob);
</programlisting>
<para>
This example shows enough to get us started using HarfBuzz. In
the sections that follow, we will use the remainder of
HarfBuzz's API to refine and extend the example and improve its
text-shaping capabilities.
</para>
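<para>
As a small follow-up to the glyph-iteration step of the example, the
sketch below sums the horizontal advances of a shaped run and converts
the result from font units to pixels. It does not call HarfBuzz:
<literal>sketch_position_t</literal> is a hypothetical stand-in for the
fields of <literal>hb_glyph_position_t</literal>, and the conversion
assumes that positions are expressed in font units (as they are when the
font's scale is left at its default) and that the caller knows the
font's units-per-em value.
</para>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of hb_glyph_position_t. */
typedef struct sketch_position {
    int x_advance;
    int y_advance;
    int x_offset;
    int y_offset;
} sketch_position_t;

/* Sums the horizontal advances of all glyphs in a run. */
long sketch_run_advance(const sketch_position_t *pos, size_t count)
{
    assert(pos != NULL || count == 0);
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += pos[i].x_advance;  /* offsets do not move the cursor */
    return total;
}

/* Converts a length in font units to pixels for a given pixel size. */
double sketch_units_to_pixels(long units, unsigned int upem, double pixel_size)
{
    assert(upem > 0);
    return (double) units * pixel_size / (double) upem;
}
```

<para>
Note that only the advances accumulate. The offsets shift an individual
glyph relative to the cursor without changing where the next glyph
starts, which is why the loop in the example adds only
<literal>x_advance</literal> and <literal>y_advance</literal> to the
cursor.
</para>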