<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
                  "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
  <!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'">
  <!ENTITY version SYSTEM "version.xml">
]>
<chapter id="what-is-harfbuzz">
  <title>What is HarfBuzz?</title>
  <para>
    HarfBuzz is a <emphasis>text-shaping engine</emphasis>. If you
    give HarfBuzz a font and a string containing a sequence of Unicode
    codepoints, HarfBuzz selects and positions the corresponding
    glyphs from the font, applying all of the necessary layout rules
    and font features. HarfBuzz then returns the string to you in a
    form that is correctly arranged for the language and writing
    system.
  </para>
  <para>
    HarfBuzz can properly shape all of the world's major writing
    systems. It runs on all major operating systems and software
    platforms, and it supports the major font formats in use
    today.
  </para>
  <section id="what-is-text-shaping">
    <title>What is text shaping?</title>
    <para>
      Text shaping is the process of translating a string of character
      codes (such as Unicode codepoints) into a properly arranged
      sequence of glyphs that can be rendered onto a screen or into
      final output form for inclusion in a document.
    </para>
    <para>
      The shaping process is dependent on the input string, the active
      font, the script (or writing system) that the string is in, and
      the language that the string is in.
    </para>
    <para>
      Modern software systems generally only deal with strings in the
      Unicode encoding scheme (although legacy systems and documents may
      involve other encodings).
    </para>
    <para>
      There are several font formats that a program might
      encounter, each of which has a set of standard text-shaping
      rules.
    </para>
    <para>The dominant format is <ulink
      url="http://www.microsoft.com/typography/otspec/">OpenType</ulink>. The
      OpenType specification defines a series of <ulink url="https://github.com/n8willis/opentype-shaping-documents">shaping models</ulink> for
      various scripts from around the world. These shaping models depend on
      the font incorporating certain features as
      <emphasis>lookups</emphasis> in its <literal>GSUB</literal>
      and <literal>GPOS</literal> tables.
    </para>
    <para>
      Alternatively, OpenType fonts can include shaping features for
      the <ulink url="https://graphite.sil.org/">Graphite</ulink> shaping model.
    </para>
    <para>
      TrueType fonts can also include OpenType shaping
      features. Alternatively, TrueType fonts can include <ulink url="https://developer.apple.com/fonts/TrueType-Reference-Manual/RM09/AppendixF.html">Apple
      Advanced Typography</ulink> (AAT) tables to implement shaping
      support. AAT fonts are generally only found on macOS and iOS systems.
    </para>
    <para>
      Text strings will usually be tagged with script and language
      tags that provide the context needed to perform text shaping
      correctly. The necessary <ulink
      url="https://docs.microsoft.com/en-us/typography/opentype/spec/scripttags">script</ulink>
      and <ulink
      url="https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags">language</ulink>
      tags are defined by OpenType.
    </para>
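    <para>
      To make the idea of script and language tags concrete, the
      following Python sketch packs four-character OpenType tags into
      32-bit integers, which is a common way for software to store them.
      The helper names <literal>ot_tag</literal> and
      <literal>tag_to_str</literal> and the <literal>run</literal>
      dictionary are hypothetical names chosen for this illustration;
      they are not part of the HarfBuzz API.
    </para>
    <programlisting language="python"><![CDATA[
```python
def ot_tag(tag):
    """Pack a four-character OpenType tag into a 32-bit integer."""
    if len(tag) > 4:
        raise ValueError("OpenType tags are at most four characters long")
    # Tags shorter than four characters are padded with trailing spaces.
    padded = tag.ljust(4).encode("ascii")
    return int.from_bytes(padded, "big")


def tag_to_str(value):
    """Recover the four-character string from a packed tag."""
    return value.to_bytes(4, "big").decode("ascii")


# Hypothetical description of one run of text waiting to be shaped.
run = {
    "text": "\u0643\u062A\u0627\u0628",
    "script": ot_tag("arab"),    # OpenType script tag for Arabic
    "language": ot_tag("ARA"),   # language-system tag, stored as "ARA "
}
```
]]></programlisting>
    <para>
      Storing tags as integers makes comparisons cheap, while
      <literal>tag_to_str</literal> recovers the readable form when
      needed.
    </para>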
  </section>

  <section id="why-do-i-need-a-shaping-engine">
    <title>Why do I need a shaping engine?</title>
    <para>
      Text shaping is an integral part of preparing text for
      display. Before a Unicode sequence can be rendered, the
      codepoints in the sequence must be mapped to the corresponding
      glyphs provided in the font, and those glyphs must be positioned
      correctly relative to each other. For many of the scripts
      supported in Unicode, these steps involve script-specific layout
      rules, including complex joining, reordering, and positioning
      behavior. Implementing these rules is the job of the shaping engine.
    </para>
    <para>
      Text shaping is a fairly low-level operation. HarfBuzz is
      used directly by text-handling libraries like <ulink
      url="https://www.pango.org/">Pango</ulink>, as well as by the layout
      engines in Firefox, LibreOffice, and Chromium. Unless you are
      <emphasis>writing</emphasis> one of these layout engines
      yourself, you will probably not need to use HarfBuzz directly: normally,
      a layout engine, toolkit, or other library will turn text into
      glyphs for you.
    </para>
    <para>
      However, if you <emphasis>are</emphasis> writing a layout engine
      or graphics library yourself, then you will need to perform text
      shaping, and this is where HarfBuzz can help you.
    </para>
    <para>
      Here are some specific scenarios where a text-shaping engine
      like HarfBuzz helps you:
    </para>
    <itemizedlist>
      <listitem>
        <para>
          OpenType fonts contain a set of glyphs (that is, shapes
          to represent the letters, numbers, punctuation marks, and
          all other symbols), which are indexed by a <literal>glyph ID</literal>.
        </para>
        <para>
          A particular glyph ID within the font does not necessarily
          correlate to a predictable Unicode codepoint. For instance,
          some fonts have the letter "a" as glyph ID 1, but
          many others do not. In order to retrieve the right glyph
          from the font to display "a", you need to consult
          the table inside the font (the <literal>cmap</literal>
          table) that maps Unicode codepoints to glyph IDs. In other
          words, <emphasis>text shaping turns codepoints into glyph
          IDs</emphasis>.
        </para>
      </listitem>
      <listitem>
        <para>
          Many OpenType fonts contain ligatures: combinations of
          characters that are rendered as a single unit. For instance,
          it is common for the "f, i" letter
          sequence to appear in print as the single ligature glyph
          "ﬁ".
        </para>
        <para>
          Whether you should render an "f, i" sequence
          as <literal>fi</literal> or as "ﬁ" does not
          depend on the input text. Instead, it depends on whether
          or not the font includes an "ﬁ" glyph and on the
          level of ligature application you wish to perform. The font
          and the amount of ligature application used are under your
          control. In other words, <emphasis>text shaping involves
          querying the font's ligature tables and determining what
          substitutions should be made</emphasis>.
        </para>
      </listitem>
      <listitem>
        <para>
          While ligatures like "ﬁ" are optional typographic
          refinements, some languages <emphasis>require</emphasis> certain
          substitutions to be made in order to display text correctly.
        </para>
        <para>
          For example, in Tamil, when the letter "TTA" (ட)
          is followed by the vowel sign "U" (ு), the pair
          must be replaced by the single glyph "டு". The
          sequence of Unicode characters "ட,ு" needs to be
          substituted with a single "டு" glyph from the
          font.
        </para>
        <para>
          But "டு" does not have a Unicode codepoint. To
          find this glyph, you need to consult the table inside
          the font (the <literal>GSUB</literal> table) that contains
          substitution information. In other words, <emphasis>text shaping
          chooses the correct glyph for a sequence of characters</emphasis>.
        </para>
      </listitem>
      <listitem>
        <para>
          Similarly, an Arabic letter can have up to four different variants
          corresponding to the different positions it might appear in
          within a sequence. Inside a font, there will be separate
          glyphs for the initial, medial, final, and isolated forms of
          such a letter, each at a different glyph ID.
        </para>
        <para>
          Unicode only assigns one codepoint per character, so a
          Unicode string will not tell you which glyph variant to use
          for each character. To decide, you need to analyze the whole
          string and determine the appropriate glyph for each character
          based on its position. In other words, <emphasis>text
          shaping chooses the correct form of the letter by its
          position and returns the correct glyph from the font</emphasis>.
        </para>
      </listitem>
      <listitem>
        <para>
          Other languages involve marks and accents that need to be
          rendered in specific positions relative to a base character. For
          instance, the Moldovan language includes the Cyrillic letter
          "zhe" (ж) with a breve accent, like so: "ӂ".
        </para>
        <para>
          Some fonts will provide this character as a single
          zhe-with-breve glyph, but other fonts will not and, instead,
          will expect the rendering engine to form the character by
          superimposing the separate "ж" and "˘"
          glyphs.
        </para>
        <para>
          But exactly where you should draw the breve depends on the
          height and width of the preceding zhe glyph. To find the
          right position, you need to consult the table inside
          the font (the <literal>GPOS</literal> table) that contains
          positioning information.
          In other words, <emphasis>text shaping tells you whether you
          have a precomposed glyph within your font or if you need to
          compose a glyph yourself out of combining marks and,
          if so, where to position those marks.</emphasis>
        </para>
      </listitem>
    </itemizedlist>
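    <para>
      The first two scenarios in the list above (mapping codepoints to
      glyph IDs, then applying ligature substitutions) can be sketched
      in a few lines of Python. This is a toy model only: the
      <literal>CMAP</literal> and <literal>LIGATURES</literal>
      dictionaries and every glyph ID in them are invented for the
      example, and a real shaper reads the equivalent data from the
      font's binary <literal>cmap</literal> and <literal>GSUB</literal>
      tables.
    </para>
    <programlisting language="python"><![CDATA[
```python
# Toy font data. Every glyph ID and table entry here is invented for
# illustration; a real font stores this in its cmap and GSUB tables.
CMAP = {ord("f"): 71, ord("i"): 74, ord("n"): 79, ord("a"): 66}
LIGATURES = {(71, 74): 300}   # glyph "f" + glyph "i" -> one "fi" glyph
NOTDEF = 0                    # glyph ID 0 is the missing-glyph glyph


def map_codepoints(text):
    """Step 1: turn each codepoint into a glyph ID via the cmap."""
    return [CMAP.get(ord(ch), NOTDEF) for ch in text]


def apply_ligatures(glyphs, enabled=True):
    """Step 2: replace glyph pairs that have a ligature entry."""
    if not enabled:
        return list(glyphs)
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])
            i += 2          # both input glyphs are consumed
        else:
            out.append(glyphs[i])
            i += 1
    return out


glyphs = map_codepoints("fina")
shaped = apply_ligatures(glyphs)
```
]]></programlisting>
    <para>
      Here <literal>glyphs</literal> holds four glyph IDs and
      <literal>shaped</literal> holds three, because the "f, i" pair
      collapses into the single ligature glyph. Passing
      <literal>enabled=False</literal> models the case where the caller
      chooses not to apply ligatures.
    </para>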
    <para>
      If tasks like these are something that you need to do, then you
      need a text shaping engine. You could use Uniscribe if you are
      writing Windows software; you could use CoreText on macOS; or
      you could use HarfBuzz.
    </para>
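    <para>
      As one more illustration of the work a shaping engine takes off
      your hands, the Arabic scenario described earlier can be
      approximated as follows. The sketch assigns one of the OpenType
      feature tags <literal>isol</literal>, <literal>init</literal>,
      <literal>medi</literal>, or <literal>fina</literal> to each
      letter of a word. It is a deliberate simplification: the
      <literal>JOINING_TYPE</literal> table covers only eight letters
      and is hard-coded for the example, and the function name
      <literal>positional_forms</literal> is hypothetical. A production
      shaper handles many more joining classes and scripts.
    </para>
    <programlisting language="python"><![CDATA[
```python
# Joining behaviour for a handful of Arabic letters, hard-coded for this
# sketch. "D" = dual-joining (connects on both sides), "R" = right-joining
# (connects only to the preceding letter). The authoritative data is in
# the Unicode Character Database.
JOINING_TYPE = {
    "\u0628": "D",  # BEH
    "\u062A": "D",  # TEH
    "\u0643": "D",  # KAF
    "\u0645": "D",  # MEEM
    "\u0627": "R",  # ALEF
    "\u062F": "R",  # DAL
    "\u0631": "R",  # REH
    "\u0648": "R",  # WAW
}


def positional_forms(word):
    """Return one feature tag (isol/init/medi/fina) per letter."""
    forms = []
    for i, ch in enumerate(word):
        jt = JOINING_TYPE.get(ch)
        prev_jt = JOINING_TYPE.get(word[i - 1]) if i > 0 else None
        next_jt = JOINING_TYPE.get(word[i + 1]) if i + 1 != len(word) else None
        # A letter connects backwards only if the previous letter is
        # dual-joining.
        joins_prev = jt in ("D", "R") and prev_jt == "D"
        # A letter connects forwards only if it is dual-joining itself
        # and the next letter is able to join at all.
        joins_next = jt == "D" and next_jt in ("D", "R")
        if joins_prev and joins_next:
            forms.append("medi")
        elif joins_prev:
            forms.append("fina")
        elif joins_next:
            forms.append("init")
        else:
            forms.append("isol")
    return forms


kitab = "\u0643\u062A\u0627\u0628"  # KAF, TEH, ALEF, BEH ("book")
forms = positional_forms(kitab)
```
]]></programlisting>
    <para>
      For this word the result is <literal>init</literal>,
      <literal>medi</literal>, <literal>fina</literal>,
      <literal>isol</literal>: the final BEH stands alone because the
      ALEF before it does not connect forwards.
    </para>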
    <para>
      The rest of this manual assumes that the reader
      is the implementor of a text-layout engine.
    </para>
  </section>

  <section id="what-does-harfbuzz-do">
    <title>What does HarfBuzz do?</title>
    <para>
      HarfBuzz provides text shaping through a cross-platform
      C API that accepts sequences of Unicode codepoints as input. Currently,
      the following OpenType shaping models are supported:
    </para>
    <itemizedlist>
      <listitem>
        <para>
          Indic (covering Devanagari, Bengali, Gujarati,
          Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu,
          among others)
        </para>
      </listitem>
      <listitem>
        <para>
          Arabic (covering Arabic, N'Ko, Syriac, and Mongolian)
        </para>
      </listitem>
      <listitem>
        <para>
          The Universal Shaping Engine or <emphasis>USE</emphasis>
          (covering complex scripts not covered by the above shaping
          models)
        </para>
      </listitem>
      <listitem>
        <para>
          A default shaping model for non-complex scripts
          (covering Latin, Cyrillic, Greek, Armenian, Georgian, Tifinagh,
          and others)
        </para>
      </listitem>
      <listitem>
        <para>
          Emoji (including emoji modifier sequences, flag sequences,
          and others)
        </para>
      </listitem>
    </itemizedlist>
    <para>
      In addition to OpenType shaping, HarfBuzz supports the latest
      version of Graphite shaping (the "Graphite 2" model) and AAT
      shaping.
    </para>
    <para>
      HarfBuzz can read and understand TrueType fonts (.ttf), TrueType
      collections (.ttc), and OpenType fonts (.otf, including those
      fonts that contain TrueType-style outlines and those that
      contain PostScript CFF or CFF2 outlines).
    </para>
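    <para>
      The container formats listed above can be told apart by the first
      four bytes of the font data. The sketch below illustrates that
      idea only; it is not how HarfBuzz itself loads fonts, and the
      names <literal>SFNT_KINDS</literal> and
      <literal>sniff_font</literal> are hypothetical. The four
      signatures are the ones defined for OpenType and TrueType font
      files.
    </para>
    <programlisting language="python"><![CDATA[
```python
# Leading four-byte signatures of font files and what they indicate.
SFNT_KINDS = {
    b"\x00\x01\x00\x00": "TrueType outlines",
    b"true": "TrueType outlines (legacy Apple tag)",
    b"OTTO": "CFF or CFF2 outlines",
    b"ttcf": "font collection",
}


def sniff_font(data):
    """Classify font data by its first four bytes, or return None."""
    return SFNT_KINDS.get(bytes(data[:4]))
```
]]></programlisting>
    <para>
      A caller would typically read the first few bytes of a file and
      pass them to <literal>sniff_font</literal> before deciding how to
      load the font; unrecognized data yields <literal>None</literal>.
    </para>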
    <para>
      HarfBuzz is designed and tested to run on top of the FreeType
      font renderer. It can run on Linux, Android, Windows, and macOS,
      among other systems.
    </para>
325 functions for accessing other font features, including optional
326 GSUB and GPOS OpenType features, as well as
327 all color-font formats (<literal>CBDT</literal>,
328 <literal>sbix</literal>, <literal>COLR/CPAL</literal>, and
329 <literal>SVG-OT</literal>) and OpenType variable fonts. HarfBuzz
330 also includes a font-subsetting feature. HarfBuzz can perform
331 some low-level math-shaping operations, although it does not
332 currently perform full shaping for mathematical typesetting.
336 A suite of command-line utilities is also provided in the
337 source-code tree, designed to help users test and debug
338 HarfBuzz's features on real-world fonts and input.
  </section>

  <section id="what-harfbuzz-doesnt-do">
    <title>What HarfBuzz doesn't do</title>
    <para>
      HarfBuzz will take a Unicode string, shape it, and give you the
      information required to lay it out correctly on a single
      horizontal (or vertical) line using the font provided. That is the
      extent of HarfBuzz's responsibility.
    </para>
    <para>
      It is important to note that if you are implementing a complete
      text-layout engine you may have other responsibilities that
      HarfBuzz will <emphasis>not</emphasis> help you with. For example:
    </para>
    <itemizedlist>
      <listitem>
        <para>
          HarfBuzz won't help you with bidirectionality. If you want to
          lay out text that includes a mix of Hebrew and English, you
          will need to ensure that each buffer provided to HarfBuzz
          has all of its characters in the same order and that the
          directionality of the buffer is set correctly. This may mean
          segmenting the text before it is placed into HarfBuzz buffers. In
          other words, the user will hit the keys in the following
          sequence:
        </para>
        <programlisting>
A B C [space] ג ב א [space] D E F
        </programlisting>
        <para>
          but will expect to see the Hebrew letters displayed from
          right to left in the output, between the two English words.
        </para>
        <para>
          This reordering is called <emphasis>bidi processing</emphasis>
          ("bidi" is short for bidirectional), and there's an
          algorithm as an annex to the Unicode Standard which tells you how
          to process a string of mixed directionality.
          Before sending your string to HarfBuzz, you may need to apply the
          bidi algorithm to it. Libraries such as <ulink
          url="http://icu-project.org/">ICU</ulink> and <ulink
          url="http://fribidi.org/">fribidi</ulink> can do this for you.
        </para>
      </listitem>
      <listitem>
        <para>
          HarfBuzz won't help you with text that contains different font
          properties. For instance, if you have the string "a
          <emphasis>huge</emphasis> breakfast", and you expect
          "huge" to be italic, then you will need to send three
          strings to HarfBuzz: <literal>a</literal>, in your Roman font;
          <literal>huge</literal> using your italic font; and
          <literal>breakfast</literal> using your Roman font again.
        </para>
        <para>
          Similarly, if you change the font, font size, script,
          language, or direction within your string, then you will
          need to shape each run independently and output them
          independently. HarfBuzz expects to shape a run of characters
          that all share the same properties.
        </para>
      </listitem>
      <listitem>
        <para>
          HarfBuzz won't help you with line breaking, hyphenation, or
          justification. As mentioned above, HarfBuzz lays out the string
          along a <emphasis>single line</emphasis> of, notionally,
          infinite length. If you want to find out where the potential
          word, sentence, and line break points are in your text, you
          could use the ICU library's break iterator functions.
        </para>
        <para>
          HarfBuzz can tell you how wide a shaped piece of text is, which is
          useful input to a justification algorithm, but it knows nothing
          about paragraphs, lines, or line lengths. Nor will it adjust the
          space between words to fit them proportionally into a line.
        </para>
      </listitem>
    </itemizedlist>
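    <para>
      The run-splitting requirement described in the list above can be
      sketched as follows. The Python example cuts a string into runs
      that share one font and one direction, which is the kind of
      segmentation a layout engine performs before handing each run to
      the shaper. It is a rough approximation under stated assumptions:
      direction is guessed from each character's Unicode bidi class,
      neutral characters simply inherit the direction of the character
      before them, and the paragraph is assumed to start left-to-right.
      It is not an implementation of the Unicode bidi algorithm, and the
      function names and font names are hypothetical.
    </para>
    <programlisting language="python"><![CDATA[
```python
import unicodedata
from itertools import groupby


def direction_of(ch, previous):
    """Classify a character as 'ltr' or 'rtl' from its bidi class.

    Characters without a strong direction (spaces, digits,
    punctuation) inherit the direction of the preceding character.
    The real bidi algorithm resolves them with far more care.
    """
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in ("R", "AL"):
        return "rtl"
    if bidi_class == "L":
        return "ltr"
    return previous


def itemize(text, fonts):
    """Split text into runs that share one font and one direction.

    `fonts` holds one font name per character of `text`.
    """
    keys = []
    direction = "ltr"  # assumed paragraph direction
    for ch, font in zip(text, fonts):
        direction = direction_of(ch, direction)
        keys.append((font, direction))
    runs = []
    start = 0
    # Consecutive characters with the same (font, direction) key
    # form one run.
    for key, group in groupby(keys):
        length = len(list(group))
        runs.append((text[start:start + length], key[0], key[1]))
        start += length
    return runs


text = "a huge \u05D0\u05D1\u05D2"
fonts = ["Roman"] * 2 + ["Italic"] * 4 + ["Roman"] * 4
runs = itemize(text, fonts)
```
]]></programlisting>
    <para>
      For this input, <literal>runs</literal> contains four entries: a
      Roman left-to-right run, the italic word, a Roman space, and a
      Roman right-to-left run holding the Hebrew letters. Each entry
      would be shaped separately.
    </para>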
    <para>
      If you are implementing a layout engine, HarfBuzz will help you
      with the interface between your text and your font, and that's
      something that you'll need. What you then do with the glyphs that
      your font returns is up to you.
    </para>
  </section>

  <section id="why-is-it-called-harfbuzz">
    <title>Why is it called HarfBuzz?</title>
    <para>
      HarfBuzz began its life as text-shaping code within the FreeType
      project (and you will see references to the FreeType authors
      within the source code copyright declarations), but was then
      extracted out to its own project. This project is maintained by
      Behdad Esfahbod, who named it HarfBuzz. Originally, it was a
      shaping engine for OpenType fonts: "HarfBuzz" is
      the Persian for "open type".
    </para>
  </section>
</chapter>