<chapter id="what-is-harfbuzz">
<title>What is HarfBuzz?</title>
<para>
- HarfBuzz is a <emphasis>text shaping engine</emphasis>. It solves
- the problem of selecting and positioning glyphs from a font given a
- Unicode string.
+ HarfBuzz is a <emphasis>text shaping engine</emphasis>. If you
+ give HarfBuzz a font and a string containing a sequence of Unicode
+ codepoints, HarfBuzz selects and positions the corresponding
+ glyphs from the font, applying all of the necessary layout rules
+ and font features. HarfBuzz then returns the string to you as a
+ sequence of positioned glyphs, correctly arranged for the language
+ and writing system.
</para>
- <section id="why-do-i-need-it">
- <title>Why do I need it?</title>
+ <para>
+ HarfBuzz can properly shape all of the world's major writing
+ systems. It runs on virtually all operating systems and software
+ platforms, and it supports all of the standard font formats in use
+ today.
+ </para>
+ <section id="why-do-i-need-a-shaping-engine">
+ <title>Why do I need a shaping engine?</title>
<para>
- Text shaping is an integral part of preparing text for display. It
- is a fairly low level operation; HarfBuzz is used directly by
- graphic rendering libraries such as Pango, and the layout engines
- in Firefox, LibreOffice and Chromium. Unless you are
- <emphasis>writing</emphasis> one of these layout engines yourself,
- you will probably not need to use HarfBuzz - normally higher level
- libraries will turn text into glyphs for you.
+ Text shaping is an integral part of preparing text for
+ display. Before a Unicode sequence can be rendered, the
+ codepoints in the sequence must be mapped to the glyphs
+ provided in the font, and the glyphs must be positioned
+ correctly relative to each other. For many of the scripts
+ supported in Unicode, these steps involve script-specific layout
+ rules.
+ </para>
+ <para>
+ Text shaping is a fairly low-level operation. HarfBuzz is
+ used directly by graphic rendering libraries such as Pango, as
+ well as by the layout engines in Firefox, LibreOffice, and
+ Chromium. Unless you are <emphasis>writing</emphasis> one of
+ these layout engines yourself, you will probably not need to use
+ HarfBuzz: normally, higher-level libraries will turn text into
+ glyphs for you.
</para>
<para>
However, if you <emphasis>are</emphasis> writing a layout engine
or graphics library yourself, you will need to perform text
- shaping, and this is where HarfBuzz can help you. Here are some
- reasons why you need it:
+ shaping, and this is where HarfBuzz can help you.
+ </para>
+ <para>
+ Here are some specific scenarios where a text-shaping engine
+ like HarfBuzz helps you:
</para>
<itemizedlist>
<listitem>
<para>
- OpenType fonts contain a set of glyphs, indexed by glyph ID.
- The glyph ID within the font does not necessarily relate to a
- Unicode codepoint. For instance, some fonts have the letter
- "a" as glyph ID 1. To pull the right glyph out of
- the font in order to display it, you need to consult a table
- within the font (the "cmap" table) which maps
- Unicode codepoints to glyph IDs. Text shaping turns codepoints
- into glyph IDs.
+ OpenType fonts contain a set of glyphs (that is, shapes
+ to represent the letters, numbers, punctuation marks, and
+ all other symbols), which are indexed by a <literal>glyph ID</literal>.
+ </para>
+ <para>
+ The glyph ID within the font does not necessarily correlate
+ to a predictable Unicode codepoint. For instance, some fonts
+ have the letter "a" as glyph ID 1, but many others do
+ not. To pull the right glyph out of the font in order to
+ display "a", you need to consult the table inside
+ the font (the <literal>cmap</literal> table) that maps Unicode
+ codepoints to glyph IDs. In other words, <emphasis>text shaping turns
+ codepoints into glyph IDs</emphasis>.
</para>
</listitem>
<listitem>
<para>
Many OpenType fonts contain ligatures: combinations of
- characters which are rendered together. For instance, it's
- common for the <literal>fi</literal> combination to appear in
- print as the single ligature "fi". Whether you should
- render text as <literal>fi</literal> or "fi" does not
- depend on the input text, but on the capabilities of the font
- and the level of ligature application you wish to perform.
- Text shaping involves querying the font's ligature tables and
- determining what substitutions should be made.
+ characters that are rendered as a single unit. For instance,
+ it is common for the <literal>fi</literal> letter
+ combination to appear in print as the single ligature glyph
+ "fi".
+ </para>
+ <para>
+ Whether you should render an "f, i" sequence
+ as <literal>fi</literal> or as "fi" does not
+ depend on the input text. Rather, it depends on whether
+ or not the font includes an "fi" glyph and on the
+ level of ligature application you wish to perform. The font
+ and the amount of ligature application used are under your
+ control. In other words, <emphasis>text shaping involves
+ querying the font's ligature tables and determining what
+ substitutions should be made</emphasis>.
</para>
</listitem>
<listitem>
<para>
- While ligatures like "fi" are typographic
- refinements, some languages <emphasis>require</emphasis> such
+ While ligatures like "fi" are optional typographic
+ refinements, some languages <emphasis>require</emphasis> certain
substitutions to be made in order to display text correctly.
- In Tamil, when the letter "TTA" (ட) letter is
- followed by "U" (உ), the combination should appear
- as the single glyph "டு". The sequence of Unicode
- characters "டஉ" needs to be rendered as a single
- glyph from the font - text shaping chooses the correct glyph
- from the sequence of characters provided.
+ </para>
+ <para>
+ For example, in Tamil, when the letter "TTA" (ட)
+ is followed by the vowel sign "U" (ு), the pair
+ must be replaced by the single glyph "டு". The
+ sequence of Unicode characters "ட, ு" needs to be
+ substituted with a single "டு" glyph from the
+ font.
+ </para>
+ <para>
+ But "டு" does not have a Unicode codepoint. To
+ find this glyph, you need to consult the table inside
+ the font (the <literal>GSUB</literal> table) that contains
+ substitution information. In other words, <emphasis>text shaping
+ chooses the correct glyph for a sequence of characters
+ provided</emphasis>.
</para>
</listitem>
<listitem>
<para>
- Similarly, each Arabic character has four different variants:
- within a font, there will be glyphs for the initial, medial,
- final, and isolated forms of each letter. Unicode only encodes
- one codepoint per character, and so a Unicode string will not
- tell you which glyph to use. Text shaping chooses the correct
- form of the letter and returns the correct glyph from the font
- that you need to render.
+ Similarly, each Arabic character has four different variants
+ corresponding to the different positions it might appear in
+ within a sequence. Inside a font, there will be separate
+ glyphs for the initial, medial, final, and isolated forms of
+ each letter, each at a different glyph ID.
+ </para>
+ <para>
+ Unicode only assigns one codepoint per character, so a
+ Unicode string will not tell you which glyph variant to use
+ for each character. To decide, you need to analyze the whole
+ string and determine the appropriate glyph for each character
+ based on its position. In other words, <emphasis>text
+ shaping chooses the correct form of the letter by its
+ position and returns the correct glyph from the font</emphasis>.
</para>
</listitem>
<listitem>
<para>
- Other languages have marks and accents which need to be
- rendered in certain positions around a base character. For
- instance, the Moldovan language has the Cyrillic letter
- "zhe" (ж) with a breve accent, like so: ӂ. Some
- fonts will contain this character as an individual glyph,
- whereas other fonts will not contain a zhe-with-breve glyph
- but expect the rendering engine to form the character by
- overlaying the two glyphs ж and ˘. Where you should draw the
- combining breve depends on the height of the preceding glyph.
- Again, for Arabic, the correct positioning of vowel marks
- depends on the height of the character on which you are
- placing the mark. Text shaping tells you whether you have a
+ Other languages involve marks and accents that need to be
+ rendered in specific positions relative to a base character. For
+ instance, the Moldovan language includes the Cyrillic letter
+ "zhe" (ж) with a breve accent, like so: "ӂ".
+ </para>
+ <para>
+ Some fonts will provide this character as a single
+ zhe-with-breve glyph, but other fonts will not and, instead,
+ will expect the rendering engine to form the character by
+ superimposing the separate "ж" and "˘"
+ glyphs.
+ </para>
+ <para>
+ But exactly where you should draw the breve depends on the
+ height and width of the preceding zhe glyph. To find the
+ right position, you need to consult the table inside
+ the font (the <literal>GPOS</literal> table) that contains
+ positioning information.
+ In other words, <emphasis>text shaping tells you whether you have a
precomposed glyph within your font or if you need to compose a
- glyph yourself out of combining marks, and if so, where to
- position those marks.
+ glyph yourself out of combining marks—and, if so, where to
+ position those marks.</emphasis>
</para>
</listitem>
</itemizedlist>
<para>
- If this is something that you need to do, then you need a text
- shaping engine: you could use Uniscribe if you are using Windows;
- you could use CoreText on OS X; or you could use HarfBuzz. In the
- rest of this manual, we are going to assume that you are the
- implementor of a text layout engine.
+ If you need to perform tasks like these, then you need a text
+ shaping engine. You could use Uniscribe if you are writing
+ Windows software; you could use CoreText on macOS; or you could
+ use HarfBuzz.
+ </para>
+ <para>
+ In the rest of this manual, we are going to assume that you are the
+ implementor of a text-layout engine.
</para>
</section>
<section id="why-is-it-called-harfbuzz">
<title>Why is it called HarfBuzz?</title>
<para>
- HarfBuzz began its life as text shaping code within the FreeType
- project, (and you will see references to the FreeType authors
- within the source code copyright declarations) but was then
- abstracted out to its own project. This project is maintained by
+ HarfBuzz began its life as text-shaping code within the FreeType
+ project (and you will see references to the FreeType authors
+ within the source code copyright declarations), but was then
+ extracted into its own project. This project is maintained by
Behdad Esfahbod, and named HarfBuzz. Originally, it was a shaping
engine for OpenType fonts - "HarfBuzz" is the Persian
for "open type".
</para>
</section>
-</chapter>
\ No newline at end of file
+</chapter>