doc/libunistring.info

   1 This is libunistring.info, produced by makeinfo version 5.1 from
   2 libunistring.texi.
   3
   4 INFO-DIR-SECTION Software development
   5 START-INFO-DIR-ENTRY
   6 * GNU libunistring: (libunistring).     Unicode string library.
   7 END-INFO-DIR-ENTRY
   8
   9    This manual is for GNU libunistring.
  10
  11 \1f
  12 File: libunistring.info,  Node: Top,  Next: Introduction,  Up: (dir)
  13
  14 GNU libunistring
  15 ****************
  16
  17 * Menu:
  18
  19 * Introduction::                Who may need Unicode strings?
  20 * Conventions::                 Conventions used in this manual
  21 * unitypes.h::                  Elementary types
  22 * unistr.h::                    Elementary Unicode string functions
  23 * uniconv.h::                   Conversions between Unicode and encodings
  24 * unistdio.h::                  Output with Unicode strings
  25 * uniname.h::                   Names of Unicode characters
  26 * unictype.h::                  Unicode character classification and properties
  27 * uniwidth.h::                  Display width
  28 * unigbrk.h::                   Grapheme cluster breaking
  29 * uniwbrk.h::                   Word breaks in strings
  30 * unilbrk.h::                   Line breaking
  31 * uninorm.h::                   Normalization forms
  32 * unicase.h::                   Case mappings
  33 * uniregex.h::                  Regular expressions
  34 * Using the library::           How to link with the library and use it?
  35 * More functionality::          More advanced functionality
  36 * Licenses::                    Licenses
  37
  38 * Index::                       General Index
  39
  40  — The Detailed Node Listing —
  41
  42 Introduction
  43
  44 * Unicode::                     What is Unicode?
  45 * Unicode and i18n::            Unicode and internationalization
  46 * Locale encodings::            What is a locale encoding?
  47 * In-memory representation::    How to represent strings in memory?
  48 * char * strings::              What to keep in mind with ‘char *’ strings
  49 * The wchar_t mess::            Why ‘wchar_t *’ strings are useless
  50 * Unicode strings::             How are Unicode strings represented?
  51
  52 unistr.h
  53
  54 * Elementary string checks::
  55 * Elementary string conversions::
  56 * Elementary string functions::
  57 * Elementary string functions with memory allocation::
  58 * Elementary string functions on NUL terminated strings::
  59
  60 unictype.h
  61
  62 * General category::
  63 * Canonical combining class::
  64 * Bidi class::
  65 * Decimal digit value::
  66 * Digit value::
  67 * Numeric value::
  68 * Mirrored character::
  69 * Arabic shaping::
  70 * Properties::
  71 * Scripts::
  72 * Blocks::
  73 * ISO C and Java syntax::
  74 * Classifications like in ISO C::
  75
  76 General category
  77
  78 * Object oriented API::
  79 * Bit mask API::
  80
  81 Properties
  82
  83 * Properties as objects::
  84 * Properties as functions::
  85
  86 unigbrk.h
  87
  88 * Grapheme cluster breaks in a string::
  89 * Grapheme cluster break property::
  90
  91 uniwbrk.h
  92
  93 * Word breaks in a string::
  94 * Word break property::
  95
  96 uninorm.h
  97
  98 * Decomposition of characters::
  99 * Composition of characters::
 100 * Normalization of strings::
 101 * Normalizing comparisons::
 102 * Normalization of streams::
 103
 104 unicase.h
 105
 106 * Case mappings of characters::
 107 * Case mappings of strings::
 108 * Case mappings of substrings::
 109 * Case insensitive comparison::
 110 * Case detection::
 111
 112 Using the library
 113
 114 * Installation::
 115 * Compiler options::
 116 * Include files::
 117 * Autoconf macro::
 118 * Reporting problems::
 119
 120 Licenses
 121
 122 * GNU GPL::                     GNU General Public License
 123 * GNU LGPL::                    GNU Lesser General Public License
 124 * GNU FDL::                     GNU Free Documentation License
 125
 126
 127 \1f
 128 File: libunistring.info,  Node: Introduction,  Next: Conventions,  Prev: Top,  Up: Top
 129
 130 1 Introduction
 131 **************
 132
 133    This library provides functions for manipulating Unicode strings and
 134 for manipulating C strings according to the Unicode standard.
 135
 136    It consists of the following parts:
 137
 138 ‘<unistr.h>’
 139      elementary string functions
 140 ‘<uniconv.h>’
 141      conversion from/to legacy encodings
 142 ‘<unistdio.h>’
 143      formatted output to strings
 144 ‘<uniname.h>’
 145      character names
 146 ‘<unictype.h>’
 147      character classification and properties
 148 ‘<uniwidth.h>’
 149      string width when using nonproportional fonts
 150 ‘<unigbrk.h>’
 151      grapheme cluster breaks
 152 ‘<uniwbrk.h>’
 153      word breaks
 154 ‘<unilbrk.h>’
 155      line breaking algorithm
 156 ‘<uninorm.h>’
 157      normalization (composition and decomposition)
 158 ‘<unicase.h>’
 159      case folding
 160 ‘<uniregex.h>’
 161      regular expressions (not yet implemented)
 162
 163    libunistring is for you if your application involves non-trivial text
 164 processing, such as upper/lower case conversions, line breaking,
 165 operations on words, or more advanced analysis of text.  Text provided
 166 by the user can, in general, contain characters of all kinds of scripts.
 167 The text processing functions provided by this library handle all
 168 scripts and all languages.
 169
 170    libunistring is for you if your application already uses the ISO C /
 171 POSIX ‘<ctype.h>’, ‘<wctype.h>’ functions and the text it operates on is
 172 provided by the user and can be in any language.
 173
 174    libunistring is also for you if your application uses Unicode strings
 175 as internal in-memory representation.
 176
 177 * Menu:
 178
 179 * Unicode::                     What is Unicode?
 180 * Unicode and i18n::            Unicode and internationalization
 181 * Locale encodings::            What is a locale encoding?
 182 * In-memory representation::    How to represent strings in memory?
 183 * char * strings::              What to keep in mind with ‘char *’ strings
 184 * The wchar_t mess::            Why ‘wchar_t *’ strings are useless
 185 * Unicode strings::             How are Unicode strings represented?
 186
 187 \1f
 188 File: libunistring.info,  Node: Unicode,  Next: Unicode and i18n,  Up: Introduction
 189
 190 1.1 Unicode
 191 ===========
 192
 193    Unicode is a standardized repertoire of characters that contains
 194 characters from all scripts of the world, from Latin letters to Chinese
 195 ideographs and Babylonian cuneiform glyphs.  It also specifies how these
 196 characters are to be rendered on a screen or on paper, and how common
 197 text processing (word selection, line breaking, uppercasing of page
 198 titles etc.)  is supposed to behave on Unicode text.
 199
 200    Unicode also specifies three ways of storing sequences of Unicode
 201 characters in a computer whose basic unit of data is an 8-bit byte:
 202 UTF-8
 203      Every character is represented as 1 to 4 bytes.
 204 UTF-16
 205      Every character is represented as 1 to 2 units of 16 bits.
 206 UTF-32, a.k.a. UCS-4
 207      Every character is represented as 1 unit of 32 bits.
 208
 209    For encoding Unicode text in a file, UTF-8 is usually used.  For
 210 encoding Unicode strings in memory for a program, any of the three
 211 encoding forms can be reasonably used.
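
   The unit counts listed above can be illustrated with a small,
self-contained sketch.  It is not part of libunistring; the function
names 'is_scalar_value', 'utf8_units', 'utf16_units' and 'utf32_units'
are hypothetical and chosen only for this illustration.  The sketch
assumes that the argument is a Unicode scalar value, that is, a code
point up to U+10FFFF that is not a surrogate.

```c
#include <assert.h>
#include <stdint.h>

/* True if uc is a Unicode scalar value: at most U+10FFFF and not a
   surrogate (U+D800..U+DFFF).  Hypothetical helper for this sketch.  */
int is_scalar_value(uint32_t uc)
{
    return uc <= 0x10FFFF && !(uc >= 0xD800 && uc <= 0xDFFF);
}

/* Number of 8-bit units (bytes) that uc occupies in UTF-8: 1 to 4.  */
int utf8_units(uint32_t uc)
{
    assert(is_scalar_value(uc));
    if (uc < 0x80)
        return 1;
    if (uc < 0x800)
        return 2;
    if (uc < 0x10000)
        return 3;
    return 4;
}

/* Number of 16-bit units that uc occupies in UTF-16: 1 or 2.  */
int utf16_units(uint32_t uc)
{
    assert(is_scalar_value(uc));
    return uc < 0x10000 ? 1 : 2;
}

/* Number of 32-bit units that uc occupies in UTF-32: always 1.  */
int utf32_units(uint32_t uc)
{
    assert(is_scalar_value(uc));
    return 1;
}
```

   For example, U+20AC (EURO SIGN) needs 3 units in UTF-8 and 1 unit in
UTF-16, whereas U+1F600 needs 4 units in UTF-8 and 2 units in UTF-16.
Both need exactly 1 unit in UTF-32.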
 212
 213    Unicode is widely used on the web.  Prior to the use of Unicode, web
 214 pages were in many different encodings (ISO-8859-1 for English, French,
 215 Spanish, ISO-8859-2 for Polish, ISO-8859-7 for Greek, KOI8-R for
 216 Russian, GB2312 or BIG5 for Chinese, ISO-2022-JP-2 or EUC-JP or
 217 Shift_JIS for Japanese, and many many others).  It was next to
 218 impossible to create a document that contained Chinese and Polish text
 219 in the same document.  Due to the many encodings for Japanese, even the
 220 processing of pure Japanese text was error prone.
 221
 222    References:
 223    • The Unicode standard: <http://www.unicode.org/>
 224    • Definition of UTF-8: <http://www.rfc-editor.org/rfc/rfc3629.txt>
 225    • Definition of UTF-16: <http://www.rfc-editor.org/rfc/rfc2781.txt>
 226    • Markus Kuhn’s UTF-8 and Unicode FAQ:
 227      <http://www.cl.cam.ac.uk/~mgk25/unicode.html>
 228
 229 \1f
 230 File: libunistring.info,  Node: Unicode and i18n,  Next: Locale encodings,  Prev: Unicode,  Up: Introduction
 231
 232 1.2 Unicode and Internationalization
 233 ====================================
 234
 235    Internationalization is the process of changing the source code of a
 236 program so that it can meet the expectations of users in any culture, if
 237 culture specific data (translations, images etc.)  are provided.
 238
 239    Use of Unicode is not strictly required for internationalization, but
 240 it makes internationalization much easier, because operations that need
 241 to look at specific characters (like hyphenation, spell checking, or the
 242 automatic conversion of double-quotes to opening and closing
 243 double-quote characters) don’t need to consider multiple possible
 244 encodings of the text.
 245
 246    Use of Unicode also enables multilingualization: the ability of
 247 having text in multiple languages present in the same document or even
 248 in the same line of text.
 249
 250    But use of Unicode is not everything.  Internationalization usually
 251 consists of three features:
 252    • Use of Unicode where needed for text processing.  This is what this
 253      library is for.
 254    • Use of message catalogs for messages shown to the user.  This is
 255      what GNU gettext is about.
 256    • Use of locale specific conventions for date and time formats, for
 257      numeric formatting, or for sorting of text.  This can be done
 258      adequately with the POSIX APIs and the implementation of locales in
 259      the GNU C library.
 260
 261 \1f
 262 File: libunistring.info,  Node: Locale encodings,  Next: In-memory representation,  Prev: Unicode and i18n,  Up: Introduction
 263
 264 1.3 Locale encodings
 265 ====================
 266
 267    A locale is a set of cultural conventions.  According to POSIX, for a
 268 program, at any moment, there is one locale being designated as the
 269 “current locale”.  (Actually, POSIX supports also one locale per thread,
 270 but this feature is not yet universally implemented and not widely
 271 used.)  The locale is partitioned into several aspects, called the
 272 “categories” of the locale.  The main aspects are:
 273    • The character encoding and the character properties.  This is the
 274      ‘LC_CTYPE’ category.
 275    • The sorting rules for text.  This is the ‘LC_COLLATE’ category.
 276    • The language specific translations of messages.  This is the
 277      ‘LC_MESSAGES’ category.
 278    • The formatting rules for numbers, such as the decimal separator.
 279      This is the ‘LC_NUMERIC’ category.
 280    • The formatting rules for amounts of money.  This is the
 281      ‘LC_MONETARY’ category.
 282    • The formatting of date and time.  This is the ‘LC_TIME’ category.
 283
 284    In particular, the ‘LC_CTYPE’ category of the current locale
 285 determines the character encoding.  This is the encoding of ‘char *’
 286 strings.  We also call it the “locale encoding”.  GNU libunistring has a
 287 function, ‘locale_charset’, that returns a standardized (platform
 288 independent) name for this encoding.
 289
 290    All locale encodings used on glibc systems are essentially ASCII
 291 compatible: Most graphic ASCII characters have the same representation,
 292 as a single byte, in that encoding as in ASCII.
 293
 294    Among the possible locale encodings are UTF-8 and GB18030.  Both
 295 can represent any Unicode character as a sequence of bytes.  UTF-8
 296 is used in most of the world, whereas GB18030 is used in the People’s
 297 Republic of China, because it is backward compatible with the GB2312
 298 encoding that was used in this country earlier.
 299
 300    The legacy locale encodings, ISO-8859-15 (which supplanted ISO-8859-1
 301 in most of Europe), ISO-8859-2, KOI8-R, EUC-JP, etc., are still in use
 302 in many places, though.
 303
 304    UTF-16 and UTF-32 are not used as locale encodings, because they are
 305 not ASCII compatible.
 306
 307 \1f
 308 File: libunistring.info,  Node: In-memory representation,  Next: char * strings,  Prev: Locale encodings,  Up: Introduction
 309
 310 1.4 Choice of in-memory representation of strings
 311 =================================================
 312
 313    There are three ways of representing strings in memory of a running
 314 program.
 315    • As ‘char *’ strings.  Such strings are represented in locale
 316      encoding.  This approach is employed when not much text processing
 317      is done by the program.  When some Unicode aware processing is to
 318      be done, a string is converted to Unicode on the fly and back to
 319      locale encoding afterwards.
 320    • As UTF-8 or UTF-16 or UTF-32 strings.  This implies that conversion
 321      from locale encoding to Unicode is performed on input, and in the
 322      opposite direction on output.  This approach is employed when the
 323      program does a significant amount of text processing, or when the
 324      program has multiple threads operating on the same data but in
 325      different locales.
 326    • As ‘wchar_t *’, a.k.a.  “wide strings”.  This approach is
 327      misguided, see *note The wchar_t mess::.
 328
 329 \1f
 330 File: libunistring.info,  Node: char * strings,  Next: The wchar_t mess,  Prev: In-memory representation,  Up: Introduction
 331
 332 1.5 ‘char *’ strings
 333 ====================
 334
 335    The classical C strings, with their C library support standardized by
 336 ISO C and POSIX, can be used in internationalized programs with some
 337 precautions.  The problem with this API is that many of the C library
 338 functions for strings don’t work correctly on strings in locale
 339 encodings, leading to bugs that only people in some cultures of the
 340 world will experience.
 341
 342    The first problem with the C library API is the support of multibyte
 343 locales.  According to the locale encoding, in general, every character
 344 is represented by one or more bytes (up to 4 bytes in practice — but use
 345 ‘MB_LEN_MAX’ instead of the number 4 in the code).  When every character
 346 is represented by only 1 byte, we speak of an “unibyte locale”,
 347 otherwise of a “multibyte locale”.  It is important to realize that the
 348 majority of Unix installations nowadays use UTF-8 or GB18030 as locale
 349 encoding; therefore, the majority of users are using multibyte locales.
 350
 351    The important fact to remember is:
 352    _A ‘char’ is a byte, not a character._
 353
 354    As a consequence:
 355    • The ‘<ctype.h>’ API is useless in this context; it does not work in
 356      multibyte locales.
 357    • The ‘strlen’ function does not return the number of characters in a
 358      string.  Nor does it return the number of screen columns occupied
 359      by a string after it is output.  It merely returns the number of
 360      _bytes_ occupied by a string.
 361    • Truncating a string, for example, with ‘strncpy’, can have the
 362      effect of truncating it in the middle of a multibyte character.
 363      Such a string will, when output, have a garbled character at its
 364      end, often represented by a hollow box.
 365    • ‘strchr’ and ‘strrchr’ do not work with multibyte strings if the
 366      locale encoding is GB18030 and the character to be searched is a
 367      digit.
 368    • ‘strstr’ does not work with multibyte strings if the locale
 369      encoding is different from UTF-8.
 370    • ‘strcspn’, ‘strpbrk’, ‘strspn’ cannot work correctly in multibyte
 371      locales: they assume the second argument is a list of single-byte
 372      characters.  Even in this simple case, they do not work with
 373      multibyte strings if the locale encoding is GB18030 and one of the
 374      characters to be searched is a digit.
 375    • ‘strsep’ and ‘strtok_r’ do not work with multibyte strings unless
 376      all of the delimiter characters are ASCII characters < 0x30.
 377    • The ‘strcasecmp’, ‘strncasecmp’, and ‘strcasestr’ functions do not
 378      work with multibyte strings.
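
   The difference between bytes and characters can be made concrete
with a minimal sketch.  It assumes that the string is valid UTF-8, which
is only one of the possible locale encodings; for arbitrary locale
encodings, the gnulib functions mentioned below are the appropriate
tool.  The function name 'utf8_count_chars' is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts the characters in a NUL terminated string that is assumed to
   be valid UTF-8.  Every byte except a continuation byte (bit pattern
   10xxxxxx) starts a new character, so counting those bytes counts the
   characters.  */
size_t utf8_count_chars(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (((unsigned char) *s & 0xC0) != 0x80)
            count++;
    return count;
}
```

   For the UTF-8 string "na\xc3\xafve" (the word "naive" written with
LATIN SMALL LETTER I WITH DIAERESIS), 'strlen' reports 6 bytes, while
this function reports 5 characters.  Neither number is the screen
width of the string.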
 379
 380    The workarounds can be found in GNU gnulib
 381 <http://www.gnu.org/software/gnulib/>.
 382    • gnulib has modules ‘mbchar’, ‘mbiter’, ‘mbuiter’ that represent
 383      multibyte characters and allow iterating across a multibyte string
 384      with the same ease as through a unibyte string.
 385    • gnulib has functions ‘mbslen’ and ‘mbswidth’ that can be used
 386      instead of ‘strlen’ when the number of characters or the number of
 387      screen columns of a string is requested.
 388    • gnulib has functions ‘mbschr’ and ‘mbsrrchr’ that are like ‘strchr’
 389      and ‘strrchr’, but work in multibyte locales.
 390    • gnulib has a function ‘mbsstr’, like ‘strstr’, but works in
 391      multibyte locales.
 392    • gnulib has functions ‘mbscspn’, ‘mbspbrk’, ‘mbsspn’ that are like
 393      ‘strcspn’, ‘strpbrk’, ‘strspn’, but work in multibyte locales.
 394    • gnulib has functions ‘mbssep’ and ‘mbstok_r’ that are like ‘strsep’
 395      and ‘strtok_r’ but work in multibyte locales.
 396    • gnulib has functions ‘mbscasecmp’, ‘mbsncasecmp’, ‘mbspcasecmp’,
 397      and ‘mbscasestr’ that are like ‘strcasecmp’, ‘strncasecmp’, and
 398      ‘strcasestr’, but work in multibyte locales.  Still, the function
 399      ‘ulc_casecmp’ is preferable to these functions; see below.
 400
 401    The second problem with the C library API is that it has some
 402 assumptions built-in that are not valid in some languages:
 403    • It assumes that there are only two forms of every character:
 404      uppercase and lowercase.  This is not true for Croatian, where the
 405      character LETTER DZ WITH CARON comes in three forms: LATIN CAPITAL
 406      LETTER DZ WITH CARON (DZ), LATIN CAPITAL LETTER D WITH SMALL LETTER
 407      Z WITH CARON (Dz), LATIN SMALL LETTER DZ WITH CARON (dz).
 408    • It assumes that uppercasing of 1 character leads to 1 character.
 409      This is not true for German, where the LATIN SMALL LETTER SHARP S,
 410      when uppercased, becomes ‘SS’.
 411    • It assumes that there is 1:1 mapping between uppercase and
 412      lowercase forms.  This is not true for the Greek sigma: GREEK
 413      CAPITAL LETTER SIGMA is the uppercase of both GREEK SMALL LETTER
 414      SIGMA and GREEK SMALL LETTER FINAL SIGMA.
 415    • It assumes that the upper/lowercase mappings are position
 416      independent.  This is not true for the Greek sigma and the
 417      Lithuanian i.
 418
 419    The correct way to deal with this problem is
 420   1. to provide functions for titlecasing, as well as for upper- and
 421      lowercasing,
 422   2. to view case transformations as functions that operate on strings,
 423      rather than on characters.
 424
 425    This is implemented in this library, through the functions declared
 426 in ‘<unicase.h>’, see *note unicase.h::.
 427
 428 \1f
 429 File: libunistring.info,  Node: The wchar_t mess,  Next: Unicode strings,  Prev: char * strings,  Up: Introduction
 430
 431 1.6 The ‘wchar_t’ mess
 432 ======================
 433
 434    The ISO C and POSIX standard creators made an attempt to fix the
 435 first problem mentioned in the previous section.  They introduced
 436    • a type ‘wchar_t’, designed to encapsulate an entire character,
 437    • a “wide string” type ‘wchar_t *’, and
 438    • functions declared in ‘<wctype.h>’ that were meant to supplant the
 439      ones in ‘<ctype.h>’.
 440
 441    Unfortunately, this API and its implementation have numerous problems:
 442
 443    • On AIX and Windows platforms, ‘wchar_t’ is a 16-bit type.  This
 444      means that it can never accommodate an entire Unicode character.
 445      Either the ‘wchar_t *’ strings are limited to characters in UCS-2
 446      (the “Basic Multilingual Plane” of Unicode), or — if ‘wchar_t *’
 447      strings are encoded in UTF-16 — a ‘wchar_t’ represents only half of
 448      a character in the worst case, making the ‘<wctype.h>’ functions
 449      pointless.
 450
 451    • On Solaris and FreeBSD, the ‘wchar_t’ encoding is locale dependent
 452      and undocumented.  This means, if you want to know any property of
 453      a ‘wchar_t’ character, other than the properties defined by
 454      ‘<wctype.h>’ — such as whether it’s a dash, currency symbol,
 455      paragraph separator, or similar —, you have to convert it to ‘char
 456      *’ encoding first, by use of the function ‘wctomb’.
 457
 458    • When you read a stream of wide characters, through the functions
 459      ‘fgetwc’ and ‘fgetws’, and when the input stream/file is not in the
 460      expected encoding, you have no way to determine the invalid byte
 461      sequence and do some corrective action.  If you use these
 462      functions, your program becomes “garbage in - more garbage out” or
 463      “garbage in - abort”.
 464
 465    As a consequence, it is better to use multibyte strings, as explained
 466 in the previous section.  Such multibyte strings can bypass limitations
 467 of the ‘wchar_t’ type, if you use functions defined in gnulib and
 468 libunistring for text processing.  They can also faithfully transport
 469 malformed characters that were present in the input, without requiring
 470 the program to produce garbage or abort.
 471
 472 \1f
 473 File: libunistring.info,  Node: Unicode strings,  Prev: The wchar_t mess,  Up: Introduction
 474
 475 1.7 Unicode strings
 476 ===================
 477
 478    libunistring supports Unicode strings in three representations:
 479    • UTF-8 strings, through the type ‘uint8_t *’.  The units are bytes
 480      (‘uint8_t’).
 481    • UTF-16 strings, through the type ‘uint16_t *’.  The units are 16-bit
 482      memory words (‘uint16_t’).
 483    • UTF-32 strings, through the type ‘uint32_t *’.  The units are
 484      32-bit memory words (‘uint32_t’).
 485
 486    As with C strings, there are two variants:
 487    • Unicode strings with a terminating NUL character are represented as
 488      a pointer to the first unit of the string.  There is a unit
 489      containing a 0 value at the end.  It is considered part of the
 490      string for all memory allocation purposes, but is not considered
 491      part of the string for all other logical purposes.
 492    • Unicode strings where embedded NUL characters are allowed.  These
 493      are represented by a pointer to the first unit and the number of
 494      units (not bytes!)  of the string.  In this setting, there is no
 495      trailing zero-valued unit used as “end marker”.
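
   The distinction between units and bytes, and the role of the
terminating unit, can be sketched as follows for UTF-16.  The function
names 'utf16_units_until_nul' and 'utf16_alloc_bytes' are hypothetical
and serve only as an illustration; they are not the library's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length, in 16-bit units, of a NUL terminated UTF-16 string.  The
   terminating zero unit is not counted, because it is not logically
   part of the string.  */
size_t utf16_units_until_nul(const uint16_t *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* Number of bytes to allocate for a NUL terminated UTF-16 string of
   n units.  For allocation purposes the terminating unit does count.  */
size_t utf16_alloc_bytes(size_t n)
{
    return (n + 1) * sizeof (uint16_t);
}
```

   A string consisting of the units 0x0048, 0x0069, 0xD83D, 0xDE00 and
a terminating 0 has a length of 4 units, holds 3 characters, and needs
10 bytes of storage.  In the second variant, the same text would be
passed as a pointer together with the unit count 4, and no terminating
unit would be required.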
 496
 497 \1f
 498 File: libunistring.info,  Node: Conventions,  Next: unitypes.h,  Prev: Introduction,  Up: Top
 499
 500 2 Conventions
 501 *************
 502
 503    This chapter explains conventions valid throughout the libunistring
 504 library.
 505
 506    Variables of type ‘char *’ denote C strings in locale encoding.  See
 507 *note Locale encodings::.
 508
 509    Variables of type ‘uint8_t *’ denote UTF-8 strings.  Their units are
 510 bytes.
 511
 512    Variables of type ‘uint16_t *’ denote UTF-16 strings, without byte
 513 order mark.  Their units are 2-byte words.
 514
 515    Variables of type ‘uint32_t *’ denote UTF-32 strings, without byte
 516 order mark.  Their units are 4-byte words.
 517
 518    Argument pairs ‘(S, N)’ denote a string ‘S[0..N-1]’ with exactly N
 519 units.
 520
 521    All functions with prefix ‘ulc_’ operate on C strings in locale
 522 encoding.
 523
 524    All functions with prefix ‘u8_’ operate on UTF-8 strings.
 525
 526    All functions with prefix ‘u16_’ operate on UTF-16 strings.
 527
 528    All functions with prefix ‘u32_’ operate on UTF-32 strings.
 529
 530    For every function with prefix ‘u8_’, operating on UTF-8 strings,
 531 there is also a corresponding function with prefix ‘u16_’, operating on
 532 UTF-16 strings, and a corresponding function with prefix ‘u32_’,
 533 operating on UTF-32 strings.  Their description is analogous; in this
 534 documentation we describe only the function that operates on UTF-8
 535 strings, for brevity.
 536
 537    A declaration with a variable N denotes the three concrete
 538 declarations with N = 8, N = 16, N = 32.
 539
 540    All parameters starting with ‘str’ and the parameters of functions
 541 starting with ‘u8_str’/‘u16_str’/‘u32_str’ denote a NUL terminated
 542 string.
 543
 544    Error values are always returned through the ‘errno’ variable,
 545 usually with a return value that indicates the presence of an error
 546 (NULL for functions that return a pointer, or -1 for functions that
 547 return an ‘int’).
 548
 549    Functions returning a string result take a ‘(RESULTBUF, LENGTHP)’
 550 argument pair.  If RESULTBUF is not NULL and the result fits into
 551 ‘*LENGTHP’ units, it is put in RESULTBUF, and RESULTBUF is returned.
 552 Otherwise, a freshly allocated string is returned.  In both cases,
 553 ‘*LENGTHP’ is set to the length (number of units) of the returned
 554 string.  In case of error, NULL is returned and ‘errno’ is set.
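
   The '(RESULTBUF, LENGTHP)' convention can be sketched with a
function that merely copies its input.  The function 'example_copy' is
hypothetical and not part of libunistring; it only mirrors the calling
convention described above, under the assumption that a freshly
allocated result is obtained with 'malloc' and released by the caller
with 'free'.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical function following the (RESULTBUF, LENGTHP) convention.
   It returns a copy of S[0..N-1].  */
uint8_t *example_copy(const uint8_t *s, size_t n,
                      uint8_t *resultbuf, size_t *lengthp)
{
    uint8_t *result;

    assert(s != NULL && lengthp != NULL);

    if (resultbuf != NULL && n <= *lengthp) {
        /* The result fits into the caller's buffer.  */
        result = resultbuf;
    } else {
        /* Otherwise return a freshly allocated string.  */
        result = (uint8_t *) malloc(n > 0 ? n : 1);
        if (result == NULL) {
            errno = ENOMEM;
            return NULL;
        }
    }
    memcpy(result, s, n);
    *lengthp = n;   /* set in both cases */
    return result;
}
```

   A caller typically passes a stack buffer, sets '*LENGTHP' to its
size in units, and afterwards compares the returned pointer with the
buffer: if they differ and the result is not NULL, the result was
allocated and must be freed.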
 555
 556 \1f
 557 File: libunistring.info,  Node: unitypes.h,  Next: unistr.h,  Prev: Conventions,  Up: Top
 558
 559 3 Elementary types ‘<unitypes.h>’
 560 *********************************
 561
 562    The include file ‘<unitypes.h>’ provides the following basic types.
 563
 564  -- Type: uint8_t
 565  -- Type: uint16_t
 566  -- Type: uint32_t
 567      These are the storage units of UTF-8/16/32 strings, respectively.
 568      The definitions are taken from ‘<stdint.h>’, on platforms where
 569      this include file is present.
 570
 571  -- Type: ucs4_t
 572      This type represents a single Unicode character, outside of an
 573      UTF-32 string.
 574
 575 \1f
 576 File: libunistring.info,  Node: unistr.h,  Next: uniconv.h,  Prev: unitypes.h,  Up: Top
 577
 578 4 Elementary Unicode string functions ‘<unistr.h>’
 579 **************************************************
 580
 581    This include file declares elementary functions for Unicode strings.
 582 It is essentially the equivalent of what ‘<string.h>’ is for C strings.
 583
 584 * Menu:
 585
 586 * Elementary string checks::
 587 * Elementary string conversions::
 588 * Elementary string functions::
 589 * Elementary string functions with memory allocation::
 590 * Elementary string functions on NUL terminated strings::
 591
 592 \1f
 593 File: libunistring.info,  Node: Elementary string checks,  Next: Elementary string conversions,  Up: unistr.h
 594
 595 4.1 Elementary string checks
 596 ============================
 597
 598    The following function is available to verify the integrity of a
 599 Unicode string.
 600
 601  -- Function: const uint8_t * u8_check (const uint8_t *S, size_t N)
 602  -- Function: const uint16_t * u16_check (const uint16_t *S, size_t N)
 603  -- Function: const uint32_t * u32_check (const uint32_t *S, size_t N)
 604      This function checks whether a Unicode string is well-formed.  It
 605      returns NULL if valid, or a pointer to the first invalid unit
 606      otherwise.
 607
 608 \1f
 609 File: libunistring.info,  Node: Elementary string conversions,  Next: Elementary string functions,  Prev: Elementary string checks,  Up: unistr.h
 610
 611 4.2 Elementary string conversions
 612 =================================
 613
 614    The following functions perform conversions between the different
 615 forms of Unicode strings.
 616
 617  -- Function: uint16_t * u8_to_u16 (const uint8_t *S, size_t N, uint16_t
 618           *RESULTBUF, size_t *LENGTHP)
 619      Converts an UTF-8 string to an UTF-16 string.
 620
 621  -- Function: uint32_t * u8_to_u32 (const uint8_t *S, size_t N, uint32_t
 622           *RESULTBUF, size_t *LENGTHP)
 623      Converts an UTF-8 string to an UTF-32 string.
 624
 625  -- Function: uint8_t * u16_to_u8 (const uint16_t *S, size_t N, uint8_t
 626           *RESULTBUF, size_t *LENGTHP)
 627      Converts an UTF-16 string to an UTF-8 string.
 628
 629  -- Function: uint32_t * u16_to_u32 (const uint16_t *S, size_t N,
 630           uint32_t *RESULTBUF, size_t *LENGTHP)
 631      Converts an UTF-16 string to an UTF-32 string.
 632
 633  -- Function: uint8_t * u32_to_u8 (const uint32_t *S, size_t N, uint8_t
 634           *RESULTBUF, size_t *LENGTHP)
 635      Converts an UTF-32 string to an UTF-8 string.
 636
 637  -- Function: uint16_t * u32_to_u16 (const uint32_t *S, size_t N,
 638           uint16_t *RESULTBUF, size_t *LENGTHP)
 639      Converts an UTF-32 string to an UTF-16 string.
 640
 641 \1f
 642 File: libunistring.info,  Node: Elementary string functions,  Next: Elementary string functions with memory allocation,  Prev: Elementary string conversions,  Up: unistr.h
 643
 644 4.3 Elementary string functions
 645 ===============================
 646
 647    The following functions inspect and return details about the first
 648 character in a Unicode string.
 649
 650  -- Function: int u8_mblen (const uint8_t *S, size_t N)
 651  -- Function: int u16_mblen (const uint16_t *S, size_t N)
 652  -- Function: int u32_mblen (const uint32_t *S, size_t N)
 653      Returns the length (number of units) of the first character in S,
 654      which is no longer than N.  Returns 0 if it is the NUL character.
 655      Returns -1 upon failure.
 656
 657      This function is similar to ‘mblen’, except that it operates on a
 658      Unicode string and that S must not be NULL.
 659
 660  -- Function: int u8_mbtouc_unsafe (ucs4_t *PUC, const uint8_t *S,
 661           size_t N)
 662  -- Function: int u16_mbtouc_unsafe (ucs4_t *PUC, const uint16_t *S,
 663           size_t N)
 664  -- Function: int u32_mbtouc_unsafe (ucs4_t *PUC, const uint32_t *S,
 665           size_t N)
 666      Returns the length (number of units) of the first character in S,
 667      putting its ‘ucs4_t’ representation in ‘*PUC’.  Upon failure,
 668      ‘*PUC’ is set to ‘0xfffd’, and an appropriate number of units is
 669      returned.
 670
 671      The number of available units, N, must be > 0.
 672
 673      This function is similar to ‘mbtowc’, except that it operates on a
 674      Unicode string, PUC and S must not be NULL, N must be > 0, and the
 675      NUL character is not treated specially.
 676
 677  -- Function: int u8_mbtouc (ucs4_t *PUC, const uint8_t *S, size_t N)
 678  -- Function: int u16_mbtouc (ucs4_t *PUC, const uint16_t *S, size_t N)
 679  -- Function: int u32_mbtouc (ucs4_t *PUC, const uint32_t *S, size_t N)
 680      This function is like ‘u8_mbtouc_unsafe’, except that it will
 681      detect an invalid UTF-8 character, even if the library is compiled
 682      without ‘--enable-safety’.
 683
 684  -- Function: int u8_mbtoucr (ucs4_t *PUC, const uint8_t *S, size_t N)
 685  -- Function: int u16_mbtoucr (ucs4_t *PUC, const uint16_t *S, size_t N)
 686  -- Function: int u32_mbtoucr (ucs4_t *PUC, const uint32_t *S, size_t N)
 687      Returns the length (number of units) of the first character in S,
 688      putting its ‘ucs4_t’ representation in ‘*PUC’.  Upon failure,
 689      ‘*PUC’ is set to ‘0xfffd’, and -1 is returned for an invalid
 690      sequence of units, -2 is returned for an incomplete sequence of
 691      units.
 692
 693      The number of available units, N, must be > 0.
 694
 695      This function is similar to ‘u8_mbtouc’, except that the return
 696      value gives more details about the failure, similar to ‘mbrtowc’.
 697
 698    The following function stores a Unicode character as a Unicode string
 699 in memory.
 700
 701  -- Function: int u8_uctomb (uint8_t *S, ucs4_t UC, int N)
 702  -- Function: int u16_uctomb (uint16_t *S, ucs4_t UC, int N)
 703  -- Function: int u32_uctomb (uint32_t *S, ucs4_t UC, int N)
 704      Puts the multibyte character represented by UC in S, returning its
 705      length.  Returns -1 upon failure, -2 if the number of available
 706      units, N, is too small.  The latter case cannot occur if N >=
 707      6/2/1, respectively.
 708
 709      This function is similar to ‘wctomb’, except that it operates on
 710      Unicode strings, S must not be NULL, and the argument N must be
 711      specified.
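
   The distinction between the two failure codes can be sketched for
the UTF-8 case as follows.  ‘sketch_u8_uctomb’ is a hypothetical
standalone function, ‘uint32_t’ stands in for ‘ucs4_t’, and the code is
an illustration rather than the library's implementation.  In this
sketch four units are always enough for a valid code point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for u8_uctomb.  Returns the number of units
   written, -1 if UC is not a valid code point, -2 if N is too small.  */
int
sketch_u8_uctomb (uint8_t *s, uint32_t uc, int n)
{
  assert (s != NULL);

  int len;
  if (uc < 0x80)
    len = 1;
  else if (uc < 0x800)
    len = 2;
  else if (uc < 0x10000)
    {
      if (uc >= 0xd800 && uc <= 0xdfff)
        return -1;              /* surrogates are not characters */
      len = 3;
    }
  else if (uc <= 0x10ffff)
    len = 4;
  else
    return -1;                  /* beyond the Unicode range */

  if (n < len)
    return -2;                  /* buffer too small, nothing written */

  switch (len)
    {
    case 1:
      s[0] = (uint8_t) uc;
      break;
    case 2:
      s[0] = (uint8_t) (0xc0 | (uc >> 6));
      s[1] = (uint8_t) (0x80 | (uc & 0x3f));
      break;
    case 3:
      s[0] = (uint8_t) (0xe0 | (uc >> 12));
      s[1] = (uint8_t) (0x80 | ((uc >> 6) & 0x3f));
      s[2] = (uint8_t) (0x80 | (uc & 0x3f));
      break;
    default:
      s[0] = (uint8_t) (0xf0 | (uc >> 18));
      s[1] = (uint8_t) (0x80 | ((uc >> 12) & 0x3f));
      s[2] = (uint8_t) (0x80 | ((uc >> 6) & 0x3f));
      s[3] = (uint8_t) (0x80 | (uc & 0x3f));
      break;
    }
  return len;
}
```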
 712
 713    The following functions copy Unicode strings in memory.
 714
 715  -- Function: uint8_t * u8_cpy (uint8_t *DEST, const uint8_t *SRC,
 716           size_t N)
 717  -- Function: uint16_t * u16_cpy (uint16_t *DEST, const uint16_t *SRC,
 718           size_t N)
 719  -- Function: uint32_t * u32_cpy (uint32_t *DEST, const uint32_t *SRC,
 720           size_t N)
 721      Copies N units from SRC to DEST.
 722
 723      This function is similar to ‘memcpy’, except that it operates on
 724      Unicode strings.
 725
 726  -- Function: uint8_t * u8_move (uint8_t *DEST, const uint8_t *SRC,
 727           size_t N)
 728  -- Function: uint16_t * u16_move (uint16_t *DEST, const uint16_t *SRC,
 729           size_t N)
 730  -- Function: uint32_t * u32_move (uint32_t *DEST, const uint32_t *SRC,
 731           size_t N)
 732      Copies N units from SRC to DEST, guaranteeing correct behavior for
 733      overlapping memory areas.
 734
 735      This function is similar to ‘memmove’, except that it operates on
 736      Unicode strings.
 737
 738    The following function fills a Unicode string.
 739
 740  -- Function: uint8_t * u8_set (uint8_t *S, ucs4_t UC, size_t N)
 741  -- Function: uint16_t * u16_set (uint16_t *S, ucs4_t UC, size_t N)
 742  -- Function: uint32_t * u32_set (uint32_t *S, ucs4_t UC, size_t N)
 743      Sets the first N characters of S to UC.  UC should be a character
 744      that occupies only 1 unit.
 745
 746      This function is similar to ‘memset’, except that it operates on
 747      Unicode strings.
 748
 749    The following function compares two Unicode strings of the same
 750 length.
 751
 752  -- Function: int u8_cmp (const uint8_t *S1, const uint8_t *S2, size_t
 753           N)
 754  -- Function: int u16_cmp (const uint16_t *S1, const uint16_t *S2,
 755           size_t N)
 756  -- Function: int u32_cmp (const uint32_t *S1, const uint32_t *S2,
 757           size_t N)
 758      Compares S1 and S2, each of length N, lexicographically.  Returns a
 759      negative value if S1 compares smaller than S2, a positive value if
 760      S1 compares larger than S2, or 0 if they compare equal.
 761
 762      This function is similar to ‘memcmp’, except that it operates on
 763      Unicode strings.
 764
 765    The following function compares two Unicode strings of possibly
 766 different lengths.
 767
 768  -- Function: int u8_cmp2 (const uint8_t *S1, size_t N1, const uint8_t
 769           *S2, size_t N2)
 770  -- Function: int u16_cmp2 (const uint16_t *S1, size_t N1, const
 771           uint16_t *S2, size_t N2)
 772  -- Function: int u32_cmp2 (const uint32_t *S1, size_t N1, const
 773           uint32_t *S2, size_t N2)
 774      Compares S1 and S2, lexicographically.  Returns a negative value if
 775      S1 compares smaller than S2, a positive value if S1 compares larger
 776      than S2, or 0 if they compare equal.
 777
 778      This function is similar to the gnulib function ‘memcmp2’, except
 779      that it operates on Unicode strings.
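
   How strings of different lengths are ordered can be sketched for the
UTF-8 case.  The sketch assumes the convention of gnulib's ‘memcmp2’:
the common prefix is compared first, and if it is equal the shorter
string compares smaller.  ‘sketch_u8_cmp2’ is a hypothetical name, and
the code is not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for u8_cmp2.  */
int
sketch_u8_cmp2 (const uint8_t *s1, size_t n1, const uint8_t *s2, size_t n2)
{
  assert (s1 != NULL && s2 != NULL);

  size_t common = n1 < n2 ? n1 : n2;
  int cmp = memcmp (s1, s2, common);    /* compare the shared prefix */
  if (cmp != 0)
    return cmp;
  /* Equal prefix: the shorter string sorts first.  */
  return (n1 > n2) - (n1 < n2);
}
```

   Unit-wise comparison is sufficient here because, in UTF-8, byte
order and code point order coincide.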
 780
 781    The following function searches for a given Unicode character.
 782
 783  -- Function: uint8_t * u8_chr (const uint8_t *S, size_t N, ucs4_t UC)
 784  -- Function: uint16_t * u16_chr (const uint16_t *S, size_t N, ucs4_t
 785           UC)
 786  -- Function: uint32_t * u32_chr (const uint32_t *S, size_t N, ucs4_t
 787           UC)
 788      Searches the string at S for UC.  Returns a pointer to the first
 789      occurrence of UC in S, or NULL if UC does not occur in S.
 790
 791      This function is similar to ‘memchr’, except that it operates on
 792      Unicode strings.
 793
 794    The following function counts the number of Unicode characters.
 795
 796  -- Function: size_t u8_mbsnlen (const uint8_t *S, size_t N)
 797  -- Function: size_t u16_mbsnlen (const uint16_t *S, size_t N)
 798  -- Function: size_t u32_mbsnlen (const uint32_t *S, size_t N)
 799      Counts and returns the number of Unicode characters in the N units
 800      from S.
 801
 802      This function is similar to the gnulib function ‘mbsnlen’, except
 803      that it operates on Unicode strings.
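
   The difference between units and characters can be sketched for
UTF-8.  The hypothetical function ‘sketch_u8_count_chars’ below assumes
well-formed input and counts every unit that is not a continuation
byte; how the library counts malformed input is not covered by this
sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for u8_mbsnlen, valid for well-formed UTF-8.  */
size_t
sketch_u8_count_chars (const uint8_t *s, size_t n)
{
  assert (s != NULL || n == 0);

  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if ((s[i] & 0xc0) != 0x80)  /* a lead byte or ASCII starts a character */
      count++;
  return count;
}
```

   For example, the six units 0x68 0xC3 0xA9 0x6C 0x6C 0x6F hold five
characters, because 0xC3 0xA9 together encode U+00E9.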
 804
 805 \1f
 806 File: libunistring.info,  Node: Elementary string functions with memory allocation,  Next: Elementary string functions on NUL terminated strings,  Prev: Elementary string functions,  Up: unistr.h
 807
 808 4.4 Elementary string functions with memory allocation
 809 ======================================================
 810
 811    The following function copies a Unicode string.
 812
 813  -- Function: uint8_t * u8_cpy_alloc (const uint8_t *S, size_t N)
 814  -- Function: uint16_t * u16_cpy_alloc (const uint16_t *S, size_t N)
 815  -- Function: uint32_t * u32_cpy_alloc (const uint32_t *S, size_t N)
 816      Makes a freshly allocated copy of S, of length N.
 817
 818 \1f
 819 File: libunistring.info,  Node: Elementary string functions on NUL terminated strings,  Prev: Elementary string functions with memory allocation,  Up: unistr.h
 820
 821 4.5 Elementary string functions on NUL terminated strings
 822 =========================================================
 823
 824    The following functions inspect and return details about the first
 825 character in a Unicode string.
 826
 827  -- Function: int u8_strmblen (const uint8_t *S)
 828  -- Function: int u16_strmblen (const uint16_t *S)
 829  -- Function: int u32_strmblen (const uint32_t *S)
 830      Returns the length (number of units) of the first character in S.
 831      Returns 0 if it is the NUL character.  Returns -1 upon failure.
 832
 833  -- Function: int u8_strmbtouc (ucs4_t *PUC, const uint8_t *S)
 834  -- Function: int u16_strmbtouc (ucs4_t *PUC, const uint16_t *S)
 835  -- Function: int u32_strmbtouc (ucs4_t *PUC, const uint32_t *S)
 836      Returns the length (number of units) of the first character in S,
 837      putting its ‘ucs4_t’ representation in ‘*PUC’.  Returns 0 if it is
 838      the NUL character.  Returns -1 upon failure.
 839
 840  -- Function: const uint8_t * u8_next (ucs4_t *PUC, const uint8_t *S)
 841  -- Function: const uint16_t * u16_next (ucs4_t *PUC, const uint16_t *S)
 842  -- Function: const uint32_t * u32_next (ucs4_t *PUC, const uint32_t *S)
 843      Forward iteration step.  Advances the pointer past the next
 844      character, or returns NULL if the end of the string has been
 845      reached.  Puts the character’s ‘ucs4_t’ representation in ‘*PUC’.
 846
 847    The following function inspects and returns details about the
 848 previous character in a Unicode string.
 849
 850  -- Function: const uint8_t * u8_prev (ucs4_t *PUC, const uint8_t *S,
 851           const uint8_t *START)
 852  -- Function: const uint16_t * u16_prev (ucs4_t *PUC, const uint16_t *S,
 853           const uint16_t *START)
 854  -- Function: const uint32_t * u32_prev (ucs4_t *PUC, const uint32_t *S,
 855           const uint32_t *START)
 856      Backward iteration step.  Advances the pointer to point to the
 857      previous character (the one that ends at ‘S’), or returns NULL if
 858      the beginning of the string (specified by ‘START’) has been
 859      reached.  Puts the character’s ‘ucs4_t’ representation in ‘*PUC’.
 860      Note that this function works only on well-formed Unicode strings.
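
   The pointer movement of a backward step can be sketched for UTF-8 as
follows.  ‘sketch_u8_prev_start’ is a hypothetical helper that only
locates the start of the previous character; it does not decode it into
‘*PUC’, and like the function above it assumes a well-formed string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the start of the character that ends at
   S, or NULL if S is already at START.  */
const uint8_t *
sketch_u8_prev_start (const uint8_t *s, const uint8_t *start)
{
  assert (s != NULL && start != NULL && start <= s);

  if (s == start)
    return NULL;                /* beginning of the string reached */
  do
    s--;
  while (s > start && (*s & 0xc0) == 0x80);   /* skip continuation bytes */
  return s;
}
```

   Calling the helper repeatedly, starting from the end of the string
and stopping at NULL, visits the characters in reverse order.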
 861
 862    The following functions determine the length of a Unicode string.
 863
 864  -- Function: size_t u8_strlen (const uint8_t *S)
 865  -- Function: size_t u16_strlen (const uint16_t *S)
 866  -- Function: size_t u32_strlen (const uint32_t *S)
 867      Returns the number of units in S.
 868
 869      This function is similar to ‘strlen’ and ‘wcslen’, except that it
 870      operates on Unicode strings.
 871
 872  -- Function: size_t u8_strnlen (const uint8_t *S, size_t MAXLEN)
 873  -- Function: size_t u16_strnlen (const uint16_t *S, size_t MAXLEN)
 874  -- Function: size_t u32_strnlen (const uint32_t *S, size_t MAXLEN)
 875      Returns the number of units in S, but at most MAXLEN.
 876
 877      This function is similar to ‘strnlen’ and ‘wcsnlen’, except that it
 878      operates on Unicode strings.
 879
 880    The following functions copy portions of Unicode strings in memory.
 881
 882  -- Function: uint8_t * u8_strcpy (uint8_t *DEST, const uint8_t *SRC)
 883  -- Function: uint16_t * u16_strcpy (uint16_t *DEST, const uint16_t
 884           *SRC)
 885  -- Function: uint32_t * u32_strcpy (uint32_t *DEST, const uint32_t
 886           *SRC)
 887      Copies SRC to DEST.
 888
 889      This function is similar to ‘strcpy’ and ‘wcscpy’, except that it
 890      operates on Unicode strings.
 891
 892  -- Function: uint8_t * u8_stpcpy (uint8_t *DEST, const uint8_t *SRC)
 893  -- Function: uint16_t * u16_stpcpy (uint16_t *DEST, const uint16_t
 894           *SRC)
 895  -- Function: uint32_t * u32_stpcpy (uint32_t *DEST, const uint32_t
 896           *SRC)
 897      Copies SRC to DEST, returning the address of the terminating NUL in
 898      DEST.
 899
 900      This function is similar to ‘stpcpy’, except that it operates on
 901      Unicode strings.
 902
 903  -- Function: uint8_t * u8_strncpy (uint8_t *DEST, const uint8_t *SRC,
 904           size_t N)
 905  -- Function: uint16_t * u16_strncpy (uint16_t *DEST, const uint16_t
 906           *SRC, size_t N)
 907  -- Function: uint32_t * u32_strncpy (uint32_t *DEST, const uint32_t
 908           *SRC, size_t N)
 909      Copies no more than N units of SRC to DEST.
 910
 911      This function is similar to ‘strncpy’ and ‘wcsncpy’, except that it
 912      operates on Unicode strings.
 913
 914  -- Function: uint8_t * u8_stpncpy (uint8_t *DEST, const uint8_t *SRC,
 915           size_t N)
 916  -- Function: uint16_t * u16_stpncpy (uint16_t *DEST, const uint16_t
 917           *SRC, size_t N)
 918  -- Function: uint32_t * u32_stpncpy (uint32_t *DEST, const uint32_t
 919           *SRC, size_t N)
 920      Copies no more than N units of SRC to DEST.  Returns a pointer past
 921      the last non-NUL unit written into DEST.  In other words, if the
 922      units written into DEST include a NUL, the return value is the
 923      address of the first such NUL unit, otherwise it is ‘DEST + N’.
 924
 925      This function is similar to ‘stpncpy’, except that it operates on
 926      Unicode strings.
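
   The return value described above can be sketched as follows.
‘sketch_u8_stpncpy’ is a hypothetical standalone function, not the
library's code.  Padding the remainder of DEST with NUL units mirrors
POSIX ‘stpncpy’ and is an assumption about the Unicode variant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for u8_stpncpy.  */
uint8_t *
sketch_u8_stpncpy (uint8_t *dest, const uint8_t *src, size_t n)
{
  assert (dest != NULL && src != NULL);

  size_t i = 0;
  while (i < n && src[i] != 0)  /* copy up to N non-NUL units */
    {
      dest[i] = src[i];
      i++;
    }
  uint8_t *end = dest + i;      /* first NUL written, or DEST + N */
  while (i < n)                 /* assumed padding, as in stpncpy */
    dest[i++] = 0;
  return end;
}
```

   When the return value equals ‘DEST + N’, no NUL unit was written, so
the caller must terminate the string itself.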
 927
 928  -- Function: uint8_t * u8_strcat (uint8_t *DEST, const uint8_t *SRC)
 929  -- Function: uint16_t * u16_strcat (uint16_t *DEST, const uint16_t
 930           *SRC)
 931  -- Function: uint32_t * u32_strcat (uint32_t *DEST, const uint32_t
 932           *SRC)
 933      Appends SRC onto DEST.
 934
 935      This function is similar to ‘strcat’ and ‘wcscat’, except that it
 936      operates on Unicode strings.
 937
 938  -- Function: uint8_t * u8_strncat (uint8_t *DEST, const uint8_t *SRC,
 939           size_t N)
 940  -- Function: uint16_t * u16_strncat (uint16_t *DEST, const uint16_t
 941           *SRC, size_t N)
 942  -- Function: uint32_t * u32_strncat (uint32_t *DEST, const uint32_t
 943           *SRC, size_t N)
 944      Appends no more than N units of SRC onto DEST.
 945
 946      This function is similar to ‘strncat’ and ‘wcsncat’, except that it
 947      operates on Unicode strings.
 948
 949    The following functions compare two Unicode strings.
 950
 951  -- Function: int u8_strcmp (const uint8_t *S1, const uint8_t *S2)
 952  -- Function: int u16_strcmp (const uint16_t *S1, const uint16_t *S2)
 953  -- Function: int u32_strcmp (const uint32_t *S1, const uint32_t *S2)
 954      Compares S1 and S2, lexicographically.  Returns a negative value if
 955      S1 compares smaller than S2, a positive value if S1 compares larger
 956      than S2, or 0 if they compare equal.
 957
 958      This function is similar to ‘strcmp’ and ‘wcscmp’, except that it
 959      operates on Unicode strings.
 960
 961  -- Function: int u8_strcoll (const uint8_t *S1, const uint8_t *S2)
 962  -- Function: int u16_strcoll (const uint16_t *S1, const uint16_t *S2)
 963  -- Function: int u32_strcoll (const uint32_t *S1, const uint32_t *S2)
 964      Compares S1 and S2 using the collation rules of the current locale.
 965      Returns -1 if S1 < S2, 0 if S1 = S2, 1 if S1 > S2.  Upon failure,
 966      sets ‘errno’ and returns any value.
 967
 968      This function is similar to ‘strcoll’ and ‘wcscoll’, except that it
 969      operates on Unicode strings.
 970
 971      Note that this function may consider different canonical
 972      normalizations of the same string as having a large distance.  It
 973      is therefore better to use the function ‘u8_normcoll’ instead of
 974      this one; see *note uninorm.h::.
 975
 976  -- Function: int u8_strncmp (const uint8_t *S1, const uint8_t *S2,
 977           size_t N)
 978  -- Function: int u16_strncmp (const uint16_t *S1, const uint16_t *S2,
 979           size_t N)
 980  -- Function: int u32_strncmp (const uint32_t *S1, const uint32_t *S2,
 981           size_t N)
 982      Compares no more than N units of S1 and S2.
 983
 984      This function is similar to ‘strncmp’ and ‘wcsncmp’, except that it
 985      operates on Unicode strings.
 986
 987    The following function allocates a duplicate of a Unicode string.
 988
 989  -- Function: uint8_t * u8_strdup (const uint8_t *S)
 990  -- Function: uint16_t * u16_strdup (const uint16_t *S)
 991  -- Function: uint32_t * u32_strdup (const uint32_t *S)
 992      Duplicates S, returning an identical malloc’d string.
 993
 994      This function is similar to ‘strdup’ and ‘wcsdup’, except that it
 995      operates on Unicode strings.
 996
 997    The following functions search for a given Unicode character.
 998
 999  -- Function: uint8_t * u8_strchr (const uint8_t *STR, ucs4_t UC)
1000  -- Function: uint16_t * u16_strchr (const uint16_t *STR, ucs4_t UC)
1001  -- Function: uint32_t * u32_strchr (const uint32_t *STR, ucs4_t UC)
1002      Finds the first occurrence of UC in STR.
1003
1004      This function is similar to ‘strchr’ and ‘wcschr’, except that it
1005      operates on Unicode strings.
1006
1007  -- Function: uint8_t * u8_strrchr (const uint8_t *STR, ucs4_t UC)
1008  -- Function: uint16_t * u16_strrchr (const uint16_t *STR, ucs4_t UC)
1009  -- Function: uint32_t * u32_strrchr (const uint32_t *STR, ucs4_t UC)
1010      Finds the last occurrence of UC in STR.
1011
1012      This function is similar to ‘strrchr’ and ‘wcsrchr’, except that it
1013      operates on Unicode strings.
1014
1015    The following functions search for the first occurrence of some
1016 Unicode character in or outside a given set of Unicode characters.
1017
1018  -- Function: size_t u8_strcspn (const uint8_t *STR, const uint8_t
1019           *REJECT)
1020  -- Function: size_t u16_strcspn (const uint16_t *STR, const uint16_t
1021           *REJECT)
1022  -- Function: size_t u32_strcspn (const uint32_t *STR, const uint32_t
1023           *REJECT)
1024      Returns the length of the initial segment of STR which consists
1025      entirely of Unicode characters not in REJECT.
1026
1027      This function is similar to ‘strcspn’ and ‘wcscspn’, except that it
1028      operates on Unicode strings.
1029
1030  -- Function: size_t u8_strspn (const uint8_t *STR, const uint8_t
1031           *ACCEPT)
1032  -- Function: size_t u16_strspn (const uint16_t *STR, const uint16_t
1033           *ACCEPT)
1034  -- Function: size_t u32_strspn (const uint32_t *STR, const uint32_t
1035           *ACCEPT)
1036      Returns the length of the initial segment of STR which consists
1037      entirely of Unicode characters in ACCEPT.
1038
1039      This function is similar to ‘strspn’ and ‘wcsspn’, except that it
1040      operates on Unicode strings.
1041
1042  -- Function: uint8_t * u8_strpbrk (const uint8_t *STR, const uint8_t
1043           *ACCEPT)
1044  -- Function: uint16_t * u16_strpbrk (const uint16_t *STR, const
1045           uint16_t *ACCEPT)
1046  -- Function: uint32_t * u32_strpbrk (const uint32_t *STR, const
1047           uint32_t *ACCEPT)
1048      Finds the first occurrence in STR of any character in ACCEPT.
1049
1050      This function is similar to ‘strpbrk’ and ‘wcspbrk’, except that it
1051      operates on Unicode strings.
1052
1053    The following functions search whether a given Unicode string is a
1054 substring of another Unicode string.
1055
1056  -- Function: uint8_t * u8_strstr (const uint8_t *HAYSTACK, const
1057           uint8_t *NEEDLE)
1058  -- Function: uint16_t * u16_strstr (const uint16_t *HAYSTACK, const
1059           uint16_t *NEEDLE)
1060  -- Function: uint32_t * u32_strstr (const uint32_t *HAYSTACK, const
1061           uint32_t *NEEDLE)
1062      Finds the first occurrence of NEEDLE in HAYSTACK.
1063
1064      This function is similar to ‘strstr’ and ‘wcsstr’, except that it
1065      operates on Unicode strings.
1066
1067  -- Function: bool u8_startswith (const uint8_t *STR, const uint8_t
1068           *PREFIX)
1069  -- Function: bool u16_startswith (const uint16_t *STR, const uint16_t
1070           *PREFIX)
1071  -- Function: bool u32_startswith (const uint32_t *STR, const uint32_t
1072           *PREFIX)
1073      Tests whether STR starts with PREFIX.
1074
1075  -- Function: bool u8_endswith (const uint8_t *STR, const uint8_t
1076           *SUFFIX)
1077  -- Function: bool u16_endswith (const uint16_t *STR, const uint16_t
1078           *SUFFIX)
1079  -- Function: bool u32_endswith (const uint32_t *STR, const uint32_t
1080           *SUFFIX)
1081      Tests whether STR ends with SUFFIX.
1082
1083    The following function does one step in tokenizing a Unicode string.
1084
1085  -- Function: uint8_t * u8_strtok (uint8_t *STR, const uint8_t *DELIM,
1086           uint8_t **PTR)
1087  -- Function: uint16_t * u16_strtok (uint16_t *STR, const uint16_t
1088           *DELIM, uint16_t **PTR)
1089  -- Function: uint32_t * u32_strtok (uint32_t *STR, const uint32_t
1090           *DELIM, uint32_t **PTR)
1091      Divides STR into tokens separated by characters in DELIM.
1092
1093      This function is similar to ‘strtok_r’ and ‘wcstok’, except that it
1094      operates on Unicode strings.  Its interface is actually more
1095      similar to ‘wcstok’ than to ‘strtok’.
1096
1097 \1f
1098 File: libunistring.info,  Node: uniconv.h,  Next: unistdio.h,  Prev: unistr.h,  Up: Top
1099
1100 5 Conversions between Unicode and encodings ‘<uniconv.h>’
1101 *********************************************************
1102
1103    This include file declares functions for converting between Unicode
1104 strings and ‘char *’ strings in locale encoding or in other specified
1105 encodings.
1106
1107    The following function returns the locale encoding.
1108
1109  -- Function: const char * locale_charset ()
1110      Determines the current locale’s character encoding, and
1111      canonicalizes it into one of the canonical names listed in
1112      ‘config.charset’.  If the canonical name cannot be determined, the
1113      result is a non-canonical name.
1114
1115      The result must not be freed; it is statically allocated.
1116
1117      The result of this function can be used as an argument to the
1118      ‘iconv_open’ function in GNU libc, in GNU libiconv, or in the
1119      gnulib provided wrapper around the native ‘iconv_open’ function.
1120      It may not work as an argument to the native ‘iconv_open’ function
1121      directly.
1122
1123    The handling of unconvertible characters during the conversions can
1124 be parametrized through the following enumeration type:
1125
1126  -- Type: enum iconv_ilseq_handler
1127      This type specifies how unconvertible characters in the input are
1128      handled.
1129
1130  -- Constant: enum iconv_ilseq_handler iconveh_error
1131      This handler causes the function to return with ‘errno’ set to
1132      ‘EILSEQ’.
1133
1134  -- Constant: enum iconv_ilseq_handler iconveh_question_mark
1135      This handler produces one question mark ‘?’ per unconvertible
1136      character.
1137
1138  -- Constant: enum iconv_ilseq_handler iconveh_escape_sequence
1139      This handler produces an escape sequence ‘\uXXXX’ or ‘\UXXXXXXXX’
1140      for each unconvertible character.
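
   The two escape forms can be sketched with ‘snprintf’.  The function
name ‘sketch_escape_sequence’ is hypothetical.  Choosing the short form
for code points below U+10000 and printing upper-case hexadecimal
digits are both assumptions of this sketch, not statements about the
library's output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes the escape for UC into BUF, which must
   hold at least 11 bytes.  Returns the length of the escape.  */
int
sketch_escape_sequence (char *buf, size_t bufsize, uint32_t uc)
{
  assert (buf != NULL && bufsize >= 11);

  if (uc < 0x10000)
    return snprintf (buf, bufsize, "\\u%04X", (unsigned int) uc);
  else
    return snprintf (buf, bufsize, "\\U%08X", (unsigned int) uc);
}
```

   Under these assumptions U+20AC would be written as ‘\u20AC’ and
U+1F600 as ‘\U0001F600’.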
1141
1142    The following functions convert between strings in a specified
1143 encoding and Unicode strings.
1144
1145  -- Function: uint8_t * u8_conv_from_encoding (const char *FROMCODE,
1146           enum iconv_ilseq_handler HANDLER, const char *SRC, size_t
1147           SRCLEN, size_t *OFFSETS, uint8_t *RESULTBUF, size_t *LENGTHP)
1148  -- Function: uint16_t * u16_conv_from_encoding (const char *FROMCODE,
1149           enum iconv_ilseq_handler HANDLER, const char *SRC, size_t
1150           SRCLEN, size_t *OFFSETS, uint16_t *RESULTBUF, size_t *LENGTHP)
1151  -- Function: uint32_t * u32_conv_from_encoding (const char *FROMCODE,
1152           enum iconv_ilseq_handler HANDLER, const char *SRC, size_t
1153           SRCLEN, size_t *OFFSETS, uint32_t *RESULTBUF, size_t *LENGTHP)
1154      Converts an entire string, possibly including NUL bytes, from one
1155      encoding to UTF-8, UTF-16, or UTF-32 encoding, respectively.
1156
1157      Converts a memory region given in encoding FROMCODE.  FROMCODE is
1158      as for the ‘iconv_open’ function.
1159
1160      The input is in the memory region between SRC (inclusive) and ‘SRC
1161      + SRCLEN’ (exclusive).
1162
1163      If OFFSETS is not NULL, it should point to an array of SRCLEN
1164      integers; this array is filled with offsets into the result, i.e.
1165      the character starting at ‘SRC[i]’ corresponds to the character
1166      starting at ‘RESULT[OFFSETS[i]]’, and other offsets are set to
1167      ‘(size_t)(-1)’.
1168
1169      ‘RESULTBUF’ and ‘*LENGTHP’ should be a scratch buffer and its size,
1170      or ‘RESULTBUF’ can be NULL.
1171
1172      May erase the contents of the memory at ‘RESULTBUF’.
1173
1174      If successful: The resulting Unicode string (non-NULL) is returned
1175      and its length stored in ‘*LENGTHP’.  The resulting string is
1176      ‘RESULTBUF’ if no dynamic memory allocation was necessary, or a
1177      freshly allocated memory block otherwise.
1178
1179      In case of error: NULL is returned and ‘errno’ is set.  Particular
1180      ‘errno’ values: ‘EINVAL’, ‘EILSEQ’, ‘ENOMEM’.
1181
1182  -- Function: char * u8_conv_to_encoding (const char *TOCODE, enum
1183           iconv_ilseq_handler HANDLER, const uint8_t *SRC, size_t
1184           SRCLEN, size_t *OFFSETS, char *RESULTBUF, size_t *LENGTHP)
1185  -- Function: char * u16_conv_to_encoding (const char *TOCODE, enum
1186           iconv_ilseq_handler HANDLER, const uint16_t *SRC, size_t
1187           SRCLEN, size_t *OFFSETS, char *RESULTBUF, size_t *LENGTHP)
1188  -- Function: char * u32_conv_to_encoding (const char *TOCODE, enum
1189           iconv_ilseq_handler HANDLER, const uint32_t *SRC, size_t
1190           SRCLEN, size_t *OFFSETS, char *RESULTBUF, size_t *LENGTHP)
1191      Converts an entire Unicode string, possibly including NUL units,
1192      from UTF-8, UTF-16, or UTF-32 encoding to a given encoding.
1193
1194      Converts a memory region to encoding TOCODE.  TOCODE is as for the
1195      ‘iconv_open’ function.
1196
1197      The input is in the memory region between SRC (inclusive) and ‘SRC
1198      + SRCLEN’ (exclusive).
1199
1200      If OFFSETS is not NULL, it should point to an array of SRCLEN
1201      integers; this array is filled with offsets into the result, i.e.
1202      the character starting at ‘SRC[i]’ corresponds to the character
1203      starting at ‘RESULT[OFFSETS[i]]’, and other offsets are set to
1204      ‘(size_t)(-1)’.
1205
1206      ‘RESULTBUF’ and ‘*LENGTHP’ should be a scratch buffer and its size,
1207      or ‘RESULTBUF’ can be NULL.
1208
1209      May erase the contents of the memory at ‘RESULTBUF’.
1210
1211      If successful: The resulting string (non-NULL) is returned
1212      and its length stored in ‘*LENGTHP’.  The resulting string is
1213      ‘RESULTBUF’ if no dynamic memory allocation was necessary, or a
1214      freshly allocated memory block otherwise.
1215
1216      In case of error: NULL is returned and ‘errno’ is set.  Particular
1217      ‘errno’ values: ‘EINVAL’, ‘EILSEQ’, ‘ENOMEM’.
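
   The meaning of the OFFSETS array can be sketched with a toy
converter from UTF-8 to ISO-8859-1.  ‘sketch_u8_to_latin1’ is a
hypothetical function that assumes well-formed input and a RESULT
buffer of at least SRCLEN bytes.  It replaces characters outside
ISO-8859-1 by ‘?’, in the spirit of ‘iconveh_question_mark’.  It is not
how the library performs conversions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical converter.  Returns the number of bytes written.  */
size_t
sketch_u8_to_latin1 (const uint8_t *src, size_t srclen,
                     size_t *offsets, char *result)
{
  assert (src != NULL && result != NULL);

  size_t out = 0;
  size_t i = 0;
  while (i < srclen)
    {
      uint8_t c = src[i];
      /* Length of the character, derived from its lead byte.  */
      size_t len = c < 0x80 ? 1 : c < 0xe0 ? 2 : c < 0xf0 ? 3 : 4;
      if (len > srclen - i)
        len = srclen - i;       /* do not read past the input */

      if (offsets != NULL)
        {
          offsets[i] = out;     /* character start maps to result index */
          for (size_t k = 1; k < len; k++)
            offsets[i + k] = (size_t) (-1);   /* not a character start */
        }

      if (len == 1)
        result[out++] = (char) c;
      else if (len == 2 && c <= 0xc3)   /* U+0080..U+00FF */
        result[out++] = (char) (((c & 0x03) << 6) | (src[i + 1] & 0x3f));
      else
        result[out++] = '?';    /* not representable in ISO-8859-1 */
      i += len;
    }
  return out;
}
```

   For the six input units 0xC3 0xA9 0xE2 0x82 0xAC 0x61 the sketch
writes three bytes and fills OFFSETS with 0, -1, 1, -1, -1, 2, where -1
denotes ‘(size_t)(-1)’.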
1218
1219    The following functions convert between NUL terminated strings in a
1220 specified encoding and NUL terminated Unicode strings.
1221
1222  -- Function: uint8_t * u8_strconv_from_encoding (const char *STRING,
1223           const char *FROMCODE, enum iconv_ilseq_handler HANDLER)
1224  -- Function: uint16_t * u16_strconv_from_encoding (const char *STRING,
1225           const char *FROMCODE, enum iconv_ilseq_handler HANDLER)
1226  -- Function: uint32_t * u32_strconv_from_encoding (const char *STRING,
1227           const char *FROMCODE, enum iconv_ilseq_handler HANDLER)
1228      Converts a NUL terminated string from a given encoding.
1229
1230      The result is ‘malloc’ allocated, or NULL (with ‘errno’ set) in
1231      case of error.
1232
1233      Particular ‘errno’ values: ‘EILSEQ’, ‘ENOMEM’.
1234
1235  -- Function: char * u8_strconv_to_encoding (const uint8_t *STRING,
1236           const char *TOCODE, enum iconv_ilseq_handler HANDLER)
1237  -- Function: char * u16_strconv_to_encoding (const uint16_t *STRING,
1238           const char *TOCODE, enum iconv_ilseq_handler HANDLER)
1239  -- Function: char * u32_strconv_to_encoding (const uint32_t *STRING,
1240           const char *TOCODE, enum iconv_ilseq_handler HANDLER)
1241      Converts a NUL terminated string to a given encoding.
1242
1243      The result is ‘malloc’ allocated, or NULL (with ‘errno’ set) in
1244      case of error.
1245
1246      Particular ‘errno’ values: ‘EILSEQ’, ‘ENOMEM’.
1247
1248    The following functions are shorthands that convert between NUL
1249 terminated strings in locale encoding and NUL terminated Unicode
1250 strings.
1251
1252  -- Function: uint8_t * u8_strconv_from_locale (const char *STRING)
1253  -- Function: uint16_t * u16_strconv_from_locale (const char *STRING)
1254  -- Function: uint32_t * u32_strconv_from_locale (const char *STRING)
1255      Converts a NUL terminated string from the locale encoding.
1256
1257      The result is ‘malloc’ allocated, or NULL (with ‘errno’ set) in
1258      case of error.
1259
1260      Particular ‘errno’ values: ‘ENOMEM’.
1261
1262  -- Function: char * u8_strconv_to_locale (const uint8_t *STRING)
1263  -- Function: char * u16_strconv_to_locale (const uint16_t *STRING)
1264  -- Function: char * u32_strconv_to_locale (const uint32_t *STRING)
1265      Converts a NUL terminated string to the locale encoding.
1266
1267      The result is ‘malloc’ allocated, or NULL (with ‘errno’ set) in
1268      case of error.
1269
1270      Particular ‘errno’ values: ‘ENOMEM’.
1271
1272 \1f
1273 File: libunistring.info,  Node: unistdio.h,  Next: uniname.h,  Prev: uniconv.h,  Up: Top
1274
1275 6 Output with Unicode strings ‘<unistdio.h>’
1276 ********************************************
1277
1278    This include file declares functions for doing formatted output with
1279 Unicode strings.  It defines a set of functions similar to ‘fprintf’ and
1280 ‘sprintf’, which are declared in ‘<stdio.h>’.
1281
1282    These functions work like the ‘printf’ function family.  In the
1283 format string:
1284    • The format directive ‘U’ takes a UTF-8 string (‘const uint8_t *’).
1285    • The format directive ‘lU’ takes a UTF-16 string (‘const uint16_t
1286      *’).
1287    • The format directive ‘llU’ takes a UTF-32 string (‘const uint32_t
1288      *’).
1289
1290    A function name with an infix ‘v’ indicates that a ‘va_list’ is
1291 passed instead of multiple arguments.
1292
1293    The functions ‘*sprintf’ have a BUF argument that is assumed to be
1294 large enough.  (_DANGEROUS! Overflowing the buffer will crash the
1295 program._)
1296
1297    The functions ‘*snprintf’ have a BUF argument that is assumed to be
1298 SIZE units large.  (_DANGEROUS! The resulting string might be truncated
1299 in the middle of a multibyte character._)
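
   One way to cope with the truncation hazard of the ‘*snprintf’
functions is to back up to the previous character boundary after the
call.  The helper below is a sketch for UTF-8 only; the name
‘sketch_u8_safe_cut’ is hypothetical and the input is assumed to be
well-formed before truncation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the largest length not exceeding LIMIT
   at which S (LEN units long) can be cut without splitting a
   character.  */
size_t
sketch_u8_safe_cut (const uint8_t *s, size_t len, size_t limit)
{
  assert (s != NULL);

  if (limit >= len)
    return len;                 /* nothing needs to be cut */
  size_t cut = limit;
  /* If the first dropped unit is a continuation byte, the cut falls
     inside a character; move back to that character's lead byte.  */
  while (cut > 0 && (s[cut] & 0xc0) == 0x80)
    cut--;
  return cut;
}
```

   A caller would compute the cut on the full string and then pass a
correspondingly smaller size, or truncate the buffer at the returned
position.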
1300
1301    The functions ‘*asprintf’ have a RESULTP argument.  The result will
1302 be freshly allocated and stored in ‘*RESULTP’.
1303
1304    The functions ‘*asnprintf’ have a (RESULTBUF, LENGTHP) argument pair.
1305 If RESULTBUF is not NULL and the result fits into ‘*LENGTHP’ units, it
1306 is put in RESULTBUF, and RESULTBUF is returned.  Otherwise, a freshly
1307 allocated string is returned.  In both cases, ‘*LENGTHP’ is set to the
1308 length (number of units) of the returned string.  In case of error, NULL
1309 is returned and ‘errno’ is set.
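
   The (RESULTBUF, LENGTHP) protocol can be sketched on top of the
standard ‘vsnprintf’ function.  ‘sketch_asnprintf’ is a hypothetical
function for plain ‘char’ strings, not the library's implementation.
The sketch assumes that "fits" means there is also room for the
terminating NUL, which is not counted in ‘*LENGTHP’ on return.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the *asnprintf functions.  */
char *
sketch_asnprintf (char *resultbuf, size_t *lengthp, const char *format, ...)
{
  assert (lengthp != NULL && format != NULL);

  va_list ap;
  va_start (ap, format);
  int needed = vsnprintf (NULL, 0, format, ap);  /* length without NUL */
  va_end (ap);
  if (needed < 0)
    return NULL;

  char *result = resultbuf;
  if (resultbuf == NULL || (size_t) needed + 1 > *lengthp)
    {
      /* The caller's buffer is absent or too small: allocate.  */
      result = malloc ((size_t) needed + 1);
      if (result == NULL)
        return NULL;
    }

  va_start (ap, format);
  vsnprintf (result, (size_t) needed + 1, format, ap);
  va_end (ap);
  *lengthp = (size_t) needed;
  return result;
}
```

   With this protocol the caller frees the result only when it differs
from the buffer it passed in, for example with ‘if (r != buf) free
(r);’.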
1310
1311    The following functions take an ASCII format string and return a
1312 result that is a ‘char *’ string in locale encoding.
1313
1314  -- Function: int ulc_sprintf (char *BUF, const char *FORMAT, ...)
1315
1316  -- Function: int ulc_snprintf (char *BUF, size_t SIZE, const char
1317           *FORMAT, ...)
1318
1319  -- Function: int ulc_asprintf (char **RESULTP, const char *FORMAT, ...)
1320
1321  -- Function: char * ulc_asnprintf (char *RESULTBUF, size_t *LENGTHP,
1322           const char *FORMAT, ...)
1323
1324  -- Function: int ulc_vsprintf (char *BUF, const char *FORMAT, va_list
1325           AP)
1326
1327  -- Function: int ulc_vsnprintf (char *BUF, size_t SIZE, const char
1328           *FORMAT, va_list AP)
1329
1330  -- Function: int ulc_vasprintf (char **RESULTP, const char *FORMAT,
1331           va_list AP)
1332
1333  -- Function: char * ulc_vasnprintf (char *RESULTBUF, size_t *LENGTHP,
1334           const char *FORMAT, va_list AP)
1335
1336    The following functions take an ASCII format string and return a
1337 result in UTF-8 format.
1338
1339  -- Function: int u8_sprintf (uint8_t *BUF, const char *FORMAT, ...)
1340  -- Function: int u8_snprintf (uint8_t *BUF, size_t SIZE, const char
1341           *FORMAT, ...)
1342  -- Function: int u8_asprintf (uint8_t **RESULTP, const char *FORMAT,
1343           ...)
1344  -- Function: uint8_t * u8_asnprintf (uint8_t *RESULTBUF, size_t
1345           *LENGTHP, const char *FORMAT, ...)
1346  -- Function: int u8_vsprintf (uint8_t *BUF, const char *FORMAT, va_list
1347           AP)
1348  -- Function: int u8_vsnprintf (uint8_t *BUF, size_t SIZE, const char
1349           *FORMAT, va_list AP)
1350  -- Function: int u8_vasprintf (uint8_t **RESULTP, const char *FORMAT,
1351           va_list AP)
1352  -- Function: uint8_t * u8_vasnprintf (uint8_t *RESULTBUF, size_t
1353           *LENGTHP, const char *FORMAT, va_list AP)
1354
1355    The following functions take a UTF-8 format string and return a
1356 result in UTF-8 format.
1357
1358  -- Function: int u8_u8_sprintf (uint8_t *BUF, const uint8_t *FORMAT,
1359           ...)
1360  -- Function: int u8_u8_snprintf (uint8_t *BUF, size_t SIZE, const
1361           uint8_t *FORMAT, ...)
1362  -- Function: int u8_u8_asprintf (uint8_t **RESULTP, const uint8_t
1363           *FORMAT, ...)
1364  -- Function: uint8_t * u8_u8_asnprintf (uint8_t *RESULTBUF, size_t
1365           *LENGTHP, const uint8_t *FORMAT, ...)
1366  -- Function: int u8_u8_vsprintf (uint8_t *BUF, const uint8_t *FORMAT,
1367           va_list AP)
1368  -- Function: int u8_u8_vsnprintf (uint8_t *BUF, size_t SIZE, const
1369           uint8_t *FORMAT, va_list AP)
1370  -- Function: int u8_u8_vasprintf (uint8_t **RESULTP, const uint8_t
1371           *FORMAT, va_list AP)
1372  -- Function: uint8_t * u8_u8_vasnprintf (uint8_t *RESULTBUF, size_t
1373           *LENGTHP, const uint8_t *FORMAT, va_list AP)
1374
1375    The following functions take an ASCII format string and return a
1376 result in UTF-16 format.
1377
1378  -- Function: int u16_sprintf (uint16_t *BUF, const char *FORMAT, ...)
1379  -- Function: int u16_snprintf (uint16_t *BUF, size_t SIZE, const char
1380           *FORMAT, ...)
1381  -- Function: int u16_asprintf (uint16_t **RESULTP, const char *FORMAT,
1382           ...)
1383  -- Function: uint16_t * u16_asnprintf (uint16_t *RESULTBUF, size_t
1384           *LENGTHP, const char *FORMAT, ...)
1385  -- Function: int u16_vsprintf (uint16_t *BUF, const char *FORMAT,
1386           va_list AP)
1387  -- Function: int u16_vsnprintf (uint16_t *BUF, size_t SIZE, const char
1388           *FORMAT, va_list AP)
1389  -- Function: int u16_vasprintf (uint16_t **RESULTP, const char *FORMAT,
1390           va_list AP)
1391  -- Function: uint16_t * u16_vasnprintf (uint16_t *RESULTBUF, size_t
1392           *LENGTHP, const char *FORMAT, va_list AP)
1393
1394    The following functions take a UTF-16 format string and return a
1395 result in UTF-16 format.
1396
1397  -- Function: int u16_u16_sprintf (uint16_t *BUF, const uint16_t
1398           *FORMAT, ...)
1399  -- Function: int u16_u16_snprintf (uint16_t *BUF, size_t SIZE, const
1400           uint16_t *FORMAT, ...)
1401  -- Function: int u16_u16_asprintf (uint16_t **RESULTP, const uint16_t
1402           *FORMAT, ...)
1403  -- Function: uint16_t * u16_u16_asnprintf (uint16_t *RESULTBUF, size_t
1404           *LENGTHP, const uint16_t *FORMAT, ...)
1405  -- Function: int u16_u16_vsprintf (uint16_t *BUF, const uint16_t
1406           *FORMAT, va_list AP)
1407  -- Function: int u16_u16_vsnprintf (uint16_t *BUF, size_t SIZE, const
1408           uint16_t *FORMAT, va_list AP)
1409  -- Function: int u16_u16_vasprintf (uint16_t **RESULTP, const uint16_t
1410           *FORMAT, va_list AP)
1411  -- Function: uint16_t * u16_u16_vasnprintf (uint16_t *RESULTBUF, size_t
1412           *LENGTHP, const uint16_t *FORMAT, va_list AP)
1413
1414    The following functions take an ASCII format string and return a
1415 result in UTF-32 format.
1416
1417  -- Function: int u32_sprintf (uint32_t *BUF, const char *FORMAT, ...)
1418  -- Function: int u32_snprintf (uint32_t *BUF, size_t SIZE, const char
1419           *FORMAT, ...)
1420  -- Function: int u32_asprintf (uint32_t **RESULTP, const char *FORMAT,
1421           ...)
1422  -- Function: uint32_t * u32_asnprintf (uint32_t *RESULTBUF, size_t
1423           *LENGTHP, const char *FORMAT, ...)
1424  -- Function: int u32_vsprintf (uint32_t *BUF, const char *FORMAT,
1425           va_list AP)
1426  -- Function: int u32_vsnprintf (uint32_t *BUF, size_t SIZE, const char
1427           *FORMAT, va_list AP)
1428  -- Function: int u32_vasprintf (uint32_t **RESULTP, const char *FORMAT,
1429           va_list AP)
1430  -- Function: uint32_t * u32_vasnprintf (uint32_t *RESULTBUF, size_t
1431           *LENGTHP, const char *FORMAT, va_list AP)
1432
1433    The following functions take a UTF-32 format string and return a
1434 result in UTF-32 format.
1435
1436  -- Function: int u32_u32_sprintf (uint32_t *BUF, const uint32_t
1437           *FORMAT, ...)
1438  -- Function: int u32_u32_snprintf (uint32_t *BUF, size_t SIZE, const
1439           uint32_t *FORMAT, ...)
1440  -- Function: int u32_u32_asprintf (uint32_t **RESULTP, const uint32_t
1441           *FORMAT, ...)
1442  -- Function: uint32_t * u32_u32_asnprintf (uint32_t *RESULTBUF, size_t
1443           *LENGTHP, const uint32_t *FORMAT, ...)
1444  -- Function: int u32_u32_vsprintf (uint32_t *BUF, const uint32_t
1445           *FORMAT, va_list AP)
1446  -- Function: int u32_u32_vsnprintf (uint32_t *BUF, size_t SIZE, const
1447           uint32_t *FORMAT, va_list AP)
1448  -- Function: int u32_u32_vasprintf (uint32_t **RESULTP, const uint32_t
1449           *FORMAT, va_list AP)
1450  -- Function: uint32_t * u32_u32_vasnprintf (uint32_t *RESULTBUF, size_t
1451           *LENGTHP, const uint32_t *FORMAT, va_list AP)
1452
1453    The following functions take an ASCII format string and produce
1454 output in locale encoding to a ‘FILE’ stream.
1455
1456  -- Function: int ulc_fprintf (FILE *STREAM, const char *FORMAT, ...)
1457  -- Function: int ulc_vfprintf (FILE *STREAM, const char *FORMAT,
1458           va_list AP)
1459
1460 \1f
1461 File: libunistring.info,  Node: uniname.h,  Next: unictype.h,  Prev: unistdio.h,  Up: Top
1462
1463 7 Names of Unicode characters ‘<uniname.h>’
1464 *******************************************
1465
1466    This include file implements the association between a Unicode
1467 character and its name.
1468
1469    The name of a Unicode character distinguishes it from other,
1470 similar-looking characters.  For example, the character ‘x’ has the name
1471 ‘"LATIN SMALL LETTER X"’ and is therefore different from the character
1472 named ‘"MULTIPLICATION SIGN"’.
1473
1474  -- Macro: unsigned int UNINAME_MAX
1475      This macro expands to a constant that is the required size of a
1476      buffer for a Unicode character name.
1477
1478  -- Function: char * unicode_character_name (ucs4_t UC, char *BUF)
1479      Looks up the name of a Unicode character, in uppercase ASCII. BUF
1480      must point to a buffer, at least ‘UNINAME_MAX’ bytes in size.
1481      Returns the filled BUF, or NULL if the character does not have a
1482      name.
1483
1484  -- Function: ucs4_t unicode_name_character (const char *NAME)
1485      Looks up the Unicode character with a given name, in upper- or
1486      lowercase ASCII. Returns the character if found, or
1487      ‘UNINAME_INVALID’ if not found.
1488
1489  -- Macro: ucs4_t UNINAME_INVALID
1490      This macro expands to a constant that is a special return value of
1491      the ‘unicode_name_character’ function.
1492
1493 \1f
1494 File: libunistring.info,  Node: unictype.h,  Next: uniwidth.h,  Prev: uniname.h,  Up: Top
1495
1496 8 Unicode character classification and properties ‘<unictype.h>’
1497 ****************************************************************
1498
1499    This include file declares functions that classify Unicode characters
1500 and that test whether Unicode characters have specific properties.
1501
1502    The classification assigns a “general category” to every Unicode
1503 character.  This is similar to the classification provided by ISO C in
1504 ‘<wctype.h>’.
1505
1506    Properties are the data that guides various text processing
1507 algorithms in the presence of specific Unicode characters.
1508
1509 * Menu:
1510
1511 * General category::
1512 * Canonical combining class::
1513 * Bidi class::
1514 * Decimal digit value::
1515 * Digit value::
1516 * Numeric value::
1517 * Mirrored character::
1518 * Arabic shaping::
1519 * Properties::
1520 * Scripts::
1521 * Blocks::
1522 * ISO C and Java syntax::
1523 * Classifications like in ISO C::
1524
1525 \1f
1526 File: libunistring.info,  Node: General category,  Next: Canonical combining class,  Up: unictype.h
1527
1528 8.1 General category
1529 ====================
1530
1531    Every Unicode character or code point has a _general category_
1532 assigned to it.  This classification is important for most algorithms
1533 that work on Unicode text.
1534
1535    The GNU libunistring library provides two kinds of API for working
1536 with general categories.  The object oriented API uses a variable to
1537 denote every predefined general category value or combinations thereof.
1538 The low-level API uses a bit mask instead.  The advantage of the object
1539 oriented API is that if only a few predefined general category values
1540 are used, the data tables are relatively small.  When you combine
1541 general category values (using ‘uc_general_category_or’,
1542 ‘uc_general_category_and’, or ‘uc_general_category_and_not’), or when
1543 you use the low-level bit masks, a big table is used that holds the
1544 complete general category information for all Unicode characters.
1545
1546 * Menu:
1547
1548 * Object oriented API::
1549 * Bit mask API::
1550
1551 \1f
1552 File: libunistring.info,  Node: Object oriented API,  Next: Bit mask API,  Up: General category
1553
1554 8.1.1 The object oriented API for general category
1555 --------------------------------------------------
1556
1557  -- Type: uc_general_category_t
1558      This data type denotes a general category value.  It is an
1559      immediate type that can be copied by simple assignment, without
1560      involving memory allocation.  It is not an array type.
1561
1562    The following are the predefined general category values.  Additional
1563 general categories may be added in the future.
1564
1565  -- Constant: uc_general_category_t UC_CATEGORY_L
1566  -- Constant: uc_general_category_t UC_CATEGORY_LC
1567  -- Constant: uc_general_category_t UC_CATEGORY_Lu
1568  -- Constant: uc_general_category_t UC_CATEGORY_Ll
1569  -- Constant: uc_general_category_t UC_CATEGORY_Lt
1570  -- Constant: uc_general_category_t UC_CATEGORY_Lm
1571  -- Constant: uc_general_category_t UC_CATEGORY_Lo
1572  -- Constant: uc_general_category_t UC_CATEGORY_M
1573  -- Constant: uc_general_category_t UC_CATEGORY_Mn
1574  -- Constant: uc_general_category_t UC_CATEGORY_Mc
1575  -- Constant: uc_general_category_t UC_CATEGORY_Me
1576  -- Constant: uc_general_category_t UC_CATEGORY_N
1577  -- Constant: uc_general_category_t UC_CATEGORY_Nd
1578  -- Constant: uc_general_category_t UC_CATEGORY_Nl
1579  -- Constant: uc_general_category_t UC_CATEGORY_No
1580  -- Constant: uc_general_category_t UC_CATEGORY_P
1581  -- Constant: uc_general_category_t UC_CATEGORY_Pc
1582  -- Constant: uc_general_category_t UC_CATEGORY_Pd
1583  -- Constant: uc_general_category_t UC_CATEGORY_Ps
1584  -- Constant: uc_general_category_t UC_CATEGORY_Pe
1585  -- Constant: uc_general_category_t UC_CATEGORY_Pi
1586  -- Constant: uc_general_category_t UC_CATEGORY_Pf
1587  -- Constant: uc_general_category_t UC_CATEGORY_Po
1588  -- Constant: uc_general_category_t UC_CATEGORY_S
1589  -- Constant: uc_general_category_t UC_CATEGORY_Sm
1590  -- Constant: uc_general_category_t UC_CATEGORY_Sc
1591  -- Constant: uc_general_category_t UC_CATEGORY_Sk
1592  -- Constant: uc_general_category_t UC_CATEGORY_So
1593  -- Constant: uc_general_category_t UC_CATEGORY_Z
1594  -- Constant: uc_general_category_t UC_CATEGORY_Zs
1595  -- Constant: uc_general_category_t UC_CATEGORY_Zl
1596  -- Constant: uc_general_category_t UC_CATEGORY_Zp
1597  -- Constant: uc_general_category_t UC_CATEGORY_C
1598  -- Constant: uc_general_category_t UC_CATEGORY_Cc
1599  -- Constant: uc_general_category_t UC_CATEGORY_Cf
1600  -- Constant: uc_general_category_t UC_CATEGORY_Cs
1601  -- Constant: uc_general_category_t UC_CATEGORY_Co
1602  -- Constant: uc_general_category_t UC_CATEGORY_Cn
1603
1604    The following are alias names for predefined General category values.
1605
1606  -- Macro: uc_general_category_t UC_LETTER
1607      This is another name for ‘UC_CATEGORY_L’.
1608
1609  -- Macro: uc_general_category_t UC_CASED_LETTER
1610      This is another name for ‘UC_CATEGORY_LC’.
1611
1612  -- Macro: uc_general_category_t UC_UPPERCASE_LETTER
1613      This is another name for ‘UC_CATEGORY_Lu’.
1614
1615  -- Macro: uc_general_category_t UC_LOWERCASE_LETTER
1616      This is another name for ‘UC_CATEGORY_Ll’.
1617
1618  -- Macro: uc_general_category_t UC_TITLECASE_LETTER
1619      This is another name for ‘UC_CATEGORY_Lt’.
1620
1621  -- Macro: uc_general_category_t UC_MODIFIER_LETTER
1622      This is another name for ‘UC_CATEGORY_Lm’.
1623
1624  -- Macro: uc_general_category_t UC_OTHER_LETTER
1625      This is another name for ‘UC_CATEGORY_Lo’.
1626
1627  -- Macro: uc_general_category_t UC_MARK
1628      This is another name for ‘UC_CATEGORY_M’.
1629
1630  -- Macro: uc_general_category_t UC_NON_SPACING_MARK
1631      This is another name for ‘UC_CATEGORY_Mn’.
1632
1633  -- Macro: uc_general_category_t UC_COMBINING_SPACING_MARK
1634      This is another name for ‘UC_CATEGORY_Mc’.
1635
1636  -- Macro: uc_general_category_t UC_ENCLOSING_MARK
1637      This is another name for ‘UC_CATEGORY_Me’.
1638
1639  -- Macro: uc_general_category_t UC_NUMBER
1640      This is another name for ‘UC_CATEGORY_N’.
1641
1642  -- Macro: uc_general_category_t UC_DECIMAL_DIGIT_NUMBER
1643      This is another name for ‘UC_CATEGORY_Nd’.
1644
1645  -- Macro: uc_general_category_t UC_LETTER_NUMBER
1646      This is another name for ‘UC_CATEGORY_Nl’.
1647
1648  -- Macro: uc_general_category_t UC_OTHER_NUMBER
1649      This is another name for ‘UC_CATEGORY_No’.
1650
1651  -- Macro: uc_general_category_t UC_PUNCTUATION
1652      This is another name for ‘UC_CATEGORY_P’.
1653
1654  -- Macro: uc_general_category_t UC_CONNECTOR_PUNCTUATION
1655      This is another name for ‘UC_CATEGORY_Pc’.
1656
1657  -- Macro: uc_general_category_t UC_DASH_PUNCTUATION
1658      This is another name for ‘UC_CATEGORY_Pd’.
1659
1660  -- Macro: uc_general_category_t UC_OPEN_PUNCTUATION
1661      This is another name for ‘UC_CATEGORY_Ps’ (“start punctuation”).
1662
1663  -- Macro: uc_general_category_t UC_CLOSE_PUNCTUATION
1664      This is another name for ‘UC_CATEGORY_Pe’ (“end punctuation”).
1665
1666  -- Macro: uc_general_category_t UC_INITIAL_QUOTE_PUNCTUATION
1667      This is another name for ‘UC_CATEGORY_Pi’.
1668
1669  -- Macro: uc_general_category_t UC_FINAL_QUOTE_PUNCTUATION
1670      This is another name for ‘UC_CATEGORY_Pf’.
1671
1672  -- Macro: uc_general_category_t UC_OTHER_PUNCTUATION
1673      This is another name for ‘UC_CATEGORY_Po’.
1674
1675  -- Macro: uc_general_category_t UC_SYMBOL
1676      This is another name for ‘UC_CATEGORY_S’.
1677
1678  -- Macro: uc_general_category_t UC_MATH_SYMBOL
1679      This is another name for ‘UC_CATEGORY_Sm’.
1680
1681  -- Macro: uc_general_category_t UC_CURRENCY_SYMBOL
1682      This is another name for ‘UC_CATEGORY_Sc’.
1683
1684  -- Macro: uc_general_category_t UC_MODIFIER_SYMBOL
1685      This is another name for ‘UC_CATEGORY_Sk’.
1686
1687  -- Macro: uc_general_category_t UC_OTHER_SYMBOL
1688      This is another name for ‘UC_CATEGORY_So’.
1689
1690  -- Macro: uc_general_category_t UC_SEPARATOR
1691      This is another name for ‘UC_CATEGORY_Z’.
1692
1693  -- Macro: uc_general_category_t UC_SPACE_SEPARATOR
1694      This is another name for ‘UC_CATEGORY_Zs’.
1695
1696  -- Macro: uc_general_category_t UC_LINE_SEPARATOR
1697      This is another name for ‘UC_CATEGORY_Zl’.
1698
1699  -- Macro: uc_general_category_t UC_PARAGRAPH_SEPARATOR
1700      This is another name for ‘UC_CATEGORY_Zp’.
1701
1702  -- Macro: uc_general_category_t UC_OTHER
1703      This is another name for ‘UC_CATEGORY_C’.
1704
1705  -- Macro: uc_general_category_t UC_CONTROL
1706      This is another name for ‘UC_CATEGORY_Cc’.
1707
1708  -- Macro: uc_general_category_t UC_FORMAT
1709      This is another name for ‘UC_CATEGORY_Cf’.
1710
1711  -- Macro: uc_general_category_t UC_SURROGATE
1712      This is another name for ‘UC_CATEGORY_Cs’.  All code points in this
1713      category are invalid characters.
1714
1715  -- Macro: uc_general_category_t UC_PRIVATE_USE
1716      This is another name for ‘UC_CATEGORY_Co’.
1717
1718  -- Macro: uc_general_category_t UC_UNASSIGNED
1719      This is another name for ‘UC_CATEGORY_Cn’.  Some code points in
1720      this category are invalid characters.
1721
1722    The following functions combine general categories, like in a boolean
1723 algebra, except that there is no ‘not’ operation.
1724
1725  -- Function: uc_general_category_t uc_general_category_or
1726           (uc_general_category_t CATEGORY1, uc_general_category_t
1727           CATEGORY2)
1728      Returns the union of two general categories.  This corresponds to
1729      the union of the two sets of characters.
1730
1731  -- Function: uc_general_category_t uc_general_category_and
1732           (uc_general_category_t CATEGORY1, uc_general_category_t
1733           CATEGORY2)
1734      Returns the intersection of two general categories as bit masks.
1735      This _does not_ correspond to the intersection of the two sets of
1736      characters.
1737
1738  -- Function: uc_general_category_t uc_general_category_and_not
1739           (uc_general_category_t CATEGORY1, uc_general_category_t
1740           CATEGORY2)
1741      Returns the intersection of a general category with the complement
1742      of a second general category, as bit masks.  This _does not_
1743      correspond to the intersection with complement, when viewing the
1744      categories as sets of characters.
1745
1746    The following functions associate general categories with their name.
1747
1748  -- Function: const char * uc_general_category_name
1749           (uc_general_category_t CATEGORY)
1750      Returns the name of a general category, more precisely, the
1751      abbreviated name.  Returns NULL if the general category corresponds
1752      to a bit mask that does not have a name.
1753
1754  -- Function: const char * uc_general_category_long_name
1755           (uc_general_category_t CATEGORY)
1756      Returns the long name of a general category.  Returns NULL if the
1757      general category corresponds to a bit mask that does not have a
1758      name.
1759
1760  -- Function: uc_general_category_t uc_general_category_byname (const
1761           char *CATEGORY_NAME)
1762      Returns the general category given by name, e.g.  ‘"Lu"’, or by
1763      long name, e.g.  ‘"Uppercase Letter"’.  This lookup ignores spaces,
1764      underscores, or hyphens as word separators and is
1765      case-insensitive.
1766
1767    The following functions view general categories as sets of Unicode
1768 characters.
1769
1770  -- Function: uc_general_category_t uc_general_category (ucs4_t UC)
1771      Returns the general category of a Unicode character.
1772
1773      This function uses a big table.
1774
1775  -- Function: bool uc_is_general_category (ucs4_t UC,
1776           uc_general_category_t CATEGORY)
1777      Tests whether a Unicode character belongs to a given category.  The
1778      CATEGORY argument can be a predefined general category or the
1779      combination of several predefined general categories.
1780
1781 \1f
1782 File: libunistring.info,  Node: Bit mask API,  Prev: Object oriented API,  Up: General category
1783
1784 8.1.2 The bit mask API for general category
1785 -------------------------------------------
1786
1787    The following are the predefined general category values as bit masks.
1788 Additional general categories may be added in the future.
1789
1790  -- Macro: uint32_t UC_CATEGORY_MASK_L
1791  -- Macro: uint32_t UC_CATEGORY_MASK_LC
1792  -- Macro: uint32_t UC_CATEGORY_MASK_Lu
1793  -- Macro: uint32_t UC_CATEGORY_MASK_Ll
1794  -- Macro: uint32_t UC_CATEGORY_MASK_Lt
1795  -- Macro: uint32_t UC_CATEGORY_MASK_Lm
1796  -- Macro: uint32_t UC_CATEGORY_MASK_Lo
1797  -- Macro: uint32_t UC_CATEGORY_MASK_M
1798  -- Macro: uint32_t UC_CATEGORY_MASK_Mn
1799  -- Macro: uint32_t UC_CATEGORY_MASK_Mc
1800  -- Macro: uint32_t UC_CATEGORY_MASK_Me
1801  -- Macro: uint32_t UC_CATEGORY_MASK_N
1802  -- Macro: uint32_t UC_CATEGORY_MASK_Nd
1803  -- Macro: uint32_t UC_CATEGORY_MASK_Nl
1804  -- Macro: uint32_t UC_CATEGORY_MASK_No
1805  -- Macro: uint32_t UC_CATEGORY_MASK_P
1806  -- Macro: uint32_t UC_CATEGORY_MASK_Pc
1807  -- Macro: uint32_t UC_CATEGORY_MASK_Pd
1808  -- Macro: uint32_t UC_CATEGORY_MASK_Ps
1809  -- Macro: uint32_t UC_CATEGORY_MASK_Pe
1810  -- Macro: uint32_t UC_CATEGORY_MASK_Pi
1811  -- Macro: uint32_t UC_CATEGORY_MASK_Pf
1812  -- Macro: uint32_t UC_CATEGORY_MASK_Po
1813  -- Macro: uint32_t UC_CATEGORY_MASK_S
1814  -- Macro: uint32_t UC_CATEGORY_MASK_Sm
1815  -- Macro: uint32_t UC_CATEGORY_MASK_Sc
1816  -- Macro: uint32_t UC_CATEGORY_MASK_Sk
1817  -- Macro: uint32_t UC_CATEGORY_MASK_So
1818  -- Macro: uint32_t UC_CATEGORY_MASK_Z
1819  -- Macro: uint32_t UC_CATEGORY_MASK_Zs
1820  -- Macro: uint32_t UC_CATEGORY_MASK_Zl
1821  -- Macro: uint32_t UC_CATEGORY_MASK_Zp
1822  -- Macro: uint32_t UC_CATEGORY_MASK_C
1823  -- Macro: uint32_t UC_CATEGORY_MASK_Cc
1824  -- Macro: uint32_t UC_CATEGORY_MASK_Cf
1825  -- Macro: uint32_t UC_CATEGORY_MASK_Cs
1826  -- Macro: uint32_t UC_CATEGORY_MASK_Co
1827  -- Macro: uint32_t UC_CATEGORY_MASK_Cn
1828
1829    The following function views general categories as sets of Unicode
1830 characters.
1831
1832  -- Function: bool uc_is_general_category_withtable (ucs4_t UC, uint32_t
1833           BITMASK)
1834      Tests whether a Unicode character belongs to a given category.  The
1835      BITMASK argument can be a predefined general category bitmask or
1836      the combination of several predefined general category bitmasks.
1837
1838      This function uses a big table comprising all general categories.
1839
1840 \1f
1841 File: libunistring.info,  Node: Canonical combining class,  Next: Bidi class,  Prev: General category,  Up: unictype.h
1842
1843 8.2 Canonical combining class
1844 =============================
1845
1846    Every Unicode character or code point has a _canonical combining
1847 class_ assigned to it.
1848
1849    What is the meaning of the canonical combining class?  Essentially,
1850 it indicates the priority with which a combining character is attached
1851 to its base character.  The characters for which the canonical combining
1852 class is 0 are the base characters, and the characters for which it is
1853 greater than 0 are the combining characters.  Combining characters are
1854 rendered near, attached to, or around their base character, and combining
1855 characters with small combining classes are attached "first" or "closer"
1856 to the base character.
1857
1858    The canonical combining class of a character is a number in the range
1859 0..255.  The possible values are described in the Unicode Character
1860 Database <http://www.unicode.org/Public/UNIDATA/UCD.html>.  The list
1861 here is not definitive; more values can be added in future versions.
1862
1863  -- Constant: int UC_CCC_NR
1864      The canonical combining class value for “Not Reordered” characters.
1865      The value is 0.
1866
1867  -- Constant: int UC_CCC_OV
1868      The canonical combining class value for “Overlay” characters.
1869
1870  -- Constant: int UC_CCC_NK
1871      The canonical combining class value for “Nukta” characters.
1872
1873  -- Constant: int UC_CCC_KV
1874      The canonical combining class value for “Kana Voicing” characters.
1875
1876  -- Constant: int UC_CCC_VR
1877      The canonical combining class value for “Virama” characters.
1878
1879  -- Constant: int UC_CCC_ATBL
1880      The canonical combining class value for “Attached Below Left”
1881      characters.
1882
1883  -- Constant: int UC_CCC_ATB
1884      The canonical combining class value for “Attached Below”
1885      characters.
1886
1887  -- Constant: int UC_CCC_ATA
1888      The canonical combining class value for “Attached Above”
1889      characters.
1890
1891  -- Constant: int UC_CCC_ATAR
1892      The canonical combining class value for “Attached Above Right”
1893      characters.
1894
1895  -- Constant: int UC_CCC_BL
1896      The canonical combining class value for “Below Left” characters.
1897
1898  -- Constant: int UC_CCC_B
1899      The canonical combining class value for “Below” characters.
1900
1901  -- Constant: int UC_CCC_BR
1902      The canonical combining class value for “Below Right” characters.
1903
1904  -- Constant: int UC_CCC_L
1905      The canonical combining class value for “Left” characters.
1906
1907  -- Constant: int UC_CCC_R
1908      The canonical combining class value for “Right” characters.
1909
1910  -- Constant: int UC_CCC_AL
1911      The canonical combining class value for “Above Left” characters.
1912
1913  -- Constant: int UC_CCC_A
1914      The canonical combining class value for “Above” characters.
1915
1916  -- Constant: int UC_CCC_AR
1917      The canonical combining class value for “Above Right” characters.
1918
1919  -- Constant: int UC_CCC_DB
1920      The canonical combining class value for “Double Below” characters.
1921
1922  -- Constant: int UC_CCC_DA
1923      The canonical combining class value for “Double Above” characters.
1924
1925  -- Constant: int UC_CCC_IS
1926      The canonical combining class value for “Iota Subscript”
1927      characters.
1928
1929    The following functions associate canonical combining classes with
1930 their name.
1931
1932  -- Function: const char * uc_combining_class_name (int CCC)
1933      Returns the name of a canonical combining class, more precisely,
1934      the abbreviated name.  Returns NULL if the canonical combining
1935      class is a numeric value without a name.
1936
1937  -- Function: const char * uc_combining_class_long_name (int CCC)
1938      Returns the long name of a canonical combining class.  Returns NULL
1939      if the canonical combining class is a numeric value without a name.
1940
1941  -- Function: int uc_combining_class_byname (const char *CCC_NAME)
1942      Returns the canonical combining class given by name, e.g.  ‘"BL"’,
1943      or by long name, e.g.  ‘"Below Left"’.  This lookup ignores spaces,
1944      underscores, or hyphens as word separators and is
1945      case-insensitive.
1946
1947    The following function looks up the canonical combining class of a
1948 character.
1949
1950  -- Function: int uc_combining_class (ucs4_t UC)
1951      Returns the canonical combining class of a Unicode character.
1952
1953 \1f
1954 File: libunistring.info,  Node: Bidi class,  Next: Decimal digit value,  Prev: Canonical combining class,  Up: unictype.h
1955
1956 8.3 Bidi class
1957 ==============
1958
1959    Every Unicode character or code point has a _bidi class_ assigned to
1960 it.  Before Unicode 4.0, this concept was known as _bidirectional
1961 category_.
1962
1963    The bidi class guides the bidirectional algorithm
1964 (<http://www.unicode.org/reports/tr9/>).  The possible values are the
1965 following.
1966
1967  -- Constant: int UC_BIDI_L
1968      The bidi class for “Left-to-Right” characters.
1969
1970  -- Constant: int UC_BIDI_LRE
1971      The bidi class for “Left-to-Right Embedding” characters.
1972
1973  -- Constant: int UC_BIDI_LRO
1974      The bidi class for “Left-to-Right Override” characters.
1975
1976  -- Constant: int UC_BIDI_R
1977      The bidi class for “Right-to-Left” characters.
1978
1979  -- Constant: int UC_BIDI_AL
1980      The bidi class for “Right-to-Left Arabic” characters.
1981
1982  -- Constant: int UC_BIDI_RLE
1983      The bidi class for “Right-to-Left Embedding” characters.
1984
1985  -- Constant: int UC_BIDI_RLO
1986      The bidi class for “Right-to-Left Override” characters.
1987
1988  -- Constant: int UC_BIDI_PDF
1989      The bidi class for “Pop Directional Format” characters.
1990
1991  -- Constant: int UC_BIDI_EN
1992      The bidi class for “European Number” characters.
1993
1994  -- Constant: int UC_BIDI_ES
1995      The bidi class for “European Number Separator” characters.
1996
1997  -- Constant: int UC_BIDI_ET
1998      The bidi class for “European Number Terminator” characters.
1999
2000  -- Constant: int UC_BIDI_AN
2001      The bidi class for “Arabic Number” characters.
2002
2003  -- Constant: int UC_BIDI_CS
2004      The bidi class for “Common Number Separator” characters.
2005
2006  -- Constant: int UC_BIDI_NSM
2007      The bidi class for “Non-Spacing Mark” characters.
2008
2009  -- Constant: int UC_BIDI_BN
2010      The bidi class for “Boundary Neutral” characters.
2011
2012  -- Constant: int UC_BIDI_B
2013      The bidi class for “Paragraph Separator” characters.
2014
2015  -- Constant: int UC_BIDI_S
2016      The bidi class for “Segment Separator” characters.
2017
2018  -- Constant: int UC_BIDI_WS
2019      The bidi class for “Whitespace” characters.
2020
2021  -- Constant: int UC_BIDI_ON
2022      The bidi class for “Other Neutral” characters.
2023
2024    The following functions implement the association between a
2025 bidirectional category and its name.
2026
2027  -- Function: const char * uc_bidi_class_name (int BIDI_CLASS)
2028  -- Function: const char * uc_bidi_category_name (int CATEGORY)
2029      Returns the name of a bidi class, more precisely, the abbreviated
2030      name.
2031
2032  -- Function: const char * uc_bidi_class_long_name (int BIDI_CLASS)
2033      Returns the long name of a bidi class.
2034
2035  -- Function: int uc_bidi_class_byname (const char *BIDI_CLASS_NAME)
2036  -- Function: int uc_bidi_category_byname (const char *CATEGORY_NAME)
2037      Returns the bidi class given by name, e.g.  ‘"LRE"’, or by long
2038      name, e.g.  ‘"Left-to-Right Embedding"’.  This lookup ignores
2039      spaces, underscores, or hyphens as word separators and is
2040      case-insensitive.
2041
2042    The following functions view bidirectional categories as sets of
2043 Unicode characters.
2044
2045  -- Function: int uc_bidi_class (ucs4_t UC)
2046  -- Function: int uc_bidi_category (ucs4_t UC)
2047      Returns the bidi class of a Unicode character.
2048
2049  -- Function: bool uc_is_bidi_class (ucs4_t UC, int BIDI_CLASS)
2050  -- Function: bool uc_is_bidi_category (ucs4_t UC, int CATEGORY)
2051      Tests whether a Unicode character belongs to a given bidi class.
2052
2053 \1f
2054 File: libunistring.info,  Node: Decimal digit value,  Next: Digit value,  Prev: Bidi class,  Up: unictype.h
2055
2056 8.4 Decimal digit value
2057 =======================
2058
2059    Decimal digits (like the digits from ‘0’ to ‘9’) exist in many
2060 scripts.  The following function converts a decimal digit character to
2061 its numerical value.
2062
2063  -- Function: int uc_decimal_value (ucs4_t UC)
2064      Returns the decimal digit value of a Unicode character.  The return
2065      value is an integer in the range 0..9, or -1 for characters that do
2066      not represent a decimal digit.
2067
2068 \1f
2069 File: libunistring.info,  Node: Digit value,  Next: Numeric value,  Prev: Decimal digit value,  Up: unictype.h
2070
2071 8.5 Digit value
2072 ===============
2073
2074    Digit characters are like decimal digit characters, possibly in
2075 special forms such as superscript, subscript, or circled.  The
2076 following function converts a digit character to its numerical value.
2077
2078  -- Function: int uc_digit_value (ucs4_t UC)
2079      Returns the digit value of a Unicode character.  The return value
2080      is an integer in the range 0..9, or -1 for characters that do not
2081      represent a digit.
2082
2083 \1f
2084 File: libunistring.info,  Node: Numeric value,  Next: Mirrored character,  Prev: Digit value,  Up: unictype.h
2085
2086 8.6 Numeric value
2087 =================
2088
2089    There are also characters that represent numbers without a digit
2090 system, like the Roman numerals, and fractional numbers, like 1/4 or
2091 3/4.
2092
2093    The following type represents the numeric value of a Unicode
2094 character.
2095  -- Type: uc_fraction_t
2096      This is a structure type with the following fields:
2097           int numerator;
2098           int denominator;
2099      An integer N is represented by ‘numerator = N’, ‘denominator = 1’.
2100
2101    The following function converts a number character to its numerical
2102 value.
2103
2104  -- Function: uc_fraction_t uc_numeric_value (ucs4_t UC)
2105      Returns the numeric value of a Unicode character.  The return value
2106      is a fraction, or the pseudo-fraction ‘{ 0, 0 }’ for characters
2107      that do not represent a number.
2108
2109 \1f
2110 File: libunistring.info,  Node: Mirrored character,  Next: Arabic shaping,  Prev: Numeric value,  Up: unictype.h
2111
2112 8.7 Mirrored character
2113 ======================
2114
2115    Character mirroring is used to associate the closing parenthesis
2116 character with the opening parenthesis character, the closing brace
2117 character with the opening brace character, and so on.
2118
2119    The following function looks up the mirrored character of a Unicode
2120 character.
2121
2122  -- Function: bool uc_mirror_char (ucs4_t UC, ucs4_t *PUC)
2123      Stores the mirrored character of a Unicode character UC in ‘*PUC’
2124      and returns ‘true’, if it exists.  Otherwise it stores UC
2125      unmodified in ‘*PUC’ and returns ‘false’.
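
   Since the character is stored back unmodified when no mirrored
counterpart exists, a caller can apply the function to every character
of a string and ignore the return value.  The sketch below illustrates
this under stated assumptions: ‘ascii_mirror_char’ is a hypothetical
stand-in that follows the contract described above but knows only the
four ASCII bracket pairs, ‘mirror_in_place’ is a hypothetical helper,
and ‘uint32_t’ is used on the assumption that ‘ucs4_t’ is a 32-bit
unsigned integer type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in with the same contract as uc_mirror_char,
   covering only the four ASCII bracket pairs.  */
static bool
ascii_mirror_char (uint32_t uc, uint32_t *puc)
{
  static const char pairs[] = "()[]{}<>";
  size_t i;

  for (i = 0; pairs[i] != '\0'; i++)
    if (uc == (uint32_t) pairs[i])
      {
        *puc = (uint32_t) pairs[i ^ 1];   /* partner sits next to it */
        return true;
      }
  *puc = uc;                    /* no mirror: store UC unmodified */
  return false;
}

/* Replaces every character of S[0..N-1] that has a mirrored counterpart.
   The return value of MIRROR can be ignored, because characters without
   a mirror are stored back unchanged.  */
static void
mirror_in_place (uint32_t *s, size_t n,
                 bool (*mirror) (uint32_t, uint32_t *))
{
  size_t i;

  assert (mirror != NULL);
  for (i = 0; i < n; i++)
    (void) mirror (s[i], &s[i]);
}
```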
2126
2127 \1f
2128 File: libunistring.info,  Node: Arabic shaping,  Next: Properties,  Prev: Mirrored character,  Up: unictype.h
2129
2130 8.8 Arabic shaping
2131 ==================
2132
2133    When Arabic characters are rendered, after bidi reordering has taken
2134 place, the shapes of the glyphs are modified so that many adjacent glyphs
2135 are joined.  Two character properties describe how this “Arabic shaping”
2136 takes place: the joining type and the joining group.
2137
2138 * Menu:
2139
2140 * Joining type::
2141 * Joining group::
2142
2143 \1f
2144 File: libunistring.info,  Node: Joining type,  Next: Joining group,  Up: Arabic shaping
2145
2146 8.8.1 Joining type of Arabic characters
2147 ---------------------------------------
2148
2149    The joining type of a character describes on which of the left and
2150 right neighbour characters the character’s shape depends, and which of
2151 the two neighbour characters are rendered depending on this character.
2152
2153    The joining type has the following possible values:
2154
2155  -- Constant: int UC_JOINING_TYPE_U
2156      “Non joining”: Characters of this joining type prohibit joining.
2157
2158  -- Constant: int UC_JOINING_TYPE_T
2159      “Transparent”: Characters of this joining type are skipped when
2160      considering joining.
2161
2162  -- Constant: int UC_JOINING_TYPE_C
2163      “Join causing”: Characters of this joining type cause their
2164      neighbour characters to change their shapes but don’t change their
2165      own shape.
2166
2167  -- Constant: int UC_JOINING_TYPE_L
2168      “Left joining”: Characters of this joining type have two shapes,
2169      isolated and initial.  Such characters currently don’t exist.
2170
2171  -- Constant: int UC_JOINING_TYPE_R
2172      “Right joining”: Characters of this joining type have two shapes,
2173      isolated and final.
2174
2175  -- Constant: int UC_JOINING_TYPE_D
2176      “Dual joining”: Characters of this joining type have four shapes,
2177      initial, medial, final, and isolated.
2178
2179    The following functions implement the association between a joining
2180 type and its name.
2181
2182  -- Function: const char * uc_joining_type_name (int JOINING_TYPE)
2183      Returns the name of a joining type.
2184
2185  -- Function: const char * uc_joining_type_long_name (int JOINING_TYPE)
2186      Returns the long name of a joining type.
2187
2188  -- Function: int uc_joining_type_byname (const char *JOINING_TYPE_NAME)
2189      Returns the joining type given by name, e.g.  ‘"D"’, or by long
2190      name, e.g.  ‘"Dual Joining"’.  This lookup ignores spaces,
2191      underscores, and hyphens as word separators and is
2192      case-insensitive.
2193
2194    The following function gives the joining type of every Unicode
2195 character.
2196
2197  -- Function: int uc_joining_type (ucs4_t UC)
2198      Returns the joining type of a Unicode character.
2199
2200 \1f
2201 File: libunistring.info,  Node: Joining group,  Prev: Joining type,  Up: Arabic shaping
2202
2203 8.8.2 Joining group of Arabic characters
2204 ----------------------------------------
2205
2206    The joining group of a character describes how the character’s shape
2207 is modified in the four contexts of dual-joining characters or in the
2208 two contexts of right-joining characters.
2209
2210    The joining group has the following possible values:
2211
2212  -- Constant: int UC_JOINING_GROUP_NONE
2213  -- Constant: int UC_JOINING_GROUP_AIN
2214  -- Constant: int UC_JOINING_GROUP_ALAPH
2215  -- Constant: int UC_JOINING_GROUP_ALEF
2216  -- Constant: int UC_JOINING_GROUP_BEH
2217  -- Constant: int UC_JOINING_GROUP_BETH
2218  -- Constant: int UC_JOINING_GROUP_BURUSHASKI_YEH_BARREE
2219  -- Constant: int UC_JOINING_GROUP_DAL
2220  -- Constant: int UC_JOINING_GROUP_DALATH_RISH
2221  -- Constant: int UC_JOINING_GROUP_E
2222  -- Constant: int UC_JOINING_GROUP_FARSI_YEH
2223  -- Constant: int UC_JOINING_GROUP_FE
2224  -- Constant: int UC_JOINING_GROUP_FEH
2225  -- Constant: int UC_JOINING_GROUP_FINAL_SEMKATH
2226  -- Constant: int UC_JOINING_GROUP_GAF
2227  -- Constant: int UC_JOINING_GROUP_GAMAL
2228  -- Constant: int UC_JOINING_GROUP_HAH
2229  -- Constant: int UC_JOINING_GROUP_HE
2230  -- Constant: int UC_JOINING_GROUP_HEH
2231  -- Constant: int UC_JOINING_GROUP_HEH_GOAL
2232  -- Constant: int UC_JOINING_GROUP_HETH
2233  -- Constant: int UC_JOINING_GROUP_KAF
2234  -- Constant: int UC_JOINING_GROUP_KAPH
2235  -- Constant: int UC_JOINING_GROUP_KHAPH
2236  -- Constant: int UC_JOINING_GROUP_KNOTTED_HEH
2237  -- Constant: int UC_JOINING_GROUP_LAM
2238  -- Constant: int UC_JOINING_GROUP_LAMADH
2239  -- Constant: int UC_JOINING_GROUP_MEEM
2240  -- Constant: int UC_JOINING_GROUP_MIM
2241  -- Constant: int UC_JOINING_GROUP_NOON
2242  -- Constant: int UC_JOINING_GROUP_NUN
2243  -- Constant: int UC_JOINING_GROUP_NYA
2244  -- Constant: int UC_JOINING_GROUP_PE
2245  -- Constant: int UC_JOINING_GROUP_QAF
2246  -- Constant: int UC_JOINING_GROUP_QAPH
2247  -- Constant: int UC_JOINING_GROUP_REH
2248  -- Constant: int UC_JOINING_GROUP_REVERSED_PE
2249  -- Constant: int UC_JOINING_GROUP_SAD
2250  -- Constant: int UC_JOINING_GROUP_SADHE
2251  -- Constant: int UC_JOINING_GROUP_SEEN
2252  -- Constant: int UC_JOINING_GROUP_SEMKATH
2253  -- Constant: int UC_JOINING_GROUP_SHIN
2254  -- Constant: int UC_JOINING_GROUP_SWASH_KAF
2255  -- Constant: int UC_JOINING_GROUP_SYRIAC_WAW
2256  -- Constant: int UC_JOINING_GROUP_TAH
2257  -- Constant: int UC_JOINING_GROUP_TAW
2258  -- Constant: int UC_JOINING_GROUP_TEH_MARBUTA
2259  -- Constant: int UC_JOINING_GROUP_TEH_MARBUTA_GOAL
2260  -- Constant: int UC_JOINING_GROUP_TETH
2261  -- Constant: int UC_JOINING_GROUP_WAW
2262  -- Constant: int UC_JOINING_GROUP_YEH
2263  -- Constant: int UC_JOINING_GROUP_YEH_BARREE
2264  -- Constant: int UC_JOINING_GROUP_YEH_WITH_TAIL
2265  -- Constant: int UC_JOINING_GROUP_YUDH
2266  -- Constant: int UC_JOINING_GROUP_YUDH_HE
2267  -- Constant: int UC_JOINING_GROUP_ZAIN
2268  -- Constant: int UC_JOINING_GROUP_ZHAIN
2269
2270    The following functions implement the association between a joining
2271 group and its name.
2272
2273  -- Function: const char * uc_joining_group_name (int JOINING_GROUP)
2274      Returns the name of a joining group.
2275
2276  -- Function: int uc_joining_group_byname (const char
2277           *JOINING_GROUP_NAME)
2278      Returns the joining group given by name, e.g.  ‘"Teh_Marbuta"’.
2279      This lookup ignores spaces, underscores, and hyphens as word
2280      separators and is case-insensitive.
2281
2282    The following function gives the joining group of every Unicode
2283 character.
2284
2285  -- Function: int uc_joining_group (ucs4_t UC)
2286      Returns the joining group of a Unicode character.
2287
2288 \1f
2289 File: libunistring.info,  Node: Properties,  Next: Scripts,  Prev: Arabic shaping,  Up: unictype.h
2290
2291 8.9 Properties
2292 ==============
2293
2294    This section defines boolean properties of Unicode characters.  That
2295 is, a character either has the given property or does not have it.
2296 In other words, the property can be viewed as a subset of the set of
2297 Unicode characters.
2298
2299    The GNU libunistring library provides two kinds of API for working
2300 with properties.  The object oriented API uses a type ‘uc_property_t’ to
2301 designate a property.  In the function-based API, which is a bit more
2302 low level, a property is merely a function.
2303
2304 * Menu:
2305
2306 * Properties as objects::
2307 * Properties as functions::
2308
2309 \1f
2310 File: libunistring.info,  Node: Properties as objects,  Next: Properties as functions,  Up: Properties
2311
2312 8.9.1 Properties as objects – the object oriented API
2313 -----------------------------------------------------
2314
2315    The following type designates a property on Unicode characters.
2316
2317  -- Type: uc_property_t
2318      This data type denotes a boolean property on Unicode characters.
2319      It is an immediate type that can be copied by simple assignment,
2320      without involving memory allocation.  It is not an array type.
2321
2322    Many Unicode properties are predefined.
2323
2324    The following are general properties.
2325
2326  -- Constant: uc_property_t UC_PROPERTY_WHITE_SPACE
2327  -- Constant: uc_property_t UC_PROPERTY_ALPHABETIC
2328  -- Constant: uc_property_t UC_PROPERTY_OTHER_ALPHABETIC
2329  -- Constant: uc_property_t UC_PROPERTY_NOT_A_CHARACTER
2330  -- Constant: uc_property_t UC_PROPERTY_DEFAULT_IGNORABLE_CODE_POINT
2331  -- Constant: uc_property_t
2332           UC_PROPERTY_OTHER_DEFAULT_IGNORABLE_CODE_POINT
2333  -- Constant: uc_property_t UC_PROPERTY_DEPRECATED
2334  -- Constant: uc_property_t UC_PROPERTY_LOGICAL_ORDER_EXCEPTION
2335  -- Constant: uc_property_t UC_PROPERTY_VARIATION_SELECTOR
2336  -- Constant: uc_property_t UC_PROPERTY_PRIVATE_USE
2337  -- Constant: uc_property_t UC_PROPERTY_UNASSIGNED_CODE_VALUE
2338
2339    The following properties are related to case folding.
2340
2341  -- Constant: uc_property_t UC_PROPERTY_UPPERCASE
2342  -- Constant: uc_property_t UC_PROPERTY_OTHER_UPPERCASE
2343  -- Constant: uc_property_t UC_PROPERTY_LOWERCASE
2344  -- Constant: uc_property_t UC_PROPERTY_OTHER_LOWERCASE
2345  -- Constant: uc_property_t UC_PROPERTY_TITLECASE
2346  -- Constant: uc_property_t UC_PROPERTY_CASED
2347  -- Constant: uc_property_t UC_PROPERTY_CASE_IGNORABLE
2348  -- Constant: uc_property_t UC_PROPERTY_CHANGES_WHEN_LOWERCASED
2349  -- Constant: uc_property_t UC_PROPERTY_CHANGES_WHEN_UPPERCASED
2350  -- Constant: uc_property_t UC_PROPERTY_CHANGES_WHEN_TITLECASED
2351  -- Constant: uc_property_t UC_PROPERTY_CHANGES_WHEN_CASEFOLDED
2352  -- Constant: uc_property_t UC_PROPERTY_CHANGES_WHEN_CASEMAPPED
2353  -- Constant: uc_property_t UC_PROPERTY_SOFT_DOTTED
2354
2355    The following properties are related to identifiers.
2356
2357  -- Constant: uc_property_t UC_PROPERTY_ID_START
2358  -- Constant: uc_property_t UC_PROPERTY_OTHER_ID_START
2359  -- Constant: uc_property_t UC_PROPERTY_ID_CONTINUE
2360  -- Constant: uc_property_t UC_PROPERTY_OTHER_ID_CONTINUE
2361  -- Constant: uc_property_t UC_PROPERTY_XID_START
2362  -- Constant: uc_property_t UC_PROPERTY_XID_CONTINUE
2363  -- Constant: uc_property_t UC_PROPERTY_PATTERN_WHITE_SPACE
2364  -- Constant: uc_property_t UC_PROPERTY_PATTERN_SYNTAX
2365
2366    The following properties have an influence on shaping and rendering.
2367
2368  -- Constant: uc_property_t UC_PROPERTY_JOIN_CONTROL
2369  -- Constant: uc_property_t UC_PROPERTY_GRAPHEME_BASE
2370  -- Constant: uc_property_t UC_PROPERTY_GRAPHEME_EXTEND
2371  -- Constant: uc_property_t UC_PROPERTY_OTHER_GRAPHEME_EXTEND
2372  -- Constant: uc_property_t UC_PROPERTY_GRAPHEME_LINK
2373
2374    The following properties relate to bidirectional reordering.
2375
2376  -- Constant: uc_property_t UC_PROPERTY_BIDI_CONTROL
2377  -- Constant: uc_property_t UC_PROPERTY_BIDI_LEFT_TO_RIGHT
2378  -- Constant: uc_property_t UC_PROPERTY_BIDI_HEBREW_RIGHT_TO_LEFT
2379  -- Constant: uc_property_t UC_PROPERTY_BIDI_ARABIC_RIGHT_TO_LEFT
2380  -- Constant: uc_property_t UC_PROPERTY_BIDI_EUROPEAN_DIGIT
2381  -- Constant: uc_property_t UC_PROPERTY_BIDI_EUR_NUM_SEPARATOR
2382  -- Constant: uc_property_t UC_PROPERTY_BIDI_EUR_NUM_TERMINATOR
2383  -- Constant: uc_property_t UC_PROPERTY_BIDI_ARABIC_DIGIT
2384  -- Constant: uc_property_t UC_PROPERTY_BIDI_COMMON_SEPARATOR
2385  -- Constant: uc_property_t UC_PROPERTY_BIDI_BLOCK_SEPARATOR
2386  -- Constant: uc_property_t UC_PROPERTY_BIDI_SEGMENT_SEPARATOR
2387  -- Constant: uc_property_t UC_PROPERTY_BIDI_WHITESPACE
2388  -- Constant: uc_property_t UC_PROPERTY_BIDI_NON_SPACING_MARK
2389  -- Constant: uc_property_t UC_PROPERTY_BIDI_BOUNDARY_NEUTRAL
2390  -- Constant: uc_property_t UC_PROPERTY_BIDI_PDF
2391  -- Constant: uc_property_t UC_PROPERTY_BIDI_EMBEDDING_OR_OVERRIDE
2392  -- Constant: uc_property_t UC_PROPERTY_BIDI_OTHER_NEUTRAL
2393
2394    The following properties deal with number representations.
2395
2396  -- Constant: uc_property_t UC_PROPERTY_HEX_DIGIT
2397  -- Constant: uc_property_t UC_PROPERTY_ASCII_HEX_DIGIT
2398
2399    The following properties deal with CJK.
2400
2401  -- Constant: uc_property_t UC_PROPERTY_IDEOGRAPHIC
2402  -- Constant: uc_property_t UC_PROPERTY_UNIFIED_IDEOGRAPH
2403  -- Constant: uc_property_t UC_PROPERTY_RADICAL
2404  -- Constant: uc_property_t UC_PROPERTY_IDS_BINARY_OPERATOR
2405  -- Constant: uc_property_t UC_PROPERTY_IDS_TRINARY_OPERATOR
2406
2407    Other miscellaneous properties are:
2408
2409  -- Constant: uc_property_t UC_PROPERTY_ZERO_WIDTH
2410  -- Constant: uc_property_t UC_PROPERTY_SPACE
2411  -- Constant: uc_property_t UC_PROPERTY_NON_BREAK
2412  -- Constant: uc_property_t UC_PROPERTY_ISO_CONTROL
2413  -- Constant: uc_property_t UC_PROPERTY_FORMAT_CONTROL
2414  -- Constant: uc_property_t UC_PROPERTY_DASH
2415  -- Constant: uc_property_t UC_PROPERTY_HYPHEN
2416  -- Constant: uc_property_t UC_PROPERTY_PUNCTUATION
2417  -- Constant: uc_property_t UC_PROPERTY_LINE_SEPARATOR
2418  -- Constant: uc_property_t UC_PROPERTY_PARAGRAPH_SEPARATOR
2419  -- Constant: uc_property_t UC_PROPERTY_QUOTATION_MARK
2420  -- Constant: uc_property_t UC_PROPERTY_SENTENCE_TERMINAL
2421  -- Constant: uc_property_t UC_PROPERTY_TERMINAL_PUNCTUATION
2422  -- Constant: uc_property_t UC_PROPERTY_CURRENCY_SYMBOL
2423  -- Constant: uc_property_t UC_PROPERTY_MATH
2424  -- Constant: uc_property_t UC_PROPERTY_OTHER_MATH
2425  -- Constant: uc_property_t UC_PROPERTY_PAIRED_PUNCTUATION
2426  -- Constant: uc_property_t UC_PROPERTY_LEFT_OF_PAIR
2427  -- Constant: uc_property_t UC_PROPERTY_COMBINING
2428  -- Constant: uc_property_t UC_PROPERTY_COMPOSITE
2429  -- Constant: uc_property_t UC_PROPERTY_DECIMAL_DIGIT
2430  -- Constant: uc_property_t UC_PROPERTY_NUMERIC
2431  -- Constant: uc_property_t UC_PROPERTY_DIACRITIC
2432  -- Constant: uc_property_t UC_PROPERTY_EXTENDER
2433  -- Constant: uc_property_t UC_PROPERTY_IGNORABLE_CONTROL
2434
2435    The following function looks up a property by its name.
2436
2437  -- Function: uc_property_t uc_property_byname (const char
2438           *PROPERTY_NAME)
2439      Returns the property given by name, e.g.  ‘"White space"’.  If a
2440      property with the given name exists, the result will satisfy the
2441      ‘uc_property_is_valid’ predicate.  Otherwise the result will not
2442      satisfy this predicate and must not be passed to functions that
2443      expect an ‘uc_property_t’ argument.
2444
2445      This lookup ignores spaces, underscores, and hyphens as word
2446      separators, is case-insensitive, and supports the aliases listed
2447      in Unicode’s ‘PropertyAliases.txt’ file.
2448
2449      This function references a big table of all predefined properties.
2450      Its use can significantly increase the size of your application.
2451
2452  -- Function: bool uc_property_is_valid (uc_property_t PROPERTY)
2453      Returns ‘true’ when the given property is valid, or ‘false’
2454      otherwise.
2455
2456    The following function views a property as a set of Unicode
2457 characters.
2458
2459  -- Function: bool uc_is_property (ucs4_t UC, uc_property_t PROPERTY)
2460      Tests whether the Unicode character UC has the given property.
2461
2462 \1f
2463 File: libunistring.info,  Node: Properties as functions,  Prev: Properties as objects,  Up: Properties
2464
2465 8.9.2 Properties as functions – the functional API
2466 --------------------------------------------------
2467
2468    The following are general properties.
2469
2470  -- Function: bool uc_is_property_white_space (ucs4_t UC)
2471  -- Function: bool uc_is_property_alphabetic (ucs4_t UC)
2472  -- Function: bool uc_is_property_other_alphabetic (ucs4_t UC)
2473  -- Function: bool uc_is_property_not_a_character (ucs4_t UC)
2474  -- Function: bool uc_is_property_default_ignorable_code_point (ucs4_t
2475           UC)
2476  -- Function: bool uc_is_property_other_default_ignorable_code_point
2477           (ucs4_t UC)
2478  -- Function: bool uc_is_property_deprecated (ucs4_t UC)
2479  -- Function: bool uc_is_property_logical_order_exception (ucs4_t UC)
2480  -- Function: bool uc_is_property_variation_selector (ucs4_t UC)
2481  -- Function: bool uc_is_property_private_use (ucs4_t UC)
2482  -- Function: bool uc_is_property_unassigned_code_value (ucs4_t UC)
2483
2484    The following properties are related to case folding.
2485
2486  -- Function: bool uc_is_property_uppercase (ucs4_t UC)
2487  -- Function: bool uc_is_property_other_uppercase (ucs4_t UC)
2488  -- Function: bool uc_is_property_lowercase (ucs4_t UC)
2489  -- Function: bool uc_is_property_other_lowercase (ucs4_t UC)
2490  -- Function: bool uc_is_property_titlecase (ucs4_t UC)
2491  -- Function: bool uc_is_property_cased (ucs4_t UC)
2492  -- Function: bool uc_is_property_case_ignorable (ucs4_t UC)
2493  -- Function: bool uc_is_property_changes_when_lowercased (ucs4_t UC)
2494  -- Function: bool uc_is_property_changes_when_uppercased (ucs4_t UC)
2495  -- Function: bool uc_is_property_changes_when_titlecased (ucs4_t UC)
2496  -- Function: bool uc_is_property_changes_when_casefolded (ucs4_t UC)
2497  -- Function: bool uc_is_property_changes_when_casemapped (ucs4_t UC)
2498  -- Function: bool uc_is_property_soft_dotted (ucs4_t UC)
2499
2500    The following properties are related to identifiers.
2501
2502  -- Function: bool uc_is_property_id_start (ucs4_t UC)
2503  -- Function: bool uc_is_property_other_id_start (ucs4_t UC)
2504  -- Function: bool uc_is_property_id_continue (ucs4_t UC)
2505  -- Function: bool uc_is_property_other_id_continue (ucs4_t UC)
2506  -- Function: bool uc_is_property_xid_start (ucs4_t UC)
2507  -- Function: bool uc_is_property_xid_continue (ucs4_t UC)
2508  -- Function: bool uc_is_property_pattern_white_space (ucs4_t UC)
2509  -- Function: bool uc_is_property_pattern_syntax (ucs4_t UC)
2510
2511    The following properties have an influence on shaping and rendering.
2512
2513  -- Function: bool uc_is_property_join_control (ucs4_t UC)
2514  -- Function: bool uc_is_property_grapheme_base (ucs4_t UC)
2515  -- Function: bool uc_is_property_grapheme_extend (ucs4_t UC)
2516  -- Function: bool uc_is_property_other_grapheme_extend (ucs4_t UC)
2517  -- Function: bool uc_is_property_grapheme_link (ucs4_t UC)
2518
2519    The following properties relate to bidirectional reordering.
2520
2521  -- Function: bool uc_is_property_bidi_control (ucs4_t UC)
2522  -- Function: bool uc_is_property_bidi_left_to_right (ucs4_t UC)
2523  -- Function: bool uc_is_property_bidi_hebrew_right_to_left (ucs4_t UC)
2524  -- Function: bool uc_is_property_bidi_arabic_right_to_left (ucs4_t UC)
2525  -- Function: bool uc_is_property_bidi_european_digit (ucs4_t UC)
2526  -- Function: bool uc_is_property_bidi_eur_num_separator (ucs4_t UC)
2527  -- Function: bool uc_is_property_bidi_eur_num_terminator (ucs4_t UC)
2528  -- Function: bool uc_is_property_bidi_arabic_digit (ucs4_t UC)
2529  -- Function: bool uc_is_property_bidi_common_separator (ucs4_t UC)
2530  -- Function: bool uc_is_property_bidi_block_separator (ucs4_t UC)
2531  -- Function: bool uc_is_property_bidi_segment_separator (ucs4_t UC)
2532  -- Function: bool uc_is_property_bidi_whitespace (ucs4_t UC)
2533  -- Function: bool uc_is_property_bidi_non_spacing_mark (ucs4_t UC)
2534  -- Function: bool uc_is_property_bidi_boundary_neutral (ucs4_t UC)
2535  -- Function: bool uc_is_property_bidi_pdf (ucs4_t UC)
2536  -- Function: bool uc_is_property_bidi_embedding_or_override (ucs4_t UC)
2537  -- Function: bool uc_is_property_bidi_other_neutral (ucs4_t UC)
2538
2539    The following properties deal with number representations.
2540
2541  -- Function: bool uc_is_property_hex_digit (ucs4_t UC)
2542  -- Function: bool uc_is_property_ascii_hex_digit (ucs4_t UC)
2543
2544    The following properties deal with CJK.
2545
2546  -- Function: bool uc_is_property_ideographic (ucs4_t UC)
2547  -- Function: bool uc_is_property_unified_ideograph (ucs4_t UC)
2548  -- Function: bool uc_is_property_radical (ucs4_t UC)
2549  -- Function: bool uc_is_property_ids_binary_operator (ucs4_t UC)
2550  -- Function: bool uc_is_property_ids_trinary_operator (ucs4_t UC)
2551
2552    Other miscellaneous properties are:
2553
2554  -- Function: bool uc_is_property_zero_width (ucs4_t UC)
2555  -- Function: bool uc_is_property_space (ucs4_t UC)
2556  -- Function: bool uc_is_property_non_break (ucs4_t UC)
2557  -- Function: bool uc_is_property_iso_control (ucs4_t UC)
2558  -- Function: bool uc_is_property_format_control (ucs4_t UC)
2559  -- Function: bool uc_is_property_dash (ucs4_t UC)
2560  -- Function: bool uc_is_property_hyphen (ucs4_t UC)
2561  -- Function: bool uc_is_property_punctuation (ucs4_t UC)
2562  -- Function: bool uc_is_property_line_separator (ucs4_t UC)
2563  -- Function: bool uc_is_property_paragraph_separator (ucs4_t UC)
2564  -- Function: bool uc_is_property_quotation_mark (ucs4_t UC)
2565  -- Function: bool uc_is_property_sentence_terminal (ucs4_t UC)
2566  -- Function: bool uc_is_property_terminal_punctuation (ucs4_t UC)
2567  -- Function: bool uc_is_property_currency_symbol (ucs4_t UC)
2568  -- Function: bool uc_is_property_math (ucs4_t UC)
2569  -- Function: bool uc_is_property_other_math (ucs4_t UC)
2570  -- Function: bool uc_is_property_paired_punctuation (ucs4_t UC)
2571  -- Function: bool uc_is_property_left_of_pair (ucs4_t UC)
2572  -- Function: bool uc_is_property_combining (ucs4_t UC)
2573  -- Function: bool uc_is_property_composite (ucs4_t UC)
2574  -- Function: bool uc_is_property_decimal_digit (ucs4_t UC)
2575  -- Function: bool uc_is_property_numeric (ucs4_t UC)
2576  -- Function: bool uc_is_property_diacritic (ucs4_t UC)
2577  -- Function: bool uc_is_property_extender (ucs4_t UC)
2578  -- Function: bool uc_is_property_ignorable_control (ucs4_t UC)
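
   Since every property in this API is a plain function from a
character to ‘bool’, it can be passed around as a predicate.  The
following sketch counts the characters of a UTF-32 string that have a
given property.  The names ‘property_fn’, ‘count_with_property’, and
‘ascii_is_white_space’ are hypothetical; the ASCII-only stand-in is
there only so that the sketch compiles without libunistring, and
‘uint32_t’ is used on the assumption that ‘ucs4_t’ is a 32-bit unsigned
integer type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Shape shared by the property functions, assuming ucs4_t is a 32-bit
   unsigned integer type.  */
typedef bool (*property_fn) (uint32_t);

/* Hypothetical ASCII-only stand-in for a white space property.  */
static bool
ascii_is_white_space (uint32_t uc)
{
  return uc == 0x20 || (uc >= 0x09 && uc <= 0x0D);
}

/* Counts the characters of S[0..N-1] that have the given property.  */
static size_t
count_with_property (const uint32_t *s, size_t n, property_fn has_property)
{
  size_t count = 0;
  size_t i;

  assert (has_property != NULL);
  for (i = 0; i < n; i++)
    if (has_property (s[i]))
      count++;
  return count;
}
```

   Under the same assumption, any of the ‘uc_is_property_...’ functions
could be passed as the HAS_PROPERTY argument.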
2579
2580 \1f
2581 File: libunistring.info,  Node: Scripts,  Next: Blocks,  Prev: Properties,  Up: unictype.h
2582
2583 8.10 Scripts
2584 ============
2585
2586    The Unicode characters are subdivided into scripts.
2587
2588    The following type is used to represent a script:
2589
2590  -- Type: uc_script_t
2591      This data type is a structure type that refers to statically
2592      allocated read-only data.  It contains the following fields:
2593           const char *name;
2594
2595      The ‘name’ field contains the name of the script.
2596
2597    The following functions look up a script.
2598
2599  -- Function: const uc_script_t * uc_script (ucs4_t UC)
2600      Returns the script of a Unicode character.  Returns NULL if UC does
2601      not belong to any script.
2602
2603  -- Function: const uc_script_t * uc_script_byname (const char
2604           *SCRIPT_NAME)
2605      Returns the script given by its name, e.g.  ‘"HAN"’.  Returns NULL
2606      if a script with the given name does not exist.
2607
2608    The following function views a script as a set of Unicode characters.
2609
2610  -- Function: bool uc_is_script (ucs4_t UC, const uc_script_t *SCRIPT)
2611      Tests whether a Unicode character belongs to a given script.
2612
2613    The following gives a global picture of all scripts.
2614
2615  -- Function: void uc_all_scripts (const uc_script_t **SCRIPTS, size_t
2616           *COUNT)
2617      Get the list of all scripts.  Stores a pointer to an array of all
2618      scripts in ‘*SCRIPTS’ and the length of this array in ‘*COUNT’.
2619
2620 \1f
2621 File: libunistring.info,  Node: Blocks,  Next: ISO C and Java syntax,  Prev: Scripts,  Up: unictype.h
2622
2623 8.11 Blocks
2624 ===========
2625
2626    The Unicode characters are subdivided into blocks.  A block is an
2627 interval of Unicode code points.
2628
2629    The following type is used to represent a block.
2630
2631  -- Type: uc_block_t
2632      This data type is a structure type that refers to statically
2633      allocated data.  It contains the following fields:
2634           ucs4_t start;
2635           ucs4_t end;
2636           const char *name;
2637
2638      The ‘start’ field is the first Unicode code point in the block.
2639
2640      The ‘end’ field is the last Unicode code point in the block.
2641
2642      The ‘name’ field is the name of the block.
2643
2644    The following function looks up a block.
2645
2646  -- Function: const uc_block_t * uc_block (ucs4_t UC)
2647      Returns the block a character belongs to.
2648
2649    The following function views a block as a set of Unicode characters.
2650
2651  -- Function: bool uc_is_block (ucs4_t UC, const uc_block_t *BLOCK)
2652      Tests whether a Unicode character belongs to a given block.
2653
2654    The following gives a global picture of all blocks.
2655
2656  -- Function: void uc_all_blocks (const uc_block_t **BLOCKS, size_t
2657           *COUNT)
2658      Get the list of all blocks.  Stores a pointer to an array of all
2659      blocks in ‘*BLOCKS’ and the length of this array in ‘*COUNT’.
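
   Because a block is an interval, membership amounts to a range check
in which both ‘start’ and ‘end’ are inclusive.  The sketch below spells
this out and searches a small table.  It is an illustration under
stated assumptions, not the library’s implementation: ‘struct block’ is
a hypothetical local type that mirrors the three documented fields of
‘uc_block_t’, and ‘sample_blocks’, ‘in_block’, and ‘find_block’ are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the three fields documented for uc_block_t.  */
struct block
{
  uint32_t start;               /* first code point, inclusive */
  uint32_t end;                 /* last code point, inclusive */
  const char *name;
};

/* Two sample entries for illustration; a real table would come from
   the library.  */
static const struct block sample_blocks[] =
  {
    { 0x0000, 0x007F, "Basic Latin" },
    { 0x0080, 0x00FF, "Latin-1 Supplement" }
  };

/* Tests whether UC lies in the closed interval [start, end] of B.  */
static bool
in_block (uint32_t uc, const struct block *b)
{
  return b != NULL && b->start <= uc && uc <= b->end;
}

/* Linear search over BLOCKS[0..COUNT-1].  Returns NULL if no block
   contains UC.  */
static const struct block *
find_block (uint32_t uc, const struct block *blocks, size_t count)
{
  size_t i;

  assert (blocks != NULL || count == 0);
  for (i = 0; i < count; i++)
    if (in_block (uc, &blocks[i]))
      return &blocks[i];
  return NULL;
}
```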
2660
2661 \1f
2662 File: libunistring.info,  Node: ISO C and Java syntax,  Next: Classifications like in ISO C,  Prev: Blocks,  Up: unictype.h
2663
2664 8.12 ISO C and Java syntax
2665 ==========================
2666
2667    The following properties are taken from language standards.  The
2668 supported language standards are ISO C 99 and Java.
2669
2670  -- Function: bool uc_is_c_whitespace (ucs4_t UC)
2671      Tests whether a Unicode character is considered whitespace in ISO C
2672      99.
2673
2674  -- Function: bool uc_is_java_whitespace (ucs4_t UC)
2675      Tests whether a Unicode character is considered whitespace in Java.
2676
2677    The following enumerated values are the possible return values of the
2678 functions ‘uc_c_ident_category’ and ‘uc_java_ident_category’.
2679
2680  -- Constant: int UC_IDENTIFIER_START
2681      This return value means that the given character is valid as first
2682      or subsequent character in an identifier.
2683
2684  -- Constant: int UC_IDENTIFIER_VALID
2685      This return value means that the given character is valid as
2686      subsequent character only.
2687
2688  -- Constant: int UC_IDENTIFIER_INVALID
2689      This return value means that the given character is not valid in an
2690      identifier.
2691
2692  -- Constant: int UC_IDENTIFIER_IGNORABLE
2693      This return value (only for Java) means that the given character is
2694      ignorable.
2695
2696    The following functions determine whether a given character can be a
2697 constituent of an identifier in the given programming language.
2698
2699  -- Function: int uc_c_ident_category (ucs4_t UC)
2700      Returns the categorization of a Unicode character with respect to
2701      the ISO C 99 identifier syntax.
2702
2703  -- Function: int uc_java_ident_category (ucs4_t UC)
2704      Returns the categorization of a Unicode character with respect to
2705      the Java identifier syntax.
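
   The three-way categorization is enough to validate a whole
identifier: the first character must be a start character, and no later
character may be invalid.  The following sketch shows that check.  It
is an illustration only: ‘enum ident_category’ and its values are
hypothetical local names modelled on the constants above, the
ASCII-only classifier is a stand-in so that the sketch compiles without
libunistring, and the Java-only ignorable category is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local categories, modelled on the documented constants.  */
enum ident_category
{
  IDENT_START,                  /* valid as first or subsequent character */
  IDENT_VALID,                  /* valid as subsequent character only */
  IDENT_INVALID                 /* not valid in an identifier */
};

/* Hypothetical ASCII-only classifier standing in for the library.  */
static enum ident_category
ascii_ident_category (uint32_t uc)
{
  if ((uc >= 'A' && uc <= 'Z') || (uc >= 'a' && uc <= 'z') || uc == '_')
    return IDENT_START;
  if (uc >= '0' && uc <= '9')
    return IDENT_VALID;
  return IDENT_INVALID;
}

/* Tests whether S[0..N-1] is a non-empty identifier according to
   CATEGORY.  */
static bool
is_identifier (const uint32_t *s, size_t n,
               enum ident_category (*category) (uint32_t))
{
  size_t i;

  assert (category != NULL);
  if (n == 0 || category (s[0]) != IDENT_START)
    return false;
  for (i = 1; i < n; i++)
    if (category (s[i]) == IDENT_INVALID)
      return false;
  return true;
}
```

   With the library, a small wrapper around ‘uc_c_ident_category’ or
‘uc_java_ident_category’ would translate the returned constant into
these local categories.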
2706
2707 \1f
2708 File: libunistring.info,  Node: Classifications like in ISO C,  Prev: ISO C and Java syntax,  Up: unictype.h
2709
2710 8.13 Classifications like in ISO C
2711 ==================================
2712
2713    The following character classifications mimic those declared in the
2714 ISO C header files ‘<ctype.h>’ and ‘<wctype.h>’.  These functions are
2715 deprecated, because this set of functions was designed with ASCII in
2716 mind and cannot reflect the more diverse reality of the Unicode
2717 character set.  But they can be a quick-and-dirty porting aid when
2718 migrating from ‘wchar_t’ APIs to Unicode strings.
2719
2720  -- Function: bool uc_is_alnum (ucs4_t UC)
2721      Tests for any character for which ‘uc_is_alpha’ or ‘uc_is_digit’ is
2722      true.
2723
2724  -- Function: bool uc_is_alpha (ucs4_t UC)
2725      Tests for any character for which ‘uc_is_upper’ or ‘uc_is_lower’ is
2726      true, or any character that is one of a locale-specific set of
2727      characters for which none of ‘uc_is_cntrl’, ‘uc_is_digit’,
2728      ‘uc_is_punct’, or ‘uc_is_space’ is true.
2729
2730  -- Function: bool uc_is_cntrl (ucs4_t UC)
2731      Tests for any control character.
2732
2733  -- Function: bool uc_is_digit (ucs4_t UC)
2734      Tests for any character that corresponds to a decimal-digit
2735      character.
2736
2737  -- Function: bool uc_is_graph (ucs4_t UC)
2738      Tests for any character for which ‘uc_is_print’ is true and
2739      ‘uc_is_space’ is false.
2740
2741  -- Function: bool uc_is_lower (ucs4_t UC)
2742      Tests for any character that corresponds to a lowercase letter or
2743      is one of a locale-specific set of characters for which none of
2744      ‘uc_is_cntrl’, ‘uc_is_digit’, ‘uc_is_punct’, or ‘uc_is_space’ is
2745      true.
2746
2747  -- Function: bool uc_is_print (ucs4_t UC)
2748      Tests for any printing character.
2749
2750  -- Function: bool uc_is_punct (ucs4_t UC)
2751      Tests for any printing character that is one of a locale-specific
2752      set of characters for which neither ‘uc_is_space’ nor ‘uc_is_alnum’
2753      is true.
2754
2755  -- Function: bool uc_is_space (ucs4_t UC)
2756      Tests for any character that corresponds to a locale-specific set of
2757      characters for which none of ‘uc_is_alnum’, ‘uc_is_graph’, or
2758      ‘uc_is_punct’ is true.
2759
2760  -- Function: bool uc_is_upper (ucs4_t UC)
2761      Tests for any character that corresponds to an uppercase letter or
2762      is one of a locale-specific set of characters for which none of
2763      ‘uc_is_cntrl’, ‘uc_is_digit’, ‘uc_is_punct’, or ‘uc_is_space’ is
2764      true.
2765
2766  -- Function: bool uc_is_xdigit (ucs4_t UC)
2767      Tests for any character that corresponds to a hexadecimal-digit
2768      character.
2769
2770  -- Function: bool uc_is_blank (ucs4_t UC)
2771      Tests for any character that corresponds to a standard blank
2772      character or a locale-specific set of characters for which
2773      ‘uc_is_alnum’ is false.
2774
2775 \1f
2776 File: libunistring.info,  Node: uniwidth.h,  Next: unigbrk.h,  Prev: unictype.h,  Up: Top
2777
2778 9 Display width ‘<uniwidth.h>’
2779 ******************************
2780
2781    This include file declares functions that return the display width,
2782 measured in columns, of characters or strings, when output to a device
2783 that uses non-proportional fonts.
2784
2785    Note that for some rarely used characters the actual fonts or
2786 terminal emulators can use a different width.  There is no mechanism for
2787 communicating the display width of characters across a Unix
2788 pseudo-terminal (tty).  Also, there are scripts with complex rendering,
2789 like the Indic scripts.  For these scripts, there is no such concept as
2790 non-proportional fonts.  Therefore the results of these functions
2791 are usually adequate for most scripts and most characters but can fail to
2792 represent the actual display width.
2793
2794    These functions are locale dependent.  The ENCODING argument
2795 identifies the encoding (e.g.  ‘"ISO-8859-2"’ for Polish).
2796
2797  -- Function: int uc_width (ucs4_t UC, const char *ENCODING)
2798      Determines and returns the number of column positions required for
2799      UC.  Returns -1 if UC is a control character that has an influence
2800      on the column position when output.
2801
2802  -- Function: int u8_width (const uint8_t *S, size_t N, const char
2803           *ENCODING)
2804  -- Function: int u16_width (const uint16_t *S, size_t N, const char
2805           *ENCODING)
2806  -- Function: int u32_width (const uint32_t *S, size_t N, const char
2807           *ENCODING)
2808      Determines and returns the number of column positions required for
2809      the first N units (or fewer if S ends before this) in S.  This
2810      function ignores control characters in the string.
2811
2812  -- Function: int u8_strwidth (const uint8_t *S, const char *ENCODING)
2813  -- Function: int u16_strwidth (const uint16_t *S, const char *ENCODING)
2814  -- Function: int u32_strwidth (const uint32_t *S, const char *ENCODING)
2815      Determines and returns the number of column positions required for
2816      S.  This function ignores control characters in the string.
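
   The relationship between the per-character and per-string results can
be sketched without the library.  The following is a minimal sketch, not
the library's implementation: it assumes that a string's width is the
sum of the non-negative per-character widths, which is one plausible
reading of "ignores control characters".  The helper ‘total_columns’ is
hypothetical, and its input array stands for values that ‘uc_width’
might return.

```c
#include <assert.h>
#include <stddef.h>

/* Sum per-character column widths.  Each element of WIDTHS stands for
   a value that uc_width might return for one character.  Negative
   entries (control characters) are skipped rather than subtracted.  */
int
total_columns (const int *widths, size_t n)
{
  assert (widths != NULL || n == 0);
  int total = 0;
  for (size_t i = 0; i < n; i++)
    if (widths[i] >= 0)
      total += widths[i];
  return total;
}
```

   For the widths 1, 2, -1, 0, 1 the helper returns 4: the -1 entry is
skipped instead of being subtracted.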
2817
2818 \1f
2819 File: libunistring.info,  Node: unigbrk.h,  Next: uniwbrk.h,  Prev: uniwidth.h,  Up: Top
2820
2821 10 Grapheme cluster breaks in strings ‘<unigbrk.h>’
2822 ***************************************************
2823
2824    This include file declares functions for determining where in a
2825 string “grapheme clusters” start and end.  A “grapheme cluster” is an
2826 approximation to a user-perceived character, which sometimes corresponds
2827 to multiple Unicode characters.  Editing operations such as mouse
2828 selection, cursor movement, and backspacing often operate on grapheme
2829 clusters as units, not on individual characters.
2830
2831    Some grapheme clusters are built from a base character and a
2832 combining character.  The letter ‘é’, for example, is most commonly
2833 represented in Unicode as a single character U+00E9 LATIN SMALL LETTER E
2834 WITH ACUTE.  It is, however, equally valid to use the pair of characters
2835 U+0065 LATIN SMALL LETTER E followed by U+0301 COMBINING ACUTE ACCENT.
2836 Since the user would perceive this pair of characters as a single
2837 character, they would be grouped into a single grapheme cluster.
2838
2839    But there are also grapheme clusters that consist of several base
2840 characters.  For example, a Devanagari letter and a Devanagari vowel
2841 sign that follows it may form a grapheme cluster.  Similarly, some pairs
2842 of Thai characters and Hangul syllables (formed by two or three Hangul
2843 characters) are grapheme clusters.
2844
2845 * Menu:
2846
2847 * Grapheme cluster breaks in a string::
2848 * Grapheme cluster break property::
2849
2850 \1f
2851 File: libunistring.info,  Node: Grapheme cluster breaks in a string,  Next: Grapheme cluster break property,  Up: unigbrk.h
2852
2853 10.1 Grapheme cluster breaks in a string
2854 ========================================
2855
2856    The following functions find a single boundary between grapheme
2857 clusters in a string.
2858
2859  -- Function: const uint8_t * u8_grapheme_next (const uint8_t *S,
2860           const uint8_t *END)
2861  -- Function: const uint16_t * u16_grapheme_next (const uint16_t *S,
2862           const uint16_t *END)
2863  -- Function: const uint32_t * u32_grapheme_next (const uint32_t *S,
2864           const uint32_t *END)
2865      Returns the start of the next grapheme cluster following S, or END
2866      if no grapheme cluster break is encountered before it.  Returns
2867      NULL if and only if ‘S == END’.
2868
2869  -- Function: const uint8_t * u8_grapheme_prev (const uint8_t *S,
2870           const uint8_t *START)
2871  -- Function: const uint16_t * u16_grapheme_prev (const uint16_t *S,
2872           const uint16_t *START)
2873  -- Function: const uint32_t * u32_grapheme_prev (const uint32_t *S,
2874           const uint32_t *START)
2875      Returns the start of the grapheme cluster preceding S, or START if
2876      no grapheme cluster break is encountered before it.  Returns NULL
2877      if and only if ‘S == START’.
2878
2879    The following functions determine all of the grapheme cluster
2880 boundaries in a string.
2881
2882  -- Function: void u8_grapheme_breaks (const uint8_t *S, size_t N, char
2883           *P)
2884  -- Function: void u16_grapheme_breaks (const uint16_t *S, size_t N,
2885           char *P)
2886  -- Function: void u32_grapheme_breaks (const uint32_t *S, size_t N,
2887           char *P)
2888  -- Function: void ulc_grapheme_breaks (const char *S, size_t N, char
2889           *P)
2890      Determines the grapheme cluster break points in S, an array of N
2891      units, and stores the result at ‘P[0..N-1]’.
2892      ‘P[i] = 1’
2893           means that there is a grapheme cluster boundary between
2894           ‘S[i-1]’ and ‘S[i]’.
2895      ‘P[i] = 0’
2896           means that ‘S[i-1]’ and ‘S[i]’ are part of the same grapheme
2897           cluster.
2898      ‘P[0]’ is always set to 1, because there is always a grapheme
2899      cluster break at start of text.
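
   The layout of the result array can be illustrated with a small
consumer.  This is a sketch under stated assumptions: the helpers
‘count_clusters’ and ‘cluster_end’ are hypothetical and not part of the
library, and the array in the usage note is written by hand to show the
values one would expect, rather than being produced by
‘u8_grapheme_breaks’.

```c
#include <assert.h>
#include <stddef.h>

/* Count grapheme clusters from a break array P[0..N-1] laid out as
   described above: a cluster starts at every index where P[i] is 1.  */
size_t
count_clusters (const char *p, size_t n)
{
  assert (p != NULL || n == 0);
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (p[i] == 1)
      count++;
  return count;
}

/* Return the index one past the end of the cluster that starts at
   START, that is, the next index with P[i] == 1, or N if none.  */
size_t
cluster_end (const char *p, size_t n, size_t start)
{
  assert (p != NULL && start < n);
  size_t i = start + 1;
  while (i < n && p[i] == 0)
    i++;
  return i;
}
```

   For the UTF-8 bytes 0x65 0xCC 0x81 0x78 (‘e’, U+0301 COMBINING ACUTE
ACCENT, ‘x’) the expected array is 1, 0, 0, 1.  With that input,
‘count_clusters’ yields 2 and the first cluster spans indices 0 to 2.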
2900
2901 \1f
2902 File: libunistring.info,  Node: Grapheme cluster break property,  Prev: Grapheme cluster breaks in a string,  Up: unigbrk.h
2903
2904 10.2 Grapheme cluster break property
2905 ====================================
2906
2907    This is a more low-level API. The grapheme cluster break property is
2908 a property defined in Unicode Standard Annex #29, section “Grapheme
2909 Cluster Boundaries”, see
2910 <http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries>.  It
2911 is used for determining the grapheme cluster breaks in a string.
2912
2913    The following are the possible values of the grapheme cluster break
2914 property.  More values may be added in the future.
2915
2916  -- Constant: int GBP_OTHER
2917  -- Constant: int GBP_CR
2918  -- Constant: int GBP_LF
2919  -- Constant: int GBP_CONTROL
2920  -- Constant: int GBP_EXTEND
2921  -- Constant: int GBP_PREPEND
2922  -- Constant: int GBP_SPACINGMARK
2923  -- Constant: int GBP_L
2924  -- Constant: int GBP_V
2925  -- Constant: int GBP_T
2926  -- Constant: int GBP_LV
2927  -- Constant: int GBP_LVT
2928
2929    The following function looks up the grapheme cluster break property
2930 of a character.
2931
2932  -- Function: int uc_graphemeclusterbreak_property (ucs4_t UC)
2933      Returns the Grapheme_Cluster_Break property of a Unicode character.
2934
2935    The following function determines whether there is a grapheme cluster
2936 break between two Unicode characters.  It is the primitive upon which
2937 the higher-level functions in the previous section are directly based.
2938
2939  -- Function: bool uc_is_grapheme_break (ucs4_t A, ucs4_t B)
2940      Returns true if there is a grapheme cluster boundary between
2941      Unicode characters A and B.
2942
2943      There is always a grapheme cluster break at the start or end of
2944      text.  You can specify zero for A or B to indicate start of text or
2945      end of text, respectively.
2946
2947      This implements the extended (not legacy) grapheme cluster rules
2948      described in the Unicode standard, because the standard says that
2949      they are preferred.
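
   How a pairwise test can drive a whole-string scan may be sketched as
follows.  This is an illustration only, not the library's code: the
predicate ‘toy_is_break’ is a deliberately simplified stand-in that
knows about one block of combining marks, and ‘mark_breaks’ is a
hypothetical helper.  The sketch uses ‘uint32_t’ and assumes that
‘ucs4_t’ is a compatible 32-bit unsigned type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for uc_is_grapheme_break: reports a break unless B lies
   in the Combining Diacritical Marks block (U+0300..U+036F).  The real
   function implements the full set of rules.  */
bool
toy_is_break (uint32_t a, uint32_t b)
{
  (void) a;
  return !(b >= 0x0300 && b <= 0x036F);
}

/* Fill P[0..N-1] for the UTF-32 string S by asking IS_BREAK about each
   adjacent pair of characters.  P[0] is 1 because the start of text is
   always a break.  */
void
mark_breaks (const uint32_t *s, size_t n, char *p,
             bool (*is_break) (uint32_t, uint32_t))
{
  assert ((s != NULL && p != NULL) || n == 0);
  for (size_t i = 0; i < n; i++)
    p[i] = (i == 0 || is_break (s[i - 1], s[i])) ? 1 : 0;
}
```

   For the characters U+0065, U+0301, U+0078 this stores 1, 0, 1.  In a
real program, ‘u32_grapheme_breaks’ from the previous section does this
work with the complete rules.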
2950
2951 \1f
2952 File: libunistring.info,  Node: uniwbrk.h,  Next: unilbrk.h,  Prev: unigbrk.h,  Up: Top
2953
2954 11 Word breaks in strings ‘<uniwbrk.h>’
2955 ***************************************
2956
2957    This include file declares functions for determining where in a
2958 string “words” start and end.  Here “words” are not necessarily the same
2959 as entities that can be looked up in dictionaries, but rather groups of
2960 consecutive characters that should not be split by text processing
2961 operations.
2962
2963 * Menu:
2964
2965 * Word breaks in a string::
2966 * Word break property::
2967
2968 \1f
2969 File: libunistring.info,  Node: Word breaks in a string,  Next: Word break property,  Up: uniwbrk.h
2970
2971 11.1 Word breaks in a string
2972 ============================
2973
2974    The following functions determine the word breaks in a string.
2975
2976  -- Function: void u8_wordbreaks (const uint8_t *S, size_t N, char *P)
2977  -- Function: void u16_wordbreaks (const uint16_t *S, size_t N, char *P)
2978  -- Function: void u32_wordbreaks (const uint32_t *S, size_t N, char *P)
2979  -- Function: void ulc_wordbreaks (const char *S, size_t N, char *P)
2980      Determines the word break points in S, an array of N units, and
2981      stores the result at ‘P[0..N-1]’.
2982      ‘P[i] = 1’
2983           means that there is a word boundary between ‘S[i-1]’ and
2984           ‘S[i]’.
2985      ‘P[i] = 0’
2986           means that ‘S[i-1]’ and ‘S[i]’ must not be separated.
2987      ‘P[0]’ is always set to 0.  If an application wants to consider a
2988      word break to be present at the beginning of the string (before
2989      ‘S[0]’) or at the end of the string (after ‘S[0..N-1]’), it has to
2990      treat these cases explicitly.
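
   Because ‘P[0]’ is 0, code that splits a string into segments has to
start the first segment itself.  A minimal sketch of that explicit
treatment follows.  The helper ‘count_word_segments’ is hypothetical,
and the array in the usage note is written by hand to show the values
one would expect from ‘u8_wordbreaks’.

```c
#include <assert.h>
#include <stddef.h>

/* Count the segments delimited by a word-break array P[0..N-1].
   P[0] is 0 by convention, so the start of the string is treated
   explicitly as the beginning of the first segment.  */
size_t
count_word_segments (const char *p, size_t n)
{
  assert (p != NULL || n == 0);
  if (n == 0)
    return 0;
  size_t count = 1;             /* the segment starting at index 0 */
  for (size_t i = 1; i < n; i++)
    if (p[i] == 1)
      count++;
  return count;
}
```

   For the ASCII string ‘"ab cd"’ the expected array is 0, 0, 1, 1, 0,
which gives three segments: ‘ab’, the space, and ‘cd’.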
2991
2992 \1f
2993 File: libunistring.info,  Node: Word break property,  Prev: Word breaks in a string,  Up: uniwbrk.h
2994
2995 11.2 Word break property
2996 ========================
2997
2998    This is a more low-level API. The word break property is a property
2999 defined in Unicode Standard Annex #29, section “Word Boundaries”, see
3000 <http://www.unicode.org/reports/tr29/#Word_Boundaries>.  It is used for
3001 determining the word breaks in a string.
3002
3003    The following are the possible values of the word break property.
3004 More values may be added in the future.
3005
3006  -- Constant: int WBP_OTHER
3007  -- Constant: int WBP_CR
3008  -- Constant: int WBP_LF
3009  -- Constant: int WBP_NEWLINE
3010  -- Constant: int WBP_EXTEND
3011  -- Constant: int WBP_FORMAT
3012  -- Constant: int WBP_KATAKANA
3013  -- Constant: int WBP_ALETTER
3014  -- Constant: int WBP_MIDNUMLET
3015  -- Constant: int WBP_MIDLETTER
3016  -- Constant: int WBP_MIDNUM
3017  -- Constant: int WBP_NUMERIC
3018  -- Constant: int WBP_EXTENDNUMLET
3019
3020    The following function looks up the word break property of a
3021 character.
3022
3023  -- Function: int uc_wordbreak_property (ucs4_t UC)
3024      Returns the Word_Break property of a Unicode character.
3025
3026 \1f
3027 File: libunistring.info,  Node: unilbrk.h,  Next: uninorm.h,  Prev: uniwbrk.h,  Up: Top
3028
3029 12 Line breaking ‘<unilbrk.h>’
3030 ******************************
3031
3032    This include file declares functions for determining where in a
3033 string line breaks could or should be introduced, in order to make the
3034 displayed string fit into a column of given width.
3035
3036    These functions are locale dependent.  The ENCODING argument
3037 identifies the encoding (e.g.  ‘"ISO-8859-2"’ for Polish).
3038
3039    The following enumerated values indicate whether, at a given
3040 position, a line break is possible or not.  Given a string S as an
3041 array ‘S[0..N-1]’ and a position I, the values have the following
3042 meanings:
3043
3044  -- Constant: int UC_BREAK_MANDATORY
3045      This value indicates that ‘S[I]’ is a line break character.
3046
3047  -- Constant: int UC_BREAK_POSSIBLE
3048      This value indicates that a line break may be inserted between
3049      ‘S[I-1]’ and ‘S[I]’.
3050
3051  -- Constant: int UC_BREAK_HYPHENATION
3052      This value indicates that a hyphen and a line break may be inserted
3053      between ‘S[I-1]’ and ‘S[I]’.  But beware of language dependent
3054      hyphenation rules.
3055
3056  -- Constant: int UC_BREAK_PROHIBITED
3057      This value indicates that ‘S[I-1]’ and ‘S[I]’ must not be
3058      separated.
3059
3060  -- Constant: int UC_BREAK_UNDEFINED
3061      This value is not used as a return value; rather, in the overriding
3062      argument of the ‘u*_width_linebreaks’ functions, it indicates the
3063      absence of an override.
3064
3065    The following functions determine the positions at which line breaks
3066 are possible.
3067
3068  -- Function: void u8_possible_linebreaks (const uint8_t *S, size_t N,
3069           const char *ENCODING, char *P)
3070  -- Function: void u16_possible_linebreaks (const uint16_t *S, size_t N,
3071           const char *ENCODING, char *P)
3072  -- Function: void u32_possible_linebreaks (const uint32_t *S, size_t N,
3073           const char *ENCODING, char *P)
3074  -- Function: void ulc_possible_linebreaks (const char *S, size_t N,
3075           const char *ENCODING, char *P)
3076      Determines the line break points in S, and stores the result at
3077      ‘P[0..N-1]’.  Every ‘P[I]’ is assigned one of the values
3078      ‘UC_BREAK_MANDATORY’, ‘UC_BREAK_POSSIBLE’, ‘UC_BREAK_HYPHENATION’,
3079      ‘UC_BREAK_PROHIBITED’.
3080
3081    The following functions determine where line breaks should be
3082 inserted so that each line fits in a given width, when output to a
3083 device that uses non-proportional fonts.
3084
3085  -- Function: int u8_width_linebreaks (const uint8_t *S, size_t N, int
3086           WIDTH, int START_COLUMN, int AT_END_COLUMNS, const char
3087           *OVERRIDE, const char *ENCODING, char *P)
3088  -- Function: int u16_width_linebreaks (const uint16_t *S, size_t N, int
3089           WIDTH, int START_COLUMN, int AT_END_COLUMNS, const char
3090           *OVERRIDE, const char *ENCODING, char *P)
3091  -- Function: int u32_width_linebreaks (const uint32_t *S, size_t N, int
3092           WIDTH, int START_COLUMN, int AT_END_COLUMNS, const char
3093           *OVERRIDE, const char *ENCODING, char *P)
3094  -- Function: int ulc_width_linebreaks (const char *S, size_t N, int
3095           WIDTH, int START_COLUMN, int AT_END_COLUMNS, const char
3096           *OVERRIDE, const char *ENCODING, char *P)
3097      Chooses the best line breaks, assuming that every character
3098      occupies a width given by the ‘uc_width’ function (see *note
3099      uniwidth.h::).
3100
3101      The string is ‘S[0..N-1]’.
3102
3103      The maximum number of columns per line is given as WIDTH.  The
3104      starting column of the string is given as START_COLUMN.  If the
3105      algorithm shall keep room after the last piece, this amount of room
3106      can be given as AT_END_COLUMNS.
3107
3108      OVERRIDE is an optional override; if ‘OVERRIDE[I] !=
3109      UC_BREAK_UNDEFINED’, ‘OVERRIDE[I]’ takes precedence over ‘P[I]’ as
3110      returned by the ‘u*_possible_linebreaks’ function.
3111
3112      The given ENCODING is used for disambiguating widths in ‘uc_width’.
3113
3114      Returns the column after the end of the string, and stores the
3115      result at ‘P[0..N-1]’.  Every ‘P[I]’ is assigned one of the values
3116      ‘UC_BREAK_MANDATORY’, ‘UC_BREAK_POSSIBLE’, ‘UC_BREAK_HYPHENATION’,
3117      ‘UC_BREAK_PROHIBITED’.  Here the value ‘UC_BREAK_POSSIBLE’
3118      indicates that a line break _should_ be inserted.
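
   Once ‘P’ has been filled, applying the chosen breaks is a single pass
over the string.  The following is a sketch under stated assumptions:
‘apply_breaks’ is a hypothetical helper, and the ‘DEMO_BREAK_*’
constants are local stand-ins with arbitrary values so that the example
compiles on its own.  A real program would include ‘<unilbrk.h>’ and
compare against ‘UC_BREAK_POSSIBLE’.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the UC_BREAK_* constants.  The numeric values are
   arbitrary and are not taken from <unilbrk.h>.  */
enum { DEMO_BREAK_PROHIBITED, DEMO_BREAK_MANDATORY, DEMO_BREAK_POSSIBLE };

/* Copy S[0..N-1] to OUT, inserting '\n' before every position marked
   DEMO_BREAK_POSSIBLE.  OUT needs room for 2*N + 1 bytes.  Returns the
   number of bytes written, not counting the terminating NUL.  */
size_t
apply_breaks (const char *s, size_t n, const char *p, char *out)
{
  assert (out != NULL && ((s != NULL && p != NULL) || n == 0));
  size_t len = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (p[i] == DEMO_BREAK_POSSIBLE)
        out[len++] = '\n';
      out[len++] = s[i];        /* a mandatory break is already in S */
    }
  out[len] = '\0';
  return len;
}
```

   With the string ‘"ab cd"’ and a break marked before ‘c’, the output
is ‘"ab \ncd"’.  The sketch keeps the space before the inserted newline;
an application may prefer to drop it.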
3119
3120 \1f
3121 File: libunistring.info,  Node: uninorm.h,  Next: unicase.h,  Prev: unilbrk.h,  Up: Top
3122
3123 13 Normalization forms (composition and decomposition) ‘<uninorm.h>’
3124 ********************************************************************
3125
3126    This include file defines functions for transforming Unicode strings
3127 to one of the four normal forms, known as NFC, NFD, NFKC, NFKD.  These
3128 transformations involve decomposition and — for NFC and NFKC —
3129 composition of Unicode characters.
3130
3131 * Menu:
3132
3133 * Decomposition of characters::
3134 * Composition of characters::
3135 * Normalization of strings::
3136 * Normalizing comparisons::
3137 * Normalization of streams::
3138
3139 \1f
3140 File: libunistring.info,  Node: Decomposition of characters,  Next: Composition of characters,  Up: uninorm.h
3141
3142 13.1 Decomposition of Unicode characters
3143 ========================================
3144
3145    The following enumerated values are the possible types of
3146 decomposition of a Unicode character.
3147
3148  -- Constant: int UC_DECOMP_CANONICAL
3149      Denotes canonical decomposition.
3150
3151  -- Constant: int UC_DECOMP_FONT
3152      UCD marker: ‘<font>’.  Denotes a font variant (e.g.  a blackletter
3153      form).
3154
3155  -- Constant: int UC_DECOMP_NOBREAK
3156      UCD marker: ‘<noBreak>’.  Denotes a no-break version of a space or
3157      hyphen.
3158
3159  -- Constant: int UC_DECOMP_INITIAL
3160      UCD marker: ‘<initial>’.  Denotes an initial presentation form
3161      (Arabic).
3162
3163  -- Constant: int UC_DECOMP_MEDIAL
3164      UCD marker: ‘<medial>’.  Denotes a medial presentation form
3165      (Arabic).
3166
3167  -- Constant: int UC_DECOMP_FINAL
3168      UCD marker: ‘<final>’.  Denotes a final presentation form (Arabic).
3169
3170  -- Constant: int UC_DECOMP_ISOLATED
3171      UCD marker: ‘<isolated>’.  Denotes an isolated presentation form
3172      (Arabic).
3173
3174  -- Constant: int UC_DECOMP_CIRCLE
3175      UCD marker: ‘<circle>’.  Denotes an encircled form.
3176
3177  -- Constant: int UC_DECOMP_SUPER
3178      UCD marker: ‘<super>’.  Denotes a superscript form.
3179
3180  -- Constant: int UC_DECOMP_SUB
3181      UCD marker: ‘<sub>’.  Denotes a subscript form.
3182
3183  -- Constant: int UC_DECOMP_VERTICAL
3184      UCD marker: ‘<vertical>’.  Denotes a vertical layout presentation
3185      form.
3186
3187  -- Constant: int UC_DECOMP_WIDE
3188      UCD marker: ‘<wide>’.  Denotes a wide (or zenkaku) compatibility
3189      character.
3190
3191  -- Constant: int UC_DECOMP_NARROW
3192      UCD marker: ‘<narrow>’.  Denotes a narrow (or hankaku)
3193      compatibility character.
3194
3195  -- Constant: int UC_DECOMP_SMALL
3196      UCD marker: ‘<small>’.  Denotes a small variant form (CNS
3197      compatibility).
3198
3199  -- Constant: int UC_DECOMP_SQUARE
3200      UCD marker: ‘<square>’.  Denotes a CJK squared font variant.
3201
3202  -- Constant: int UC_DECOMP_FRACTION
3203      UCD marker: ‘<fraction>’.  Denotes a vulgar fraction form.
3204
3205  -- Constant: int UC_DECOMP_COMPAT
3206      UCD marker: ‘<compat>’.  Denotes an otherwise unspecified
3207      compatibility character.
3208
3209    The following constant denotes the maximum size of decomposition of a
3210 single Unicode character.
3211
3212  -- Macro: unsigned int UC_DECOMPOSITION_MAX_LENGTH
3213      This macro expands to a constant that is the required size of the
3214      buffer passed to the ‘uc_decomposition’ and
3215      ‘uc_canonical_decomposition’ functions.
3216
3217    The following functions decompose a Unicode character.
3218
3219  -- Function: int uc_decomposition (ucs4_t UC, int *DECOMP_TAG, ucs4_t
3220           *DECOMPOSITION)
3221      Returns the character decomposition mapping of the Unicode
3222      character UC.  DECOMPOSITION must point to an array of at least
3223      ‘UC_DECOMPOSITION_MAX_LENGTH’ ‘ucs4_t’ elements.
3224
3225      When a decomposition exists, ‘DECOMPOSITION[0..N-1]’ and
3226      ‘*DECOMP_TAG’ are filled and N is returned.  Otherwise -1 is
3227      returned.
3228
3229  -- Function: int uc_canonical_decomposition (ucs4_t UC, ucs4_t
3230           *DECOMPOSITION)
3231      Returns the canonical character decomposition mapping of the
3232      Unicode character UC.  DECOMPOSITION must point to an array of at
3233      least ‘UC_DECOMPOSITION_MAX_LENGTH’ ‘ucs4_t’ elements.
3234
3235      When a decomposition exists, ‘DECOMPOSITION[0..N-1]’ is filled and
3236      N is returned.  Otherwise -1 is returned.
3237
3238 \1f
3239 File: libunistring.info,  Node: Composition of characters,  Next: Normalization of strings,  Prev: Decomposition of characters,  Up: uninorm.h
3240
3241 13.2 Composition of Unicode characters
3242 ======================================
3243
3244    The following function composes a Unicode character from two Unicode
3245 characters.
3246
3247  -- Function: ucs4_t uc_composition (ucs4_t UC1, ucs4_t UC2)
3248      Attempts to combine the Unicode characters UC1, UC2.  UC1 is known
3249      to have canonical combining class 0.
3250
3251      Returns the combination of UC1 and UC2, if it exists.  Returns 0
3252      otherwise.
3253
3254      Not all decompositions can be recombined using this function.  See
3255      the Unicode file ‘CompositionExclusions.txt’ for details.
3256
3257 \1f
3258 File: libunistring.info,  Node: Normalization of strings,  Next: Normalizing comparisons,  Prev: Composition of characters,  Up: uninorm.h
3259
3260 13.3 Normalization of strings
3261 =============================
3262
3263    The Unicode standard defines four normalization forms for Unicode
3264 strings.  The following type is used to denote a normalization form.
3265
3266  -- Type: uninorm_t
3267      An object of type ‘uninorm_t’ denotes a Unicode normalization form.
3268      This is a scalar type; its values can be compared with ‘==’.
3269
3270    The following constants denote the four normalization forms.
3271
3272  -- Macro: uninorm_t UNINORM_NFD
3273      Denotes Normalization form D: canonical decomposition.
3274
3275  -- Macro: uninorm_t UNINORM_NFC
3276      Normalization form C: canonical decomposition, then canonical
3277      composition.
3278
3279  -- Macro: uninorm_t UNINORM_NFKD
3280      Normalization form KD: compatibility decomposition.
3281
3282  -- Macro: uninorm_t UNINORM_NFKC
3283      Normalization form KC: compatibility decomposition, then canonical
3284      composition.
3285
3286    The following functions operate on ‘uninorm_t’ objects.
3287
3288  -- Function: bool uninorm_is_compat_decomposing (uninorm_t NF)
3289      Tests whether the normalization form NF does compatibility
3290      decomposition.
3291
3292  -- Function: bool uninorm_is_composing (uninorm_t NF)
3293      Tests whether the normalization form NF includes canonical
3294      composition.
3295
3296  -- Function: uninorm_t uninorm_decomposing_form (uninorm_t NF)
3297      Returns the decomposing variant of the normalization form NF.  This
3298      maps NFC,NFD → NFD and NFKC,NFKD → NFKD.
3299
3300    The following functions apply a Unicode normalization form to a
3301 Unicode string.
3302
3303  -- Function: uint8_t * u8_normalize (uninorm_t NF, const uint8_t *S,
3304           size_t N, uint8_t *RESULTBUF, size_t *LENGTHP)
3305  -- Function: uint16_t * u16_normalize (uninorm_t NF, const uint16_t *S,
3306           size_t N, uint16_t *RESULTBUF, size_t *LENGTHP)
3307  -- Function: uint32_t * u32_normalize (uninorm_t NF, const uint32_t *S,
3308           size_t N, uint32_t *RESULTBUF, size_t *LENGTHP)
3309      Returns the specified normalization form of a string.
3310
3311 \1f
3312 File: libunistring.info,  Node: Normalizing comparisons,  Next: Normalization of streams,  Prev: Normalization of strings,  Up: uninorm.h
3313
3314 13.4 Normalizing comparisons
3315 ============================
3316
3317    The following functions compare Unicode strings, ignoring differences
3318 in normalization.
3319
3320  -- Function: int u8_normcmp (const uint8_t *S1, size_t N1, const
3321           uint8_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
3322  -- Function: int u16_normcmp (const uint16_t *S1, size_t N1, const
3323           uint16_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
3324  -- Function: int u32_normcmp (const uint32_t *S1, size_t N1, const
3325           uint32_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
3326      Compares S1 and S2, ignoring differences in normalization.
3327
3328      NF must be either ‘UNINORM_NFD’ or ‘UNINORM_NFKD’.
3329
3330      If successful, sets ‘*RESULTP’ to -1 if S1 < S2, 0 if S1 = S2, 1 if
3331      S1 > S2, and returns 0.  Upon failure, returns -1 with ‘errno’ set.
3332
3333  -- Function: char * u8_normxfrm (const uint8_t *S, size_t N, uninorm_t
3334           NF, char *RESULTBUF, size_t *LENGTHP)
3335  -- Function: char * u16_normxfrm (const uint16_t *S, size_t N,
3336           uninorm_t NF, char *RESULTBUF, size_t *LENGTHP)
3337  -- Function: char * u32_normxfrm (const uint32_t *S, size_t N,
3338           uninorm_t NF, char *RESULTBUF, size_t *LENGTHP)
3339      Converts the string S of length N to a NUL-terminated byte
3340      sequence, in such a way that comparing ‘u8_normxfrm (S1)’ and
3341      ‘u8_normxfrm (S2)’ with the ‘u8_cmp2’ function is equivalent to
3342      comparing S1 and S2 with the ‘u8_normcoll’ function.
3343
3344      NF must be either ‘UNINORM_NFC’ or ‘UNINORM_NFKC’.
3345
3346  -- Function: int u8_normcoll (const uint8_t *S1, size_t N1, const
3347           uint8_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
3348  -- Function: int u16_normcoll (const uint16_t *S1, size_t N1, const
3349           uint16_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
3350  -- Function: int u32_normcoll (const uint32_t *S1, size_t N1, const
3351           uint32_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
3352      Compares S1 and S2, ignoring differences in normalization, using
3353      the collation rules of the current locale.
3354
3355      NF must be either ‘UNINORM_NFC’ or ‘UNINORM_NFKC’.
3356
3357      If successful, sets ‘*RESULTP’ to -1 if S1 < S2, 0 if S1 = S2, 1 if
3358      S1 > S2, and returns 0.  Upon failure, returns -1 with ‘errno’ set.
3359
3360 \1f
3361 File: libunistring.info,  Node: Normalization of streams,  Prev: Normalizing comparisons,  Up: uninorm.h
3362
3363 13.5 Normalization of streams of Unicode characters
3364 ===================================================
3365
3366    A “stream of Unicode characters” is essentially a function that
3367 accepts a ‘ucs4_t’ argument repeatedly, optionally combined with a
3368 function that “flushes” the stream.
3369
3370  -- Type: struct uninorm_filter
3371      This is the data type of a stream of Unicode characters that
3372      normalizes its input according to a given normalization form and
3373      passes the normalized character sequence to the encapsulated stream
3374      of Unicode characters.
3375
3376  -- Function: struct uninorm_filter * uninorm_filter_create (uninorm_t
3377           NF, int (*STREAM_FUNC) (void *STREAM_DATA, ucs4_t UC), void
3378           *STREAM_DATA)
3379      Creates and returns a normalization filter for Unicode characters.
3380
3381      The pair (STREAM_FUNC, STREAM_DATA) is the encapsulated stream.
3382      ‘STREAM_FUNC (STREAM_DATA, UC)’ receives the Unicode character UC
3383      and returns 0 if successful, or -1 with ‘errno’ set upon failure.
3384
3385      Returns the new filter, or NULL with ‘errno’ set upon failure.
3386
3387  -- Function: int uninorm_filter_write (struct uninorm_filter *FILTER,
3388           ucs4_t UC)
3389      Stuffs a Unicode character into a normalizing filter.  Returns 0 if
3390      successful, or -1 with ‘errno’ set upon failure.
3391
3392  -- Function: int uninorm_filter_flush (struct uninorm_filter *FILTER)
3393      Brings data buffered in the filter to its destination, the
3394      encapsulated stream.
3395
3396      Returns 0 if successful, or -1 with ‘errno’ set upon failure.
3397
3398      Note!  If after calling this function, additional characters are
3399      written into the filter, the resulting character sequence in the
3400      encapsulated stream will not necessarily be normalized.
3401
3402  -- Function: int uninorm_filter_free (struct uninorm_filter *FILTER)
3403      Brings data buffered in the filter to its destination, the
3404      encapsulated stream, then closes and frees the filter.
3405
3406      Returns 0 if successful, or -1 with ‘errno’ set upon failure.
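
   The encapsulated stream is supplied by the caller.  A minimal sketch
of a callback with the shape expected of STREAM_FUNC follows.  The
structure ‘codepoint_sink’ and the function ‘sink_append’ are
hypothetical names, the choice of ‘ENOMEM’ is an assumption, and the
sketch uses ‘uint32_t’ on the assumption that ‘ucs4_t’ is a compatible
32-bit unsigned type.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical destination: a fixed-capacity array of code points.  */
struct codepoint_sink
{
  uint32_t *items;
  size_t count;
  size_t capacity;
};

/* Append UC to the sink passed as STREAM_DATA.  Returns 0 if
   successful, or -1 with errno set to ENOMEM when the sink is full.  */
int
sink_append (void *stream_data, uint32_t uc)
{
  struct codepoint_sink *sink = stream_data;
  assert (sink != NULL);
  if (sink->count == sink->capacity)
    {
      errno = ENOMEM;
      return -1;
    }
  sink->items[sink->count++] = uc;
  return 0;
}
```

   Such a function, together with a pointer to the sink, would form the
pair (STREAM_FUNC, STREAM_DATA) given to ‘uninorm_filter_create’.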
3407
3408 \1f
3409 File: libunistring.info,  Node: unicase.h,  Next: uniregex.h,  Prev: uninorm.h,  Up: Top
3410
3411 14 Case mappings ‘<unicase.h>’
3412 ******************************
3413
3414    This include file defines functions for case mapping for Unicode
3415 strings and case insensitive comparison of Unicode strings and C
3416 strings.
3417
3418    These string functions fix the problems that were mentioned in *note
3419 char * strings::, namely, they handle the Croatian LETTER DZ WITH CARON,
3420 the German LATIN SMALL LETTER SHARP S, the Greek sigma and the
3421 Lithuanian i correctly.
3422
3423 * Menu:
3424
3425 * Case mappings of characters::
3426 * Case mappings of strings::
3427 * Case mappings of substrings::
3428 * Case insensitive comparison::
3429 * Case detection::
3430
3431 \1f
3432 File: libunistring.info,  Node: Case mappings of characters,  Next: Case mappings of strings,  Up: unicase.h
3433
3434 14.1 Case mappings of characters
3435 ================================
3436
3437    The following functions implement case mappings on Unicode characters
3438 — for those cases only where the result of the mapping is again a
3439 single Unicode character.
3440
3441    These mappings are locale and context independent.
3442
3443    *WARNING!* These functions are not sufficient for languages such as
3444 German, Greek and Lithuanian.  It is better to use the functions below,
3445 which treat an entire string at once and are language aware.
3446
3447  -- Function: ucs4_t uc_toupper (ucs4_t UC)
3448      Returns the uppercase mapping of the Unicode character UC.
3449
3450  -- Function: ucs4_t uc_tolower (ucs4_t UC)
3451      Returns the lowercase mapping of the Unicode character UC.
3452
3453  -- Function: ucs4_t uc_totitle (ucs4_t UC)
3454      Returns the titlecase mapping of the Unicode character UC.
3455
3456      The titlecase mapping of a character is to be used when the
3457      character should look like upper case and the following characters
3458      are lower cased.
3459
3460      For most characters, this is the same as the uppercase mapping.
3461      There are only a few characters where the title case variant and the
3462      upper case variant are different.  These characters occur in the
3463      Latin writing of the Croatian, Bosnian, and Serbian languages.
3464
3465      Lower case             Title case             Upper case
3466      ---------------------------------------------------------------------
3467      LATIN SMALL LETTER     LATIN CAPITAL LETTER   LATIN CAPITAL LETTER
3468      LJ                     L WITH SMALL LETTER    LJ
3469                             J
3470      LATIN SMALL LETTER     LATIN CAPITAL LETTER   LATIN CAPITAL LETTER
3471      NJ                     N WITH SMALL LETTER    NJ
3472                             J
3473      LATIN SMALL LETTER     LATIN CAPITAL LETTER   LATIN CAPITAL LETTER
3474      DZ                     D WITH SMALL LETTER    DZ
3475                             Z
3476      LATIN SMALL LETTER     LATIN CAPITAL LETTER   LATIN CAPITAL LETTER
3477      DZ WITH CARON          D WITH SMALL LETTER    DZ WITH CARON
3478                             Z WITH CARON
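
   The four rows of the table correspond to the code points U+01C4 to
U+01CC and U+01F1 to U+01F3.  The following sketch encodes just these
rows.  It is not the library's implementation: ‘demo_totitle’ is a
hypothetical function that leaves every other character unchanged,
whereas ‘uc_totitle’ also maps ordinary letters.

```c
#include <stddef.h>
#include <stdint.h>

struct case_triple
{
  uint32_t lower;
  uint32_t title;
  uint32_t upper;
};

/* One entry per row of the table above.  */
static const struct case_triple digraphs[] = {
  { 0x01C9, 0x01C8, 0x01C7 },   /* LJ digraph */
  { 0x01CC, 0x01CB, 0x01CA },   /* NJ digraph */
  { 0x01F3, 0x01F2, 0x01F1 },   /* DZ digraph */
  { 0x01C6, 0x01C5, 0x01C4 },   /* DZ WITH CARON digraph */
};

/* Return the title case form if UC is any case form of one of the four
   digraphs, and UC itself otherwise.  */
uint32_t
demo_totitle (uint32_t uc)
{
  for (size_t i = 0; i < sizeof digraphs / sizeof digraphs[0]; i++)
    if (uc == digraphs[i].lower || uc == digraphs[i].title
        || uc == digraphs[i].upper)
      return digraphs[i].title;
  return uc;
}
```

   For example, U+01C9 LATIN SMALL LETTER LJ maps to U+01C8, which
differs from its upper case form U+01C7.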
3479
3480 \1f
3481 File: libunistring.info,  Node: Case mappings of strings,  Next: Case mappings of substrings,  Prev: Case mappings of characters,  Up: unicase.h
3482
3483 14.2 Case mappings of strings
3484 =============================
3485
3486    Case mapping should always be performed on entire strings, not on
3487 individual characters.  The functions in this section do so.
3488
3489    These functions can apply a normalization after the case mapping.
3490 The reason is that if you want to treat ‘ä’ and ‘Ä’ the same, you most
3491 often also want to treat the composed and decomposed forms of such a
3492 character the same: U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS and
3493 U+0041 LATIN CAPITAL LETTER A followed by U+0308 COMBINING DIAERESIS.
3494 The NF argument designates the normalization.
3495
3496    These functions are locale dependent.  The ISO639_LANGUAGE argument
3497 identifies the language (e.g.  ‘"tr"’ for Turkish).  NULL means to use
3498 locale independent case mappings.
3499
3500  -- Function: const char * uc_locale_language ()
3501      Returns the ISO 639 language code of the current locale.  Returns
3502      ‘""’ if it is unknown, or in the "C" locale.
3503
3504  -- Function: uint8_t * u8_toupper (const uint8_t *S, size_t N, const
3505           char *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF,
3506           size_t *LENGTHP)
3507  -- Function: uint16_t * u16_toupper (const uint16_t *S, size_t N, const
3508           char *ISO639_LANGUAGE, uninorm_t NF, uint16_t *RESULTBUF,
3509           size_t *LENGTHP)
3510  -- Function: uint32_t * u32_toupper (const uint32_t *S, size_t N, const
3511           char *ISO639_LANGUAGE, uninorm_t NF, uint32_t *RESULTBUF,
3512           size_t *LENGTHP)
3513      Returns the uppercase mapping of a string.
3514
3515      The NF argument identifies the normalization form to apply after
3516      the case-mapping.  It can also be NULL, for no normalization.
3517
3518  -- Function: uint8_t * u8_tolower (const uint8_t *S, size_t N, const
3519           char *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF,
3520           size_t *LENGTHP)
3521  -- Function: uint16_t * u16_tolower (const uint16_t *S, size_t N, const
3522           char *ISO639_LANGUAGE, uninorm_t NF, uint16_t *RESULTBUF,
3523           size_t *LENGTHP)
3524  -- Function: uint32_t * u32_tolower (const uint32_t *S, size_t N, const
3525           char *ISO639_LANGUAGE, uninorm_t NF, uint32_t *RESULTBUF,
3526           size_t *LENGTHP)
3527      Returns the lowercase mapping of a string.
3528
3529      The NF argument identifies the normalization form to apply after
3530      the case-mapping.  It can also be NULL, for no normalization.
3531
3532  -- Function: uint8_t * u8_totitle (const uint8_t *S, size_t N, const
3533           char *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF,
3534           size_t *LENGTHP)
3535  -- Function: uint16_t * u16_totitle (const uint16_t *S, size_t N, const
3536           char *ISO639_LANGUAGE, uninorm_t NF, uint16_t *RESULTBUF,
3537           size_t *LENGTHP)
3538  -- Function: uint32_t * u32_totitle (const uint32_t *S, size_t N, const
3539           char *ISO639_LANGUAGE, uninorm_t NF, uint32_t *RESULTBUF,
3540           size_t *LENGTHP)
3541      Returns the titlecase mapping of a string.
3542
3543      Mapping to title case means that, in each word, the first cased
3544      character is being mapped to title case and the remaining
3545      characters of the word are being mapped to lower case.
3546
3547      The NF argument identifies the normalization form to apply after
3548      the case-mapping.  It can also be NULL, for no normalization.
3549
3550 \1f
3551 File: libunistring.info,  Node: Case mappings of substrings,  Next: Case insensitive comparison,  Prev: Case mappings of strings,  Up: unicase.h
3552
3553 14.3 Case mappings of substrings
3554 ================================
3555
3556    Case mapping of a substring cannot simply be performed by extracting
3557 the substring and then applying the case mapping function to it.  This
3558 does not work because case mapping requires some information about the
3559 surrounding characters.  The following functions let you apply case
3560 mappings to substrings of a given string, while taking into account the
3561 characters that precede it (the “prefix”) and the characters that follow
3562 it (the “suffix”).
3563
3564  -- Type: casing_prefix_context_t
3565      This data type denotes the case-mapping context that is given by a
3566      prefix string.  It is an immediate type that can be copied by
3567      simple assignment, without involving memory allocation.  It is not
3568      an array type.
3569
3570  -- Constant: casing_prefix_context_t unicase_empty_prefix_context
3571      This constant is the case-mapping context that corresponds to an
3572      empty prefix string.
3573
3574    The following functions return ‘casing_prefix_context_t’ objects:
3575
3576  -- Function: casing_prefix_context_t u8_casing_prefix_context (const
3577           uint8_t *S, size_t N)
3578  -- Function: casing_prefix_context_t u16_casing_prefix_context (const
3579           uint16_t *S, size_t N)
3580  -- Function: casing_prefix_context_t u32_casing_prefix_context (const
3581           uint32_t *S, size_t N)
3582      Returns the case-mapping context of a given prefix string.
3583
3584  -- Function: casing_prefix_context_t u8_casing_prefixes_context (const
3585           uint8_t *S, size_t N, casing_prefix_context_t A_CONTEXT)
3586  -- Function: casing_prefix_context_t u16_casing_prefixes_context (const
3587           uint16_t *S, size_t N, casing_prefix_context_t A_CONTEXT)
3588  -- Function: casing_prefix_context_t u32_casing_prefixes_context (const
3589           uint32_t *S, size_t N, casing_prefix_context_t A_CONTEXT)
3590      Returns the case-mapping context of the prefix concat(A, S), given
3591      the case-mapping context of the prefix A.
3592
3593  -- Type: casing_suffix_context_t
3594      This data type denotes the case-mapping context that is given by a
3595      suffix string.  It is an immediate type that can be copied by
3596      simple assignment, without involving memory allocation.  It is not
3597      an array type.
3598
3599  -- Constant: casing_suffix_context_t unicase_empty_suffix_context
3600      This constant is the case-mapping context that corresponds to an
3601      empty suffix string.
3602
3603    The following functions return ‘casing_suffix_context_t’ objects:
3604
3605  -- Function: casing_suffix_context_t u8_casing_suffix_context (const
3606           uint8_t *S, size_t N)
3607  -- Function: casing_suffix_context_t u16_casing_suffix_context (const
3608           uint16_t *S, size_t N)
3609  -- Function: casing_suffix_context_t u32_casing_suffix_context (const
3610           uint32_t *S, size_t N)
3611      Returns the case-mapping context of a given suffix string.
3612
3613  -- Function: casing_suffix_context_t u8_casing_suffixes_context (const
3614           uint8_t *S, size_t N, casing_suffix_context_t A_CONTEXT)
3615  -- Function: casing_suffix_context_t u16_casing_suffixes_context (const
3616           uint16_t *S, size_t N, casing_suffix_context_t A_CONTEXT)
3617  -- Function: casing_suffix_context_t u32_casing_suffixes_context (const
3618           uint32_t *S, size_t N, casing_suffix_context_t A_CONTEXT)
3619      Returns the case-mapping context of the suffix concat(S, A), given
3620      the case-mapping context of the suffix A.
3621
3622    The following functions perform a case mapping, considering the
3623 prefix context and the suffix context.
3624
3625  -- Function: uint8_t * u8_ct_toupper (const uint8_t *S, size_t N,
3626           casing_prefix_context_t PREFIX_CONTEXT,
3627           casing_suffix_context_t SUFFIX_CONTEXT, const char
3628           *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF, size_t
3629           *LENGTHP)
3630  -- Function: uint16_t * u16_ct_toupper (const uint16_t *S, size_t N,
3631           casing_prefix_context_t PREFIX_CONTEXT,
3632           casing_suffix_context_t SUFFIX_CONTEXT, const char
3633           *ISO639_LANGUAGE, uninorm_t NF, uint16_t *RESULTBUF, size_t
3634           *LENGTHP)
3635  -- Function: uint32_t * u32_ct_toupper (const uint32_t *S, size_t N,
3636           casing_prefix_context_t PREFIX_CONTEXT,
3637           casing_suffix_context_t SUFFIX_CONTEXT, const char
3638           *ISO639_LANGUAGE, uninorm_t NF, uint32_t *RESULTBUF, size_t
3639           *LENGTHP)
3640      Returns the uppercase mapping of a string that is surrounded by a
3641      prefix and a suffix.
3642
3643  -- Function: uint8_t * u8_ct_tolower (const uint8_t *S, size_t N,
3644           casing_prefix_context_t PREFIX_CONTEXT,
3645           casing_suffix_context_t SUFFIX_CONTEXT, const char
3646           *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF, size_t
3647           *LENGTHP)
3648  -- Function: uint16_t * u16_ct_tolower (const uint16_t *S, size_t N,
3649           casing_prefix_context_t PREFIX_CONTEXT,
3650           casing_suffix_context_t SUFFIX_CONTEXT, const char
3651           *ISO639_LANGUAGE, uninorm_t NF, uint16_t *RESULTBUF, size_t
3652           *LENGTHP)
3653  -- Function: uint32_t * u32_ct_tolower (const uint32_t *S, size_t N,
3654           casing_prefix_context_t PREFIX_CONTEXT,
3655           casing_suffix_context_t SUFFIX_CONTEXT, const char
3656           *ISO639_LANGUAGE, uninorm_t NF, uint32_t *RESULTBUF, size_t
3657           *LENGTHP)
3658      Returns the lowercase mapping of a string that is surrounded by a
3659      prefix and a suffix.
3660
3661  -- Function: uint8_t * u8_ct_totitle (const uint8_t *S, size_t N,
3662           casing_prefix_context_t PREFIX_CONTEXT,
3663           casing_suffix_context_t SUFFIX_CONTEXT, const char
3664           *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF, size_t
3665           *LENGTHP)
3666  -- Function: uint16_t * u16_ct_totitle (const uint16_t *S, size_t N,
3667           casing_prefix_context_t PREFIX_CONTEXT,
3668           casing_suffix_context_t SUFFIX_CONTEXT, const char
3669           *ISO639_LANGUAGE, uninorm_t NF, uint16_t *RESULTBUF, size_t
3670           *LENGTHP)
3671  -- Function: uint32_t * u32_ct_totitle (const uint32_t *S, size_t N,
3672           casing_prefix_context_t PREFIX_CONTEXT,
3673           casing_suffix_context_t SUFFIX_CONTEXT, const char
3674           *ISO639_LANGUAGE, uninorm_t NF, uint32_t *RESULTBUF, size_t
3675           *LENGTHP)
3676      Returns the titlecase mapping of a string that is surrounded by a
3677      prefix and a suffix.
3678
3679    For example, to uppercase the UTF-8 substring between ‘s +
3680 start_index’ and ‘s + end_index’ of a string that extends from ‘s’ to ‘s
3681 + u8_strlen (s)’, you can use the statements
3682
3683      size_t result_length;
3684      uint8_t *result =
3685        u8_ct_toupper (s + start_index, end_index - start_index,
3686                       u8_casing_prefix_context (s, start_index),
3687                       u8_casing_suffix_context (s + end_index,
3688                                                 u8_strlen (s) - end_index),
3689                       iso639_language, NULL, NULL, &result_length);
3690
3691 \1f
3692 File: libunistring.info,  Node: Case insensitive comparison,  Next: Case detection,  Prev: Case mappings of substrings,  Up: unicase.h
3693
3694 14.4 Case insensitive comparison
3695 ================================
3696
3697    The following functions implement comparison that ignores differences
3698 in case and normalization.
3699
3700  -- Function: uint8_t * u8_casefold (const uint8_t *S, size_t N, const
3701           char *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF,
3702           size_t *LENGTHP)
3703  -- Function: uint16_t * u16_casefold (const uint16_t *S, size_t N,
3704           const char *ISO639_LANGUAGE, uninorm_t NF, uint16_t
3705           *RESULTBUF, size_t *LENGTHP)
3706  -- Function: uint32_t * u32_casefold (const uint32_t *S, size_t N,
3707           const char *ISO639_LANGUAGE, uninorm_t NF, uint32_t
3708           *RESULTBUF, size_t *LENGTHP)
3709      Returns the case folded string.
3710
3711      Comparing ‘u8_casefold (S1)’ and ‘u8_casefold (S2)’ with the
3712      ‘u8_cmp2’ function is equivalent to comparing S1 and S2 with
3713      ‘u8_casecmp’.
3714
3715      The NF argument identifies the normalization form to apply after
3716      the case-mapping.  It can also be NULL, for no normalization.
3717
3718  -- Function: uint8_t * u8_ct_casefold (const uint8_t *S, size_t N,
3719           casing_prefix_context_t PREFIX_CONTEXT,
3720           casing_suffix_context_t SUFFIX_CONTEXT, const char
3721           *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF, size_t
3722           *LENGTHP)
3723  -- Function: uint16_t * u16_ct_casefold (const uint16_t *S, size_t N,
3724           casing_prefix_context_t PREFIX_CONTEXT,
3725           casing_suffix_context_t SUFFIX_CONTEXT, const char
3726           *ISO639_LANGUAGE, uninorm_t NF, uint16_t *RESULTBUF, size_t
3727           *LENGTHP)
3728  -- Function: uint32_t * u32_ct_casefold (const uint32_t *S, size_t N,
3729           casing_prefix_context_t PREFIX_CONTEXT,
3730           casing_suffix_context_t SUFFIX_CONTEXT, const char
3731           *ISO639_LANGUAGE, uninorm_t NF, uint32_t *RESULTBUF, size_t
3732           *LENGTHP)
3733      Returns the case folded string.  The case folding takes into
3734      account the case mapping contexts of the prefix and suffix strings.
3735
3736  -- Function: int u8_casecmp (const uint8_t *S1, size_t N1, const
3737           uint8_t *S2, size_t N2, const char *ISO639_LANGUAGE, uninorm_t
3738           NF, int *RESULTP)
3739  -- Function: int u16_casecmp (const uint16_t *S1, size_t N1, const
3740           uint16_t *S2, size_t N2, const char *ISO639_LANGUAGE,
3741           uninorm_t NF, int *RESULTP)
3742  -- Function: int u32_casecmp (const uint32_t *S1, size_t N1, const
3743           uint32_t *S2, size_t N2, const char *ISO639_LANGUAGE,
3744           uninorm_t NF, int *RESULTP)
3745  -- Function: int ulc_casecmp (const char *S1, size_t N1, const char
3746           *S2, size_t N2, const char *ISO639_LANGUAGE, uninorm_t NF, int
3747           *RESULTP)
3748      Compares S1 and S2, ignoring differences in case and normalization.
3749
3750      The NF argument identifies the normalization form to apply after
3751      the case-mapping.  It can also be NULL, for no normalization.
3752
3753      If successful, sets ‘*RESULTP’ to -1 if S1 < S2, 0 if S1 = S2, 1 if
3754      S1 > S2, and returns 0.  Upon failure, returns -1 with ‘errno’ set.
3755
3756    The following functions additionally take into account the sorting
3757 rules of the current locale.
3758
3759  -- Function: char * u8_casexfrm (const uint8_t *S, size_t N, const char
3760           *ISO639_LANGUAGE, uninorm_t NF, char *RESULTBUF, size_t
3761           *LENGTHP)
3762  -- Function: char * u16_casexfrm (const uint16_t *S, size_t N, const
3763           char *ISO639_LANGUAGE, uninorm_t NF, char *RESULTBUF, size_t
3764           *LENGTHP)
3765  -- Function: char * u32_casexfrm (const uint32_t *S, size_t N, const
3766           char *ISO639_LANGUAGE, uninorm_t NF, char *RESULTBUF, size_t
3767           *LENGTHP)
3768  -- Function: char * ulc_casexfrm (const char *S, size_t N, const char
3769           *ISO639_LANGUAGE, uninorm_t NF, char *RESULTBUF, size_t
3770           *LENGTHP)
3771      Converts the string S of length N to a NUL-terminated byte
3772      sequence, in such a way that comparing ‘u8_casexfrm (S1)’ and
3773      ‘u8_casexfrm (S2)’ with the gnulib function ‘memcmp2’ is equivalent
3774      to comparing S1 and S2 with ‘u8_casecoll’.
3775
3776      NF must be either ‘UNINORM_NFC’, ‘UNINORM_NFKC’, or NULL for no
3777      normalization.
3778
3779  -- Function: int u8_casecoll (const uint8_t *S1, size_t N1, const
3780           uint8_t *S2, size_t N2, const char *ISO639_LANGUAGE, uninorm_t
3781           NF, int *RESULTP)
3782  -- Function: int u16_casecoll (const uint16_t *S1, size_t N1, const
3783           uint16_t *S2, size_t N2, const char *ISO639_LANGUAGE,
3784           uninorm_t NF, int *RESULTP)
3785  -- Function: int u32_casecoll (const uint32_t *S1, size_t N1, const
3786           uint32_t *S2, size_t N2, const char *ISO639_LANGUAGE,
3787           uninorm_t NF, int *RESULTP)
3788  -- Function: int ulc_casecoll (const char *S1, size_t N1, const char
3789           *S2, size_t N2, const char *ISO639_LANGUAGE, uninorm_t NF, int
3790           *RESULTP)
3791      Compares S1 and S2, ignoring differences in case and normalization,
3792      using the collation rules of the current locale.
3793
3794      The NF argument identifies the normalization form to apply after
3795      the case-mapping.  It must be ‘UNINORM_NFC’, ‘UNINORM_NFKC’, or
3796      NULL for no normalization.
3797
3798      If successful, sets ‘*RESULTP’ to -1 if S1 < S2, 0 if S1 = S2, 1 if
3799      S1 > S2, and returns 0.  Upon failure, returns -1 with ‘errno’ set.
3800
3801 \1f
3802 File: libunistring.info,  Node: Case detection,  Prev: Case insensitive comparison,  Up: unicase.h
3803
3804 14.5 Case detection
3805 ===================
3806
3807    The following functions determine whether a Unicode string is
3808 entirely in upper case, or entirely in lower case, or entirely in title
3809 case, or already case-folded.
3810
3811  -- Function: int u8_is_uppercase (const uint8_t *S, size_t N, const
3812           char *ISO639_LANGUAGE, bool *RESULTP)
3813  -- Function: int u16_is_uppercase (const uint16_t *S, size_t N, const
3814           char *ISO639_LANGUAGE, bool *RESULTP)
3815  -- Function: int u32_is_uppercase (const uint32_t *S, size_t N, const
3816           char *ISO639_LANGUAGE, bool *RESULTP)
3817      Sets ‘*RESULTP’ to true if mapping NFD(S) to upper case is a no-op,
3818      or to false otherwise, and returns 0.  Upon failure, returns -1
3819      with ‘errno’ set.
3820
3821  -- Function: int u8_is_lowercase (const uint8_t *S, size_t N, const
3822           char *ISO639_LANGUAGE, bool *RESULTP)
3823  -- Function: int u16_is_lowercase (const uint16_t *S, size_t N, const
3824           char *ISO639_LANGUAGE, bool *RESULTP)
3825  -- Function: int u32_is_lowercase (const uint32_t *S, size_t N, const
3826           char *ISO639_LANGUAGE, bool *RESULTP)
3827      Sets ‘*RESULTP’ to true if mapping NFD(S) to lower case is a no-op,
3828      or to false otherwise, and returns 0.  Upon failure, returns -1
3829      with ‘errno’ set.
3830
3831  -- Function: int u8_is_titlecase (const uint8_t *S, size_t N, const
3832           char *ISO639_LANGUAGE, bool *RESULTP)
3833  -- Function: int u16_is_titlecase (const uint16_t *S, size_t N, const
3834           char *ISO639_LANGUAGE, bool *RESULTP)
3835  -- Function: int u32_is_titlecase (const uint32_t *S, size_t N, const
3836           char *ISO639_LANGUAGE, bool *RESULTP)
3837      Sets ‘*RESULTP’ to true if mapping NFD(S) to title case is a no-op,
3838      or to false otherwise, and returns 0.  Upon failure, returns -1
3839      with ‘errno’ set.
3840
3841  -- Function: int u8_is_casefolded (const uint8_t *S, size_t N, const
3842           char *ISO639_LANGUAGE, bool *RESULTP)
3843  -- Function: int u16_is_casefolded (const uint16_t *S, size_t N, const
3844           char *ISO639_LANGUAGE, bool *RESULTP)
3845  -- Function: int u32_is_casefolded (const uint32_t *S, size_t N, const
3846           char *ISO639_LANGUAGE, bool *RESULTP)
3847      Sets ‘*RESULTP’ to true if applying case folding to NFD(S) is a
3848      no-op, or to false otherwise, and returns 0.  Upon failure, returns
3849      -1 with ‘errno’ set.
3850
3851    The following functions determine whether case mappings have any
3852 effect on a Unicode string.
3853
3854  -- Function: int u8_is_cased (const uint8_t *S, size_t N, const char
3855           *ISO639_LANGUAGE, bool *RESULTP)
3856  -- Function: int u16_is_cased (const uint16_t *S, size_t N, const char
3857           *ISO639_LANGUAGE, bool *RESULTP)
3858  -- Function: int u32_is_cased (const uint32_t *S, size_t N, const char
3859           *ISO639_LANGUAGE, bool *RESULTP)
3860      Sets ‘*RESULTP’ to true if case matters for S, that is, if mapping
3861      NFD(S) to either upper case or lower case or title case is not a
3862      no-op.  Sets ‘*RESULTP’ to false if NFD(S) maps to itself under the
3863      upper case mapping, under the lower case mapping, and under the
3864      title case mapping; in other words, when NFD(S) consists entirely
3865      of caseless characters.  Upon failure, returns -1 with ‘errno’ set.
3866
3867 \1f
3868 File: libunistring.info,  Node: uniregex.h,  Next: Using the library,  Prev: unicase.h,  Up: Top
3869
3870 15 Regular expressions ‘<uniregex.h>’
3871 *************************************
3872
3873    This include file is not yet implemented.
3874
3875 \1f
3876 File: libunistring.info,  Node: Using the library,  Next: More functionality,  Prev: uniregex.h,  Up: Top
3877
3878 16 Using the library
3879 ********************
3880
3881    This chapter explains some practical considerations, regarding the
3882 installation and compiler options that are needed in order to use this
3883 library.
3884
3885 * Menu:
3886
3887 * Installation::
3888 * Compiler options::
3889 * Include files::
3890 * Autoconf macro::
3891 * Reporting problems::
3892
3893 \1f
3894 File: libunistring.info,  Node: Installation,  Next: Compiler options,  Up: Using the library
3895
3896 16.1 Installation
3897 =================
3898
3899    Before you can use the library, it must be installed.  First, you
3900 have to make sure all dependencies are installed.  They are listed in
3901 the file ‘DEPENDENCIES’.
3902
3903    Then you can proceed to build and install the library, as described
3904 in the file ‘INSTALL’.  For installation on Windows systems, please
3905 refer to the file ‘README.woe32’.
3906
3907 \1f
3908 File: libunistring.info,  Node: Compiler options,  Next: Include files,  Prev: Installation,  Up: Using the library
3909
3910 16.2 Compiler options
3911 =====================
3912
3913    Let’s denote as ‘LIBUNISTRING_PREFIX’ the value of the ‘--prefix’
3914 option that you passed to ‘configure’ while installing this package.  If
3915 you didn’t pass any ‘--prefix’ option, then the package is installed in
3916 ‘/usr/local’.
3917
3918    Let’s denote as ‘LIBUNISTRING_INCLUDEDIR’ the directory where the
3919 include files were installed.  This is usually the same as
3920 ‘${LIBUNISTRING_PREFIX}/include’, except that if you passed an
3921 ‘--includedir’ option to ‘configure’, it is the value of that option.
3922
3923    Let’s further denote as ‘LIBUNISTRING_LIBDIR’ the directory where the
3924 library itself was installed.  This is the value that you passed with
3925 the ‘--libdir’ option to ‘configure’, or otherwise the same as
3926 ‘${LIBUNISTRING_PREFIX}/lib’.  Recall that when building in 64-bit mode
3927 on a 64-bit GNU/Linux system that supports executables in either 64-bit
3928 mode or 32-bit mode, you should have used the option
3929 ‘--libdir=${LIBUNISTRING_PREFIX}/lib64’.
3930
3931    So that the compiler finds the include files, you have to pass it the
3932 option ‘-I${LIBUNISTRING_INCLUDEDIR}’.
3933
3934    So that the compiler finds the library during its linking pass, you
3935 have to pass it the options ‘-L${LIBUNISTRING_LIBDIR} -lunistring’.  On
3936 some systems, in some configurations, you also have to pass options
3937 needed for linking with ‘libiconv’.  The autoconf macro
3938 ‘gl_LIBUNISTRING’ (see *note Autoconf macro::) deals with this
3939 particularity.
3940
3941 \1f
3942 File: libunistring.info,  Node: Include files,  Next: Autoconf macro,  Prev: Compiler options,  Up: Using the library
3943
3944 16.3 Include files
3945 ==================
3946
3947    Most of the include files have been presented in the introduction,
3948 see *note Introduction::, and subsequent detailed chapters.
3949
3950    Another include file is ‘<unistring/version.h>’.  It contains the
3951 version number of the libunistring library.
3952
3953  -- Macro: int _LIBUNISTRING_VERSION
3954      This constant contains the version of libunistring that is being
3955      used at compile time.  It encodes the major and minor parts of the
3956      version number only.  These parts are encoded in the form
3957      ‘(major<<8) + minor’.
3958
3959  -- Constant: int _libunistring_version
3960      This constant contains the version of libunistring that is being
3961      used at run time.  It encodes the major and minor parts of the
3962      version number only.  These parts are encoded in the form
3963      ‘(major<<8) + minor’.
3964
3965    It is possible that ‘_libunistring_version’ is greater than
3966 ‘_LIBUNISTRING_VERSION’.  This can happen when you use ‘libunistring’ as
3967 a shared library, and a newer, binary backward-compatible version has
3968 been installed after your program that uses ‘libunistring’ was
3969 installed.
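
   The encoding ‘(major<<8) + minor’ can be unpacked as in the following
sketch.  The function ‘decode_version’ is a hypothetical helper, and it
assumes that the minor part is smaller than 256 so that it fits in the
low 8 bits.

```c
/* Hypothetical helper, not part of libunistring: splits a version
   number encoded as (major<<8) + minor into its two parts.  Assumes
   that ENCODED is nonnegative and that the minor part is below 256.  */
static void
decode_version (int encoded, int *major, int *minor)
{
  *major = encoded >> 8;
  *minor = encoded & 0xff;
}
```

   A program that includes ‘<unistring/version.h>’ could pass
‘_LIBUNISTRING_VERSION’ and ‘_libunistring_version’ to this helper and
print both results, for example when assembling a bug report.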
3970
3971 \1f
3972 File: libunistring.info,  Node: Autoconf macro,  Next: Reporting problems,  Prev: Include files,  Up: Using the library
3973
3974 16.4 Autoconf macro
3975 ===================
3976
3977    GNU Gnulib provides an autoconf macro that tests for the availability
3978 of ‘libunistring’.  It is contained in the Gnulib module ‘libunistring’,
3979 see
3980 <http://www.gnu.org/software/gnulib/MODULES.html#module=libunistring>.
3981
3982    The macro is called ‘gl_LIBUNISTRING’.  It searches for an installed
3983 libunistring.  If found, it sets and AC_SUBSTs ‘HAVE_LIBUNISTRING=yes’
3984 and the ‘LIBUNISTRING’ and ‘LTLIBUNISTRING’ variables and augments the
3985 ‘CPPFLAGS’ variable, and defines the C macro ‘HAVE_LIBUNISTRING’ to 1.
3986 Otherwise, it sets and AC_SUBSTs ‘HAVE_LIBUNISTRING=no’ and
3987 ‘LIBUNISTRING’ and ‘LTLIBUNISTRING’ to empty.
3988
3989    The complexities that ‘gl_LIBUNISTRING’ deals with are the following:
3990
3991    • On some operating systems, in some configurations, libunistring
3992      depends on ‘libiconv’, and the options for linking with libiconv
3993      must be mentioned explicitly on the link command line.
3994
3995    • GNU ‘libunistring’, if installed, is not necessarily already in the
3996      search path (‘CPPFLAGS’ for the include file search path, ‘LDFLAGS’
3997      for the library search path).
3998
3999    • GNU ‘libunistring’, if installed, is not necessarily already in the
4000      run time library search path.  To avoid the need for setting an
4001      environment variable like ‘LD_LIBRARY_PATH’, the macro adds the
4002      appropriate run time search path options to the ‘LIBUNISTRING’
4003      variable.  This works on most systems.
4004
4005 \1f
4006 File: libunistring.info,  Node: Reporting problems,  Prev: Autoconf macro,  Up: Using the library
4007
4008 16.5 Reporting problems
4009 =======================
4010
4011    If you encounter any problem, please don’t hesitate to send a
4012 detailed bug report to the ‘bug-libunistring@gnu.org’ mailing list.  You
4013 can alternatively use the bug tracker at the project page
4014 <https://savannah.gnu.org/projects/libunistring>.
4015
4016    Please always include the version number of this library, and a short
4017 description of your operating system and compilation environment with
4018 corresponding version numbers.
4019
4020    For problems that appear while building and installing
4021 ‘libunistring’, for which you don’t find the remedy in the ‘INSTALL’
4022 file, please include a description of the options that you passed to the
4023 ‘configure’ script.
4024
4025 \1f
4026 File: libunistring.info,  Node: More functionality,  Next: Licenses,  Prev: Using the library,  Up: Top
4027
4028 17 More advanced functionality
4029 ******************************
4030
4031    For bidirectional reordering of strings, we recommend the GNU FriBidi
4032 library: <http://www.fribidi.org/>.
4033
4034    For the rendering of Unicode strings outside of the context of a
4035 given toolkit (KDE/Qt or GNOME/Gtk), we recommend the Pango library:
4036 <http://www.pango.org/>.
4037
4038 \1f
4039 File: libunistring.info,  Node: Licenses,  Next: Index,  Prev: More functionality,  Up: Top
4040
4041 Appendix A Licenses
4042 *******************
4043
4044    The files of this package are covered by the licenses indicated in
4045 each particular file or directory.  Here is a summary:
4046
4047    • The ‘libunistring’ library is covered by the GNU Lesser General
4048      Public License (LGPL). A copy of the license is included in *note
4049      GNU LGPL::.
4050
4051    • This manual is free documentation.  It is dually licensed under the
4052      GNU FDL and the GNU GPL. This means that you can redistribute this
4053      manual under either of these two licenses, at your choice.
4054      This manual is covered by the GNU FDL. Permission is granted to
4055      copy, distribute and/or modify this document under the terms of the
4056      GNU Free Documentation License (FDL), either version 1.2 of the
4057      License, or (at your option) any later version published by the
4058      Free Software Foundation (FSF); with no Invariant Sections, with no
4059      Front-Cover Text, and with no Back-Cover Texts.  A copy of the
4060      license is included in *note GNU FDL::.
4061      This manual is covered by the GNU GPL. You can redistribute it
4062      and/or modify it under the terms of the GNU General Public License
4063      (GPL), either version 3 of the License, or (at your option) any
4064      later version published by the Free Software Foundation (FSF). A
4065      copy of the license is included in *note GNU GPL::.
4066
4067 * Menu:
4068
4069 * GNU GPL::                     GNU General Public License
4070 * GNU LGPL::                    GNU Lesser General Public License
4071 * GNU FDL::                     GNU Free Documentation License
4072
4073 \1f
4074 File: libunistring.info,  Node: GNU GPL,  Next: GNU LGPL,  Up: Licenses
4075
4076 A.1 GNU GENERAL PUBLIC LICENSE
4077 ==============================
4078
4079                         Version 3, 29 June 2007
4080
4081      Copyright © 2007 Free Software Foundation, Inc. <http://fsf.org/>
4082
4083      Everyone is permitted to copy and distribute verbatim copies of this
4084      license document, but changing it is not allowed.
4085
4086 Preamble
4087 ========
4088
4089    The GNU General Public License is a free, copyleft license for
4090 software and other kinds of works.
4091
4092    The licenses for most software and other practical works are designed
4093 to take away your freedom to share and change the works.  By contrast,
4094 the GNU General Public License is intended to guarantee your freedom to
4095 share and change all versions of a program—to make sure it remains free
4096 software for all its users.  We, the Free Software Foundation, use the
4097 GNU General Public License for most of our software; it applies also to
4098 any other work released this way by its authors.  You can apply it to
4099 your programs, too.
4100
4101    When we speak of free software, we are referring to freedom, not
4102 price.  Our General Public Licenses are designed to make sure that you
4103 have the freedom to distribute copies of free software (and charge for
4104 them if you wish), that you receive source code or can get it if you
4105 want it, that you can change the software or use pieces of it in new
4106 free programs, and that you know you can do these things.
4107
4108    To protect your rights, we need to prevent others from denying you
4109 these rights or asking you to surrender the rights.  Therefore, you have
4110 certain responsibilities if you distribute copies of the software, or if
4111 you modify it: responsibilities to respect the freedom of others.
4112
4113    For example, if you distribute copies of such a program, whether
4114 gratis or for a fee, you must pass on to the recipients the same
4115 freedoms that you received.  You must make sure that they, too, receive
4116 or can get the source code.  And you must show them these terms so they
4117 know their rights.
4118
4119    Developers that use the GNU GPL protect your rights with two steps:
4120 (1) assert copyright on the software, and (2) offer you this License
4121 giving you legal permission to copy, distribute and/or modify it.
4122
4123    For the developers’ and authors’ protection, the GPL clearly explains
4124 that there is no warranty for this free software.  For both users’ and
4125 authors’ sake, the GPL requires that modified versions be marked as
4126 changed, so that their problems will not be attributed erroneously to
4127 authors of previous versions.
4128
4129    Some devices are designed to deny users access to install or run
4130 modified versions of the software inside them, although the manufacturer
4131 can do so.  This is fundamentally incompatible with the aim of
4132 protecting users’ freedom to change the software.  The systematic
4133 pattern of such abuse occurs in the area of products for individuals to
4134 use, which is precisely where it is most unacceptable.  Therefore, we
4135 have designed this version of the GPL to prohibit the practice for those
4136 products.  If such problems arise substantially in other domains, we
4137 stand ready to extend this provision to those domains in future versions
4138 of the GPL, as needed to protect the freedom of users.
4139
4140    Finally, every program is threatened constantly by software patents.
4141 States should not allow patents to restrict development and use of
4142 software on general-purpose computers, but in those that do, we wish to
4143 avoid the special danger that patents applied to a free program could
4144 make it effectively proprietary.  To prevent this, the GPL assures that
4145 patents cannot be used to render the program non-free.
4146
4147    The precise terms and conditions for copying, distribution and
4148 modification follow.
4149
4150 TERMS AND CONDITIONS
4151 ====================
4152
4153   0. Definitions.
4154
4155      “This License” refers to version 3 of the GNU General Public
4156      License.
4157
4158      “Copyright” also means copyright-like laws that apply to other
4159      kinds of works, such as semiconductor masks.
4160
4161      “The Program” refers to any copyrightable work licensed under this
4162      License.  Each licensee is addressed as “you”.  “Licensees” and
4163      “recipients” may be individuals or organizations.
4164
4165      To “modify” a work means to copy from or adapt all or part of the
4166      work in a fashion requiring copyright permission, other than the
4167      making of an exact copy.  The resulting work is called a “modified
4168      version” of the earlier work or a work “based on” the earlier work.
4169
4170      A “covered work” means either the unmodified Program or a work
4171      based on the Program.
4172
4173      To “propagate” a work means to do anything with it that, without
4174      permission, would make you directly or secondarily liable for
4175      infringement under applicable copyright law, except executing it on
4176      a computer or modifying a private copy.  Propagation includes
4177      copying, distribution (with or without modification), making
4178      available to the public, and in some countries other activities as
4179      well.
4180
4181      To “convey” a work means any kind of propagation that enables other
4182      parties to make or receive copies.  Mere interaction with a user
4183      through a computer network, with no transfer of a copy, is not
4184      conveying.
4185
4186      An interactive user interface displays “Appropriate Legal Notices”
4187      to the extent that it includes a convenient and prominently visible
4188      feature that (1) displays an appropriate copyright notice, and (2)
4189      tells the user that there is no warranty for the work (except to
4190      the extent that warranties are provided), that licensees may convey
4191      the work under this License, and how to view a copy of this
4192      License.  If the interface presents a list of user commands or
4193      options, such as a menu, a prominent item in the list meets this
4194      criterion.
4195
4196   1. Source Code.
4197
4198      The “source code” for a work means the preferred form of the work
4199      for making modifications to it.  “Object code” means any non-source
4200      form of a work.
4201
4202      A “Standard Interface” means an interface that either is an
4203      official standard defined by a recognized standards body, or, in
4204      the case of interfaces specified for a particular programming
4205      language, one that is widely used among developers working in that
4206      language.
4207
4208      The “System Libraries” of an executable work include anything,
4209      other than the work as a whole, that (a) is included in the normal
4210      form of packaging a Major Component, but which is not part of that
4211      Major Component, and (b) serves only to enable use of the work with
4212      that Major Component, or to implement a Standard Interface for
4213      which an implementation is available to the public in source code
4214      form.  A “Major Component”, in this context, means a major
4215      essential component (kernel, window system, and so on) of the
4216      specific operating system (if any) on which the executable work
4217      runs, or a compiler used to produce the work, or an object code
4218      interpreter used to run it.
4219
4220      The “Corresponding Source” for a work in object code form means all
4221      the source code needed to generate, install, and (for an executable
4222      work) run the object code and to modify the work, including scripts
4223      to control those activities.  However, it does not include the
4224      work’s System Libraries, or general-purpose tools or generally
4225      available free programs which are used unmodified in performing
4226      those activities but which are not part of the work.  For example,
4227      Corresponding Source includes interface definition files associated
4228      with source files for the work, and the source code for shared
4229      libraries and dynamically linked subprograms that the work is
4230      specifically designed to require, such as by intimate data
4231      communication or control flow between those subprograms and other
4232      parts of the work.
4233
4234      The Corresponding Source need not include anything that users can
4235      regenerate automatically from other parts of the Corresponding
4236      Source.
4237
4238      The Corresponding Source for a work in source code form is that
4239      same work.
4240
4241   2. Basic Permissions.
4242
4243      All rights granted under this License are granted for the term of
4244      copyright on the Program, and are irrevocable provided the stated
4245      conditions are met.  This License explicitly affirms your unlimited
4246      permission to run the unmodified Program.  The output from running
4247      a covered work is covered by this License only if the output, given
4248      its content, constitutes a covered work.  This License acknowledges
4249      your rights of fair use or other equivalent, as provided by
4250      copyright law.
4251
4252      You may make, run and propagate covered works that you do not
4253      convey, without conditions so long as your license otherwise
4254      remains in force.  You may convey covered works to others for the
4255      sole purpose of having them make modifications exclusively for you,
4256      or provide you with facilities for running those works, provided
4257      that you comply with the terms of this License in conveying all
4258      material for which you do not control copyright.  Those thus making
4259      or running the covered works for you must do so exclusively on your
4260      behalf, under your direction and control, on terms that prohibit
4261      them from making any copies of your copyrighted material outside
4262      their relationship with you.
4263
4264      Conveying under any other circumstances is permitted solely under
4265      the conditions stated below.  Sublicensing is not allowed; section
4266      10 makes it unnecessary.
4267
4268   3. Protecting Users’ Legal Rights From Anti-Circumvention Law.
4269
4270      No covered work shall be deemed part of an effective technological
4271      measure under any applicable law fulfilling obligations under
4272      article 11 of the WIPO copyright treaty adopted on 20 December
4273      1996, or similar laws prohibiting or restricting circumvention of
4274      such measures.
4275
4276      When you convey a covered work, you waive any legal power to forbid
4277      circumvention of technological measures to the extent such
4278      circumvention is effected by exercising rights under this License
4279      with respect to the covered work, and you disclaim any intention to
4280      limit operation or modification of the work as a means of
4281      enforcing, against the work’s users, your or third parties’ legal
4282      rights to forbid circumvention of technological measures.
4283
4284   4. Conveying Verbatim Copies.
4285
4286      You may convey verbatim copies of the Program’s source code as you
4287      receive it, in any medium, provided that you conspicuously and
4288      appropriately publish on each copy an appropriate copyright notice;
4289      keep intact all notices stating that this License and any
4290      non-permissive terms added in accord with section 7 apply to the
4291      code; keep intact all notices of the absence of any warranty; and
4292      give all recipients a copy of this License along with the Program.
4293
4294      You may charge any price or no price for each copy that you convey,
4295      and you may offer support or warranty protection for a fee.
4296
4297   5. Conveying Modified Source Versions.
4298
4299      You may convey a work based on the Program, or the modifications to
4300      produce it from the Program, in the form of source code under the
4301      terms of section 4, provided that you also meet all of these
4302      conditions:
4303
4304        a. The work must carry prominent notices stating that you
4305           modified it, and giving a relevant date.
4306
4307        b. The work must carry prominent notices stating that it is
4308           released under this License and any conditions added under
4309           section 7.  This requirement modifies the requirement in
4310           section 4 to “keep intact all notices”.
4311
4312        c. You must license the entire work, as a whole, under this
4313           License to anyone who comes into possession of a copy.  This
4314           License will therefore apply, along with any applicable
4315           section 7 additional terms, to the whole of the work, and all
4316           its parts, regardless of how they are packaged.  This License
4317           gives no permission to license the work in any other way, but
4318           it does not invalidate such permission if you have separately
4319           received it.
4320
4321        d. If the work has interactive user interfaces, each must display
4322           Appropriate Legal Notices; however, if the Program has
4323           interactive interfaces that do not display Appropriate Legal
4324           Notices, your work need not make them do so.
4325
4326      A compilation of a covered work with other separate and independent
4327      works, which are not by their nature extensions of the covered
4328      work, and which are not combined with it such as to form a larger
4329      program, in or on a volume of a storage or distribution medium, is
4330      called an “aggregate” if the compilation and its resulting
4331      copyright are not used to limit the access or legal rights of the
4332      compilation’s users beyond what the individual works permit.
4333      Inclusion of a covered work in an aggregate does not cause this
4334      License to apply to the other parts of the aggregate.
4335
4336   6. Conveying Non-Source Forms.
4337
4338      You may convey a covered work in object code form under the terms
4339      of sections 4 and 5, provided that you also convey the
4340      machine-readable Corresponding Source under the terms of this
4341      License, in one of these ways:
4342
4343        a. Convey the object code in, or embodied in, a physical product
4344           (including a physical distribution medium), accompanied by the
4345           Corresponding Source fixed on a durable physical medium
4346           customarily used for software interchange.
4347
4348        b. Convey the object code in, or embodied in, a physical product
4349           (including a physical distribution medium), accompanied by a
4350           written offer, valid for at least three years and valid for as
4351           long as you offer spare parts or customer support for that
4352           product model, to give anyone who possesses the object code
4353           either (1) a copy of the Corresponding Source for all the
4354           software in the product that is covered by this License, on a
4355           durable physical medium customarily used for software
4356           interchange, for a price no more than your reasonable cost of
4357           physically performing this conveying of source, or (2) access
4358           to copy the Corresponding Source from a network server at no
4359           charge.
4360
4361        c. Convey individual copies of the object code with a copy of the
4362           written offer to provide the Corresponding Source.  This
4363           alternative is allowed only occasionally and noncommercially,
4364           and only if you received the object code with such an offer,
4365           in accord with subsection 6b.
4366
4367        d. Convey the object code by offering access from a designated
4368           place (gratis or for a charge), and offer equivalent access to
4369           the Corresponding Source in the same way through the same
4370           place at no further charge.  You need not require recipients
4371           to copy the Corresponding Source along with the object code.
4372           If the place to copy the object code is a network server, the
4373           Corresponding Source may be on a different server (operated by
4374           you or a third party) that supports equivalent copying
4375           facilities, provided you maintain clear directions next to the
4376           object code saying where to find the Corresponding Source.
4377           Regardless of what server hosts the Corresponding Source, you
4378           remain obligated to ensure that it is available for as long as
4379           needed to satisfy these requirements.
4380
4381        e. Convey the object code using peer-to-peer transmission,
4382           provided you inform other peers where the object code and
4383           Corresponding Source of the work are being offered to the
4384           general public at no charge under subsection 6d.
4385
4386      A separable portion of the object code, whose source code is
4387      excluded from the Corresponding Source as a System Library, need
4388      not be included in conveying the object code work.
4389
4390      A “User Product” is either (1) a “consumer product”, which means
4391      any tangible personal property which is normally used for personal,
4392      family, or household purposes, or (2) anything designed or sold for
4393      incorporation into a dwelling.  In determining whether a product is
4394      a consumer product, doubtful cases shall be resolved in favor of
4395      coverage.  For a particular product received by a particular user,
4396      “normally used” refers to a typical or common use of that class of
4397      product, regardless of the status of the particular user or of the
4398      way in which the particular user actually uses, or expects or is
4399      expected to use, the product.  A product is a consumer product
4400      regardless of whether the product has substantial commercial,
4401      industrial or non-consumer uses, unless such uses represent the
4402      only significant mode of use of the product.
4403
4404      “Installation Information” for a User Product means any methods,
4405      procedures, authorization keys, or other information required to
4406      install and execute modified versions of a covered work in that
4407      User Product from a modified version of its Corresponding Source.
4408      The information must suffice to ensure that the continued
4409      functioning of the modified object code is in no case prevented or
4410      interfered with solely because modification has been made.
4411
4412      If you convey an object code work under this section in, or with,
4413      or specifically for use in, a User Product, and the conveying
4414      occurs as part of a transaction in which the right of possession
4415      and use of the User Product is transferred to the recipient in
4416      perpetuity or for a fixed term (regardless of how the transaction
4417      is characterized), the Corresponding Source conveyed under this
4418      section must be accompanied by the Installation Information.  But
4419      this requirement does not apply if neither you nor any third party
4420      retains the ability to install modified object code on the User
4421      Product (for example, the work has been installed in ROM).
4422
4423      The requirement to provide Installation Information does not
4424      include a requirement to continue to provide support service,
4425      warranty, or updates for a work that has been modified or installed
4426      by the recipient, or for the User Product in which it has been
4427      modified or installed.  Access to a network may be denied when the
4428      modification itself materially and adversely affects the operation
4429      of the network or violates the rules and protocols for
4430      communication across the network.
4431
4432      Corresponding Source conveyed, and Installation Information
4433      provided, in accord with this section must be in a format that is
4434      publicly documented (and with an implementation available to the
4435      public in source code form), and must require no special password
4436      or key for unpacking, reading or copying.
4437
4438   7. Additional Terms.
4439
4440      “Additional permissions” are terms that supplement the terms of
4441      this License by making exceptions from one or more of its
4442      conditions.  Additional permissions that are applicable to the
4443      entire Program shall be treated as though they were included in
4444      this License, to the extent that they are valid under applicable
4445      law.  If additional permissions apply only to part of the Program,
4446      that part may be used separately under those permissions, but the
4447      entire Program remains governed by this License without regard to
4448      the additional permissions.
4449
4450      When you convey a copy of a covered work, you may at your option
4451      remove any additional permissions from that copy, or from any part
4452      of it.  (Additional permissions may be written to require their own
4453      removal in certain cases when you modify the work.)  You may place
4454      additional permissions on material, added by you to a covered work,
4455      for which you have or can give appropriate copyright permission.
4456
4457      Notwithstanding any other provision of this License, for material
4458      you add to a covered work, you may (if authorized by the copyright
4459      holders of that material) supplement the terms of this License with
4460      terms:
4461
4462        a. Disclaiming warranty or limiting liability differently from
4463           the terms of sections 15 and 16 of this License; or
4464
4465        b. Requiring preservation of specified reasonable legal notices
4466           or author attributions in that material or in the Appropriate
4467           Legal Notices displayed by works containing it; or
4468
4469        c. Prohibiting misrepresentation of the origin of that material,
4470           or requiring that modified versions of such material be marked
4471           in reasonable ways as different from the original version; or
4472
4473        d. Limiting the use for publicity purposes of names of licensors
4474           or authors of the material; or
4475
4476        e. Declining to grant rights under trademark law for use of some
4477           trade names, trademarks, or service marks; or
4478
4479        f. Requiring indemnification of licensors and authors of that
4480           material by anyone who conveys the material (or modified
4481           versions of it) with contractual assumptions of liability to
4482           the recipient, for any liability that these contractual
4483           assumptions directly impose on those licensors and authors.
4484
4485      All other non-permissive additional terms are considered “further
4486      restrictions” within the meaning of section 10.  If the Program as
4487      you received it, or any part of it, contains a notice stating that
4488      it is governed by this License along with a term that is a further
4489      restriction, you may remove that term.  If a license document
4490      contains a further restriction but permits relicensing or conveying
4491      under this License, you may add to a covered work material governed
4492      by the terms of that license document, provided that the further
4493      restriction does not survive such relicensing or conveying.
4494
4495      If you add terms to a covered work in accord with this section, you
4496      must place, in the relevant source files, a statement of the
4497      additional terms that apply to those files, or a notice indicating
4498      where to find the applicable terms.
4499
4500      Additional terms, permissive or non-permissive, may be stated in
4501      the form of a separately written license, or stated as exceptions;
4502      the above requirements apply either way.
4503
4504   8. Termination.
4505
4506      You may not propagate or modify a covered work except as expressly
4507      provided under this License.  Any attempt otherwise to propagate or
4508      modify it is void, and will automatically terminate your rights
4509      under this License (including any patent licenses granted under the
4510      third paragraph of section 11).
4511
4512      However, if you cease all violation of this License, then your
4513      license from a particular copyright holder is reinstated (a)
4514      provisionally, unless and until the copyright holder explicitly and
4515      finally terminates your license, and (b) permanently, if the
4516      copyright holder fails to notify you of the violation by some
4517      reasonable means prior to 60 days after the cessation.
4518
4519      Moreover, your license from a particular copyright holder is
4520      reinstated permanently if the copyright holder notifies you of the
4521      violation by some reasonable means, this is the first time you have
4522      received notice of violation of this License (for any work) from
4523      that copyright holder, and you cure the violation prior to 30 days
4524      after your receipt of the notice.
4525
4526      Termination of your rights under this section does not terminate
4527      the licenses of parties who have received copies or rights from you
4528      under this License.  If your rights have been terminated and not
4529      permanently reinstated, you do not qualify to receive new licenses
4530      for the same material under section 10.
4531
4532   9. Acceptance Not Required for Having Copies.
4533
4534      You are not required to accept this License in order to receive or
4535      run a copy of the Program.  Ancillary propagation of a covered work
4536      occurring solely as a consequence of using peer-to-peer
4537      transmission to receive a copy likewise does not require
4538      acceptance.  However, nothing other than this License grants you
4539      permission to propagate or modify any covered work.  These actions
4540      infringe copyright if you do not accept this License.  Therefore,
4541      by modifying or propagating a covered work, you indicate your
4542      acceptance of this License to do so.
4543
4544   10. Automatic Licensing of Downstream Recipients.
4545
4546      Each time you convey a covered work, the recipient automatically
4547      receives a license from the original licensors, to run, modify and
4548      propagate that work, subject to this License.  You are not
4549      responsible for enforcing compliance by third parties with this
4550      License.
4551
4552      An “entity transaction” is a transaction transferring control of an
4553      organization, or substantially all assets of one, or subdividing an
4554      organization, or merging organizations.  If propagation of a
4555      covered work results from an entity transaction, each party to that
4556      transaction who receives a copy of the work also receives whatever
4557      licenses to the work the party’s predecessor in interest had or
4558      could give under the previous paragraph, plus a right to possession
4559      of the Corresponding Source of the work from the predecessor in
4560      interest, if the predecessor has it or can get it with reasonable
4561      efforts.
4562
4563      You may not impose any further restrictions on the exercise of the
4564      rights granted or affirmed under this License.  For example, you
4565      may not impose a license fee, royalty, or other charge for exercise
4566      of rights granted under this License, and you may not initiate
4567      litigation (including a cross-claim or counterclaim in a lawsuit)
4568      alleging that any patent claim is infringed by making, using,
4569      selling, offering for sale, or importing the Program or any portion
4570      of it.
4571
4572   11. Patents.
4573
4574      A “contributor” is a copyright holder who authorizes use under this
4575      License of the Program or a work on which the Program is based.
4576      The work thus licensed is called the contributor’s “contributor
4577      version”.
4578
4579      A contributor’s “essential patent claims” are all patent claims
4580      owned or controlled by the contributor, whether already acquired or
4581      hereafter acquired, that would be infringed by some manner,
4582      permitted by this License, of making, using, or selling its
4583      contributor version, but do not include claims that would be
4584      infringed only as a consequence of further modification of the
4585      contributor version.  For purposes of this definition, “control”
4586      includes the right to grant patent sublicenses in a manner
4587      consistent with the requirements of this License.
4588
4589      Each contributor grants you a non-exclusive, worldwide,
4590      royalty-free patent license under the contributor’s essential
4591      patent claims, to make, use, sell, offer for sale, import and
4592      otherwise run, modify and propagate the contents of its contributor
4593      version.
4594
4595      In the following three paragraphs, a “patent license” is any
4596      express agreement or commitment, however denominated, not to
4597      enforce a patent (such as an express permission to practice a
4598      patent or covenant not to sue for patent infringement).  To “grant”
4599      such a patent license to a party means to make such an agreement or
4600      commitment not to enforce a patent against the party.
4601
4602      If you convey a covered work, knowingly relying on a patent
4603      license, and the Corresponding Source of the work is not available
4604      for anyone to copy, free of charge and under the terms of this
4605      License, through a publicly available network server or other
4606      readily accessible means, then you must either (1) cause the
4607      Corresponding Source to be so available, or (2) arrange to deprive
4608      yourself of the benefit of the patent license for this particular
4609      work, or (3) arrange, in a manner consistent with the requirements
4610      of this License, to extend the patent license to downstream
4611      recipients.  “Knowingly relying” means you have actual knowledge
4612      that, but for the patent license, your conveying the covered work
4613      in a country, or your recipient’s use of the covered work in a
4614      country, would infringe one or more identifiable patents in that
4615      country that you have reason to believe are valid.
4616
4617      If, pursuant to or in connection with a single transaction or
4618      arrangement, you convey, or propagate by procuring conveyance of, a
4619      covered work, and grant a patent license to some of the parties
4620      receiving the covered work authorizing them to use, propagate,
4621      modify or convey a specific copy of the covered work, then the
4622      patent license you grant is automatically extended to all
4623      recipients of the covered work and works based on it.
4624
4625      A patent license is “discriminatory” if it does not include within
4626      the scope of its coverage, prohibits the exercise of, or is
4627      conditioned on the non-exercise of one or more of the rights that
4628      are specifically granted under this License.  You may not convey a
4629      covered work if you are a party to an arrangement with a third
4630      party that is in the business of distributing software, under which
4631      you make payment to the third party based on the extent of your
4632      activity of conveying the work, and under which the third party
4633      grants, to any of the parties who would receive the covered work
4634      from you, a discriminatory patent license (a) in connection with
4635      copies of the covered work conveyed by you (or copies made from
4636      those copies), or (b) primarily for and in connection with specific
4637      products or compilations that contain the covered work, unless you
4638      entered into that arrangement, or that patent license was granted,
4639      prior to 28 March 2007.
4640
4641      Nothing in this License shall be construed as excluding or limiting
4642      any implied license or other defenses to infringement that may
4643      otherwise be available to you under applicable patent law.
4644
4645   12. No Surrender of Others’ Freedom.
4646
4647      If conditions are imposed on you (whether by court order, agreement
4648      or otherwise) that contradict the conditions of this License, they
4649      do not excuse you from the conditions of this License.  If you
4650      cannot convey a covered work so as to satisfy simultaneously your
4651      obligations under this License and any other pertinent obligations,
4652      then as a consequence you may not convey it at all.  For example,
4653      if you agree to terms that obligate you to collect a royalty for
4654      further conveying from those to whom you convey the Program, the
4655      only way you could satisfy both those terms and this License would
4656      be to refrain entirely from conveying the Program.
4657
4658   13. Use with the GNU Affero General Public License.
4659
4660      Notwithstanding any other provision of this License, you have
4661      permission to link or combine any covered work with a work licensed
4662      under version 3 of the GNU Affero General Public License into a
4663      single combined work, and to convey the resulting work.  The terms
4664      of this License will continue to apply to the part which is the
4665      covered work, but the special requirements of the GNU Affero
4666      General Public License, section 13, concerning interaction through
4667      a network will apply to the combination as such.
4668
4669   14. Revised Versions of this License.
4670
4671      The Free Software Foundation may publish revised and/or new
4672      versions of the GNU General Public License from time to time.  Such
4673      new versions will be similar in spirit to the present version, but
4674      may differ in detail to address new problems or concerns.
4675
4676      Each version is given a distinguishing version number.  If the
4677      Program specifies that a certain numbered version of the GNU
4678      General Public License “or any later version” applies to it, you
4679      have the option of following the terms and conditions either of
4680      that numbered version or of any later version published by the Free
4681      Software Foundation.  If the Program does not specify a version
4682      number of the GNU General Public License, you may choose any
4683      version ever published by the Free Software Foundation.
4684
4685      If the Program specifies that a proxy can decide which future
4686      versions of the GNU General Public License can be used, that
4687      proxy’s public statement of acceptance of a version permanently
4688      authorizes you to choose that version for the Program.
4689
4690      Later license versions may give you additional or different
4691      permissions.  However, no additional obligations are imposed on any
4692      author or copyright holder as a result of your choosing to follow a
4693      later version.
4694
4695   15. Disclaimer of Warranty.
4696
4697      THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
4698      APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE
4699      COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM “AS IS”
4700      WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED,
4701      INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
4702      MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE
4703      RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.
4704      SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL
4705      NECESSARY SERVICING, REPAIR OR CORRECTION.
4706
4707   16. Limitation of Liability.
4708
4709      IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
4710      WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES
4711      AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR
4712      DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR
4713      CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE
4714      THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA
4715      BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
4716      PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
4717      PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF
4718      THE POSSIBILITY OF SUCH DAMAGES.
4719
4720   17. Interpretation of Sections 15 and 16.
4721
4722      If the disclaimer of warranty and limitation of liability provided
4723      above cannot be given local legal effect according to their terms,
4724      reviewing courts shall apply local law that most closely
4725      approximates an absolute waiver of all civil liability in
4726      connection with the Program, unless a warranty or assumption of
4727      liability accompanies a copy of the Program in return for a fee.
4728
4729 END OF TERMS AND CONDITIONS
4730 ===========================
4731
4732 How to Apply These Terms to Your New Programs
4733 =============================================
4734
4735    If you develop a new program, and you want it to be of the greatest
4736 possible use to the public, the best way to achieve this is to make it
4737 free software which everyone can redistribute and change under these
4738 terms.
4739
4740    To do so, attach the following notices to the program.  It is safest
4741 to attach them to the start of each source file to most effectively
4742 state the exclusion of warranty; and each file should have at least the
4743 “copyright” line and a pointer to where the full notice is found.
4744
4745      ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES.
4746      Copyright (C) YEAR NAME OF AUTHOR
4747
4748      This program is free software: you can redistribute it and/or modify
4749      it under the terms of the GNU General Public License as published by
4750      the Free Software Foundation, either version 3 of the License, or (at
4751      your option) any later version.
4752
4753      This program is distributed in the hope that it will be useful, but
4754      WITHOUT ANY WARRANTY; without even the implied warranty of
4755      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
4756      General Public License for more details.
4757
4758      You should have received a copy of the GNU General Public License
4759      along with this program.  If not, see <http://www.gnu.org/licenses/>.
4760
4761    Also add information on how to contact you by electronic and paper
4762 mail.
4763
4764    If the program does terminal interaction, make it output a short
4765 notice like this when it starts in an interactive mode:
4766
4767      PROGRAM Copyright (C) YEAR NAME OF AUTHOR
4768      This program comes with ABSOLUTELY NO WARRANTY; for details type ‘show w’.
4769      This is free software, and you are welcome to redistribute it
4770      under certain conditions; type ‘show c’ for details.
4771
4772    The hypothetical commands ‘show w’ and ‘show c’ should show the
4773 appropriate parts of the General Public License.  Of course, your
4774 program’s commands might be different; for a GUI interface, you would
4775 use an “about box”.
4776
4777    You should also get your employer (if you work as a programmer) or
4778 school, if any, to sign a “copyright disclaimer” for the program, if
4779 necessary.  For more information on this, and how to apply and follow
4780 the GNU GPL, see <http://www.gnu.org/licenses/>.
4781
4782    The GNU General Public License does not permit incorporating your
4783 program into proprietary programs.  If your program is a subroutine
4784 library, you may consider it more useful to permit linking proprietary
4785 applications with the library.  If this is what you want to do, use the
4786 GNU Lesser General Public License instead of this License.  But first,
4787 please read <http://www.gnu.org/philosophy/why-not-lgpl.html>.
4788
4789 \1f
4790 File: libunistring.info,  Node: GNU LGPL,  Next: GNU FDL,  Prev: GNU GPL,  Up: Licenses
4791
4792 A.2 GNU LESSER GENERAL PUBLIC LICENSE
4793 =====================================
4794
4795                         Version 3, 29 June 2007
4796
4797      Copyright © 2007 Free Software Foundation, Inc. <http://fsf.org/>
4798
4799      Everyone is permitted to copy and distribute verbatim copies of this
4800      license document, but changing it is not allowed.
4801
4802    This version of the GNU Lesser General Public License incorporates
4803 the terms and conditions of version 3 of the GNU General Public License,
4804 supplemented by the additional permissions listed below.
4805
4806   0. Additional Definitions.
4807
4808      As used herein, “this License” refers to version 3 of the GNU
4809      Lesser General Public License, and the “GNU GPL” refers to version
4810      3 of the GNU General Public License.
4811
4812      “The Library” refers to a covered work governed by this License,
4813      other than an Application or a Combined Work as defined below.
4814
4815      An “Application” is any work that makes use of an interface
4816      provided by the Library, but which is not otherwise based on the
4817      Library.  Defining a subclass of a class defined by the Library is
4818      deemed a mode of using an interface provided by the Library.
4819
4820      A “Combined Work” is a work produced by combining or linking an
4821      Application with the Library.  The particular version of the
4822      Library with which the Combined Work was made is also called the
4823      “Linked Version”.
4824
4825      The “Minimal Corresponding Source” for a Combined Work means the
4826      Corresponding Source for the Combined Work, excluding any source
4827      code for portions of the Combined Work that, considered in
4828      isolation, are based on the Application, and not on the Linked
4829      Version.
4830
4831      The “Corresponding Application Code” for a Combined Work means the
4832      object code and/or source code for the Application, including any
4833      data and utility programs needed for reproducing the Combined Work
4834      from the Application, but excluding the System Libraries of the
4835      Combined Work.
4836
4837   1. Exception to Section 3 of the GNU GPL.
4838
4839      You may convey a covered work under sections 3 and 4 of this
4840      License without being bound by section 3 of the GNU GPL.
4841
4842   2. Conveying Modified Versions.
4843
4844      If you modify a copy of the Library, and, in your modifications, a
4845      facility refers to a function or data to be supplied by an
4846      Application that uses the facility (other than as an argument
4847      passed when the facility is invoked), then you may convey a copy of
4848      the modified version:
4849
4850        a. under this License, provided that you make a good faith effort
4851           to ensure that, in the event an Application does not supply
4852           the function or data, the facility still operates, and
4853           performs whatever part of its purpose remains meaningful, or
4854
4855        b. under the GNU GPL, with none of the additional permissions of
4856           this License applicable to that copy.
4857
4858   3. Object Code Incorporating Material from Library Header Files.
4859
4860      The object code form of an Application may incorporate material
4861      from a header file that is part of the Library.  You may convey
4862      such object code under terms of your choice, provided that, if the
4863      incorporated material is not limited to numerical parameters, data
4864      structure layouts and accessors, or small macros, inline functions
4865      and templates (ten or fewer lines in length), you do both of the
4866      following:
4867
4868        a. Give prominent notice with each copy of the object code that
4869           the Library is used in it and that the Library and its use are
4870           covered by this License.
4871        b. Accompany the object code with a copy of the GNU GPL and this
4872           license document.
4873
4874   4. Combined Works.
4875
4876      You may convey a Combined Work under terms of your choice that,
4877      taken together, effectively do not restrict modification of the
4878      portions of the Library contained in the Combined Work and reverse
4879      engineering for debugging such modifications, if you also do each
4880      of the following:
4881
4882        a. Give prominent notice with each copy of the Combined Work that
4883           the Library is used in it and that the Library and its use are
4884           covered by this License.
4885        b. Accompany the Combined Work with a copy of the GNU GPL and
4886           this license document.
4887        c. For a Combined Work that displays copyright notices during
4888           execution, include the copyright notice for the Library among
4889           these notices, as well as a reference directing the user to
4890           the copies of the GNU GPL and this license document.
4891        d. Do one of the following:
4892
4893             0. Convey the Minimal Corresponding Source under the terms
4894                of this License, and the Corresponding Application Code
4895                in a form suitable for, and under terms that permit, the
4896                user to recombine or relink the Application with a
4897                modified version of the Linked Version to produce a
4898                modified Combined Work, in the manner specified by
4899                section 6 of the GNU GPL for conveying Corresponding
4900                Source.
4901             1. Use a suitable shared library mechanism for linking with
4902                the Library.  A suitable mechanism is one that (a) uses
4903                at run time a copy of the Library already present on the
4904                user’s computer system, and (b) will operate properly
4905                with a modified version of the Library that is
4906                interface-compatible with the Linked Version.
4907
4908        e. Provide Installation Information, but only if you would
4909           otherwise be required to provide such information under
4910           section 6 of the GNU GPL, and only to the extent that such
4911           information is necessary to install and execute a modified
4912           version of the Combined Work produced by recombining or
4913           relinking the Application with a modified version of the
4914           Linked Version.  (If you use option 4d0, the Installation
4915           Information must accompany the Minimal Corresponding Source
4916           and Corresponding Application Code.  If you use option 4d1,
4917           you must provide the Installation Information in the manner
4918           specified by section 6 of the GNU GPL for conveying
4919           Corresponding Source.)
4920
4921   5. Combined Libraries.
4922
4923      You may place library facilities that are a work based on the
4924      Library side by side in a single library together with other
4925      library facilities that are not Applications and are not covered by
4926      this License, and convey such a combined library under terms of
4927      your choice, if you do both of the following:
4928
4929        a. Accompany the combined library with a copy of the same work
4930           based on the Library, uncombined with any other library
4931           facilities, conveyed under the terms of this License.
4932        b. Give prominent notice with the combined library that part of
4933           it is a work based on the Library, and explaining where to
4934           find the accompanying uncombined form of the same work.
4935
4936   6. Revised Versions of the GNU Lesser General Public License.
4937
4938      The Free Software Foundation may publish revised and/or new
4939      versions of the GNU Lesser General Public License from time to
4940      time.  Such new versions will be similar in spirit to the present
4941      version, but may differ in detail to address new problems or
4942      concerns.
4943
4944      Each version is given a distinguishing version number.  If the
4945      Library as you received it specifies that a certain numbered
4946      version of the GNU Lesser General Public License “or any later
4947      version” applies to it, you have the option of following the terms
4948      and conditions either of that published version or of any later
4949      version published by the Free Software Foundation.  If the Library
4950      as you received it does not specify a version number of the GNU
4951      Lesser General Public License, you may choose any version of the
4952      GNU Lesser General Public License ever published by the Free
4953      Software Foundation.
4954
4955      If the Library as you received it specifies that a proxy can decide
4956      whether future versions of the GNU Lesser General Public License
4957      shall apply, that proxy’s public statement of acceptance of any
4958      version is permanent authorization for you to choose that version
4959      for the Library.
4960
4961 \1f
4962 File: libunistring.info,  Node: GNU FDL,  Prev: GNU LGPL,  Up: Licenses
4963
4964 A.3 GNU Free Documentation License
4965 ==================================
4966
4967                      Version 1.3, 3 November 2008
4968
4969      Copyright © 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.
4970      <http://fsf.org/>
4971
4972      Everyone is permitted to copy and distribute verbatim copies
4973      of this license document, but changing it is not allowed.
4974
4975   0. PREAMBLE
4976
4977      The purpose of this License is to make a manual, textbook, or other
4978      functional and useful document "free" in the sense of freedom: to
4979      assure everyone the effective freedom to copy and redistribute it,
4980      with or without modifying it, either commercially or
4981      noncommercially.  Secondarily, this License preserves for the
4982      author and publisher a way to get credit for their work, while not
4983      being considered responsible for modifications made by others.
4984
4985      This License is a kind of “copyleft”, which means that derivative
4986      works of the document must themselves be free in the same sense.
4987      It complements the GNU General Public License, which is a copyleft
4988      license designed for free software.
4989
4990      We have designed this License in order to use it for manuals for
4991      free software, because free software needs free documentation: a
4992      free program should come with manuals providing the same freedoms
4993      that the software does.  But this License is not limited to
4994      software manuals; it can be used for any textual work, regardless
4995      of subject matter or whether it is published as a printed book.  We
4996      recommend this License principally for works whose purpose is
4997      instruction or reference.
4998
4999   1. APPLICABILITY AND DEFINITIONS
5000
5001      This License applies to any manual or other work, in any medium,
5002      that contains a notice placed by the copyright holder saying it can
5003      be distributed under the terms of this License.  Such a notice
5004      grants a world-wide, royalty-free license, unlimited in duration,
5005      to use that work under the conditions stated herein.  The
5006      “Document”, below, refers to any such manual or work.  Any member
5007      of the public is a licensee, and is addressed as “you”.  You accept
5008      the license if you copy, modify or distribute the work in a way
5009      requiring permission under copyright law.
5010
5011      A “Modified Version” of the Document means any work containing the
5012      Document or a portion of it, either copied verbatim, or with
5013      modifications and/or translated into another language.
5014
5015      A “Secondary Section” is a named appendix or a front-matter section
5016      of the Document that deals exclusively with the relationship of the
5017      publishers or authors of the Document to the Document’s overall
5018      subject (or to related matters) and contains nothing that could
5019      fall directly within that overall subject.  (Thus, if the Document
5020      is in part a textbook of mathematics, a Secondary Section may not
5021      explain any mathematics.)  The relationship could be a matter of
5022      historical connection with the subject or with related matters, or
5023      of legal, commercial, philosophical, ethical or political position
5024      regarding them.
5025
5026      The “Invariant Sections” are certain Secondary Sections whose
5027      titles are designated, as being those of Invariant Sections, in the
5028      notice that says that the Document is released under this License.
5029      If a section does not fit the above definition of Secondary then it
5030      is not allowed to be designated as Invariant.  The Document may
5031      contain zero Invariant Sections.  If the Document does not identify
5032      any Invariant Sections then there are none.
5033
5034      The “Cover Texts” are certain short passages of text that are
5035      listed, as Front-Cover Texts or Back-Cover Texts, in the notice
5036      that says that the Document is released under this License.  A
5037      Front-Cover Text may be at most 5 words, and a Back-Cover Text may
5038      be at most 25 words.
5039
5040      A “Transparent” copy of the Document means a machine-readable copy,
5041      represented in a format whose specification is available to the
5042      general public, that is suitable for revising the document
5043      straightforwardly with generic text editors or (for images composed
5044      of pixels) generic paint programs or (for drawings) some widely
5045      available drawing editor, and that is suitable for input to text
5046      formatters or for automatic translation to a variety of formats
5047      suitable for input to text formatters.  A copy made in an otherwise
5048      Transparent file format whose markup, or absence of markup, has
5049      been arranged to thwart or discourage subsequent modification by
5050      readers is not Transparent.  An image format is not Transparent if
5051      used for any substantial amount of text.  A copy that is not
5052      “Transparent” is called “Opaque”.
5053
5054      Examples of suitable formats for Transparent copies include plain
5055      ASCII without markup, Texinfo input format, LaTeX input format,
5056      SGML or XML using a publicly available DTD, and standard-conforming
5057      simple HTML, PostScript or PDF designed for human modification.
5058      Examples of transparent image formats include PNG, XCF and JPG.
5059      Opaque formats include proprietary formats that can be read and
5060      edited only by proprietary word processors, SGML or XML for which
5061      the DTD and/or processing tools are not generally available, and
5062      the machine-generated HTML, PostScript or PDF produced by some word
5063      processors for output purposes only.
5064
5065      The “Title Page” means, for a printed book, the title page itself,
5066      plus such following pages as are needed to hold, legibly, the
5067      material this License requires to appear in the title page.  For
5068      works in formats which do not have any title page as such, “Title
5069      Page” means the text near the most prominent appearance of the
5070      work’s title, preceding the beginning of the body of the text.
5071
5072      The “publisher” means any person or entity that distributes copies
5073      of the Document to the public.
5074
5075      A section “Entitled XYZ” means a named subunit of the Document
5076      whose title either is precisely XYZ or contains XYZ in parentheses
5077      following text that translates XYZ in another language.  (Here XYZ
5078      stands for a specific section name mentioned below, such as
5079      “Acknowledgements”, “Dedications”, “Endorsements”, or “History”.)
5080      To “Preserve the Title” of such a section when you modify the
5081      Document means that it remains a section “Entitled XYZ” according
5082      to this definition.
5083
5084      The Document may include Warranty Disclaimers next to the notice
5085      which states that this License applies to the Document.  These
5086      Warranty Disclaimers are considered to be included by reference in
5087      this License, but only as regards disclaiming warranties: any other
5088      implication that these Warranty Disclaimers may have is void and
5089      has no effect on the meaning of this License.
5090
5091   2. VERBATIM COPYING
5092
5093      You may copy and distribute the Document in any medium, either
5094      commercially or noncommercially, provided that this License, the
5095      copyright notices, and the license notice saying this License
5096      applies to the Document are reproduced in all copies, and that you
5097      add no other conditions whatsoever to those of this License.  You
5098      may not use technical measures to obstruct or control the reading
5099      or further copying of the copies you make or distribute.  However,
5100      you may accept compensation in exchange for copies.  If you
5101      distribute a large enough number of copies you must also follow the
5102      conditions in section 3.
5103
5104      You may also lend copies, under the same conditions stated above,
5105      and you may publicly display copies.
5106
5107   3. COPYING IN QUANTITY
5108
5109      If you publish printed copies (or copies in media that commonly
5110      have printed covers) of the Document, numbering more than 100, and
5111      the Document’s license notice requires Cover Texts, you must
5112      enclose the copies in covers that carry, clearly and legibly, all
5113      these Cover Texts: Front-Cover Texts on the front cover, and
5114      Back-Cover Texts on the back cover.  Both covers must also clearly
5115      and legibly identify you as the publisher of these copies.  The
5116      front cover must present the full title with all words of the title
5117      equally prominent and visible.  You may add other material on the
5118      covers in addition.  Copying with changes limited to the covers, as
5119      long as they preserve the title of the Document and satisfy these
5120      conditions, can be treated as verbatim copying in other respects.
5121
5122      If the required texts for either cover are too voluminous to fit
5123      legibly, you should put the first ones listed (as many as fit
5124      reasonably) on the actual cover, and continue the rest onto
5125      adjacent pages.
5126
5127      If you publish or distribute Opaque copies of the Document
5128      numbering more than 100, you must either include a machine-readable
5129      Transparent copy along with each Opaque copy, or state in or with
5130      each Opaque copy a computer-network location from which the general
5131      network-using public has access to download using public-standard
5132      network protocols a complete Transparent copy of the Document, free
5133      of added material.  If you use the latter option, you must take
5134      reasonably prudent steps, when you begin distribution of Opaque
5135      copies in quantity, to ensure that this Transparent copy will
5136      remain thus accessible at the stated location until at least one
5137      year after the last time you distribute an Opaque copy (directly or
5138      through your agents or retailers) of that edition to the public.
5139
5140      It is requested, but not required, that you contact the authors of
5141      the Document well before redistributing any large number of copies,
5142      to give them a chance to provide you with an updated version of the
5143      Document.
5144
5145   4. MODIFICATIONS
5146
5147      You may copy and distribute a Modified Version of the Document
5148      under the conditions of sections 2 and 3 above, provided that you
5149      release the Modified Version under precisely this License, with the
5150      Modified Version filling the role of the Document, thus licensing
5151      distribution and modification of the Modified Version to whoever
5152      possesses a copy of it.  In addition, you must do these things in
5153      the Modified Version:
5154
5155        A. Use in the Title Page (and on the covers, if any) a title
5156           distinct from that of the Document, and from those of previous
5157           versions (which should, if there were any, be listed in the
5158           History section of the Document).  You may use the same title
5159           as a previous version if the original publisher of that
5160           version gives permission.
5161
5162        B. List on the Title Page, as authors, one or more persons or
5163           entities responsible for authorship of the modifications in
5164           the Modified Version, together with at least five of the
5165           principal authors of the Document (all of its principal
5166           authors, if it has fewer than five), unless they release you
5167           from this requirement.
5168
5169        C. State on the Title page the name of the publisher of the
5170           Modified Version, as the publisher.
5171
5172        D. Preserve all the copyright notices of the Document.
5173
5174        E. Add an appropriate copyright notice for your modifications
5175           adjacent to the other copyright notices.
5176
5177        F. Include, immediately after the copyright notices, a license
5178           notice giving the public permission to use the Modified
5179           Version under the terms of this License, in the form shown in
5180           the Addendum below.
5181
5182        G. Preserve in that license notice the full lists of Invariant
5183           Sections and required Cover Texts given in the Document’s
5184           license notice.
5185
5186        H. Include an unaltered copy of this License.
5187
5188        I. Preserve the section Entitled “History”, Preserve its Title,
5189           and add to it an item stating at least the title, year, new
5190           authors, and publisher of the Modified Version as given on the
5191           Title Page.  If there is no section Entitled “History” in the
5192           Document, create one stating the title, year, authors, and
5193           publisher of the Document as given on its Title Page, then add
5194           an item describing the Modified Version as stated in the
5195           previous sentence.
5196
5197        J. Preserve the network location, if any, given in the Document
5198           for public access to a Transparent copy of the Document, and
5199           likewise the network locations given in the Document for
5200           previous versions it was based on.  These may be placed in the
5201           “History” section.  You may omit a network location for a work
5202           that was published at least four years before the Document
5203           itself, or if the original publisher of the version it refers
5204           to gives permission.
5205
5206        K. For any section Entitled “Acknowledgements” or “Dedications”,
5207           Preserve the Title of the section, and preserve in the section
5208           all the substance and tone of each of the contributor
5209           acknowledgements and/or dedications given therein.
5210
5211        L. Preserve all the Invariant Sections of the Document, unaltered
5212           in their text and in their titles.  Section numbers or the
5213           equivalent are not considered part of the section titles.
5214
5215        M. Delete any section Entitled “Endorsements”.  Such a section
5216           may not be included in the Modified Version.
5217
5218        N. Do not retitle any existing section to be Entitled
5219           “Endorsements” or to conflict in title with any Invariant
5220           Section.
5221
5222        O. Preserve any Warranty Disclaimers.
5223
5224      If the Modified Version includes new front-matter sections or
5225      appendices that qualify as Secondary Sections and contain no
5226      material copied from the Document, you may at your option designate
5227      some or all of these sections as invariant.  To do this, add their
5228      titles to the list of Invariant Sections in the Modified Version’s
5229      license notice.  These titles must be distinct from any other
5230      section titles.
5231
5232      You may add a section Entitled “Endorsements”, provided it contains
5233      nothing but endorsements of your Modified Version by various
5234      parties—for example, statements of peer review or that the text has
5235      been approved by an organization as the authoritative definition of
5236      a standard.
5237
5238      You may add a passage of up to five words as a Front-Cover Text,
5239      and a passage of up to 25 words as a Back-Cover Text, to the end of
5240      the list of Cover Texts in the Modified Version.  Only one passage
5241      of Front-Cover Text and one of Back-Cover Text may be added by (or
5242      through arrangements made by) any one entity.  If the Document
5243      already includes a cover text for the same cover, previously added
5244      by you or by arrangement made by the same entity you are acting on
5245      behalf of, you may not add another; but you may replace the old
5246      one, on explicit permission from the previous publisher that added
5247      the old one.
5248
5249      The author(s) and publisher(s) of the Document do not by this
5250      License give permission to use their names for publicity for or to
5251      assert or imply endorsement of any Modified Version.
5252
5253   5. COMBINING DOCUMENTS
5254
5255      You may combine the Document with other documents released under
5256      this License, under the terms defined in section 4 above for
5257      modified versions, provided that you include in the combination all
5258      of the Invariant Sections of all of the original documents,
5259      unmodified, and list them all as Invariant Sections of your
5260      combined work in its license notice, and that you preserve all
5261      their Warranty Disclaimers.
5262
5263      The combined work need only contain one copy of this License, and
5264      multiple identical Invariant Sections may be replaced with a single
5265      copy.  If there are multiple Invariant Sections with the same name
5266      but different contents, make the title of each such section unique
5267      by adding at the end of it, in parentheses, the name of the
5268      original author or publisher of that section if known, or else a
5269      unique number.  Make the same adjustment to the section titles in
5270      the list of Invariant Sections in the license notice of the
5271      combined work.
5272
5273      In the combination, you must combine any sections Entitled
5274      “History” in the various original documents, forming one section
5275      Entitled “History”; likewise combine any sections Entitled
5276      “Acknowledgements”, and any sections Entitled “Dedications”.  You
5277      must delete all sections Entitled “Endorsements.”
5278
5279   6. COLLECTIONS OF DOCUMENTS
5280
5281      You may make a collection consisting of the Document and other
5282      documents released under this License, and replace the individual
5283      copies of this License in the various documents with a single copy
5284      that is included in the collection, provided that you follow the
5285      rules of this License for verbatim copying of each of the documents
5286      in all other respects.
5287
5288      You may extract a single document from such a collection, and
5289      distribute it individually under this License, provided you insert
5290      a copy of this License into the extracted document, and follow this
5291      License in all other respects regarding verbatim copying of that
5292      document.
5293
5294   7. AGGREGATION WITH INDEPENDENT WORKS
5295
5296      A compilation of the Document or its derivatives with other
5297      separate and independent documents or works, in or on a volume of a
5298      storage or distribution medium, is called an “aggregate” if the
5299      copyright resulting from the compilation is not used to limit the
5300      legal rights of the compilation’s users beyond what the individual
5301      works permit.  When the Document is included in an aggregate, this
5302      License does not apply to the other works in the aggregate which
5303      are not themselves derivative works of the Document.
5304
5305      If the Cover Text requirement of section 3 is applicable to these
5306      copies of the Document, then if the Document is less than one half
5307      of the entire aggregate, the Document’s Cover Texts may be placed
5308      on covers that bracket the Document within the aggregate, or the
5309      electronic equivalent of covers if the Document is in electronic
5310      form.  Otherwise they must appear on printed covers that bracket
5311      the whole aggregate.
5312
5313   8. TRANSLATION
5314
5315      Translation is considered a kind of modification, so you may
5316      distribute translations of the Document under the terms of section
5317      4.  Replacing Invariant Sections with translations requires special
5318      permission from their copyright holders, but you may include
5319      translations of some or all Invariant Sections in addition to the
5320      original versions of these Invariant Sections.  You may include a
5321      translation of this License, and all the license notices in the
5322      Document, and any Warranty Disclaimers, provided that you also
5323      include the original English version of this License and the
5324      original versions of those notices and disclaimers.  In case of a
5325      disagreement between the translation and the original version of
5326      this License or a notice or disclaimer, the original version will
5327      prevail.
5328
5329      If a section in the Document is Entitled “Acknowledgements”,
5330      “Dedications”, or “History”, the requirement (section 4) to
5331      Preserve its Title (section 1) will typically require changing the
5332      actual title.
5333
5334   9. TERMINATION
5335
5336      You may not copy, modify, sublicense, or distribute the Document
5337      except as expressly provided under this License.  Any attempt
5338      otherwise to copy, modify, sublicense, or distribute it is void,
5339      and will automatically terminate your rights under this License.
5340
5341      However, if you cease all violation of this License, then your
5342      license from a particular copyright holder is reinstated (a)
5343      provisionally, unless and until the copyright holder explicitly and
5344      finally terminates your license, and (b) permanently, if the
5345      copyright holder fails to notify you of the violation by some
5346      reasonable means prior to 60 days after the cessation.
5347
5348      Moreover, your license from a particular copyright holder is
5349      reinstated permanently if the copyright holder notifies you of the
5350      violation by some reasonable means, this is the first time you have
5351      received notice of violation of this License (for any work) from
5352      that copyright holder, and you cure the violation prior to 30 days
5353      after your receipt of the notice.
5354
5355      Termination of your rights under this section does not terminate
5356      the licenses of parties who have received copies or rights from you
5357      under this License.  If your rights have been terminated and not
5358      permanently reinstated, receipt of a copy of some or all of the
5359      same material does not give you any rights to use it.
5360
5361   10. FUTURE REVISIONS OF THIS LICENSE
5362
5363      The Free Software Foundation may publish new, revised versions of
5364      the GNU Free Documentation License from time to time.  Such new
5365      versions will be similar in spirit to the present version, but may
5366      differ in detail to address new problems or concerns.  See
5367      <http://www.gnu.org/copyleft/>.
5368
5369      Each version of the License is given a distinguishing version
5370      number.  If the Document specifies that a particular numbered
5371      version of this License “or any later version” applies to it, you
5372      have the option of following the terms and conditions either of
5373      that specified version or of any later version that has been
5374      published (not as a draft) by the Free Software Foundation.  If the
5375      Document does not specify a version number of this License, you may
5376      choose any version ever published (not as a draft) by the Free
5377      Software Foundation.  If the Document specifies that a proxy can
5378      decide which future versions of this License can be used, that
5379      proxy’s public statement of acceptance of a version permanently
5380      authorizes you to choose that version for the Document.
5381
5382   11. RELICENSING
5383
5384      “Massive Multiauthor Collaboration Site” (or “MMC Site”) means any
5385      World Wide Web server that publishes copyrightable works and also
5386      provides prominent facilities for anybody to edit those works.  A
5387      public wiki that anybody can edit is an example of such a server.
5388      A “Massive Multiauthor Collaboration” (or “MMC”) contained in the
5389      site means any set of copyrightable works thus published on the MMC
5390      site.
5391
5392      “CC-BY-SA” means the Creative Commons Attribution-Share Alike 3.0
5393      license published by Creative Commons Corporation, a not-for-profit
5394      corporation with a principal place of business in San Francisco,
5395      California, as well as future copyleft versions of that license
5396      published by that same organization.
5397
5398      “Incorporate” means to publish or republish a Document, in whole or
5399      in part, as part of another Document.
5400
5401      An MMC is “eligible for relicensing” if it is licensed under this
5402      License, and if all works that were first published under this
5403      License somewhere other than this MMC, and subsequently
5404      incorporated in whole or in part into the MMC, (1) had no cover
5405      texts or invariant sections, and (2) were thus incorporated prior
5406      to November 1, 2008.
5407
5408      The operator of an MMC Site may republish an MMC contained in the
5409      site under CC-BY-SA on the same site at any time before August 1,
5410      2009, provided the MMC is eligible for relicensing.
5411
5412 ADDENDUM: How to use this License for your documents
5413 ====================================================
5414
5415    To use this License in a document you have written, include a copy of
5416 the License in the document and put the following copyright and license
5417 notices just after the title page:
5418
5419        Copyright (C)  YEAR  YOUR NAME.
5420        Permission is granted to copy, distribute and/or modify this document
5421        under the terms of the GNU Free Documentation License, Version 1.3
5422        or any later version published by the Free Software Foundation;
5423        with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
5424        Texts.  A copy of the license is included in the section entitled ``GNU
5425        Free Documentation License''.
5426
5427    If you have Invariant Sections, Front-Cover Texts and Back-Cover
5428 Texts, replace the “with…Texts.” line with this:
5429
5430          with the Invariant Sections being LIST THEIR TITLES, with
5431          the Front-Cover Texts being LIST, and with the Back-Cover Texts
5432          being LIST.
5433
5434    If you have Invariant Sections without Cover Texts, or some other
5435 combination of the three, merge those two alternatives to suit the
5436 situation.
5437
5438    If your document contains nontrivial examples of program code, we
5439 recommend releasing these examples in parallel under your choice of free
5440 software license, such as the GNU General Public License, to permit
5441 their use in free software.
5442
5443 \1f
5444 File: libunistring.info,  Node: Index,  Prev: Licenses,  Up: Top
5445
5446 Index
5447 *****
5448
5449 \0\b[index\0\b]
5450 * Menu:
5451
5452 * ambiguous width:                       uniwidth.h.          (line  10)
5453 * Arabic shaping:                        Arabic shaping.      (line   6)
5454 * argument conventions:                  Conventions.         (line   9)
5455 * autoconf macro:                        Autoconf macro.      (line   6)
5456 * bidi class:                            Bidi class.          (line   6)
5457 * bidirectional category:                Bidi class.          (line   6)
5458 * bidirectional reordering:              More functionality.  (line   6)
5459 * block:                                 Blocks.              (line   6)
5460 * boundaries, between grapheme clusters: unigbrk.h.           (line   6)
5461 * boundaries, between words:             uniwbrk.h.           (line   6)
5462 * breaks, grapheme cluster:              unigbrk.h.           (line   6)
5463 * breaks, line:                          unilbrk.h.           (line   6)
5464 * breaks, word:                          uniwbrk.h.           (line   6)
5465 * bug reports:                           Reporting problems.  (line   6)
5466 * bug tracker:                           Reporting problems.  (line   6)
5467 * C string functions:                    char * strings.      (line   6)
5468 * C, programming language:               ISO C and Java syntax.
5469                                                               (line   6)
5470 * C-like API:                            Classifications like in ISO C.
5471                                                               (line   6)
5472 * canonical combining class:             Canonical combining class.
5473                                                               (line   6)
5474 * case detection:                        Case detection.      (line   6)
5475 * case mappings:                         Case mappings of strings.
5476                                                               (line   6)
5477 * casing_prefix_context_t:               Case mappings of substrings.
5478                                                               (line  14)
5479 * casing_suffix_context_t:               Case mappings of substrings.
5480                                                               (line  43)
5481 * char, type:                            char * strings.      (line  22)
5482 * combining, Unicode characters:         Composition of characters.
5483                                                               (line   6)
5484 * comparing:                             Elementary string functions.
5485                                                               (line 108)
5486 * comparing <1>:                         Elementary string functions on NUL terminated strings.
5487                                                               (line 131)
5488 * comparing, ignoring case:              Case insensitive comparison.
5489                                                               (line   6)
5490 * comparing, ignoring case, with collation rules: Case insensitive comparison.
5491                                                               (line  65)
5492 * comparing, ignoring normalization:     Normalizing comparisons.
5493                                                               (line   6)
5494 * comparing, ignoring normalization and case: Case insensitive comparison.
5495                                                               (line   6)
5496 * comparing, ignoring normalization and case, with collation rules: Case insensitive comparison.
5497                                                               (line  65)
5498 * comparing, ignoring normalization, with collation rules: Normalizing comparisons.
5499                                                               (line  22)
5500 * comparing, with collation rules:       Elementary string functions on NUL terminated strings.
5501                                                               (line 143)
5502 * comparing, with collation rules, ignoring case: Case insensitive comparison.
5503                                                               (line  65)
5504 * comparing, with collation rules, ignoring normalization: Normalizing comparisons.
5505                                                               (line  22)
5506 * comparing, with collation rules, ignoring normalization and case: Case insensitive comparison.
5507                                                               (line  65)
5508 * compiler options:                      Compiler options.    (line  24)
5509 * composing, Unicode characters:         Composition of characters.
5510                                                               (line   6)
5511 * converting:                            Elementary string conversions.
5512                                                               (line   6)
5513 * converting <1>:                        uniconv.h.           (line  45)
5514 * copying:                               Elementary string functions.
5515                                                               (line  72)
5516 * copying <1>:                           Elementary string functions on NUL terminated strings.
5517                                                               (line  62)
5518 * counting:                              Elementary string functions.
5519                                                               (line 153)
5520 * decomposing:                           Decomposition of characters.
5521                                                               (line   6)
5522 * dependencies:                          Installation.        (line   6)
5523 * detecting case:                        Case detection.      (line   6)
5524 * duplicating:                           Elementary string functions with memory allocation.
5525                                                               (line   6)
5526 * duplicating <1>:                       Elementary string functions on NUL terminated strings.
5527                                                               (line 169)
5528 * enum iconv_ilseq_handler:              uniconv.h.           (line  29)
5529 * FDL, GNU Free Documentation License:   GNU FDL.             (line   6)
5530 * formatted output:                      unistdio.h.          (line   6)
5531 * fullwidth:                             uniwidth.h.          (line  22)
5532 * general category:                      General category.    (line   6)
5533 * gl_LIBUNISTRING:                       Autoconf macro.      (line  11)
5534 * GPL, GNU General Public License:       GNU GPL.             (line   6)
5535 * grapheme cluster boundaries:           unigbrk.h.           (line   6)
5536 * grapheme cluster breaks:               unigbrk.h.           (line   6)
5537 * halfwidth:                             uniwidth.h.          (line  22)
5538 * identifiers:                           ISO C and Java syntax.
5539                                                               (line   6)
5540 * installation:                          Installation.        (line  10)
5541 * internationalization:                  Unicode and i18n.    (line   6)
5542 * iterating:                             Elementary string functions.
5543                                                               (line   6)
5544 * iterating <1>:                         Elementary string functions on NUL terminated strings.
5545                                                               (line  15)
5546 * Java, programming language:            ISO C and Java syntax.
5547                                                               (line   6)
5548 * joining group:                         Joining group.       (line   6)
5549 * joining of Arabic characters:          Arabic shaping.      (line   6)
5550 * joining type:                          Joining type.        (line   6)
5551 * LGPL, GNU Lesser General Public License: GNU LGPL.          (line   6)
5552 * License, GNU FDL:                      GNU FDL.             (line   6)
5553 * License, GNU GPL:                      GNU GPL.             (line   6)
5554 * License, GNU LGPL:                     GNU LGPL.            (line   6)
5555 * Licenses:                              Licenses.            (line   6)
5556 * line breaks:                           unilbrk.h.           (line   6)
5557 * locale:                                Locale encodings.    (line   6)
5558 * locale categories:                     Locale encodings.    (line  10)
5559 * locale encoding:                       Locale encodings.    (line  23)
5560 * locale encoding <1>:                   uniconv.h.           (line  10)
5561 * locale language:                       Case mappings of strings.
5562                                                               (line  16)
5563 * locale, multibyte:                     char * strings.      (line  13)
5564 * locale_charset:                        uniconv.h.           (line  12)
5565 * lowercasing:                           Case mappings of strings.
5566                                                               (line   6)
5567 * mailing list:                          Reporting problems.  (line   6)
5568 * mirroring, of Unicode character:       Mirrored character.  (line   6)
5569 * normal forms:                          uninorm.h.           (line   6)
5570 * normalizing:                           uninorm.h.           (line   6)
5571 * output, formatted:                     unistdio.h.          (line   6)
5572 * properties, of Unicode character:      Properties.          (line   6)
5573 * regular expression:                    uniregex.h.          (line   6)
5574 * rendering:                             More functionality.  (line   9)
5575 * return value conventions:              Conventions.         (line  47)
5576 * scripts:                               Scripts.             (line   6)
5577 * searching, for a character:            Elementary string functions.
5578                                                               (line 140)
5579 * searching, for a character <1>:        Elementary string functions on NUL terminated strings.
5580                                                               (line 179)
5581 * searching, for a substring:            Elementary string functions on NUL terminated strings.
5582                                                               (line 235)
5583 * stream, normalizing a:                 Normalization of streams.
5584                                                               (line   6)
5585 * struct uninorm_filter:                 Normalization of streams.
5586                                                               (line  10)
5587 * titlecasing:                           Case mappings of strings.
5588                                                               (line   6)
5589 * u16_asnprintf:                         unistdio.h.          (line 111)
5590 * u16_asprintf:                          unistdio.h.          (line 109)
5591 * u16_casecmp:                           Case insensitive comparison.
5592                                                               (line  48)
5593 * u16_casecoll:                          Case insensitive comparison.
5594                                                               (line  91)
5595 * u16_casefold:                          Case insensitive comparison.
5596                                                               (line  12)
5597 * u16_casexfrm:                          Case insensitive comparison.
5598                                                               (line  71)
5599 * u16_casing_prefixes_context:           Case mappings of substrings.
5600                                                               (line  36)
5601 * u16_casing_prefix_context:             Case mappings of substrings.
5602                                                               (line  28)
5603 * u16_casing_suffixes_context:           Case mappings of substrings.
5604                                                               (line  65)
5605 * u16_casing_suffix_context:             Case mappings of substrings.
5606                                                               (line  57)
5607 * u16_check:                             Elementary string checks.
5608                                                               (line  10)
5609 * u16_chr:                               Elementary string functions.
5610                                                               (line 143)
5611 * u16_cmp:                               Elementary string functions.
5612                                                               (line 113)
5613 * u16_cmp2:                              Elementary string functions.
5614                                                               (line 129)
5615 * u16_conv_from_encoding:                uniconv.h.           (line  51)
5616 * u16_conv_to_encoding:                  uniconv.h.           (line  88)
5617 * u16_cpy:                               Elementary string functions.
5618                                                               (line  76)
5619 * u16_cpy_alloc:                         Elementary string functions with memory allocation.
5620                                                               (line   9)
5621 * u16_ct_casefold:                       Case insensitive comparison.
5622                                                               (line  32)
5623 * u16_ct_tolower:                        Case mappings of substrings.
5624                                                               (line  98)
5625 * u16_ct_totitle:                        Case mappings of substrings.
5626                                                               (line 116)
5627 * u16_ct_toupper:                        Case mappings of substrings.
5628                                                               (line  80)
5629 * u16_endswith:                          Elementary string functions on NUL terminated strings.
5630                                                               (line 259)
5631 * u16_grapheme_breaks:                   Grapheme cluster breaks in a string.
5632                                                               (line  34)
5633 * u16_grapheme_next:                     Grapheme cluster breaks in a string.
5634                                                               (line  11)
5635 * u16_grapheme_prev:                     Grapheme cluster breaks in a string.
5636                                                               (line  21)
5637 * u16_is_cased:                          Case detection.      (line  55)
5638 * u16_is_casefolded:                     Case detection.      (line  42)
5639 * u16_is_lowercase:                      Case detection.      (line  22)
5640 * u16_is_titlecase:                      Case detection.      (line  32)
5641 * u16_is_uppercase:                      Case detection.      (line  12)
5642 * u16_mblen:                             Elementary string functions.
5643                                                               (line  10)
5644 * u16_mbsnlen:                           Elementary string functions.
5645                                                               (line 156)
5646 * u16_mbtouc:                            Elementary string functions.
5647                                                               (line  37)
5648 * u16_mbtoucr:                           Elementary string functions.
5649                                                               (line  44)
5650 * u16_mbtouc_unsafe:                     Elementary string functions.
5651                                                               (line  21)
5652 * u16_move:                              Elementary string functions.
5653                                                               (line  87)
5654 * u16_next:                              Elementary string functions on NUL terminated strings.
5655                                                               (line  23)
5656 * u16_normalize:                         Normalization of strings.
5657                                                               (line  48)
5658 * u16_normcmp:                           Normalizing comparisons.
5659                                                               (line  11)
5660 * u16_normcoll:                          Normalizing comparisons.
5661                                                               (line  37)
5662 * u16_normxfrm:                          Normalizing comparisons.
5663                                                               (line  24)
5664 * u16_possible_linebreaks:               unilbrk.h.           (line  44)
5665 * u16_prev:                              Elementary string functions on NUL terminated strings.
5666                                                               (line  34)
5667 * u16_set:                               Elementary string functions.
5668                                                               (line 100)
5669 * u16_snprintf:                          unistdio.h.          (line 107)
5670 * u16_sprintf:                           unistdio.h.          (line 106)
5671 * u16_startswith:                        Elementary string functions on NUL terminated strings.
5672                                                               (line 251)
5673 * u16_stpcpy:                            Elementary string functions on NUL terminated strings.
5674                                                               (line  75)
5675 * u16_stpncpy:                           Elementary string functions on NUL terminated strings.
5676                                                               (line  98)
5677 * u16_strcat:                            Elementary string functions on NUL terminated strings.
5678                                                               (line 111)
5679 * u16_strchr:                            Elementary string functions on NUL terminated strings.
5680                                                               (line 182)
5681 * u16_strcmp:                            Elementary string functions on NUL terminated strings.
5682                                                               (line 134)
5683 * u16_strcoll:                           Elementary string functions on NUL terminated strings.
5684                                                               (line 144)
5685 * u16_strconv_from_encoding:             uniconv.h.           (line 127)
5686 * u16_strconv_from_locale:               uniconv.h.           (line 156)
5687 * u16_strconv_to_encoding:               uniconv.h.           (line 140)
5688 * u16_strconv_to_locale:                 uniconv.h.           (line 166)
5689 * u16_strcpy:                            Elementary string functions on NUL terminated strings.
5690                                                               (line  65)
5691 * u16_strcspn:                           Elementary string functions on NUL terminated strings.
5692                                                               (line 202)
5693 * u16_strdup:                            Elementary string functions on NUL terminated strings.
5694                                                               (line 172)
5695 * u16_strlen:                            Elementary string functions on NUL terminated strings.
5696                                                               (line  47)
5697 * u16_strmblen:                          Elementary string functions on NUL terminated strings.
5698                                                               (line  10)
5699 * u16_strmbtouc:                         Elementary string functions on NUL terminated strings.
5700                                                               (line  16)
5701 * u16_strncat:                           Elementary string functions on NUL terminated strings.
5702                                                               (line 122)
5703 * u16_strncmp:                           Elementary string functions on NUL terminated strings.
5704                                                               (line 160)
5705 * u16_strncpy:                           Elementary string functions on NUL terminated strings.
5706                                                               (line  87)
5707 * u16_strnlen:                           Elementary string functions on NUL terminated strings.
5708                                                               (line  55)
5709 * u16_strpbrk:                           Elementary string functions on NUL terminated strings.
5710                                                               (line 226)
5711 * u16_strrchr:                           Elementary string functions on NUL terminated strings.
5712                                                               (line 190)
5713 * u16_strspn:                            Elementary string functions on NUL terminated strings.
5714                                                               (line 214)
5715 * u16_strstr:                            Elementary string functions on NUL terminated strings.
5716                                                               (line 240)
5717 * u16_strtok:                            Elementary string functions on NUL terminated strings.
5718                                                               (line 269)
5719 * u16_strwidth:                          uniwidth.h.          (line  38)
5720 * u16_tolower:                           Case mappings of strings.
5721                                                               (line  41)
5722 * u16_totitle:                           Case mappings of strings.
5723                                                               (line  55)
5724 * u16_toupper:                           Case mappings of strings.
5725                                                               (line  27)
5726 * u16_to_u32:                            Elementary string conversions.
5727                                                               (line  21)
5728 * u16_to_u8:                             Elementary string conversions.
5729                                                               (line  17)
5730 * u16_u16_asnprintf:                     unistdio.h.          (line 131)
5731 * u16_u16_asprintf:                      unistdio.h.          (line 129)
5732 * u16_u16_snprintf:                      unistdio.h.          (line 127)
5733 * u16_u16_sprintf:                       unistdio.h.          (line 125)
5734 * u16_u16_vasnprintf:                    unistdio.h.          (line 139)
5735 * u16_u16_vasprintf:                     unistdio.h.          (line 137)
5736 * u16_u16_vsnprintf:                     unistdio.h.          (line 135)
5737 * u16_u16_vsprintf:                      unistdio.h.          (line 133)
5738 * u16_uctomb:                            Elementary string functions.
5739                                                               (line  61)
5740 * u16_vasnprintf:                        unistdio.h.          (line 119)
5741 * u16_vasprintf:                         unistdio.h.          (line 117)
5742 * u16_vsnprintf:                         unistdio.h.          (line 115)
5743 * u16_vsprintf:                          unistdio.h.          (line 113)
5744 * u16_width:                             uniwidth.h.          (line  29)
5745 * u16_width_linebreaks:                  unilbrk.h.           (line  62)
5746 * u16_wordbreaks:                        Word breaks in a string.
5747                                                               (line   9)
5748 * u32_asnprintf:                         unistdio.h.          (line 150)
5749 * u32_asprintf:                          unistdio.h.          (line 148)
5750 * u32_casecmp:                           Case insensitive comparison.
5751                                                               (line  51)
5752 * u32_casecoll:                          Case insensitive comparison.
5753                                                               (line  94)
5754 * u32_casefold:                          Case insensitive comparison.
5755                                                               (line  15)
5756 * u32_casexfrm:                          Case insensitive comparison.
5757                                                               (line  74)
5758 * u32_casing_prefixes_context:           Case mappings of substrings.
5759                                                               (line  38)
5760 * u32_casing_prefix_context:             Case mappings of substrings.
5761                                                               (line  30)
5762 * u32_casing_suffixes_context:           Case mappings of substrings.
5763                                                               (line  67)
5764 * u32_casing_suffix_context:             Case mappings of substrings.
5765                                                               (line  59)
5766 * u32_check:                             Elementary string checks.
5767                                                               (line  11)
5768 * u32_chr:                               Elementary string functions.
5769                                                               (line 145)
5770 * u32_cmp:                               Elementary string functions.
5771                                                               (line 115)
5772 * u32_cmp2:                              Elementary string functions.
5773                                                               (line 131)
5774 * u32_conv_from_encoding:                uniconv.h.           (line  54)
5775 * u32_conv_to_encoding:                  uniconv.h.           (line  91)
5776 * u32_cpy:                               Elementary string functions.
5777                                                               (line  78)
5778 * u32_cpy_alloc:                         Elementary string functions with memory allocation.
5779                                                               (line  10)
5780 * u32_ct_casefold:                       Case insensitive comparison.
5781                                                               (line  37)
5782 * u32_ct_tolower:                        Case mappings of substrings.
5783                                                               (line 103)
5784 * u32_ct_totitle:                        Case mappings of substrings.
5785                                                               (line 121)
5786 * u32_ct_toupper:                        Case mappings of substrings.
5787                                                               (line  85)
5788 * u32_endswith:                          Elementary string functions on NUL terminated strings.
5789                                                               (line 261)
5790 * u32_grapheme_breaks:                   Grapheme cluster breaks in a string.
5791                                                               (line  36)
5792 * u32_grapheme_next:                     Grapheme cluster breaks in a string.
5793                                                               (line  13)
5794 * u32_grapheme_prev:                     Grapheme cluster breaks in a string.
5795                                                               (line  23)
5796 * u32_is_cased:                          Case detection.      (line  57)
5797 * u32_is_casefolded:                     Case detection.      (line  44)
5798 * u32_is_lowercase:                      Case detection.      (line  24)
5799 * u32_is_titlecase:                      Case detection.      (line  34)
5800 * u32_is_uppercase:                      Case detection.      (line  14)
5801 * u32_mblen:                             Elementary string functions.
5802                                                               (line  11)
5803 * u32_mbsnlen:                           Elementary string functions.
5804                                                               (line 157)
5805 * u32_mbtouc:                            Elementary string functions.
5806                                                               (line  38)
5807 * u32_mbtoucr:                           Elementary string functions.
5808                                                               (line  45)
5809 * u32_mbtouc_unsafe:                     Elementary string functions.
5810                                                               (line  23)
5811 * u32_move:                              Elementary string functions.
5812                                                               (line  89)
5813 * u32_next:                              Elementary string functions on NUL terminated strings.
5814                                                               (line  24)
5815 * u32_normalize:                         Normalization of strings.
5816                                                               (line  50)
5817 * u32_normcmp:                           Normalizing comparisons.
5818                                                               (line  13)
5819 * u32_normcoll:                          Normalizing comparisons.
5820                                                               (line  39)
5821 * u32_normxfrm:                          Normalizing comparisons.
5822                                                               (line  26)
5823 * u32_possible_linebreaks:               unilbrk.h.           (line  46)
5824 * u32_prev:                              Elementary string functions on NUL terminated strings.
5825                                                               (line  36)
5826 * u32_set:                               Elementary string functions.
5827                                                               (line 101)
5828 * u32_snprintf:                          unistdio.h.          (line 146)
5829 * u32_sprintf:                           unistdio.h.          (line 145)
5830 * u32_startswith:                        Elementary string functions on NUL terminated strings.
5831                                                               (line 253)
5832 * u32_stpcpy:                            Elementary string functions on NUL terminated strings.
5833                                                               (line  77)
5834 * u32_stpncpy:                           Elementary string functions on NUL terminated strings.
5835                                                               (line 100)
5836 * u32_strcat:                            Elementary string functions on NUL terminated strings.
5837                                                               (line 113)
5838 * u32_strchr:                            Elementary string functions on NUL terminated strings.
5839                                                               (line 183)
5840 * u32_strcmp:                            Elementary string functions on NUL terminated strings.
5841                                                               (line 135)
5842 * u32_strcoll:                           Elementary string functions on NUL terminated strings.
5843                                                               (line 145)
5844 * u32_strconv_from_encoding:             uniconv.h.           (line 129)
5845 * u32_strconv_from_locale:               uniconv.h.           (line 157)
5846 * u32_strconv_to_encoding:               uniconv.h.           (line 142)
5847 * u32_strconv_to_locale:                 uniconv.h.           (line 167)
5848 * u32_strcpy:                            Elementary string functions on NUL terminated strings.
5849                                                               (line  67)
5850 * u32_strcspn:                           Elementary string functions on NUL terminated strings.
5851                                                               (line 204)
5852 * u32_strdup:                            Elementary string functions on NUL terminated strings.
5853                                                               (line 173)
5854 * u32_strlen:                            Elementary string functions on NUL terminated strings.
5855                                                               (line  48)
5856 * u32_strmblen:                          Elementary string functions on NUL terminated strings.
5857                                                               (line  11)
5858 * u32_strmbtouc:                         Elementary string functions on NUL terminated strings.
5859                                                               (line  17)
5860 * u32_strncat:                           Elementary string functions on NUL terminated strings.
5861                                                               (line 124)
5862 * u32_strncmp:                           Elementary string functions on NUL terminated strings.
5863                                                               (line 162)
5864 * u32_strncpy:                           Elementary string functions on NUL terminated strings.
5865                                                               (line  89)
5866 * u32_strnlen:                           Elementary string functions on NUL terminated strings.
5867                                                               (line  56)
5868 * u32_strpbrk:                           Elementary string functions on NUL terminated strings.
5869                                                               (line 228)
5870 * u32_strrchr:                           Elementary string functions on NUL terminated strings.
5871                                                               (line 191)
5872 * u32_strspn:                            Elementary string functions on NUL terminated strings.
5873                                                               (line 216)
5874 * u32_strstr:                            Elementary string functions on NUL terminated strings.
5875                                                               (line 242)
5876 * u32_strtok:                            Elementary string functions on NUL terminated strings.
5877                                                               (line 271)
5878 * u32_strwidth:                          uniwidth.h.          (line  39)
5879 * u32_tolower:                           Case mappings of strings.
5880                                                               (line  44)
5881 * u32_totitle:                           Case mappings of strings.
5882                                                               (line  58)
5883 * u32_toupper:                           Case mappings of strings.
5884                                                               (line  30)
5885 * u32_to_u16:                            Elementary string conversions.
5886                                                               (line  29)
5887 * u32_to_u8:                             Elementary string conversions.
5888                                                               (line  25)
5889 * u32_u32_asnprintf:                     unistdio.h.          (line 170)
5890 * u32_u32_asprintf:                      unistdio.h.          (line 168)
5891 * u32_u32_snprintf:                      unistdio.h.          (line 166)
5892 * u32_u32_sprintf:                       unistdio.h.          (line 164)
5893 * u32_u32_vasnprintf:                    unistdio.h.          (line 178)
5894 * u32_u32_vasprintf:                     unistdio.h.          (line 176)
5895 * u32_u32_vsnprintf:                     unistdio.h.          (line 174)
5896 * u32_u32_vsprintf:                      unistdio.h.          (line 172)
5897 * u32_uctomb:                            Elementary string functions.
5898                                                               (line  62)
5899 * u32_vasnprintf:                        unistdio.h.          (line 158)
5900 * u32_vasprintf:                         unistdio.h.          (line 156)
5901 * u32_vsnprintf:                         unistdio.h.          (line 154)
5902 * u32_vsprintf:                          unistdio.h.          (line 152)
5903 * u32_width:                             uniwidth.h.          (line  31)
5904 * u32_width_linebreaks:                  unilbrk.h.           (line  65)
5905 * u32_wordbreaks:                        Word breaks in a string.
5906                                                               (line  10)
5907 * u8_asnprintf:                          unistdio.h.          (line  72)
5908 * u8_asprintf:                           unistdio.h.          (line  70)
5909 * u8_casecmp:                            Case insensitive comparison.
5910                                                               (line  45)
5911 * u8_casecoll:                           Case insensitive comparison.
5912                                                               (line  88)
5913 * u8_casefold:                           Case insensitive comparison.
5914                                                               (line   9)
5915 * u8_casexfrm:                           Case insensitive comparison.
5916                                                               (line  68)
5917 * u8_casing_prefixes_context:            Case mappings of substrings.
5918                                                               (line  34)
5919 * u8_casing_prefix_context:              Case mappings of substrings.
5920                                                               (line  26)
5921 * u8_casing_suffixes_context:            Case mappings of substrings.
5922                                                               (line  63)
5923 * u8_casing_suffix_context:              Case mappings of substrings.
5924                                                               (line  55)
5925 * u8_check:                              Elementary string checks.
5926                                                               (line   9)
5927 * u8_chr:                                Elementary string functions.
5928                                                               (line 142)
5929 * u8_cmp:                                Elementary string functions.
5930                                                               (line 111)
5931 * u8_cmp2:                               Elementary string functions.
5932                                                               (line 127)
5933 * u8_conv_from_encoding:                 uniconv.h.           (line  48)
5934 * u8_conv_to_encoding:                   uniconv.h.           (line  85)
5935 * u8_cpy:                                Elementary string functions.
5936                                                               (line  74)
5937 * u8_cpy_alloc:                          Elementary string functions with memory allocation.
5938                                                               (line   8)
5939 * u8_ct_casefold:                        Case insensitive comparison.
5940                                                               (line  27)
5941 * u8_ct_tolower:                         Case mappings of substrings.
5942                                                               (line  93)
5943 * u8_ct_totitle:                         Case mappings of substrings.
5944                                                               (line 111)
5945 * u8_ct_toupper:                         Case mappings of substrings.
5946                                                               (line  75)
5947 * u8_endswith:                           Elementary string functions on NUL terminated strings.
5948                                                               (line 257)
5949 * u8_grapheme_breaks:                    Grapheme cluster breaks in a string.
5950                                                               (line  32)
5951 * u8_grapheme_next:                      Grapheme cluster breaks in a string.
5952                                                               (line   9)
5953 * u8_grapheme_prev:                      Grapheme cluster breaks in a string.
5954                                                               (line  19)
5955 * u8_is_cased:                           Case detection.      (line  53)
5956 * u8_is_casefolded:                      Case detection.      (line  40)
5957 * u8_is_lowercase:                       Case detection.      (line  20)
5958 * u8_is_titlecase:                       Case detection.      (line  30)
5959 * u8_is_uppercase:                       Case detection.      (line  10)
5960 * u8_mblen:                              Elementary string functions.
5961                                                               (line   9)
5962 * u8_mbsnlen:                            Elementary string functions.
5963                                                               (line 155)
5964 * u8_mbtouc:                             Elementary string functions.
5965                                                               (line  36)
5966 * u8_mbtoucr:                            Elementary string functions.
5967                                                               (line  43)
5968 * u8_mbtouc_unsafe:                      Elementary string functions.
5969                                                               (line  19)
5970 * u8_move:                               Elementary string functions.
5971                                                               (line  85)
5972 * u8_next:                               Elementary string functions on NUL terminated strings.
5973                                                               (line  22)
5974 * u8_normalize:                          Normalization of strings.
5975                                                               (line  46)
5976 * u8_normcmp:                            Normalizing comparisons.
5977                                                               (line   9)
5978 * u8_normcoll:                           Normalizing comparisons.
5979                                                               (line  35)
5980 * u8_normxfrm:                           Normalizing comparisons.
5981                                                               (line  22)
5982 * u8_possible_linebreaks:                unilbrk.h.           (line  42)
5983 * u8_prev:                               Elementary string functions on NUL terminated strings.
5984                                                               (line  32)
5985 * u8_set:                                Elementary string functions.
5986                                                               (line  99)
5987 * u8_snprintf:                           unistdio.h.          (line  68)
5988 * u8_sprintf:                            unistdio.h.          (line  67)
5989 * u8_startswith:                         Elementary string functions on NUL terminated strings.
5990                                                               (line 249)
5991 * u8_stpcpy:                             Elementary string functions on NUL terminated strings.
5992                                                               (line  74)
5993 * u8_stpncpy:                            Elementary string functions on NUL terminated strings.
5994                                                               (line  96)
5995 * u8_strcat:                             Elementary string functions on NUL terminated strings.
5996                                                               (line 110)
5997 * u8_strchr:                             Elementary string functions on NUL terminated strings.
5998                                                               (line 181)
5999 * u8_strcmp:                             Elementary string functions on NUL terminated strings.
6000                                                               (line 133)
6001 * u8_strcoll:                            Elementary string functions on NUL terminated strings.
6002                                                               (line 143)
6003 * u8_strconv_from_encoding:              uniconv.h.           (line 125)
6004 * u8_strconv_from_locale:                uniconv.h.           (line 155)
6005 * u8_strconv_to_encoding:                uniconv.h.           (line 138)
6006 * u8_strconv_to_locale:                  uniconv.h.           (line 165)
6007 * u8_strcpy:                             Elementary string functions on NUL terminated strings.
6008                                                               (line  64)
6009 * u8_strcspn:                            Elementary string functions on NUL terminated strings.
6010                                                               (line 200)
6011 * u8_strdup:                             Elementary string functions on NUL terminated strings.
6012                                                               (line 171)
6013 * u8_strlen:                             Elementary string functions on NUL terminated strings.
6014                                                               (line  46)
6015 * u8_strmblen:                           Elementary string functions on NUL terminated strings.
6016                                                               (line   9)
6017 * u8_strmbtouc:                          Elementary string functions on NUL terminated strings.
6018                                                               (line  15)
6019 * u8_strncat:                            Elementary string functions on NUL terminated strings.
6020                                                               (line 120)
6021 * u8_strncmp:                            Elementary string functions on NUL terminated strings.
6022                                                               (line 158)
6023 * u8_strncpy:                            Elementary string functions on NUL terminated strings.
6024                                                               (line  85)
6025 * u8_strnlen:                            Elementary string functions on NUL terminated strings.
6026                                                               (line  54)
6027 * u8_strpbrk:                            Elementary string functions on NUL terminated strings.
6028                                                               (line 224)
6029 * u8_strrchr:                            Elementary string functions on NUL terminated strings.
6030                                                               (line 189)
6031 * u8_strspn:                             Elementary string functions on NUL terminated strings.
6032                                                               (line 212)
6033 * u8_strstr:                             Elementary string functions on NUL terminated strings.
6034                                                               (line 238)
6035 * u8_strtok:                             Elementary string functions on NUL terminated strings.
6036                                                               (line 267)
6037 * u8_strwidth:                           uniwidth.h.          (line  37)
6038 * u8_tolower:                            Case mappings of strings.
6039                                                               (line  38)
6040 * u8_totitle:                            Case mappings of strings.
6041                                                               (line  52)
6042 * u8_toupper:                            Case mappings of strings.
6043                                                               (line  24)
6044 * u8_to_u16:                             Elementary string conversions.
6045                                                               (line   9)
6046 * u8_to_u32:                             Elementary string conversions.
6047                                                               (line  13)
6048 * u8_u8_asnprintf:                       unistdio.h.          (line  92)
6049 * u8_u8_asprintf:                        unistdio.h.          (line  90)
6050 * u8_u8_snprintf:                        unistdio.h.          (line  88)
6051 * u8_u8_sprintf:                         unistdio.h.          (line  86)
6052 * u8_u8_vasnprintf:                      unistdio.h.          (line 100)
6053 * u8_u8_vasprintf:                       unistdio.h.          (line  98)
6054 * u8_u8_vsnprintf:                       unistdio.h.          (line  96)
6055 * u8_u8_vsprintf:                        unistdio.h.          (line  94)
6056 * u8_uctomb:                             Elementary string functions.
6057                                                               (line  60)
6058 * u8_vasnprintf:                         unistdio.h.          (line  80)
6059 * u8_vasprintf:                          unistdio.h.          (line  78)
6060 * u8_vsnprintf:                          unistdio.h.          (line  76)
6061 * u8_vsprintf:                           unistdio.h.          (line  74)
6062 * u8_width:                              uniwidth.h.          (line  27)
6063 * u8_width_linebreaks:                   unilbrk.h.           (line  59)
6064 * u8_wordbreaks:                         Word breaks in a string.
6065                                                               (line   8)
6066 * UCS-4:                                 Unicode.             (line  14)
6067 * ucs4_t:                                unitypes.h.          (line  15)
6068 * uc_all_blocks:                         Blocks.              (line  36)
6069 * uc_all_scripts:                        Scripts.             (line  35)
6070 * uc_bidi_category:                      Bidi class.          (line  93)
6071 * uc_bidi_category_byname:               Bidi class.          (line  83)
6072 * uc_bidi_category_name:                 Bidi class.          (line  75)
6073 * uc_bidi_class:                         Bidi class.          (line  92)
6074 * uc_bidi_class_byname:                  Bidi class.          (line  82)
6075 * uc_bidi_class_long_name:               Bidi class.          (line  79)
6076 * uc_bidi_class_name:                    Bidi class.          (line  74)
6077 * uc_block:                              Blocks.              (line  26)
6078 * uc_block_t:                            Blocks.              (line  11)
6079 * uc_canonical_decomposition:            Decomposition of characters.
6080                                                               (line  90)
6081 * uc_combining_class:                    Canonical combining class.
6082                                                               (line 110)
6083 * uc_combining_class_byname:             Canonical combining class.
6084                                                               (line 101)
6085 * uc_combining_class_long_name:          Canonical combining class.
6086                                                               (line  97)
6087 * uc_combining_class_name:               Canonical combining class.
6088                                                               (line  92)
6089 * uc_composition:                        Composition of characters.
6090                                                               (line   9)
6091 * uc_c_ident_category:                   ISO C and Java syntax.
6092                                                               (line  38)
6093 * uc_decimal_value:                      Decimal digit value. (line  10)
6094 * uc_decomposition:                      Decomposition of characters.
6095                                                               (line  80)
6096 * uc_digit_value:                        Digit value.         (line  10)
6097 * uc_fraction_t:                         Numeric value.       (line  12)
6098 * uc_general_category:                   Object oriented API. (line 219)
6099 * uc_general_category_and:               Object oriented API. (line 180)
6100 * uc_general_category_and_not:           Object oriented API. (line 187)
6101 * uc_general_category_byname:            Object oriented API. (line 209)
6102 * uc_general_category_long_name:         Object oriented API. (line 203)
6103 * uc_general_category_name:              Object oriented API. (line 197)
6104 * uc_general_category_or:                Object oriented API. (line 174)
6105 * uc_general_category_t:                 Object oriented API. (line   6)
6106 * uc_graphemeclusterbreak_property:      Grapheme cluster break property.
6107                                                               (line  31)
6108 * uc_is_alnum:                           Classifications like in ISO C.
6109                                                               (line  13)
6110 * uc_is_alpha:                           Classifications like in ISO C.
6111                                                               (line  17)
6112 * uc_is_bidi_category:                   Bidi class.          (line  97)
6113 * uc_is_bidi_class:                      Bidi class.          (line  96)
6114 * uc_is_blank:                           Classifications like in ISO C.
6115                                                               (line  63)
6116 * uc_is_block:                           Blocks.              (line  31)
6117 * uc_is_cntrl:                           Classifications like in ISO C.
6118                                                               (line  23)
6119 * uc_is_c_whitespace:                    ISO C and Java syntax.
6120                                                               (line   9)
6121 * uc_is_digit:                           Classifications like in ISO C.
6122                                                               (line  26)
6123 * uc_is_general_category:                Object oriented API. (line 224)
6124 * uc_is_general_category_withtable:      Bit mask API.        (line  51)
6125 * uc_is_graph:                           Classifications like in ISO C.
6126                                                               (line  30)
6127 * uc_is_grapheme_break:                  Grapheme cluster break property.
6128                                                               (line  38)
6129 * uc_is_java_whitespace:                 ISO C and Java syntax.
6130                                                               (line  13)
6131 * uc_is_lower:                           Classifications like in ISO C.
6132                                                               (line  34)
6133 * uc_is_print:                           Classifications like in ISO C.
6134                                                               (line  40)
6135 * uc_is_property:                        Properties as objects.
6136                                                               (line 150)
6137 * uc_is_property_alphabetic:             Properties as functions.
6138                                                               (line   9)
6139 * uc_is_property_ascii_hex_digit:        Properties as functions.
6140                                                               (line  80)
6141 * uc_is_property_bidi_arabic_digit:      Properties as functions.
6142                                                               (line  66)
6143 * uc_is_property_bidi_arabic_right_to_left: Properties as functions.
6144                                                               (line  62)
6145 * uc_is_property_bidi_block_separator:   Properties as functions.
6146                                                               (line  68)
6147 * uc_is_property_bidi_boundary_neutral:  Properties as functions.
6148                                                               (line  72)
6149 * uc_is_property_bidi_common_separator:  Properties as functions.
6150                                                               (line  67)
6151 * uc_is_property_bidi_control:           Properties as functions.
6152                                                               (line  59)
6153 * uc_is_property_bidi_embedding_or_override: Properties as functions.
6154                                                               (line  74)
6155 * uc_is_property_bidi_european_digit:    Properties as functions.
6156                                                               (line  63)
6157 * uc_is_property_bidi_eur_num_separator: Properties as functions.
6158                                                               (line  64)
6159 * uc_is_property_bidi_eur_num_terminator: Properties as functions.
6160                                                               (line  65)
6161 * uc_is_property_bidi_hebrew_right_to_left: Properties as functions.
6162                                                               (line  61)
6163 * uc_is_property_bidi_left_to_right:     Properties as functions.
6164                                                               (line  60)
6165 * uc_is_property_bidi_non_spacing_mark:  Properties as functions.
6166                                                               (line  71)
6167 * uc_is_property_bidi_other_neutral:     Properties as functions.
6168                                                               (line  75)
6169 * uc_is_property_bidi_pdf:               Properties as functions.
6170                                                               (line  73)
6171 * uc_is_property_bidi_segment_separator: Properties as functions.
6172                                                               (line  69)
6173 * uc_is_property_bidi_whitespace:        Properties as functions.
6174                                                               (line  70)
6175 * uc_is_property_cased:                  Properties as functions.
6176                                                               (line  29)
6177 * uc_is_property_case_ignorable:         Properties as functions.
6178                                                               (line  30)
6179 * uc_is_property_changes_when_casefolded: Properties as functions.
6180                                                               (line  34)
6181 * uc_is_property_changes_when_casemapped: Properties as functions.
6182                                                               (line  35)
6183 * uc_is_property_changes_when_lowercased: Properties as functions.
6184                                                               (line  31)
6185 * uc_is_property_changes_when_titlecased: Properties as functions.
6186                                                               (line  33)
6187 * uc_is_property_changes_when_uppercased: Properties as functions.
6188                                                               (line  32)
6189 * uc_is_property_combining:              Properties as functions.
6190                                                               (line 110)
6191 * uc_is_property_composite:              Properties as functions.
6192                                                               (line 111)
6193 * uc_is_property_currency_symbol:        Properties as functions.
6194                                                               (line 105)
6195 * uc_is_property_dash:                   Properties as functions.
6196                                                               (line  97)
6197 * uc_is_property_decimal_digit:          Properties as functions.
6198                                                               (line 112)
6199 * uc_is_property_default_ignorable_code_point: Properties as functions.
6200                                                               (line  12)
6201 * uc_is_property_deprecated:             Properties as functions.
6202                                                               (line  16)
6203 * uc_is_property_diacritic:              Properties as functions.
6204                                                               (line 114)
6205 * uc_is_property_extender:               Properties as functions.
6206                                                               (line 115)
6207 * uc_is_property_format_control:         Properties as functions.
6208                                                               (line  96)
6209 * uc_is_property_grapheme_base:          Properties as functions.
6210                                                               (line  52)
6211 * uc_is_property_grapheme_extend:        Properties as functions.
6212                                                               (line  53)
6213 * uc_is_property_grapheme_link:          Properties as functions.
6214                                                               (line  55)
6215 * uc_is_property_hex_digit:              Properties as functions.
6216                                                               (line  79)
6217 * uc_is_property_hyphen:                 Properties as functions.
6218                                                               (line  98)
6219 * uc_is_property_ideographic:            Properties as functions.
6220                                                               (line  84)
6221 * uc_is_property_ids_binary_operator:    Properties as functions.
6222                                                               (line  87)
6223 * uc_is_property_ids_trinary_operator:   Properties as functions.
6224                                                               (line  88)
6225 * uc_is_property_id_continue:            Properties as functions.
6226                                                               (line  42)
6227 * uc_is_property_id_start:               Properties as functions.
6228                                                               (line  40)
6229 * uc_is_property_ignorable_control:      Properties as functions.
6230                                                               (line 116)
6231 * uc_is_property_iso_control:            Properties as functions.
6232                                                               (line  95)
6233 * uc_is_property_join_control:           Properties as functions.
6234                                                               (line  51)
6235 * uc_is_property_left_of_pair:           Properties as functions.
6236                                                               (line 109)
6237 * uc_is_property_line_separator:         Properties as functions.
6238                                                               (line 100)
6239 * uc_is_property_logical_order_exception: Properties as functions.
6240                                                               (line  17)
6241 * uc_is_property_lowercase:              Properties as functions.
6242                                                               (line  26)
6243 * uc_is_property_math:                   Properties as functions.
6244                                                               (line 106)
6245 * uc_is_property_non_break:              Properties as functions.
6246                                                               (line  94)
6247 * uc_is_property_not_a_character:        Properties as functions.
6248                                                               (line  11)
6249 * uc_is_property_numeric:                Properties as functions.
6250                                                               (line 113)
6251 * uc_is_property_other_alphabetic:       Properties as functions.
6252                                                               (line  10)
6253 * uc_is_property_other_default_ignorable_code_point: Properties as functions.
6254                                                               (line  14)
6255 * uc_is_property_other_grapheme_extend:  Properties as functions.
6256                                                               (line  54)
6257 * uc_is_property_other_id_continue:      Properties as functions.
6258                                                               (line  43)
6259 * uc_is_property_other_id_start:         Properties as functions.
6260                                                               (line  41)
6261 * uc_is_property_other_lowercase:        Properties as functions.
6262                                                               (line  27)
6263 * uc_is_property_other_math:             Properties as functions.
6264                                                               (line 107)
6265 * uc_is_property_other_uppercase:        Properties as functions.
6266                                                               (line  25)
6267 * uc_is_property_paired_punctuation:     Properties as functions.
6268                                                               (line 108)
6269 * uc_is_property_paragraph_separator:    Properties as functions.
6270                                                               (line 101)
6271 * uc_is_property_pattern_syntax:         Properties as functions.
6272                                                               (line  47)
6273 * uc_is_property_pattern_white_space:    Properties as functions.
6274                                                               (line  46)
6275 * uc_is_property_private_use:            Properties as functions.
6276                                                               (line  19)
6277 * uc_is_property_punctuation:            Properties as functions.
6278                                                               (line  99)
6279 * uc_is_property_quotation_mark:         Properties as functions.
6280                                                               (line 102)
6281 * uc_is_property_radical:                Properties as functions.
6282                                                               (line  86)
6283 * uc_is_property_sentence_terminal:      Properties as functions.
6284                                                               (line 103)
6285 * uc_is_property_soft_dotted:            Properties as functions.
6286                                                               (line  36)
6287 * uc_is_property_space:                  Properties as functions.
6288                                                               (line  93)
6289 * uc_is_property_terminal_punctuation:   Properties as functions.
6290                                                               (line 104)
6291 * uc_is_property_titlecase:              Properties as functions.
6292                                                               (line  28)
6293 * uc_is_property_unassigned_code_value:  Properties as functions.
6294                                                               (line  20)
6295 * uc_is_property_unified_ideograph:      Properties as functions.
6296                                                               (line  85)
6297 * uc_is_property_uppercase:              Properties as functions.
6298                                                               (line  24)
6299 * uc_is_property_variation_selector:     Properties as functions.
6300                                                               (line  18)
6301 * uc_is_property_white_space:            Properties as functions.
6302                                                               (line   8)
6303 * uc_is_property_xid_continue:           Properties as functions.
6304                                                               (line  45)
6305 * uc_is_property_xid_start:              Properties as functions.
6306                                                               (line  44)
6307 * uc_is_property_zero_width:             Properties as functions.
6308                                                               (line  92)
6309 * uc_is_punct:                           Classifications like in ISO C.
6310                                                               (line  43)
6311 * uc_is_script:                          Scripts.             (line  30)
6312 * uc_is_space:                           Classifications like in ISO C.
6313                                                               (line  48)
6314 * uc_is_upper:                           Classifications like in ISO C.
6315                                                               (line  53)
6316 * uc_is_xdigit:                          Classifications like in ISO C.
6317                                                               (line  59)
6318 * uc_java_ident_category:                ISO C and Java syntax.
6319                                                               (line  42)
6320 * uc_joining_group:                      Joining group.       (line  85)
6321 * uc_joining_group_byname:               Joining group.       (line  76)
6322 * uc_joining_group_name:                 Joining group.       (line  73)
6323 * uc_joining_type:                       Joining type.        (line  54)
6324 * uc_joining_type_byname:                Joining type.        (line  45)
6325 * uc_joining_type_long_name:             Joining type.        (line  42)
6326 * uc_joining_type_name:                  Joining type.        (line  39)
6327 * uc_locale_language:                    Case mappings of strings.
6328                                                               (line  20)
6329 * uc_mirror_char:                        Mirrored character.  (line  13)
6330 * uc_numeric_value:                      Numeric value.       (line  21)
6331 * uc_property_byname:                    Properties as objects.
6332                                                               (line 128)
6333 * uc_property_is_valid:                  Properties as objects.
6334                                                               (line 143)
6335 * uc_property_t:                         Properties as objects.
6336                                                               (line   8)
6337 * uc_script:                             Scripts.             (line  19)
6338 * uc_script_byname:                      Scripts.             (line  23)
6339 * uc_script_t:                           Scripts.             (line  10)
6340 * uc_tolower:                            Case mappings of characters.
6341                                                               (line  19)
6342 * uc_totitle:                            Case mappings of characters.
6343                                                               (line  22)
6344 * uc_toupper:                            Case mappings of characters.
6345                                                               (line  16)
6346 * uc_width:                              uniwidth.h.          (line  22)
6347 * uc_wordbreak_property:                 Word break property. (line  31)
6348 * uint16_t:                              unitypes.h.          (line   9)
6349 * uint32_t:                              unitypes.h.          (line  10)
6350 * uint8_t:                               unitypes.h.          (line   8)
6351 * ulc_asnprintf:                         unistdio.h.          (line  49)
6352 * ulc_asprintf:                          unistdio.h.          (line  47)
6353 * ulc_casecmp:                           Case insensitive comparison.
6354                                                               (line  54)
6355 * ulc_casecoll:                          Case insensitive comparison.
6356                                                               (line  97)
6357 * ulc_casexfrm:                          Case insensitive comparison.
6358                                                               (line  77)
6359 * ulc_fprintf:                           unistdio.h.          (line 184)
6360 * ulc_grapheme_breaks:                   Grapheme cluster breaks in a string.
6361                                                               (line  38)
6362 * ulc_possible_linebreaks:               unilbrk.h.           (line  48)
6363 * ulc_snprintf:                          unistdio.h.          (line  44)
6364 * ulc_sprintf:                           unistdio.h.          (line  42)
6365 * ulc_vasnprintf:                        unistdio.h.          (line  61)
6366 * ulc_vasprintf:                         unistdio.h.          (line  58)
6367 * ulc_vfprintf:                          unistdio.h.          (line 185)
6368 * ulc_vsnprintf:                         unistdio.h.          (line  55)
6369 * ulc_vsprintf:                          unistdio.h.          (line  52)
6370 * ulc_width_linebreaks:                  unilbrk.h.           (line  68)
6371 * ulc_wordbreaks:                        Word breaks in a string.
6372                                                               (line  11)
6373 * Unicode:                               Unicode.             (line   6)
6374 * Unicode character, bidi class:         Bidi class.          (line   6)
6375 * Unicode character, bidirectional category: Bidi class.      (line   6)
6376 * Unicode character, block:              Blocks.              (line  24)
6377 * Unicode character, canonical combining class: Canonical combining class.
6378                                                               (line   6)
6379 * Unicode character, case mappings:      Case mappings of characters.
6380                                                               (line   6)
6381 * Unicode character, classification:     General category.    (line   6)
6382 * Unicode character, classification like in C: Classifications like in ISO C.
6383                                                               (line   6)
6384 * Unicode character, general category:   General category.    (line   6)
6385 * Unicode character, mirroring:          Mirrored character.  (line   6)
6386 * Unicode character, name:               uniname.h.           (line   6)
6387 * Unicode character, properties:         Properties.          (line   6)
6388 * Unicode character, script:             Scripts.             (line  17)
6389 * Unicode character, validity in C identifiers: ISO C and Java syntax.
6390                                                               (line  38)
6391 * Unicode character, validity in Java identifiers: ISO C and Java syntax.
6392                                                               (line  42)
6393 * Unicode character, value:              Decimal digit value. (line   6)
6394 * Unicode character, value <1>:          Digit value.         (line   6)
6395 * Unicode character, value <2>:          Numeric value.       (line   6)
6396 * Unicode character, width:              uniwidth.h.          (line  22)
6397 * unicode_character_name:                uniname.h.           (line  18)
6398 * unicode_name_character:                uniname.h.           (line  24)
6399 * uninorm_decomposing_form:              Normalization of strings.
6400                                                               (line  39)
6401 * uninorm_filter_create:                 Normalization of streams.
6402                                                               (line  16)
6403 * uninorm_filter_flush:                  Normalization of streams.
6404                                                               (line  32)
6405 * uninorm_filter_free:                   Normalization of streams.
6406                                                               (line  42)
6407 * uninorm_filter_write:                  Normalization of streams.
6408                                                               (line  27)
6409 * uninorm_is_compat_decomposing:         Normalization of strings.
6410                                                               (line  31)
6411 * uninorm_is_composing:                  Normalization of strings.
6412                                                               (line  35)
6413 * uninorm_t:                             Normalization of strings.
6414                                                               (line   9)
6415 * uppercasing:                           Case mappings of strings.
6416                                                               (line   6)
6417 * use cases:                             Introduction.        (line  36)
6418 * UTF-16:                                Unicode.             (line  14)
6419 * UTF-16, strings:                       Unicode strings.     (line   6)
6420 * UTF-32:                                Unicode.             (line  14)
6421 * UTF-32, strings:                       Unicode strings.     (line   6)
6422 * UTF-8:                                 Unicode.             (line  14)
6423 * UTF-8, strings:                        Unicode strings.     (line   6)
6424 * validity:                              Elementary string checks.
6425                                                               (line   6)
6426 * value, of libunistring:                Introduction.        (line  36)
6427 * value, of Unicode character:           Decimal digit value. (line   6)
6428 * value, of Unicode character <1>:       Digit value.         (line   6)
6429 * value, of Unicode character <2>:       Numeric value.       (line   6)
6430 * verification:                          Elementary string checks.
6431                                                               (line   6)
6432 * wchar_t, type:                         The wchar_t mess.    (line   6)
6433 * well-formed:                           Elementary string checks.
6434                                                               (line   6)
6435 * width:                                 uniwidth.h.          (line   6)
6436 * word boundaries:                       uniwbrk.h.           (line   6)
6437 * word breaks:                           uniwbrk.h.           (line   6)
6438 * wrapping:                              unilbrk.h.           (line   6)
6439
6440
6441 \1f
6442 Tag Table:
6443 Node: Top\7f269
6444 Node: Introduction\7f3400
6445 Node: Unicode\7f5493
6446 Node: Unicode and i18n\7f7378
6447 Node: Locale encodings\7f8848
6448 Node: In-memory representation\7f11113
6449 Node: char * strings\7f12239
6450 Node: The wchar_t mess\7f17727
6451 Node: Unicode strings\7f20035
6452 Node: Conventions\7f21220
6453 Node: unitypes.h\7f23512
6454 Node: unistr.h\7f24096
6455 Node: Elementary string checks\7f24661
6456 Node: Elementary string conversions\7f25283
6457 Node: Elementary string functions\7f26585
6458 Node: Elementary string functions with memory allocation\7f33644
6459 Node: Elementary string functions on NUL terminated strings\7f34266
6460 Node: uniconv.h\7f46494
6461 Node: unistdio.h\7f54447
6462 Node: uniname.h\7f62700
6463 Node: unictype.h\7f64059
6464 Node: General category\7f64987
6465 Node: Object oriented API\7f66042
6466 Node: Bit mask API\7f75276
6467 Node: Canonical combining class\7f77571
6468 Node: Bidi class\7f81805
6469 Node: Decimal digit value\7f85218
6470 Node: Digit value\7f85775
6471 Node: Numeric value\7f86336
6472 Node: Mirrored character\7f87238
6473 Node: Arabic shaping\7f87931
6474 Node: Joining type\7f88404
6475 Node: Joining group\7f90554
6476 Node: Properties\7f93992
6477 Node: Properties as objects\7f94683
6478 Node: Properties as functions\7f101705
6479 Node: Scripts\7f107721
6480 Node: Blocks\7f109126
6481 Node: ISO C and Java syntax\7f110469
6482 Node: Classifications like in ISO C\7f112187
6483 Node: uniwidth.h\7f114999
6484 Node: unigbrk.h\7f117045
6485 Node: Grapheme cluster breaks in a string\7f118539
6486 Node: Grapheme cluster break property\7f120644
6487 Node: uniwbrk.h\7f122545
6488 Node: Word breaks in a string\7f123083
6489 Node: Word break property\7f124175
6490 Node: unilbrk.h\7f125274
6491 Node: uninorm.h\7f129570
6492 Node: Decomposition of characters\7f130207
6493 Node: Composition of characters\7f133684
6494 Node: Normalization of strings\7f134397
6495 Node: Normalizing comparisons\7f136474
6496 Node: Normalization of streams\7f138876
6497 Node: unicase.h\7f141001
6498 Node: Case mappings of characters\7f141690
6499 Node: Case mappings of strings\7f143839
6500 Node: Case mappings of substrings\7f147190
6501 Node: Case insensitive comparison\7f154112
6502 Node: Case detection\7f159517
6503 Node: uniregex.h\7f162831
6504 Node: Using the library\7f163058
6505 Node: Installation\7f163469
6506 Node: Compiler options\7f163954
6507 Node: Include files\7f165594
6508 Node: Autoconf macro\7f166847
6509 Node: Reporting problems\7f168487
6510 Node: More functionality\7f169305
6511 Node: Licenses\7f169748
6512 Node: GNU GPL\7f171386
6513 Node: GNU LGPL\7f209130
6514 Node: GNU FDL\7f217612
6515 Node: Index\7f242916
6516 \1f
6517 End Tag Table
6518
6519 \1f
6520 Local Variables:
6521 coding: utf-8
6522 End: