This is libunistring.info, produced by makeinfo version 4.13 from

INFO-DIR-SECTION Software development
* GNU libunistring: (libunistring). Unicode string library.

This manual is for GNU libunistring.
File: libunistring.info, Node: Top, Next: Introduction, Up: (dir)

* Menu:

* Introduction:: Who may need Unicode strings?
* Conventions:: Conventions used in this manual
* unitypes.h:: Elementary types
* unistr.h:: Elementary Unicode string functions
* uniconv.h:: Conversions between Unicode and encodings
* unistdio.h:: Output with Unicode strings
* uniname.h:: Names of Unicode characters
* unictype.h:: Unicode character classification and properties
* uniwidth.h:: Display width
* uniwbrk.h:: Word breaks in strings
* unilbrk.h:: Line breaking
* uninorm.h:: Normalization forms
* unicase.h:: Case mappings
* uniregex.h:: Regular expressions
* Using the library:: How to link with the library and use it?
* More functionality:: More advanced functionality
* Index:: General Index
--- The Detailed Node Listing ---

* Unicode:: What is Unicode?
* Unicode and i18n:: Unicode and internationalization
* Locale encodings:: What is a locale encoding?
* In-memory representation:: How to represent strings in memory?
* char * strings:: What to keep in mind with `char *' strings
* The wchar_t mess:: Why `wchar_t *' strings are useless
* Unicode strings:: How are Unicode strings represented?

* Elementary string checks::
* Elementary string conversions::
* Elementary string functions::
* Elementary string functions with memory allocation::
* Elementary string functions on NUL terminated strings::

* Canonical combining class::
* Bidirectional category::
* Decimal digit value::
* Mirrored character::
* ISO C and Java syntax::
* Classifications like in ISO C::

* Object oriented API::

* Properties as objects::
* Properties as functions::

* Word breaks in a string::
* Word break property::

* Decomposition of characters::
* Composition of characters::
* Normalization of strings::
* Normalizing comparisons::
* Normalization of streams::

* Case mappings of characters::
* Case mappings of strings::
* Case mappings of substrings::
* Case insensitive comparison::

* Reporting problems::

* GNU GPL:: GNU General Public License
* GNU LGPL:: GNU Lesser General Public License
* GNU FDL:: GNU Free Documentation License
File: libunistring.info, Node: Introduction, Next: Conventions, Prev: Top, Up: Top

1 Introduction
**************

This library provides functions for manipulating Unicode strings and
for manipulating C strings according to the Unicode standard.

It consists of the following parts:

`<unistr.h>'
     elementary string functions

`<uniconv.h>'
     conversion from/to legacy encodings

`<unistdio.h>'
     formatted output to strings

`<uniname.h>'
     names of Unicode characters

`<unictype.h>'
     character classification and properties

`<uniwidth.h>'
     string width when using nonproportional fonts

`<uniwbrk.h>'
     word breaks in strings

`<unilbrk.h>'
     line breaking algorithm

`<uninorm.h>'
     normalization (composition and decomposition)

`<unicase.h>'
     case mappings

`<uniregex.h>'
     regular expressions (not yet implemented)

libunistring is for you if your application involves non-trivial text
processing, such as upper/lower case conversions, line breaking,
operations on words, or more advanced analysis of text. Text provided
by the user can, in general, contain characters of all kinds of
scripts. The text processing functions provided by this library handle
all scripts and all languages.

libunistring is also for you if your application already uses the
ISO C / POSIX `<ctype.h>' and `<wctype.h>' functions and the text it
operates on is provided by the user and can be in any language.

libunistring is also for you if your application uses Unicode
strings as its internal in-memory representation.

* Menu:

* Unicode:: What is Unicode?
* Unicode and i18n:: Unicode and internationalization
* Locale encodings:: What is a locale encoding?
* In-memory representation:: How to represent strings in memory?
* char * strings:: What to keep in mind with `char *' strings
* The wchar_t mess:: Why `wchar_t *' strings are useless
* Unicode strings:: How are Unicode strings represented?
File: libunistring.info, Node: Unicode, Next: Unicode and i18n, Up: Introduction

1.1 Unicode
===========

Unicode is a standardized repertoire of characters that contains
characters from all scripts of the world, from Latin letters to Chinese
ideographs and Babylonian cuneiform glyphs. It also specifies how
these characters are to be rendered on a screen or on paper, and how
common text processing (word selection, line breaking, uppercasing of
page titles etc.) is supposed to behave on Unicode text.

Unicode also specifies three ways of storing sequences of Unicode
characters in a computer whose basic unit of data is an 8-bit byte:

UTF-8
     Every character is represented as 1 to 4 bytes.

UTF-16
     Every character is represented as 1 to 2 units of 16 bits.

UTF-32
     Every character is represented as 1 unit of 32 bits.

For encoding Unicode text in a file, UTF-8 is usually used. For
encoding Unicode strings in memory for a program, any of the three
encoding forms can reasonably be used.
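To make the unit counts of the three encoding forms concrete, here is a small standalone sketch in plain C. It does not call libunistring; the helper names `utf8_units', `utf16_units' and `utf32_units' are hypothetical, chosen for illustration only. Each returns 0 for values that are not Unicode scalar values (surrogates, or anything above U+10FFFF).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: is UC a Unicode scalar value?  Surrogates
   (U+D800..U+DFFF) and values above U+10FFFF are not. */
static int is_scalar (uint32_t uc)
{
  return uc < 0x110000 && !(uc >= 0xD800 && uc <= 0xDFFF);
}

/* Number of bytes UTF-8 uses for UC: 1 to 4, or 0 if invalid. */
static int utf8_units (uint32_t uc)
{
  if (!is_scalar (uc))
    return 0;
  if (uc < 0x80)
    return 1;
  if (uc < 0x800)
    return 2;
  if (uc < 0x10000)
    return 3;
  return 4;
}

/* Number of 16-bit units UTF-16 uses: 2 (a surrogate pair) for
   characters outside the Basic Multilingual Plane. */
static int utf16_units (uint32_t uc)
{
  if (!is_scalar (uc))
    return 0;
  return uc < 0x10000 ? 1 : 2;
}

/* UTF-32 always uses exactly one 32-bit unit per character. */
static int utf32_units (uint32_t uc)
{
  return is_scalar (uc) ? 1 : 0;
}
```

For example, U+12000 (a Babylonian cuneiform sign) needs 4 bytes in UTF-8, 2 units in UTF-16 and 1 unit in UTF-32, while U+0041 needs a single unit in every form.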
Unicode is widely used on the web. Prior to the use of Unicode, web
pages were in many different encodings (ISO-8859-1 for English, French
and Spanish; ISO-8859-2 for Polish; ISO-8859-7 for Greek; KOI8-R for
Russian; GB2312 or BIG5 for Chinese; ISO-2022-JP-2, EUC-JP or
Shift_JIS for Japanese; and many others). It was next to impossible
to create a document that contained both Chinese and Polish text.
Due to the many encodings for Japanese, even the processing of pure
Japanese text was error-prone.

* The Unicode standard: `http://www.unicode.org/'

* Definition of UTF-8: `http://www.rfc-editor.org/rfc/rfc3629.txt'

* Definition of UTF-16: `http://www.rfc-editor.org/rfc/rfc2781.txt'

* Markus Kuhn's UTF-8 and Unicode FAQ:
  `http://www.cl.cam.ac.uk/~mgk25/unicode.html'
File: libunistring.info, Node: Unicode and i18n, Next: Locale encodings, Prev: Unicode, Up: Introduction

1.2 Unicode and Internationalization
====================================

Internationalization is the process of changing the source code of a
program so that it can meet the expectations of users in any culture,
if culture-specific data (translations, images etc.) are provided.

Use of Unicode is not strictly required for internationalization,
but it makes internationalization much easier, because operations that
need to look at specific characters (like hyphenation, spell checking,
or the automatic conversion of double-quotes to opening and closing
double-quote characters) don't need to consider multiple possible
encodings of the text.

Use of Unicode also enables multilingualization: the ability to have
text in multiple languages present in the same document or even in the
same line of text.

But use of Unicode is not everything. Internationalization usually
consists of three features:

* Use of Unicode where needed for text processing. This is what
  libunistring is about.

* Use of message catalogs for messages shown to the user. This is
  what GNU gettext is about.

* Use of locale-specific conventions for date and time formats, for
  numeric formatting, or for sorting of text. This can be done
  adequately with the POSIX APIs and the implementation of locales
  in the GNU C library.
File: libunistring.info, Node: Locale encodings, Next: In-memory representation, Prev: Unicode and i18n, Up: Introduction

1.3 Locale encodings
====================

A locale is a set of cultural conventions. According to POSIX, for
a program, at any moment, there is one locale being designated as the
"current locale". (Actually, POSIX also supports one locale per
thread, but this feature is not yet universally implemented and not
widely used.) The locale is partitioned into several aspects, called
the "categories" of the locale. The main aspects are:

* The character encoding and the character properties. This is the
  `LC_CTYPE' category.

* The sorting rules for text. This is the `LC_COLLATE' category.

* The language-specific translations of messages. This is the
  `LC_MESSAGES' category.

* The formatting rules for numbers, such as the decimal separator.
  This is the `LC_NUMERIC' category.

* The formatting rules for amounts of money. This is the
  `LC_MONETARY' category.

* The formatting of date and time. This is the `LC_TIME' category.

In particular, the `LC_CTYPE' category of the current locale
determines the character encoding. This is the encoding of `char *'
strings. We also call it the "locale encoding". GNU libunistring has
a function, `locale_charset', that returns a standardized (platform
independent) name for this encoding.

All locale encodings used on glibc systems are essentially ASCII
compatible: most graphic ASCII characters have the same representation,
as a single byte, in that encoding as in ASCII.

Among the possible locale encodings are UTF-8 and GB18030. Both
can represent any Unicode character as a sequence of bytes. UTF-8
is used in most of the world, whereas GB18030 is used in the People's
Republic of China, because it is backward compatible with the GB2312
encoding that was used in this country earlier.

The legacy locale encodings, ISO-8859-15 (which supplanted
ISO-8859-1 in most of Europe), ISO-8859-2, KOI8-R, EUC-JP, etc., are
still in use in many places, though.

UTF-16 and UTF-32 are not used as locale encodings, because they are
not ASCII compatible: in UTF-16, for example, even an ASCII letter
occupies two bytes, one of which is zero.
File: libunistring.info, Node: In-memory representation, Next: char * strings, Prev: Locale encodings, Up: Introduction

1.4 Choice of in-memory representation of strings
=================================================

There are three ways of representing strings in memory of a running
program:

* As `char *' strings. Such strings are represented in locale
  encoding. This approach is employed when not much text processing
  is done by the program. When some Unicode-aware processing is to
  be done, a string is converted to Unicode on the fly and back to
  locale encoding afterwards.

* As UTF-8 or UTF-16 or UTF-32 strings. This implies that
  conversion from locale encoding to Unicode is performed on input,
  and in the opposite direction on output. This approach is
  employed when the program does a significant amount of text
  processing, or when the program has multiple threads operating on
  the same data but in different locales.

* As `wchar_t *', a.k.a. "wide strings". This approach is misguided;
  see *note The wchar_t mess::.
File: libunistring.info, Node: char * strings, Next: The wchar_t mess, Prev: In-memory representation, Up: Introduction

1.5 `char *' strings
====================

The classical C strings, with their C library support standardized by
ISO C and POSIX, can be used in internationalized programs with some
precautions. The problem with this API is that many of the C library
functions for strings don't work correctly on strings in locale
encodings, leading to bugs that only people in some cultures of the
world will experience.

The first problem with the C library API is the support of multibyte
locales. According to the locale encoding, in general, every character
is represented by one or more bytes (up to 4 bytes in practice -- but
use `MB_LEN_MAX' instead of the number 4 in the code). When every
character is represented by only 1 byte, we speak of a "unibyte
locale", otherwise of a "multibyte locale". It is important to realize
that the majority of Unix installations nowadays use UTF-8 or GB18030
as locale encoding; therefore, the majority of users are using
multibyte locales.

The important fact to remember is: _A `char' is a byte, not a
character._

As a consequence:

* The `<ctype.h>' API is useless in this context; it does not work in
  multibyte locales.

* The `strlen' function does not return the number of characters in
  a string. Nor does it return the number of screen columns occupied
  by a string after it is output. It merely returns the number of
  _bytes_ occupied by a string.

* Truncating a string, for example, with `strncpy', can have the
  effect of truncating it in the middle of a multibyte character.
  Such a string will, when output, have a garbled character at its
  end, often represented by a hollow box.

* `strchr' and `strrchr' do not work with multibyte strings if the
  locale encoding is GB18030 and the character to be searched is a
  digit.

* `strstr' does not work with multibyte strings if the locale
  encoding is different from UTF-8.

* `strcspn', `strpbrk', `strspn' cannot work correctly in multibyte
  locales: they assume the second argument is a list of single-byte
  characters. Even in this simple case, they do not work with
  multibyte strings if the locale encoding is GB18030 and one of the
  characters to be searched is a digit.

* `strsep' and `strtok_r' do not work with multibyte strings unless
  all of the delimiter characters are ASCII characters < 0x30.

* The `strcasecmp', `strncasecmp', and `strcasestr' functions do not
  work with multibyte strings.
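One of the pitfalls above is truncation in the middle of a multibyte character. The following minimal sketch shows a boundary-aware cut, assuming the string is in UTF-8 (the approach does not carry over to GB18030 or EUC-JP, whose trailing bytes are not self-identifying). `utf8_safe_cut' is a hypothetical helper, not a libunistring or gnulib function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the largest length <= MAX at which the
   UTF-8 string S, of length N bytes, can be cut without splitting a
   multibyte character.  UTF-8 continuation bytes have the bit pattern
   10xxxxxx; a cut must never land just before one of them. */
static size_t utf8_safe_cut (const char *s, size_t n, size_t max)
{
  if (max >= n)
    return n;
  /* Back up while the byte at the cut position is a continuation byte. */
  while (max > 0 && ((unsigned char) s[max] & 0xC0) == 0x80)
    max--;
  return max;
}
```

With S = "caf\xc3\xa9" (the word "café", 5 bytes), a naive cut at 4 bytes would keep only the first half of the two-byte "é"; the helper returns 3 instead, dropping the whole character.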
Workarounds can be found in GNU gnulib,
`http://www.gnu.org/software/gnulib/'.

* gnulib has modules `mbchar', `mbiter', `mbuiter' that represent
  multibyte characters and make it possible to iterate across a
  multibyte string with the same ease as through a unibyte string.

* gnulib has functions `mbslen' and `mbswidth' that can be used
  instead of `strlen' when the number of characters or the number of
  screen columns of a string is requested.

* gnulib has functions `mbschr' and `mbsrchr' that are like
  `strchr' and `strrchr', but work in multibyte locales.

* gnulib has a function `mbsstr' that is like `strstr', but works in
  multibyte locales.

* gnulib has functions `mbscspn', `mbspbrk', `mbsspn' that are like
  `strcspn', `strpbrk', `strspn', but work in multibyte locales.

* gnulib has functions `mbssep' and `mbstok_r' that are like
  `strsep' and `strtok_r', but work in multibyte locales.

* gnulib has functions `mbscasecmp', `mbsncasecmp', `mbspcasecmp',
  and `mbscasestr' that are like `strcasecmp', `strncasecmp', and
  `strcasestr', but work in multibyte locales. Still, the function
  `ulc_casecmp' is preferable to these functions; see below.
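To illustrate the kind of work a function like `mbslen' does, here is a minimal sketch built only on the standard ISO C function `mbrlen'. It is not gnulib's implementation, and counting each invalid or incomplete byte as one character is a choice made for this sketch only; `count_chars' is a hypothetical name.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Counts the characters of the NUL-terminated string S in the current
   locale encoding (the LC_CTYPE category), using mbrlen to find the
   byte length of each multibyte character. */
static size_t count_chars (const char *s)
{
  mbstate_t state;
  size_t count = 0;
  size_t len = strlen (s);

  memset (&state, 0, sizeof state);
  while (len > 0)
    {
      size_t k = mbrlen (s, len, &state);
      if (k == (size_t) -1 || k == (size_t) -2)
        {
          /* Invalid or truncated sequence: count the byte as one
             character and reset the conversion state. */
          k = 1;
          memset (&state, 0, sizeof state);
        }
      else if (k == 0)
        break;  /* only reached at a NUL byte, kept for safety */
      s += k;
      len -= k;
      count++;
    }
  return count;
}
```

The result depends on the current locale, so a program would normally call `setlocale (LC_ALL, "")' at startup; without that call the "C" locale is in effect, where every byte of an ASCII string counts as one character.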
The second problem with the C library API is that it has some
built-in assumptions that are not valid in some languages:

* It assumes that there are only two forms of every character:
  uppercase and lowercase. This is not true for Croatian, where the
  character LETTER DZ WITH CARON comes in three forms: LATIN CAPITAL
  LETTER DZ WITH CARON (DZ), LATIN CAPITAL LETTER D WITH SMALL
  LETTER Z WITH CARON (Dz), LATIN SMALL LETTER DZ WITH CARON (dz).

* It assumes that uppercasing of 1 character leads to 1 character.
  This is not true for German, where the LATIN SMALL LETTER SHARP S,
  when uppercased, becomes `SS'.

* It assumes that there is a 1:1 mapping between uppercase and
  lowercase forms. This is not true for the Greek sigma: GREEK
  CAPITAL LETTER SIGMA is the uppercase of both GREEK SMALL LETTER
  SIGMA and GREEK SMALL LETTER FINAL SIGMA.

* It assumes that the upper/lowercase mappings are position
  independent. This is not true for the Greek sigma, for example:
  when lowercased, GREEK CAPITAL LETTER SIGMA becomes GREEK SMALL
  LETTER FINAL SIGMA at the end of a word and GREEK SMALL LETTER
  SIGMA elsewhere.

The correct way to deal with this problem is

1. to provide functions for titlecasing, as well as for upper- and
   lowercasing,

2. to view case transformations as functions that operate on strings,
   rather than on characters.

This is implemented in this library, through the functions declared
in `<unicase.h>'; see *note unicase.h::.
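Why must case mapping operate on strings? The toy sketch below, for illustration only, uppercases UTF-8 input covering just ASCII a-z and U+00DF LATIN SMALL LETTER SHARP S (bytes C3 9F), which becomes "SS". Because one input character can yield two output characters, the function has to write into an output buffer and report a length, something a per-character `toupper' cannot express. `toy_toupper' is hypothetical; real code should use the `<unicase.h>' functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy UTF-8 uppercaser: maps a-z to A-Z and U+00DF to "SS"; every
   other byte is copied unchanged.  Writes to OUT (capacity OUTSIZE
   bytes) and returns the output length, or (size_t) -1 if OUT is too
   small. */
static size_t toy_toupper (const char *in, size_t n, char *out, size_t outsize)
{
  size_t j = 0;
  for (size_t i = 0; i < n; i++)
    {
      unsigned char c = (unsigned char) in[i];
      if (c == 0xC3 && i + 1 < n && (unsigned char) in[i + 1] == 0x9F)
        {
          /* One character in, two characters out. */
          if (j + 2 > outsize)
            return (size_t) -1;
          out[j++] = 'S';
          out[j++] = 'S';
          i++;  /* skip the second byte of U+00DF */
        }
      else
        {
          if (j + 1 > outsize)
            return (size_t) -1;
          out[j++] = (c >= 'a' && c <= 'z') ? (char) (c - 'a' + 'A') : (char) c;
        }
    }
  return j;
}
```

The German word "straße" has 6 characters, but its uppercase form "STRASSE" has 7, so the caller cannot assume that the character count is preserved.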
File: libunistring.info, Node: The wchar_t mess, Next: Unicode strings, Prev: char * strings, Up: Introduction

1.6 The `wchar_t' mess
======================

The ISO C and POSIX standard creators made an attempt to fix the
first problem mentioned in the previous section. They introduced

* a type `wchar_t', designed to encapsulate an entire character,

* a "wide string" type `wchar_t *', and

* functions declared in `<wctype.h>' that were meant to supplant the
  ones in `<ctype.h>'.

Unfortunately, this API and its implementation have numerous problems:

* On AIX and Windows platforms, `wchar_t' is a 16-bit type. This
  means that it can never accommodate an entire Unicode character.
  Either the `wchar_t *' strings are limited to characters in UCS-2
  (the "Basic Multilingual Plane" of Unicode), or -- if `wchar_t *'
  strings are encoded in UTF-16 -- a `wchar_t' represents only half
  of a character in the worst case, making the `<wctype.h>' functions
  pointless.

* On Solaris and FreeBSD, the `wchar_t' encoding is locale dependent
  and undocumented. This means that if you want to know any property
  of a `wchar_t' character, other than the properties defined by
  `<wctype.h>' -- such as whether it's a dash, currency symbol,
  paragraph separator, or similar --, you have to convert it to
  `char *' encoding first, by use of the function `wctomb'.

* When you read a stream of wide characters, through the functions
  `fgetwc' and `fgetws', and the input stream/file is not in the
  expected encoding, you have no way to determine the invalid byte
  sequence and take corrective action. If you use these functions,
  your program becomes "garbage in - more garbage out" or "garbage
  in - abort".
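The "half of a character" problem with a 16-bit `wchar_t' comes from UTF-16 surrogate pairs. This sketch (plain C; `to_surrogates' is a hypothetical helper) shows how a character outside the Basic Multilingual Plane is split into two 16-bit units, neither of which is a character on its own, so no function that classifies one unit at a time can give a meaningful answer for it.

```c
#include <assert.h>
#include <stdint.h>

/* Splits a supplementary-plane code point (U+10000..U+10FFFF) into a
   UTF-16 surrogate pair, following the UTF-16 definition: subtract
   0x10000, then distribute the remaining 20 bits over two units. */
static void to_surrogates (uint32_t uc, uint16_t *hi, uint16_t *lo)
{
  uint32_t v = uc - 0x10000;                /* 20-bit value */
  *hi = (uint16_t) (0xD800 + (v >> 10));    /* top 10 bits */
  *lo = (uint16_t) (0xDC00 + (v & 0x3FF));  /* bottom 10 bits */
}
```

For instance, U+1F600 becomes the pair 0xD83D 0xDE00; a 16-bit `wchar_t' holding only 0xD83D does not represent any character.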
As a consequence, it is better to use multibyte strings, as
explained in the previous section. Such multibyte strings can bypass
limitations of the `wchar_t' type, if you use functions defined in
gnulib and libunistring for text processing. They can also faithfully
transport malformed characters that were present in the input, without
requiring the program to produce garbage or abort.
File: libunistring.info, Node: Unicode strings, Prev: The wchar_t mess, Up: Introduction

1.7 Unicode strings
===================

libunistring supports Unicode strings in three representations:

* UTF-8 strings, through the type `uint8_t *'. The units are bytes
  (`uint8_t').

* UTF-16 strings, through the type `uint16_t *'. The units are
  16-bit memory words (`uint16_t').

* UTF-32 strings, through the type `uint32_t *'. The units are
  32-bit memory words (`uint32_t').

As with C strings, there are two variants:

* Unicode strings with a terminating NUL character are represented as
  a pointer to the first unit of the string. There is a unit
  containing a 0 value at the end. It is considered part of the
  string for all memory allocation purposes, but is not considered
  part of the string for all other logical purposes.

* Unicode strings where embedded NUL characters are allowed. These
  are represented by a pointer to the first unit and the number of
  units (not bytes!) of the string. In this setting, there is no
  trailing zero-valued unit used as "end marker".
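The two variants can view the same memory differently. In this sketch (plain C; both helper names are hypothetical), one UTF-16 array read as a NUL-terminated string ends at the first 0 unit, while read as a (pointer, length) string the 0 unit is ordinary content.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* NUL-terminated variant: counts units up to, but not including, the
   first 0 unit (the same quantity u16_strlen reports). */
static size_t u16_units_until_nul (const uint16_t *s)
{
  size_t n = 0;
  while (s[n] != 0)
    n++;
  return n;
}

/* (pointer, length) variant: the length is passed explicitly, so
   U+0000 units are simply part of the string.  This counts them. */
static size_t count_nul_units (const uint16_t *s, size_t n)
{
  size_t k = 0;
  for (size_t i = 0; i < n; i++)
    if (s[i] == 0)
      k++;
  return k;
}
```

Given the units { 'H', 'i', 0, '!', 0 }, the NUL-terminated view is "Hi" (2 units), whereas the (pointer, 4) view is a 4-unit string containing one embedded NUL.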
File: libunistring.info, Node: Conventions, Next: unitypes.h, Prev: Introduction, Up: Top

2 Conventions
*************

This chapter explains conventions valid throughout the libunistring
library.

Variables of type `char *' denote C strings in locale encoding. See
*note Locale encodings::.

Variables of type `uint8_t *' denote UTF-8 strings. Their units are
bytes.

Variables of type `uint16_t *' denote UTF-16 strings, without byte
order mark. Their units are 2-byte words.

Variables of type `uint32_t *' denote UTF-32 strings, without byte
order mark. Their units are 4-byte words.

Argument pairs `(S, N)' denote a string `S[0..N-1]' with exactly N
units.

All functions with prefix `ulc_' operate on C strings in locale
encoding.

All functions with prefix `u8_' operate on UTF-8 strings.

All functions with prefix `u16_' operate on UTF-16 strings.

All functions with prefix `u32_' operate on UTF-32 strings.

For every function with prefix `u8_', operating on UTF-8 strings,
there is also a corresponding function with prefix `u16_', operating on
UTF-16 strings, and a corresponding function with prefix `u32_',
operating on UTF-32 strings. Their descriptions are analogous; in this
documentation we describe only the function that operates on UTF-8
strings, for brevity.

A declaration with a variable N denotes the three concrete
declarations with N = 8, N = 16, N = 32.

All parameters starting with `str' and the parameters of functions
starting with `u8_str'/`u16_str'/`u32_str' denote a NUL terminated
string.

Error values are always returned through the `errno' variable,
usually with a return value that indicates the presence of an error
(NULL for functions that return a pointer, or -1 for functions that
return an `int').

Functions returning a string result take a `(RESULTBUF, LENGTHP)'
argument pair. If RESULTBUF is not NULL and the result fits into
`*LENGTHP' units, it is put in RESULTBUF, and RESULTBUF is returned.
Otherwise, a freshly allocated string is returned. In both cases,
`*LENGTHP' is set to the length (number of units) of the returned
string. In case of error, NULL is returned and `errno' is set.
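The `(RESULTBUF, LENGTHP)' convention is easiest to see from the implementer's side. Below is a minimal sketch, assuming a made-up transformation (`double_units', which duplicates every unit) chosen only to show the buffer handling; it is not a libunistring function.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical function following the (RESULTBUF, LENGTHP) convention.
   On entry, *LENGTHP is the capacity of RESULTBUF (if non-NULL).  If the
   result fits, it is stored in RESULTBUF; otherwise a fresh buffer is
   malloc'ed.  On success *LENGTHP is set to the result length; on
   error NULL is returned and errno is set. */
static uint8_t *
double_units (const uint8_t *s, size_t n, uint8_t *resultbuf, size_t *lengthp)
{
  uint8_t *result;
  size_t needed;

  if (n > SIZE_MAX / 2)
    {
      errno = ENOMEM;
      return NULL;
    }
  needed = 2 * n;
  if (resultbuf != NULL && needed <= *lengthp)
    result = resultbuf;
  else
    {
      /* Request at least 1 byte, since malloc (0) may return NULL. */
      result = (uint8_t *) malloc (needed > 0 ? needed : 1);
      if (result == NULL)
        {
          errno = ENOMEM;
          return NULL;
        }
    }
  for (size_t i = 0; i < n; i++)
    result[2 * i] = result[2 * i + 1] = s[i];
  *lengthp = needed;
  return result;
}
```

On the caller's side, the result must be freed only when it differs from RESULTBUF; the same pattern applies to the library functions documented with this argument pair, such as `u8_to_u16'.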
File: libunistring.info, Node: unitypes.h, Next: unistr.h, Prev: Conventions, Up: Top

3 Elementary types `<unitypes.h>'
*********************************

The include file `<unitypes.h>' provides the following basic types.

-- Type: uint8_t
-- Type: uint16_t
-- Type: uint32_t
     These are the storage units of UTF-8/16/32 strings, respectively.
     The definitions are taken from `<stdint.h>', on platforms where
     this include file is present.

-- Type: ucs4_t
     This type represents a single Unicode character, outside of a
     UTF-32 string.
File: libunistring.info, Node: unistr.h, Next: uniconv.h, Prev: unitypes.h, Up: Top

4 Elementary Unicode string functions `<unistr.h>'
**************************************************

This include file declares elementary functions for Unicode strings.
It is essentially the equivalent of what `<string.h>' is for C strings.

* Menu:

* Elementary string checks::
* Elementary string conversions::
* Elementary string functions::
* Elementary string functions with memory allocation::
* Elementary string functions on NUL terminated strings::
File: libunistring.info, Node: Elementary string checks, Next: Elementary string conversions, Up: unistr.h

4.1 Elementary string checks
============================

The following function is available to verify the integrity of a
Unicode string.

-- Function: const uint8_t * u8_check (const uint8_t *S, size_t N)
-- Function: const uint16_t * u16_check (const uint16_t *S, size_t N)
-- Function: const uint32_t * u32_check (const uint32_t *S, size_t N)
     This function checks whether a Unicode string is well-formed. It
     returns NULL if valid, or a pointer to the first invalid unit
     otherwise.
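To show what "well-formed" means for UTF-8, here is a standalone sketch with the same contract as `u8_check' (NULL if valid, else a pointer to the first invalid unit). It is written from the UTF-8 definition, not taken from libunistring; `utf8_check_sketch' is a hypothetical name. It rejects bad lead bytes, missing continuation bytes, truncated sequences, overlong forms, surrogates and values above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static const uint8_t *
utf8_check_sketch (const uint8_t *s, size_t n)
{
  const uint8_t *end = s + n;
  while (s < end)
    {
      uint8_t c = s[0];
      size_t len;
      uint32_t uc, min;

      if (c < 0x80) { s++; continue; }   /* ASCII */
      else if (c >= 0xC2 && c <= 0xDF) { len = 2; uc = c & 0x1F; min = 0x80; }
      else if (c >= 0xE0 && c <= 0xEF) { len = 3; uc = c & 0x0F; min = 0x800; }
      else if (c >= 0xF0 && c <= 0xF4) { len = 4; uc = c & 0x07; min = 0x10000; }
      else
        return s;                        /* invalid lead byte */

      if ((size_t) (end - s) < len)
        return s;                        /* truncated sequence */
      for (size_t i = 1; i < len; i++)
        {
          if ((s[i] & 0xC0) != 0x80)
            return s;                    /* missing continuation byte */
          uc = (uc << 6) | (s[i] & 0x3F);
        }
      if (uc < min || uc > 0x10FFFF || (uc >= 0xD800 && uc <= 0xDFFF))
        return s;                        /* overlong, too large, or surrogate */
      s += len;
    }
  return NULL;
}
```

Note that the returned pointer designates the start of the offending sequence: for the bytes 'a' C3 'b', it points at C3, because C3 announces a two-byte character that 'b' does not complete.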
File: libunistring.info, Node: Elementary string conversions, Next: Elementary string functions, Prev: Elementary string checks, Up: unistr.h

4.2 Elementary string conversions
=================================

The following functions perform conversions between the different
forms of Unicode strings.

-- Function: uint16_t * u8_to_u16 (const uint8_t *S, size_t N,
          uint16_t *RESULTBUF, size_t *LENGTHP)
     Converts a UTF-8 string to a UTF-16 string.

-- Function: uint32_t * u8_to_u32 (const uint8_t *S, size_t N,
          uint32_t *RESULTBUF, size_t *LENGTHP)
     Converts a UTF-8 string to a UTF-32 string.

-- Function: uint8_t * u16_to_u8 (const uint16_t *S, size_t N,
          uint8_t *RESULTBUF, size_t *LENGTHP)
     Converts a UTF-16 string to a UTF-8 string.

-- Function: uint32_t * u16_to_u32 (const uint16_t *S, size_t N,
          uint32_t *RESULTBUF, size_t *LENGTHP)
     Converts a UTF-16 string to a UTF-32 string.

-- Function: uint8_t * u32_to_u8 (const uint32_t *S, size_t N,
          uint8_t *RESULTBUF, size_t *LENGTHP)
     Converts a UTF-32 string to a UTF-8 string.

-- Function: uint16_t * u32_to_u16 (const uint32_t *S, size_t N,
          uint16_t *RESULTBUF, size_t *LENGTHP)
     Converts a UTF-32 string to a UTF-16 string.
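As an illustration of what one of these conversions involves, here is a sketch of the UTF-16 to UTF-32 direction. It is simplified compared to `u16_to_u32': it writes into a caller-supplied buffer that must hold at least N units (UTF-32 never needs more units than UTF-16) instead of following the `(RESULTBUF, LENGTHP)' convention. `utf16_to_utf32_sketch' is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Converts S[0..N-1] from UTF-16 to UTF-32 into OUT (room for N units).
   Returns the number of units written, or (size_t) -1 with errno set
   to EILSEQ if an unpaired surrogate is found. */
static size_t
utf16_to_utf32_sketch (const uint16_t *s, size_t n, uint32_t *out)
{
  size_t j = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint32_t u = s[i];
      if (u >= 0xD800 && u <= 0xDBFF)          /* high surrogate */
        {
          if (i + 1 < n && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            {
              /* Combine the pair back into one code point. */
              u = 0x10000 + ((u - 0xD800) << 10) + (uint32_t) (s[i + 1] - 0xDC00);
              i++;
            }
          else
            {
              errno = EILSEQ;
              return (size_t) -1;
            }
        }
      else if (u >= 0xDC00 && u <= 0xDFFF)     /* stray low surrogate */
        {
          errno = EILSEQ;
          return (size_t) -1;
        }
      out[j++] = u;
    }
  return j;
}
```

The units 0x0041 0xD83D 0xDE00 become the two UTF-32 units U+0041 and U+1F600, so the output can be shorter than the input but never longer.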
File: libunistring.info, Node: Elementary string functions, Next: Elementary string functions with memory allocation, Prev: Elementary string conversions, Up: unistr.h

4.3 Elementary string functions
===============================

The following functions inspect and return details about the first
character in a Unicode string.

-- Function: int u8_mblen (const uint8_t *S, size_t N)
-- Function: int u16_mblen (const uint16_t *S, size_t N)
-- Function: int u32_mblen (const uint32_t *S, size_t N)
     Returns the length (number of units) of the first character in S;
     this length is at most N. Returns 0 if it is the NUL character.
     Returns -1 upon failure.

     This function is similar to `mblen', except that it operates on a
     Unicode string and that S must not be NULL.

-- Function: int u8_mbtouc_unsafe (ucs4_t *PUC, const uint8_t *S,
          size_t N)
-- Function: int u16_mbtouc_unsafe (ucs4_t *PUC, const uint16_t *S,
          size_t N)
-- Function: int u32_mbtouc_unsafe (ucs4_t *PUC, const uint32_t *S,
          size_t N)
     Returns the length (number of units) of the first character in S,
     putting its `ucs4_t' representation in `*PUC'. Upon failure,
     `*PUC' is set to `0xfffd', and an appropriate number of units is
     returned.

     The number of available units, N, must be > 0.

     This function is similar to `mbtowc', except that it operates on a
     Unicode string, PUC and S must not be NULL, N must be > 0, and the
     NUL character is not treated specially.

-- Function: int u8_mbtouc (ucs4_t *PUC, const uint8_t *S, size_t N)
-- Function: int u16_mbtouc (ucs4_t *PUC, const uint16_t *S, size_t N)
-- Function: int u32_mbtouc (ucs4_t *PUC, const uint32_t *S, size_t N)
     This function is like `u8_mbtouc_unsafe', except that it will
     detect an invalid UTF-8 character, even if the library is compiled
     without `--enable-safety'.

-- Function: int u8_mbtoucr (ucs4_t *PUC, const uint8_t *S, size_t N)
-- Function: int u16_mbtoucr (ucs4_t *PUC, const uint16_t *S, size_t N)
-- Function: int u32_mbtoucr (ucs4_t *PUC, const uint32_t *S, size_t N)
     Returns the length (number of units) of the first character in S,
     putting its `ucs4_t' representation in `*PUC'. Upon failure,
     `*PUC' is set to `0xfffd', and -1 is returned for an invalid
     sequence of units, -2 is returned for an incomplete sequence of
     units.

     The number of available units, N, must be > 0.

     This function is similar to `u8_mbtouc', except that the return
     value gives more details about the failure, similar to `mbrtowc'.
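The distinction between -1 (invalid) and -2 (incomplete) matters when decoding input that arrives in chunks: -2 means "wait for more units". The sketch below mimics the `u8_mbtoucr' return convention for UTF-8; it is a simplified standalone decoder, not libunistring's code, and `utf8_decode_r' is a hypothetical name. One simplification: for a truncated sequence it reports -2 whenever the bytes seen so far have the right shape, without checking whether any completion could be valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes the first character of S[0..N-1] into *PUC and returns its
   length.  On failure, *PUC is 0xFFFD and the result is -1 (invalid
   sequence) or -2 (incomplete sequence).  N must be > 0. */
static int
utf8_decode_r (uint32_t *puc, const uint8_t *s, size_t n)
{
  uint8_t c;
  int len;
  uint32_t uc, min;

  assert (n > 0);
  c = s[0];
  *puc = 0xFFFD;
  if (c < 0x80) { *puc = c; return 1; }
  else if (c >= 0xC2 && c <= 0xDF) { len = 2; uc = c & 0x1F; min = 0x80; }
  else if (c >= 0xE0 && c <= 0xEF) { len = 3; uc = c & 0x0F; min = 0x800; }
  else if (c >= 0xF0 && c <= 0xF4) { len = 4; uc = c & 0x07; min = 0x10000; }
  else
    return -1;                   /* invalid lead byte */

  for (int i = 1; i < len; i++)
    {
      if ((size_t) i >= n)
        return -2;               /* ran out of units: incomplete */
      if ((s[i] & 0xC0) != 0x80)
        return -1;               /* not a continuation byte */
      uc = (uc << 6) | (s[i] & 0x3F);
    }
  if (uc < min || uc > 0x10FFFF || (uc >= 0xD800 && uc <= 0xDFFF))
    return -1;                   /* overlong, too large, or surrogate */
  *puc = uc;
  return len;
}
```

A streaming reader that gets -2 would keep the pending bytes and retry once more input has arrived, whereas -1 calls for error handling or substitution with U+FFFD.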
The following function stores a Unicode character as a Unicode
string in memory.

-- Function: int u8_uctomb (uint8_t *S, ucs4_t UC, int N)
-- Function: int u16_uctomb (uint16_t *S, ucs4_t UC, int N)
-- Function: int u32_uctomb (uint32_t *S, ucs4_t UC, int N)
     Puts the multibyte character represented by UC in S, returning its
     length. Returns -1 upon failure, -2 if the number of available
     units, N, is too small. The latter case cannot occur if N is at
     least the maximum length of a character in the given encoding
     form (4 units for UTF-8, 2 for UTF-16, 1 for UTF-32).

     This function is similar to `wctomb', except that it operates on a
     Unicode string, S must not be NULL, and the number of available
     units, N, is passed as an argument.
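The encoding direction can be sketched the same way. This standalone encoder follows the `u8_uctomb' return convention described above (length, -1 for an invalid UC, -2 if N is too small); it is written from the UTF-8 definition and `utf8_encode' is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Stores UC as UTF-8 in S, which has room for N units.  Returns the
   number of units written, -1 if UC is not a Unicode scalar value, or
   -2 if N is too small. */
static int
utf8_encode (uint8_t *s, uint32_t uc, int n)
{
  int len;

  if (uc < 0x80) len = 1;
  else if (uc < 0x800) len = 2;
  else if (uc >= 0xD800 && uc <= 0xDFFF) return -1;  /* surrogate */
  else if (uc < 0x10000) len = 3;
  else if (uc < 0x110000) len = 4;
  else return -1;                                    /* beyond U+10FFFF */

  if (n < len)
    return -2;
  switch (len)
    {
    case 1:
      s[0] = (uint8_t) uc;
      break;
    case 2:
      s[0] = (uint8_t) (0xC0 | (uc >> 6));
      s[1] = (uint8_t) (0x80 | (uc & 0x3F));
      break;
    case 3:
      s[0] = (uint8_t) (0xE0 | (uc >> 12));
      s[1] = (uint8_t) (0x80 | ((uc >> 6) & 0x3F));
      s[2] = (uint8_t) (0x80 | (uc & 0x3F));
      break;
    default:
      s[0] = (uint8_t) (0xF0 | (uc >> 18));
      s[1] = (uint8_t) (0x80 | ((uc >> 12) & 0x3F));
      s[2] = (uint8_t) (0x80 | ((uc >> 6) & 0x3F));
      s[3] = (uint8_t) (0x80 | (uc & 0x3F));
      break;
    }
  return len;
}
```

With a 4-byte buffer, the -2 case never occurs for a valid code point, which is why callers commonly size the buffer for the longest possible character.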
The following functions copy Unicode strings in memory.

-- Function: uint8_t * u8_cpy (uint8_t *DEST, const uint8_t *SRC,
          size_t N)
-- Function: uint16_t * u16_cpy (uint16_t *DEST, const uint16_t *SRC,
          size_t N)
-- Function: uint32_t * u32_cpy (uint32_t *DEST, const uint32_t *SRC,
          size_t N)
     Copies N units from SRC to DEST.

     This function is similar to `memcpy', except that it operates on
     Unicode strings.

-- Function: uint8_t * u8_move (uint8_t *DEST, const uint8_t *SRC,
          size_t N)
-- Function: uint16_t * u16_move (uint16_t *DEST, const uint16_t *SRC,
          size_t N)
-- Function: uint32_t * u32_move (uint32_t *DEST, const uint32_t *SRC,
          size_t N)
     Copies N units from SRC to DEST, guaranteeing correct behavior for
     overlapping memory areas.

     This function is similar to `memmove', except that it operates on
     Unicode strings.

The following function fills a Unicode string.

-- Function: uint8_t * u8_set (uint8_t *S, ucs4_t UC, size_t N)
-- Function: uint16_t * u16_set (uint16_t *S, ucs4_t UC, size_t N)
-- Function: uint32_t * u32_set (uint32_t *S, ucs4_t UC, size_t N)
     Sets the first N characters of S to UC. UC should be a character
     that occupies only 1 unit.

     This function is similar to `memset', except that it operates on
     Unicode strings.

The following function compares two Unicode strings of the same
length.

-- Function: int u8_cmp (const uint8_t *S1, const uint8_t *S2,
          size_t N)
-- Function: int u16_cmp (const uint16_t *S1, const uint16_t *S2,
          size_t N)
-- Function: int u32_cmp (const uint32_t *S1, const uint32_t *S2,
          size_t N)
     Compares S1 and S2, each of length N, lexicographically. Returns
     a negative value if S1 compares smaller than S2, a positive value
     if S1 compares larger than S2, or 0 if they compare equal.

     This function is similar to `memcmp', except that it operates on
     Unicode strings.

The following function compares two Unicode strings of possibly
different lengths.

-- Function: int u8_cmp2 (const uint8_t *S1, size_t N1,
          const uint8_t *S2, size_t N2)
-- Function: int u16_cmp2 (const uint16_t *S1, size_t N1,
          const uint16_t *S2, size_t N2)
-- Function: int u32_cmp2 (const uint32_t *S1, size_t N1,
          const uint32_t *S2, size_t N2)
     Compares S1 and S2, lexicographically. Returns a negative value
     if S1 compares smaller than S2, a positive value if S1 compares
     larger than S2, or 0 if they compare equal.

     This function is similar to the gnulib function `memcmp2', except
     that it operates on Unicode strings.
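For UTF-8, comparing bytes gives the same order as comparing code points, so a comparison with the contract of `u8_cmp2' can be sketched with `memcmp' on the common prefix plus a length tie-break. This is a sketch, not libunistring's code, and `utf8_cmp2' is a hypothetical name. The byte-order shortcut is specific to UTF-8: in UTF-16, surrogate units (0xD800..0xDFFF) sort below U+E000..U+FFFF even though the characters they encode are larger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compares S1[0..N1-1] and S2[0..N2-1] lexicographically.  If one
   string is a prefix of the other, the shorter one sorts first. */
static int
utf8_cmp2 (const uint8_t *s1, size_t n1, const uint8_t *s2, size_t n2)
{
  size_t n = n1 < n2 ? n1 : n2;
  int cmp = memcmp (s1, s2, n);
  if (cmp != 0)
    return cmp;
  return n1 < n2 ? -1 : n1 > n2 ? 1 : 0;
}
```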
The following function searches for a given Unicode character.

-- Function: uint8_t * u8_chr (const uint8_t *S, size_t N, ucs4_t UC)
-- Function: uint16_t * u16_chr (const uint16_t *S, size_t N,
          ucs4_t UC)
-- Function: uint32_t * u32_chr (const uint32_t *S, size_t N,
          ucs4_t UC)
     Searches the N units at S for UC. Returns a pointer to the first
     occurrence of UC in S, or NULL if UC does not occur in S.

     This function is similar to `memchr', except that it operates on
     Unicode strings.

The following function counts the number of Unicode characters.

-- Function: size_t u8_mbsnlen (const uint8_t *S, size_t N)
-- Function: size_t u16_mbsnlen (const uint16_t *S, size_t N)
-- Function: size_t u32_mbsnlen (const uint32_t *S, size_t N)
     Counts and returns the number of Unicode characters in the N units
     from S.

     This function is similar to the gnulib function `mbsnlen', except
     that it operates on Unicode strings.
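The difference between units and characters, which is what separates `u8_mbsnlen' from a plain length, can be sketched for well-formed UTF-8: each character has exactly one byte that is not a continuation byte (10xxxxxx), so counting those bytes counts characters. This sketch does not treat ill-formed input specially, and `utf8_count_chars' is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of characters in the well-formed UTF-8 string S[0..N-1]. */
static size_t
utf8_count_chars (const uint8_t *s, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if ((s[i] & 0xC0) != 0x80)  /* lead byte or ASCII byte */
      count++;
  return count;
}
```

For "café" encoded as 5 bytes, the unit count is 5 but the character count is 4.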
File: libunistring.info, Node: Elementary string functions with memory allocation, Next: Elementary string functions on NUL terminated strings, Prev: Elementary string functions, Up: unistr.h

4.4 Elementary string functions with memory allocation
======================================================

The following function copies a Unicode string.

-- Function: uint8_t * u8_cpy_alloc (const uint8_t *S, size_t N)
-- Function: uint16_t * u16_cpy_alloc (const uint16_t *S, size_t N)
-- Function: uint32_t * u32_cpy_alloc (const uint32_t *S, size_t N)
     Makes a freshly allocated copy of S, of length N.
856 File: libunistring.info, Node: Elementary string functions on NUL terminated strings, Prev: Elementary string functions with memory allocation, Up: unistr.h
858 4.5 Elementary string functions on NUL terminated strings
859 =========================================================
861 The following functions inspect and return details about the first
862 character in a Unicode string.
864 -- Function: int u8_strmblen (const uint8_t *S)
865 -- Function: int u16_strmblen (const uint16_t *S)
866 -- Function: int u32_strmblen (const uint32_t *S)
867 Returns the length (number of units) of the first character in S.
868 Returns 0 if it is the NUL character. Returns -1 upon failure.
870 -- Function: int u8_strmbtouc (ucs4_t *PUC, const uint8_t *S)
871 -- Function: int u16_strmbtouc (ucs4_t *PUC, const uint16_t *S)
872 -- Function: int u32_strmbtouc (ucs4_t *PUC, const uint32_t *S)
873 Returns the length (number of units) of the first character in S,
874 putting its `ucs4_t' representation in `*PUC'. Returns 0 if it is
875 the NUL character. Returns -1 upon failure.
877 -- Function: const uint8_t * u8_next (ucs4_t *PUC, const uint8_t *S)
878 -- Function: const uint16_t * u16_next (ucs4_t *PUC, const uint16_t *S)
879 -- Function: const uint32_t * u32_next (ucs4_t *PUC, const uint32_t *S)
880 Forward iteration step. Advances the pointer past the next
881 character, or returns NULL if the end of the string has been
882 reached. Puts the character's `ucs4_t' representation in `*PUC'.
884 The following function inspects and returns details about the
885 previous character in a Unicode string.
887 -- Function: const uint8_t * u8_prev (ucs4_t *PUC, const uint8_t *S,
888 const uint8_t *START)
889 -- Function: const uint16_t * u16_prev (ucs4_t *PUC, const uint16_t
890 *S, const uint16_t *START)
891 -- Function: const uint32_t * u32_prev (ucs4_t *PUC, const uint32_t
892 *S, const uint32_t *START)
893 Backward iteration step. Advances the pointer to point to the
894 previous character, or returns NULL if the beginning of the string
895 has been reached. Puts the character's `ucs4_t' representation in `*PUC'.
898 The following functions determine the length of a Unicode string.
900 -- Function: size_t u8_strlen (const uint8_t *S)
901 -- Function: size_t u16_strlen (const uint16_t *S)
902 -- Function: size_t u32_strlen (const uint32_t *S)
903 Returns the number of units in S.
905 This function is similar to `strlen' and `wcslen', except that it
906 operates on Unicode strings.
908 -- Function: size_t u8_strnlen (const uint8_t *S, size_t MAXLEN)
909 -- Function: size_t u16_strnlen (const uint16_t *S, size_t MAXLEN)
910 -- Function: size_t u32_strnlen (const uint32_t *S, size_t MAXLEN)
911 Returns the number of units in S, but at most MAXLEN.
913 This function is similar to `strnlen' and `wcsnlen', except that
914 it operates on Unicode strings.
916 The following functions copy portions of Unicode strings in memory.
918 -- Function: uint8_t * u8_strcpy (uint8_t *DEST, const uint8_t *SRC)
919 -- Function: uint16_t * u16_strcpy (uint16_t *DEST, const uint16_t *SRC)
921 -- Function: uint32_t * u32_strcpy (uint32_t *DEST, const uint32_t *SRC)
Copies SRC to DEST.
925 This function is similar to `strcpy' and `wcscpy', except that it
926 operates on Unicode strings.
928 -- Function: uint8_t * u8_stpcpy (uint8_t *DEST, const uint8_t *SRC)
929 -- Function: uint16_t * u16_stpcpy (uint16_t *DEST, const uint16_t *SRC)
931 -- Function: uint32_t * u32_stpcpy (uint32_t *DEST, const uint32_t *SRC)
933 Copies SRC to DEST, returning the address of the terminating NUL in DEST.
936 This function is similar to `stpcpy', except that it operates on Unicode strings.
939 -- Function: uint8_t * u8_strncpy (uint8_t *DEST, const uint8_t *SRC, size_t N)
941 -- Function: uint16_t * u16_strncpy (uint16_t *DEST, const uint16_t *SRC, size_t N)
943 -- Function: uint32_t * u32_strncpy (uint32_t *DEST, const uint32_t *SRC, size_t N)
945 Copies no more than N units of SRC to DEST.
947 This function is similar to `strncpy' and `wcsncpy', except that
948 it operates on Unicode strings.
950 -- Function: uint8_t * u8_stpncpy (uint8_t *DEST, const uint8_t *SRC, size_t N)
952 -- Function: uint16_t * u16_stpncpy (uint16_t *DEST, const uint16_t *SRC, size_t N)
954 -- Function: uint32_t * u32_stpncpy (uint32_t *DEST, const uint32_t *SRC, size_t N)
956 Copies no more than N units of SRC to DEST. Returns a pointer
957 past the last non-NUL unit written into DEST. In other words, if
958 the units written into DEST include a NUL, the return value is the
959 address of the first such NUL unit, otherwise it is `DEST + N'.
961 This function is similar to `stpncpy', except that it operates on Unicode strings.
964 -- Function: uint8_t * u8_strcat (uint8_t *DEST, const uint8_t *SRC)
965 -- Function: uint16_t * u16_strcat (uint16_t *DEST, const uint16_t *SRC)
967 -- Function: uint32_t * u32_strcat (uint32_t *DEST, const uint32_t *SRC)
969 Appends SRC onto DEST.
971 This function is similar to `strcat' and `wcscat', except that it
972 operates on Unicode strings.
974 -- Function: uint8_t * u8_strncat (uint8_t *DEST, const uint8_t *SRC, size_t N)
976 -- Function: uint16_t * u16_strncat (uint16_t *DEST, const uint16_t *SRC, size_t N)
978 -- Function: uint32_t * u32_strncat (uint32_t *DEST, const uint32_t *SRC, size_t N)
980 Appends no more than N units of SRC onto DEST.
982 This function is similar to `strncat' and `wcsncat', except that
983 it operates on Unicode strings.
985 The following functions compare two Unicode strings.
987 -- Function: int u8_strcmp (const uint8_t *S1, const uint8_t *S2)
988 -- Function: int u16_strcmp (const uint16_t *S1, const uint16_t *S2)
989 -- Function: int u32_strcmp (const uint32_t *S1, const uint32_t *S2)
990 Compares S1 and S2, lexicographically. Returns a negative value
991 if S1 compares smaller than S2, a positive value if S1 compares
992 larger than S2, or 0 if they compare equal.
994 This function is similar to `strcmp' and `wcscmp', except that it
995 operates on Unicode strings.
997 -- Function: int u8_strcoll (const uint8_t *S1, const uint8_t *S2)
998 -- Function: int u16_strcoll (const uint16_t *S1, const uint16_t *S2)
999 -- Function: int u32_strcoll (const uint32_t *S1, const uint32_t *S2)
1000 Compares S1 and S2 using the collation rules of the current locale.
1001 Returns -1 if S1 < S2, 0 if S1 = S2, 1 if S1 > S2. Upon failure,
1002 sets `errno' and returns any value.
1004 This function is similar to `strcoll' and `wcscoll', except that
1005 it operates on Unicode strings.
1007 Note that this function may consider different canonical
1008 normalizations of the same string as having a large distance. It
1009 is therefore better to use the function `u8_normcoll' instead of
1010 this one; see *note uninorm.h::.
1012 -- Function: int u8_strncmp (const uint8_t *S1, const uint8_t *S2, size_t N)
1014 -- Function: int u16_strncmp (const uint16_t *S1, const uint16_t *S2, size_t N)
1016 -- Function: int u32_strncmp (const uint32_t *S1, const uint32_t *S2, size_t N)
1018 Compares no more than N units of S1 and S2.
1020 This function is similar to `strncmp' and `wcsncmp', except that
1021 it operates on Unicode strings.
1023 The following function allocates a duplicate of a Unicode string.
1025 -- Function: uint8_t * u8_strdup (const uint8_t *S)
1026 -- Function: uint16_t * u16_strdup (const uint16_t *S)
1027 -- Function: uint32_t * u32_strdup (const uint32_t *S)
1028 Duplicates S, returning an identical malloc'd string.
1030 This function is similar to `strdup' and `wcsdup', except that it
1031 operates on Unicode strings.
1033 The following functions search for a given Unicode character.
1035 -- Function: uint8_t * u8_strchr (const uint8_t *STR, ucs4_t UC)
1036 -- Function: uint16_t * u16_strchr (const uint16_t *STR, ucs4_t UC)
1037 -- Function: uint32_t * u32_strchr (const uint32_t *STR, ucs4_t UC)
1038 Finds the first occurrence of UC in STR.
1040 This function is similar to `strchr' and `wcschr', except that it
1041 operates on Unicode strings.
1043 -- Function: uint8_t * u8_strrchr (const uint8_t *STR, ucs4_t UC)
1044 -- Function: uint16_t * u16_strrchr (const uint16_t *STR, ucs4_t UC)
1045 -- Function: uint32_t * u32_strrchr (const uint32_t *STR, ucs4_t UC)
1046 Finds the last occurrence of UC in STR.
1048 This function is similar to `strrchr' and `wcsrchr', except that
1049 it operates on Unicode strings.
1051 The following functions search for the first occurrence of some
1052 Unicode character in or outside a given set of Unicode characters.
1054 -- Function: size_t u8_strcspn (const uint8_t *STR, const uint8_t *REJECT)
1056 -- Function: size_t u16_strcspn (const uint16_t *STR, const uint16_t *REJECT)
1058 -- Function: size_t u32_strcspn (const uint32_t *STR, const uint32_t *REJECT)
1060 Returns the length of the initial segment of STR which consists
1061 entirely of Unicode characters not in REJECT.
1063 This function is similar to `strcspn' and `wcscspn', except that
1064 it operates on Unicode strings.
1066 -- Function: size_t u8_strspn (const uint8_t *STR, const uint8_t *ACCEPT)
1068 -- Function: size_t u16_strspn (const uint16_t *STR, const uint16_t *ACCEPT)
1070 -- Function: size_t u32_strspn (const uint32_t *STR, const uint32_t *ACCEPT)
1072 Returns the length of the initial segment of STR which consists
1073 entirely of Unicode characters in ACCEPT.
1075 This function is similar to `strspn' and `wcsspn', except that it
1076 operates on Unicode strings.
1078 -- Function: uint8_t * u8_strpbrk (const uint8_t *STR, const uint8_t *ACCEPT)
1080 -- Function: uint16_t * u16_strpbrk (const uint16_t *STR, const uint16_t *ACCEPT)
1082 -- Function: uint32_t * u32_strpbrk (const uint32_t *STR, const uint32_t *ACCEPT)
1084 Finds the first occurrence in STR of any character in ACCEPT.
1086 This function is similar to `strpbrk' and `wcspbrk', except that
1087 it operates on Unicode strings.
1089 The following functions search whether a given Unicode string is a
1090 substring of another Unicode string.
1092 -- Function: uint8_t * u8_strstr (const uint8_t *HAYSTACK, const uint8_t *NEEDLE)
1094 -- Function: uint16_t * u16_strstr (const uint16_t *HAYSTACK, const uint16_t *NEEDLE)
1096 -- Function: uint32_t * u32_strstr (const uint32_t *HAYSTACK, const uint32_t *NEEDLE)
1098 Finds the first occurrence of NEEDLE in HAYSTACK.
1100 This function is similar to `strstr' and `wcsstr', except that it
1101 operates on Unicode strings.
1103 -- Function: bool u8_startswith (const uint8_t *STR, const uint8_t *PREFIX)
1105 -- Function: bool u16_startswith (const uint16_t *STR, const uint16_t *PREFIX)
1107 -- Function: bool u32_startswith (const uint32_t *STR, const uint32_t *PREFIX)
1109 Tests whether STR starts with PREFIX.
1111 -- Function: bool u8_endswith (const uint8_t *STR, const uint8_t *SUFFIX)
1113 -- Function: bool u16_endswith (const uint16_t *STR, const uint16_t *SUFFIX)
1115 -- Function: bool u32_endswith (const uint32_t *STR, const uint32_t *SUFFIX)
1117 Tests whether STR ends with SUFFIX.
1119 The following function does one step in tokenizing a Unicode string.
1121 -- Function: uint8_t * u8_strtok (uint8_t *STR, const uint8_t *DELIM, uint8_t **PTR)
1123 -- Function: uint16_t * u16_strtok (uint16_t *STR, const uint16_t
1124 *DELIM, uint16_t **PTR)
1125 -- Function: uint32_t * u32_strtok (uint32_t *STR, const uint32_t
1126 *DELIM, uint32_t **PTR)
1127 Divides STR into tokens separated by characters in DELIM.
1129 This function is similar to `strtok_r' and `wcstok', except that
1130 it operates on Unicode strings. Its interface is actually more
1131 similar to `wcstok' than to `strtok'.
1134 File: libunistring.info, Node: uniconv.h, Next: unistdio.h, Prev: unistr.h, Up: Top
1136 5 Conversions between Unicode and encodings `<uniconv.h>'
1137 *********************************************************
1139 This include file declares functions for converting between Unicode
1140 strings and `char *' strings in locale encoding or in other specified encodings.
1143 The following function returns the locale encoding.
1145 -- Function: const char * locale_charset ()
1146 Determines the current locale's character encoding, and
1147 canonicalizes it into one of the canonical names listed in
1148 `config.charset'. If the canonical name cannot be determined, the
1149 result is a non-canonical name.
1151 The result must not be freed; it is statically allocated.
1153 The result of this function can be used as an argument to the
1154 `iconv_open' function in GNU libc, in GNU libiconv, or in the
1155 gnulib provided wrapper around the native `iconv_open' function.
1156 It may not work as an argument to the native `iconv_open' function directly.
1159 The handling of unconvertible characters during the conversions can
1160 be parametrized through the following enumeration type:
1162 -- Type: enum iconv_ilseq_handler
1163 This type specifies how unconvertible characters in the input are handled.
1166 -- Constant: enum iconv_ilseq_handler iconveh_error
1167 This handler causes the function to return with `errno' set to `EILSEQ'.
1170 -- Constant: enum iconv_ilseq_handler iconveh_question_mark
1171 This handler produces one question mark `?' per unconvertible character.
1174 -- Constant: enum iconv_ilseq_handler iconveh_escape_sequence
1175 This handler produces an escape sequence `\uXXXX' or `\UXXXXXXXX'
1176 for each unconvertible character.
1178 The following functions convert between strings in a specified
1179 encoding and Unicode strings.
1181 -- Function: uint8_t * u8_conv_from_encoding (const char *FROMCODE,
1182 enum iconv_ilseq_handler HANDLER, const char *SRC, size_t
1183 SRCLEN, size_t *OFFSETS, uint8_t *RESULTBUF, size_t *LENGTHP)
1184 -- Function: uint16_t * u16_conv_from_encoding (const char *FROMCODE,
1185 enum iconv_ilseq_handler HANDLER, const char *SRC, size_t
1186 SRCLEN, size_t *OFFSETS, uint16_t *RESULTBUF, size_t *LENGTHP)
1187 -- Function: uint32_t * u32_conv_from_encoding (const char *FROMCODE,
1188 enum iconv_ilseq_handler HANDLER, const char *SRC, size_t
1189 SRCLEN, size_t *OFFSETS, uint32_t *RESULTBUF, size_t *LENGTHP)
1190 Converts an entire string, possibly including NUL bytes, from one
1191 encoding to UTF-8, UTF-16, or UTF-32 (for `u8_', `u16_', and `u32_' respectively).
1193 Converts a memory region given in encoding FROMCODE. FROMCODE is
1194 as for the `iconv_open' function.
1196 The input is in the memory region between SRC (inclusive) and `SRC
1197 + SRCLEN' (exclusive).
1199 If OFFSETS is not NULL, it should point to an array of SRCLEN
1200 integers; this array is filled with offsets into the result, i.e.
1201 the character starting at `SRC[i]' corresponds to the character
1202 starting at `RESULT[OFFSETS[i]]', and other offsets are set to `(size_t)(-1)'.
1205 `RESULTBUF' and `*LENGTHP' should be a scratch buffer and its
1206 size, or `RESULTBUF' can be NULL.
1208 May erase the contents of the memory at `RESULTBUF'.
1210 If successful: The resulting Unicode string (non-NULL) is returned
1211 and its length stored in `*LENGTHP'. The resulting string is
1212 `RESULTBUF' if no dynamic memory allocation was necessary, or a
1213 freshly allocated memory block otherwise.
1215 In case of error: NULL is returned and `errno' is set. Particular
1216 `errno' values: `EINVAL', `EILSEQ', `ENOMEM'.
1218 -- Function: char * u8_conv_to_encoding (const char *TOCODE, enum
1219 iconv_ilseq_handler HANDLER, const uint8_t *SRC, size_t
1220 SRCLEN, size_t *OFFSETS, char *RESULTBUF, size_t *LENGTHP)
1221 -- Function: char * u16_conv_to_encoding (const char *TOCODE, enum
1222 iconv_ilseq_handler HANDLER, const uint16_t *SRC, size_t
1223 SRCLEN, size_t *OFFSETS, char *RESULTBUF, size_t *LENGTHP)
1224 -- Function: char * u32_conv_to_encoding (const char *TOCODE, enum
1225 iconv_ilseq_handler HANDLER, const uint32_t *SRC, size_t
1226 SRCLEN, size_t *OFFSETS, char *RESULTBUF, size_t *LENGTHP)
1227 Converts an entire Unicode string, possibly including NUL units,
1228 from UTF-8, UTF-16, or UTF-32 (for `u8_', `u16_', and `u32_' respectively) to a given encoding.
1230 Converts a memory region to encoding TOCODE. TOCODE is as for the
1231 `iconv_open' function.
1233 The input is in the memory region between SRC (inclusive) and `SRC
1234 + SRCLEN' (exclusive).
1236 If OFFSETS is not NULL, it should point to an array of SRCLEN
1237 integers; this array is filled with offsets into the result, i.e.
1238 the character starting at `SRC[i]' corresponds to the character
1239 starting at `RESULT[OFFSETS[i]]', and other offsets are set to `(size_t)(-1)'.
1242 `RESULTBUF' and `*LENGTHP' should be a scratch buffer and its
1243 size, or `RESULTBUF' can be NULL.
1245 May erase the contents of the memory at `RESULTBUF'.
1247 If successful: The resulting string (non-NULL) is returned
1248 and its length stored in `*LENGTHP'. The resulting string is
1249 `RESULTBUF' if no dynamic memory allocation was necessary, or a
1250 freshly allocated memory block otherwise.
1252 In case of error: NULL is returned and `errno' is set. Particular
1253 `errno' values: `EINVAL', `EILSEQ', `ENOMEM'.
1255 The following functions convert between NUL terminated strings in a
1256 specified encoding and NUL terminated Unicode strings.
1258 -- Function: uint8_t * u8_strconv_from_encoding (const char *STRING,
1259 const char *FROMCODE, enum iconv_ilseq_handler HANDLER)
1260 -- Function: uint16_t * u16_strconv_from_encoding (const char *STRING,
1261 const char *FROMCODE, enum iconv_ilseq_handler HANDLER)
1262 -- Function: uint32_t * u32_strconv_from_encoding (const char *STRING,
1263 const char *FROMCODE, enum iconv_ilseq_handler HANDLER)
1264 Converts a NUL terminated string from a given encoding.
1266 The result is `malloc' allocated, or NULL (with `errno' set) in case of error.
1269 Particular `errno' values: `EILSEQ', `ENOMEM'.
1271 -- Function: char * u8_strconv_to_encoding (const uint8_t *STRING,
1272 const char *TOCODE, enum iconv_ilseq_handler HANDLER)
1273 -- Function: char * u16_strconv_to_encoding (const uint16_t *STRING,
1274 const char *TOCODE, enum iconv_ilseq_handler HANDLER)
1275 -- Function: char * u32_strconv_to_encoding (const uint32_t *STRING,
1276 const char *TOCODE, enum iconv_ilseq_handler HANDLER)
1277 Converts a NUL terminated string to a given encoding.
1279 The result is `malloc' allocated, or NULL (with `errno' set) in case of error.
1282 Particular `errno' values: `EILSEQ', `ENOMEM'.
1284 The following functions are shorthands that convert between NUL
1285 terminated strings in locale encoding and NUL terminated Unicode strings.
1288 -- Function: uint8_t * u8_strconv_from_locale (const char *STRING)
1289 -- Function: uint16_t * u16_strconv_from_locale (const char *STRING)
1290 -- Function: uint32_t * u32_strconv_from_locale (const char *STRING)
1291 Converts a NUL terminated string from the locale encoding.
1293 The result is `malloc' allocated, or NULL (with `errno' set) in case of error.
1296 Particular `errno' values: `ENOMEM'.
1298 -- Function: char * u8_strconv_to_locale (const uint8_t *STRING)
1299 -- Function: char * u16_strconv_to_locale (const uint16_t *STRING)
1300 -- Function: char * u32_strconv_to_locale (const uint32_t *STRING)
1301 Converts a NUL terminated string to the locale encoding.
1303 The result is `malloc' allocated, or NULL (with `errno' set) in case of error.
1306 Particular `errno' values: `ENOMEM'.
1309 File: libunistring.info, Node: unistdio.h, Next: uniname.h, Prev: uniconv.h, Up: Top
1311 6 Output with Unicode strings `<unistdio.h>'
1312 ********************************************
1314 This include file declares functions for doing formatted output with
1315 Unicode strings. It defines a set of functions similar to `fprintf' and
1316 `sprintf', which are declared in `<stdio.h>'.
1318 These functions work like the `printf' function family. In the format string:
1320 * The format directive `U' takes a UTF-8 string (`const uint8_t *').
1322 * The format directive `lU' takes a UTF-16 string (`const uint16_t *').
1325 * The format directive `llU' takes a UTF-32 string (`const uint32_t *').
1328 A function name with an infix `v' indicates that a `va_list' is
1329 passed instead of multiple arguments.
1331 The functions `*sprintf' have a BUF argument that is assumed to be
1332 large enough. (_DANGEROUS! Overflowing the buffer will crash the program._)
1335 The functions `*snprintf' have a BUF argument that is assumed to be
1336 SIZE units large. (_DANGEROUS! The resulting string might be
1337 truncated in the middle of a multibyte character._)
1339 The functions `*asprintf' have a RESULTP argument. The result will
1340 be freshly allocated and stored in `*RESULTP'.
1342 The functions `*asnprintf' have a (RESULTBUF, LENGTHP) argument
1343 pair. If RESULTBUF is not NULL and the result fits into `*LENGTHP'
1344 units, it is put in RESULTBUF, and RESULTBUF is returned. Otherwise, a
1345 freshly allocated string is returned. In both cases, `*LENGTHP' is set
1346 to the length (number of units) of the returned string. In case of
1347 error, NULL is returned and `errno' is set.
1349 The following functions take an ASCII format string and return a
1350 result that is a `char *' string in locale encoding.
1352 -- Function: int ulc_sprintf (char *BUF, const char *FORMAT, ...)
1354 -- Function: int ulc_snprintf (char *BUF, size_t SIZE, const char *FORMAT, ...)
1357 -- Function: int ulc_asprintf (char **RESULTP, const char *FORMAT, ...)
1359 -- Function: char * ulc_asnprintf (char *RESULTBUF, size_t *LENGTHP,
1360 const char *FORMAT, ...)
1362 -- Function: int ulc_vsprintf (char *BUF, const char *FORMAT, va_list AP)
1365 -- Function: int ulc_vsnprintf (char *BUF, size_t SIZE, const char
1366 *FORMAT, va_list AP)
1368 -- Function: int ulc_vasprintf (char **RESULTP, const char *FORMAT, va_list AP)
1371 -- Function: char * ulc_vasnprintf (char *RESULTBUF, size_t *LENGTHP,
1372 const char *FORMAT, va_list AP)
1374 The following functions take an ASCII format string and return a
1375 result in UTF-8 format.
1377 -- Function: int u8_sprintf (uint8_t *BUF, const char *FORMAT, ...)
1379 -- Function: int u8_snprintf (uint8_t *BUF, size_t SIZE, const char *FORMAT, ...)
1382 -- Function: int u8_asprintf (uint8_t **RESULTP, const char *FORMAT, ...)
1385 -- Function: uint8_t * u8_asnprintf (uint8_t *RESULTBUF, size_t
1386 *LENGTHP, const char *FORMAT, ...)
1388 -- Function: int u8_vsprintf (uint8_t *BUF, const char *FORMAT, va_list AP)
1391 -- Function: int u8_vsnprintf (uint8_t *BUF, size_t SIZE, const char
1392 *FORMAT, va_list AP)
1394 -- Function: int u8_vasprintf (uint8_t **RESULTP, const char *FORMAT, va_list AP)
1397 -- Function: uint8_t * u8_vasnprintf (uint8_t *RESULTBUF, size_t
1398 *LENGTHP, const char *FORMAT, va_list AP)
1400 The following functions take a UTF-8 format string and return a
1401 result in UTF-8 format.
1403 -- Function: int u8_u8_sprintf (uint8_t *BUF, const uint8_t *FORMAT, ...)
1406 -- Function: int u8_u8_snprintf (uint8_t *BUF, size_t SIZE, const
1407 uint8_t *FORMAT, ...)
1409 -- Function: int u8_u8_asprintf (uint8_t **RESULTP, const uint8_t *FORMAT, ...)
1412 -- Function: uint8_t * u8_u8_asnprintf (uint8_t *RESULTBUF, size_t
1413 *LENGTHP, const uint8_t *FORMAT, ...)
1415 -- Function: int u8_u8_vsprintf (uint8_t *BUF, const uint8_t *FORMAT, va_list AP)
1418 -- Function: int u8_u8_vsnprintf (uint8_t *BUF, size_t SIZE, const
1419 uint8_t *FORMAT, va_list AP)
1421 -- Function: int u8_u8_vasprintf (uint8_t **RESULTP, const uint8_t
1422 *FORMAT, va_list AP)
1424 -- Function: uint8_t * u8_u8_vasnprintf (uint8_t *RESULTBUF, size_t
1425 *LENGTHP, const uint8_t *FORMAT, va_list AP)
1427 The following functions take an ASCII format string and return a
1428 result in UTF-16 format.
1430 -- Function: int u16_sprintf (uint16_t *BUF, const char *FORMAT, ...)
1432 -- Function: int u16_snprintf (uint16_t *BUF, size_t SIZE, const char *FORMAT, ...)
1435 -- Function: int u16_asprintf (uint16_t **RESULTP, const char *FORMAT, ...)
1438 -- Function: uint16_t * u16_asnprintf (uint16_t *RESULTBUF, size_t
1439 *LENGTHP, const char *FORMAT, ...)
1441 -- Function: int u16_vsprintf (uint16_t *BUF, const char *FORMAT, va_list AP)
1444 -- Function: int u16_vsnprintf (uint16_t *BUF, size_t SIZE, const char
1445 *FORMAT, va_list AP)
1447 -- Function: int u16_vasprintf (uint16_t **RESULTP, const char
1448 *FORMAT, va_list AP)
1450 -- Function: uint16_t * u16_vasnprintf (uint16_t *RESULTBUF, size_t
1451 *LENGTHP, const char *FORMAT, va_list AP)
1453 The following functions take a UTF-16 format string and return a
1454 result in UTF-16 format.
1456 -- Function: int u16_u16_sprintf (uint16_t *BUF, const uint16_t *FORMAT, ...)
1459 -- Function: int u16_u16_snprintf (uint16_t *BUF, size_t SIZE, const
1460 uint16_t *FORMAT, ...)
1462 -- Function: int u16_u16_asprintf (uint16_t **RESULTP, const uint16_t *FORMAT, ...)
1465 -- Function: uint16_t * u16_u16_asnprintf (uint16_t *RESULTBUF, size_t
1466 *LENGTHP, const uint16_t *FORMAT, ...)
1468 -- Function: int u16_u16_vsprintf (uint16_t *BUF, const uint16_t
1469 *FORMAT, va_list AP)
1471 -- Function: int u16_u16_vsnprintf (uint16_t *BUF, size_t SIZE, const
1472 uint16_t *FORMAT, va_list AP)
1474 -- Function: int u16_u16_vasprintf (uint16_t **RESULTP, const uint16_t
1475 *FORMAT, va_list AP)
1477 -- Function: uint16_t * u16_u16_vasnprintf (uint16_t *RESULTBUF,
1478 size_t *LENGTHP, const uint16_t *FORMAT, va_list AP)
1480 The following functions take an ASCII format string and return a
1481 result in UTF-32 format.
1483 -- Function: int u32_sprintf (uint32_t *BUF, const char *FORMAT, ...)
1485 -- Function: int u32_snprintf (uint32_t *BUF, size_t SIZE, const char *FORMAT, ...)
1488 -- Function: int u32_asprintf (uint32_t **RESULTP, const char *FORMAT, ...)
1491 -- Function: uint32_t * u32_asnprintf (uint32_t *RESULTBUF, size_t
1492 *LENGTHP, const char *FORMAT, ...)
1494 -- Function: int u32_vsprintf (uint32_t *BUF, const char *FORMAT, va_list AP)
1497 -- Function: int u32_vsnprintf (uint32_t *BUF, size_t SIZE, const char
1498 *FORMAT, va_list AP)
1500 -- Function: int u32_vasprintf (uint32_t **RESULTP, const char
1501 *FORMAT, va_list AP)
1503 -- Function: uint32_t * u32_vasnprintf (uint32_t *RESULTBUF, size_t
1504 *LENGTHP, const char *FORMAT, va_list AP)
1506 The following functions take a UTF-32 format string and return a
1507 result in UTF-32 format.
1509 -- Function: int u32_u32_sprintf (uint32_t *BUF, const uint32_t *FORMAT, ...)
1512 -- Function: int u32_u32_snprintf (uint32_t *BUF, size_t SIZE, const
1513 uint32_t *FORMAT, ...)
1515 -- Function: int u32_u32_asprintf (uint32_t **RESULTP, const uint32_t *FORMAT, ...)
1518 -- Function: uint32_t * u32_u32_asnprintf (uint32_t *RESULTBUF, size_t
1519 *LENGTHP, const uint32_t *FORMAT, ...)
1521 -- Function: int u32_u32_vsprintf (uint32_t *BUF, const uint32_t
1522 *FORMAT, va_list AP)
1524 -- Function: int u32_u32_vsnprintf (uint32_t *BUF, size_t SIZE, const
1525 uint32_t *FORMAT, va_list AP)
1527 -- Function: int u32_u32_vasprintf (uint32_t **RESULTP, const uint32_t
1528 *FORMAT, va_list AP)
1530 -- Function: uint32_t * u32_u32_vasnprintf (uint32_t *RESULTBUF,
1531 size_t *LENGTHP, const uint32_t *FORMAT, va_list AP)
1533 The following functions take an ASCII format string and produce
1534 output in locale encoding to a `FILE' stream.
1536 -- Function: int ulc_fprintf (FILE *STREAM, const char *FORMAT, ...)
1538 -- Function: int ulc_vfprintf (FILE *STREAM, const char *FORMAT, va_list AP)
1542 File: libunistring.info, Node: uniname.h, Next: unictype.h, Prev: unistdio.h, Up: Top
1544 7 Names of Unicode characters `<uniname.h>'
1545 *******************************************
1547 This include file implements the association between a Unicode
1548 character and its name.
1550 The name of a Unicode character allows to distinguish it from other,
1551 similar looking characters. For example, the character `x' has the name
1552 `"LATIN SMALL LETTER X"' and is therefore different from the character
1553 named `"MULTIPLICATION SIGN"'.
1555 -- Macro: unsigned int UNINAME_MAX
1556 This macro expands to a constant that is the required size of
1557 buffer for a Unicode character name.
1559 -- Function: char * unicode_character_name (ucs4_t UC, char *BUF)
1560 Looks up the name of a Unicode character, in uppercase ASCII. BUF
1561 must point to a buffer, at least `UNINAME_MAX' bytes in size.
1562 Returns the filled BUF, or NULL if the character does not have a name.
1565 -- Function: ucs4_t unicode_name_character (const char *NAME)
1566 Looks up the Unicode character with a given name, in upper- or
1567 lowercase ASCII. Returns the character if found, or
1568 `UNINAME_INVALID' if not found.
1570 -- Macro: ucs4_t UNINAME_INVALID
1571 This macro expands to a constant that is a special return value of
1572 the `unicode_name_character' function.
1575 File: libunistring.info, Node: unictype.h, Next: uniwidth.h, Prev: uniname.h, Up: Top
1577 8 Unicode character classification and properties `<unictype.h>'
1578 ****************************************************************
1580 This include file declares functions that classify Unicode characters
1581 and that test whether Unicode characters have specific properties.
1583 The classification assigns a "general category" to every Unicode
1584 character. This is similar to the classification provided by ISO C in `<ctype.h>'.
1587 Properties are the data that guides various text processing
1588 algorithms in the presence of specific Unicode characters.
1592 * General category::
1593 * Canonical combining class::
1594 * Bidirectional category::
1595 * Decimal digit value::
1598 * Mirrored character::
1602 * ISO C and Java syntax::
1603 * Classifications like in ISO C::
1606 File: libunistring.info, Node: General category, Next: Canonical combining class, Up: unictype.h
1608 8.1 General category
1609 ====================
1611 Every Unicode character or code point has a _general category_
1612 assigned to it. This classification is important for most algorithms
1613 that work on Unicode text.
1615 The GNU libunistring library provides two kinds of API for working
1616 with general categories. The object oriented API uses a variable to
1617 denote every predefined general category value or combinations thereof.
1618 The low-level API uses a bit mask instead. The advantage of the object
1619 oriented API is that if only a few predefined general category values
1620 are used, the data tables are relatively small. When you combine
1621 general category values (using `uc_general_category_or',
1622 `uc_general_category_and', or `uc_general_category_and_not'), or when
1623 you use the low-level bit masks, a big table is used that holds the
1624 complete general category information for all Unicode characters.
1628 * Object oriented API::
1632 File: libunistring.info, Node: Object oriented API, Next: Bit mask API, Up: General category
1634 8.1.1 The object oriented API for general category
1635 --------------------------------------------------
1637 -- Type: uc_general_category_t
1638 This data type denotes a general category value. It is an
1639 immediate type that can be copied by simple assignment, without
1640 involving memory allocation. It is not an array type.
1642 The following are the predefined general category values. Additional
1643 general categories may be added in the future.
1645 -- Constant: uc_general_category_t UC_CATEGORY_L
1646 -- Constant: uc_general_category_t UC_CATEGORY_Lu
1647 -- Constant: uc_general_category_t UC_CATEGORY_Ll
1648 -- Constant: uc_general_category_t UC_CATEGORY_Lt
1649 -- Constant: uc_general_category_t UC_CATEGORY_Lm
1650 -- Constant: uc_general_category_t UC_CATEGORY_Lo
1651 -- Constant: uc_general_category_t UC_CATEGORY_M
1652 -- Constant: uc_general_category_t UC_CATEGORY_Mn
1653 -- Constant: uc_general_category_t UC_CATEGORY_Mc
1654 -- Constant: uc_general_category_t UC_CATEGORY_Me
1655 -- Constant: uc_general_category_t UC_CATEGORY_N
1656 -- Constant: uc_general_category_t UC_CATEGORY_Nd
1657 -- Constant: uc_general_category_t UC_CATEGORY_Nl
1658 -- Constant: uc_general_category_t UC_CATEGORY_No
1659 -- Constant: uc_general_category_t UC_CATEGORY_P
1660 -- Constant: uc_general_category_t UC_CATEGORY_Pc
1661 -- Constant: uc_general_category_t UC_CATEGORY_Pd
1662 -- Constant: uc_general_category_t UC_CATEGORY_Ps
1663 -- Constant: uc_general_category_t UC_CATEGORY_Pe
1664 -- Constant: uc_general_category_t UC_CATEGORY_Pi
1665 -- Constant: uc_general_category_t UC_CATEGORY_Pf
1666 -- Constant: uc_general_category_t UC_CATEGORY_Po
1667 -- Constant: uc_general_category_t UC_CATEGORY_S
1668 -- Constant: uc_general_category_t UC_CATEGORY_Sm
1669 -- Constant: uc_general_category_t UC_CATEGORY_Sc
1670 -- Constant: uc_general_category_t UC_CATEGORY_Sk
1671 -- Constant: uc_general_category_t UC_CATEGORY_So
1672 -- Constant: uc_general_category_t UC_CATEGORY_Z
1673 -- Constant: uc_general_category_t UC_CATEGORY_Zs
1674 -- Constant: uc_general_category_t UC_CATEGORY_Zl
1675 -- Constant: uc_general_category_t UC_CATEGORY_Zp
1676 -- Constant: uc_general_category_t UC_CATEGORY_C
1677 -- Constant: uc_general_category_t UC_CATEGORY_Cc
1678 -- Constant: uc_general_category_t UC_CATEGORY_Cf
1679 -- Constant: uc_general_category_t UC_CATEGORY_Cs
1680 -- Constant: uc_general_category_t UC_CATEGORY_Co
1681 -- Constant: uc_general_category_t UC_CATEGORY_Cn
1683 The following are alias names for predefined general category values.
1685 -- Macro: uc_general_category_t UC_LETTER
1686 This is another name for `UC_CATEGORY_L'.
1688 -- Macro: uc_general_category_t UC_UPPERCASE_LETTER
1689 This is another name for `UC_CATEGORY_Lu'.
1691 -- Macro: uc_general_category_t UC_LOWERCASE_LETTER
1692 This is another name for `UC_CATEGORY_Ll'.
1694 -- Macro: uc_general_category_t UC_TITLECASE_LETTER
1695 This is another name for `UC_CATEGORY_Lt'.
1697 -- Macro: uc_general_category_t UC_MODIFIER_LETTER
1698 This is another name for `UC_CATEGORY_Lm'.
1700 -- Macro: uc_general_category_t UC_OTHER_LETTER
1701 This is another name for `UC_CATEGORY_Lo'.
1703 -- Macro: uc_general_category_t UC_MARK
1704 This is another name for `UC_CATEGORY_M'.
1706 -- Macro: uc_general_category_t UC_NON_SPACING_MARK
1707 This is another name for `UC_CATEGORY_Mn'.
1709 -- Macro: uc_general_category_t UC_COMBINING_SPACING_MARK
1710 This is another name for `UC_CATEGORY_Mc'.
1712 -- Macro: uc_general_category_t UC_ENCLOSING_MARK
1713 This is another name for `UC_CATEGORY_Me'.
1715 -- Macro: uc_general_category_t UC_NUMBER
1716 This is another name for `UC_CATEGORY_N'.
1718 -- Macro: uc_general_category_t UC_DECIMAL_DIGIT_NUMBER
1719 This is another name for `UC_CATEGORY_Nd'.
1721 -- Macro: uc_general_category_t UC_LETTER_NUMBER
1722 This is another name for `UC_CATEGORY_Nl'.
1724 -- Macro: uc_general_category_t UC_OTHER_NUMBER
1725 This is another name for `UC_CATEGORY_No'.
1727 -- Macro: uc_general_category_t UC_PUNCTUATION
1728 This is another name for `UC_CATEGORY_P'.
1730 -- Macro: uc_general_category_t UC_CONNECTOR_PUNCTUATION
1731 This is another name for `UC_CATEGORY_Pc'.
1733 -- Macro: uc_general_category_t UC_DASH_PUNCTUATION
1734 This is another name for `UC_CATEGORY_Pd'.
1736 -- Macro: uc_general_category_t UC_OPEN_PUNCTUATION
1737 This is another name for `UC_CATEGORY_Ps' ("start punctuation").
1739 -- Macro: uc_general_category_t UC_CLOSE_PUNCTUATION
1740 This is another name for `UC_CATEGORY_Pe' ("end punctuation").
1742 -- Macro: uc_general_category_t UC_INITIAL_QUOTE_PUNCTUATION
1743 This is another name for `UC_CATEGORY_Pi'.
1745 -- Macro: uc_general_category_t UC_FINAL_QUOTE_PUNCTUATION
1746 This is another name for `UC_CATEGORY_Pf'.
1748 -- Macro: uc_general_category_t UC_OTHER_PUNCTUATION
1749 This is another name for `UC_CATEGORY_Po'.
1751 -- Macro: uc_general_category_t UC_SYMBOL
1752 This is another name for `UC_CATEGORY_S'.
1754 -- Macro: uc_general_category_t UC_MATH_SYMBOL
1755 This is another name for `UC_CATEGORY_Sm'.
1757 -- Macro: uc_general_category_t UC_CURRENCY_SYMBOL
1758 This is another name for `UC_CATEGORY_Sc'.
1760 -- Macro: uc_general_category_t UC_MODIFIER_SYMBOL
1761 This is another name for `UC_CATEGORY_Sk'.
1763 -- Macro: uc_general_category_t UC_OTHER_SYMBOL
1764 This is another name for `UC_CATEGORY_So'.
1766 -- Macro: uc_general_category_t UC_SEPARATOR
1767 This is another name for `UC_CATEGORY_Z'.
1769 -- Macro: uc_general_category_t UC_SPACE_SEPARATOR
1770 This is another name for `UC_CATEGORY_Zs'.
1772 -- Macro: uc_general_category_t UC_LINE_SEPARATOR
1773 This is another name for `UC_CATEGORY_Zl'.
1775 -- Macro: uc_general_category_t UC_PARAGRAPH_SEPARATOR
1776 This is another name for `UC_CATEGORY_Zp'.
1778 -- Macro: uc_general_category_t UC_OTHER
1779 This is another name for `UC_CATEGORY_C'.
1781 -- Macro: uc_general_category_t UC_CONTROL
1782 This is another name for `UC_CATEGORY_Cc'.
1784 -- Macro: uc_general_category_t UC_FORMAT
1785 This is another name for `UC_CATEGORY_Cf'.
1787 -- Macro: uc_general_category_t UC_SURROGATE
1788 This is another name for `UC_CATEGORY_Cs'. All code points in this
1789 category are invalid characters.
1791 -- Macro: uc_general_category_t UC_PRIVATE_USE
1792 This is another name for `UC_CATEGORY_Co'.
1794 -- Macro: uc_general_category_t UC_UNASSIGNED
1795 This is another name for `UC_CATEGORY_Cn'. Some code points in
1796 this category are invalid characters.
1798 The following functions combine general categories, like in a
1799 boolean algebra, except that there is no `not' operation.
1801 -- Function: uc_general_category_t uc_general_category_or
1802 (uc_general_category_t CATEGORY1, uc_general_category_t
1803 CATEGORY2)
1804 Returns the union of two general categories. This corresponds to
1805 the union of the two sets of characters.
1807 -- Function: uc_general_category_t uc_general_category_and
1808 (uc_general_category_t CATEGORY1, uc_general_category_t
1809 CATEGORY2)
1810 Returns the intersection of two general categories as bit masks.
1811 This _does not_ correspond to the intersection of the two sets of
1812 characters.
1814 -- Function: uc_general_category_t uc_general_category_and_not
1815 (uc_general_category_t CATEGORY1, uc_general_category_t
1816 CATEGORY2)
1817 Returns the intersection of a general category with the complement
1818 of a second general category, as bit masks. This _does not_
1819 correspond to the intersection with complement, when viewing the
1820 categories as sets of characters.
1822 The following functions associate general categories with their names.
1824 -- Function: const char * uc_general_category_name
1825 (uc_general_category_t CATEGORY)
1826 Returns the name of a general category. Returns NULL if the
1827 general category corresponds to a bit mask that does not have a
1828 name.
1830 -- Function: uc_general_category_t uc_general_category_byname (const
1831 char *CATEGORY_NAME)
1832 Returns the general category given by name, e.g. `"Lu"'.
1834 The following functions view general categories as sets of Unicode
1835 characters.
1837 -- Function: uc_general_category_t uc_general_category (ucs4_t UC)
1838 Returns the general category of a Unicode character.
1840 This function uses a big table.
1842 -- Function: bool uc_is_general_category (ucs4_t UC,
1843 uc_general_category_t CATEGORY)
1844 Tests whether a Unicode character belongs to a given category.
1845 The CATEGORY argument can be a predefined general category or the
1846 combination of several predefined general categories.
1849 File: libunistring.info, Node: Bit mask API, Prev: Object oriented API, Up: General category
1851 8.1.2 The bit mask API for general category
1852 -------------------------------------------
1854 The following are the predefined general category values as bit masks.
1855 Additional general categories may be added in the future.
1857 -- Macro: uint32_t UC_CATEGORY_MASK_L
1858 -- Macro: uint32_t UC_CATEGORY_MASK_Lu
1859 -- Macro: uint32_t UC_CATEGORY_MASK_Ll
1860 -- Macro: uint32_t UC_CATEGORY_MASK_Lt
1861 -- Macro: uint32_t UC_CATEGORY_MASK_Lm
1862 -- Macro: uint32_t UC_CATEGORY_MASK_Lo
1863 -- Macro: uint32_t UC_CATEGORY_MASK_M
1864 -- Macro: uint32_t UC_CATEGORY_MASK_Mn
1865 -- Macro: uint32_t UC_CATEGORY_MASK_Mc
1866 -- Macro: uint32_t UC_CATEGORY_MASK_Me
1867 -- Macro: uint32_t UC_CATEGORY_MASK_N
1868 -- Macro: uint32_t UC_CATEGORY_MASK_Nd
1869 -- Macro: uint32_t UC_CATEGORY_MASK_Nl
1870 -- Macro: uint32_t UC_CATEGORY_MASK_No
1871 -- Macro: uint32_t UC_CATEGORY_MASK_P
1872 -- Macro: uint32_t UC_CATEGORY_MASK_Pc
1873 -- Macro: uint32_t UC_CATEGORY_MASK_Pd
1874 -- Macro: uint32_t UC_CATEGORY_MASK_Ps
1875 -- Macro: uint32_t UC_CATEGORY_MASK_Pe
1876 -- Macro: uint32_t UC_CATEGORY_MASK_Pi
1877 -- Macro: uint32_t UC_CATEGORY_MASK_Pf
1878 -- Macro: uint32_t UC_CATEGORY_MASK_Po
1879 -- Macro: uint32_t UC_CATEGORY_MASK_S
1880 -- Macro: uint32_t UC_CATEGORY_MASK_Sm
1881 -- Macro: uint32_t UC_CATEGORY_MASK_Sc
1882 -- Macro: uint32_t UC_CATEGORY_MASK_Sk
1883 -- Macro: uint32_t UC_CATEGORY_MASK_So
1884 -- Macro: uint32_t UC_CATEGORY_MASK_Z
1885 -- Macro: uint32_t UC_CATEGORY_MASK_Zs
1886 -- Macro: uint32_t UC_CATEGORY_MASK_Zl
1887 -- Macro: uint32_t UC_CATEGORY_MASK_Zp
1888 -- Macro: uint32_t UC_CATEGORY_MASK_C
1889 -- Macro: uint32_t UC_CATEGORY_MASK_Cc
1890 -- Macro: uint32_t UC_CATEGORY_MASK_Cf
1891 -- Macro: uint32_t UC_CATEGORY_MASK_Cs
1892 -- Macro: uint32_t UC_CATEGORY_MASK_Co
1893 -- Macro: uint32_t UC_CATEGORY_MASK_Cn
1895 The following function views general categories as sets of Unicode
1896 characters.
1898 -- Function: bool uc_is_general_category_withtable (ucs4_t UC,
1899 uint32_t BITMASK)
1900 Tests whether a Unicode character belongs to a given category.
1901 The BITMASK argument can be a predefined general category bitmask
1902 or the combination of several predefined general category bitmasks.
1904 This function uses a big table comprising all general categories.
1907 File: libunistring.info, Node: Canonical combining class, Next: Bidirectional category, Prev: General category, Up: unictype.h
1909 8.2 Canonical combining class
1910 =============================
1912 Every Unicode character or code point has a _canonical combining
1913 class_ assigned to it.
1915 What is the meaning of the canonical combining class? Essentially,
1916 it indicates the priority with which a combining character is attached
1917 to its base character. The characters for which the canonical
1918 combining class is 0 are the base characters, and the characters for
1919 which it is greater than 0 are the combining characters. Combining
1920 characters are rendered near, attached to, or around their base character, and
1921 combining characters with small combining classes are attached "first"
1922 or "closer" to the base character.
1924 The canonical combining class of a character is a number in the range
1925 0..255. The possible values are described in the Unicode Character
1926 Database `http://www.unicode.org/Public/UNIDATA/UCD.html'. The list
1927 here is not definitive; more values can be added in future versions.
1929 -- Constant: int UC_CCC_NR
1930 The canonical combining class value for "Not Reordered" characters.
1933 -- Constant: int UC_CCC_OV
1934 The canonical combining class value for "Overlay" characters.
1936 -- Constant: int UC_CCC_NK
1937 The canonical combining class value for "Nukta" characters.
1939 -- Constant: int UC_CCC_KV
1940 The canonical combining class value for "Kana Voicing" characters.
1942 -- Constant: int UC_CCC_VR
1943 The canonical combining class value for "Virama" characters.
1945 -- Constant: int UC_CCC_ATBL
1946 The canonical combining class value for "Attached Below Left"
1947 characters.
1949 -- Constant: int UC_CCC_ATB
1950 The canonical combining class value for "Attached Below"
1951 characters.
1953 -- Constant: int UC_CCC_ATAR
1954 The canonical combining class value for "Attached Above Right"
1955 characters.
1957 -- Constant: int UC_CCC_BL
1958 The canonical combining class value for "Below Left" characters.
1960 -- Constant: int UC_CCC_B
1961 The canonical combining class value for "Below" characters.
1963 -- Constant: int UC_CCC_BR
1964 The canonical combining class value for "Below Right" characters.
1966 -- Constant: int UC_CCC_L
1967 The canonical combining class value for "Left" characters.
1969 -- Constant: int UC_CCC_R
1970 The canonical combining class value for "Right" characters.
1972 -- Constant: int UC_CCC_AL
1973 The canonical combining class value for "Above Left" characters.
1975 -- Constant: int UC_CCC_A
1976 The canonical combining class value for "Above" characters.
1978 -- Constant: int UC_CCC_AR
1979 The canonical combining class value for "Above Right" characters.
1981 -- Constant: int UC_CCC_DB
1982 The canonical combining class value for "Double Below" characters.
1984 -- Constant: int UC_CCC_DA
1985 The canonical combining class value for "Double Above" characters.
1987 -- Constant: int UC_CCC_IS
1988 The canonical combining class value for "Iota Subscript"
1989 characters.
1991 The following function looks up the canonical combining class of a
1992 character.
1994 -- Function: int uc_combining_class (ucs4_t UC)
1995 Returns the canonical combining class of a Unicode character.
1998 File: libunistring.info, Node: Bidirectional category, Next: Decimal digit value, Prev: Canonical combining class, Up: unictype.h
2000 8.3 Bidirectional category
2001 ==========================
2003 Every Unicode character or code point has a _bidirectional category_
2004 assigned to it.
2006 The bidirectional category guides the bidirectional algorithm
2007 (`http://www.unicode.org/reports/tr9/'). The possible values are the
2008 following.
2010 -- Constant: int UC_BIDI_L
2011 The bidirectional category for "Left-to-Right" characters.
2013 -- Constant: int UC_BIDI_LRE
2014 The bidirectional category for "Left-to-Right Embedding"
2015 characters.
2017 -- Constant: int UC_BIDI_LRO
2018 The bidirectional category for "Left-to-Right Override" characters.
2020 -- Constant: int UC_BIDI_R
2021 The bidirectional category for "Right-to-Left" characters.
2023 -- Constant: int UC_BIDI_AL
2024 The bidirectional category for "Right-to-Left Arabic" characters.
2026 -- Constant: int UC_BIDI_RLE
2027 The bidirectional category for "Right-to-Left Embedding"
2028 characters.
2030 -- Constant: int UC_BIDI_RLO
2031 The bidirectional category for "Right-to-Left Override" characters.
2033 -- Constant: int UC_BIDI_PDF
2034 The bidirectional category for "Pop Directional Format" characters.
2036 -- Constant: int UC_BIDI_EN
2037 The bidirectional category for "European Number" characters.
2039 -- Constant: int UC_BIDI_ES
2040 The bidirectional category for "European Number Separator"
2041 characters.
2043 -- Constant: int UC_BIDI_ET
2044 The bidirectional category for "European Number Terminator"
2045 characters.
2047 -- Constant: int UC_BIDI_AN
2048 The bidirectional category for "Arabic Number" characters.
2050 -- Constant: int UC_BIDI_CS
2051 The bidirectional category for "Common Number Separator"
2052 characters.
2054 -- Constant: int UC_BIDI_NSM
2055 The bidirectional category for "Non-Spacing Mark" characters.
2057 -- Constant: int UC_BIDI_BN
2058 The bidirectional category for "Boundary Neutral" characters.
2060 -- Constant: int UC_BIDI_B
2061 The bidirectional category for "Paragraph Separator" characters.
2063 -- Constant: int UC_BIDI_S
2064 The bidirectional category for "Segment Separator" characters.
2066 -- Constant: int UC_BIDI_WS
2067 The bidirectional category for "Whitespace" characters.
2069 -- Constant: int UC_BIDI_ON
2070 The bidirectional category for "Other Neutral" characters.
2072 The following functions implement the association between a
2073 bidirectional category and its name.
2075 -- Function: const char * uc_bidi_category_name (int CATEGORY)
2076 Returns the name of a bidirectional category.
2078 -- Function: int uc_bidi_category_byname (const char *CATEGORY_NAME)
2079 Returns the bidirectional category given by name, e.g. `"LRE"'.
2081 The following functions view bidirectional categories as sets of
2082 Unicode characters.
2084 -- Function: int uc_bidi_category (ucs4_t UC)
2085 Returns the bidirectional category of a Unicode character.
2087 -- Function: bool uc_is_bidi_category (ucs4_t UC, int CATEGORY)
2088 Tests whether a Unicode character belongs to a given bidirectional
2089 category.
2092 File: libunistring.info, Node: Decimal digit value, Next: Digit value, Prev: Bidirectional category, Up: unictype.h
2094 8.4 Decimal digit value
2095 =======================
2097 Decimal digits (like the digits from `0' to `9') exist in many
2098 scripts. The following function converts a decimal digit character to
2099 its numerical value.
2101 -- Function: int uc_decimal_value (ucs4_t UC)
2102 Returns the decimal digit value of a Unicode character. The
2103 return value is an integer in the range 0..9, or -1 for characters
2104 that do not represent a decimal digit.
2107 File: libunistring.info, Node: Digit value, Next: Numeric value, Prev: Decimal digit value, Up: unictype.h
2109 8.5 Digit value
2110 ===============
2112 Digit characters are like decimal digit characters, possibly in
2113 special forms, such as superscript, subscript, or circled. The
2114 following function converts a digit character to its numerical value.
2116 -- Function: int uc_digit_value (ucs4_t UC)
2117 Returns the digit value of a Unicode character. The return value
2118 is an integer in the range 0..9, or -1 for characters that do not
2119 represent a digit.
2122 File: libunistring.info, Node: Numeric value, Next: Mirrored character, Prev: Digit value, Up: unictype.h
2124 8.6 Numeric value
2125 =================
2127 There are also characters that represent numbers without a digit
2128 system, like the Roman numerals, and fractional numbers, like 1/4 or
2129 3/4.
2131 The following type represents the numeric value of a Unicode
2132 character.
2134 -- Type: uc_fraction_t
2135 This is a structure type with the following fields:
2136 int numerator;
2137 int denominator;
2138 An integer N is represented by `numerator = N', `denominator = 1'.
2140 The following function converts a number character to its numerical
2141 value.
2143 -- Function: uc_fraction_t uc_numeric_value (ucs4_t UC)
2144 Returns the numeric value of a Unicode character. The return
2145 value is a fraction, or the pseudo-fraction `{ 0, 0 }' for
2146 characters that do not represent a number.
2149 File: libunistring.info, Node: Mirrored character, Next: Properties, Prev: Numeric value, Up: unictype.h
2151 8.7 Mirrored character
2152 ======================
2154 Character mirroring is used to associate the closing parenthesis
2155 character with the opening parenthesis character, the closing brace
2156 character with the opening brace character, and so on.
2158 The following function looks up the mirrored character of a Unicode
2159 character.
2161 -- Function: bool uc_mirror_char (ucs4_t UC, ucs4_t *PUC)
2162 Stores the mirrored character of a Unicode character UC in `*PUC'
2163 and returns `true', if it exists. Otherwise it stores UC
2164 unmodified in `*PUC' and returns `false'.
2167 File: libunistring.info, Node: Properties, Next: Scripts, Prev: Mirrored character, Up: unictype.h
2169 8.8 Properties
2170 ==============
2172 This section defines boolean properties of Unicode characters. This
2173 means that a character either has the given property or does not have it.
2174 In other words, the property can be viewed as a subset of the set of
2175 Unicode characters.
2177 The GNU libunistring library provides two kinds of API for working
2178 with properties. The object oriented API uses a type `uc_property_t'
2179 to designate a property. In the function-based API, which is a bit more
2180 low level, a property is merely a function.
2182 * Menu:
2184 * Properties as objects::
2185 * Properties as functions::
2188 File: libunistring.info, Node: Properties as objects, Next: Properties as functions, Up: Properties
2190 8.8.1 Properties as objects - the object oriented API
2191 -----------------------------------------------------
2193 The following type designates a property on Unicode characters.
2195 -- Type: uc_property_t
2196 This data type denotes a boolean property on Unicode characters.
2197 It is an immediate type that can be copied by simple assignment,
2198 without involving memory allocation. It is not an array type.
2200 Many Unicode properties are predefined.
2202 The following are general properties.
2204 -- Constant: uc_property_t UC_PROPERTY_WHITE_SPACE
2205 -- Constant: uc_property_t UC_PROPERTY_ALPHABETIC
2206 -- Constant: uc_property_t UC_PROPERTY_OTHER_ALPHABETIC
2207 -- Constant: uc_property_t UC_PROPERTY_NOT_A_CHARACTER
2208 -- Constant: uc_property_t UC_PROPERTY_DEFAULT_IGNORABLE_CODE_POINT
2209 -- Constant: uc_property_t
2210 UC_PROPERTY_OTHER_DEFAULT_IGNORABLE_CODE_POINT
2211 -- Constant: uc_property_t UC_PROPERTY_DEPRECATED
2212 -- Constant: uc_property_t UC_PROPERTY_LOGICAL_ORDER_EXCEPTION
2213 -- Constant: uc_property_t UC_PROPERTY_VARIATION_SELECTOR
2214 -- Constant: uc_property_t UC_PROPERTY_PRIVATE_USE
2215 -- Constant: uc_property_t UC_PROPERTY_UNASSIGNED_CODE_VALUE
2217 The following properties are related to case folding.
2219 -- Constant: uc_property_t UC_PROPERTY_UPPERCASE
2220 -- Constant: uc_property_t UC_PROPERTY_OTHER_UPPERCASE
2221 -- Constant: uc_property_t UC_PROPERTY_LOWERCASE
2222 -- Constant: uc_property_t UC_PROPERTY_OTHER_LOWERCASE
2223 -- Constant: uc_property_t UC_PROPERTY_TITLECASE
2224 -- Constant: uc_property_t UC_PROPERTY_SOFT_DOTTED
2226 The following properties are related to identifiers.
2228 -- Constant: uc_property_t UC_PROPERTY_ID_START
2229 -- Constant: uc_property_t UC_PROPERTY_OTHER_ID_START
2230 -- Constant: uc_property_t UC_PROPERTY_ID_CONTINUE
2231 -- Constant: uc_property_t UC_PROPERTY_OTHER_ID_CONTINUE
2232 -- Constant: uc_property_t UC_PROPERTY_XID_START
2233 -- Constant: uc_property_t UC_PROPERTY_XID_CONTINUE
2234 -- Constant: uc_property_t UC_PROPERTY_PATTERN_WHITE_SPACE
2235 -- Constant: uc_property_t UC_PROPERTY_PATTERN_SYNTAX
2237 The following properties have an influence on shaping and rendering.
2239 -- Constant: uc_property_t UC_PROPERTY_JOIN_CONTROL
2240 -- Constant: uc_property_t UC_PROPERTY_GRAPHEME_BASE
2241 -- Constant: uc_property_t UC_PROPERTY_GRAPHEME_EXTEND
2242 -- Constant: uc_property_t UC_PROPERTY_OTHER_GRAPHEME_EXTEND
2243 -- Constant: uc_property_t UC_PROPERTY_GRAPHEME_LINK
2245 The following properties relate to bidirectional reordering.
2247 -- Constant: uc_property_t UC_PROPERTY_BIDI_CONTROL
2248 -- Constant: uc_property_t UC_PROPERTY_BIDI_LEFT_TO_RIGHT
2249 -- Constant: uc_property_t UC_PROPERTY_BIDI_HEBREW_RIGHT_TO_LEFT
2250 -- Constant: uc_property_t UC_PROPERTY_BIDI_ARABIC_RIGHT_TO_LEFT
2251 -- Constant: uc_property_t UC_PROPERTY_BIDI_EUROPEAN_DIGIT
2252 -- Constant: uc_property_t UC_PROPERTY_BIDI_EUR_NUM_SEPARATOR
2253 -- Constant: uc_property_t UC_PROPERTY_BIDI_EUR_NUM_TERMINATOR
2254 -- Constant: uc_property_t UC_PROPERTY_BIDI_ARABIC_DIGIT
2255 -- Constant: uc_property_t UC_PROPERTY_BIDI_COMMON_SEPARATOR
2256 -- Constant: uc_property_t UC_PROPERTY_BIDI_BLOCK_SEPARATOR
2257 -- Constant: uc_property_t UC_PROPERTY_BIDI_SEGMENT_SEPARATOR
2258 -- Constant: uc_property_t UC_PROPERTY_BIDI_WHITESPACE
2259 -- Constant: uc_property_t UC_PROPERTY_BIDI_NON_SPACING_MARK
2260 -- Constant: uc_property_t UC_PROPERTY_BIDI_BOUNDARY_NEUTRAL
2261 -- Constant: uc_property_t UC_PROPERTY_BIDI_PDF
2262 -- Constant: uc_property_t UC_PROPERTY_BIDI_EMBEDDING_OR_OVERRIDE
2263 -- Constant: uc_property_t UC_PROPERTY_BIDI_OTHER_NEUTRAL
2265 The following properties deal with number representations.
2267 -- Constant: uc_property_t UC_PROPERTY_HEX_DIGIT
2268 -- Constant: uc_property_t UC_PROPERTY_ASCII_HEX_DIGIT
2270 The following properties deal with CJK.
2272 -- Constant: uc_property_t UC_PROPERTY_IDEOGRAPHIC
2273 -- Constant: uc_property_t UC_PROPERTY_UNIFIED_IDEOGRAPH
2274 -- Constant: uc_property_t UC_PROPERTY_RADICAL
2275 -- Constant: uc_property_t UC_PROPERTY_IDS_BINARY_OPERATOR
2276 -- Constant: uc_property_t UC_PROPERTY_IDS_TRINARY_OPERATOR
2278 Other miscellaneous properties are:
2280 -- Constant: uc_property_t UC_PROPERTY_ZERO_WIDTH
2281 -- Constant: uc_property_t UC_PROPERTY_SPACE
2282 -- Constant: uc_property_t UC_PROPERTY_NON_BREAK
2283 -- Constant: uc_property_t UC_PROPERTY_ISO_CONTROL
2284 -- Constant: uc_property_t UC_PROPERTY_FORMAT_CONTROL
2285 -- Constant: uc_property_t UC_PROPERTY_DASH
2286 -- Constant: uc_property_t UC_PROPERTY_HYPHEN
2287 -- Constant: uc_property_t UC_PROPERTY_PUNCTUATION
2288 -- Constant: uc_property_t UC_PROPERTY_LINE_SEPARATOR
2289 -- Constant: uc_property_t UC_PROPERTY_PARAGRAPH_SEPARATOR
2290 -- Constant: uc_property_t UC_PROPERTY_QUOTATION_MARK
2291 -- Constant: uc_property_t UC_PROPERTY_SENTENCE_TERMINAL
2292 -- Constant: uc_property_t UC_PROPERTY_TERMINAL_PUNCTUATION
2293 -- Constant: uc_property_t UC_PROPERTY_CURRENCY_SYMBOL
2294 -- Constant: uc_property_t UC_PROPERTY_MATH
2295 -- Constant: uc_property_t UC_PROPERTY_OTHER_MATH
2296 -- Constant: uc_property_t UC_PROPERTY_PAIRED_PUNCTUATION
2297 -- Constant: uc_property_t UC_PROPERTY_LEFT_OF_PAIR
2298 -- Constant: uc_property_t UC_PROPERTY_COMBINING
2299 -- Constant: uc_property_t UC_PROPERTY_COMPOSITE
2300 -- Constant: uc_property_t UC_PROPERTY_DECIMAL_DIGIT
2301 -- Constant: uc_property_t UC_PROPERTY_NUMERIC
2302 -- Constant: uc_property_t UC_PROPERTY_DIACRITIC
2303 -- Constant: uc_property_t UC_PROPERTY_EXTENDER
2304 -- Constant: uc_property_t UC_PROPERTY_IGNORABLE_CONTROL
2306 The following function looks up a property by its name.
2308 -- Function: uc_property_t uc_property_byname (const char
2309 *PROPERTY_NAME)
2310 Returns the property given by name, e.g. `"White space"'. If a
2311 property with the given name exists, the result will satisfy the
2312 `uc_property_is_valid' predicate. Otherwise the result will not
2313 satisfy this predicate and must not be passed to functions that
2314 expect an `uc_property_t' argument.
2316 This function references a big table of all predefined properties.
2317 Its use can significantly increase the size of your application.
2319 -- Function: bool uc_property_is_valid (uc_property_t PROPERTY)
2320 Returns `true' when the given property is valid, or `false'
2321 otherwise.
2323 The following function views a property as a set of Unicode
2324 characters.
2326 -- Function: bool uc_is_property (ucs4_t UC, uc_property_t PROPERTY)
2327 Tests whether the Unicode character UC has the given property.
2330 File: libunistring.info, Node: Properties as functions, Prev: Properties as objects, Up: Properties
2332 8.8.2 Properties as functions - the functional API
2333 --------------------------------------------------
2335 The following are general properties.
2337 -- Function: bool uc_is_property_white_space (ucs4_t UC)
2338 -- Function: bool uc_is_property_alphabetic (ucs4_t UC)
2339 -- Function: bool uc_is_property_other_alphabetic (ucs4_t UC)
2340 -- Function: bool uc_is_property_not_a_character (ucs4_t UC)
2341 -- Function: bool uc_is_property_default_ignorable_code_point (ucs4_t
2342 UC)
2343 -- Function: bool uc_is_property_other_default_ignorable_code_point
2344 (ucs4_t UC)
2345 -- Function: bool uc_is_property_deprecated (ucs4_t UC)
2346 -- Function: bool uc_is_property_logical_order_exception (ucs4_t UC)
2347 -- Function: bool uc_is_property_variation_selector (ucs4_t UC)
2348 -- Function: bool uc_is_property_private_use (ucs4_t UC)
2349 -- Function: bool uc_is_property_unassigned_code_value (ucs4_t UC)
2351 The following properties are related to case folding.
2353 -- Function: bool uc_is_property_uppercase (ucs4_t UC)
2354 -- Function: bool uc_is_property_other_uppercase (ucs4_t UC)
2355 -- Function: bool uc_is_property_lowercase (ucs4_t UC)
2356 -- Function: bool uc_is_property_other_lowercase (ucs4_t UC)
2357 -- Function: bool uc_is_property_titlecase (ucs4_t UC)
2358 -- Function: bool uc_is_property_soft_dotted (ucs4_t UC)
2360 The following properties are related to identifiers.
2362 -- Function: bool uc_is_property_id_start (ucs4_t UC)
2363 -- Function: bool uc_is_property_other_id_start (ucs4_t UC)
2364 -- Function: bool uc_is_property_id_continue (ucs4_t UC)
2365 -- Function: bool uc_is_property_other_id_continue (ucs4_t UC)
2366 -- Function: bool uc_is_property_xid_start (ucs4_t UC)
2367 -- Function: bool uc_is_property_xid_continue (ucs4_t UC)
2368 -- Function: bool uc_is_property_pattern_white_space (ucs4_t UC)
2369 -- Function: bool uc_is_property_pattern_syntax (ucs4_t UC)
2371 The following properties have an influence on shaping and rendering.
2373 -- Function: bool uc_is_property_join_control (ucs4_t UC)
2374 -- Function: bool uc_is_property_grapheme_base (ucs4_t UC)
2375 -- Function: bool uc_is_property_grapheme_extend (ucs4_t UC)
2376 -- Function: bool uc_is_property_other_grapheme_extend (ucs4_t UC)
2377 -- Function: bool uc_is_property_grapheme_link (ucs4_t UC)
2379 The following properties relate to bidirectional reordering.
2381 -- Function: bool uc_is_property_bidi_control (ucs4_t UC)
2382 -- Function: bool uc_is_property_bidi_left_to_right (ucs4_t UC)
2383 -- Function: bool uc_is_property_bidi_hebrew_right_to_left (ucs4_t UC)
2384 -- Function: bool uc_is_property_bidi_arabic_right_to_left (ucs4_t UC)
2385 -- Function: bool uc_is_property_bidi_european_digit (ucs4_t UC)
2386 -- Function: bool uc_is_property_bidi_eur_num_separator (ucs4_t UC)
2387 -- Function: bool uc_is_property_bidi_eur_num_terminator (ucs4_t UC)
2388 -- Function: bool uc_is_property_bidi_arabic_digit (ucs4_t UC)
2389 -- Function: bool uc_is_property_bidi_common_separator (ucs4_t UC)
2390 -- Function: bool uc_is_property_bidi_block_separator (ucs4_t UC)
2391 -- Function: bool uc_is_property_bidi_segment_separator (ucs4_t UC)
2392 -- Function: bool uc_is_property_bidi_whitespace (ucs4_t UC)
2393 -- Function: bool uc_is_property_bidi_non_spacing_mark (ucs4_t UC)
2394 -- Function: bool uc_is_property_bidi_boundary_neutral (ucs4_t UC)
2395 -- Function: bool uc_is_property_bidi_pdf (ucs4_t UC)
2396 -- Function: bool uc_is_property_bidi_embedding_or_override (ucs4_t UC)
2397 -- Function: bool uc_is_property_bidi_other_neutral (ucs4_t UC)
2399 The following properties deal with number representations.
2401 -- Function: bool uc_is_property_hex_digit (ucs4_t UC)
2402 -- Function: bool uc_is_property_ascii_hex_digit (ucs4_t UC)
2404 The following properties deal with CJK.
2406 -- Function: bool uc_is_property_ideographic (ucs4_t UC)
2407 -- Function: bool uc_is_property_unified_ideograph (ucs4_t UC)
2408 -- Function: bool uc_is_property_radical (ucs4_t UC)
2409 -- Function: bool uc_is_property_ids_binary_operator (ucs4_t UC)
2410 -- Function: bool uc_is_property_ids_trinary_operator (ucs4_t UC)
2412 Other miscellaneous properties are:
2414 -- Function: bool uc_is_property_zero_width (ucs4_t UC)
2415 -- Function: bool uc_is_property_space (ucs4_t UC)
2416 -- Function: bool uc_is_property_non_break (ucs4_t UC)
2417 -- Function: bool uc_is_property_iso_control (ucs4_t UC)
2418 -- Function: bool uc_is_property_format_control (ucs4_t UC)
2419 -- Function: bool uc_is_property_dash (ucs4_t UC)
2420 -- Function: bool uc_is_property_hyphen (ucs4_t UC)
2421 -- Function: bool uc_is_property_punctuation (ucs4_t UC)
2422 -- Function: bool uc_is_property_line_separator (ucs4_t UC)
2423 -- Function: bool uc_is_property_paragraph_separator (ucs4_t UC)
2424 -- Function: bool uc_is_property_quotation_mark (ucs4_t UC)
2425 -- Function: bool uc_is_property_sentence_terminal (ucs4_t UC)
2426 -- Function: bool uc_is_property_terminal_punctuation (ucs4_t UC)
2427 -- Function: bool uc_is_property_currency_symbol (ucs4_t UC)
2428 -- Function: bool uc_is_property_math (ucs4_t UC)
2429 -- Function: bool uc_is_property_other_math (ucs4_t UC)
2430 -- Function: bool uc_is_property_paired_punctuation (ucs4_t UC)
2431 -- Function: bool uc_is_property_left_of_pair (ucs4_t UC)
2432 -- Function: bool uc_is_property_combining (ucs4_t UC)
2433 -- Function: bool uc_is_property_composite (ucs4_t UC)
2434 -- Function: bool uc_is_property_decimal_digit (ucs4_t UC)
2435 -- Function: bool uc_is_property_numeric (ucs4_t UC)
2436 -- Function: bool uc_is_property_diacritic (ucs4_t UC)
2437 -- Function: bool uc_is_property_extender (ucs4_t UC)
2438 -- Function: bool uc_is_property_ignorable_control (ucs4_t UC)
2441 File: libunistring.info, Node: Scripts, Next: Blocks, Prev: Properties, Up: unictype.h
2443 8.9 Scripts
2444 ===========
2446 The Unicode characters are subdivided into scripts.
2448 The following type is used to represent a script:
2450 -- Type: uc_script_t
2451 This data type is a structure type that refers to statically
2452 allocated read-only data. It contains the following fields:
2453 const char *name;
2455 The `name' field contains the name of the script.
2457 The following functions look up a script.
2459 -- Function: const uc_script_t * uc_script (ucs4_t UC)
2460 Returns the script of a Unicode character. Returns NULL if UC
2461 does not belong to any script.
2463 -- Function: const uc_script_t * uc_script_byname (const char
2464 *SCRIPT_NAME)
2465 Returns the script given by its name, e.g. `"HAN"'. Returns NULL
2466 if a script with the given name does not exist.
2468 The following function views a script as a set of Unicode characters.
2470 -- Function: bool uc_is_script (ucs4_t UC, const uc_script_t *SCRIPT)
2471 Tests whether a Unicode character belongs to a given script.
2473 The following gives a global picture of all scripts.
2475 -- Function: void uc_all_scripts (const uc_script_t **SCRIPTS, size_t
2476 *COUNT)
2477 Get the list of all scripts. Stores a pointer to an array of all
2478 scripts in `*SCRIPTS' and the length of this array in `*COUNT'.
2481 File: libunistring.info, Node: Blocks, Next: ISO C and Java syntax, Prev: Scripts, Up: unictype.h
2483 8.10 Blocks
2484 ===========
2486 The Unicode characters are subdivided into blocks. A block is an
2487 interval of Unicode code points.
2489 The following type is used to represent a block.
2491 -- Type: uc_block_t
2492 This data type is a structure type that refers to statically
2493 allocated data. It contains the following fields:
2494 ucs4_t start;
2495 ucs4_t end;
2496 const char *name;
2498 The `start' field is the first Unicode code point in the block.
2500 The `end' field is the last Unicode code point in the block.
2502 The `name' field is the name of the block.
2504 The following function looks up a block.
2506 -- Function: const uc_block_t * uc_block (ucs4_t UC)
2507 Returns the block a character belongs to.
2509 The following function views a block as a set of Unicode characters.
2511 -- Function: bool uc_is_block (ucs4_t UC, const uc_block_t *BLOCK)
2512 Tests whether a Unicode character belongs to a given block.
2514 The following function gives a global picture of all blocks.
2516 -- Function: void uc_all_blocks (const uc_block_t **BLOCKS, size_t *COUNT)
2518 Get the list of all blocks. Stores a pointer to an array of all
2519 blocks in `*BLOCKS' and the length of this array in `*COUNT'.
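Since a block is an interval of code points, looking up a block amounts to an interval search. The following sketch illustrates that idea without calling libunistring: `block_info', `sample_blocks', `find_block' and `in_block' are hypothetical names, and the three-entry table is only an excerpt. With the library itself you would use `uc_block_t' and the array obtained from `uc_all_blocks', assuming that array is sorted by `start'.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the documented uc_block_t fields.  */
typedef struct
{
  uint32_t start;     /* first code point in the block */
  uint32_t end;       /* last code point in the block */
  const char *name;
} block_info;

/* Small excerpt of a block table, sorted by start.  */
static const block_info sample_blocks[] = {
  { 0x0000, 0x007F, "Basic Latin" },
  { 0x0080, 0x00FF, "Latin-1 Supplement" },
  { 0x0370, 0x03FF, "Greek and Coptic" },
};

/* Binary search for the block containing UC; NULL if UC falls in a gap.  */
static const block_info *
find_block (uint32_t uc, const block_info *blocks, size_t count)
{
  size_t lo = 0, hi = count;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (uc < blocks[mid].start)
        hi = mid;
      else if (uc > blocks[mid].end)
        lo = mid + 1;
      else
        return &blocks[mid];
    }
  return NULL;
}

/* Membership test in the spirit of uc_is_block.  */
static int
in_block (uint32_t uc, const block_info *block)
{
  return block != NULL && block->start <= uc && uc <= block->end;
}
```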
2522 File: libunistring.info, Node: ISO C and Java syntax, Next: Classifications like in ISO C, Prev: Blocks, Up: unictype.h
2524 8.11 ISO C and Java syntax
2525 ==========================
2527 The following properties are taken from language standards. The
2528 supported language standards are ISO C 99 and Java.
2530 -- Function: bool uc_is_c_whitespace (ucs4_t UC)
2531 Tests whether a Unicode character is considered whitespace in ISO C 99.
2534 -- Function: bool uc_is_java_whitespace (ucs4_t UC)
2535 Tests whether a Unicode character is considered whitespace in Java.
2537 The following enumerated values are the possible return values of
2538 the functions `uc_c_ident_category' and `uc_java_ident_category'.
2540 -- Constant: int UC_IDENTIFIER_START
2541 This return value means that the given character is valid as first
2542 or subsequent character in an identifier.
2544 -- Constant: int UC_IDENTIFIER_VALID
2545 This return value means that the given character is valid as
2546 subsequent character only.
2548 -- Constant: int UC_IDENTIFIER_INVALID
2549 This return value means that the given character is not valid in an identifier.
2552 -- Constant: int UC_IDENTIFIER_IGNORABLE
2553 This return value (only for Java) means that the given character is ignorable.
2556 The following functions determine whether a given character can be a
2557 constituent of an identifier in the given programming language.
2559 -- Function: int uc_c_ident_category (ucs4_t UC)
2560 Returns the categorization of a Unicode character with respect to
2561 the ISO C 99 identifier syntax.
2563 -- Function: int uc_java_ident_category (ucs4_t UC)
2564 Returns the categorization of a Unicode character with respect to
2565 the Java identifier syntax.
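The four return values combine into a validity check for a whole identifier: the first character must be UC_IDENTIFIER_START, and later ones UC_IDENTIFIER_START or UC_IDENTIFIER_VALID. Accepting UC_IDENTIFIER_IGNORABLE after (but not at) the first position is an assumption of this sketch, modeled on Java's identifier-part rules. The sketch uses hypothetical stand-ins: ID_* names for the constants and an ASCII-only `ascii_c_ident_category' in place of `uc_c_ident_category'. To use it with the library, replace the ID_* names by the UC_IDENTIFIER_* constants and pass `uc_c_ident_category' or `uc_java_ident_category'.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for UC_IDENTIFIER_START, _VALID, _INVALID,
   _IGNORABLE.  */
enum { ID_START, ID_VALID, ID_INVALID, ID_IGNORABLE };

/* Hypothetical ASCII-only stand-in for uc_c_ident_category.  */
static int
ascii_c_ident_category (uint32_t uc)
{
  if ((uc >= 'A' && uc <= 'Z') || (uc >= 'a' && uc <= 'z') || uc == '_')
    return ID_START;
  if (uc >= '0' && uc <= '9')
    return ID_VALID;
  return ID_INVALID;
}

/* Returns true if S[0..N-1] is a non-empty identifier according to the
   categorization function CATEGORY.  */
static bool
is_identifier (const uint32_t *s, size_t n, int (*category) (uint32_t))
{
  for (size_t i = 0; i < n; i++)
    {
      int cat = category (s[i]);
      /* First character: must start an identifier.  Later ones: may also
         be "valid" or "ignorable".  */
      bool ok = (cat == ID_START
                 || (i > 0 && (cat == ID_VALID || cat == ID_IGNORABLE)));
      if (!ok)
        return false;
    }
  return n > 0;
}
```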
2568 File: libunistring.info, Node: Classifications like in ISO C, Prev: ISO C and Java syntax, Up: unictype.h
2570 8.12 Classifications like in ISO C
2571 ==================================
2573 The following character classifications mimic those declared in the
2574 ISO C header files `<ctype.h>' and `<wctype.h>'. These functions are
2575 deprecated, because this set of functions was designed with ASCII in
2576 mind and cannot reflect the more diverse reality of the Unicode
2577 character set. But they can be a quick-and-dirty porting aid when
2578 migrating from `wchar_t' APIs to Unicode strings.
2580 -- Function: bool uc_is_alnum (ucs4_t UC)
2581 Tests for any character for which `uc_is_alpha' or `uc_is_digit' is true.
2584 -- Function: bool uc_is_alpha (ucs4_t UC)
2585 Tests for any character for which `uc_is_upper' or `uc_is_lower' is
2586 true, or any character that is one of a locale-specific set of
2587 characters for which none of `uc_is_cntrl', `uc_is_digit',
2588 `uc_is_punct', or `uc_is_space' is true.
2590 -- Function: bool uc_is_cntrl (ucs4_t UC)
2591 Tests for any control character.
2593 -- Function: bool uc_is_digit (ucs4_t UC)
2594 Tests for any character that corresponds to a decimal-digit character.
2597 -- Function: bool uc_is_graph (ucs4_t UC)
2598 Tests for any character for which `uc_is_print' is true and
2599 `uc_is_space' is false.
2601 -- Function: bool uc_is_lower (ucs4_t UC)
2602 Tests for any character that corresponds to a lowercase letter or
2603 is one of a locale-specific set of characters for which none of
2604 `uc_is_cntrl', `uc_is_digit', `uc_is_punct', or `uc_is_space' is true.
2607 -- Function: bool uc_is_print (ucs4_t UC)
2608 Tests for any printing character.
2610 -- Function: bool uc_is_punct (ucs4_t UC)
2611 Tests for any printing character that is one of a locale-specific
2612 set of characters for which neither `uc_is_space' nor
2613 `uc_is_alnum' is true.
2615 -- Function: bool uc_is_space (ucs4_t UC)
2616 Tests for any character that corresponds to a locale-specific set
2617 of characters for which none of `uc_is_alnum', `uc_is_graph', or
2618 `uc_is_punct' is true.
2620 -- Function: bool uc_is_upper (ucs4_t UC)
2621 Tests for any character that corresponds to an uppercase letter or
2622 is one of a locale-specific set of characters for which none of
2623 `uc_is_cntrl', `uc_is_digit', `uc_is_punct', or `uc_is_space' is true.
2626 -- Function: bool uc_is_xdigit (ucs4_t UC)
2627 Tests for any character that corresponds to a hexadecimal-digit character.
2630 -- Function: bool uc_is_blank (ucs4_t UC)
2631 Tests for any character that corresponds to a standard blank
2632 character or a locale-specific set of characters for which
2633 `uc_is_alnum' is false.
2636 File: libunistring.info, Node: uniwidth.h, Next: uniwbrk.h, Prev: unictype.h, Up: Top
2638 9 Display width `<uniwidth.h>'
2639 ******************************
2641 This include file declares functions that return the display width,
2642 measured in columns, of characters or strings, when output to a device
2643 that uses non-proportional fonts.
2645 Note that for some rarely used characters the actual fonts or
2646 terminal emulators can use a different width. There is no mechanism
2647 for communicating the display width of characters across a Unix
2648 pseudo-terminal (tty). Also, there are scripts with complex rendering,
2649 like the Indic scripts. For these scripts, there is no such concept as
2650 non-proportional fonts. Therefore these functions work fine for most
2651 scripts and most characters, but their results can fail to match the
2652 actual display width.
2654 These functions are locale dependent. The ENCODING argument
2655 identifies the encoding (e.g. `"ISO-8859-2"' for Polish).
2657 -- Function: int uc_width (ucs4_t UC, const char *ENCODING)
2658 Determines and returns the number of column positions required for
2659 UC. Returns -1 if UC is a control character that has an influence
2660 on the column position when output.
2662 -- Function: int u8_width (const uint8_t *S, size_t N, const char *ENCODING)
2664 -- Function: int u16_width (const uint16_t *S, size_t N, const char *ENCODING)
2666 -- Function: int u32_width (const uint32_t *S, size_t N, const char *ENCODING)
2668 Determines and returns the number of column positions required for
2669 the first N units (or fewer if S ends before this) in S. This
2670 function ignores control characters in the string.
2672 -- Function: int u8_strwidth (const uint8_t *S, const char *ENCODING)
2673 -- Function: int u16_strwidth (const uint16_t *S, const char *ENCODING)
2674 -- Function: int u32_strwidth (const uint32_t *S, const char *ENCODING)
2675 Determines and returns the number of column positions required for
2676 S. This function ignores control characters in the string.
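Summing per-character widths while skipping control characters gives the behavior described for the string functions. The sketch below is self-contained: `sketch_char_width' is a hypothetical, deliberately crude stand-in for `uc_width' (C0/C1 controls -1, CJK Unified Ideographs U+4E00..U+9FFF 2 columns, everything else 1). The real function consults Unicode data and the ENCODING argument, and also gives combining marks width 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, crude stand-in for uc_width.  */
static int
sketch_char_width (uint32_t uc)
{
  if (uc < 0x20 || (uc >= 0x7F && uc < 0xA0))
    return -1;                  /* control character */
  if (uc >= 0x4E00 && uc <= 0x9FFF)
    return 2;                   /* CJK ideograph: double width */
  return 1;
}

/* Sums the widths of S[0..N-1], ignoring control characters (width -1),
   as described for u32_width.  */
static int
sketch_string_width (const uint32_t *s, size_t n)
{
  int width = 0;
  for (size_t i = 0; i < n; i++)
    {
      int w = sketch_char_width (s[i]);
      if (w >= 0)
        width += w;
    }
  return width;
}
```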
2679 File: libunistring.info, Node: uniwbrk.h, Next: unilbrk.h, Prev: uniwidth.h, Up: Top
2681 10 Word breaks in strings `<uniwbrk.h>'
2682 ***************************************
2684 This include file declares functions for determining where in a
2685 string "words" start and end. Here "words" are not necessarily the
2686 same as entities that can be looked up in dictionaries, but rather
2687 groups of consecutive characters that should not be split by text
2688 processing operations.
2692 * Word breaks in a string::
2693 * Word break property::
2696 File: libunistring.info, Node: Word breaks in a string, Next: Word break property, Up: uniwbrk.h
2698 10.1 Word breaks in a string
2699 ============================
2701 The following functions determine the word breaks in a string.
2703 -- Function: void u8_wordbreaks (const uint8_t *S, size_t N, char *P)
2704 -- Function: void u16_wordbreaks (const uint16_t *S, size_t N, char *P)
2705 -- Function: void u32_wordbreaks (const uint32_t *S, size_t N, char *P)
2706 -- Function: void ulc_wordbreaks (const char *S, size_t N, char *P)
2707 Determines the word break points in S, an array of N units, and
2708 stores the result at `P[0..N-1]':
`P[i] = 1'
2710 means that there is a word boundary between `S[i-1]' and `S[i]'.
`P[i] = 0'
2714 means that `S[i-1]' and `S[i]' must not be separated.
2715 `P[0]' is always set to 0. If an application wants to consider a
2716 word break to be present at the beginning of the string (before
2717 `S[0]') or at the end of the string (after `S[0..N-1]'), it has to
2718 treat these cases explicitly.
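Because `P[0]' is always 0, code that splits a string into segments has to treat the start of the string itself. The following hypothetical helper, `word_segments', records the start offset of every segment; segment K then extends up to the next start offset, or to N for the last one. The test values use a hand-written P for the 8-unit string "hi there", with boundaries after "hi" and after the space.

```c
#include <stddef.h>

/* Stores in STARTS (room for N entries) the start offset of each segment
   delimited by the word-break array P[0..N-1], and returns the number of
   segments.  Offset 0 is added explicitly, since P[0] is always 0.  */
static size_t
word_segments (const char *p, size_t n, size_t *starts)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (i == 0 || p[i])
      starts[count++] = i;
  return count;
}
```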
2721 File: libunistring.info, Node: Word break property, Prev: Word breaks in a string, Up: uniwbrk.h
2723 10.2 Word break property
2724 ========================
2726 This is a more low-level API. The word break property is a property
2727 defined in Unicode Standard Annex #29, section "Word Boundaries", see
2728 `http://www.unicode.org/reports/tr29/#Word_Boundaries'. It is used for
2729 determining the word breaks in a string.
2731 The following are the possible values of the word break property.
2732 More values may be added in the future.
2734 -- Constant: int WBP_OTHER
2735 -- Constant: int WBP_CR
2736 -- Constant: int WBP_LF
2737 -- Constant: int WBP_NEWLINE
2738 -- Constant: int WBP_EXTEND
2739 -- Constant: int WBP_FORMAT
2740 -- Constant: int WBP_KATAKANA
2741 -- Constant: int WBP_ALETTER
2742 -- Constant: int WBP_MIDNUMLET
2743 -- Constant: int WBP_MIDLETTER
2744 -- Constant: int WBP_MIDNUM
2745 -- Constant: int WBP_NUMERIC
2746 -- Constant: int WBP_EXTENDNUMLET
2748 The following function looks up the word break property of a character.
2751 -- Function: int uc_wordbreak_property (ucs4_t UC)
2752 Returns the Word_Break property of a Unicode character.
2755 File: libunistring.info, Node: unilbrk.h, Next: uninorm.h, Prev: uniwbrk.h, Up: Top
2757 11 Line breaking `<unilbrk.h>'
2758 ******************************
2760 This include file declares functions for determining where in a
2761 string line breaks could or should be introduced, in order to make the
2762 displayed string fit into a column of given width.
2764 These functions are locale dependent. The ENCODING argument
2765 identifies the encoding (e.g. `"ISO-8859-2"' for Polish).
2767 The following enumerated values indicate whether, at a given
2768 position, a line break is possible or not. Given a string S as an
2769 array `S[0..N-1]' and a position I, the values have the following meaning:
2772 -- Constant: int UC_BREAK_MANDATORY
2773 This value indicates that `S[I]' is a line break character.
2775 -- Constant: int UC_BREAK_POSSIBLE
2776 This value indicates that a line break may be inserted between
2777 `S[I-1]' and `S[I]'.
2779 -- Constant: int UC_BREAK_HYPHENATION
2780 This value indicates that a hyphen and a line break may be
2781 inserted between `S[I-1]' and `S[I]'. But beware of
2782 language-dependent hyphenation rules.
2784 -- Constant: int UC_BREAK_PROHIBITED
2785 This value indicates that `S[I-1]' and `S[I]' must not be separated.
2788 -- Constant: int UC_BREAK_UNDEFINED
2789 This value is not used as a return value; rather, in the
2790 overriding argument of the `u*_width_linebreaks' functions, it
2791 indicates the absence of an override.
2793 The following functions determine the positions at which line breaks are possible.
2796 -- Function: void u8_possible_linebreaks (const uint8_t *S, size_t N,
2797 const char *ENCODING, char *P)
2798 -- Function: void u16_possible_linebreaks (const uint16_t *S, size_t
2799 N, const char *ENCODING, char *P)
2800 -- Function: void u32_possible_linebreaks (const uint32_t *S, size_t
2801 N, const char *ENCODING, char *P)
2802 -- Function: void ulc_possible_linebreaks (const char *S, size_t N,
2803 const char *ENCODING, char *P)
2804 Determines the line break points in S, and stores the result at
2805 `P[0..N-1]'. Every `P[I]' is assigned one of the values
2806 `UC_BREAK_MANDATORY', `UC_BREAK_POSSIBLE', `UC_BREAK_HYPHENATION',
2807 `UC_BREAK_PROHIBITED'.
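A greedy line filler typically needs the last break opportunity that still fits on the current line. The following hypothetical helper finds it in a P array filled by one of the functions above. The BRK_* names are local stand-ins for the UC_BREAK_* constants; with the library, compare against the real constants instead. Positions are counted in units, so for single-column ASCII text the LIMIT is also a column count.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the UC_BREAK_* values.  */
enum { BRK_PROHIBITED, BRK_POSSIBLE, BRK_MANDATORY, BRK_HYPHENATION };

/* Returns the largest position I, with 0 < I <= LIMIT and I < N, at which
   P allows a break between S[I-1] and S[I], or 0 if there is none.
   Mandatory breaks are not considered: they mark S[I] itself as a line
   break character.  */
static size_t
last_break_at_or_before (const char *p, size_t n, size_t limit)
{
  if (n == 0)
    return 0;
  if (limit > n - 1)
    limit = n - 1;
  for (size_t i = limit; i > 0; i--)
    if (p[i] == BRK_POSSIBLE || p[i] == BRK_HYPHENATION)
      return i;
  return 0;
}
```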
2809 The following functions determine where line breaks should be
2810 inserted so that each line fits in a given width, when output to a
2811 device that uses non-proportional fonts.
2813 -- Function: int u8_width_linebreaks (const uint8_t *S, size_t N, int
2814 WIDTH, int START_COLUMN, int AT_END_COLUMNS, const char
2815 *OVERRIDE, const char *ENCODING, char *P)
2816 -- Function: int u16_width_linebreaks (const uint16_t *S, size_t N,
2817 int WIDTH, int START_COLUMN, int AT_END_COLUMNS, const char
2818 *OVERRIDE, const char *ENCODING, char *P)
2819 -- Function: int u32_width_linebreaks (const uint32_t *S, size_t N,
2820 int WIDTH, int START_COLUMN, int AT_END_COLUMNS, const char
2821 *OVERRIDE, const char *ENCODING, char *P)
2822 -- Function: int ulc_width_linebreaks (const char *S, size_t N, int
2823 WIDTH, int START_COLUMN, int AT_END_COLUMNS, const char
2824 *OVERRIDE, const char *ENCODING, char *P)
2825 Chooses the best line breaks, assuming that every character
2826 occupies a width given by the `uc_width' function (see *note uniwidth.h::).
2829 The string is `S[0..N-1]'.
2831 The maximum number of columns per line is given as WIDTH. The
2832 starting column of the string is given as START_COLUMN. If the
2833 algorithm should keep room after the last piece, this amount of
2834 room can be given as AT_END_COLUMNS.
2836 OVERRIDE is an optional override; if `OVERRIDE[I] !=
2837 UC_BREAK_UNDEFINED', `OVERRIDE[I]' takes precedence over `P[I]' as
2838 returned by the `u*_possible_linebreaks' function.
2840 The given ENCODING is used for disambiguating widths in `uc_width'.
2842 Returns the column after the end of the string, and stores the
2843 result at `P[0..N-1]'. Every `P[I]' is assigned one of the values
2844 `UC_BREAK_MANDATORY', `UC_BREAK_POSSIBLE', `UC_BREAK_HYPHENATION',
2845 `UC_BREAK_PROHIBITED'. Here the value `UC_BREAK_POSSIBLE'
2846 indicates that a line break _should_ be inserted.
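Once `u*_width_linebreaks' has filled P, producing the wrapped text is a simple copy. The sketch below assumes byte-oriented input (such as the string passed to `u8_width_linebreaks' or `ulc_width_linebreaks') and handles only `UC_BREAK_POSSIBLE', represented by the hypothetical stand-in BRK_POSSIBLE. Turning a space just before the break into the newline is a design choice of this sketch, not something the library prescribes.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the UC_BREAK_* values.  */
enum { BRK_PROHIBITED, BRK_POSSIBLE, BRK_MANDATORY, BRK_HYPHENATION };

/* Copies S[0..N-1] to OUT, starting a new line before S[I] wherever P[I]
   is BRK_POSSIBLE.  A space immediately before the break becomes the
   newline; otherwise a newline is inserted.  OUT needs room for 2*N + 1
   bytes in the worst case.  Returns the length of the NUL-terminated
   result.  */
static size_t
apply_line_breaks (const char *s, const char *p, size_t n, char *out)
{
  size_t len = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (p[i] == BRK_POSSIBLE)
        {
          if (len > 0 && out[len - 1] == ' ')
            out[len - 1] = '\n';
          else
            out[len++] = '\n';
        }
      out[len++] = s[i];
    }
  out[len] = '\0';
  return len;
}
```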
2849 File: libunistring.info, Node: uninorm.h, Next: unicase.h, Prev: unilbrk.h, Up: Top
2851 12 Normalization forms (composition and decomposition) `<uninorm.h>'
2852 ********************************************************************
2854 This include file defines functions for transforming Unicode strings
2855 to one of the four normal forms, known as NFC, NFD, NFKC, NFKD. These
2856 transformations involve decomposition and -- for NFC and NFKC --
2857 composition of Unicode characters.
2861 * Decomposition of characters::
2862 * Composition of characters::
2863 * Normalization of strings::
2864 * Normalizing comparisons::
2865 * Normalization of streams::
2868 File: libunistring.info, Node: Decomposition of characters, Next: Composition of characters, Up: uninorm.h
2870 12.1 Decomposition of Unicode characters
2871 ========================================
2873 The following enumerated values are the possible types of
2874 decomposition of a Unicode character.
2876 -- Constant: int UC_DECOMP_CANONICAL
2877 Denotes canonical decomposition.
2879 -- Constant: int UC_DECOMP_FONT
2880 UCD marker: `<font>'. Denotes a font variant (e.g. a blackletter form).
2883 -- Constant: int UC_DECOMP_NOBREAK
2884 UCD marker: `<noBreak>'. Denotes a no-break version of a space or hyphen.
2887 -- Constant: int UC_DECOMP_INITIAL
2888 UCD marker: `<initial>'. Denotes an initial presentation form (Arabic).
2891 -- Constant: int UC_DECOMP_MEDIAL
2892 UCD marker: `<medial>'. Denotes a medial presentation form (Arabic).
2895 -- Constant: int UC_DECOMP_FINAL
2896 UCD marker: `<final>'. Denotes a final presentation form (Arabic).
2898 -- Constant: int UC_DECOMP_ISOLATED
2899 UCD marker: `<isolated>'. Denotes an isolated presentation form (Arabic).
2902 -- Constant: int UC_DECOMP_CIRCLE
2903 UCD marker: `<circle>'. Denotes an encircled form.
2905 -- Constant: int UC_DECOMP_SUPER
2906 UCD marker: `<super>'. Denotes a superscript form.
2908 -- Constant: int UC_DECOMP_SUB
2909 UCD marker: `<sub>'. Denotes a subscript form.
2911 -- Constant: int UC_DECOMP_VERTICAL
2912 UCD marker: `<vertical>'. Denotes a vertical layout presentation form.
2915 -- Constant: int UC_DECOMP_WIDE
2916 UCD marker: `<wide>'. Denotes a wide (or zenkaku) compatibility character.
2919 -- Constant: int UC_DECOMP_NARROW
2920 UCD marker: `<narrow>'. Denotes a narrow (or hankaku)
2921 compatibility character.
2923 -- Constant: int UC_DECOMP_SMALL
2924 UCD marker: `<small>'. Denotes a small variant form (CNS compatibility).
2927 -- Constant: int UC_DECOMP_SQUARE
2928 UCD marker: `<square>'. Denotes a CJK squared font variant.
2930 -- Constant: int UC_DECOMP_FRACTION
2931 UCD marker: `<fraction>'. Denotes a vulgar fraction form.
2933 -- Constant: int UC_DECOMP_COMPAT
2934 UCD marker: `<compat>'. Denotes an otherwise unspecified
2935 compatibility character.
2937 The following constant denotes the maximum size of decomposition of
2938 a single Unicode character.
2940 -- Macro: unsigned int UC_DECOMPOSITION_MAX_LENGTH
2941 This macro expands to a constant that is the required size of
2942 buffer passed to the `uc_decomposition' and
2943 `uc_canonical_decomposition' functions.
2945 The following functions decompose a Unicode character.
2947 -- Function: int uc_decomposition (ucs4_t UC, int *DECOMP_TAG, ucs4_t *DECOMPOSITION)
2949 Returns the character decomposition mapping of the Unicode
2950 character UC. DECOMPOSITION must point to an array of at least
2951 `UC_DECOMPOSITION_MAX_LENGTH' `ucs4_t' elements.
2953 When a decomposition exists, `DECOMPOSITION[0..N-1]' and
2954 `*DECOMP_TAG' are filled and N is returned. Otherwise -1 is returned.
2957 -- Function: int uc_canonical_decomposition (ucs4_t UC, ucs4_t *DECOMPOSITION)
2959 Returns the canonical character decomposition mapping of the
2960 Unicode character UC. DECOMPOSITION must point to an array of at
2961 least `UC_DECOMPOSITION_MAX_LENGTH' `ucs4_t' elements.
2963 When a decomposition exists, `DECOMPOSITION[0..N-1]' is filled and
2964 N is returned. Otherwise -1 is returned.
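Assuming, as in the Unicode Character Database, that each call to `uc_canonical_decomposition' yields one level of mapping, a full canonical decomposition applies it recursively. The sketch below shows this with a hypothetical stand-in, `sketch_canonical_decomposition', that knows only two real entries (U+212B ANGSTROM SIGN -> U+00C5, and U+00C5 -> U+0041 U+030A) but follows the same return convention. Full NFD additionally reorders combining marks by canonical combining class, which `u8_normalize' and its siblings take care of.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DECOMP_MAX 4  /* hypothetical; the library uses UC_DECOMPOSITION_MAX_LENGTH */

/* Hypothetical stand-in for uc_canonical_decomposition with two entries.
   Fills DECOMPOSITION and returns its length, or returns -1.  */
static int
sketch_canonical_decomposition (uint32_t uc, uint32_t *decomposition)
{
  switch (uc)
    {
    case 0x212B:                /* ANGSTROM SIGN */
      decomposition[0] = 0x00C5;
      return 1;
    case 0x00C5:                /* LATIN CAPITAL LETTER A WITH RING ABOVE */
      decomposition[0] = 0x0041;
      decomposition[1] = 0x030A;
      return 2;
    default:
      return -1;
    }
}

/* Fully decomposes UC into OUT (room for CAPACITY code points) by applying
   the one-level mapping recursively.  Returns the number of code points
   written, or -1 if OUT is too small.  */
static int
full_canonical_decomposition (uint32_t uc, uint32_t *out, size_t capacity)
{
  uint32_t buf[SKETCH_DECOMP_MAX];
  int len = sketch_canonical_decomposition (uc, buf);
  if (len < 0)
    {
      if (capacity == 0)
        return -1;
      out[0] = uc;              /* no mapping: UC stands for itself */
      return 1;
    }
  size_t total = 0;             /* invariant: total <= capacity */
  for (int i = 0; i < len; i++)
    {
      int k = full_canonical_decomposition (buf[i], out + total,
                                            capacity - total);
      if (k < 0)
        return -1;
      total += (size_t) k;
    }
  return (int) total;
}
```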
2967 File: libunistring.info, Node: Composition of characters, Next: Normalization of strings, Prev: Decomposition of characters, Up: uninorm.h
2969 12.2 Composition of Unicode characters
2970 ======================================
2972 The following function composes a Unicode character from two Unicode characters.
2975 -- Function: ucs4_t uc_composition (ucs4_t UC1, ucs4_t UC2)
2976 Attempts to combine the Unicode characters UC1, UC2. UC1 is known
2977 to have canonical combining class 0.
2979 Returns the combination of UC1 and UC2, if it exists. Returns 0 otherwise.
2982 Not all decompositions can be recombined using this function. See
2983 the Unicode file `CompositionExclusions.txt' for details.
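For Hangul syllables the Unicode Standard (section 3.12, "Conjoining Jamo Behavior") defines composition arithmetically rather than by table. The following sketch implements that arithmetic using the same convention as `uc_composition' (0 means "no composition"); `hangul_compose' is a hypothetical name and this is not libunistring's code.

```c
#include <stdint.h>

/* Constants of the Hangul syllable composition algorithm.  */
enum
{
  S_BASE = 0xAC00, L_BASE = 0x1100, V_BASE = 0x1161, T_BASE = 0x11A7,
  L_COUNT = 19, V_COUNT = 21, T_COUNT = 28,
  N_COUNT = V_COUNT * T_COUNT,   /* 588 */
  S_COUNT = L_COUNT * N_COUNT    /* 11172 */
};

/* Composes a Hangul L+V or LV+T pair; returns 0 if the pair does not
   compose.  */
static uint32_t
hangul_compose (uint32_t uc1, uint32_t uc2)
{
  /* Leading consonant + vowel -> LV syllable.  */
  if (uc1 >= L_BASE && uc1 < L_BASE + L_COUNT
      && uc2 >= V_BASE && uc2 < V_BASE + V_COUNT)
    return S_BASE + ((uc1 - L_BASE) * V_COUNT + (uc2 - V_BASE)) * T_COUNT;
  /* LV syllable (no trailing consonant yet) + trailing consonant.  */
  if (uc1 >= S_BASE && uc1 < S_BASE + S_COUNT
      && (uc1 - S_BASE) % T_COUNT == 0
      && uc2 > T_BASE && uc2 < T_BASE + T_COUNT)
    return uc1 + (uc2 - T_BASE);
  return 0;
}
```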
2986 File: libunistring.info, Node: Normalization of strings, Next: Normalizing comparisons, Prev: Composition of characters, Up: uninorm.h
2988 12.3 Normalization of strings
2989 =============================
2991 The Unicode standard defines four normalization forms for Unicode
2992 strings. The following type is used to denote a normalization form.
-- Type: uninorm_t
2995 An object of type `uninorm_t' denotes a Unicode normalization form.
2996 This is a scalar type; its values can be compared with `=='.
2998 The following constants denote the four normalization forms.
3000 -- Macro: uninorm_t UNINORM_NFD
3001 Denotes Normalization form D: canonical decomposition.
3003 -- Macro: uninorm_t UNINORM_NFC
3004 Normalization form C: canonical decomposition, then canonical composition.
3007 -- Macro: uninorm_t UNINORM_NFKD
3008 Normalization form KD: compatibility decomposition.
3010 -- Macro: uninorm_t UNINORM_NFKC
3011 Normalization form KC: compatibility decomposition, then canonical composition.
3014 The following functions operate on `uninorm_t' objects.
3016 -- Function: bool uninorm_is_compat_decomposing (uninorm_t NF)
3017 Tests whether the normalization form NF does compatibility decomposition.
3020 -- Function: bool uninorm_is_composing (uninorm_t NF)
3021 Tests whether the normalization form NF includes canonical composition.
3024 -- Function: uninorm_t uninorm_decomposing_form (uninorm_t NF)
3025 Returns the decomposing variant of the normalization form NF.
3026 This maps NFC,NFD -> NFD and NFKC,NFKD -> NFKD.
3028 The following functions apply a Unicode normalization form to a Unicode string.
3031 -- Function: uint8_t * u8_normalize (uninorm_t NF, const uint8_t *S,
3032 size_t N, uint8_t *RESULTBUF, size_t *LENGTHP)
3033 -- Function: uint16_t * u16_normalize (uninorm_t NF, const uint16_t
3034 *S, size_t N, uint16_t *RESULTBUF, size_t *LENGTHP)
3035 -- Function: uint32_t * u32_normalize (uninorm_t NF, const uint32_t
3036 *S, size_t N, uint32_t *RESULTBUF, size_t *LENGTHP)
3037 Returns the specified normalization form of a string.
3040 File: libunistring.info, Node: Normalizing comparisons, Next: Normalization of streams, Prev: Normalization of strings, Up: uninorm.h
3042 12.4 Normalizing comparisons
3043 ============================
3045 The following functions compare Unicode strings, ignoring differences in normalization.
3048 -- Function: int u8_normcmp (const uint8_t *S1, size_t N1, const
3049 uint8_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
3050 -- Function: int u16_normcmp (const uint16_t *S1, size_t N1, const
3051 uint16_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
3052 -- Function: int u32_normcmp (const uint32_t *S1, size_t N1, const
3053 uint32_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
3054 Compares S1 and S2, ignoring differences in normalization.
3056 NF must be either `UNINORM_NFD' or `UNINORM_NFKD'.
3058 If successful, sets `*RESULTP' to -1 if S1 < S2, 0 if S1 = S2, 1
3059 if S1 > S2, and returns 0. Upon failure, returns -1 with `errno' set.
3062 -- Function: char * u8_normxfrm (const uint8_t *S, size_t N, uninorm_t
3063 NF, char *RESULTBUF, size_t *LENGTHP)
3064 -- Function: char * u16_normxfrm (const uint16_t *S, size_t N,
3065 uninorm_t NF, char *RESULTBUF, size_t *LENGTHP)
3066 -- Function: char * u32_normxfrm (const uint32_t *S, size_t N,
3067 uninorm_t NF, char *RESULTBUF, size_t *LENGTHP)
3068 Converts the string S of length N to a NUL-terminated byte
3069 sequence, in such a way that comparing `u8_normxfrm (S1)' and
3070 `u8_normxfrm (S2)' with the `u8_cmp2' function is equivalent to
3071 comparing S1 and S2 with the `u8_normcoll' function.
3073 NF must be either `UNINORM_NFC' or `UNINORM_NFKC'.
3075 -- Function: int u8_normcoll (const uint8_t *S1, size_t N1, const
3076 uint8_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
3077 -- Function: int u16_normcoll (const uint16_t *S1, size_t N1, const
3078 uint16_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
3079 -- Function: int u32_normcoll (const uint32_t *S1, size_t N1, const
3080 uint32_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
3081 Compares S1 and S2, ignoring differences in normalization, using
3082 the collation rules of the current locale.
3084 NF must be either `UNINORM_NFC' or `UNINORM_NFKC'.
3086 If successful, sets `*RESULTP' to -1 if S1 < S2, 0 if S1 = S2, 1
3087 if S1 > S2, and returns 0. Upon failure, returns -1 with `errno' set.
3091 File: libunistring.info, Node: Normalization of streams, Prev: Normalizing comparisons, Up: uninorm.h
3093 12.5 Normalization of streams of Unicode characters
3094 ===================================================
3096 A "stream of Unicode characters" is essentially a function that
3097 accepts a `ucs4_t' argument repeatedly, optionally combined with a
3098 function that "flushes" the stream.
3100 -- Type: struct uninorm_filter
3101 This is the data type of a stream of Unicode characters that
3102 normalizes its input according to a given normalization form and
3103 passes the normalized character sequence to the encapsulated
3104 stream of Unicode characters.
3106 -- Function: struct uninorm_filter * uninorm_filter_create (uninorm_t
3107 NF, int (*STREAM_FUNC) (void *STREAM_DATA, ucs4_t UC), void *STREAM_DATA)
3109 Creates and returns a normalization filter for Unicode characters.
3111 The pair (STREAM_FUNC, STREAM_DATA) is the encapsulated stream.
3112 `STREAM_FUNC (STREAM_DATA, UC)' receives the Unicode character UC
3113 and returns 0 if successful, or -1 with `errno' set upon failure.
3115 Returns the new filter, or NULL with `errno' set upon failure.
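The encapsulated stream can be any function of the documented shape. The following hypothetical collector, `collect_uc', appends each character to a caller-provided array and reports overflow by returning -1 with `errno' set (to ENOMEM, a choice made for this sketch). Since `ucs4_t' is `uint32_t', it could be passed as `uninorm_filter_create (UNINORM_NFC, collect_uc, &collector)', followed by `uninorm_filter_write' for each input character and `uninorm_filter_free' at the end.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* State passed as STREAM_DATA: a fixed-size, caller-provided buffer.  */
struct uc_collector
{
  uint32_t *buf;      /* storage for the normalized characters */
  size_t capacity;    /* number of elements BUF can hold */
  size_t length;      /* number of elements stored so far */
};

/* Has the documented STREAM_FUNC shape: receives one character, returns 0
   on success or -1 with errno set on failure.  */
static int
collect_uc (void *stream_data, uint32_t uc)
{
  struct uc_collector *c = stream_data;
  if (c->length == c->capacity)
    {
      errno = ENOMEM;   /* out of room; the errno value is this sketch's choice */
      return -1;
    }
  c->buf[c->length++] = uc;
  return 0;
}
```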
3117 -- Function: int uninorm_filter_write (struct uninorm_filter *FILTER, ucs4_t UC)
3119 Stuffs a Unicode character into a normalizing filter. Returns 0
3120 if successful, or -1 with `errno' set upon failure.
3122 -- Function: int uninorm_filter_flush (struct uninorm_filter *FILTER)
3123 Brings data buffered in the filter to its destination, the
3124 encapsulated stream.
3126 Returns 0 if successful, or -1 with `errno' set upon failure.
3128 Note: if additional characters are written into the filter after
3129 calling this function, the resulting character sequence in the
3130 encapsulated stream will not necessarily be normalized.
3132 -- Function: int uninorm_filter_free (struct uninorm_filter *FILTER)
3133 Brings data buffered in the filter to its destination, the
3134 encapsulated stream, then closes and frees the filter.
3136 Returns 0 if successful, or -1 with `errno' set upon failure.
3139 File: libunistring.info, Node: unicase.h, Next: uniregex.h, Prev: uninorm.h, Up: Top
3141 13 Case mappings `<unicase.h>'
3142 ******************************
3144 This include file defines functions for case mapping for Unicode
3145 strings and case insensitive comparison of Unicode strings and C strings.
3148 These string functions fix the problems that were mentioned in *note
3149 char * strings::, namely, they handle the Croatian LETTER DZ WITH
3150 CARON, the German LATIN SMALL LETTER SHARP S, the Greek sigma and the
3151 Lithuanian i correctly.
3155 * Case mappings of characters::
3156 * Case mappings of strings::
3157 * Case mappings of substrings::
3158 * Case insensitive comparison::
3162 File: libunistring.info, Node: Case mappings of characters, Next: Case mappings of strings, Up: unicase.h
3164 13.1 Case mappings of characters
3165 ================================
3167 The following functions implement case mappings on Unicode
3168 characters, for those cases only where the result of the mapping is
3169 again a single Unicode character.
3171 These mappings are locale and context independent.
3173 *WARNING!* These functions are not sufficient for languages such as
3174 German, Greek and Lithuanian. Use instead the functions below, which
3175 treat an entire string at once and are language aware.
3177 -- Function: ucs4_t uc_toupper (ucs4_t UC)
3178 Returns the uppercase mapping of the Unicode character UC.
3180 -- Function: ucs4_t uc_tolower (ucs4_t UC)
3181 Returns the lowercase mapping of the Unicode character UC.
3183 -- Function: ucs4_t uc_totitle (ucs4_t UC)
3184 Returns the titlecase mapping of the Unicode character UC.
3186 The titlecase mapping of a character is to be used when the
3187 character should look like upper case and the following characters look like lower case.
3190 For most characters, this is the same as the uppercase mapping.
3191 There are only a few characters where the title case variant and the
3192 upper case variant are different. These characters occur in the
3193 Latin writing of the Croatian, Bosnian, and Serbian languages.
3195 Lower case | Title case | Upper case
3196 ------------------------------------------------------------------
3197 U+01C9 LATIN SMALL LETTER LJ | U+01C8 LATIN CAPITAL LETTER L WITH SMALL LETTER J | U+01C7 LATIN CAPITAL LETTER LJ
3199 U+01CC LATIN SMALL LETTER NJ | U+01CB LATIN CAPITAL LETTER N WITH SMALL LETTER J | U+01CA LATIN CAPITAL LETTER NJ
3201 U+01F3 LATIN SMALL LETTER DZ | U+01F2 LATIN CAPITAL LETTER D WITH SMALL LETTER Z | U+01F1 LATIN CAPITAL LETTER DZ
3203 U+01C6 LATIN SMALL LETTER DZ WITH CARON | U+01C5 LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON | U+01C4 LATIN CAPITAL LETTER DZ WITH CARON
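The difference between the three mappings for these digraphs can be made concrete with code points. The following sketch hard-codes the four digraph triples (U+01C4..U+01CC and U+01F1..U+01F3) and maps any member to its upper, title or lower case form. `digraph_toupper', `digraph_totitle' and `digraph_tolower' are hypothetical names that agree with `uc_toupper', `uc_totitle' and `uc_tolower' on these twelve characters only; all other characters are returned unchanged.

```c
#include <stddef.h>
#include <stdint.h>

/* Each row: { upper case, title case, lower case }.  */
static const uint32_t digraphs[4][3] = {
  { 0x01C4, 0x01C5, 0x01C6 },  /* DZ WITH CARON */
  { 0x01C7, 0x01C8, 0x01C9 },  /* LJ */
  { 0x01CA, 0x01CB, 0x01CC },  /* NJ */
  { 0x01F1, 0x01F2, 0x01F3 },  /* DZ */
};

/* Returns column COL of the row containing UC, or UC itself if UC is not
   one of the twelve digraph code points.  */
static uint32_t
digraph_map (uint32_t uc, size_t col)
{
  for (size_t row = 0; row < 4; row++)
    for (size_t j = 0; j < 3; j++)
      if (digraphs[row][j] == uc)
        return digraphs[row][col];
  return uc;
}

static uint32_t digraph_toupper (uint32_t uc) { return digraph_map (uc, 0); }
static uint32_t digraph_totitle (uint32_t uc) { return digraph_map (uc, 1); }
static uint32_t digraph_tolower (uint32_t uc) { return digraph_map (uc, 2); }
```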
3208 File: libunistring.info, Node: Case mappings of strings, Next: Case mappings of substrings, Prev: Case mappings of characters, Up: unicase.h
3210 13.2 Case mappings of strings
3211 =============================
3213 Case mapping should always be performed on entire strings, not on
3214 individual characters. The functions in this section do so.
3216 These functions allow you to apply a normalization after the case
3217 mapping. The reason is that if you want to treat `ä' and `Ä' the
3218 same, you most often also want to treat the composed and decomposed
3219 forms of such a character, U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
3220 and U+0041 LATIN CAPITAL LETTER A U+0308 COMBINING DIAERESIS the same.
3221 The NF argument designates the normalization.
3223 These functions are locale dependent. The ISO639_LANGUAGE argument
3224 identifies the language (e.g. `"tr"' for Turkish). NULL means to use
3225 locale independent case mappings.
3227 -- Function: const char * uc_locale_language ()
3228 Returns the ISO 639 language code of the current locale. Returns
3229 `""' if it is unknown, or in the "C" locale.
3231 -- Function: uint8_t * u8_toupper (const uint8_t *S, size_t N, const
3232 char *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF, size_t *LENGTHP)
3234 -- Function: uint16_t * u16_toupper (const uint16_t *S, size_t N,
3235 const char *ISO639_LANGUAGE, uninorm_t NF, uint16_t
3236 *RESULTBUF, size_t *LENGTHP)
3237 -- Function: uint32_t * u32_toupper (const uint32_t *S, size_t N,
3238 const char *ISO639_LANGUAGE, uninorm_t NF, uint32_t
3239 *RESULTBUF, size_t *LENGTHP)
3240 Returns the uppercase mapping of a string.
3242 The NF argument identifies the normalization form to apply after
3243 the case-mapping. It can also be NULL, for no normalization.
3245 -- Function: uint8_t * u8_tolower (const uint8_t *S, size_t N, const
3246 char *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF, size_t *LENGTHP)
3248 -- Function: uint16_t * u16_tolower (const uint16_t *S, size_t N,
3249 const char *ISO639_LANGUAGE, uninorm_t NF, uint16_t
3250 *RESULTBUF, size_t *LENGTHP)
3251 -- Function: uint32_t * u32_tolower (const uint32_t *S, size_t N,
3252 const char *ISO639_LANGUAGE, uninorm_t NF, uint32_t
3253 *RESULTBUF, size_t *LENGTHP)
3254 Returns the lowercase mapping of a string.
3256 The NF argument identifies the normalization form to apply after
3257 the case-mapping. It can also be NULL, for no normalization.
3259 -- Function: uint8_t * u8_totitle (const uint8_t *S, size_t N, const
3260 char *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF, size_t *LENGTHP)
3262 -- Function: uint16_t * u16_totitle (const uint16_t *S, size_t N,
3263 const char *ISO639_LANGUAGE, uninorm_t NF, uint16_t
3264 *RESULTBUF, size_t *LENGTHP)
3265 -- Function: uint32_t * u32_totitle (const uint32_t *S, size_t N,
3266 const char *ISO639_LANGUAGE, uninorm_t NF, uint32_t
3267 *RESULTBUF, size_t *LENGTHP)
3268 Returns the titlecase mapping of a string.
3270 Mapping to title case means that, in each word, the first cased
3271 character is mapped to title case and the remaining
3272 characters of the word are mapped to lower case.
3274 The NF argument identifies the normalization form to apply after
3275 the case-mapping. It can also be NULL, for no normalization.
3278 File: libunistring.info, Node: Case mappings of substrings, Next: Case insensitive comparison, Prev: Case mappings of strings, Up: unicase.h
3280 13.3 Case mappings of substrings
3281 ================================
3283 Case mapping of a substring cannot simply be performed by extracting
3284 the substring and then applying the case mapping function to it. This
3285 does not work because case mapping requires some information about the
3286 surrounding characters. The following functions allow you to apply case
3287 mappings to substrings of a given string, while taking into account the
3288 characters that precede it (the "prefix") and the characters that
3289 follow it (the "suffix").
3291 -- Type: casing_prefix_context_t
3292 This data type denotes the case-mapping context that is given by a
3293 prefix string. It is an immediate type that can be copied by
3294 simple assignment, without involving memory allocation. It is not an array type.
3297 -- Constant: casing_prefix_context_t unicase_empty_prefix_context
3298 This constant is the case-mapping context that corresponds to an
3299 empty prefix string.
3301 The following functions return `casing_prefix_context_t' objects:
3303 -- Function: casing_prefix_context_t u8_casing_prefix_context (const
3304 uint8_t *S, size_t N)
3305 -- Function: casing_prefix_context_t u16_casing_prefix_context (const
3306 uint16_t *S, size_t N)
3307 -- Function: casing_prefix_context_t u32_casing_prefix_context (const
3308 uint32_t *S, size_t N)
3309 Returns the case-mapping context of a given prefix string.
3311 -- Function: casing_prefix_context_t u8_casing_prefixes_context (const
3312 uint8_t *S, size_t N, casing_prefix_context_t A_CONTEXT)
3313 -- Function: casing_prefix_context_t u16_casing_prefixes_context
3314 (const uint16_t *S, size_t N, casing_prefix_context_t
3315 A_CONTEXT)
3316 -- Function: casing_prefix_context_t u32_casing_prefixes_context
3317 (const uint32_t *S, size_t N, casing_prefix_context_t
3318 A_CONTEXT)
3319 Returns the case-mapping context of the prefix concat(A, S), given
3320 the case-mapping context of the prefix A.
3322 -- Type: casing_suffix_context_t
3323 This data type denotes the case-mapping context that is given by a
3324 suffix string. It is an immediate type that can be copied by
3325 simple assignment, without involving memory allocation. It is not
3326 an array type.
3328 -- Constant: casing_suffix_context_t unicase_empty_suffix_context
3329 This constant is the case-mapping context that corresponds to an
3330 empty suffix string.
3332 The following functions return `casing_suffix_context_t' objects:
3334 -- Function: casing_suffix_context_t u8_casing_suffix_context (const
3335 uint8_t *S, size_t N)
3336 -- Function: casing_suffix_context_t u16_casing_suffix_context (const
3337 uint16_t *S, size_t N)
3338 -- Function: casing_suffix_context_t u32_casing_suffix_context (const
3339 uint32_t *S, size_t N)
3340 Returns the case-mapping context of a given suffix string.
3342 -- Function: casing_suffix_context_t u8_casing_suffixes_context (const
3343 uint8_t *S, size_t N, casing_suffix_context_t A_CONTEXT)
3344 -- Function: casing_suffix_context_t u16_casing_suffixes_context
3345 (const uint16_t *S, size_t N, casing_suffix_context_t
3346 A_CONTEXT)
3347 -- Function: casing_suffix_context_t u32_casing_suffixes_context
3348 (const uint32_t *S, size_t N, casing_suffix_context_t
3349 A_CONTEXT)
3350 Returns the case-mapping context of the suffix concat(S, A), given
3351 the case-mapping context of the suffix A.
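The concat(A, S) and concat(S, A) rules let a long text be processed in chunks, feeding the context of one chunk into the computation for the next. The toy types below are hypothetical and far simpler than `casing_prefix_context_t' and `casing_suffix_context_t'. They only remember whether the nearest character that is not case-ignorable is a letter, and they treat just the apostrophe and the period as case-ignorable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy contexts, NOT the libunistring types.  Like those, they
   are small values that can be copied by plain assignment.  */
typedef struct { int last_is_letter; } toy_prefix_ctx;
typedef struct { int first_is_letter; } toy_suffix_ctx;

/* Contexts of an empty prefix and an empty suffix.  */
static const toy_prefix_ctx toy_empty_prefix = { 0 };
static const toy_suffix_ctx toy_empty_suffix = { 0 };

static int toy_is_letter (uint32_t c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Toy notion of "case-ignorable": apostrophe and period only.  */
static int toy_is_ignorable (uint32_t c)
{
  return c == '\'' || c == '.';
}

/* Context of the prefix concat(A, S), given the context A_CTX of A.  */
toy_prefix_ctx toy_prefixes_context (const uint32_t *s, size_t n,
                                     toy_prefix_ctx a_ctx)
{
  /* Scan backwards for the last character that is not case-ignorable.  */
  while (n > 0)
    {
      uint32_t c = s[--n];
      if (!toy_is_ignorable (c))
        {
          toy_prefix_ctx ctx = { toy_is_letter (c) };
          return ctx;
        }
    }
  return a_ctx;  /* S is empty or entirely case-ignorable.  */
}

/* Context of the suffix concat(S, A), given the context A_CTX of A.  */
toy_suffix_ctx toy_suffixes_context (const uint32_t *s, size_t n,
                                     toy_suffix_ctx a_ctx)
{
  /* Scan forwards for the first character that is not case-ignorable.  */
  for (size_t i = 0; i < n; i++)
    if (!toy_is_ignorable (s[i]))
      {
        toy_suffix_ctx ctx = { toy_is_letter (s[i]) };
        return ctx;
      }
  return a_ctx;  /* S is empty or entirely case-ignorable.  */
}
```

In this toy model, a chunk made only of case-ignorable characters leaves the previous context unchanged. This illustrates why the previous context has to be passed in.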
3353 The following functions perform a case mapping, considering the
3354 prefix context and the suffix context.
3356 -- Function: uint8_t * u8_ct_toupper (const uint8_t *S, size_t N,
3357 casing_prefix_context_t PREFIX_CONTEXT,
3358 casing_suffix_context_t SUFFIX_CONTEXT, const char
3359 *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF, size_t
3360 *LENGTHP)
3361 -- Function: uint16_t * u16_ct_toupper (const uint16_t *S, size_t N,
3362 casing_prefix_context_t PREFIX_CONTEXT,
3363 casing_suffix_context_t SUFFIX_CONTEXT, const char
3364 *ISO639_LANGUAGE, uninorm_t NF, uint16_t *RESULTBUF, size_t
3365 *LENGTHP)
3366 -- Function: uint32_t * u32_ct_toupper (const uint32_t *S, size_t N,
3367 casing_prefix_context_t PREFIX_CONTEXT,
3368 casing_suffix_context_t SUFFIX_CONTEXT, const char
3369 *ISO639_LANGUAGE, uninorm_t NF, uint32_t *RESULTBUF, size_t
3370 *LENGTHP)
3371 Returns the uppercase mapping of a string that is surrounded by a
3372 prefix and a suffix.
3374 -- Function: uint8_t * u8_ct_tolower (const uint8_t *S, size_t N,
3375 casing_prefix_context_t PREFIX_CONTEXT,
3376 casing_suffix_context_t SUFFIX_CONTEXT, const char
3377 *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF, size_t
3378 *LENGTHP)
3379 -- Function: uint16_t * u16_ct_tolower (const uint16_t *S, size_t N,
3380 casing_prefix_context_t PREFIX_CONTEXT,
3381 casing_suffix_context_t SUFFIX_CONTEXT, const char
3382 *ISO639_LANGUAGE, uninorm_t NF, uint16_t *RESULTBUF, size_t
3383 *LENGTHP)
3384 -- Function: uint32_t * u32_ct_tolower (const uint32_t *S, size_t N,
3385 casing_prefix_context_t PREFIX_CONTEXT,
3386 casing_suffix_context_t SUFFIX_CONTEXT, const char
3387 *ISO639_LANGUAGE, uninorm_t NF, uint32_t *RESULTBUF, size_t
3388 *LENGTHP)
3389 Returns the lowercase mapping of a string that is surrounded by a
3390 prefix and a suffix.
3392 -- Function: uint8_t * u8_ct_totitle (const uint8_t *S, size_t N,
3393 casing_prefix_context_t PREFIX_CONTEXT,
3394 casing_suffix_context_t SUFFIX_CONTEXT, const char
3395 *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF, size_t
3396 *LENGTHP)
3397 -- Function: uint16_t * u16_ct_totitle (const uint16_t *S, size_t N,
3398 casing_prefix_context_t PREFIX_CONTEXT,
3399 casing_suffix_context_t SUFFIX_CONTEXT, const char
3400 *ISO639_LANGUAGE, uninorm_t NF, uint16_t *RESULTBUF, size_t
3401 *LENGTHP)
3402 -- Function: uint32_t * u32_ct_totitle (const uint32_t *S, size_t N,
3403 casing_prefix_context_t PREFIX_CONTEXT,
3404 casing_suffix_context_t SUFFIX_CONTEXT, const char
3405 *ISO639_LANGUAGE, uninorm_t NF, uint32_t *RESULTBUF, size_t
3406 *LENGTHP)
3407 Returns the titlecase mapping of a string that is surrounded by a
3408 prefix and a suffix.
3410 For example, to uppercase the UTF-8 substring between `s +
3411 start_index' and `s + end_index' of a string that extends from `s' to
3412 `s + u8_strlen (s)', you can use the statements
3414 size_t result_length;
3415 uint8_t *result =
3416 u8_ct_toupper (s + start_index, end_index - start_index,
3417 u8_casing_prefix_context (s, start_index),
3418 u8_casing_suffix_context (s + end_index,
3419 u8_strlen (s) - end_index),
3420 iso639_language, NULL, NULL, &result_length);
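The statements above assume that `start_index' and `end_index' are byte offsets that fall on character boundaries. Cutting a UTF-8 string inside a multibyte character would pass an ill-formed fragment to the function. A minimal check relies only on the fact that UTF-8 continuation bytes have the bit pattern 10xxxxxx. Both helpers are hypothetical and are not libunistring functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if byte offset I of the UTF-8 string S (LEN bytes long) is a
   character boundary: either the end of the string, or a position that does
   not hold a continuation byte (10xxxxxx).  */
bool is_utf8_boundary (const uint8_t *s, size_t len, size_t i)
{
  if (i >= len)
    return i == len;
  return (s[i] & 0xC0) != 0x80;
}

/* Returns true if [START, END) is a substring of S whose two ends both fall
   on character boundaries.  */
bool is_valid_substring_range (const uint8_t *s, size_t len,
                               size_t start, size_t end)
{
  return start <= end
         && is_utf8_boundary (s, len, start)
         && is_utf8_boundary (s, len, end);
}
```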
3423 File: libunistring.info, Node: Case insensitive comparison, Next: Case detection, Prev: Case mappings of substrings, Up: unicase.h
3425 13.4 Case insensitive comparison
3426 ================================
3428 The following functions implement comparison that ignores
3429 differences in case and normalization.
3431 -- Function: uint8_t * u8_casefold (const uint8_t *S, size_t N, const
3432 char *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF,
3433 size_t *LENGTHP)
3434 -- Function: uint16_t * u16_casefold (const uint16_t *S, size_t N,
3435 const char *ISO639_LANGUAGE, uninorm_t NF, uint16_t
3436 *RESULTBUF, size_t *LENGTHP)
3437 -- Function: uint32_t * u32_casefold (const uint32_t *S, size_t N,
3438 const char *ISO639_LANGUAGE, uninorm_t NF, uint32_t
3439 *RESULTBUF, size_t *LENGTHP)
3440 Returns the case folded string.
3442 Comparing `u8_casefold (S1)' and `u8_casefold (S2)' with the
3443 `u8_cmp2' function is equivalent to comparing S1 and S2 with
3444 `u8_casecmp'.
3446 The NF argument identifies the normalization form to apply after
3447 the case-mapping. It can also be NULL, for no normalization.
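An ASCII-only sketch can illustrate the equivalence between case folding followed by `u8_cmp2' and `u8_casecmp'. It folds both strings, then compares them lexicographically, so that a proper prefix sorts first. The helpers are hypothetical and only mimic the `*RESULTP' convention. Real case folding is not limited to one-to-one mappings (for example, ß folds to "ss"), which is why the length of a case-folded result can differ from N.

```c
#include <stddef.h>

/* ASCII-only stand-in for case folding: maps 'A'..'Z' to 'a'..'z'.  */
static unsigned char fold_ascii (unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? (unsigned char) (c - 'A' + 'a') : c;
}

/* Compares the case folds of S1 (N1 bytes) and S2 (N2 bytes) byte by byte,
   with a proper prefix sorting first.  Following the convention of
   u8_casecmp, stores -1, 0 or 1 in *RESULTP and returns 0.  */
int ascii_casecmp (const char *s1, size_t n1, const char *s2, size_t n2,
                   int *resultp)
{
  size_t n = n1 < n2 ? n1 : n2;
  for (size_t i = 0; i < n; i++)
    {
      unsigned char c1 = fold_ascii ((unsigned char) s1[i]);
      unsigned char c2 = fold_ascii ((unsigned char) s2[i]);
      if (c1 != c2)
        {
          *resultp = c1 < c2 ? -1 : 1;
          return 0;
        }
    }
  /* Equal up to the shorter length: the shorter string sorts first.  */
  *resultp = n1 < n2 ? -1 : n1 > n2 ? 1 : 0;
  return 0;
}
```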
3449 -- Function: uint8_t * u8_ct_casefold (const uint8_t *S, size_t N,
3450 casing_prefix_context_t PREFIX_CONTEXT,
3451 casing_suffix_context_t SUFFIX_CONTEXT, const char
3452 *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF, size_t
3453 *LENGTHP)
3454 -- Function: uint16_t * u16_ct_casefold (const uint16_t *S, size_t N,
3455 casing_prefix_context_t PREFIX_CONTEXT,
3456 casing_suffix_context_t SUFFIX_CONTEXT, const char
3457 *ISO639_LANGUAGE, uninorm_t NF, uint16_t *RESULTBUF, size_t
3458 *LENGTHP)
3459 -- Function: uint32_t * u32_ct_casefold (const uint32_t *S, size_t N,
3460 casing_prefix_context_t PREFIX_CONTEXT,
3461 casing_suffix_context_t SUFFIX_CONTEXT, const char
3462 *ISO639_LANGUAGE, uninorm_t NF, uint32_t *RESULTBUF, size_t
3463 *LENGTHP)
3464 Returns the case folded string. The case folding takes into
3465 account the case mapping contexts of the prefix and suffix strings.
3467 -- Function: int u8_casecmp (const uint8_t *S1, size_t N1, const
3468 uint8_t *S2, size_t N2, const char *ISO639_LANGUAGE,
3469 uninorm_t NF, int *RESULTP)
3470 -- Function: int u16_casecmp (const uint16_t *S1, size_t N1, const
3471 uint16_t *S2, size_t N2, const char *ISO639_LANGUAGE,
3472 uninorm_t NF, int *RESULTP)
3473 -- Function: int u32_casecmp (const uint32_t *S1, size_t N1, const
3474 uint32_t *S2, size_t N2, const char *ISO639_LANGUAGE,
3475 uninorm_t NF, int *RESULTP)
3476 -- Function: int ulc_casecmp (const char *S1, size_t N1, const char
3477 *S2, size_t N2, const char *ISO639_LANGUAGE, uninorm_t NF,
3478 int *RESULTP)
3479 Compares S1 and S2, ignoring differences in case and normalization.
3481 The NF argument identifies the normalization form to apply after
3482 the case-mapping. It can also be NULL, for no normalization.
3484 If successful, sets `*RESULTP' to -1 if S1 < S2, 0 if S1 = S2, 1
3485 if S1 > S2, and returns 0. Upon failure, returns -1 with `errno'
3486 set.
3488 The following functions additionally take into account the sorting
3489 rules of the current locale.
3491 -- Function: char * u8_casexfrm (const uint8_t *S, size_t N, const
3492 char *ISO639_LANGUAGE, uninorm_t NF, char *RESULTBUF, size_t
3493 *LENGTHP)
3494 -- Function: char * u16_casexfrm (const uint16_t *S, size_t N, const
3495 char *ISO639_LANGUAGE, uninorm_t NF, char *RESULTBUF, size_t
3496 *LENGTHP)
3497 -- Function: char * u32_casexfrm (const uint32_t *S, size_t N, const
3498 char *ISO639_LANGUAGE, uninorm_t NF, char *RESULTBUF, size_t
3499 *LENGTHP)
3500 -- Function: char * ulc_casexfrm (const char *S, size_t N, const char
3501 *ISO639_LANGUAGE, uninorm_t NF, char *RESULTBUF, size_t
3502 *LENGTHP)
3503 Converts the string S of length N to a NUL-terminated byte
3504 sequence, in such a way that comparing `u8_casexfrm (S1)' and
3505 `u8_casexfrm (S2)' with the gnulib function `memcmp2' is
3506 equivalent to comparing S1 and S2 with `u8_casecoll'.
3508 NF must be either `UNINORM_NFC', `UNINORM_NFKC', or NULL for no
3509 normalization.
3511 -- Function: int u8_casecoll (const uint8_t *S1, size_t N1, const
3512 uint8_t *S2, size_t N2, const char *ISO639_LANGUAGE,
3513 uninorm_t NF, int *RESULTP)
3514 -- Function: int u16_casecoll (const uint16_t *S1, size_t N1, const
3515 uint16_t *S2, size_t N2, const char *ISO639_LANGUAGE,
3516 uninorm_t NF, int *RESULTP)
3517 -- Function: int u32_casecoll (const uint32_t *S1, size_t N1, const
3518 uint32_t *S2, size_t N2, const char *ISO639_LANGUAGE,
3519 uninorm_t NF, int *RESULTP)
3520 -- Function: int ulc_casecoll (const char *S1, size_t N1, const char
3521 *S2, size_t N2, const char *ISO639_LANGUAGE, uninorm_t NF,
3522 int *RESULTP)
3523 Compares S1 and S2, ignoring differences in case and normalization,
3524 using the collation rules of the current locale.
3526 The NF argument identifies the normalization form to apply after
3527 the case-mapping. It must be either `UNINORM_NFC' or
3528 `UNINORM_NFKC'. It can also be NULL, for no normalization.
3530 If successful, sets `*RESULTP' to -1 if S1 < S2, 0 if S1 = S2, 1
3531 if S1 > S2, and returns 0. Upon failure, returns -1 with `errno'
3532 set.
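The relationship between `u8_casexfrm' and `u8_casecoll' mirrors the one between the standard C functions `strxfrm' and `strcoll'. Transforming each string once and then comparing bytes gives the same ordering as collating directly. This pays off when the same strings are compared many times, for instance when sorting. The sketch below shows that pattern with the standard functions only, without any case folding. `make_sort_key' and `sign_of' are hypothetical helpers.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd sort key for S such that strcmp() on two keys orders
   the original strings the way strcoll() would in the current locale, or
   NULL if memory allocation fails.  */
char *make_sort_key (const char *s)
{
  size_t len = strxfrm (NULL, s, 0);  /* Size needed, excluding the NUL.  */
  char *key = malloc (len + 1);
  if (key != NULL)
    strxfrm (key, s, len + 1);
  return key;
}

/* Maps a comparison result to -1, 0 or 1.  */
int sign_of (int x)
{
  return (x > 0) - (x < 0);
}
```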
3535 File: libunistring.info, Node: Case detection, Prev: Case insensitive comparison, Up: unicase.h
3537 13.5 Case detection
3538 ===================
3540 The following functions determine whether a Unicode string is
3541 entirely in upper case, or entirely in lower case, or entirely in title
3542 case, or already case-folded.
3544 -- Function: int u8_is_uppercase (const uint8_t *S, size_t N, const
3545 char *ISO639_LANGUAGE, bool *RESULTP)
3546 -- Function: int u16_is_uppercase (const uint16_t *S, size_t N, const
3547 char *ISO639_LANGUAGE, bool *RESULTP)
3548 -- Function: int u32_is_uppercase (const uint32_t *S, size_t N, const
3549 char *ISO639_LANGUAGE, bool *RESULTP)
3550 Sets `*RESULTP' to true if mapping NFD(S) to upper case is a
3551 no-op, or to false otherwise, and returns 0. Upon failure,
3552 returns -1 with `errno' set.
3554 -- Function: int u8_is_lowercase (const uint8_t *S, size_t N, const
3555 char *ISO639_LANGUAGE, bool *RESULTP)
3556 -- Function: int u16_is_lowercase (const uint16_t *S, size_t N, const
3557 char *ISO639_LANGUAGE, bool *RESULTP)
3558 -- Function: int u32_is_lowercase (const uint32_t *S, size_t N, const
3559 char *ISO639_LANGUAGE, bool *RESULTP)
3560 Sets `*RESULTP' to true if mapping NFD(S) to lower case is a
3561 no-op, or to false otherwise, and returns 0. Upon failure,
3562 returns -1 with `errno' set.
3564 -- Function: int u8_is_titlecase (const uint8_t *S, size_t N, const
3565 char *ISO639_LANGUAGE, bool *RESULTP)
3566 -- Function: int u16_is_titlecase (const uint16_t *S, size_t N, const
3567 char *ISO639_LANGUAGE, bool *RESULTP)
3568 -- Function: int u32_is_titlecase (const uint32_t *S, size_t N, const
3569 char *ISO639_LANGUAGE, bool *RESULTP)
3570 Sets `*RESULTP' to true if mapping NFD(S) to title case is a
3571 no-op, or to false otherwise, and returns 0. Upon failure,
3572 returns -1 with `errno' set.
3574 -- Function: int u8_is_casefolded (const uint8_t *S, size_t N, const
3575 char *ISO639_LANGUAGE, bool *RESULTP)
3576 -- Function: int u16_is_casefolded (const uint16_t *S, size_t N, const
3577 char *ISO639_LANGUAGE, bool *RESULTP)
3578 -- Function: int u32_is_casefolded (const uint32_t *S, size_t N, const
3579 char *ISO639_LANGUAGE, bool *RESULTP)
3580 Sets `*RESULTP' to true if applying case folding to NFD(S) is a
3581 no-op, or to false otherwise, and returns 0. Upon failure,
3582 returns -1 with `errno' set.
3584 The following functions determine whether case mappings have any
3585 effect on a Unicode string.
3587 -- Function: int u8_is_cased (const uint8_t *S, size_t N, const char
3588 *ISO639_LANGUAGE, bool *RESULTP)
3589 -- Function: int u16_is_cased (const uint16_t *S, size_t N, const char
3590 *ISO639_LANGUAGE, bool *RESULTP)
3591 -- Function: int u32_is_cased (const uint32_t *S, size_t N, const char
3592 *ISO639_LANGUAGE, bool *RESULTP)
3593 Sets `*RESULTP' to true if case matters for S, that is, if mapping
3594 NFD(S) to either upper case or lower case or title case is not a
3595 no-op. Sets `*RESULTP' to false if NFD(S) maps to itself under the
3596 upper case mapping, under the lower case mapping, and under the
3597 title case mapping; in other words, when NFD(S) consists entirely
3598 of caseless characters. In both cases, returns 0. Upon failure, returns -1 with `errno' set.
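The following ASCII-only sketch illustrates these definitions. The helpers are hypothetical, and they skip NFD, which does not change ASCII text. A string counts as upper case when upper-casing it changes nothing, so a string without any letters is upper case and lower case at the same time, yet not cased.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if mapping S to upper case would be a no-op (no 'a'..'z').  */
bool ascii_is_uppercase (const char *s, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (s[i] >= 'a' && s[i] <= 'z')
      return false;
  return true;
}

/* True if mapping S to lower case would be a no-op (no 'A'..'Z').  */
bool ascii_is_lowercase (const char *s, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (s[i] >= 'A' && s[i] <= 'Z')
      return false;
  return true;
}

/* True if case matters for S.  For ASCII, every letter is changed by the
   upper-case or the lower-case mapping, so checking those two suffices.  */
bool ascii_is_cased (const char *s, size_t n)
{
  return !(ascii_is_uppercase (s, n) && ascii_is_lowercase (s, n));
}
```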
3601 File: libunistring.info, Node: uniregex.h, Next: Using the library, Prev: unicase.h, Up: Top
3603 14 Regular expressions `<uniregex.h>'
3604 *************************************
3606 This include file is not yet implemented.
3609 File: libunistring.info, Node: Using the library, Next: More functionality, Prev: uniregex.h, Up: Top
3611 15 Using the library
3612 ********************
3614 This chapter explains some practical considerations regarding the
3615 installation and the compiler options that are needed in order to use
3616 this library.
3618 * Menu:
3620 * Installation::
3621 * Compiler options::
3622 * Include files::
3623 * Autoconf macro::
3624 * Reporting problems::
3627 File: libunistring.info, Node: Installation, Next: Compiler options, Up: Using the library
3629 15.1 Installation
3630 =================
3632 Before you can use the library, it must be installed. First, you
3633 have to make sure all dependencies are installed. They are listed in
3634 the file `DEPENDENCIES'.
3636 Then you can proceed to build and install the library, as described
3637 in the file `INSTALL'. For installation on Windows systems, please
3638 refer to the file `README.woe32'.
3641 File: libunistring.info, Node: Compiler options, Next: Include files, Prev: Installation, Up: Using the library
3643 15.2 Compiler options
3644 =====================
3646 Let's denote as `LIBUNISTRING_PREFIX' the value of the `--prefix'
3647 option that you passed to `configure' while installing this package.
3648 If you didn't pass any `--prefix' option, then the package is installed
3649 in `/usr/local'.
3651 Let's denote as `LIBUNISTRING_INCLUDEDIR' the directory where the
3652 include files were installed. This is usually the same as
3653 `${LIBUNISTRING_PREFIX}/include'. However, if you passed an
3654 `--includedir' option to `configure', it is the value of that option.
3656 Let's further denote as `LIBUNISTRING_LIBDIR' the directory where
3657 the library itself was installed. This is the value that you passed
3658 with the `--libdir' option to `configure', or otherwise the same as
3659 `${LIBUNISTRING_PREFIX}/lib'. Recall that when building in 64-bit mode
3660 on a 64-bit GNU/Linux system that supports executables in either 64-bit
3661 mode or 32-bit mode, you should have used the option
3662 `--libdir=${LIBUNISTRING_PREFIX}/lib64'.
3664 So that the compiler finds the include files, you have to pass it the
3665 option `-I${LIBUNISTRING_INCLUDEDIR}'.
3667 So that the compiler finds the library during its linking pass, you
3668 have to pass it the options `-L${LIBUNISTRING_LIBDIR} -lunistring'. On
3669 some systems, in some configurations, you also have to pass options
3670 needed for linking with `libiconv'. The autoconf macro
3671 `gl_LIBUNISTRING' (see *note Autoconf macro::) deals with this
3672 complexity.
3675 File: libunistring.info, Node: Include files, Next: Autoconf macro, Prev: Compiler options, Up: Using the library
3677 15.3 Include files
3678 ==================
3680 Most of the include files have been presented in the introduction
3681 (see *note Introduction::) and in the subsequent detailed chapters.
3683 Another include file is `<unistring/version.h>'. It contains the
3684 version number of the libunistring library.
3686 -- Macro: int _LIBUNISTRING_VERSION
3687 This constant contains the version of libunistring that is being
3688 used at compile time. It encodes the major and minor parts of the
3689 version number only. These parts are encoded in the form
3690 `(major<<8) + minor'.
3692 -- Constant: int _libunistring_version
3693 This constant contains the version of libunistring that is being
3694 used at run time. It encodes the major and minor parts of the
3695 version number only. These parts are encoded in the form
3696 `(major<<8) + minor'.
3698 It is possible that `_libunistring_version' is greater than
3699 `_LIBUNISTRING_VERSION'. This can happen when you use `libunistring'
3700 as a shared library, and a newer, binary backward-compatible version
3701 has been installed after your program that uses `libunistring' was
3702 installed.
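The following sketch decodes the `(major<<8) + minor' encoding and compares the compile-time and run-time values. The function names are hypothetical. The rule that the run-time version should not be older than the compile-time one is an assumption suggested by the paragraph above. The part guarded by `HAVE_LIBUNISTRING' is compiled only when that macro is defined to 1.

```c
#ifdef HAVE_CONFIG_H
# include <config.h>
#endif

/* Extracts the major part of a version number encoded as
   (major<<8) + minor.  */
int version_major (int v)
{
  return v >> 8;
}

/* Extracts the minor part of such a version number.  */
int version_minor (int v)
{
  return v & 0xff;
}

/* Assumed sanity check: the library used at run time should not be older
   than the one whose headers were used at compile time.  A newer,
   backward-compatible run-time version is acceptable.  */
int runtime_not_older (int compile_time, int run_time)
{
  return run_time >= compile_time;
}

#if HAVE_LIBUNISTRING
# include <unistring/version.h>
/* Applies the check to the actual compile-time and run-time versions.  */
int libunistring_version_ok (void)
{
  return runtime_not_older (_LIBUNISTRING_VERSION, _libunistring_version);
}
#endif
```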
3705 File: libunistring.info, Node: Autoconf macro, Next: Reporting problems, Prev: Include files, Up: Using the library
3707 15.4 Autoconf macro
3708 ===================
3710 GNU Gnulib provides an autoconf macro that tests for the availability
3711 of `libunistring'. It is contained in the Gnulib module
3712 `libunistring', see
3713 `http://www.gnu.org/software/gnulib/MODULES.html#module=libunistring'.
3715 The macro is called `gl_LIBUNISTRING'. It searches for an installed
3716 libunistring. If found, it sets and AC_SUBSTs `HAVE_LIBUNISTRING=yes'
3717 and the `LIBUNISTRING' and `LTLIBUNISTRING' variables and augments the
3718 `CPPFLAGS' variable, and defines the C macro `HAVE_LIBUNISTRING' to 1.
3719 Otherwise, it sets and AC_SUBSTs `HAVE_LIBUNISTRING=no' and
3720 `LIBUNISTRING' and `LTLIBUNISTRING' to empty.
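The following sketch shows how a package might use the `HAVE_LIBUNISTRING' macro to fall back gracefully when the library is absent. `count_chars' is a hypothetical helper, while `u8_mbsnlen' is the `<unistr.h>' function that counts the Unicode characters in N units. It assumes the usual Autoconf setup, where the macro ends up in `config.h'. When libunistring is found, the program must also be linked with the options in the `LIBUNISTRING' variable.

```c
#ifdef HAVE_CONFIG_H
# include <config.h>  /* Typically where HAVE_LIBUNISTRING is defined.  */
#endif
#include <stddef.h>
#include <stdint.h>

#if HAVE_LIBUNISTRING
# include <unistr.h>
#endif

/* Counts the characters in the N bytes of UTF-8 text at S.  Uses
   libunistring when configure found it; otherwise counts the bytes that are
   not UTF-8 continuation bytes (10xxxxxx), which gives the same answer for
   valid UTF-8.  */
size_t count_chars (const uint8_t *s, size_t n)
{
#if HAVE_LIBUNISTRING
  return u8_mbsnlen (s, n);
#else
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if ((s[i] & 0xC0) != 0x80)
      count++;
  return count;
#endif
}
```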
3722 The complexities that `gl_LIBUNISTRING' deals with are the following:
3724 * On some operating systems, in some configurations, libunistring
3725 depends on `libiconv', and the options for linking with libiconv
3726 must be mentioned explicitly on the link command line.
3728 * GNU `libunistring', if installed, is not necessarily already in the
3729 search path (`CPPFLAGS' for the include file search path,
3730 `LDFLAGS' for the library search path).
3732 * GNU `libunistring', if installed, is not necessarily already in the
3733 run time library search path. To avoid the need for setting an
3734 environment variable like `LD_LIBRARY_PATH', the macro adds the
3735 appropriate run time search path options to the `LIBUNISTRING'
3736 variable. This works on most systems.
3739 File: libunistring.info, Node: Reporting problems, Prev: Autoconf macro, Up: Using the library
3741 15.5 Reporting problems
3742 =======================
3744 If you encounter any problem, please don't hesitate to send a
3745 detailed bug report to the `bug-libunistring@gnu.org' mailing list.
3746 Alternatively, you can use the bug tracker at the project page
3747 `https://savannah.gnu.org/projects/libunistring'.
3749 Please always include the version number of this library, and a short
3750 description of your operating system and compilation environment with
3751 corresponding version numbers.
3753 For problems that appear while building and installing
3754 `libunistring', for which you don't find the remedy in the `INSTALL'
3755 file, please include a description of the options that you passed to
3756 the `configure' script.
3759 File: libunistring.info, Node: More functionality, Next: Licenses, Prev: Using the library, Up: Top
3761 16 More advanced functionality
3762 ******************************
3764 For bidirectional reordering of strings, we recommend the GNU
3765 FriBidi library: `http://www.fribidi.org/'.
3767 For the rendering of Unicode strings outside of the context of a
3768 given toolkit (KDE/Qt or GNOME/Gtk), we recommend the Pango library:
3769 `http://www.pango.org/'.
3772 File: libunistring.info, Node: Licenses, Next: Index, Prev: More functionality, Up: Top
3774 Appendix A Licenses
3775 *******************
3777 The files of this package are covered by the licenses indicated in
3778 each particular file or directory. Here is a summary:
3780 * The `libunistring' library is covered by the GNU Lesser General
3781 Public License (LGPL). A copy of the license is included in *note
3782 GNU LGPL::.
3784 * This manual is free documentation. It is dually licensed under the
3785 GNU FDL and the GNU GPL. This means that you can redistribute this
3786 manual under either of these two licenses, at your choice.
3787 This manual is covered by the GNU FDL. Permission is granted to
3788 copy, distribute and/or modify this document under the terms of the
3789 GNU Free Documentation License (FDL), either version 1.2 of the
3790 License, or (at your option) any later version published by the
3791 Free Software Foundation (FSF); with no Invariant Sections, with no
3792 Front-Cover Text, and with no Back-Cover Texts. A copy of the
3793 license is included in *note GNU FDL::.
3794 This manual is covered by the GNU GPL. You can redistribute it
3795 and/or modify it under the terms of the GNU General Public License
3796 (GPL), either version 3 of the License, or (at your option) any
3797 later version published by the Free Software Foundation (FSF). A
3798 copy of the license is included in *note GNU GPL::.
3800 * Menu:
3802 * GNU GPL:: GNU General Public License
3803 * GNU LGPL:: GNU Lesser General Public License
3804 * GNU FDL:: GNU Free Documentation License
3807 File: libunistring.info, Node: GNU GPL, Next: GNU LGPL, Up: Licenses
3809 A.1 GNU GENERAL PUBLIC LICENSE
3810 ==============================
3812 Version 3, 29 June 2007
3814 Copyright (C) 2007 Free Software Foundation, Inc. `http://fsf.org/'
3816 Everyone is permitted to copy and distribute verbatim copies of this
3817 license document, but changing it is not allowed.
3819 Preamble
3820 ========
3822 The GNU General Public License is a free, copyleft license for
3823 software and other kinds of works.
3825 The licenses for most software and other practical works are designed
3826 to take away your freedom to share and change the works. By contrast,
3827 the GNU General Public License is intended to guarantee your freedom to
3828 share and change all versions of a program--to make sure it remains
3829 free software for all its users. We, the Free Software Foundation, use
3830 the GNU General Public License for most of our software; it applies
3831 also to any other work released this way by its authors. You can apply
3832 it to your programs, too.
3834 When we speak of free software, we are referring to freedom, not
3835 price. Our General Public Licenses are designed to make sure that you
3836 have the freedom to distribute copies of free software (and charge for
3837 them if you wish), that you receive source code or can get it if you
3838 want it, that you can change the software or use pieces of it in new
3839 free programs, and that you know you can do these things.
3841 To protect your rights, we need to prevent others from denying you
3842 these rights or asking you to surrender the rights. Therefore, you
3843 have certain responsibilities if you distribute copies of the software,
3844 or if you modify it: responsibilities to respect the freedom of others.
3846 For example, if you distribute copies of such a program, whether
3847 gratis or for a fee, you must pass on to the recipients the same
3848 freedoms that you received. You must make sure that they, too, receive
3849 or can get the source code. And you must show them these terms so they
3850 know their rights.
3852 Developers that use the GNU GPL protect your rights with two steps:
3853 (1) assert copyright on the software, and (2) offer you this License
3854 giving you legal permission to copy, distribute and/or modify it.
3856 For the developers' and authors' protection, the GPL clearly explains
3857 that there is no warranty for this free software. For both users' and
3858 authors' sake, the GPL requires that modified versions be marked as
3859 changed, so that their problems will not be attributed erroneously to
3860 authors of previous versions.
3862 Some devices are designed to deny users access to install or run
3863 modified versions of the software inside them, although the
3864 manufacturer can do so. This is fundamentally incompatible with the
3865 aim of protecting users' freedom to change the software. The
3866 systematic pattern of such abuse occurs in the area of products for
3867 individuals to use, which is precisely where it is most unacceptable.
3868 Therefore, we have designed this version of the GPL to prohibit the
3869 practice for those products. If such problems arise substantially in
3870 other domains, we stand ready to extend this provision to those domains
3871 in future versions of the GPL, as needed to protect the freedom of
3872 users.
3874 Finally, every program is threatened constantly by software patents.
3875 States should not allow patents to restrict development and use of
3876 software on general-purpose computers, but in those that do, we wish to
3877 avoid the special danger that patents applied to a free program could
3878 make it effectively proprietary. To prevent this, the GPL assures that
3879 patents cannot be used to render the program non-free.
3881 The precise terms and conditions for copying, distribution and
3882 modification follow.
3884 TERMS AND CONDITIONS
3885 ====================
3887 0. Definitions.
3889 "This License" refers to version 3 of the GNU General Public
3890 License.
3892 "Copyright" also means copyright-like laws that apply to other
3893 kinds of works, such as semiconductor masks.
3895 "The Program" refers to any copyrightable work licensed under this
3896 License. Each licensee is addressed as "you". "Licensees" and
3897 "recipients" may be individuals or organizations.
3899 To "modify" a work means to copy from or adapt all or part of the
3900 work in a fashion requiring copyright permission, other than the
3901 making of an exact copy. The resulting work is called a "modified
3902 version" of the earlier work or a work "based on" the earlier work.
3904 A "covered work" means either the unmodified Program or a work
3905 based on the Program.
3907 To "propagate" a work means to do anything with it that, without
3908 permission, would make you directly or secondarily liable for
3909 infringement under applicable copyright law, except executing it
3910 on a computer or modifying a private copy. Propagation includes
3911 copying, distribution (with or without modification), making
3912 available to the public, and in some countries other activities as
3913 well.
3915 To "convey" a work means any kind of propagation that enables other
3916 parties to make or receive copies. Mere interaction with a user
3917 through a computer network, with no transfer of a copy, is not
3918 conveying.
3920 An interactive user interface displays "Appropriate Legal Notices"
3921 to the extent that it includes a convenient and prominently visible
3922 feature that (1) displays an appropriate copyright notice, and (2)
3923 tells the user that there is no warranty for the work (except to
3924 the extent that warranties are provided), that licensees may
3925 convey the work under this License, and how to view a copy of this
3926 License. If the interface presents a list of user commands or
3927 options, such as a menu, a prominent item in the list meets this
3928 criterion.
3930 1. Source Code.
3932 The "source code" for a work means the preferred form of the work
3933 for making modifications to it. "Object code" means any
3934 non-source form of a work.
3936 A "Standard Interface" means an interface that either is an
3937 official standard defined by a recognized standards body, or, in
3938 the case of interfaces specified for a particular programming
3939 language, one that is widely used among developers working in that
3940 language.
3942 The "System Libraries" of an executable work include anything,
3943 other than the work as a whole, that (a) is included in the normal
3944 form of packaging a Major Component, but which is not part of that
3945 Major Component, and (b) serves only to enable use of the work
3946 with that Major Component, or to implement a Standard Interface
3947 for which an implementation is available to the public in source
3948 code form. A "Major Component", in this context, means a major
3949 essential component (kernel, window system, and so on) of the
3950 specific operating system (if any) on which the executable work
3951 runs, or a compiler used to produce the work, or an object code
3952 interpreter used to run it.
3954 The "Corresponding Source" for a work in object code form means all
3955 the source code needed to generate, install, and (for an executable
3956 work) run the object code and to modify the work, including
3957 scripts to control those activities. However, it does not include
3958 the work's System Libraries, or general-purpose tools or generally
3959 available free programs which are used unmodified in performing
3960 those activities but which are not part of the work. For example,
3961 Corresponding Source includes interface definition files
3962 associated with source files for the work, and the source code for
3963 shared libraries and dynamically linked subprograms that the work
3964 is specifically designed to require, such as by intimate data
3965 communication or control flow between those subprograms and other
3966 parts of the work.
3968 The Corresponding Source need not include anything that users can
3969 regenerate automatically from other parts of the Corresponding
3970 Source.
3972 The Corresponding Source for a work in source code form is that
3973 same work.
3975 2. Basic Permissions.
3977 All rights granted under this License are granted for the term of
3978 copyright on the Program, and are irrevocable provided the stated
3979 conditions are met. This License explicitly affirms your unlimited
3980 permission to run the unmodified Program. The output from running
3981 a covered work is covered by this License only if the output,
3982 given its content, constitutes a covered work. This License
3983 acknowledges your rights of fair use or other equivalent, as
3984 provided by copyright law.
3986 You may make, run and propagate covered works that you do not
3987 convey, without conditions so long as your license otherwise
3988 remains in force. You may convey covered works to others for the
3989 sole purpose of having them make modifications exclusively for
3990 you, or provide you with facilities for running those works,
3991 provided that you comply with the terms of this License in
3992 conveying all material for which you do not control copyright.
3993 Those thus making or running the covered works for you must do so
3994 exclusively on your behalf, under your direction and control, on
3995 terms that prohibit them from making any copies of your
3996 copyrighted material outside their relationship with you.
3998 Conveying under any other circumstances is permitted solely under
3999 the conditions stated below. Sublicensing is not allowed; section
4000 10 makes it unnecessary.
4002 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
4004 No covered work shall be deemed part of an effective technological
4005 measure under any applicable law fulfilling obligations under
4006 article 11 of the WIPO copyright treaty adopted on 20 December
4007 1996, or similar laws prohibiting or restricting circumvention of
4008 such measures.
4010 When you convey a covered work, you waive any legal power to forbid
4011 circumvention of technological measures to the extent such
4012 circumvention is effected by exercising rights under this License
4013 with respect to the covered work, and you disclaim any intention
4014 to limit operation or modification of the work as a means of
4015 enforcing, against the work's users, your or third parties' legal
4016 rights to forbid circumvention of technological measures.
4018 4. Conveying Verbatim Copies.
4020 You may convey verbatim copies of the Program's source code as you
4021 receive it, in any medium, provided that you conspicuously and
4022 appropriately publish on each copy an appropriate copyright notice;
4023 keep intact all notices stating that this License and any
4024 non-permissive terms added in accord with section 7 apply to the
4025 code; keep intact all notices of the absence of any warranty; and
4026 give all recipients a copy of this License along with the Program.
4028 You may charge any price or no price for each copy that you convey,
4029 and you may offer support or warranty protection for a fee.
4031 5. Conveying Modified Source Versions.
4033 You may convey a work based on the Program, or the modifications to
4034 produce it from the Program, in the form of source code under the
4035 terms of section 4, provided that you also meet all of these
4036 conditions:
4038 a. The work must carry prominent notices stating that you
4039 modified it, and giving a relevant date.
4041 b. The work must carry prominent notices stating that it is
4042 released under this License and any conditions added under
4043 section 7. This requirement modifies the requirement in
4044 section 4 to "keep intact all notices".
4046 c. You must license the entire work, as a whole, under this
4047 License to anyone who comes into possession of a copy. This
4048 License will therefore apply, along with any applicable
4049 section 7 additional terms, to the whole of the work, and all
4050 its parts, regardless of how they are packaged. This License
4051 gives no permission to license the work in any other way, but
4052 it does not invalidate such permission if you have separately
4053 received it.
4055 d. If the work has interactive user interfaces, each must display
4056 Appropriate Legal Notices; however, if the Program has
4057 interactive interfaces that do not display Appropriate Legal
4058 Notices, your work need not make them do so.
4060 A compilation of a covered work with other separate and independent
4061 works, which are not by their nature extensions of the covered
4062 work, and which are not combined with it such as to form a larger
4063 program, in or on a volume of a storage or distribution medium, is
4064 called an "aggregate" if the compilation and its resulting
4065 copyright are not used to limit the access or legal rights of the
4066 compilation's users beyond what the individual works permit.
4067 Inclusion of a covered work in an aggregate does not cause this
4068 License to apply to the other parts of the aggregate.
4070 6. Conveying Non-Source Forms.
4072 You may convey a covered work in object code form under the terms
4073 of sections 4 and 5, provided that you also convey the
4074 machine-readable Corresponding Source under the terms of this
4075 License, in one of these ways:
4077 a. Convey the object code in, or embodied in, a physical product
4078 (including a physical distribution medium), accompanied by the
4079 Corresponding Source fixed on a durable physical medium
4080 customarily used for software interchange.
4082 b. Convey the object code in, or embodied in, a physical product
4083 (including a physical distribution medium), accompanied by a
4084 written offer, valid for at least three years and valid for
4085 as long as you offer spare parts or customer support for that
4086 product model, to give anyone who possesses the object code
4087 either (1) a copy of the Corresponding Source for all the
4088 software in the product that is covered by this License, on a
4089 durable physical medium customarily used for software
4090 interchange, for a price no more than your reasonable cost of
4091 physically performing this conveying of source, or (2) access
4092 to copy the Corresponding Source from a network server at no
4093 charge.
4095 c. Convey individual copies of the object code with a copy of
4096 the written offer to provide the Corresponding Source. This
4097 alternative is allowed only occasionally and noncommercially,
4098 and only if you received the object code with such an offer,
4099 in accord with subsection 6b.
4101 d. Convey the object code by offering access from a designated
4102 place (gratis or for a charge), and offer equivalent access
4103 to the Corresponding Source in the same way through the same
4104 place at no further charge. You need not require recipients
4105 to copy the Corresponding Source along with the object code.
4106 If the place to copy the object code is a network server, the
4107 Corresponding Source may be on a different server (operated
4108 by you or a third party) that supports equivalent copying
4109 facilities, provided you maintain clear directions next to
4110 the object code saying where to find the Corresponding Source.
4111 Regardless of what server hosts the Corresponding Source, you
4112 remain obligated to ensure that it is available for as long
4113 as needed to satisfy these requirements.
4115 e. Convey the object code using peer-to-peer transmission,
4116 provided you inform other peers where the object code and
4117 Corresponding Source of the work are being offered to the
4118 general public at no charge under subsection 6d.
4121 A separable portion of the object code, whose source code is
4122 excluded from the Corresponding Source as a System Library, need
4123 not be included in conveying the object code work.
4125 A "User Product" is either (1) a "consumer product", which means
4126 any tangible personal property which is normally used for personal,
4127 family, or household purposes, or (2) anything designed or sold for
4128 incorporation into a dwelling. In determining whether a product
4129 is a consumer product, doubtful cases shall be resolved in favor of
4130 coverage. For a particular product received by a particular user,
4131 "normally used" refers to a typical or common use of that class of
4132 product, regardless of the status of the particular user or of the
4133 way in which the particular user actually uses, or expects or is
4134 expected to use, the product. A product is a consumer product
4135 regardless of whether the product has substantial commercial,
4136 industrial or non-consumer uses, unless such uses represent the
4137 only significant mode of use of the product.
4139 "Installation Information" for a User Product means any methods,
4140 procedures, authorization keys, or other information required to
4141 install and execute modified versions of a covered work in that
4142 User Product from a modified version of its Corresponding Source.
4143 The information must suffice to ensure that the continued
4144 functioning of the modified object code is in no case prevented or
4145 interfered with solely because modification has been made.
4147 If you convey an object code work under this section in, or with,
4148 or specifically for use in, a User Product, and the conveying
4149 occurs as part of a transaction in which the right of possession
4150 and use of the User Product is transferred to the recipient in
4151 perpetuity or for a fixed term (regardless of how the transaction
4152 is characterized), the Corresponding Source conveyed under this
4153 section must be accompanied by the Installation Information. But
4154 this requirement does not apply if neither you nor any third party
4155 retains the ability to install modified object code on the User
4156 Product (for example, the work has been installed in ROM).
4158 The requirement to provide Installation Information does not
4159 include a requirement to continue to provide support service,
4160 warranty, or updates for a work that has been modified or
4161 installed by the recipient, or for the User Product in which it
4162 has been modified or installed. Access to a network may be denied
4163 when the modification itself materially and adversely affects the
4164 operation of the network or violates the rules and protocols for
4165 communication across the network.
4167 Corresponding Source conveyed, and Installation Information
4168 provided, in accord with this section must be in a format that is
4169 publicly documented (and with an implementation available to the
4170 public in source code form), and must require no special password
4171 or key for unpacking, reading or copying.
4173 7. Additional Terms.
4175 "Additional permissions" are terms that supplement the terms of
4176 this License by making exceptions from one or more of its
4177 conditions. Additional permissions that are applicable to the
4178 entire Program shall be treated as though they were included in
4179 this License, to the extent that they are valid under applicable
4180 law. If additional permissions apply only to part of the Program,
4181 that part may be used separately under those permissions, but the
4182 entire Program remains governed by this License without regard to
4183 the additional permissions.
4185 When you convey a copy of a covered work, you may at your option
4186 remove any additional permissions from that copy, or from any part
4187 of it. (Additional permissions may be written to require their own
4188 removal in certain cases when you modify the work.) You may place
4189 additional permissions on material, added by you to a covered work,
4190 for which you have or can give appropriate copyright permission.
4192 Notwithstanding any other provision of this License, for material
4193 you add to a covered work, you may (if authorized by the copyright
4194 holders of that material) supplement the terms of this License
4195 with terms:
4197 a. Disclaiming warranty or limiting liability differently from
4198 the terms of sections 15 and 16 of this License; or
4200 b. Requiring preservation of specified reasonable legal notices
4201 or author attributions in that material or in the Appropriate
4202 Legal Notices displayed by works containing it; or
4204 c. Prohibiting misrepresentation of the origin of that material,
4205 or requiring that modified versions of such material be
4206 marked in reasonable ways as different from the original
4207 version; or
4209 d. Limiting the use for publicity purposes of names of licensors
4210 or authors of the material; or
4212 e. Declining to grant rights under trademark law for use of some
4213 trade names, trademarks, or service marks; or
4215 f. Requiring indemnification of licensors and authors of that
4216 material by anyone who conveys the material (or modified
4217 versions of it) with contractual assumptions of liability to
4218 the recipient, for any liability that these contractual
4219 assumptions directly impose on those licensors and authors.
4221 All other non-permissive additional terms are considered "further
4222 restrictions" within the meaning of section 10. If the Program as
4223 you received it, or any part of it, contains a notice stating that
4224 it is governed by this License along with a term that is a further
4225 restriction, you may remove that term. If a license document
4226 contains a further restriction but permits relicensing or
4227 conveying under this License, you may add to a covered work
4228 material governed by the terms of that license document, provided
4229 that the further restriction does not survive such relicensing or
4230 conveying.
4232 If you add terms to a covered work in accord with this section, you
4233 must place, in the relevant source files, a statement of the
4234 additional terms that apply to those files, or a notice indicating
4235 where to find the applicable terms.
4237 Additional terms, permissive or non-permissive, may be stated in
4238 the form of a separately written license, or stated as exceptions;
4239 the above requirements apply either way.
4241 8. Termination.
4243 You may not propagate or modify a covered work except as expressly
4244 provided under this License. Any attempt otherwise to propagate or
4245 modify it is void, and will automatically terminate your rights
4246 under this License (including any patent licenses granted under
4247 the third paragraph of section 11).
4249 However, if you cease all violation of this License, then your
4250 license from a particular copyright holder is reinstated (a)
4251 provisionally, unless and until the copyright holder explicitly
4252 and finally terminates your license, and (b) permanently, if the
4253 copyright holder fails to notify you of the violation by some
4254 reasonable means prior to 60 days after the cessation.
4256 Moreover, your license from a particular copyright holder is
4257 reinstated permanently if the copyright holder notifies you of the
4258 violation by some reasonable means, this is the first time you have
4259 received notice of violation of this License (for any work) from
4260 that copyright holder, and you cure the violation prior to 30 days
4261 after your receipt of the notice.
4263 Termination of your rights under this section does not terminate
4264 the licenses of parties who have received copies or rights from
4265 you under this License. If your rights have been terminated and
4266 not permanently reinstated, you do not qualify to receive new
4267 licenses for the same material under section 10.
4269 9. Acceptance Not Required for Having Copies.
4271 You are not required to accept this License in order to receive or
4272 run a copy of the Program. Ancillary propagation of a covered work
4273 occurring solely as a consequence of using peer-to-peer
4274 transmission to receive a copy likewise does not require
4275 acceptance. However, nothing other than this License grants you
4276 permission to propagate or modify any covered work. These actions
4277 infringe copyright if you do not accept this License. Therefore,
4278 by modifying or propagating a covered work, you indicate your
4279 acceptance of this License to do so.
4281 10. Automatic Licensing of Downstream Recipients.
4283 Each time you convey a covered work, the recipient automatically
4284 receives a license from the original licensors, to run, modify and
4285 propagate that work, subject to this License. You are not
4286 responsible for enforcing compliance by third parties with this
4287 License.
4289 An "entity transaction" is a transaction transferring control of an
4290 organization, or substantially all assets of one, or subdividing an
4291 organization, or merging organizations. If propagation of a
4292 covered work results from an entity transaction, each party to that
4293 transaction who receives a copy of the work also receives whatever
4294 licenses to the work the party's predecessor in interest had or
4295 could give under the previous paragraph, plus a right to
4296 possession of the Corresponding Source of the work from the
4297 predecessor in interest, if the predecessor has it or can get it
4298 with reasonable efforts.
4300 You may not impose any further restrictions on the exercise of the
4301 rights granted or affirmed under this License. For example, you
4302 may not impose a license fee, royalty, or other charge for
4303 exercise of rights granted under this License, and you may not
4304 initiate litigation (including a cross-claim or counterclaim in a
4305 lawsuit) alleging that any patent claim is infringed by making,
4306 using, selling, offering for sale, or importing the Program or any
4307 portion of it.
4309 11. Patents.
4311 A "contributor" is a copyright holder who authorizes use under this
4312 License of the Program or a work on which the Program is based.
4313 The work thus licensed is called the contributor's "contributor
4314 version".
4316 A contributor's "essential patent claims" are all patent claims
4317 owned or controlled by the contributor, whether already acquired or
4318 hereafter acquired, that would be infringed by some manner,
4319 permitted by this License, of making, using, or selling its
4320 contributor version, but do not include claims that would be
4321 infringed only as a consequence of further modification of the
4322 contributor version. For purposes of this definition, "control"
4323 includes the right to grant patent sublicenses in a manner
4324 consistent with the requirements of this License.
4326 Each contributor grants you a non-exclusive, worldwide,
4327 royalty-free patent license under the contributor's essential
4328 patent claims, to make, use, sell, offer for sale, import and
4329 otherwise run, modify and propagate the contents of its
4330 contributor version.
4332 In the following three paragraphs, a "patent license" is any
4333 express agreement or commitment, however denominated, not to
4334 enforce a patent (such as an express permission to practice a
4335 patent or covenant not to sue for patent infringement). To
4336 "grant" such a patent license to a party means to make such an
4337 agreement or commitment not to enforce a patent against the party.
4339 If you convey a covered work, knowingly relying on a patent
4340 license, and the Corresponding Source of the work is not available
4341 for anyone to copy, free of charge and under the terms of this
4342 License, through a publicly available network server or other
4343 readily accessible means, then you must either (1) cause the
4344 Corresponding Source to be so available, or (2) arrange to deprive
4345 yourself of the benefit of the patent license for this particular
4346 work, or (3) arrange, in a manner consistent with the requirements
4347 of this License, to extend the patent license to downstream
4348 recipients. "Knowingly relying" means you have actual knowledge
4349 that, but for the patent license, your conveying the covered work
4350 in a country, or your recipient's use of the covered work in a
4351 country, would infringe one or more identifiable patents in that
4352 country that you have reason to believe are valid.
4354 If, pursuant to or in connection with a single transaction or
4355 arrangement, you convey, or propagate by procuring conveyance of, a
4356 covered work, and grant a patent license to some of the parties
4357 receiving the covered work authorizing them to use, propagate,
4358 modify or convey a specific copy of the covered work, then the
4359 patent license you grant is automatically extended to all
4360 recipients of the covered work and works based on it.
4362 A patent license is "discriminatory" if it does not include within
4363 the scope of its coverage, prohibits the exercise of, or is
4364 conditioned on the non-exercise of one or more of the rights that
4365 are specifically granted under this License. You may not convey a
4366 covered work if you are a party to an arrangement with a third
4367 party that is in the business of distributing software, under
4368 which you make payment to the third party based on the extent of
4369 your activity of conveying the work, and under which the third
4370 party grants, to any of the parties who would receive the covered
4371 work from you, a discriminatory patent license (a) in connection
4372 with copies of the covered work conveyed by you (or copies made
4373 from those copies), or (b) primarily for and in connection with
4374 specific products or compilations that contain the covered work,
4375 unless you entered into that arrangement, or that patent license
4376 was granted, prior to 28 March 2007.
4378 Nothing in this License shall be construed as excluding or limiting
4379 any implied license or other defenses to infringement that may
4380 otherwise be available to you under applicable patent law.
4382 12. No Surrender of Others' Freedom.
4384 If conditions are imposed on you (whether by court order,
4385 agreement or otherwise) that contradict the conditions of this
4386 License, they do not excuse you from the conditions of this
4387 License. If you cannot convey a covered work so as to satisfy
4388 simultaneously your obligations under this License and any other
4389 pertinent obligations, then as a consequence you may not convey it
4390 at all. For example, if you agree to terms that obligate you to
4391 collect a royalty for further conveying from those to whom you
4392 convey the Program, the only way you could satisfy both those
4393 terms and this License would be to refrain entirely from conveying
4394 the Program.
4396 13. Use with the GNU Affero General Public License.
4398 Notwithstanding any other provision of this License, you have
4399 permission to link or combine any covered work with a work licensed
4400 under version 3 of the GNU Affero General Public License into a
4401 single combined work, and to convey the resulting work. The terms
4402 of this License will continue to apply to the part which is the
4403 covered work, but the special requirements of the GNU Affero
4404 General Public License, section 13, concerning interaction through
4405 a network will apply to the combination as such.
4407 14. Revised Versions of this License.
4409 The Free Software Foundation may publish revised and/or new
4410 versions of the GNU General Public License from time to time.
4411 Such new versions will be similar in spirit to the present
4412 version, but may differ in detail to address new problems or
4413 concerns.
4415 Each version is given a distinguishing version number. If the
4416 Program specifies that a certain numbered version of the GNU
4417 General Public License "or any later version" applies to it, you
4418 have the option of following the terms and conditions either of
4419 that numbered version or of any later version published by the
4420 Free Software Foundation. If the Program does not specify a
4421 version number of the GNU General Public License, you may choose
4422 any version ever published by the Free Software Foundation.
4424 If the Program specifies that a proxy can decide which future
4425 versions of the GNU General Public License can be used, that
4426 proxy's public statement of acceptance of a version permanently
4427 authorizes you to choose that version for the Program.
4429 Later license versions may give you additional or different
4430 permissions. However, no additional obligations are imposed on any
4431 author or copyright holder as a result of your choosing to follow a
4432 later version.
4434 15. Disclaimer of Warranty.
4436 THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
4437 APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE
4438 COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS"
4439 WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED,
4440 INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
4441 MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE
4442 RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.
4443 SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL
4444 NECESSARY SERVICING, REPAIR OR CORRECTION.
4446 16. Limitation of Liability.
4448 IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
4449 WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES
4450 AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU
4451 FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR
4452 CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE
4453 THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA
4454 BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
4455 PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
4456 PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF
4457 THE POSSIBILITY OF SUCH DAMAGES.
4459 17. Interpretation of Sections 15 and 16.
4461 If the disclaimer of warranty and limitation of liability provided
4462 above cannot be given local legal effect according to their terms,
4463 reviewing courts shall apply local law that most closely
4464 approximates an absolute waiver of all civil liability in
4465 connection with the Program, unless a warranty or assumption of
4466 liability accompanies a copy of the Program in return for a fee.
4469 END OF TERMS AND CONDITIONS
4470 ===========================
4472 How to Apply These Terms to Your New Programs
4473 =============================================
4475 If you develop a new program, and you want it to be of the greatest
4476 possible use to the public, the best way to achieve this is to make it
4477 free software which everyone can redistribute and change under these
4478 terms.
4480 To do so, attach the following notices to the program. It is safest
4481 to attach them to the start of each source file to most effectively
4482 state the exclusion of warranty; and each file should have at least the
4483 "copyright" line and a pointer to where the full notice is found.
4485 ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES.
4486 Copyright (C) YEAR NAME OF AUTHOR
4488 This program is free software: you can redistribute it and/or modify
4489 it under the terms of the GNU General Public License as published by
4490 the Free Software Foundation, either version 3 of the License, or (at
4491 your option) any later version.
4493 This program is distributed in the hope that it will be useful, but
4494 WITHOUT ANY WARRANTY; without even the implied warranty of
4495 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
4496 General Public License for more details.
4498 You should have received a copy of the GNU General Public License
4499 along with this program. If not, see `http://www.gnu.org/licenses/'.
4501 Also add information on how to contact you by electronic and paper
4502 mail.
4504 If the program does terminal interaction, make it output a short
4505 notice like this when it starts in an interactive mode:
4507 PROGRAM Copyright (C) YEAR NAME OF AUTHOR
4508 This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
4509 This is free software, and you are welcome to redistribute it
4510 under certain conditions; type `show c' for details.
4512 The hypothetical commands `show w' and `show c' should show the
4513 appropriate parts of the General Public License. Of course, your
4514 program's commands might be different; for a GUI interface, you would
4515 use an "about box".
4517 You should also get your employer (if you work as a programmer) or
4518 school, if any, to sign a "copyright disclaimer" for the program, if
4519 necessary. For more information on this, and how to apply and follow
4520 the GNU GPL, see `http://www.gnu.org/licenses/'.
4522 The GNU General Public License does not permit incorporating your
4523 program into proprietary programs. If your program is a subroutine
4524 library, you may consider it more useful to permit linking proprietary
4525 applications with the library. If this is what you want to do, use the
4526 GNU Lesser General Public License instead of this License. But first,
4527 please read `http://www.gnu.org/philosophy/why-not-lgpl.html'.
4530 File: libunistring.info, Node: GNU LGPL, Next: GNU FDL, Prev: GNU GPL, Up: Licenses
4532 A.2 GNU LESSER GENERAL PUBLIC LICENSE
4533 =====================================
4535 Version 3, 29 June 2007
4537 Copyright (C) 2007 Free Software Foundation, Inc. `http://fsf.org/'
4539 Everyone is permitted to copy and distribute verbatim copies of this
4540 license document, but changing it is not allowed.
4542 This version of the GNU Lesser General Public License incorporates
4543 the terms and conditions of version 3 of the GNU General Public
4544 License, supplemented by the additional permissions listed below.
4546 0. Additional Definitions.
4548 As used herein, "this License" refers to version 3 of the GNU
4549 Lesser General Public License, and the "GNU GPL" refers to version
4550 3 of the GNU General Public License.
4552 "The Library" refers to a covered work governed by this License,
4553 other than an Application or a Combined Work as defined below.
4555 An "Application" is any work that makes use of an interface
4556 provided by the Library, but which is not otherwise based on the
4557 Library. Defining a subclass of a class defined by the Library is
4558 deemed a mode of using an interface provided by the Library.
4560 A "Combined Work" is a work produced by combining or linking an
4561 Application with the Library. The particular version of the
4562 Library with which the Combined Work was made is also called the
4563 "Linked Version".
4565 The "Minimal Corresponding Source" for a Combined Work means the
4566 Corresponding Source for the Combined Work, excluding any source
4567 code for portions of the Combined Work that, considered in
4568 isolation, are based on the Application, and not on the Linked
4569 Version.
4571 The "Corresponding Application Code" for a Combined Work means the
4572 object code and/or source code for the Application, including any
4573 data and utility programs needed for reproducing the Combined Work
4574 from the Application, but excluding the System Libraries of the
4575 Combined Work.
4577 1. Exception to Section 3 of the GNU GPL.
4579 You may convey a covered work under sections 3 and 4 of this
4580 License without being bound by section 3 of the GNU GPL.
4582 2. Conveying Modified Versions.
4584 If you modify a copy of the Library, and, in your modifications, a
4585 facility refers to a function or data to be supplied by an
4586 Application that uses the facility (other than as an argument
4587 passed when the facility is invoked), then you may convey a copy
4588 of the modified version:
4590 a. under this License, provided that you make a good faith
4591 effort to ensure that, in the event an Application does not
4592 supply the function or data, the facility still operates, and
4593 performs whatever part of its purpose remains meaningful, or
4595 b. under the GNU GPL, with none of the additional permissions of
4596 this License applicable to that copy.
4598 3. Object Code Incorporating Material from Library Header Files.
4600 The object code form of an Application may incorporate material
4601 from a header file that is part of the Library. You may convey
4602 such object code under terms of your choice, provided that, if the
4603 incorporated material is not limited to numerical parameters, data
4604 structure layouts and accessors, or small macros, inline functions
4605 and templates (ten or fewer lines in length), you do both of the
4606 following:
4608 a. Give prominent notice with each copy of the object code that
4609 the Library is used in it and that the Library and its use are
4610 covered by this License.
4612 b. Accompany the object code with a copy of the GNU GPL and this
4613 license document.
4615 4. Combined Works.
4617 You may convey a Combined Work under terms of your choice that,
4618 taken together, effectively do not restrict modification of the
4619 portions of the Library contained in the Combined Work and reverse
4620 engineering for debugging such modifications, if you also do each
4621 of the following:
4623 a. Give prominent notice with each copy of the Combined Work that
4624 the Library is used in it and that the Library and its use are
4625 covered by this License.
4627 b. Accompany the Combined Work with a copy of the GNU GPL and
4628 this license document.
4630 c. For a Combined Work that displays copyright notices during
4631 execution, include the copyright notice for the Library among
4632 these notices, as well as a reference directing the user to
4633 the copies of the GNU GPL and this license document.
4635 d. Do one of the following:
4637 0. Convey the Minimal Corresponding Source under the terms
4638 of this License, and the Corresponding Application Code
4639 in a form suitable for, and under terms that permit, the
4640 user to recombine or relink the Application with a
4641 modified version of the Linked Version to produce a
4642 modified Combined Work, in the manner specified by
4643 section 6 of the GNU GPL for conveying Corresponding
4644 Source.
4646 1. Use a suitable shared library mechanism for linking with
4647 the Library. A suitable mechanism is one that (a) uses
4648 at run time a copy of the Library already present on the
4649 user's computer system, and (b) will operate properly
4650 with a modified version of the Library that is
4651 interface-compatible with the Linked Version.
4653 e. Provide Installation Information, but only if you would
4654 otherwise be required to provide such information under
4655 section 6 of the GNU GPL, and only to the extent that such
4656 information is necessary to install and execute a modified
4657 version of the Combined Work produced by recombining or
4658 relinking the Application with a modified version of the
4659 Linked Version. (If you use option 4d0, the Installation
4660 Information must accompany the Minimal Corresponding Source
4661 and Corresponding Application Code. If you use option 4d1,
4662 you must provide the Installation Information in the manner
4663 specified by section 6 of the GNU GPL for conveying
4664 Corresponding Source.)
4666 5. Combined Libraries.
4668 You may place library facilities that are a work based on the
4669 Library side by side in a single library together with other
4670 library facilities that are not Applications and are not covered
4671 by this License, and convey such a combined library under terms of
4672 your choice, if you do both of the following:
4674 a. Accompany the combined library with a copy of the same work
4675 based on the Library, uncombined with any other library
4676 facilities, conveyed under the terms of this License.
4678 b. Give prominent notice with the combined library that part of
4679 it is a work based on the Library, and explaining where to
4680 find the accompanying uncombined form of the same work.
4682 6. Revised Versions of the GNU Lesser General Public License.
4684 The Free Software Foundation may publish revised and/or new
4685 versions of the GNU Lesser General Public License from time to
4686 time. Such new versions will be similar in spirit to the present
4687 version, but may differ in detail to address new problems or
4688 concerns.
4690 Each version is given a distinguishing version number. If the
4691 Library as you received it specifies that a certain numbered
4692 version of the GNU Lesser General Public License "or any later
4693 version" applies to it, you have the option of following the terms
4694 and conditions either of that published version or of any later
4695 version published by the Free Software Foundation. If the Library
4696 as you received it does not specify a version number of the GNU
4697 Lesser General Public License, you may choose any version of the
4698 GNU Lesser General Public License ever published by the Free
4699 Software Foundation.
4701 If the Library as you received it specifies that a proxy can decide
4702 whether future versions of the GNU Lesser General Public License
4703 shall apply, that proxy's public statement of acceptance of any
4704 version is permanent authorization for you to choose that version
4705 for the Library.
4709 File: libunistring.info, Node: GNU FDL, Prev: GNU LGPL, Up: Licenses
4711 A.3 GNU Free Documentation License
4712 ==================================
4714 Version 1.3, 3 November 2008
4716 Copyright (C) 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.
4717 `http://fsf.org/'
4719 Everyone is permitted to copy and distribute verbatim copies
4720 of this license document, but changing it is not allowed.
4722 0. PREAMBLE
4724 The purpose of this License is to make a manual, textbook, or other
4725 functional and useful document "free" in the sense of freedom: to
4726 assure everyone the effective freedom to copy and redistribute it,
4727 with or without modifying it, either commercially or
4728 noncommercially. Secondarily, this License preserves for the
4729 author and publisher a way to get credit for their work, while not
4730 being considered responsible for modifications made by others.
4732 This License is a kind of "copyleft", which means that derivative
4733 works of the document must themselves be free in the same sense.
4734 It complements the GNU General Public License, which is a copyleft
4735 license designed for free software.
4737 We have designed this License in order to use it for manuals for
4738 free software, because free software needs free documentation: a
4739 free program should come with manuals providing the same freedoms
4740 that the software does. But this License is not limited to
4741 software manuals; it can be used for any textual work, regardless
4742 of subject matter or whether it is published as a printed book.
4743 We recommend this License principally for works whose purpose is
4744 instruction or reference.
4746 1. APPLICABILITY AND DEFINITIONS
4748 This License applies to any manual or other work, in any medium,
4749 that contains a notice placed by the copyright holder saying it
4750 can be distributed under the terms of this License. Such a notice
4751 grants a world-wide, royalty-free license, unlimited in duration,
4752 to use that work under the conditions stated herein. The
4753 "Document", below, refers to any such manual or work. Any member
4754 of the public is a licensee, and is addressed as "you". You
4755 accept the license if you copy, modify or distribute the work in a
4756 way requiring permission under copyright law.
4758 A "Modified Version" of the Document means any work containing the
4759 Document or a portion of it, either copied verbatim, or with
4760 modifications and/or translated into another language.
4762 A "Secondary Section" is a named appendix or a front-matter section
4763 of the Document that deals exclusively with the relationship of the
4764 publishers or authors of the Document to the Document's overall
4765 subject (or to related matters) and contains nothing that could
4766 fall directly within that overall subject. (Thus, if the Document
4767 is in part a textbook of mathematics, a Secondary Section may not
4768 explain any mathematics.) The relationship could be a matter of
4769 historical connection with the subject or with related matters, or
4770 of legal, commercial, philosophical, ethical or political position
4771 regarding them.
4773 The "Invariant Sections" are certain Secondary Sections whose
4774 titles are designated, as being those of Invariant Sections, in
4775 the notice that says that the Document is released under this
4776 License. If a section does not fit the above definition of
4777 Secondary then it is not allowed to be designated as Invariant.
4778 The Document may contain zero Invariant Sections. If the Document
4779 does not identify any Invariant Sections then there are none.
4781 The "Cover Texts" are certain short passages of text that are
4782 listed, as Front-Cover Texts or Back-Cover Texts, in the notice
4783 that says that the Document is released under this License. A
4784 Front-Cover Text may be at most 5 words, and a Back-Cover Text may
4785 be at most 25 words.
4787 A "Transparent" copy of the Document means a machine-readable copy,
4788 represented in a format whose specification is available to the
4789 general public, that is suitable for revising the document
4790 straightforwardly with generic text editors or (for images
4791 composed of pixels) generic paint programs or (for drawings) some
4792 widely available drawing editor, and that is suitable for input to
4793 text formatters or for automatic translation to a variety of
4794 formats suitable for input to text formatters. A copy made in an
4795 otherwise Transparent file format whose markup, or absence of
4796 markup, has been arranged to thwart or discourage subsequent
4797 modification by readers is not Transparent. An image format is
4798 not Transparent if used for any substantial amount of text. A
4799 copy that is not "Transparent" is called "Opaque".
4801 Examples of suitable formats for Transparent copies include plain
4802 ASCII without markup, Texinfo input format, LaTeX input format,
4803 SGML or XML using a publicly available DTD, and
4804 standard-conforming simple HTML, PostScript or PDF designed for
4805 human modification. Examples of transparent image formats include
4806 PNG, XCF and JPG. Opaque formats include proprietary formats that
4807 can be read and edited only by proprietary word processors, SGML or
4808 XML for which the DTD and/or processing tools are not generally
4809 available, and the machine-generated HTML, PostScript or PDF
4810 produced by some word processors for output purposes only.
4812 The "Title Page" means, for a printed book, the title page itself,
4813 plus such following pages as are needed to hold, legibly, the
4814 material this License requires to appear in the title page. For
4815 works in formats which do not have any title page as such, "Title
4816 Page" means the text near the most prominent appearance of the
4817 work's title, preceding the beginning of the body of the text.
4819 The "publisher" means any person or entity that distributes copies
4820 of the Document to the public.
4822 A section "Entitled XYZ" means a named subunit of the Document
4823 whose title either is precisely XYZ or contains XYZ in parentheses
4824 following text that translates XYZ in another language. (Here XYZ
4825 stands for a specific section name mentioned below, such as
4826 "Acknowledgements", "Dedications", "Endorsements", or "History".)
4827 To "Preserve the Title" of such a section when you modify the
4828 Document means that it remains a section "Entitled XYZ" according
4829 to this definition.
4831 The Document may include Warranty Disclaimers next to the notice
4832 which states that this License applies to the Document. These
4833 Warranty Disclaimers are considered to be included by reference in
4834 this License, but only as regards disclaiming warranties: any other
4835 implication that these Warranty Disclaimers may have is void and
4836 has no effect on the meaning of this License.
4838 2. VERBATIM COPYING
4840 You may copy and distribute the Document in any medium, either
4841 commercially or noncommercially, provided that this License, the
4842 copyright notices, and the license notice saying this License
4843 applies to the Document are reproduced in all copies, and that you
4844 add no other conditions whatsoever to those of this License. You
4845 may not use technical measures to obstruct or control the reading
4846 or further copying of the copies you make or distribute. However,
4847 you may accept compensation in exchange for copies. If you
4848 distribute a large enough number of copies you must also follow
4849 the conditions in section 3.
4851 You may also lend copies, under the same conditions stated above,
4852 and you may publicly display copies.
4854 3. COPYING IN QUANTITY
4856 If you publish printed copies (or copies in media that commonly
4857 have printed covers) of the Document, numbering more than 100, and
4858 the Document's license notice requires Cover Texts, you must
4859 enclose the copies in covers that carry, clearly and legibly, all
4860 these Cover Texts: Front-Cover Texts on the front cover, and
4861 Back-Cover Texts on the back cover. Both covers must also clearly
4862 and legibly identify you as the publisher of these copies. The
4863 front cover must present the full title with all words of the
4864 title equally prominent and visible. You may add other material
4865 on the covers in addition. Copying with changes limited to the
4866 covers, as long as they preserve the title of the Document and
4867 satisfy these conditions, can be treated as verbatim copying in
4868 other respects.
4870 If the required texts for either cover are too voluminous to fit
4871 legibly, you should put the first ones listed (as many as fit
4872 reasonably) on the actual cover, and continue the rest onto
4873 adjacent pages.
4875 If you publish or distribute Opaque copies of the Document
4876 numbering more than 100, you must either include a
4877 machine-readable Transparent copy along with each Opaque copy, or
4878 state in or with each Opaque copy a computer-network location from
4879 which the general network-using public has access to download
4880 using public-standard network protocols a complete Transparent
4881 copy of the Document, free of added material. If you use the
4882 latter option, you must take reasonably prudent steps, when you
4883 begin distribution of Opaque copies in quantity, to ensure that
4884 this Transparent copy will remain thus accessible at the stated
4885 location until at least one year after the last time you
4886 distribute an Opaque copy (directly or through your agents or
4887 retailers) of that edition to the public.
4889 It is requested, but not required, that you contact the authors of
4890 the Document well before redistributing any large number of
4891 copies, to give them a chance to provide you with an updated
4892 version of the Document.
4894 4. MODIFICATIONS
4896 You may copy and distribute a Modified Version of the Document
4897 under the conditions of sections 2 and 3 above, provided that you
4898 release the Modified Version under precisely this License, with
4899 the Modified Version filling the role of the Document, thus
4900 licensing distribution and modification of the Modified Version to
4901 whoever possesses a copy of it. In addition, you must do these
4902 things in the Modified Version:
4904 A. Use in the Title Page (and on the covers, if any) a title
4905 distinct from that of the Document, and from those of
4906 previous versions (which should, if there were any, be listed
4907 in the History section of the Document). You may use the
4908 same title as a previous version if the original publisher of
4909 that version gives permission.
4911 B. List on the Title Page, as authors, one or more persons or
4912 entities responsible for authorship of the modifications in
4913 the Modified Version, together with at least five of the
4914 principal authors of the Document (all of its principal
4915 authors, if it has fewer than five), unless they release you
4916 from this requirement.
4918 C. State on the Title page the name of the publisher of the
4919 Modified Version, as the publisher.
4921 D. Preserve all the copyright notices of the Document.
4923 E. Add an appropriate copyright notice for your modifications
4924 adjacent to the other copyright notices.
4926 F. Include, immediately after the copyright notices, a license
4927 notice giving the public permission to use the Modified
4928 Version under the terms of this License, in the form shown in
4929 the Addendum below.
4931 G. Preserve in that license notice the full lists of Invariant
4932 Sections and required Cover Texts given in the Document's
4933 license notice.
4935 H. Include an unaltered copy of this License.
4937 I. Preserve the section Entitled "History", Preserve its Title,
4938 and add to it an item stating at least the title, year, new
4939 authors, and publisher of the Modified Version as given on
4940 the Title Page. If there is no section Entitled "History" in
4941 the Document, create one stating the title, year, authors,
4942 and publisher of the Document as given on its Title Page,
4943 then add an item describing the Modified Version as stated in
4944 the previous sentence.
4946 J. Preserve the network location, if any, given in the Document
4947 for public access to a Transparent copy of the Document, and
4948 likewise the network locations given in the Document for
4949 previous versions it was based on. These may be placed in
4950 the "History" section. You may omit a network location for a
4951 work that was published at least four years before the
4952 Document itself, or if the original publisher of the version
4953 it refers to gives permission.
4955 K. For any section Entitled "Acknowledgements" or "Dedications",
4956 Preserve the Title of the section, and preserve in the
4957 section all the substance and tone of each of the contributor
4958 acknowledgements and/or dedications given therein.
4960 L. Preserve all the Invariant Sections of the Document,
4961 unaltered in their text and in their titles. Section numbers
4962 or the equivalent are not considered part of the section
4963 titles.
4965 M. Delete any section Entitled "Endorsements". Such a section
4966 may not be included in the Modified Version.
4968 N. Do not retitle any existing section to be Entitled
4969 "Endorsements" or to conflict in title with any Invariant
4970 Section.
4972 O. Preserve any Warranty Disclaimers.
4974 If the Modified Version includes new front-matter sections or
4975 appendices that qualify as Secondary Sections and contain no
4976 material copied from the Document, you may at your option
4977 designate some or all of these sections as invariant. To do this,
4978 add their titles to the list of Invariant Sections in the Modified
4979 Version's license notice. These titles must be distinct from any
4980 other section titles.
4982 You may add a section Entitled "Endorsements", provided it contains
4983 nothing but endorsements of your Modified Version by various
4984 parties--for example, statements of peer review or that the text
4985 has been approved by an organization as the authoritative
4986 definition of a standard.
4988 You may add a passage of up to five words as a Front-Cover Text,
4989 and a passage of up to 25 words as a Back-Cover Text, to the end
4990 of the list of Cover Texts in the Modified Version. Only one
4991 passage of Front-Cover Text and one of Back-Cover Text may be
4992 added by (or through arrangements made by) any one entity. If the
4993 Document already includes a cover text for the same cover,
4994 previously added by you or by arrangement made by the same entity
4995 you are acting on behalf of, you may not add another; but you may
4996 replace the old one, on explicit permission from the previous
4997 publisher that added the old one.
4999 The author(s) and publisher(s) of the Document do not by this
5000 License give permission to use their names for publicity for or to
5001 assert or imply endorsement of any Modified Version.
5003 5. COMBINING DOCUMENTS
5005 You may combine the Document with other documents released under
5006 this License, under the terms defined in section 4 above for
5007 modified versions, provided that you include in the combination
5008 all of the Invariant Sections of all of the original documents,
5009 unmodified, and list them all as Invariant Sections of your
5010 combined work in its license notice, and that you preserve all
5011 their Warranty Disclaimers.
5013 The combined work need only contain one copy of this License, and
5014 multiple identical Invariant Sections may be replaced with a single
5015 copy. If there are multiple Invariant Sections with the same name
5016 but different contents, make the title of each such section unique
5017 by adding at the end of it, in parentheses, the name of the
5018 original author or publisher of that section if known, or else a
5019 unique number. Make the same adjustment to the section titles in
5020 the list of Invariant Sections in the license notice of the
5021 combined work.
5023 In the combination, you must combine any sections Entitled
5024 "History" in the various original documents, forming one section
5025 Entitled "History"; likewise combine any sections Entitled
5026 "Acknowledgements", and any sections Entitled "Dedications". You
5027 must delete all sections Entitled "Endorsements."
5029 6. COLLECTIONS OF DOCUMENTS
5031 You may make a collection consisting of the Document and other
5032 documents released under this License, and replace the individual
5033 copies of this License in the various documents with a single copy
5034 that is included in the collection, provided that you follow the
5035 rules of this License for verbatim copying of each of the
5036 documents in all other respects.
5038 You may extract a single document from such a collection, and
5039 distribute it individually under this License, provided you insert
5040 a copy of this License into the extracted document, and follow
5041 this License in all other respects regarding verbatim copying of
5042 that document.
5044 7. AGGREGATION WITH INDEPENDENT WORKS
5046 A compilation of the Document or its derivatives with other
5047 separate and independent documents or works, in or on a volume of
5048 a storage or distribution medium, is called an "aggregate" if the
5049 copyright resulting from the compilation is not used to limit the
5050 legal rights of the compilation's users beyond what the individual
5051 works permit. When the Document is included in an aggregate, this
5052 License does not apply to the other works in the aggregate which
5053 are not themselves derivative works of the Document.
5055 If the Cover Text requirement of section 3 is applicable to these
5056 copies of the Document, then if the Document is less than one half
5057 of the entire aggregate, the Document's Cover Texts may be placed
5058 on covers that bracket the Document within the aggregate, or the
5059 electronic equivalent of covers if the Document is in electronic
5060 form. Otherwise they must appear on printed covers that bracket
5061 the whole aggregate.
5063 8. TRANSLATION
5065 Translation is considered a kind of modification, so you may
5066 distribute translations of the Document under the terms of section
5067 4. Replacing Invariant Sections with translations requires special
5068 permission from their copyright holders, but you may include
5069 translations of some or all Invariant Sections in addition to the
5070 original versions of these Invariant Sections. You may include a
5071 translation of this License, and all the license notices in the
5072 Document, and any Warranty Disclaimers, provided that you also
5073 include the original English version of this License and the
5074 original versions of those notices and disclaimers. In case of a
5075 disagreement between the translation and the original version of
5076 this License or a notice or disclaimer, the original version will
5077 prevail.
5079 If a section in the Document is Entitled "Acknowledgements",
5080 "Dedications", or "History", the requirement (section 4) to
5081 Preserve its Title (section 1) will typically require changing the
5082 actual title.
5084 9. TERMINATION
5086 You may not copy, modify, sublicense, or distribute the Document
5087 except as expressly provided under this License. Any attempt
5088 otherwise to copy, modify, sublicense, or distribute it is void,
5089 and will automatically terminate your rights under this License.
5091 However, if you cease all violation of this License, then your
5092 license from a particular copyright holder is reinstated (a)
5093 provisionally, unless and until the copyright holder explicitly
5094 and finally terminates your license, and (b) permanently, if the
5095 copyright holder fails to notify you of the violation by some
5096 reasonable means prior to 60 days after the cessation.
5098 Moreover, your license from a particular copyright holder is
5099 reinstated permanently if the copyright holder notifies you of the
5100 violation by some reasonable means, this is the first time you have
5101 received notice of violation of this License (for any work) from
5102 that copyright holder, and you cure the violation prior to 30 days
5103 after your receipt of the notice.
5105 Termination of your rights under this section does not terminate
5106 the licenses of parties who have received copies or rights from
5107 you under this License. If your rights have been terminated and
5108 not permanently reinstated, receipt of a copy of some or all of
5109 the same material does not give you any rights to use it.
5111 10. FUTURE REVISIONS OF THIS LICENSE
5113 The Free Software Foundation may publish new, revised versions of
5114 the GNU Free Documentation License from time to time. Such new
5115 versions will be similar in spirit to the present version, but may
5116 differ in detail to address new problems or concerns. See
5117 `http://www.gnu.org/copyleft/'.
5119 Each version of the License is given a distinguishing version
5120 number. If the Document specifies that a particular numbered
5121 version of this License "or any later version" applies to it, you
5122 have the option of following the terms and conditions either of
5123 that specified version or of any later version that has been
5124 published (not as a draft) by the Free Software Foundation. If
5125 the Document does not specify a version number of this License,
5126 you may choose any version ever published (not as a draft) by the
5127 Free Software Foundation. If the Document specifies that a proxy
5128 can decide which future versions of this License can be used, that
5129 proxy's public statement of acceptance of a version permanently
5130 authorizes you to choose that version for the Document.
5132 11. RELICENSING
5134 "Massive Multiauthor Collaboration Site" (or "MMC Site") means any
5135 World Wide Web server that publishes copyrightable works and also
5136 provides prominent facilities for anybody to edit those works. A
5137 public wiki that anybody can edit is an example of such a server.
5138 A "Massive Multiauthor Collaboration" (or "MMC") contained in the
5139 site means any set of copyrightable works thus published on the MMC
5140 site.
5142 "CC-BY-SA" means the Creative Commons Attribution-Share Alike 3.0
5143 license published by Creative Commons Corporation, a not-for-profit
5144 corporation with a principal place of business in San Francisco,
5145 California, as well as future copyleft versions of that license
5146 published by that same organization.
5148 "Incorporate" means to publish or republish a Document, in whole or
5149 in part, as part of another Document.
5151 An MMC is "eligible for relicensing" if it is licensed under this
5152 License, and if all works that were first published under this
5153 License somewhere other than this MMC, and subsequently
5154 incorporated in whole or in part into the MMC, (1) had no cover
5155 texts or invariant sections, and (2) were thus incorporated prior
5156 to November 1, 2008.
5158 The operator of an MMC Site may republish an MMC contained in the
5159 site under CC-BY-SA on the same site at any time before August 1,
5160 2009, provided the MMC is eligible for relicensing.
5163 ADDENDUM: How to use this License for your documents
5164 ====================================================
5166 To use this License in a document you have written, include a copy of
5167 the License in the document and put the following copyright and license
5168 notices just after the title page:
5170 Copyright (C) YEAR YOUR NAME.
5171 Permission is granted to copy, distribute and/or modify this document
5172 under the terms of the GNU Free Documentation License, Version 1.3
5173 or any later version published by the Free Software Foundation;
5174 with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
5175 Texts. A copy of the license is included in the section entitled ``GNU
5176 Free Documentation License''.
5178 If you have Invariant Sections, Front-Cover Texts and Back-Cover
5179 Texts, replace the "with...Texts." line with this:
5181 with the Invariant Sections being LIST THEIR TITLES, with
5182 the Front-Cover Texts being LIST, and with the Back-Cover Texts
5183 being LIST.
5185 If you have Invariant Sections without Cover Texts, or some other
5186 combination of the three, merge those two alternatives to suit the
5187 situation.
5189 If your document contains nontrivial examples of program code, we
5190 recommend releasing these examples in parallel under your choice of
5191 free software license, such as the GNU General Public License, to
5192 permit their use in free software.
5195 File: libunistring.info, Node: Index, Prev: Licenses, Up: Top
5203 * ambiguous width: uniwidth.h. (line 10)
5204 * argument conventions: Conventions. (line 9)
5205 * autoconf macro: Autoconf macro. (line 6)
5206 * bidirectional category: Bidirectional category.
5208 * bidirectional reordering: More functionality. (line 6)
5209 * block: Blocks. (line 6)
5210 * breaks, line: unilbrk.h. (line 6)
5211 * breaks, word: uniwbrk.h. (line 6)
5212 * bug reports: Reporting problems. (line 6)
5213 * bug tracker: Reporting problems. (line 6)
5214 * C string functions: char * strings. (line 6)
5215 * C, programming language: ISO C and Java syntax.
5217 * C-like API: Classifications like in ISO C.
5219 * canonical combining class: Canonical combining class.
5221 * case detection: Case detection. (line 6)
5222 * case mappings: Case mappings of strings.
5224 * casing_prefix_context_t: Case mappings of substrings.
5226 * casing_suffix_context_t: Case mappings of substrings.
5228 * char, type: char * strings. (line 23)
5229 * combining, Unicode characters: Composition of characters.
5231 * comparing <1>: Elementary string functions on NUL terminated strings.
5233 * comparing: Elementary string functions.
5235 * comparing, ignoring case: Case insensitive comparison.
5237 * comparing, ignoring case, with collation rules: Case insensitive comparison.
5239 * comparing, ignoring normalization: Normalizing comparisons.
5241 * comparing, ignoring normalization and case: Case insensitive comparison.
5243 * comparing, ignoring normalization and case, with collation rules: Case insensitive comparison.
5245 * comparing, ignoring normalization, with collation rules: Normalizing comparisons.
5247 * comparing, with collation rules: Elementary string functions on NUL terminated strings.
5249 * comparing, with collation rules, ignoring case: Case insensitive comparison.
5251 * comparing, with collation rules, ignoring normalization: Normalizing comparisons.
5253 * comparing, with collation rules, ignoring normalization and case: Case insensitive comparison.
5255 * compiler options: Compiler options. (line 24)
5256 * composing, Unicode characters: Composition of characters.
5258 * converting <1>: uniconv.h. (line 45)
5259 * converting: Elementary string conversions.
5261 * copying <1>: Elementary string functions on NUL terminated strings.
5263 * copying: Elementary string functions.
5265 * counting: Elementary string functions.
5267 * decomposing: Decomposition of characters.
5269 * dependencies: Installation. (line 6)
5270 * detecting case: Case detection. (line 6)
5271 * duplicating <1>: Elementary string functions on NUL terminated strings.
5273 * duplicating: Elementary string functions with memory allocation.
5275 * enum iconv_ilseq_handler: uniconv.h. (line 30)
5276 * FDL, GNU Free Documentation License: GNU FDL. (line 6)
5277 * formatted output: unistdio.h. (line 6)
5278 * fullwidth: uniwidth.h. (line 22)
5279 * general category: General category. (line 6)
5280 * gl_LIBUNISTRING: Autoconf macro. (line 11)
5281 * GPL, GNU General Public License: GNU GPL. (line 6)
5282 * halfwidth: uniwidth.h. (line 22)
5283 * identifiers: ISO C and Java syntax.
5285 * installation: Installation. (line 10)
5286 * internationalization: Unicode and i18n. (line 6)
5287 * iterating <1>: Elementary string functions on NUL terminated strings.
5289 * iterating: Elementary string functions.
5291 * Java, programming language: ISO C and Java syntax.
5293 * LGPL, GNU Lesser General Public License: GNU LGPL. (line 6)
5294 * License, GNU FDL: GNU FDL. (line 6)
5295 * License, GNU GPL: GNU GPL. (line 6)
5296 * License, GNU LGPL: GNU LGPL. (line 6)
5297 * Licenses: Licenses. (line 6)
5298 * line breaks: unilbrk.h. (line 6)
5299 * locale: Locale encodings. (line 6)
5300 * locale categories: Locale encodings. (line 10)
5301 * locale encoding <1>: uniconv.h. (line 10)
5302 * locale encoding: Locale encodings. (line 28)
5303 * locale language: Case mappings of strings.
5305 * locale, multibyte: char * strings. (line 13)
5306 * locale_charset: uniconv.h. (line 13)
5307 * lowercasing: Case mappings of strings.
5309 * mailing list: Reporting problems. (line 6)
5310 * mirroring, of Unicode character: Mirrored character. (line 6)
5311 * normal forms: uninorm.h. (line 6)
5312 * normalizing: uninorm.h. (line 6)
5313 * output, formatted: unistdio.h. (line 6)
5314 * properties, of Unicode character: Properties. (line 6)
5315 * regular expression: uniregex.h. (line 6)
5316 * rendering: More functionality. (line 9)
5317 * return value conventions: Conventions. (line 47)
5318 * scripts: Scripts. (line 6)
5319 * searching, for a character <1>: Elementary string functions on NUL terminated strings.
5321 * searching, for a character: Elementary string functions.
5323 * searching, for a substring: Elementary string functions on NUL terminated strings.
5325 * stream, normalizing a: Normalization of streams.
5327 * struct uninorm_filter: Normalization of streams.
5329 * titlecasing: Case mappings of strings.
* u16_asnprintf: unistdio.h. (line 132)
* u16_asprintf: unistdio.h. (line 129)
* u16_casecmp: Case insensitive comparison.
* u16_casecoll: Case insensitive comparison.
* u16_casefold: Case insensitive comparison.
* u16_casexfrm: Case insensitive comparison.
* u16_casing_prefix_context: Case mappings of substrings.
* u16_casing_prefixes_context: Case mappings of substrings.
* u16_casing_suffix_context: Case mappings of substrings.
* u16_casing_suffixes_context: Case mappings of substrings.
* u16_check: Elementary string checks.
* u16_chr: Elementary string functions.
* u16_cmp: Elementary string functions.
* u16_cmp2: Elementary string functions.
* u16_conv_from_encoding: uniconv.h. (line 54)
* u16_conv_to_encoding: uniconv.h. (line 91)
* u16_cpy: Elementary string functions.
* u16_cpy_alloc: Elementary string functions with memory allocation.
* u16_ct_casefold: Case insensitive comparison.
* u16_ct_tolower: Case mappings of substrings.
* u16_ct_totitle: Case mappings of substrings.
* u16_ct_toupper: Case mappings of substrings.
* u16_endswith: Elementary string functions on NUL terminated strings.
* u16_is_cased: Case detection. (line 57)
* u16_is_casefolded: Case detection. (line 44)
* u16_is_lowercase: Case detection. (line 24)
* u16_is_titlecase: Case detection. (line 34)
* u16_is_uppercase: Case detection. (line 14)
* u16_mblen: Elementary string functions.
* u16_mbsnlen: Elementary string functions.
* u16_mbtouc: Elementary string functions.
* u16_mbtouc_unsafe: Elementary string functions.
* u16_mbtoucr: Elementary string functions.
* u16_move: Elementary string functions.
* u16_next: Elementary string functions on NUL terminated strings.
* u16_normalize: Normalization of strings.
* u16_normcmp: Normalizing comparisons.
* u16_normcoll: Normalizing comparisons.
* u16_normxfrm: Normalizing comparisons.
* u16_possible_linebreaks: unilbrk.h. (line 46)
* u16_prev: Elementary string functions on NUL terminated strings.
* u16_set: Elementary string functions.
* u16_snprintf: unistdio.h. (line 126)
* u16_sprintf: unistdio.h. (line 123)
* u16_startswith: Elementary string functions on NUL terminated strings.
* u16_stpcpy: Elementary string functions on NUL terminated strings.
* u16_stpncpy: Elementary string functions on NUL terminated strings.
* u16_strcat: Elementary string functions on NUL terminated strings.
* u16_strchr: Elementary string functions on NUL terminated strings.
* u16_strcmp: Elementary string functions on NUL terminated strings.
* u16_strcoll: Elementary string functions on NUL terminated strings.
* u16_strconv_from_encoding: uniconv.h. (line 129)
* u16_strconv_from_locale: uniconv.h. (line 157)
* u16_strconv_to_encoding: uniconv.h. (line 142)
* u16_strconv_to_locale: uniconv.h. (line 167)
* u16_strcpy: Elementary string functions on NUL terminated strings.
* u16_strcspn: Elementary string functions on NUL terminated strings.
* u16_strdup: Elementary string functions on NUL terminated strings.
* u16_strlen: Elementary string functions on NUL terminated strings.
* u16_strmblen: Elementary string functions on NUL terminated strings.
* u16_strmbtouc: Elementary string functions on NUL terminated strings.
* u16_strncat: Elementary string functions on NUL terminated strings.
* u16_strncmp: Elementary string functions on NUL terminated strings.
* u16_strncpy: Elementary string functions on NUL terminated strings.
* u16_strnlen: Elementary string functions on NUL terminated strings.
* u16_strpbrk: Elementary string functions on NUL terminated strings.
* u16_strrchr: Elementary string functions on NUL terminated strings.
* u16_strspn: Elementary string functions on NUL terminated strings.
* u16_strstr: Elementary string functions on NUL terminated strings.
* u16_strtok: Elementary string functions on NUL terminated strings.
* u16_strwidth: uniwidth.h. (line 39)
* u16_to_u32: Elementary string conversions.
* u16_to_u8: Elementary string conversions.
* u16_tolower: Case mappings of strings.
* u16_totitle: Case mappings of strings.
* u16_toupper: Case mappings of strings.
* u16_u16_asnprintf: unistdio.h. (line 159)
* u16_u16_asprintf: unistdio.h. (line 156)
* u16_u16_snprintf: unistdio.h. (line 153)
* u16_u16_sprintf: unistdio.h. (line 150)
* u16_u16_vasnprintf: unistdio.h. (line 171)
* u16_u16_vasprintf: unistdio.h. (line 168)
* u16_u16_vsnprintf: unistdio.h. (line 165)
* u16_u16_vsprintf: unistdio.h. (line 162)
* u16_uctomb: Elementary string functions.
* u16_vasnprintf: unistdio.h. (line 144)
* u16_vasprintf: unistdio.h. (line 141)
* u16_vsnprintf: unistdio.h. (line 138)
* u16_vsprintf: unistdio.h. (line 135)
* u16_width: uniwidth.h. (line 31)
* u16_width_linebreaks: unilbrk.h. (line 65)
* u16_wordbreaks: Word breaks in a string.
* u32_asnprintf: unistdio.h. (line 185)
* u32_asprintf: unistdio.h. (line 182)
* u32_casecmp: Case insensitive comparison.
* u32_casecoll: Case insensitive comparison.
* u32_casefold: Case insensitive comparison.
* u32_casexfrm: Case insensitive comparison.
* u32_casing_prefix_context: Case mappings of substrings.
* u32_casing_prefixes_context: Case mappings of substrings.
* u32_casing_suffix_context: Case mappings of substrings.
* u32_casing_suffixes_context: Case mappings of substrings.
* u32_check: Elementary string checks.
* u32_chr: Elementary string functions.
* u32_cmp: Elementary string functions.
* u32_cmp2: Elementary string functions.
* u32_conv_from_encoding: uniconv.h. (line 57)
* u32_conv_to_encoding: uniconv.h. (line 94)
* u32_cpy: Elementary string functions.
* u32_cpy_alloc: Elementary string functions with memory allocation.
* u32_ct_casefold: Case insensitive comparison.
* u32_ct_tolower: Case mappings of substrings.
* u32_ct_totitle: Case mappings of substrings.
* u32_ct_toupper: Case mappings of substrings.
* u32_endswith: Elementary string functions on NUL terminated strings.
* u32_is_cased: Case detection. (line 59)
* u32_is_casefolded: Case detection. (line 46)
* u32_is_lowercase: Case detection. (line 26)
* u32_is_titlecase: Case detection. (line 36)
* u32_is_uppercase: Case detection. (line 16)
* u32_mblen: Elementary string functions.
* u32_mbsnlen: Elementary string functions.
* u32_mbtouc: Elementary string functions.
* u32_mbtouc_unsafe: Elementary string functions.
* u32_mbtoucr: Elementary string functions.
* u32_move: Elementary string functions.
* u32_next: Elementary string functions on NUL terminated strings.
* u32_normalize: Normalization of strings.
* u32_normcmp: Normalizing comparisons.
* u32_normcoll: Normalizing comparisons.
* u32_normxfrm: Normalizing comparisons.
* u32_possible_linebreaks: unilbrk.h. (line 48)
* u32_prev: Elementary string functions on NUL terminated strings.
* u32_set: Elementary string functions.
* u32_snprintf: unistdio.h. (line 179)
* u32_sprintf: unistdio.h. (line 176)
* u32_startswith: Elementary string functions on NUL terminated strings.
* u32_stpcpy: Elementary string functions on NUL terminated strings.
* u32_stpncpy: Elementary string functions on NUL terminated strings.
* u32_strcat: Elementary string functions on NUL terminated strings.
* u32_strchr: Elementary string functions on NUL terminated strings.
* u32_strcmp: Elementary string functions on NUL terminated strings.
* u32_strcoll: Elementary string functions on NUL terminated strings.
* u32_strconv_from_encoding: uniconv.h. (line 131)
* u32_strconv_from_locale: uniconv.h. (line 158)
* u32_strconv_to_encoding: uniconv.h. (line 144)
* u32_strconv_to_locale: uniconv.h. (line 168)
* u32_strcpy: Elementary string functions on NUL terminated strings.
* u32_strcspn: Elementary string functions on NUL terminated strings.
* u32_strdup: Elementary string functions on NUL terminated strings.
* u32_strlen: Elementary string functions on NUL terminated strings.
* u32_strmblen: Elementary string functions on NUL terminated strings.
* u32_strmbtouc: Elementary string functions on NUL terminated strings.
* u32_strncat: Elementary string functions on NUL terminated strings.
* u32_strncmp: Elementary string functions on NUL terminated strings.
* u32_strncpy: Elementary string functions on NUL terminated strings.
* u32_strnlen: Elementary string functions on NUL terminated strings.
* u32_strpbrk: Elementary string functions on NUL terminated strings.
* u32_strrchr: Elementary string functions on NUL terminated strings.
* u32_strspn: Elementary string functions on NUL terminated strings.
* u32_strstr: Elementary string functions on NUL terminated strings.
* u32_strtok: Elementary string functions on NUL terminated strings.
* u32_strwidth: uniwidth.h. (line 40)
* u32_to_u16: Elementary string conversions.
* u32_to_u8: Elementary string conversions.
* u32_tolower: Case mappings of strings.
* u32_totitle: Case mappings of strings.
* u32_toupper: Case mappings of strings.
* u32_u32_asnprintf: unistdio.h. (line 212)
* u32_u32_asprintf: unistdio.h. (line 209)
* u32_u32_snprintf: unistdio.h. (line 206)
* u32_u32_sprintf: unistdio.h. (line 203)
* u32_u32_vasnprintf: unistdio.h. (line 224)
* u32_u32_vasprintf: unistdio.h. (line 221)
* u32_u32_vsnprintf: unistdio.h. (line 218)
* u32_u32_vsprintf: unistdio.h. (line 215)
* u32_uctomb: Elementary string functions.
* u32_vasnprintf: unistdio.h. (line 197)
* u32_vasprintf: unistdio.h. (line 194)
* u32_vsnprintf: unistdio.h. (line 191)
* u32_vsprintf: unistdio.h. (line 188)
* u32_width: uniwidth.h. (line 33)
* u32_width_linebreaks: unilbrk.h. (line 68)
* u32_wordbreaks: Word breaks in a string.
* u8_asnprintf: unistdio.h. (line 79)
* u8_asprintf: unistdio.h. (line 76)
* u8_casecmp: Case insensitive comparison.
* u8_casecoll: Case insensitive comparison.
* u8_casefold: Case insensitive comparison.
* u8_casexfrm: Case insensitive comparison.
* u8_casing_prefix_context: Case mappings of substrings.
* u8_casing_prefixes_context: Case mappings of substrings.
* u8_casing_suffix_context: Case mappings of substrings.
* u8_casing_suffixes_context: Case mappings of substrings.
* u8_check: Elementary string checks.
* u8_chr: Elementary string functions.
* u8_cmp: Elementary string functions.
* u8_cmp2: Elementary string functions.
* u8_conv_from_encoding: uniconv.h. (line 51)
* u8_conv_to_encoding: uniconv.h. (line 88)
* u8_cpy: Elementary string functions.
* u8_cpy_alloc: Elementary string functions with memory allocation.
* u8_ct_casefold: Case insensitive comparison.
* u8_ct_tolower: Case mappings of substrings.
* u8_ct_totitle: Case mappings of substrings.
* u8_ct_toupper: Case mappings of substrings.
* u8_endswith: Elementary string functions on NUL terminated strings.
* u8_is_cased: Case detection. (line 55)
* u8_is_casefolded: Case detection. (line 42)
* u8_is_lowercase: Case detection. (line 22)
* u8_is_titlecase: Case detection. (line 32)
* u8_is_uppercase: Case detection. (line 12)
* u8_mblen: Elementary string functions.
* u8_mbsnlen: Elementary string functions.
* u8_mbtouc: Elementary string functions.
* u8_mbtouc_unsafe: Elementary string functions.
* u8_mbtoucr: Elementary string functions.
* u8_move: Elementary string functions.
* u8_next: Elementary string functions on NUL terminated strings.
* u8_normalize: Normalization of strings.
* u8_normcmp: Normalizing comparisons.
* u8_normcoll: Normalizing comparisons.
* u8_normxfrm: Normalizing comparisons.
* u8_possible_linebreaks: unilbrk.h. (line 44)
* u8_prev: Elementary string functions on NUL terminated strings.
* u8_set: Elementary string functions.
* u8_snprintf: unistdio.h. (line 73)
* u8_sprintf: unistdio.h. (line 70)
* u8_startswith: Elementary string functions on NUL terminated strings.
* u8_stpcpy: Elementary string functions on NUL terminated strings.
* u8_stpncpy: Elementary string functions on NUL terminated strings.
* u8_strcat: Elementary string functions on NUL terminated strings.
* u8_strchr: Elementary string functions on NUL terminated strings.
* u8_strcmp: Elementary string functions on NUL terminated strings.
* u8_strcoll: Elementary string functions on NUL terminated strings.
* u8_strconv_from_encoding: uniconv.h. (line 127)
* u8_strconv_from_locale: uniconv.h. (line 156)
* u8_strconv_to_encoding: uniconv.h. (line 140)
* u8_strconv_to_locale: uniconv.h. (line 166)
* u8_strcpy: Elementary string functions on NUL terminated strings.
* u8_strcspn: Elementary string functions on NUL terminated strings.
* u8_strdup: Elementary string functions on NUL terminated strings.
* u8_strlen: Elementary string functions on NUL terminated strings.
* u8_strmblen: Elementary string functions on NUL terminated strings.
* u8_strmbtouc: Elementary string functions on NUL terminated strings.
* u8_strncat: Elementary string functions on NUL terminated strings.
* u8_strncmp: Elementary string functions on NUL terminated strings.
* u8_strncpy: Elementary string functions on NUL terminated strings.
* u8_strnlen: Elementary string functions on NUL terminated strings.
* u8_strpbrk: Elementary string functions on NUL terminated strings.
* u8_strrchr: Elementary string functions on NUL terminated strings.
* u8_strspn: Elementary string functions on NUL terminated strings.
* u8_strstr: Elementary string functions on NUL terminated strings.
* u8_strtok: Elementary string functions on NUL terminated strings.
* u8_strwidth: uniwidth.h. (line 38)
* u8_to_u16: Elementary string conversions.
* u8_to_u32: Elementary string conversions.
* u8_tolower: Case mappings of strings.
* u8_totitle: Case mappings of strings.
* u8_toupper: Case mappings of strings.
* u8_u8_asnprintf: unistdio.h. (line 106)
* u8_u8_asprintf: unistdio.h. (line 103)
* u8_u8_snprintf: unistdio.h. (line 100)
* u8_u8_sprintf: unistdio.h. (line 97)
* u8_u8_vasnprintf: unistdio.h. (line 118)
* u8_u8_vasprintf: unistdio.h. (line 115)
* u8_u8_vsnprintf: unistdio.h. (line 112)
* u8_u8_vsprintf: unistdio.h. (line 109)
* u8_uctomb: Elementary string functions.
* u8_vasnprintf: unistdio.h. (line 91)
* u8_vasprintf: unistdio.h. (line 88)
* u8_vsnprintf: unistdio.h. (line 85)
* u8_vsprintf: unistdio.h. (line 82)
* u8_width: uniwidth.h. (line 29)
* u8_width_linebreaks: unilbrk.h. (line 62)
* u8_wordbreaks: Word breaks in a string.
* uc_all_blocks: Blocks. (line 38)
* uc_all_scripts: Scripts. (line 37)
* uc_bidi_category: Bidirectional category.
* uc_bidi_category_byname: Bidirectional category.
* uc_bidi_category_name: Bidirectional category.
* uc_block: Blocks. (line 27)
* uc_block_t: Blocks. (line 12)
* uc_c_ident_category: ISO C and Java syntax.
* uc_canonical_decomposition: Decomposition of characters.
* uc_combining_class: Canonical combining class.
* uc_composition: Composition of characters.
* uc_decimal_value: Decimal digit value. (line 11)
* uc_decomposition: Decomposition of characters.
* uc_digit_value: Digit value. (line 11)
* uc_fraction_t: Numeric value. (line 14)
* uc_general_category: Object oriented API. (line 207)
* uc_general_category_and: Object oriented API. (line 179)
* uc_general_category_and_not: Object oriented API. (line 186)
* uc_general_category_byname: Object oriented API. (line 201)
* uc_general_category_name: Object oriented API. (line 195)
* uc_general_category_or: Object oriented API. (line 173)
* uc_general_category_t: Object oriented API. (line 7)
* uc_is_alnum: Classifications like in ISO C.
* uc_is_alpha: Classifications like in ISO C.
* uc_is_bidi_category: Bidirectional category.
* uc_is_blank: Classifications like in ISO C.
* uc_is_block: Blocks. (line 32)
* uc_is_c_whitespace: ISO C and Java syntax.
* uc_is_cntrl: Classifications like in ISO C.
* uc_is_digit: Classifications like in ISO C.
* uc_is_general_category: Object oriented API. (line 213)
* uc_is_general_category_withtable: Bit mask API. (line 52)
* uc_is_graph: Classifications like in ISO C.
* uc_is_java_whitespace: ISO C and Java syntax.
* uc_is_lower: Classifications like in ISO C.
* uc_is_print: Classifications like in ISO C.
* uc_is_property: Properties as objects.
* uc_is_property_alphabetic: Properties as functions.
* uc_is_property_ascii_hex_digit: Properties as functions.
* uc_is_property_bidi_arabic_digit: Properties as functions.
* uc_is_property_bidi_arabic_right_to_left: Properties as functions.
* uc_is_property_bidi_block_separator: Properties as functions.
* uc_is_property_bidi_boundary_neutral: Properties as functions.
* uc_is_property_bidi_common_separator: Properties as functions.
* uc_is_property_bidi_control: Properties as functions.
* uc_is_property_bidi_embedding_or_override: Properties as functions.
* uc_is_property_bidi_eur_num_separator: Properties as functions.
* uc_is_property_bidi_eur_num_terminator: Properties as functions.
* uc_is_property_bidi_european_digit: Properties as functions.
* uc_is_property_bidi_hebrew_right_to_left: Properties as functions.
* uc_is_property_bidi_left_to_right: Properties as functions.
* uc_is_property_bidi_non_spacing_mark: Properties as functions.
* uc_is_property_bidi_other_neutral: Properties as functions.
* uc_is_property_bidi_pdf: Properties as functions.
* uc_is_property_bidi_segment_separator: Properties as functions.
* uc_is_property_bidi_whitespace: Properties as functions.
* uc_is_property_combining: Properties as functions.
* uc_is_property_composite: Properties as functions.
* uc_is_property_currency_symbol: Properties as functions.
* uc_is_property_dash: Properties as functions.
* uc_is_property_decimal_digit: Properties as functions.
* uc_is_property_default_ignorable_code_point: Properties as functions.
* uc_is_property_deprecated: Properties as functions.
* uc_is_property_diacritic: Properties as functions.
* uc_is_property_extender: Properties as functions.
* uc_is_property_format_control: Properties as functions.
* uc_is_property_grapheme_base: Properties as functions.
* uc_is_property_grapheme_extend: Properties as functions.
* uc_is_property_grapheme_link: Properties as functions.
* uc_is_property_hex_digit: Properties as functions.
* uc_is_property_hyphen: Properties as functions.
* uc_is_property_id_continue: Properties as functions.
* uc_is_property_id_start: Properties as functions.
* uc_is_property_ideographic: Properties as functions.
* uc_is_property_ids_binary_operator: Properties as functions.
* uc_is_property_ids_trinary_operator: Properties as functions.
* uc_is_property_ignorable_control: Properties as functions.
* uc_is_property_iso_control: Properties as functions.
* uc_is_property_join_control: Properties as functions.
* uc_is_property_left_of_pair: Properties as functions.
* uc_is_property_line_separator: Properties as functions.
* uc_is_property_logical_order_exception: Properties as functions.
* uc_is_property_lowercase: Properties as functions.
* uc_is_property_math: Properties as functions.
* uc_is_property_non_break: Properties as functions.
* uc_is_property_not_a_character: Properties as functions.
* uc_is_property_numeric: Properties as functions.
* uc_is_property_other_alphabetic: Properties as functions.
* uc_is_property_other_default_ignorable_code_point: Properties as functions.
* uc_is_property_other_grapheme_extend: Properties as functions.
* uc_is_property_other_id_continue: Properties as functions.
* uc_is_property_other_id_start: Properties as functions.
* uc_is_property_other_lowercase: Properties as functions.
* uc_is_property_other_math: Properties as functions.
* uc_is_property_other_uppercase: Properties as functions.
* uc_is_property_paired_punctuation: Properties as functions.
* uc_is_property_paragraph_separator: Properties as functions.
* uc_is_property_pattern_syntax: Properties as functions.
* uc_is_property_pattern_white_space: Properties as functions.
* uc_is_property_private_use: Properties as functions.
* uc_is_property_punctuation: Properties as functions.
* uc_is_property_quotation_mark: Properties as functions.
* uc_is_property_radical: Properties as functions.
* uc_is_property_sentence_terminal: Properties as functions.
* uc_is_property_soft_dotted: Properties as functions.
* uc_is_property_space: Properties as functions.
* uc_is_property_terminal_punctuation: Properties as functions.
* uc_is_property_titlecase: Properties as functions.
* uc_is_property_unassigned_code_value: Properties as functions.
* uc_is_property_unified_ideograph: Properties as functions.
* uc_is_property_uppercase: Properties as functions.
* uc_is_property_variation_selector: Properties as functions.
* uc_is_property_white_space: Properties as functions.
* uc_is_property_xid_continue: Properties as functions.
* uc_is_property_xid_start: Properties as functions.
* uc_is_property_zero_width: Properties as functions.
* uc_is_punct: Classifications like in ISO C.
* uc_is_script: Scripts. (line 31)
* uc_is_space: Classifications like in ISO C.
* uc_is_upper: Classifications like in ISO C.
* uc_is_xdigit: Classifications like in ISO C.
* uc_java_ident_category: ISO C and Java syntax.
* uc_locale_language: Case mappings of strings.
* uc_mirror_char: Mirrored character. (line 14)
* uc_numeric_value: Numeric value. (line 23)
* uc_property_byname: Properties as objects.
* uc_property_is_valid: Properties as objects.
* uc_property_t: Properties as objects.
* uc_script: Scripts. (line 20)
* uc_script_byname: Scripts. (line 25)
* uc_script_t: Scripts. (line 11)
* uc_tolower: Case mappings of characters.
* uc_totitle: Case mappings of characters.
* uc_toupper: Case mappings of characters.
* uc_width: uniwidth.h. (line 23)
* uc_wordbreak_property: Word break property. (line 32)
* UCS-4: Unicode. (line 14)
* ucs4_t: unitypes.h. (line 16)
* uint16_t: unitypes.h. (line 10)
* uint32_t: unitypes.h. (line 11)
* uint8_t: unitypes.h. (line 9)
* ulc_asnprintf: unistdio.h. (line 53)
* ulc_asprintf: unistdio.h. (line 50)
* ulc_casecmp: Case insensitive comparison.
* ulc_casecoll: Case insensitive comparison.
* ulc_casexfrm: Case insensitive comparison.
* ulc_fprintf: unistdio.h. (line 229)
* ulc_possible_linebreaks: unilbrk.h. (line 50)
* ulc_snprintf: unistdio.h. (line 48)
* ulc_sprintf: unistdio.h. (line 45)
* ulc_vasnprintf: unistdio.h. (line 65)
* ulc_vasprintf: unistdio.h. (line 62)
* ulc_vfprintf: unistdio.h. (line 232)
* ulc_vsnprintf: unistdio.h. (line 59)
* ulc_vsprintf: unistdio.h. (line 56)
* ulc_width_linebreaks: unilbrk.h. (line 71)
* ulc_wordbreaks: Word breaks in a string.
* Unicode: Unicode. (line 6)
* Unicode character, bidirectional category: Bidirectional category.
* Unicode character, block: Blocks. (line 24)
* Unicode character, canonical combining class: Canonical combining class.
* Unicode character, case mappings: Case mappings of characters.
* Unicode character, classification: General category. (line 6)
* Unicode character, classification like in C: Classifications like in ISO C.
* Unicode character, general category: General category. (line 6)
* Unicode character, mirroring: Mirrored character. (line 6)
* Unicode character, name: uniname.h. (line 6)
* Unicode character, properties: Properties. (line 6)
* Unicode character, script: Scripts. (line 17)
* Unicode character, validity in C identifiers: ISO C and Java syntax.
* Unicode character, validity in Java identifiers: ISO C and Java syntax.
* Unicode character, value <1>: Numeric value. (line 6)
* Unicode character, value <2>: Digit value. (line 6)
* Unicode character, value: Decimal digit value. (line 6)
* Unicode character, width: uniwidth.h. (line 22)
* unicode_character_name: uniname.h. (line 19)
* unicode_name_character: uniname.h. (line 25)
* uninorm_decomposing_form: Normalization of strings.
* uninorm_filter_create: Normalization of streams.
* uninorm_filter_flush: Normalization of streams.
* uninorm_filter_free: Normalization of streams.
* uninorm_filter_write: Normalization of streams.
* uninorm_is_compat_decomposing: Normalization of strings.
* uninorm_is_composing: Normalization of strings.
* uninorm_t: Normalization of strings.
* uppercasing: Case mappings of strings.
* use cases: Introduction. (line 44)
* UTF-16: Unicode. (line 14)
* UTF-16, strings: Unicode strings. (line 6)
* UTF-32: Unicode. (line 14)
* UTF-32, strings: Unicode strings. (line 6)
* UTF-8: Unicode. (line 14)
* UTF-8, strings: Unicode strings. (line 6)
* validity: Elementary string checks.
* value, of libunistring: Introduction. (line 44)
* value, of Unicode character <1>: Numeric value. (line 6)
* value, of Unicode character <2>: Digit value. (line 6)
* value, of Unicode character: Decimal digit value. (line 6)
* verification: Elementary string checks.
* wchar_t, type: The wchar_t mess. (line 6)
* width: uniwidth.h. (line 6)
* word breaks: uniwbrk.h. (line 6)
* wrapping: unilbrk.h. (line 6)