1 This is libunistring.info, produced by makeinfo version 5.1 from
2 libunistring.texi.
4 INFO-DIR-SECTION Software development
6 * GNU libunistring: (libunistring). Unicode string library.
9 This manual is for GNU libunistring.
12 File: libunistring.info, Node: Top, Next: Introduction, Up: (dir)
19 * Introduction:: Who may need Unicode strings?
20 * Conventions:: Conventions used in this manual
21 * unitypes.h:: Elementary types
22 * unistr.h:: Elementary Unicode string functions
23 * uniconv.h:: Conversions between Unicode and encodings
24 * unistdio.h:: Output with Unicode strings
25 * uniname.h:: Names of Unicode characters
26 * unictype.h:: Unicode character classification and properties
27 * uniwidth.h:: Display width
28 * unigbrk.h:: Grapheme cluster breaking
29 * uniwbrk.h:: Word breaks in strings
30 * unilbrk.h:: Line breaking
31 * uninorm.h:: Normalization forms
32 * unicase.h:: Case mappings
33 * uniregex.h:: Regular expressions
34 * Using the library:: How to link with the library and use it?
35 * More functionality:: More advanced functionality
38 * Index:: General Index
40 — The Detailed Node Listing —
44 * Unicode:: What is Unicode?
45 * Unicode and i18n:: Unicode and internationalization
46 * Locale encodings:: What is a locale encoding?
47 * In-memory representation:: How to represent strings in memory?
48 * char * strings:: What to keep in mind with ‘char *’ strings
49 * The wchar_t mess:: Why ‘wchar_t *’ strings are useless
50 * Unicode strings:: How are Unicode strings represented?
54 * Elementary string checks::
55 * Elementary string conversions::
56 * Elementary string functions::
57 * Elementary string functions with memory allocation::
58 * Elementary string functions on NUL terminated strings::
63 * Canonical combining class::
65 * Decimal digit value::
68 * Mirrored character::
73 * ISO C and Java syntax::
74 * Classifications like in ISO C::
78 * Object oriented API::
83 * Properties as objects::
84 * Properties as functions::
88 * Grapheme cluster breaks in a string::
89 * Grapheme cluster break property::
93 * Word breaks in a string::
94 * Word break property::
98 * Decomposition of characters::
99 * Composition of characters::
100 * Normalization of strings::
101 * Normalizing comparisons::
102 * Normalization of streams::
106 * Case mappings of characters::
107 * Case mappings of strings::
108 * Case mappings of substrings::
109 * Case insensitive comparison::
118 * Reporting problems::
122 * GNU GPL:: GNU General Public License
123 * GNU LGPL:: GNU Lesser General Public License
124 * GNU FDL:: GNU Free Documentation License
128 File: libunistring.info, Node: Introduction, Next: Conventions, Prev: Top, Up: Top
130 1 Introduction
131 **************
133 This library provides functions for manipulating Unicode strings and
134 for manipulating C strings according to the Unicode standard.
136 It consists of the following parts:
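138 ‘<unistr.h>’
139 elementary string functions
140 ‘<uniconv.h>’
141 conversion from/to legacy encodings
142 ‘<unistdio.h>’
143 formatted output to strings
144 ‘<uniname.h>’
145 names of Unicode characters
146 ‘<unictype.h>’
147 character classification and properties
148 ‘<uniwidth.h>’
149 string width when using nonproportional fonts
150 ‘<unigbrk.h>’
151 grapheme cluster breaks
152 ‘<uniwbrk.h>’
153 word breaks
154 ‘<unilbrk.h>’
155 line breaking algorithm
156 ‘<uninorm.h>’
157 normalization (composition and decomposition)
158 ‘<unicase.h>’
159 case mappings
160 ‘<uniregex.h>’
161 regular expressions (not yet implemented)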
163 libunistring is for you if your application involves non-trivial text
164 processing, such as upper/lower case conversions, line breaking,
165 operations on words, or more advanced analysis of text. Text provided
166 by the user can, in general, contain characters of all kinds of scripts.
167 The text processing functions provided by this library handle all
168 scripts and all languages.
170 libunistring is for you if your application already uses the ISO C /
171 POSIX ‘<ctype.h>’, ‘<wctype.h>’ functions and the text it operates on is
172 provided by the user and can be in any language.
174 libunistring is also for you if your application uses Unicode strings
175 as internal in-memory representation.
179 * Unicode:: What is Unicode?
180 * Unicode and i18n:: Unicode and internationalization
181 * Locale encodings:: What is a locale encoding?
182 * In-memory representation:: How to represent strings in memory?
183 * char * strings:: What to keep in mind with ‘char *’ strings
184 * The wchar_t mess:: Why ‘wchar_t *’ strings are useless
185 * Unicode strings:: How are Unicode strings represented?
188 File: libunistring.info, Node: Unicode, Next: Unicode and i18n, Up: Introduction
190 1.1 Unicode
191 ===========
193 Unicode is a standardized repertoire of characters that contains
194 characters from all scripts of the world, from Latin letters to Chinese
195 ideographs and Babylonian cuneiform glyphs. It also specifies how these
196 characters are to be rendered on a screen or on paper, and how common
197 text processing (word selection, line breaking, uppercasing of page
198 titles etc.) is supposed to behave on Unicode text.
200 Unicode also specifies three ways of storing sequences of Unicode
201 characters in a computer whose basic unit of data is an 8-bit byte:
202 UTF-8
203 Every character is represented as 1 to 4 bytes.
204 UTF-16
205 Every character is represented as 1 to 2 units of 16 bits.
206 UTF-32
207 Every character is represented as 1 unit of 32 bits.
209 For encoding Unicode text in a file, UTF-8 is usually used. For
210 encoding Unicode strings in memory for a program, any of the three
211 encoding forms can be reasonably used.
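As a rough illustration of the three encoding forms, the following C sketch computes how many storage units one code point needs in each form. The function names are hypothetical and are not part of libunistring; only the unit counts come from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not libunistring API.  Each returns the number of
   storage units needed for the code point CP, or 0 if CP is not a Unicode
   scalar value (a surrogate, or a value above U+10FFFF). */

static int is_scalar_value(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* UTF-8: 1 to 4 bytes. */
int utf8_units(uint32_t cp)
{
    if (!is_scalar_value(cp)) return 0;
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* UTF-16: 1 or 2 units of 16 bits. */
int utf16_units(uint32_t cp)
{
    if (!is_scalar_value(cp)) return 0;
    return cp < 0x10000 ? 1 : 2;
}

/* UTF-32: always 1 unit of 32 bits. */
int utf32_units(uint32_t cp)
{
    return is_scalar_value(cp) ? 1 : 0;
}

/* Usage example: U+20AC EURO SIGN and U+1F600 GRINNING FACE. */
void units_example(void)
{
    assert(utf8_units(0x20AC) == 3 && utf16_units(0x20AC) == 1);
    assert(utf8_units(0x1F600) == 4 && utf16_units(0x1F600) == 2);
    assert(utf32_units(0x1F600) == 1);
}
```

The trade-off is visible in the counts: UTF-8 is compact for ASCII-heavy text, while UTF-32 gives one unit per character at the cost of four bytes each.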
213 Unicode is widely used on the web. Prior to the use of Unicode, web
214 pages were in many different encodings (ISO-8859-1 for English, French,
215 Spanish, ISO-8859-2 for Polish, ISO-8859-7 for Greek, KOI8-R for
216 Russian, GB2312 or BIG5 for Chinese, ISO-2022-JP-2 or EUC-JP or
217 Shift_JIS for Japanese, and many many others). It was next to
218 impossible to create a single document that contained both Chinese and
219 Polish text. Due to the many encodings for Japanese, even the
220 processing of pure Japanese text was error prone.
223 • The Unicode standard: <http://www.unicode.org/>
224 • Definition of UTF-8: <http://www.rfc-editor.org/rfc/rfc3629.txt>
225 • Definition of UTF-16: <http://www.rfc-editor.org/rfc/rfc2781.txt>
226 • Markus Kuhn’s UTF-8 and Unicode FAQ:
227 <http://www.cl.cam.ac.uk/~mgk25/unicode.html>
230 File: libunistring.info, Node: Unicode and i18n, Next: Locale encodings, Prev: Unicode, Up: Introduction
232 1.2 Unicode and Internationalization
233 ====================================
235 Internationalization is the process of changing the source code of a
236 program so that it can meet the expectations of users in any culture, if
237 culture specific data (translations, images etc.) are provided.
239 Use of Unicode is not strictly required for internationalization, but
240 it makes internationalization much easier, because operations that need
241 to look at specific characters (like hyphenation, spell checking, or the
242 automatic conversion of double-quotes to opening and closing
243 double-quote characters) don’t need to consider multiple possible
244 encodings of the text.
246 Use of Unicode also enables multilingualization: the ability of
247 having text in multiple languages present in the same document or even
248 in the same line of text.
250 But use of Unicode is not everything. Internationalization usually
251 consists of three features:
252 • Use of Unicode where needed for text processing. This is what this
253 library is about.
254 • Use of message catalogs for messages shown to the user. This is
255 what GNU gettext is about.
256 • Use of locale specific conventions for date and time formats, for
257 numeric formatting, or for sorting of text. This can be done
258 adequately with the POSIX APIs and the implementation of locales in
259 the GNU C library.
262 File: libunistring.info, Node: Locale encodings, Next: In-memory representation, Prev: Unicode and i18n, Up: Introduction
264 1.3 Locale encodings
265 ====================
267 A locale is a set of cultural conventions. According to POSIX, for a
268 program, at any moment, there is one locale being designated as the
269 “current locale”. (Actually, POSIX supports also one locale per thread,
270 but this feature is not yet universally implemented and not widely
271 used.) The locale is partitioned into several aspects, called the
272 “categories” of the locale. The main aspects are:
273 • The character encoding and the character properties. This is the
274 ‘LC_CTYPE’ category.
275 • The sorting rules for text. This is the ‘LC_COLLATE’ category.
276 • The language specific translations of messages. This is the
277 ‘LC_MESSAGES’ category.
278 • The formatting rules for numbers, such as the decimal separator.
279 This is the ‘LC_NUMERIC’ category.
280 • The formatting rules for amounts of money. This is the
281 ‘LC_MONETARY’ category.
282 • The formatting of date and time. This is the ‘LC_TIME’ category.
284 In particular, the ‘LC_CTYPE’ category of the current locale
285 determines the character encoding. This is the encoding of ‘char *’
286 strings. We also call it the “locale encoding”. GNU libunistring has a
287 function, ‘locale_charset’, that returns a standardized (platform
288 independent) name for this encoding.
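For comparison, plain POSIX can report the encoding of the current ‘LC_CTYPE’ locale through ‘nl_langinfo (CODESET)’. The sketch below uses only that POSIX call; the helper name is hypothetical. Unlike the standardized name described above, the string returned here differs between platforms for the same encoding.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper, not libunistring API: switches LC_CTYPE to the
   locale named by the environment and returns the codeset name that the
   C library reports for it.  Note the side effect on the process locale. */
const char *current_codeset_sketch(void)
{
    /* If the environment names an unknown locale, setlocale fails and the
       previous locale (initially "C") stays in effect. */
    setlocale(LC_CTYPE, "");
    const char *name = nl_langinfo(CODESET);
    assert(name != NULL);
    return name;
}
```

The returned string belongs to the C library and may be overwritten by later calls, so copy it if it has to be kept.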
290 All locale encodings used on glibc systems are essentially ASCII
291 compatible: Most graphic ASCII characters have the same representation,
292 as a single byte, in that encoding as in ASCII.
294 Among the possible locale encodings are UTF-8 and GB18030. Both
295 can represent any Unicode character as a sequence of bytes. UTF-8
296 is used in most of the world, whereas GB18030 is used in the People’s
297 Republic of China, because it is backward compatible with the GB2312
298 encoding that was used in this country earlier.
300 The legacy locale encodings, ISO-8859-15 (which supplanted ISO-8859-1
301 in most of Europe), ISO-8859-2, KOI8-R, EUC-JP, etc., are still in use
302 in many places, though.
304 UTF-16 and UTF-32 are not used as locale encodings, because they are
305 not ASCII compatible.
308 File: libunistring.info, Node: In-memory representation, Next: char * strings, Prev: Locale encodings, Up: Introduction
310 1.4 Choice of in-memory representation of strings
311 =================================================
313 There are three ways of representing strings in the memory of a running
314 program:
315 • As ‘char *’ strings. Such strings are represented in locale
316 encoding. This approach is employed when not much text processing
317 is done by the program. When some Unicode aware processing is to
318 be done, a string is converted to Unicode on the fly and back to
319 locale encoding afterwards.
320 • As UTF-8 or UTF-16 or UTF-32 strings. This implies that conversion
321 from locale encoding to Unicode is performed on input, and in the
322 opposite direction on output. This approach is employed when the
323 program does a significant amount of text processing, or when the
324 program has multiple threads operating on the same data but in
325 different locales.
326 • As ‘wchar_t *’, a.k.a. “wide strings”. This approach is
327 misguided, see *note The wchar_t mess::.
330 File: libunistring.info, Node: char * strings, Next: The wchar_t mess, Prev: In-memory representation, Up: Introduction
332 1.5 ‘char *’ strings
333 ====================
335 The classical C strings, with their C library support standardized by
336 ISO C and POSIX, can be used in internationalized programs with some
337 precautions. The problem with this API is that many of the C library
338 functions for strings don’t work correctly on strings in locale
339 encodings, leading to bugs that only people in some cultures of the
340 world will experience.
342 The first problem with the C library API is the support of multibyte
343 locales. According to the locale encoding, in general, every character
344 is represented by one or more bytes (up to 4 bytes in practice — but use
345 ‘MB_LEN_MAX’ instead of the number 4 in the code). When every character
346 is represented by only 1 byte, we speak of an “unibyte locale”,
347 otherwise of a “multibyte locale”. It is important to realize that the
348 majority of Unix installations nowadays use UTF-8 or GB18030 as locale
349 encoding; therefore, the majority of users are using multibyte locales.
351 The important fact to remember is:
352 _A ‘char’ is a byte, not a character._
354 As a consequence:
355 • The ‘<ctype.h>’ API is useless in this context; it does not work in
356 multibyte locales.
357 • The ‘strlen’ function does not return the number of characters in a
358 string. Nor does it return the number of screen columns occupied
359 by a string after it is output. It merely returns the number of
360 _bytes_ occupied by a string.
361 • Truncating a string, for example, with ‘strncpy’, can have the
362 effect of truncating it in the middle of a multibyte character.
363 Such a string will, when output, have a garbled character at its
364 end, often represented by a hollow box.
365 • ‘strchr’ and ‘strrchr’ do not work with multibyte strings if the
366 locale encoding is GB18030 and the character to be searched is a
367 digit.
368 • ‘strstr’ does not work with multibyte strings if the locale
369 encoding is different from UTF-8.
370 • ‘strcspn’, ‘strpbrk’, ‘strspn’ cannot work correctly in multibyte
371 locales: they assume the second argument is a list of single-byte
372 characters. Even in this simple case, they do not work with
373 multibyte strings if the locale encoding is GB18030 and one of the
374 characters to be searched is a digit.
375 • ‘strsep’ and ‘strtok_r’ do not work with multibyte strings unless
376 all of the delimiter characters are ASCII characters < 0x30.
377 • The ‘strcasecmp’, ‘strncasecmp’, and ‘strcasestr’ functions do not
378 work with multibyte strings.
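The ‘strlen’ and truncation pitfalls can be made concrete with a small sketch. It assumes the string is well-formed UTF-8, which is only one possible locale encoding; the byte test used here does not work for other multibyte encodings such as GB18030. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the characters in a NUL-terminated string
   that is assumed to be well-formed UTF-8, by counting every byte that is
   not a continuation byte (bit pattern 10xxxxxx). */
size_t utf8_char_count_sketch(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    return count;
}

/* Hypothetical helper: returns the largest length <= MAX_BYTES at which S
   can be cut without splitting a multibyte character.  Same
   well-formedness assumption as above. */
size_t utf8_safe_cut_sketch(const char *s, size_t max_bytes)
{
    assert(s != NULL);
    size_t len = strlen(s);
    if (len <= max_bytes)
        return len;
    size_t cut = max_bytes;
    /* Back up while the byte at the cut position continues a character. */
    while (cut > 0 && ((unsigned char)s[cut] & 0xC0) == 0x80)
        cut--;
    return cut;
}
```

For the six-byte string "na\xc3\xaf" "ve" (the word "naive" with a diaeresis), ‘strlen’ returns 6 while the character count is 5, and a cut requested at 3 bytes is moved back to 2 so that the two-byte character is not split.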
380 The workarounds can be found in GNU gnulib
381 <http://www.gnu.org/software/gnulib/>.
382 • gnulib has modules ‘mbchar’, ‘mbiter’, ‘mbuiter’ that represent
383 multibyte characters and allow iterating across a multibyte string
384 with the same ease as through a unibyte string.
385 • gnulib has functions ‘mbslen’ and ‘mbswidth’ that can be used
386 instead of ‘strlen’ when the number of characters or the number of
387 screen columns of a string is requested.
388 • gnulib has functions ‘mbschr’ and ‘mbsrrchr’ that are like ‘strchr’
389 and ‘strrchr’, but work in multibyte locales.
390 • gnulib has a function ‘mbsstr’ that is like ‘strstr’ but works in
391 multibyte locales.
392 • gnulib has functions ‘mbscspn’, ‘mbspbrk’, ‘mbsspn’ that are like
393 ‘strcspn’, ‘strpbrk’, ‘strspn’, but work in multibyte locales.
394 • gnulib has functions ‘mbssep’ and ‘mbstok_r’ that are like ‘strsep’
395 and ‘strtok_r’ but work in multibyte locales.
396 • gnulib has functions ‘mbscasecmp’, ‘mbsncasecmp’, ‘mbspcasecmp’,
397 and ‘mbscasestr’ that are like ‘strcasecmp’, ‘strncasecmp’, and
398 ‘strcasestr’, but work in multibyte locales. Still, the function
399 ‘ulc_casecmp’ is preferable to these functions; see below.
401 The second problem with the C library API is that it has some
402 assumptions built-in that are not valid in some languages:
403 • It assumes that there are only two forms of every character:
404 uppercase and lowercase. This is not true for Croatian, where the
405 character LETTER DZ WITH CARON comes in three forms: LATIN CAPITAL
406 LETTER DZ WITH CARON (DZ), LATIN CAPITAL LETTER D WITH SMALL LETTER
407 Z WITH CARON (Dz), LATIN SMALL LETTER DZ WITH CARON (dz).
408 • It assumes that uppercasing of 1 character leads to 1 character.
409 This is not true for German, where the LATIN SMALL LETTER SHARP S,
410 when uppercased, becomes ‘SS’.
411 • It assumes that there is 1:1 mapping between uppercase and
412 lowercase forms. This is not true for the Greek sigma: GREEK
413 CAPITAL LETTER SIGMA is the uppercase of both GREEK SMALL LETTER
414 SIGMA and GREEK SMALL LETTER FINAL SIGMA.
415 • It assumes that the upper/lowercase mappings are position
416 independent. This is not true for the Greek sigma and the
417 Lithuanian i.
419 The correct way to deal with this problem is
420 1. to provide functions for titlecasing, as well as for upper- and
421 lowercasing,
422 2. to view case transformations as functions that operate on strings,
423 rather than on characters.
425 This is implemented in this library, through the functions declared
426 in ‘<unicase.h>’, see *note unicase.h::.
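To see why case transformations have to operate on strings, consider the German sharp s mentioned above. The following toy sketch is not the library's implementation and knows only ASCII letters plus U+00DF; its name and interface are hypothetical. It shows that the output can be longer than the input, which a character-to-character function such as ‘toupper’ cannot express.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy sketch, not libunistring API: uppercases the NUL-terminated UTF-8
   string IN into OUT (capacity OUT_SIZE bytes, including the NUL).  It
   handles only ASCII letters and U+00DF LATIN SMALL LETTER SHARP S, which
   becomes "SS"; every other byte is copied unchanged.
   Returns 0 on success, -1 if OUT is too small. */
int toy_upper_sketch(const char *in, char *out, size_t out_size)
{
    assert(in != NULL && out != NULL && out_size > 0);
    size_t o = 0;
    for (size_t i = 0; in[i] != '\0'; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c == 0xC3 && (unsigned char)in[i + 1] == 0x9F) {
            /* One input character becomes two output characters. */
            if (o + 2 >= out_size)
                return -1;
            out[o++] = 'S';
            out[o++] = 'S';
            i++;                /* skip the second byte of U+00DF */
        } else {
            if (o + 1 >= out_size)
                return -1;
            /* Touch ASCII letters only (assumes an ASCII-based charset). */
            out[o++] = (c >= 'a' && c <= 'z') ? (char)(c - 'a' + 'A')
                                              : (char)c;
        }
    }
    out[o] = '\0';
    return 0;
}
```

Real text needs the full Unicode case mapping tables and language-dependent rules, which is what the ‘<unicase.h>’ functions are for.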
429 File: libunistring.info, Node: The wchar_t mess, Next: Unicode strings, Prev: char * strings, Up: Introduction
431 1.6 The ‘wchar_t’ mess
432 ======================
434 The ISO C and POSIX standard creators made an attempt to fix the
435 first problem mentioned in the previous section. They introduced
436 • a type ‘wchar_t’, designed to encapsulate an entire character,
437 • a “wide string” type ‘wchar_t *’, and
438 • functions declared in ‘<wctype.h>’ that were meant to supplant the
439 ones in ‘<ctype.h>’.
441 Unfortunately, this API and its implementation have numerous problems:
443 • On AIX and Windows platforms, ‘wchar_t’ is a 16-bit type. This
444 means that it can never accommodate an entire Unicode character.
445 Either the ‘wchar_t *’ strings are limited to characters in UCS-2
446 (the “Basic Multilingual Plane” of Unicode), or — if ‘wchar_t *’
447 strings are encoded in UTF-16 — a ‘wchar_t’ represents only half of
448 a character in the worst case, making the ‘<wctype.h>’ functions
449 completely useless.
451 • On Solaris and FreeBSD, the ‘wchar_t’ encoding is locale dependent
452 and undocumented. This means, if you want to know any property of
453 a ‘wchar_t’ character other than the properties defined by
454 ‘<wctype.h>’ (such as whether it’s a dash, currency symbol,
455 paragraph separator, or similar), you have to convert it to ‘char
456 *’ encoding first, by use of the function ‘wctomb’.
458 • When you read a stream of wide characters, through the functions
459 ‘fgetwc’ and ‘fgetws’, and when the input stream/file is not in the
460 expected encoding, you have no way to determine the invalid byte
461 sequence and do some corrective action. If you use these
462 functions, your program becomes “garbage in - more garbage out” or
463 “garbage in - abort”.
465 As a consequence, it is better to use multibyte strings, as explained
466 in the previous section. Such multibyte strings can bypass limitations
467 of the ‘wchar_t’ type, if you use functions defined in gnulib and
468 libunistring for text processing. They can also faithfully transport
469 malformed characters that were present in the input, without requiring
470 the program to produce garbage or abort.
473 File: libunistring.info, Node: Unicode strings, Prev: The wchar_t mess, Up: Introduction
475 1.7 Unicode strings
476 ===================
478 libunistring supports Unicode strings in three representations:
479 • UTF-8 strings, through the type ‘uint8_t *’. The units are bytes
480 (‘uint8_t’).
481 • UTF-16 strings, through the type ‘uint16_t *’. The units are 16-bit
482 memory words (‘uint16_t’).
483 • UTF-32 strings, through the type ‘uint32_t *’. The units are
484 32-bit memory words (‘uint32_t’).
486 As with C strings, there are two variants:
487 • Unicode strings with a terminating NUL character are represented as
488 a pointer to the first unit of the string. There is a unit
489 containing a 0 value at the end. It is considered part of the
490 string for all memory allocation purposes, but is not considered
491 part of the string for all other logical purposes.
492 • Unicode strings where embedded NUL characters are allowed. These
493 are represented by a pointer to the first unit and the number of
494 units (not bytes!) of the string. In this setting, there is no
495 trailing zero-valued unit used as “end marker”.
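The difference between the two variants, and between units and bytes, can be sketched for UTF-16 as follows. The helper names are hypothetical; the library's own length functions are described later in this manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for the NUL-terminated variant: counts the units
   that precede the terminating zero-valued unit. */
size_t units_before_nul_sketch(const uint16_t *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Bytes needed to store a NUL-terminated UTF-16 string of UNITS units:
   the terminator counts for memory allocation purposes. */
size_t alloc_bytes_with_nul_sketch(size_t units)
{
    return (units + 1) * sizeof(uint16_t);
}
```

The string { 0x0041, 0xD83D, 0xDE00, 0 } has 3 units (but only 2 characters) and needs 8 bytes including the terminator. A string with an embedded NUL, such as { 0x0041, 0, 0x0042 }, can only be passed as a pointer plus the unit count 3; scanning for a terminator would stop after the first unit.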
498 File: libunistring.info, Node: Conventions, Next: unitypes.h, Prev: Introduction, Up: Top
500 2 Conventions
501 *************
503 This chapter explains conventions valid throughout the libunistring
504 library.
506 Variables of type ‘char *’ denote C strings in locale encoding. See
507 *note Locale encodings::.
509 Variables of type ‘uint8_t *’ denote UTF-8 strings. Their units are
510 bytes.
512 Variables of type ‘uint16_t *’ denote UTF-16 strings, without byte
513 order mark. Their units are 2-byte words.
515 Variables of type ‘uint32_t *’ denote UTF-32 strings, without byte
516 order mark. Their units are 4-byte words.
518 Argument pairs ‘(S, N)’ denote a string ‘S[0..N-1]’ with exactly N
519 units.
521 All functions with prefix ‘ulc_’ operate on C strings in locale
522 encoding.
524 All functions with prefix ‘u8_’ operate on UTF-8 strings.
526 All functions with prefix ‘u16_’ operate on UTF-16 strings.
528 All functions with prefix ‘u32_’ operate on UTF-32 strings.
530 For every function with prefix ‘u8_’, operating on UTF-8 strings,
531 there is also a corresponding function with prefix ‘u16_’, operating on
532 UTF-16 strings, and a corresponding function with prefix ‘u32_’,
533 operating on UTF-32 strings. Their description is analogous; in this
534 documentation we describe only the function that operates on UTF-8
535 strings, for brevity.
537 A declaration with a variable N denotes the three concrete
538 declarations with N = 8, N = 16, N = 32.
540 All parameters starting with ‘str’ and the parameters of functions
541 starting with ‘u8_str’/‘u16_str’/‘u32_str’ denote a NUL terminated
542 string.
544 Error values are always returned through the ‘errno’ variable,
545 usually with a return value that indicates the presence of an error
546 (NULL for functions that return a pointer, or -1 for functions that
547 return an ‘int’).
549 Functions returning a string result take a ‘(RESULTBUF, LENGTHP)’
550 argument pair. If RESULTBUF is not NULL and the result fits into
551 ‘*LENGTHP’ units, it is put in RESULTBUF, and RESULTBUF is returned.
552 Otherwise, a freshly allocated string is returned. In both cases,
553 ‘*LENGTHP’ is set to the length (number of units) of the returned
554 string. In case of error, NULL is returned and ‘errno’ is set.
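The ‘(RESULTBUF, LENGTHP)’ convention is easiest to follow from the caller's side. The sketch below defines a hypothetical function that obeys the convention as described above (it merely copies its input) together with a caller that releases the result only when it was freshly allocated. Neither function is part of libunistring.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical function following the (RESULTBUF, LENGTHP) convention.
   The operation itself is a plain copy of N units; the point is the
   buffer handling. */
uint8_t *copy_units_sketch(const uint8_t *s, size_t n,
                           uint8_t *resultbuf, size_t *lengthp)
{
    assert(lengthp != NULL && (s != NULL || n == 0));
    uint8_t *result;
    if (resultbuf != NULL && n <= *lengthp) {
        result = resultbuf;             /* the result fits: reuse it */
    } else {
        result = malloc(n > 0 ? n : 1); /* otherwise allocate freshly */
        if (result == NULL) {
            errno = ENOMEM;
            return NULL;
        }
    }
    if (n > 0)
        memcpy(result, s, n);
    *lengthp = n;                       /* length of the returned string */
    return result;
}

/* Caller pattern: offer a stack buffer, and free the result only if the
   function did not use it.  Returns the result length, or (size_t)-1. */
size_t copy_and_release_sketch(const uint8_t *s, size_t n)
{
    uint8_t buf[4];
    size_t len = sizeof buf;
    uint8_t *r = copy_units_sketch(s, n, buf, &len);
    if (r == NULL)
        return (size_t)-1;
    if (r != buf)
        free(r);
    return len;
}
```

Offering a stack buffer avoids a heap allocation for short results, while long results still work because the function falls back to allocating.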
557 File: libunistring.info, Node: unitypes.h, Next: unistr.h, Prev: Conventions, Up: Top
559 3 Elementary types ‘<unitypes.h>’
560 *********************************
562 The include file ‘<unitypes.h>’ provides the following basic types.
564 -- Type: uint8_t
565 -- Type: uint16_t
566 -- Type: uint32_t
567 These are the storage units of UTF-8/16/32 strings, respectively.
568 The definitions are taken from ‘<stdint.h>’, on platforms where
569 this include file is present.
571 -- Type: ucs4_t
572 This type represents a single Unicode character, outside of a
573 UTF-8/16/32 string.
576 File: libunistring.info, Node: unistr.h, Next: uniconv.h, Prev: unitypes.h, Up: Top
578 4 Elementary Unicode string functions ‘<unistr.h>’
579 **************************************************
581 This include file declares elementary functions for Unicode strings.
582 It is essentially the equivalent of what ‘<string.h>’ is for C strings.
586 * Elementary string checks::
587 * Elementary string conversions::
588 * Elementary string functions::
589 * Elementary string functions with memory allocation::
590 * Elementary string functions on NUL terminated strings::
593 File: libunistring.info, Node: Elementary string checks, Next: Elementary string conversions, Up: unistr.h
595 4.1 Elementary string checks
596 ============================
598 The following function is available to verify the integrity of a
599 Unicode string.
601 -- Function: const uint8_t * u8_check (const uint8_t *S, size_t N)
602 -- Function: const uint16_t * u16_check (const uint16_t *S, size_t N)
603 -- Function: const uint32_t * u32_check (const uint32_t *S, size_t N)
604 This function checks whether a Unicode string is well-formed. It
605 returns NULL if valid, or a pointer to the first invalid unit
606 otherwise.
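To make "well-formed" concrete for UTF-8, here is a hand-written validity check that uses the same return convention (NULL if valid, otherwise a pointer to the first offending unit). It is a sketch based on the UTF-8 definition in RFC 3629, with a hypothetical name; it is not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: returns NULL if S[0..N-1] is well-formed UTF-8,
   or a pointer to the first unit of the first malformed sequence. */
const uint8_t *check_utf8_sketch(const uint8_t *s, size_t n)
{
    assert(s != NULL || n == 0);
    const uint8_t *end = s + n;
    while (s < end) {
        uint8_t c = s[0];
        size_t len;
        if (c < 0x80) len = 1;
        else if (c >= 0xC2 && c <= 0xDF) len = 2;
        else if (c >= 0xE0 && c <= 0xEF) len = 3;
        else if (c >= 0xF0 && c <= 0xF4) len = 4;
        else return s;                  /* stray continuation or bad lead */
        if ((size_t)(end - s) < len)
            return s;                   /* sequence cut off at the end */
        for (size_t i = 1; i < len; i++)
            if ((s[i] & 0xC0) != 0x80)
                return s;               /* missing continuation byte */
        if (c == 0xE0 && s[1] < 0xA0) return s;  /* overlong 3-byte form */
        if (c == 0xED && s[1] > 0x9F) return s;  /* surrogate U+D800..DFFF */
        if (c == 0xF0 && s[1] < 0x90) return s;  /* overlong 4-byte form */
        if (c == 0xF4 && s[1] > 0x8F) return s;  /* above U+10FFFF */
        s += len;
    }
    return NULL;
}
```

Checking input once at the program boundary lets later processing assume well-formed strings.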
609 File: libunistring.info, Node: Elementary string conversions, Next: Elementary string functions, Prev: Elementary string checks, Up: unistr.h
611 4.2 Elementary string conversions
612 =================================
614 The following functions perform conversions between the different
615 forms of Unicode strings.
617 -- Function: uint16_t * u8_to_u16 (const uint8_t *S, size_t N, uint16_t
618 *RESULTBUF, size_t *LENGTHP)
619 Converts a UTF-8 string to a UTF-16 string.
621 -- Function: uint32_t * u8_to_u32 (const uint8_t *S, size_t N, uint32_t
622 *RESULTBUF, size_t *LENGTHP)
623 Converts a UTF-8 string to a UTF-32 string.
625 -- Function: uint8_t * u16_to_u8 (const uint16_t *S, size_t N, uint8_t
626 *RESULTBUF, size_t *LENGTHP)
627 Converts a UTF-16 string to a UTF-8 string.
629 -- Function: uint32_t * u16_to_u32 (const uint16_t *S, size_t N,
630 uint32_t *RESULTBUF, size_t *LENGTHP)
631 Converts a UTF-16 string to a UTF-32 string.
633 -- Function: uint8_t * u32_to_u8 (const uint32_t *S, size_t N, uint8_t
634 *RESULTBUF, size_t *LENGTHP)
635 Converts a UTF-32 string to a UTF-8 string.
637 -- Function: uint16_t * u32_to_u16 (const uint32_t *S, size_t N,
638 uint16_t *RESULTBUF, size_t *LENGTHP)
639 Converts a UTF-32 string to a UTF-16 string.
642 File: libunistring.info, Node: Elementary string functions, Next: Elementary string functions with memory allocation, Prev: Elementary string conversions, Up: unistr.h
644 4.3 Elementary string functions
645 ===============================
647 The following functions inspect and return details about the first
648 character in a Unicode string.
650 -- Function: int u8_mblen (const uint8_t *S, size_t N)
651 -- Function: int u16_mblen (const uint16_t *S, size_t N)
652 -- Function: int u32_mblen (const uint32_t *S, size_t N)
653 Returns the length (number of units) of the first character in S,
654 which is no longer than N. Returns 0 if it is the NUL character.
655 Returns -1 upon failure.
657 This function is similar to ‘mblen’, except that it operates on a
658 Unicode string and that S must not be NULL.
660 -- Function: int u8_mbtouc_unsafe (ucs4_t *PUC, const uint8_t *S,
661 size_t N)
662 -- Function: int u16_mbtouc_unsafe (ucs4_t *PUC, const uint16_t *S,
663 size_t N)
664 -- Function: int u32_mbtouc_unsafe (ucs4_t *PUC, const uint32_t *S,
665 size_t N)
666 Returns the length (number of units) of the first character in S,
667 putting its ‘ucs4_t’ representation in ‘*PUC’. Upon failure,
668 ‘*PUC’ is set to ‘0xfffd’, and an appropriate number of units is
669 returned.
671 The number of available units, N, must be > 0.
673 This function is similar to ‘mbtowc’, except that it operates on a
674 Unicode string, PUC and S must not be NULL, N must be > 0, and the
675 NUL character is not treated specially.
677 -- Function: int u8_mbtouc (ucs4_t *PUC, const uint8_t *S, size_t N)
678 -- Function: int u16_mbtouc (ucs4_t *PUC, const uint16_t *S, size_t N)
679 -- Function: int u32_mbtouc (ucs4_t *PUC, const uint32_t *S, size_t N)
680 This function is like ‘u8_mbtouc_unsafe’, except that it will
681 detect an invalid UTF-8 character, even if the library is compiled
682 without ‘--enable-safety’.
684 -- Function: int u8_mbtoucr (ucs4_t *PUC, const uint8_t *S, size_t N)
685 -- Function: int u16_mbtoucr (ucs4_t *PUC, const uint16_t *S, size_t N)
686 -- Function: int u32_mbtoucr (ucs4_t *PUC, const uint32_t *S, size_t N)
687 Returns the length (number of units) of the first character in S,
688 putting its ‘ucs4_t’ representation in ‘*PUC’. Upon failure,
689 ‘*PUC’ is set to ‘0xfffd’, and -1 is returned for an invalid
690 sequence of units, -2 is returned for an incomplete sequence of
691 units.
693 The number of available units, N, must be > 0.
695 This function is similar to ‘u8_mbtouc’, except that the return
696 value gives more details about the failure, similar to ‘mbrtowc’.
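The distinction between an invalid and an incomplete sequence can be sketched with a hand-written UTF-8 decoder that follows the return convention just described: the length on success, -1 for an invalid sequence, -2 for an incomplete one, with ‘0xfffd’ stored on failure. The function name is hypothetical and the code is an illustration, not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: decodes the first character of the UTF-8 string
   S[0..N-1] (N > 0) into *PUC. */
int utf8_decode_first_sketch(uint32_t *puc, const uint8_t *s, size_t n)
{
    assert(puc != NULL && s != NULL && n > 0);
    uint8_t c = s[0];
    uint32_t uc;
    int len;

    *puc = 0xFFFD;              /* value reported on every failure path */
    if (c < 0x80) {
        *puc = c;
        return 1;
    }
    if (c >= 0xC2 && c <= 0xDF)      { len = 2; uc = c & 0x1F; }
    else if (c >= 0xE0 && c <= 0xEF) { len = 3; uc = c & 0x0F; }
    else if (c >= 0xF0 && c <= 0xF4) { len = 4; uc = c & 0x07; }
    else return -1;             /* not a valid lead byte */

    for (int i = 1; i < len; i++) {
        if ((size_t)i >= n)
            return -2;          /* valid so far, but the input ends here */
        if ((s[i] & 0xC0) != 0x80)
            return -1;          /* not a continuation byte */
        /* Reject overlong forms, surrogates, and values > U+10FFFF. */
        if (i == 1 && ((c == 0xE0 && s[1] < 0xA0) || (c == 0xED && s[1] > 0x9F)
                       || (c == 0xF0 && s[1] < 0x90) || (c == 0xF4 && s[1] > 0x8F)))
            return -1;
        uc = (uc << 6) | (s[i] & 0x3F);
    }
    *puc = uc;
    return len;
}
```

A reader that processes input in chunks can treat -2 as "fetch more units and retry", whereas -1 means the data itself is damaged.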
698 The following function stores a Unicode character as a Unicode string
699 in memory.
701 -- Function: int u8_uctomb (uint8_t *S, ucs4_t UC, int N)
702 -- Function: int u16_uctomb (uint16_t *S, ucs4_t UC, int N)
703 -- Function: int u32_uctomb (uint32_t *S, ucs4_t UC, int N)
704 Puts the multibyte character represented by UC in S, returning its
705 length. Returns -1 upon failure, -2 if the number of available
706 units, N, is too small. The latter case cannot occur if N >=
707 6/2/1, respectively.
709 This function is similar to ‘wctomb’, except that it operates on
710 Unicode strings, S must not be NULL, and the argument N must be
711 specified.
713 The following functions copy Unicode strings in memory.
715 -- Function: uint8_t * u8_cpy (uint8_t *DEST, const uint8_t *SRC,
716 size_t N)
717 -- Function: uint16_t * u16_cpy (uint16_t *DEST, const uint16_t *SRC,
718 size_t N)
719 -- Function: uint32_t * u32_cpy (uint32_t *DEST, const uint32_t *SRC,
720 size_t N)
721 Copies N units from SRC to DEST.
723 This function is similar to ‘memcpy’, except that it operates on
724 Unicode strings.
726 -- Function: uint8_t * u8_move (uint8_t *DEST, const uint8_t *SRC,
727 size_t N)
728 -- Function: uint16_t * u16_move (uint16_t *DEST, const uint16_t *SRC,
729 size_t N)
730 -- Function: uint32_t * u32_move (uint32_t *DEST, const uint32_t *SRC,
731 size_t N)
732 Copies N units from SRC to DEST, guaranteeing correct behavior for
733 overlapping memory areas.
735 This function is similar to ‘memmove’, except that it operates on
736 Unicode strings.
738 The following function fills a Unicode string.
740 -- Function: uint8_t * u8_set (uint8_t *S, ucs4_t UC, size_t N)
741 -- Function: uint16_t * u16_set (uint16_t *S, ucs4_t UC, size_t N)
742 -- Function: uint32_t * u32_set (uint32_t *S, ucs4_t UC, size_t N)
743 Sets the first N characters of S to UC. UC should be a character
744 that occupies only 1 unit.
746 This function is similar to ‘memset’, except that it operates on
747 Unicode strings.
749 The following function compares two Unicode strings of the same
750 length.
752 -- Function: int u8_cmp (const uint8_t *S1, const uint8_t *S2, size_t
753 N)
754 -- Function: int u16_cmp (const uint16_t *S1, const uint16_t *S2,
755 size_t N)
756 -- Function: int u32_cmp (const uint32_t *S1, const uint32_t *S2,
757 size_t N)
758 Compares S1 and S2, each of length N, lexicographically. Returns a
759 negative value if S1 compares smaller than S2, a positive value if
760 S1 compares larger than S2, or 0 if they compare equal.
762 This function is similar to ‘memcmp’, except that it operates on
763 Unicode strings.
765 The following function compares two Unicode strings of possibly
766 different lengths.
768 -- Function: int u8_cmp2 (const uint8_t *S1, size_t N1, const uint8_t
769 *S2, size_t N2)
770 -- Function: int u16_cmp2 (const uint16_t *S1, size_t N1, const
771 uint16_t *S2, size_t N2)
772 -- Function: int u32_cmp2 (const uint32_t *S1, size_t N1, const
773 uint32_t *S2, size_t N2)
774 Compares S1 and S2, lexicographically. Returns a negative value if
775 S1 compares smaller than S2, a positive value if S1 compares larger
776 than S2, or 0 if they compare equal.
778 This function is similar to the gnulib function ‘memcmp2’, except
779 that it operates on Unicode strings.
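A comparison of strings with different lengths is commonly built from a comparison of the common prefix followed by a comparison of the lengths. The sketch below shows that approach for UTF-8 units; the function name is hypothetical and the code does not claim to be how the library implements its comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: lexicographic comparison of two unit sequences of
   possibly different lengths.  Returns a negative value, 0, or a positive
   value. */
int cmp2_sketch(const uint8_t *s1, size_t n1, const uint8_t *s2, size_t n2)
{
    assert((s1 != NULL || n1 == 0) && (s2 != NULL || n2 == 0));
    size_t common = n1 < n2 ? n1 : n2;
    /* Compare the common prefix unit by unit. */
    int cmp = common > 0 ? memcmp(s1, s2, common) : 0;
    if (cmp != 0)
        return cmp;
    /* Equal prefixes: the shorter string sorts first. */
    return (n1 > n2) - (n1 < n2);
}
```

For well-formed UTF-8, bytewise order coincides with code point order, so this simple scheme also orders strings by code point. The same shortcut on UTF-16 units would not, because surrogate units sort below some Basic Multilingual Plane characters.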
781 The following function searches for a given Unicode character.
783 -- Function: uint8_t * u8_chr (const uint8_t *S, size_t N, ucs4_t UC)
784 -- Function: uint16_t * u16_chr (const uint16_t *S, size_t N, ucs4_t
785 UC)
786 -- Function: uint32_t * u32_chr (const uint32_t *S, size_t N, ucs4_t
787 UC)
788 Searches the string at S for UC. Returns a pointer to the first
789 occurrence of UC in S, or NULL if UC does not occur in S.
791 This function is similar to ‘memchr’, except that it operates on
792 Unicode strings.
794 The following function counts the number of Unicode characters.
796 -- Function: size_t u8_mbsnlen (const uint8_t *S, size_t N)
797 -- Function: size_t u16_mbsnlen (const uint16_t *S, size_t N)
798 -- Function: size_t u32_mbsnlen (const uint32_t *S, size_t N)
799 Counts and returns the number of Unicode characters in the N units
800 from S.
802 This function is similar to the gnulib function ‘mbsnlen’, except
803 that it operates on Unicode strings.
806 File: libunistring.info, Node: Elementary string functions with memory allocation, Next: Elementary string functions on NUL terminated strings, Prev: Elementary string functions, Up: unistr.h
808 4.4 Elementary string functions with memory allocation
809 ======================================================
811 The following function copies a Unicode string.
813 -- Function: uint8_t * u8_cpy_alloc (const uint8_t *S, size_t N)
814 -- Function: uint16_t * u16_cpy_alloc (const uint16_t *S, size_t N)
815 -- Function: uint32_t * u32_cpy_alloc (const uint32_t *S, size_t N)
816 Makes a freshly allocated copy of S, of length N.
819 File: libunistring.info, Node: Elementary string functions on NUL terminated strings, Prev: Elementary string functions with memory allocation, Up: unistr.h
821 4.5 Elementary string functions on NUL terminated strings
822 =========================================================
824 The following functions inspect and return details about the first
825 character in a Unicode string.
827 -- Function: int u8_strmblen (const uint8_t *S)
828 -- Function: int u16_strmblen (const uint16_t *S)
829 -- Function: int u32_strmblen (const uint32_t *S)
830 Returns the length (number of units) of the first character in S.
831 Returns 0 if it is the NUL character. Returns -1 upon failure.
833 -- Function: int u8_strmbtouc (ucs4_t *PUC, const uint8_t *S)
834 -- Function: int u16_strmbtouc (ucs4_t *PUC, const uint16_t *S)
835 -- Function: int u32_strmbtouc (ucs4_t *PUC, const uint32_t *S)
836 Returns the length (number of units) of the first character in S,
837 putting its ‘ucs4_t’ representation in ‘*PUC’. Returns 0 if it is
838 the NUL character. Returns -1 upon failure.
840 -- Function: const uint8_t * u8_next (ucs4_t *PUC, const uint8_t *S)
841 -- Function: const uint16_t * u16_next (ucs4_t *PUC, const uint16_t *S)
842 -- Function: const uint32_t * u32_next (ucs4_t *PUC, const uint32_t *S)
843 Forward iteration step. Advances the pointer past the next
844 character, or returns NULL if the end of the string has been
845 reached. Puts the character’s ‘ucs4_t’ representation in ‘*PUC’.
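The forward iteration contract can be sketched in plain C. The function ‘next_utf8’ below is a hypothetical stand-in that mirrors the calling convention described for ‘u8_next’; it is not the library's implementation, it uses ‘uint32_t’ in place of ‘ucs4_t’, and it assumes well-formed UTF-8 input. The helper ‘max_code_point’ is likewise hypothetical and only shows the usual loop shape.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in with the same calling convention as u8_next.
   Assumes S is a NUL terminated, well-formed UTF-8 string. */
static const uint8_t *next_utf8(uint32_t *puc, const uint8_t *s)
{
    uint8_t b = s[0];
    if (b == 0) {                       /* end of string reached */
        *puc = 0;
        return NULL;
    }
    if (b < 0x80) {                     /* 1-byte sequence */
        *puc = b;
        return s + 1;
    }
    if (b < 0xE0) {                     /* 2-byte sequence */
        *puc = ((uint32_t)(b & 0x1F) << 6) | (s[1] & 0x3F);
        return s + 2;
    }
    if (b < 0xF0) {                     /* 3-byte sequence */
        *puc = ((uint32_t)(b & 0x0F) << 12)
             | ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return s + 3;
    }
    /* 4-byte sequence */
    *puc = ((uint32_t)(b & 0x07) << 18) | ((uint32_t)(s[1] & 0x3F) << 12)
         | ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
    return s + 4;
}

/* Typical loop: visit every character until NULL signals the end.
   Returns the largest code point seen, or 0 for an empty string. */
static uint32_t max_code_point(const uint8_t *s)
{
    uint32_t uc, max = 0;
    while ((s = next_utf8(&uc, s)) != NULL) {
        if (uc > max)
            max = uc;
    }
    return max;
}
```

The loop condition relies on the NULL return value, so no separate length computation is needed.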
847 The following function inspects and returns details about the
848 previous character in a Unicode string.
850 -- Function: const uint8_t * u8_prev (ucs4_t *PUC, const uint8_t *S,
851 const uint8_t *START)
852 -- Function: const uint16_t * u16_prev (ucs4_t *PUC, const uint16_t *S,
853 const uint16_t *START)
854 -- Function: const uint32_t * u32_prev (ucs4_t *PUC, const uint32_t *S,
855 const uint32_t *START)
856 Backward iteration step. Moves the pointer back to the start of the
857 previous character (the one that ends at ‘S’), or returns NULL if
858 the beginning of the string (specified by ‘START’) has been
859 reached. Puts the character’s ‘ucs4_t’ representation in ‘*PUC’.
860 Note that this function works only on well-formed Unicode strings.
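A backward step can be sketched in the same way. The function ‘prev_utf8’ below is a hypothetical stand-in that follows the calling convention described for ‘u8_prev’, with ‘uint32_t’ in place of ‘ucs4_t’. Like the real function, it is only meaningful on well-formed input: it walks back over continuation bytes until it finds the lead byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in with the same calling convention as u8_prev.
   Assumes the units between START and S are well-formed UTF-8. */
static const uint8_t *prev_utf8(uint32_t *puc, const uint8_t *s,
                                const uint8_t *start)
{
    if (s == start)
        return NULL;                    /* beginning of string reached */

    const uint8_t *p = s - 1;
    uint32_t uc = 0;
    int shift = 0;

    /* Collect the payload of trailing continuation bytes (10xxxxxx). */
    while (p > start && (*p & 0xC0) == 0x80) {
        uc |= (uint32_t)(*p & 0x3F) << shift;
        shift += 6;
        p--;
    }

    /* P now points at the lead byte; add its payload bits. */
    if (*p < 0x80)
        uc = *p;
    else if (*p < 0xE0)
        uc |= (uint32_t)(*p & 0x1F) << shift;
    else if (*p < 0xF0)
        uc |= (uint32_t)(*p & 0x0F) << shift;
    else
        uc |= (uint32_t)(*p & 0x07) << shift;

    *puc = uc;
    return p;
}
```

Starting with S at the terminating NUL and calling the function repeatedly visits the characters in reverse order until NULL is returned.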
862 The following functions determine the length of a Unicode string.
864 -- Function: size_t u8_strlen (const uint8_t *S)
865 -- Function: size_t u16_strlen (const uint16_t *S)
866 -- Function: size_t u32_strlen (const uint32_t *S)
867 Returns the number of units in S.
869 This function is similar to ‘strlen’ and ‘wcslen’, except that it
870 operates on Unicode strings.
872 -- Function: size_t u8_strnlen (const uint8_t *S, size_t MAXLEN)
873 -- Function: size_t u16_strnlen (const uint16_t *S, size_t MAXLEN)
874 -- Function: size_t u32_strnlen (const uint32_t *S, size_t MAXLEN)
875 Returns the number of units in S, but at most MAXLEN.
877 This function is similar to ‘strnlen’ and ‘wcsnlen’, except that it
878 operates on Unicode strings.
880 The following functions copy portions of Unicode strings in memory.
882 -- Function: uint8_t * u8_strcpy (uint8_t *DEST, const uint8_t *SRC)
883 -- Function: uint16_t * u16_strcpy (uint16_t *DEST, const uint16_t
885 -- Function: uint32_t * u32_strcpy (uint32_t *DEST, const uint32_t
889 This function is similar to ‘strcpy’ and ‘wcscpy’, except that it
890 operates on Unicode strings.
892 -- Function: uint8_t * u8_stpcpy (uint8_t *DEST, const uint8_t *SRC)
893 -- Function: uint16_t * u16_stpcpy (uint16_t *DEST, const uint16_t
895 -- Function: uint32_t * u32_stpcpy (uint32_t *DEST, const uint32_t
897 Copies SRC to DEST, returning the address of the terminating NUL in
900 This function is similar to ‘stpcpy’, except that it operates on
903 -- Function: uint8_t * u8_strncpy (uint8_t *DEST, const uint8_t *SRC,
905 -- Function: uint16_t * u16_strncpy (uint16_t *DEST, const uint16_t
907 -- Function: uint32_t * u32_strncpy (uint32_t *DEST, const uint32_t
909 Copies no more than N units of SRC to DEST.
911 This function is similar to ‘strncpy’ and ‘wcsncpy’, except that it
912 operates on Unicode strings.
914 -- Function: uint8_t * u8_stpncpy (uint8_t *DEST, const uint8_t *SRC,
916 -- Function: uint16_t * u16_stpncpy (uint16_t *DEST, const uint16_t
918 -- Function: uint32_t * u32_stpncpy (uint32_t *DEST, const uint32_t
920 Copies no more than N units of SRC to DEST. Returns a pointer past
921 the last non-NUL unit written into DEST. In other words, if the
922 units written into DEST include a NUL, the return value is the
923 address of the first such NUL unit, otherwise it is ‘DEST + N’.
925 This function is similar to ‘stpncpy’, except that it operates on
928 -- Function: uint8_t * u8_strcat (uint8_t *DEST, const uint8_t *SRC)
929 -- Function: uint16_t * u16_strcat (uint16_t *DEST, const uint16_t
931 -- Function: uint32_t * u32_strcat (uint32_t *DEST, const uint32_t
933 Appends SRC onto DEST.
935 This function is similar to ‘strcat’ and ‘wcscat’, except that it
936 operates on Unicode strings.
938 -- Function: uint8_t * u8_strncat (uint8_t *DEST, const uint8_t *SRC,
940 -- Function: uint16_t * u16_strncat (uint16_t *DEST, const uint16_t
942 -- Function: uint32_t * u32_strncat (uint32_t *DEST, const uint32_t
944 Appends no more than N units of SRC onto DEST.
946 This function is similar to ‘strncat’ and ‘wcsncat’, except that it
947 operates on Unicode strings.
949 The following functions compare two Unicode strings.
951 -- Function: int u8_strcmp (const uint8_t *S1, const uint8_t *S2)
952 -- Function: int u16_strcmp (const uint16_t *S1, const uint16_t *S2)
953 -- Function: int u32_strcmp (const uint32_t *S1, const uint32_t *S2)
954 Compares S1 and S2, lexicographically. Returns a negative value if
955 S1 compares smaller than S2, a positive value if S1 compares larger
956 than S2, or 0 if they compare equal.
958 This function is similar to ‘strcmp’ and ‘wcscmp’, except that it
959 operates on Unicode strings.
961 -- Function: int u8_strcoll (const uint8_t *S1, const uint8_t *S2)
962 -- Function: int u16_strcoll (const uint16_t *S1, const uint16_t *S2)
963 -- Function: int u32_strcoll (const uint32_t *S1, const uint32_t *S2)
964 Compares S1 and S2 using the collation rules of the current locale.
965 Returns -1 if S1 < S2, 0 if S1 = S2, 1 if S1 > S2. Upon failure,
966 sets ‘errno’ and returns any value.
968 This function is similar to ‘strcoll’ and ‘wcscoll’, except that it
969 operates on Unicode strings.
971 Note that this function may consider different canonical
972 normalizations of the same string as having a large distance. It
973 is therefore better to use the function ‘u8_normcoll’ instead of
974 this one; see *note uninorm.h::.
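The following sketch illustrates why normalization matters. It spells the same character in two canonically equivalent ways and compares them unit by unit with plain ‘strcmp’, used here only as a stand-in for a unit-wise comparison. The names ‘composed’, ‘decomposed’, and ‘same_units’ are hypothetical.

```c
#include <string.h>

/* U+00E9 LATIN SMALL LETTER E WITH ACUTE, precomposed form. */
static const char composed[] = "\xc3\xa9";

/* U+0065 LATIN SMALL LETTER E followed by
   U+0301 COMBINING ACUTE ACCENT, decomposed form. */
static const char decomposed[] = "e\xcc\x81";

/* Returns nonzero if the two strings are identical unit by unit. */
static int same_units(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Here ‘same_units (composed, decomposed)’ yields 0 although both strings display as the same character. Bringing both strings into the same normalization form before comparing removes this difference.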
976 -- Function: int u8_strncmp (const uint8_t *S1, const uint8_t *S2,
978 -- Function: int u16_strncmp (const uint16_t *S1, const uint16_t *S2,
980 -- Function: int u32_strncmp (const uint32_t *S1, const uint32_t *S2,
982 Compares no more than N units of S1 and S2.
984 This function is similar to ‘strncmp’ and ‘wcsncmp’, except that it
985 operates on Unicode strings.
987 The following function allocates a duplicate of a Unicode string.
989 -- Function: uint8_t * u8_strdup (const uint8_t *S)
990 -- Function: uint16_t * u16_strdup (const uint16_t *S)
991 -- Function: uint32_t * u32_strdup (const uint32_t *S)
992 Duplicates S, returning an identical malloc’d string.
994 This function is similar to ‘strdup’ and ‘wcsdup’, except that it
995 operates on Unicode strings.
997 The following functions search for a given Unicode character.
999 -- Function: uint8_t * u8_strchr (const uint8_t *STR, ucs4_t UC)
1000 -- Function: uint16_t * u16_strchr (const uint16_t *STR, ucs4_t UC)
1001 -- Function: uint32_t * u32_strchr (const uint32_t *STR, ucs4_t UC)
1002 Finds the first occurrence of UC in STR.
1004 This function is similar to ‘strchr’ and ‘wcschr’, except that it
1005 operates on Unicode strings.
1007 -- Function: uint8_t * u8_strrchr (const uint8_t *STR, ucs4_t UC)
1008 -- Function: uint16_t * u16_strrchr (const uint16_t *STR, ucs4_t UC)
1009 -- Function: uint32_t * u32_strrchr (const uint32_t *STR, ucs4_t UC)
1010 Finds the last occurrence of UC in STR.
1012 This function is similar to ‘strrchr’ and ‘wcsrchr’, except that it
1013 operates on Unicode strings.
1015 The following functions search for the first occurrence of some
1016 Unicode character in or outside a given set of Unicode characters.
1018 -- Function: size_t u8_strcspn (const uint8_t *STR, const uint8_t
1020 -- Function: size_t u16_strcspn (const uint16_t *STR, const uint16_t
1022 -- Function: size_t u32_strcspn (const uint32_t *STR, const uint32_t
1024 Returns the length of the initial segment of STR which consists
1025 entirely of Unicode characters not in REJECT.
1027 This function is similar to ‘strcspn’ and ‘wcscspn’, except that it
1028 operates on Unicode strings.
1030 -- Function: size_t u8_strspn (const uint8_t *STR, const uint8_t
1032 -- Function: size_t u16_strspn (const uint16_t *STR, const uint16_t
1034 -- Function: size_t u32_strspn (const uint32_t *STR, const uint32_t
1036 Returns the length of the initial segment of STR which consists
1037 entirely of Unicode characters in ACCEPT.
1039 This function is similar to ‘strspn’ and ‘wcsspn’, except that it
1040 operates on Unicode strings.
1042 -- Function: uint8_t * u8_strpbrk (const uint8_t *STR, const uint8_t
1044 -- Function: uint16_t * u16_strpbrk (const uint16_t *STR, const
1046 -- Function: uint32_t * u32_strpbrk (const uint32_t *STR, const
1048 Finds the first occurrence in STR of any character in ACCEPT.
1050 This function is similar to ‘strpbrk’ and ‘wcspbrk’, except that it
1051 operates on Unicode strings.
1053 The following functions search whether a given Unicode string is a
1054 substring of another Unicode string.
1056 -- Function: uint8_t * u8_strstr (const uint8_t *HAYSTACK, const
1058 -- Function: uint16_t * u16_strstr (const uint16_t *HAYSTACK, const
1060 -- Function: uint32_t * u32_strstr (const uint32_t *HAYSTACK, const
1062 Finds the first occurrence of NEEDLE in HAYSTACK.
1064 This function is similar to ‘strstr’ and ‘wcsstr’, except that it
1065 operates on Unicode strings.
1067 -- Function: bool u8_startswith (const uint8_t *STR, const uint8_t
1069 -- Function: bool u16_startswith (const uint16_t *STR, const uint16_t
1071 -- Function: bool u32_startswith (const uint32_t *STR, const uint32_t
1073 Tests whether STR starts with PREFIX.
1075 -- Function: bool u8_endswith (const uint8_t *STR, const uint8_t
1077 -- Function: bool u16_endswith (const uint16_t *STR, const uint16_t
1079 -- Function: bool u32_endswith (const uint32_t *STR, const uint32_t
1081 Tests whether STR ends with SUFFIX.
1083 The following function does one step in tokenizing a Unicode string.
1085 -- Function: uint8_t * u8_strtok (uint8_t *STR, const uint8_t *DELIM,
1087 -- Function: uint16_t * u16_strtok (uint16_t *STR, const uint16_t
1088 *DELIM, uint16_t **PTR)
1089 -- Function: uint32_t * u32_strtok (uint32_t *STR, const uint32_t
1090 *DELIM, uint32_t **PTR)
1091 Divides STR into tokens separated by characters in DELIM.
1093 This function is similar to ‘strtok_r’ and ‘wcstok’, except that it
1094 operates on Unicode strings. Its interface is actually more
1095 similar to ‘wcstok’ than to ‘strtok’.
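Since the interface follows ‘wcstok’, the calling pattern can be sketched with the standard three-argument ‘wcstok’ from ‘<wchar.h>’. The helper ‘count_tokens’ is hypothetical. The ‘u8_strtok’ family is expected to be called the same way, with STR on the first call and NULL on subsequent calls, but that detail is an assumption that this section does not spell out.

```c
#include <stddef.h>
#include <wchar.h>

/* Counts the tokens in STR that are separated by characters in DELIM.
   STR is modified in place, as with all strtok-style functions. */
static size_t count_tokens(wchar_t *str, const wchar_t *delim)
{
    wchar_t *state;   /* saved position, the counterpart of the PTR argument */
    size_t count = 0;

    for (wchar_t *tok = wcstok(str, delim, &state);
         tok != NULL;
         tok = wcstok(NULL, delim, &state))
        count++;

    return count;
}
```

Because the saved position lives in a caller-provided variable, several tokenizations can be in progress at the same time.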
1098 File: libunistring.info, Node: uniconv.h, Next: unistdio.h, Prev: unistr.h, Up: Top
1100 5 Conversions between Unicode and encodings ‘<uniconv.h>’
1101 *********************************************************
1103 This include file declares functions for converting between Unicode
1104 strings and ‘char *’ strings in locale encoding or in other specified
1107 The following function returns the locale encoding.
1109 -- Function: const char * locale_charset ()
1110 Determines the current locale’s character encoding, and
1111 canonicalizes it into one of the canonical names listed in
1112 ‘config.charset’. If the canonical name cannot be determined, the
1113 result is a non-canonical name.
1115 The result must not be freed; it is statically allocated.
1117 The result of this function can be used as an argument to the
1118 ‘iconv_open’ function in GNU libc, in GNU libiconv, or in the
1119 gnulib provided wrapper around the native ‘iconv_open’ function.
1120 It may not work as an argument to the native ‘iconv_open’ function
1123 The handling of unconvertible characters during the conversions can
1124 be parametrized through the following enumeration type:
1126 -- Type: enum iconv_ilseq_handler
1127 This type specifies how unconvertible characters in the input are
1130 -- Constant: enum iconv_ilseq_handler iconveh_error
1131 This handler causes the function to return with ‘errno’ set to
1134 -- Constant: enum iconv_ilseq_handler iconveh_question_mark
1135 This handler produces one question mark ‘?’ per unconvertible
1138 -- Constant: enum iconv_ilseq_handler iconveh_escape_sequence
1139 This handler produces an escape sequence ‘\uXXXX’ or ‘\UXXXXXXXX’
1140 for each unconvertible character.
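The shape of these escape sequences can be sketched as follows. The helper ‘format_escape’ is hypothetical; the choice of uppercase hexadecimal digits, and of the 4-digit form for all characters below U+10000, are assumptions made for this illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper. Writes the escape sequence for UC into BUF,
   which must hold at least 11 bytes ("\UXXXXXXXX" plus the NUL).
   Uppercase hex digits are an assumption of this sketch. */
static void format_escape(uint32_t uc, char buf[11])
{
    if (uc < 0x10000)
        snprintf(buf, 11, "\\u%04lX", (unsigned long) uc);
    else
        snprintf(buf, 11, "\\U%08lX", (unsigned long) uc);
}
```

With this helper, U+20AC is rendered as ‘\u20AC’ and U+1F600 as ‘\U0001F600’.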
1142 The following functions convert between strings in a specified
1143 encoding and Unicode strings.
1145 -- Function: uint8_t * u8_conv_from_encoding (const char *FROMCODE,
1146 enum iconv_ilseq_handler HANDLER, const char *SRC, size_t
1147 SRCLEN, size_t *OFFSETS, uint8_t *RESULTBUF, size_t *LENGTHP)
1148 -- Function: uint16_t * u16_conv_from_encoding (const char *FROMCODE,
1149 enum iconv_ilseq_handler HANDLER, const char *SRC, size_t
1150 SRCLEN, size_t *OFFSETS, uint16_t *RESULTBUF, size_t *LENGTHP)
1151 -- Function: uint32_t * u32_conv_from_encoding (const char *FROMCODE,
1152 enum iconv_ilseq_handler HANDLER, const char *SRC, size_t
1153 SRCLEN, size_t *OFFSETS, uint32_t *RESULTBUF, size_t *LENGTHP)
1154 Converts an entire string, possibly including NUL bytes, from one
1155 encoding to UTF-8, UTF-16, or UTF-32 encoding, respectively.
1157 Converts a memory region given in encoding FROMCODE. FROMCODE is
1158 as for the ‘iconv_open’ function.
1160 The input is in the memory region between SRC (inclusive) and ‘SRC
1161 + SRCLEN’ (exclusive).
1163 If OFFSETS is not NULL, it should point to an array of SRCLEN
1164 integers; this array is filled with offsets into the result, i.e.
1165 the character starting at ‘SRC[i]’ corresponds to the character
1166 starting at ‘RESULT[OFFSETS[i]]’, and other offsets are set to
1169 ‘RESULTBUF’ and ‘*LENGTHP’ should be a scratch buffer and its size,
1170 or ‘RESULTBUF’ can be NULL.
1172 May erase the contents of the memory at ‘RESULTBUF’.
1174 If successful: The resulting Unicode string (non-NULL) is returned
1175 and its length stored in ‘*LENGTHP’. The resulting string is
1176 ‘RESULTBUF’ if no dynamic memory allocation was necessary, or a
1177 freshly allocated memory block otherwise.
1179 In case of error: NULL is returned and ‘errno’ is set. Particular
1180 ‘errno’ values: ‘EINVAL’, ‘EILSEQ’, ‘ENOMEM’.
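The meaning of the OFFSETS array can be sketched with a small converter written in plain C. The function ‘latin1_to_utf8’ is a hypothetical stand-in that handles only ISO-8859-1 input and writes into a caller-provided buffer; it does not reproduce the (RESULTBUF, LENGTHP) handling of the real functions. In ISO-8859-1 every byte starts a character, so every entry of OFFSETS receives a real offset; a multibyte source encoding would also have entries for bytes that do not start a character, which this sketch never produces.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in, not a libunistring function.
   Converts SRCLEN bytes of ISO-8859-1 at SRC to UTF-8 in RESULT, which
   must have room for 2 * SRCLEN bytes. If OFFSETS is not NULL, it must
   point to SRCLEN entries; OFFSETS[i] receives the index in RESULT at
   which the character starting at SRC[i] begins.
   Returns the number of bytes written to RESULT. */
static size_t latin1_to_utf8(const char *src, size_t srclen,
                             size_t *offsets, uint8_t *result)
{
    size_t out = 0;
    for (size_t i = 0; i < srclen; i++) {
        uint8_t c = (uint8_t) src[i];
        if (offsets != NULL)
            offsets[i] = out;
        if (c < 0x80) {
            result[out++] = c;                           /* ASCII: 1 byte */
        } else {
            result[out++] = (uint8_t)(0xC0 | (c >> 6));  /* lead byte */
            result[out++] = (uint8_t)(0x80 | (c & 0x3F)); /* continuation */
        }
    }
    return out;
}
```

For the 5 input bytes "caf\xe9!" the result has 6 bytes and the offsets are 0, 1, 2, 3, 5: the final ‘!’ moves from index 4 to index 5 because the preceding character now occupies two bytes.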
1182 -- Function: char * u8_conv_to_encoding (const char *TOCODE, enum
1183 iconv_ilseq_handler HANDLER, const uint8_t *SRC, size_t
1184 SRCLEN, size_t *OFFSETS, char *RESULTBUF, size_t *LENGTHP)
1185 -- Function: char * u16_conv_to_encoding (const char *TOCODE, enum
1186 iconv_ilseq_handler HANDLER, const uint16_t *SRC, size_t
1187 SRCLEN, size_t *OFFSETS, char *RESULTBUF, size_t *LENGTHP)
1188 -- Function: char * u32_conv_to_encoding (const char *TOCODE, enum
1189 iconv_ilseq_handler HANDLER, const uint32_t *SRC, size_t
1190 SRCLEN, size_t *OFFSETS, char *RESULTBUF, size_t *LENGTHP)
1191 Converts an entire Unicode string, possibly including NUL units,
1192 from UTF-8, UTF-16, or UTF-32 encoding, respectively, to a given encoding.
1194 Converts a memory region to encoding TOCODE. TOCODE is as for the
1195 ‘iconv_open’ function.
1197 The input is in the memory region between SRC (inclusive) and ‘SRC
1198 + SRCLEN’ (exclusive).
1200 If OFFSETS is not NULL, it should point to an array of SRCLEN
1201 integers; this array is filled with offsets into the result, i.e.
1202 the character starting at ‘SRC[i]’ corresponds to the character
1203 starting at ‘RESULT[OFFSETS[i]]’, and other offsets are set to
1206 ‘RESULTBUF’ and ‘*LENGTHP’ should be a scratch buffer and its size,
1207 or ‘RESULTBUF’ can be NULL.
1209 May erase the contents of the memory at ‘RESULTBUF’.
1211 If successful: The resulting string (non-NULL) is returned
1212 and its length stored in ‘*LENGTHP’. The resulting string is
1213 ‘RESULTBUF’ if no dynamic memory allocation was necessary, or a
1214 freshly allocated memory block otherwise.
1216 In case of error: NULL is returned and ‘errno’ is set. Particular
1217 ‘errno’ values: ‘EINVAL’, ‘EILSEQ’, ‘ENOMEM’.
1219 The following functions convert between NUL terminated strings in a
1220 specified encoding and NUL terminated Unicode strings.
1222 -- Function: uint8_t * u8_strconv_from_encoding (const char *STRING,
1223 const char *FROMCODE, enum iconv_ilseq_handler HANDLER)
1224 -- Function: uint16_t * u16_strconv_from_encoding (const char *STRING,
1225 const char *FROMCODE, enum iconv_ilseq_handler HANDLER)
1226 -- Function: uint32_t * u32_strconv_from_encoding (const char *STRING,
1227 const char *FROMCODE, enum iconv_ilseq_handler HANDLER)
1228 Converts a NUL terminated string from a given encoding.
1230 The result is ‘malloc’ allocated, or NULL (with ‘errno’ set) in case
1233 Particular ‘errno’ values: ‘EILSEQ’, ‘ENOMEM’.
1235 -- Function: char * u8_strconv_to_encoding (const uint8_t *STRING,
1236 const char *TOCODE, enum iconv_ilseq_handler HANDLER)
1237 -- Function: char * u16_strconv_to_encoding (const uint16_t *STRING,
1238 const char *TOCODE, enum iconv_ilseq_handler HANDLER)
1239 -- Function: char * u32_strconv_to_encoding (const uint32_t *STRING,
1240 const char *TOCODE, enum iconv_ilseq_handler HANDLER)
1241 Converts a NUL terminated string to a given encoding.
1243 The result is ‘malloc’ allocated, or NULL (with ‘errno’ set) in
1246 Particular ‘errno’ values: ‘EILSEQ’, ‘ENOMEM’.
1248 The following functions are shorthands that convert between NUL
1249 terminated strings in locale encoding and NUL terminated Unicode
1252 -- Function: uint8_t * u8_strconv_from_locale (const char *STRING)
1253 -- Function: uint16_t * u16_strconv_from_locale (const char *STRING)
1254 -- Function: uint32_t * u32_strconv_from_locale (const char *STRING)
1255 Converts a NUL terminated string from the locale encoding.
1257 The result is ‘malloc’ allocated, or NULL (with ‘errno’ set) in
1260 Particular ‘errno’ values: ‘ENOMEM’.
1262 -- Function: char * u8_strconv_to_locale (const uint8_t *STRING)
1263 -- Function: char * u16_strconv_to_locale (const uint16_t *STRING)
1264 -- Function: char * u32_strconv_to_locale (const uint32_t *STRING)
1265 Converts a NUL terminated string to the locale encoding.
1267 The result is ‘malloc’ allocated, or NULL (with ‘errno’ set) in
1270 Particular ‘errno’ values: ‘ENOMEM’.
1273 File: libunistring.info, Node: unistdio.h, Next: uniname.h, Prev: uniconv.h, Up: Top
1275 6 Output with Unicode strings ‘<unistdio.h>’
1276 ********************************************
1278 This include file declares functions for doing formatted output with
1279 Unicode strings. It defines a set of functions similar to ‘fprintf’ and
1280 ‘sprintf’, which are declared in ‘<stdio.h>’.
1282 These functions work like the ‘printf’ function family. In the
1284 • The format directive ‘U’ takes a UTF-8 string (‘const uint8_t *’).
1285 • The format directive ‘lU’ takes a UTF-16 string (‘const uint16_t
1287 • The format directive ‘llU’ takes a UTF-32 string (‘const uint32_t
1290 A function name with an infix ‘v’ indicates that a ‘va_list’ is
1291 passed instead of multiple arguments.
1293 The functions ‘*sprintf’ have a BUF argument that is assumed to be
1294 large enough. (_DANGEROUS! Overflowing the buffer will crash the
1297 The functions ‘*snprintf’ have a BUF argument that is assumed to be
1298 SIZE units large. (_DANGEROUS! The resulting string might be truncated
1299 in the middle of a multibyte character._)
1301 The functions ‘*asprintf’ have a RESULTP argument. The result will
1302 be freshly allocated and stored in ‘*RESULTP’.
1304 The functions ‘*asnprintf’ have a (RESULTBUF, LENGTHP) argument pair.
1305 If RESULTBUF is not NULL and the result fits into ‘*LENGTHP’ units, it
1306 is put in RESULTBUF, and RESULTBUF is returned. Otherwise, a freshly
1307 allocated string is returned. In both cases, ‘*LENGTHP’ is set to the
1308 length (number of units) of the returned string. In case of error, NULL
1309 is returned and ‘errno’ is set.
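The (RESULTBUF, LENGTHP) convention can be sketched with a small function that merely copies bytes. The helper ‘copy_with_scratch’ is hypothetical and not part of libunistring; it only shows when the scratch buffer is reused and when a fresh block is allocated.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper following the (RESULTBUF, LENGTHP) convention.
   Copies the N bytes at SRC. If RESULTBUF is not NULL and N fits into
   *LENGTHP bytes, the copy is placed in RESULTBUF; otherwise a fresh
   block is allocated. On success *LENGTHP is set to N. Returns NULL if
   the allocation fails. */
static char *copy_with_scratch(const char *src, size_t n,
                               char *resultbuf, size_t *lengthp)
{
    char *result = resultbuf;

    if (resultbuf == NULL || n > *lengthp) {
        result = malloc(n > 0 ? n : 1);
        if (result == NULL)
            return NULL;
    }
    memcpy(result, src, n);
    *lengthp = n;
    return result;
}
```

A caller following this convention frees the returned pointer only when it differs from the scratch buffer that was passed in.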
1311 The following functions take an ASCII format string and return a
1312 result that is a ‘char *’ string in locale encoding.
1314 -- Function: int ulc_sprintf (char *BUF, const char *FORMAT, ...)
1316 -- Function: int ulc_snprintf (char *BUF, size_t SIZE, const char
1319 -- Function: int ulc_asprintf (char **RESULTP, const char *FORMAT, ...)
1321 -- Function: char * ulc_asnprintf (char *RESULTBUF, size_t *LENGTHP,
1322 const char *FORMAT, ...)
1324 -- Function: int ulc_vsprintf (char *BUF, const char *FORMAT, va_list
1327 -- Function: int ulc_vsnprintf (char *BUF, size_t SIZE, const char
1328 *FORMAT, va_list AP)
1330 -- Function: int ulc_vasprintf (char **RESULTP, const char *FORMAT,
1333 -- Function: char * ulc_vasnprintf (char *RESULTBUF, size_t *LENGTHP,
1334 const char *FORMAT, va_list AP)
1336 The following functions take an ASCII format string and return a
1337 result in UTF-8 format.
1339 -- Function: int u8_sprintf (uint8_t *BUF, const char *FORMAT, ...)
1340 -- Function: int u8_snprintf (uint8_t *BUF, size_t SIZE, const char
1342 -- Function: int u8_asprintf (uint8_t **RESULTP, const char *FORMAT,
1344 -- Function: uint8_t * u8_asnprintf (uint8_t *RESULTBUF, size_t
1345 *LENGTHP, const char *FORMAT, ...)
1346 -- Function: int u8_vsprintf (uint8_t *BUF, const char *FORMAT, va_list
1348 -- Function: int u8_vsnprintf (uint8_t *BUF, size_t SIZE, const char
1349 *FORMAT, va_list AP)
1350 -- Function: int u8_vasprintf (uint8_t **RESULTP, const char *FORMAT,
1352 -- Function: uint8_t * u8_vasnprintf (uint8_t *RESULTBUF, size_t
1353 *LENGTHP, const char *FORMAT, va_list AP)
1355 The following functions take a UTF-8 format string and return a
1356 result in UTF-8 format.
1358 -- Function: int u8_u8_sprintf (uint8_t *BUF, const uint8_t *FORMAT,
1360 -- Function: int u8_u8_snprintf (uint8_t *BUF, size_t SIZE, const
1361 uint8_t *FORMAT, ...)
1362 -- Function: int u8_u8_asprintf (uint8_t **RESULTP, const uint8_t
1364 -- Function: uint8_t * u8_u8_asnprintf (uint8_t *RESULTBUF, size_t
1365 *LENGTHP, const uint8_t *FORMAT, ...)
1366 -- Function: int u8_u8_vsprintf (uint8_t *BUF, const uint8_t *FORMAT,
1368 -- Function: int u8_u8_vsnprintf (uint8_t *BUF, size_t SIZE, const
1369 uint8_t *FORMAT, va_list AP)
1370 -- Function: int u8_u8_vasprintf (uint8_t **RESULTP, const uint8_t
1371 *FORMAT, va_list AP)
1372 -- Function: uint8_t * u8_u8_vasnprintf (uint8_t *RESULTBUF, size_t
1373 *LENGTHP, const uint8_t *FORMAT, va_list AP)
1375 The following functions take an ASCII format string and return a
1376 result in UTF-16 format.
1378 -- Function: int u16_sprintf (uint16_t *BUF, const char *FORMAT, ...)
1379 -- Function: int u16_snprintf (uint16_t *BUF, size_t SIZE, const char
1381 -- Function: int u16_asprintf (uint16_t **RESULTP, const char *FORMAT,
1383 -- Function: uint16_t * u16_asnprintf (uint16_t *RESULTBUF, size_t
1384 *LENGTHP, const char *FORMAT, ...)
1385 -- Function: int u16_vsprintf (uint16_t *BUF, const char *FORMAT,
1387 -- Function: int u16_vsnprintf (uint16_t *BUF, size_t SIZE, const char
1388 *FORMAT, va_list AP)
1389 -- Function: int u16_vasprintf (uint16_t **RESULTP, const char *FORMAT,
1391 -- Function: uint16_t * u16_vasnprintf (uint16_t *RESULTBUF, size_t
1392 *LENGTHP, const char *FORMAT, va_list AP)
1394 The following functions take a UTF-16 format string and return a
1395 result in UTF-16 format.
1397 -- Function: int u16_u16_sprintf (uint16_t *BUF, const uint16_t
1399 -- Function: int u16_u16_snprintf (uint16_t *BUF, size_t SIZE, const
1400 uint16_t *FORMAT, ...)
1401 -- Function: int u16_u16_asprintf (uint16_t **RESULTP, const uint16_t
1403 -- Function: uint16_t * u16_u16_asnprintf (uint16_t *RESULTBUF, size_t
1404 *LENGTHP, const uint16_t *FORMAT, ...)
1405 -- Function: int u16_u16_vsprintf (uint16_t *BUF, const uint16_t
1406 *FORMAT, va_list AP)
1407 -- Function: int u16_u16_vsnprintf (uint16_t *BUF, size_t SIZE, const
1408 uint16_t *FORMAT, va_list AP)
1409 -- Function: int u16_u16_vasprintf (uint16_t **RESULTP, const uint16_t
1410 *FORMAT, va_list AP)
1411 -- Function: uint16_t * u16_u16_vasnprintf (uint16_t *RESULTBUF, size_t
1412 *LENGTHP, const uint16_t *FORMAT, va_list AP)
1414 The following functions take an ASCII format string and return a
1415 result in UTF-32 format.
1417 -- Function: int u32_sprintf (uint32_t *BUF, const char *FORMAT, ...)
1418 -- Function: int u32_snprintf (uint32_t *BUF, size_t SIZE, const char
1420 -- Function: int u32_asprintf (uint32_t **RESULTP, const char *FORMAT,
1422 -- Function: uint32_t * u32_asnprintf (uint32_t *RESULTBUF, size_t
1423 *LENGTHP, const char *FORMAT, ...)
1424 -- Function: int u32_vsprintf (uint32_t *BUF, const char *FORMAT,
1426 -- Function: int u32_vsnprintf (uint32_t *BUF, size_t SIZE, const char
1427 *FORMAT, va_list AP)
1428 -- Function: int u32_vasprintf (uint32_t **RESULTP, const char *FORMAT,
1430 -- Function: uint32_t * u32_vasnprintf (uint32_t *RESULTBUF, size_t
1431 *LENGTHP, const char *FORMAT, va_list AP)
1433 The following functions take a UTF-32 format string and return a
1434 result in UTF-32 format.
1436 -- Function: int u32_u32_sprintf (uint32_t *BUF, const uint32_t
1438 -- Function: int u32_u32_snprintf (uint32_t *BUF, size_t SIZE, const
1439 uint32_t *FORMAT, ...)
1440 -- Function: int u32_u32_asprintf (uint32_t **RESULTP, const uint32_t
1442 -- Function: uint32_t * u32_u32_asnprintf (uint32_t *RESULTBUF, size_t
1443 *LENGTHP, const uint32_t *FORMAT, ...)
1444 -- Function: int u32_u32_vsprintf (uint32_t *BUF, const uint32_t
1445 *FORMAT, va_list AP)
1446 -- Function: int u32_u32_vsnprintf (uint32_t *BUF, size_t SIZE, const
1447 uint32_t *FORMAT, va_list AP)
1448 -- Function: int u32_u32_vasprintf (uint32_t **RESULTP, const uint32_t
1449 *FORMAT, va_list AP)
1450 -- Function: uint32_t * u32_u32_vasnprintf (uint32_t *RESULTBUF, size_t
1451 *LENGTHP, const uint32_t *FORMAT, va_list AP)
1453 The following functions take an ASCII format string and produce
1454 output in locale encoding to a ‘FILE’ stream.
1456 -- Function: int ulc_fprintf (FILE *STREAM, const char *FORMAT, ...)
1457 -- Function: int ulc_vfprintf (FILE *STREAM, const char *FORMAT,
1461 File: libunistring.info, Node: uniname.h, Next: unictype.h, Prev: unistdio.h, Up: Top
1463 7 Names of Unicode characters ‘<uniname.h>’
1464 *******************************************
1466 This include file implements the association between a Unicode
1467 character and its name.
1469 The name of a Unicode character makes it possible to distinguish it from
1470 other, similar-looking characters. For example, the character ‘x’ has the name
1471 ‘"LATIN SMALL LETTER X"’ and is therefore different from the character
1472 named ‘"MULTIPLICATION SIGN"’.
1474 -- Macro: unsigned int UNINAME_MAX
1475 This macro expands to a constant that is the required size of a
1476 buffer for a Unicode character name.
1478 -- Function: char * unicode_character_name (ucs4_t UC, char *BUF)
1479 Looks up the name of a Unicode character, in uppercase ASCII. BUF
1480 must point to a buffer, at least ‘UNINAME_MAX’ bytes in size.
1481 Returns the filled BUF, or NULL if the character does not have a
1484 -- Function: ucs4_t unicode_name_character (const char *NAME)
1485 Looks up the Unicode character with a given name, in upper- or
1486 lowercase ASCII. Returns the character if found, or
1487 ‘UNINAME_INVALID’ if not found.
1489 -- Macro: ucs4_t UNINAME_INVALID
1490 This macro expands to a constant that is a special return value of
1491 the ‘unicode_name_character’ function.
1494 File: libunistring.info, Node: unictype.h, Next: uniwidth.h, Prev: uniname.h, Up: Top
1496 8 Unicode character classification and properties ‘<unictype.h>’
1497 ****************************************************************
1499 This include file declares functions that classify Unicode characters
1500 and that test whether Unicode characters have specific properties.
1502 The classification assigns a “general category” to every Unicode
1503 character. This is similar to the classification provided by ISO C in
1506 Properties are the data that guides various text processing
1507 algorithms in the presence of specific Unicode characters.
1511 * General category::
1512 * Canonical combining class::
1514 * Decimal digit value::
1517 * Mirrored character::
1522 * ISO C and Java syntax::
1523 * Classifications like in ISO C::
1526 File: libunistring.info, Node: General category, Next: Canonical combining class, Up: unictype.h
1528 8.1 General category
1529 ====================
1531 Every Unicode character or code point has a _general category_
1532 assigned to it. This classification is important for most algorithms
1533 that work on Unicode text.
1535 The GNU libunistring library provides two kinds of API for working
1536 with general categories. The object oriented API uses a variable to
1537 denote every predefined general category value or combinations thereof.
1538 The low-level API uses a bit mask instead. The advantage of the object
1539 oriented API is that if only a few predefined general category values
1540 are used, the data tables are relatively small. When you combine
1541 general category values (using ‘uc_general_category_or’,
1542 ‘uc_general_category_and’, or ‘uc_general_category_and_not’), or when
1543 you use the low level bit masks, a big table is used that holds the
1544 complete general category information for all Unicode characters.
1548 * Object oriented API::
1552 File: libunistring.info, Node: Object oriented API, Next: Bit mask API, Up: General category
1554 8.1.1 The object oriented API for general category
1555 --------------------------------------------------
1557 -- Type: uc_general_category_t
1558 This data type denotes a general category value. It is an
1559 immediate type that can be copied by simple assignment, without
1560 involving memory allocation. It is not an array type.
1562 The following are the predefined general category values. Additional
1563 general categories may be added in the future.
1565 -- Constant: uc_general_category_t UC_CATEGORY_L
1566 -- Constant: uc_general_category_t UC_CATEGORY_LC
1567 -- Constant: uc_general_category_t UC_CATEGORY_Lu
1568 -- Constant: uc_general_category_t UC_CATEGORY_Ll
1569 -- Constant: uc_general_category_t UC_CATEGORY_Lt
1570 -- Constant: uc_general_category_t UC_CATEGORY_Lm
1571 -- Constant: uc_general_category_t UC_CATEGORY_Lo
1572 -- Constant: uc_general_category_t UC_CATEGORY_M
1573 -- Constant: uc_general_category_t UC_CATEGORY_Mn
1574 -- Constant: uc_general_category_t UC_CATEGORY_Mc
1575 -- Constant: uc_general_category_t UC_CATEGORY_Me
1576 -- Constant: uc_general_category_t UC_CATEGORY_N
1577 -- Constant: uc_general_category_t UC_CATEGORY_Nd
1578 -- Constant: uc_general_category_t UC_CATEGORY_Nl
1579 -- Constant: uc_general_category_t UC_CATEGORY_No
1580 -- Constant: uc_general_category_t UC_CATEGORY_P
1581 -- Constant: uc_general_category_t UC_CATEGORY_Pc
1582 -- Constant: uc_general_category_t UC_CATEGORY_Pd
1583 -- Constant: uc_general_category_t UC_CATEGORY_Ps
1584 -- Constant: uc_general_category_t UC_CATEGORY_Pe
1585 -- Constant: uc_general_category_t UC_CATEGORY_Pi
1586 -- Constant: uc_general_category_t UC_CATEGORY_Pf
1587 -- Constant: uc_general_category_t UC_CATEGORY_Po
1588 -- Constant: uc_general_category_t UC_CATEGORY_S
1589 -- Constant: uc_general_category_t UC_CATEGORY_Sm
1590 -- Constant: uc_general_category_t UC_CATEGORY_Sc
1591 -- Constant: uc_general_category_t UC_CATEGORY_Sk
1592 -- Constant: uc_general_category_t UC_CATEGORY_So
1593 -- Constant: uc_general_category_t UC_CATEGORY_Z
1594 -- Constant: uc_general_category_t UC_CATEGORY_Zs
1595 -- Constant: uc_general_category_t UC_CATEGORY_Zl
1596 -- Constant: uc_general_category_t UC_CATEGORY_Zp
1597 -- Constant: uc_general_category_t UC_CATEGORY_C
1598 -- Constant: uc_general_category_t UC_CATEGORY_Cc
1599 -- Constant: uc_general_category_t UC_CATEGORY_Cf
1600 -- Constant: uc_general_category_t UC_CATEGORY_Cs
1601 -- Constant: uc_general_category_t UC_CATEGORY_Co
1602 -- Constant: uc_general_category_t UC_CATEGORY_Cn
1604 The following are alias names for predefined general category values.
1606 -- Macro: uc_general_category_t UC_LETTER
1607 This is another name for ‘UC_CATEGORY_L’.
1609 -- Macro: uc_general_category_t UC_CASED_LETTER
1610 This is another name for ‘UC_CATEGORY_LC’.
1612 -- Macro: uc_general_category_t UC_UPPERCASE_LETTER
1613 This is another name for ‘UC_CATEGORY_Lu’.
1615 -- Macro: uc_general_category_t UC_LOWERCASE_LETTER
1616 This is another name for ‘UC_CATEGORY_Ll’.
1618 -- Macro: uc_general_category_t UC_TITLECASE_LETTER
1619 This is another name for ‘UC_CATEGORY_Lt’.
1621 -- Macro: uc_general_category_t UC_MODIFIER_LETTER
1622 This is another name for ‘UC_CATEGORY_Lm’.
1624 -- Macro: uc_general_category_t UC_OTHER_LETTER
1625 This is another name for ‘UC_CATEGORY_Lo’.
1627 -- Macro: uc_general_category_t UC_MARK
1628 This is another name for ‘UC_CATEGORY_M’.
1630 -- Macro: uc_general_category_t UC_NON_SPACING_MARK
1631 This is another name for ‘UC_CATEGORY_Mn’.
1633 -- Macro: uc_general_category_t UC_COMBINING_SPACING_MARK
1634 This is another name for ‘UC_CATEGORY_Mc’.
1636 -- Macro: uc_general_category_t UC_ENCLOSING_MARK
1637 This is another name for ‘UC_CATEGORY_Me’.
1639 -- Macro: uc_general_category_t UC_NUMBER
1640 This is another name for ‘UC_CATEGORY_N’.
1642 -- Macro: uc_general_category_t UC_DECIMAL_DIGIT_NUMBER
1643 This is another name for ‘UC_CATEGORY_Nd’.
1645 -- Macro: uc_general_category_t UC_LETTER_NUMBER
1646 This is another name for ‘UC_CATEGORY_Nl’.
1648 -- Macro: uc_general_category_t UC_OTHER_NUMBER
1649 This is another name for ‘UC_CATEGORY_No’.
1651 -- Macro: uc_general_category_t UC_PUNCTUATION
1652 This is another name for ‘UC_CATEGORY_P’.
1654 -- Macro: uc_general_category_t UC_CONNECTOR_PUNCTUATION
1655 This is another name for ‘UC_CATEGORY_Pc’.
1657 -- Macro: uc_general_category_t UC_DASH_PUNCTUATION
1658 This is another name for ‘UC_CATEGORY_Pd’.
1660 -- Macro: uc_general_category_t UC_OPEN_PUNCTUATION
1661 This is another name for ‘UC_CATEGORY_Ps’ (“start punctuation”).
1663 -- Macro: uc_general_category_t UC_CLOSE_PUNCTUATION
1664 This is another name for ‘UC_CATEGORY_Pe’ (“end punctuation”).
1666 -- Macro: uc_general_category_t UC_INITIAL_QUOTE_PUNCTUATION
1667 This is another name for ‘UC_CATEGORY_Pi’.
1669 -- Macro: uc_general_category_t UC_FINAL_QUOTE_PUNCTUATION
1670 This is another name for ‘UC_CATEGORY_Pf’.
1672 -- Macro: uc_general_category_t UC_OTHER_PUNCTUATION
1673 This is another name for ‘UC_CATEGORY_Po’.
1675 -- Macro: uc_general_category_t UC_SYMBOL
1676 This is another name for ‘UC_CATEGORY_S’.
1678 -- Macro: uc_general_category_t UC_MATH_SYMBOL
1679 This is another name for ‘UC_CATEGORY_Sm’.
1681 -- Macro: uc_general_category_t UC_CURRENCY_SYMBOL
1682 This is another name for ‘UC_CATEGORY_Sc’.
1684 -- Macro: uc_general_category_t UC_MODIFIER_SYMBOL
1685 This is another name for ‘UC_CATEGORY_Sk’.
1687 -- Macro: uc_general_category_t UC_OTHER_SYMBOL
1688 This is another name for ‘UC_CATEGORY_So’.
1690 -- Macro: uc_general_category_t UC_SEPARATOR
1691 This is another name for ‘UC_CATEGORY_Z’.
1693 -- Macro: uc_general_category_t UC_SPACE_SEPARATOR
1694 This is another name for ‘UC_CATEGORY_Zs’.
1696 -- Macro: uc_general_category_t UC_LINE_SEPARATOR
1697 This is another name for ‘UC_CATEGORY_Zl’.
1699 -- Macro: uc_general_category_t UC_PARAGRAPH_SEPARATOR
1700 This is another name for ‘UC_CATEGORY_Zp’.
1702 -- Macro: uc_general_category_t UC_OTHER
1703 This is another name for ‘UC_CATEGORY_C’.
1705 -- Macro: uc_general_category_t UC_CONTROL
1706 This is another name for ‘UC_CATEGORY_Cc’.
1708 -- Macro: uc_general_category_t UC_FORMAT
1709 This is another name for ‘UC_CATEGORY_Cf’.
1711 -- Macro: uc_general_category_t UC_SURROGATE
1712 This is another name for ‘UC_CATEGORY_Cs’. All code points in this
1713 category are invalid characters.
1715 -- Macro: uc_general_category_t UC_PRIVATE_USE
1716 This is another name for ‘UC_CATEGORY_Co’.
1718 -- Macro: uc_general_category_t UC_UNASSIGNED
1719 This is another name for ‘UC_CATEGORY_Cn’. Some code points in
1720 this category are invalid characters.
1722 The following functions combine general categories, like in a boolean
1723 algebra, except that there is no ‘not’ operation.
1725 -- Function: uc_general_category_t uc_general_category_or
1726 (uc_general_category_t CATEGORY1, uc_general_category_t CATEGORY2)
1728 Returns the union of two general categories. This corresponds to
1729 the unions of the two sets of characters.
1731 -- Function: uc_general_category_t uc_general_category_and
1732 (uc_general_category_t CATEGORY1, uc_general_category_t CATEGORY2)
1734 Returns the intersection of two general categories as bit masks.
1735 This _does not_ correspond to the intersection of the two sets of characters.
1738 -- Function: uc_general_category_t uc_general_category_and_not
1739 (uc_general_category_t CATEGORY1, uc_general_category_t CATEGORY2)
1741 Returns the intersection of a general category with the complement
1742 of a second general category, as bit masks. This _does not_
1743 correspond to the intersection with complement, when viewing the
1744 categories as sets of characters.
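The bit-mask combinations described above can be sketched in a few lines of C. This is only an illustration: the `MASK_*` values and the `mask_*` helpers are hypothetical stand-ins chosen for the example, not libunistring's actual constants or functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, one bit per leaf category.
   These are NOT the values used by libunistring. */
#define MASK_Lu ((uint32_t) 1 << 0)
#define MASK_Ll ((uint32_t) 1 << 1)
#define MASK_Lt ((uint32_t) 1 << 2)
#define MASK_Nd ((uint32_t) 1 << 3)
#define MASK_LC (MASK_Lu | MASK_Ll | MASK_Lt)

/* Union of two masks. */
static uint32_t mask_or (uint32_t a, uint32_t b) { return a | b; }

/* Intersection of two masks. */
static uint32_t mask_and (uint32_t a, uint32_t b) { return a & b; }

/* Bits of A that are not set in B. There is no standalone "not". */
static uint32_t mask_and_not (uint32_t a, uint32_t b) { return a & ~b; }

/* True if the leaf category bit LEAF is part of MASK. */
static bool mask_contains (uint32_t mask, uint32_t leaf)
{
  return (mask & leaf) != 0;
}
```

With these toy values, `mask_and_not (MASK_LC, MASK_Lu)` leaves only the lowercase and titlecase bits.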
1746 The following functions associate general categories with their name.
1748 -- Function: const char * uc_general_category_name
1749 (uc_general_category_t CATEGORY)
1750 Returns the name of a general category, more precisely, the
1751 abbreviated name. Returns NULL if the general category corresponds
1752 to a bit mask that does not have a name.
1754 -- Function: const char * uc_general_category_long_name
1755 (uc_general_category_t CATEGORY)
1756 Returns the long name of a general category. Returns NULL if the
1757 general category corresponds to a bit mask that does not have a name.
1760 -- Function: uc_general_category_t uc_general_category_byname (const
1761 char *CATEGORY_NAME)
1762 Returns the general category given by name, e.g. ‘"Lu"’, or by
1763 long name, e.g. ‘"Uppercase Letter"’. This lookup ignores spaces,
1764 underscores, or hyphens as word separators and is case-insignificant.
1767 The following functions view general categories as sets of Unicode characters.
1770 -- Function: uc_general_category_t uc_general_category (ucs4_t UC)
1771 Returns the general category of a Unicode character.
1773 This function uses a big table.
1775 -- Function: bool uc_is_general_category (ucs4_t UC,
1776 uc_general_category_t CATEGORY)
1777 Tests whether a Unicode character belongs to a given category. The
1778 CATEGORY argument can be a predefined general category or the
1779 combination of several predefined general categories.
1782 File: libunistring.info, Node: Bit mask API, Prev: Object oriented API, Up: General category
1784 8.1.2 The bit mask API for general category
1785 -------------------------------------------
1787 The following are the predefined general category values as bit masks.
1788 Additional general categories may be added in the future.
1790 -- Macro: uint32_t UC_CATEGORY_MASK_L
1791 -- Macro: uint32_t UC_CATEGORY_MASK_LC
1792 -- Macro: uint32_t UC_CATEGORY_MASK_Lu
1793 -- Macro: uint32_t UC_CATEGORY_MASK_Ll
1794 -- Macro: uint32_t UC_CATEGORY_MASK_Lt
1795 -- Macro: uint32_t UC_CATEGORY_MASK_Lm
1796 -- Macro: uint32_t UC_CATEGORY_MASK_Lo
1797 -- Macro: uint32_t UC_CATEGORY_MASK_M
1798 -- Macro: uint32_t UC_CATEGORY_MASK_Mn
1799 -- Macro: uint32_t UC_CATEGORY_MASK_Mc
1800 -- Macro: uint32_t UC_CATEGORY_MASK_Me
1801 -- Macro: uint32_t UC_CATEGORY_MASK_N
1802 -- Macro: uint32_t UC_CATEGORY_MASK_Nd
1803 -- Macro: uint32_t UC_CATEGORY_MASK_Nl
1804 -- Macro: uint32_t UC_CATEGORY_MASK_No
1805 -- Macro: uint32_t UC_CATEGORY_MASK_P
1806 -- Macro: uint32_t UC_CATEGORY_MASK_Pc
1807 -- Macro: uint32_t UC_CATEGORY_MASK_Pd
1808 -- Macro: uint32_t UC_CATEGORY_MASK_Ps
1809 -- Macro: uint32_t UC_CATEGORY_MASK_Pe
1810 -- Macro: uint32_t UC_CATEGORY_MASK_Pi
1811 -- Macro: uint32_t UC_CATEGORY_MASK_Pf
1812 -- Macro: uint32_t UC_CATEGORY_MASK_Po
1813 -- Macro: uint32_t UC_CATEGORY_MASK_S
1814 -- Macro: uint32_t UC_CATEGORY_MASK_Sm
1815 -- Macro: uint32_t UC_CATEGORY_MASK_Sc
1816 -- Macro: uint32_t UC_CATEGORY_MASK_Sk
1817 -- Macro: uint32_t UC_CATEGORY_MASK_So
1818 -- Macro: uint32_t UC_CATEGORY_MASK_Z
1819 -- Macro: uint32_t UC_CATEGORY_MASK_Zs
1820 -- Macro: uint32_t UC_CATEGORY_MASK_Zl
1821 -- Macro: uint32_t UC_CATEGORY_MASK_Zp
1822 -- Macro: uint32_t UC_CATEGORY_MASK_C
1823 -- Macro: uint32_t UC_CATEGORY_MASK_Cc
1824 -- Macro: uint32_t UC_CATEGORY_MASK_Cf
1825 -- Macro: uint32_t UC_CATEGORY_MASK_Cs
1826 -- Macro: uint32_t UC_CATEGORY_MASK_Co
1827 -- Macro: uint32_t UC_CATEGORY_MASK_Cn
1829 The following function views general categories as sets of Unicode characters.
1832 -- Function: bool uc_is_general_category_withtable (ucs4_t UC, uint32_t BITMASK)
1834 Tests whether a Unicode character belongs to a given category. The
1835 BITMASK argument can be a predefined general category bitmask or
1836 the combination of several predefined general category bitmasks.
1838 This function uses a big table comprising all general categories.
1841 File: libunistring.info, Node: Canonical combining class, Next: Bidi class, Prev: General category, Up: unictype.h
1843 8.2 Canonical combining class
1844 =============================
1846 Every Unicode character or code point has a _canonical combining
1847 class_ assigned to it.
1849 What is the meaning of the canonical combining class? Essentially,
1850 it indicates the priority with which a combining character is attached
1851 to its base character. The characters for which the canonical combining
1852 class is 0 are the base characters, and the characters for which it is
1853 greater than 0 are the combining characters. Combining characters are
1854 rendered near/attached/around their base character, and combining
1855 characters with small combining classes are attached "first" or "closer"
1856 to the base character.
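One place where the numeric ordering of combining classes matters is canonical reordering during normalization: within a run of combining characters, marks are sorted by ascending class, and marks of equal class keep their relative order. The following is a minimal sketch of that idea. `toy_combining_class` is a hypothetical lookup covering only three code points (in a real program its role would be played by a full table lookup), and `canonical_reorder` is a name invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy lookup for a handful of code points; everything else is
   treated as a base character (class 0). Hypothetical helper. */
static int toy_combining_class (uint32_t uc)
{
  switch (uc)
    {
    case 0x0327: return 202;  /* COMBINING CEDILLA */
    case 0x0323: return 220;  /* COMBINING DOT BELOW */
    case 0x0301: return 230;  /* COMBINING ACUTE ACCENT */
    default:     return 0;
    }
}

/* Stable insertion sort of each run of combining characters by
   ascending combining class. Base characters never move, and a
   mark is never moved across a base character. */
static void canonical_reorder (uint32_t *s, size_t n)
{
  for (size_t i = 1; i < n; i++)
    {
      uint32_t uc = s[i];
      int ccc = toy_combining_class (uc);
      if (ccc == 0)
        continue;
      size_t j = i;
      /* Strict '>' keeps marks of equal class in their original order. */
      while (j > 0 && toy_combining_class (s[j - 1]) > ccc)
        {
          s[j] = s[j - 1];
          j--;
        }
      s[j] = uc;
    }
}
```

For the sequence `a`, U+0301, U+0323, `b`, the dot below (class 220) ends up before the acute accent (class 230), and both base characters stay in place.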
1858 The canonical combining class of a character is a number in the range
1859 0..255. The possible values are described in the Unicode Character
1860 Database <http://www.unicode.org/Public/UNIDATA/UCD.html>. The list
1861 here is not definitive; more values can be added in future versions.
1863 -- Constant: int UC_CCC_NR
1864 The canonical combining class value for “Not Reordered” characters.
1867 -- Constant: int UC_CCC_OV
1868 The canonical combining class value for “Overlay” characters.
1870 -- Constant: int UC_CCC_NK
1871 The canonical combining class value for “Nukta” characters.
1873 -- Constant: int UC_CCC_KV
1874 The canonical combining class value for “Kana Voicing” characters.
1876 -- Constant: int UC_CCC_VR
1877 The canonical combining class value for “Virama” characters.
1879 -- Constant: int UC_CCC_ATBL
1880 The canonical combining class value for “Attached Below Left” characters.
1883 -- Constant: int UC_CCC_ATB
1884 The canonical combining class value for “Attached Below” characters.
1887 -- Constant: int UC_CCC_ATA
1888 The canonical combining class value for “Attached Above” characters.
1891 -- Constant: int UC_CCC_ATAR
1892 The canonical combining class value for “Attached Above Right” characters.
1895 -- Constant: int UC_CCC_BL
1896 The canonical combining class value for “Below Left” characters.
1898 -- Constant: int UC_CCC_B
1899 The canonical combining class value for “Below” characters.
1901 -- Constant: int UC_CCC_BR
1902 The canonical combining class value for “Below Right” characters.
1904 -- Constant: int UC_CCC_L
1905 The canonical combining class value for “Left” characters.
1907 -- Constant: int UC_CCC_R
1908 The canonical combining class value for “Right” characters.
1910 -- Constant: int UC_CCC_AL
1911 The canonical combining class value for “Above Left” characters.
1913 -- Constant: int UC_CCC_A
1914 The canonical combining class value for “Above” characters.
1916 -- Constant: int UC_CCC_AR
1917 The canonical combining class value for “Above Right” characters.
1919 -- Constant: int UC_CCC_DB
1920 The canonical combining class value for “Double Below” characters.
1922 -- Constant: int UC_CCC_DA
1923 The canonical combining class value for “Double Above” characters.
1925 -- Constant: int UC_CCC_IS
1926 The canonical combining class value for “Iota Subscript” characters.
1929 The following functions associate canonical combining classes with their name.
1932 -- Function: const char * uc_combining_class_name (int CCC)
1933 Returns the name of a canonical combining class, more precisely,
1934 the abbreviated name. Returns NULL if the canonical combining
1935 class is a numeric value without a name.
1937 -- Function: const char * uc_combining_class_long_name (int CCC)
1938 Returns the long name of a canonical combining class. Returns NULL
1939 if the canonical combining class is a numeric value without a name.
1941 -- Function: int uc_combining_class_byname (const char *CCC_NAME)
1942 Returns the canonical combining class given by name, e.g. ‘"BL"’,
1943 or by long name, e.g. ‘"Below Left"’. This lookup ignores spaces,
1944 underscores, or hyphens as word separators and is case-insignificant.
1947 The following function looks up the canonical combining class of a Unicode character.
1950 -- Function: int uc_combining_class (ucs4_t UC)
1951 Returns the canonical combining class of a Unicode character.
1954 File: libunistring.info, Node: Bidi class, Next: Decimal digit value, Prev: Canonical combining class, Up: unictype.h
1959 Every Unicode character or code point has a _bidi class_ assigned to
1960 it. Before Unicode 4.0, this concept was known as _bidirectional category_.
1963 The bidi class guides the bidirectional algorithm
1964 (<http://www.unicode.org/reports/tr9/>). The possible values are the following.
1967 -- Constant: int UC_BIDI_L
1968 The bidi class for “Left-to-Right” characters.
1970 -- Constant: int UC_BIDI_LRE
1971 The bidi class for “Left-to-Right Embedding” characters.
1973 -- Constant: int UC_BIDI_LRO
1974 The bidi class for “Left-to-Right Override” characters.
1976 -- Constant: int UC_BIDI_R
1977 The bidi class for “Right-to-Left” characters.
1979 -- Constant: int UC_BIDI_AL
1980 The bidi class for “Right-to-Left Arabic” characters.
1982 -- Constant: int UC_BIDI_RLE
1983 The bidi class for “Right-to-Left Embedding” characters.
1985 -- Constant: int UC_BIDI_RLO
1986 The bidi class for “Right-to-Left Override” characters.
1988 -- Constant: int UC_BIDI_PDF
1989 The bidi class for “Pop Directional Format” characters.
1991 -- Constant: int UC_BIDI_EN
1992 The bidi class for “European Number” characters.
1994 -- Constant: int UC_BIDI_ES
1995 The bidi class for “European Number Separator” characters.
1997 -- Constant: int UC_BIDI_ET
1998 The bidi class for “European Number Terminator” characters.
2000 -- Constant: int UC_BIDI_AN
2001 The bidi class for “Arabic Number” characters.
2003 -- Constant: int UC_BIDI_CS
2004 The bidi class for “Common Number Separator” characters.
2006 -- Constant: int UC_BIDI_NSM
2007 The bidi class for “Non-Spacing Mark” characters.
2009 -- Constant: int UC_BIDI_BN
2010 The bidi class for “Boundary Neutral” characters.
2012 -- Constant: int UC_BIDI_B
2013 The bidi class for “Paragraph Separator” characters.
2015 -- Constant: int UC_BIDI_S
2016 The bidi class for “Segment Separator” characters.
2018 -- Constant: int UC_BIDI_WS
2019 The bidi class for “Whitespace” characters.
2021 -- Constant: int UC_BIDI_ON
2022 The bidi class for “Other Neutral” characters.
2024 The following functions implement the association between a
2025 bidirectional category and its name.
2027 -- Function: const char * uc_bidi_class_name (int BIDI_CLASS)
2028 -- Function: const char * uc_bidi_category_name (int CATEGORY)
2029 Returns the name of a bidi class, more precisely, the abbreviated name.
2032 -- Function: const char * uc_bidi_class_long_name (int BIDI_CLASS)
2033 Returns the long name of a bidi class.
2035 -- Function: int uc_bidi_class_byname (const char *BIDI_CLASS_NAME)
2036 -- Function: int uc_bidi_category_byname (const char *CATEGORY_NAME)
2037 Returns the bidi class given by name, e.g. ‘"LRE"’, or by long
2038 name, e.g. ‘"Left-to-Right Embedding"’. This lookup ignores
2039 spaces, underscores, or hyphens as word separators and is case-insignificant.
2042 The following functions view bidirectional categories as sets of Unicode characters.
2045 -- Function: int uc_bidi_class (ucs4_t UC)
2046 -- Function: int uc_bidi_category (ucs4_t UC)
2047 Returns the bidi class of a Unicode character.
2049 -- Function: bool uc_is_bidi_class (ucs4_t UC, int BIDI_CLASS)
2050 -- Function: bool uc_is_bidi_category (ucs4_t UC, int CATEGORY)
2051 Tests whether a Unicode character belongs to a given bidi class.
2054 File: libunistring.info, Node: Decimal digit value, Next: Digit value, Prev: Bidi class, Up: unictype.h
2056 8.4 Decimal digit value
2057 =======================
2059 Decimal digits (like the digits from ‘0’ to ‘9’) exist in many
2060 scripts. The following function converts a decimal digit character to
2061 its numerical value.
2063 -- Function: int uc_decimal_value (ucs4_t UC)
2064 Returns the decimal digit value of a Unicode character. The return
2065 value is an integer in the range 0..9, or -1 for characters that do
2066 not represent a decimal digit.
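As a rough illustration of how a per-character decimal digit value can be used, the sketch below parses a run of digits from any single supported script into an integer. `toy_decimal_value` is a hypothetical stand-in that knows only three digit ranges and follows the same 0..9 or -1 convention described above. `parse_decimal` is likewise a name made up for this example and does not check for overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy lookup: ASCII, Arabic-Indic, and Devanagari digits only.
   Returns 0..9, or -1 if UC is not one of these digits. */
static int toy_decimal_value (uint32_t uc)
{
  if (uc >= 0x0030 && uc <= 0x0039) return (int) (uc - 0x0030);
  if (uc >= 0x0660 && uc <= 0x0669) return (int) (uc - 0x0660);
  if (uc >= 0x0966 && uc <= 0x096F) return (int) (uc - 0x0966);
  return -1;
}

/* Parses N code points as a non-negative decimal number.
   Returns -1 if any code point is not a decimal digit.
   Overflow is not handled in this sketch. */
static long parse_decimal (const uint32_t *s, size_t n)
{
  long value = 0;
  for (size_t i = 0; i < n; i++)
    {
      int d = toy_decimal_value (s[i]);
      if (d < 0)
        return -1;
      value = value * 10 + d;
    }
  return value;
}
```

Because the digit value is independent of the script, the same loop handles ASCII `42` and Arabic-Indic U+0664 U+0662 identically.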
2069 File: libunistring.info, Node: Digit value, Next: Numeric value, Prev: Decimal digit value, Up: unictype.h
2074 Digit characters are like decimal digit characters, possibly in
2075 special forms, such as superscript, subscript, or circled. The
2076 following function converts a digit character to its numerical value.
2078 -- Function: int uc_digit_value (ucs4_t UC)
2079 Returns the digit value of a Unicode character. The return value
2080 is an integer in the range 0..9, or -1 for characters that do not represent a digit.
2084 File: libunistring.info, Node: Numeric value, Next: Mirrored character, Prev: Digit value, Up: unictype.h
2089 There are also characters that represent numbers without a digit
2090 system, like the Roman numerals, and fractional numbers, like 1/4 or
2093 The following type represents the numeric value of a Unicode character.
2095 -- Type: uc_fraction_t
2096 This is a structure type with the following fields:
2099 An integer N is represented by ‘numerator = N’, ‘denominator = 1’.
2101 The following function converts a number character to its numerical value.
2104 -- Function: uc_fraction_t uc_numeric_value (ucs4_t UC)
2105 Returns the numeric value of a Unicode character. The return value
2106 is a fraction, or the pseudo-fraction ‘{ 0, 0 }’ for characters
2107 that do not represent a number.
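The fraction convention above, including the `{ 0, 0 }` pseudo-fraction, can be modelled as follows. This is a sketch under assumptions: `toy_fraction_t` mirrors the `numerator` and `denominator` fields named in the text, but the `int` field types, the tiny lookup table, and all `toy_*` and `fraction_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a fraction with the two fields named above. */
typedef struct
{
  int numerator;
  int denominator;
} toy_fraction_t;

/* Toy lookup covering a few number characters. Anything else yields
   the pseudo-fraction { 0, 0 }, meaning "not a number". */
static toy_fraction_t toy_numeric_value (uint32_t uc)
{
  toy_fraction_t result = { 0, 0 };
  switch (uc)
    {
    case 0x00BC: result.numerator = 1; result.denominator = 4; break; /* 1/4 */
    case 0x00BD: result.numerator = 1; result.denominator = 2; break; /* 1/2 */
    case 0x00BE: result.numerator = 3; result.denominator = 4; break; /* 3/4 */
    case 0x2167: result.numerator = 8; result.denominator = 1; break; /* ROMAN NUMERAL EIGHT */
    default: break;
    }
  return result;
}

/* A zero denominator marks the pseudo-fraction. */
static bool fraction_is_number (toy_fraction_t f)
{
  return f.denominator != 0;
}

/* Only meaningful when fraction_is_number (f) is true. */
static double fraction_to_double (toy_fraction_t f)
{
  return (double) f.numerator / (double) f.denominator;
}
```

Callers should test the denominator before dividing, since the pseudo-fraction would otherwise lead to a division by zero.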
2110 File: libunistring.info, Node: Mirrored character, Next: Arabic shaping, Prev: Numeric value, Up: unictype.h
2112 8.7 Mirrored character
2113 ======================
2115 Character mirroring is used to associate the closing parenthesis
2116 character with the opening parenthesis character, the closing brace
2117 character with the opening brace character, and so on.
2119 The following function looks up the mirrored character of a Unicode character.
2122 -- Function: bool uc_mirror_char (ucs4_t UC, ucs4_t *PUC)
2123 Stores the mirrored character of a Unicode character UC in ‘*PUC’
2124 and returns ‘true’, if it exists. Otherwise it stores UC
2125 unmodified in ‘*PUC’ and returns ‘false’.
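The calling convention described above (store a result through the pointer in both cases, and report through the return value whether a mirror exists) can be sketched like this. `toy_mirror_char` is a hypothetical model that knows only four ASCII bracket pairs; it is not the library function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: looks up UC in a small table of mirrored pairs.
   On success stores the partner in *PUC and returns true.
   Otherwise stores UC unchanged in *PUC and returns false. */
static bool toy_mirror_char (uint32_t uc, uint32_t *puc)
{
  static const uint32_t pairs[][2] =
    {
      { '(', ')' }, { '[', ']' }, { '{', '}' }, { '<', '>' }
    };
  for (size_t i = 0; i < sizeof pairs / sizeof pairs[0]; i++)
    {
      if (uc == pairs[i][0]) { *puc = pairs[i][1]; return true; }
      if (uc == pairs[i][1]) { *puc = pairs[i][0]; return true; }
    }
  *puc = uc;
  return false;
}
```

Because `*PUC` is always written, a caller that only wants "the mirrored form if any" can ignore the return value and use the stored character directly.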
2128 File: libunistring.info, Node: Arabic shaping, Next: Properties, Prev: Mirrored character, Up: unictype.h
2133 When Arabic characters are rendered, after bidi reordering has taken
2134 place, the shapes of the glyphs are modified so that many adjacent glyphs
2135 are joined. Two character properties describe how this “Arabic shaping”
2136 takes place: the joining type and the joining group.
2144 File: libunistring.info, Node: Joining type, Next: Joining group, Up: Arabic shaping
2146 8.8.1 Joining type of Arabic characters
2147 ---------------------------------------
2149 The joining type of a character describes on which of the left and
2150 right neighbour characters the character’s shape depends, and which of
2151 the two neighbour characters are rendered depending on this character.
2153 The joining type has the following possible values:
2155 -- Constant: int UC_JOINING_TYPE_U
2156 “Non joining”: Characters of this joining type prohibit joining.
2158 -- Constant: int UC_JOINING_TYPE_T
2159 “Transparent”: Characters of this joining type are skipped when
2160 considering joining.
2162 -- Constant: int UC_JOINING_TYPE_C
2163 “Join causing”: Characters of this joining type cause their
2164 neighbour characters to change their shapes but don’t change their own shape.
2167 -- Constant: int UC_JOINING_TYPE_L
2168 “Left joining”: Characters of this joining type have two shapes,
2169 isolated and initial. Such characters currently don’t exist.
2171 -- Constant: int UC_JOINING_TYPE_R
2172 “Right joining”: Characters of this joining type have two shapes, isolated and final.
2175 -- Constant: int UC_JOINING_TYPE_D
2176 “Dual joining”: Characters of this joining type have four shapes,
2177 initial, medial, final, and isolated.
2179 The following functions implement the association between a joining type and its name.
2182 -- Function: const char * uc_joining_type_name (int JOINING_TYPE)
2183 Returns the name of a joining type.
2185 -- Function: const char * uc_joining_type_long_name (int JOINING_TYPE)
2186 Returns the long name of a joining type.
2188 -- Function: int uc_joining_type_byname (const char *JOINING_TYPE_NAME)
2189 Returns the joining type given by name, e.g. ‘"D"’, or by long
2190 name, e.g. ‘"Dual Joining"’. This lookup ignores spaces,
2191 underscores, or hyphens as word separators and is case-insignificant.
2194 The following function gives the joining type of every Unicode character.
2197 -- Function: int uc_joining_type (ucs4_t UC)
2198 Returns the joining type of a Unicode character.
2201 File: libunistring.info, Node: Joining group, Prev: Joining type, Up: Arabic shaping
2203 8.8.2 Joining group of Arabic characters
2204 ----------------------------------------
2206 The joining group of a character describes how the character’s shape
2207 is modified in the four contexts of dual-joining characters or in the
2208 two contexts of right-joining characters.
2210 The joining group has the following possible values:
2212 -- Constant: int UC_JOINING_GROUP_NONE
2213 -- Constant: int UC_JOINING_GROUP_AIN
2214 -- Constant: int UC_JOINING_GROUP_ALAPH
2215 -- Constant: int UC_JOINING_GROUP_ALEF
2216 -- Constant: int UC_JOINING_GROUP_BEH
2217 -- Constant: int UC_JOINING_GROUP_BETH
2218 -- Constant: int UC_JOINING_GROUP_BURUSHASKI_YEH_BARREE
2219 -- Constant: int UC_JOINING_GROUP_DAL
2220 -- Constant: int UC_JOINING_GROUP_DALATH_RISH
2221 -- Constant: int UC_JOINING_GROUP_E
2222 -- Constant: int UC_JOINING_GROUP_FARSI_YEH
2223 -- Constant: int UC_JOINING_GROUP_FE
2224 -- Constant: int UC_JOINING_GROUP_FEH
2225 -- Constant: int UC_JOINING_GROUP_FINAL_SEMKATH
2226 -- Constant: int UC_JOINING_GROUP_GAF
2227 -- Constant: int UC_JOINING_GROUP_GAMAL
2228 -- Constant: int UC_JOINING_GROUP_HAH
2229 -- Constant: int UC_JOINING_GROUP_HE
2230 -- Constant: int UC_JOINING_GROUP_HEH
2231 -- Constant: int UC_JOINING_GROUP_HEH_GOAL
2232 -- Constant: int UC_JOINING_GROUP_HETH
2233 -- Constant: int UC_JOINING_GROUP_KAF
2234 -- Constant: int UC_JOINING_GROUP_KAPH
2235 -- Constant: int UC_JOINING_GROUP_KHAPH
2236 -- Constant: int UC_JOINING_GROUP_KNOTTED_HEH
2237 -- Constant: int UC_JOINING_GROUP_LAM
2238 -- Constant: int UC_JOINING_GROUP_LAMADH
2239 -- Constant: int UC_JOINING_GROUP_MEEM
2240 -- Constant: int UC_JOINING_GROUP_MIM
2241 -- Constant: int UC_JOINING_GROUP_NOON
2242 -- Constant: int UC_JOINING_GROUP_NUN
2243 -- Constant: int UC_JOINING_GROUP_NYA
2244 -- Constant: int UC_JOINING_GROUP_PE
2245 -- Constant: int UC_JOINING_GROUP_QAF
2246 -- Constant: int UC_JOINING_GROUP_QAPH
2247 -- Constant: int UC_JOINING_GROUP_REH
2248 -- Constant: int UC_JOINING_GROUP_REVERSED_PE
2249 -- Constant: int UC_JOINING_GROUP_SAD
2250 -- Constant: int UC_JOINING_GROUP_SADHE
2251 -- Constant: int UC_JOINING_GROUP_SEEN
2252 -- Constant: int UC_JOINING_GROUP_SEMKATH
2253 -- Constant: int UC_JOINING_GROUP_SHIN
2254 -- Constant: int UC_JOINING_GROUP_SWASH_KAF
2255 -- Constant: int UC_JOINING_GROUP_SYRIAC_WAW
2256 -- Constant: int UC_JOINING_GROUP_TAH
2257 -- Constant: int UC_JOINING_GROUP_TAW
2258 -- Constant: int UC_JOINING_GROUP_TEH_MARBUTA
2259 -- Constant: int UC_JOINING_GROUP_TEH_MARBUTA_GOAL
2260 -- Constant: int UC_JOINING_GROUP_TETH
2261 -- Constant: int UC_JOINING_GROUP_WAW
2262 -- Constant: int UC_JOINING_GROUP_YEH
2263 -- Constant: int UC_JOINING_GROUP_YEH_BARREE
2264 -- Constant: int UC_JOINING_GROUP_YEH_WITH_TAIL
2265 -- Constant: int UC_JOINING_GROUP_YUDH
2266 -- Constant: int UC_JOINING_GROUP_YUDH_HE
2267 -- Constant: int UC_JOINING_GROUP_ZAIN
2268 -- Constant: int UC_JOINING_GROUP_ZHAIN
2270 The following functions implement the association between a joining group and its name.
2273 -- Function: const char * uc_joining_group_name (int JOINING_GROUP)
2274 Returns the name of a joining group.
2276 -- Function: int uc_joining_group_byname (const char
2277 *JOINING_GROUP_NAME)
2278 Returns the joining group given by name, e.g. ‘"Teh_Marbuta"’.
2279 This lookup ignores spaces, underscores, or hyphens as word
2280 separators and is case-insignificant.
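The name-matching rule used by the `..._byname` lookups in this chapter (separators ignored, case-insignificant) might be approximated as below. This is a simplified sketch, not the library's implementation: `names_match` is a hypothetical helper, it handles ASCII case only, and it assumes that spaces, underscores, and hyphens can simply be skipped wherever they occur.

```c
#include <ctype.h>
#include <stdbool.h>

/* True for the three characters treated as word separators. */
static bool is_separator (char c)
{
  return c == ' ' || c == '_' || c == '-';
}

/* Compares two names, skipping separators and ignoring ASCII case. */
static bool names_match (const char *a, const char *b)
{
  for (;;)
    {
      while (is_separator (*a)) a++;
      while (is_separator (*b)) b++;
      if (*a == '\0' || *b == '\0')
        return *a == *b;   /* equal only if both names are exhausted */
      if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
        return false;
      a++;
      b++;
    }
}
```

Under this rule `"Teh_Marbuta"`, `"teh marbuta"`, and `"TEH-MARBUTA"` all compare equal.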
2282 The following function gives the joining group of every Unicode character.
2285 -- Function: int uc_joining_group (ucs4_t UC)
2286 Returns the joining group of a Unicode character.
2289 File: libunistring.info, Node: Properties, Next: Scripts, Prev: Arabic shaping, Up: unictype.h
2294 This section defines boolean properties of Unicode characters. That
2295 is, a character either has the given property or does not have it.
2296 In other words, the property can be viewed as a subset of the set of Unicode characters.
2299 The GNU libunistring library provides two kinds of API for working
2300 with properties. The object oriented API uses a type ‘uc_property_t’ to
2301 designate a property. In the function-based API, which is a bit more
2302 low level, a property is merely a function.
2306 * Properties as objects::
2307 * Properties as functions::
2310 File: libunistring.info, Node: Properties as objects, Next: Properties as functions, Up: Properties
2312 8.9.1 Properties as objects – the object oriented API
2313 -----------------------------------------------------
2315 The following type designates a property on Unicode characters.
2317 -- Type: uc_property_t
2318 This data type denotes a boolean property on Unicode characters.
2319 It is an immediate type that can be copied by simple assignment,
2320 without involving memory allocation. It is not an array type.
2322 Many Unicode properties are predefined.
2324 The following are general properties.
2326 -- Constant: uc_property_t UC_PROPERTY_WHITE_SPACE
2327 -- Constant: uc_property_t UC_PROPERTY_ALPHABETIC
2328 -- Constant: uc_property_t UC_PROPERTY_OTHER_ALPHABETIC
2329 -- Constant: uc_property_t UC_PROPERTY_NOT_A_CHARACTER
2330 -- Constant: uc_property_t UC_PROPERTY_DEFAULT_IGNORABLE_CODE_POINT
2331 -- Constant: uc_property_t
2332 UC_PROPERTY_OTHER_DEFAULT_IGNORABLE_CODE_POINT
2333 -- Constant: uc_property_t UC_PROPERTY_DEPRECATED
2334 -- Constant: uc_property_t UC_PROPERTY_LOGICAL_ORDER_EXCEPTION
2335 -- Constant: uc_property_t UC_PROPERTY_VARIATION_SELECTOR
2336 -- Constant: uc_property_t UC_PROPERTY_PRIVATE_USE
2337 -- Constant: uc_property_t UC_PROPERTY_UNASSIGNED_CODE_VALUE
2339 The following properties are related to case folding.
2341 -- Constant: uc_property_t UC_PROPERTY_UPPERCASE
2342 -- Constant: uc_property_t UC_PROPERTY_OTHER_UPPERCASE
2343 -- Constant: uc_property_t UC_PROPERTY_LOWERCASE
2344 -- Constant: uc_property_t UC_PROPERTY_OTHER_LOWERCASE
2345 -- Constant: uc_property_t UC_PROPERTY_TITLECASE
2346 -- Constant: uc_property_t UC_PROPERTY_CASED
2347 -- Constant: uc_property_t UC_PROPERTY_CASE_IGNORABLE
2348 -- Constant: uc_property_t UC_PROPERTY_CHANGES_WHEN_LOWERCASED
2349 -- Constant: uc_property_t UC_PROPERTY_CHANGES_WHEN_UPPERCASED
2350 -- Constant: uc_property_t UC_PROPERTY_CHANGES_WHEN_TITLECASED
2351 -- Constant: uc_property_t UC_PROPERTY_CHANGES_WHEN_CASEFOLDED
2352 -- Constant: uc_property_t UC_PROPERTY_CHANGES_WHEN_CASEMAPPED
2353 -- Constant: uc_property_t UC_PROPERTY_SOFT_DOTTED
2355 The following properties are related to identifiers.
2357 -- Constant: uc_property_t UC_PROPERTY_ID_START
2358 -- Constant: uc_property_t UC_PROPERTY_OTHER_ID_START
2359 -- Constant: uc_property_t UC_PROPERTY_ID_CONTINUE
2360 -- Constant: uc_property_t UC_PROPERTY_OTHER_ID_CONTINUE
2361 -- Constant: uc_property_t UC_PROPERTY_XID_START
2362 -- Constant: uc_property_t UC_PROPERTY_XID_CONTINUE
2363 -- Constant: uc_property_t UC_PROPERTY_PATTERN_WHITE_SPACE
2364 -- Constant: uc_property_t UC_PROPERTY_PATTERN_SYNTAX
2366 The following properties have an influence on shaping and rendering.
2368 -- Constant: uc_property_t UC_PROPERTY_JOIN_CONTROL
2369 -- Constant: uc_property_t UC_PROPERTY_GRAPHEME_BASE
2370 -- Constant: uc_property_t UC_PROPERTY_GRAPHEME_EXTEND
2371 -- Constant: uc_property_t UC_PROPERTY_OTHER_GRAPHEME_EXTEND
2372 -- Constant: uc_property_t UC_PROPERTY_GRAPHEME_LINK
2374 The following properties relate to bidirectional reordering.
2376 -- Constant: uc_property_t UC_PROPERTY_BIDI_CONTROL
2377 -- Constant: uc_property_t UC_PROPERTY_BIDI_LEFT_TO_RIGHT
2378 -- Constant: uc_property_t UC_PROPERTY_BIDI_HEBREW_RIGHT_TO_LEFT
2379 -- Constant: uc_property_t UC_PROPERTY_BIDI_ARABIC_RIGHT_TO_LEFT
2380 -- Constant: uc_property_t UC_PROPERTY_BIDI_EUROPEAN_DIGIT
2381 -- Constant: uc_property_t UC_PROPERTY_BIDI_EUR_NUM_SEPARATOR
2382 -- Constant: uc_property_t UC_PROPERTY_BIDI_EUR_NUM_TERMINATOR
2383 -- Constant: uc_property_t UC_PROPERTY_BIDI_ARABIC_DIGIT
2384 -- Constant: uc_property_t UC_PROPERTY_BIDI_COMMON_SEPARATOR
2385 -- Constant: uc_property_t UC_PROPERTY_BIDI_BLOCK_SEPARATOR
2386 -- Constant: uc_property_t UC_PROPERTY_BIDI_SEGMENT_SEPARATOR
2387 -- Constant: uc_property_t UC_PROPERTY_BIDI_WHITESPACE
2388 -- Constant: uc_property_t UC_PROPERTY_BIDI_NON_SPACING_MARK
2389 -- Constant: uc_property_t UC_PROPERTY_BIDI_BOUNDARY_NEUTRAL
2390 -- Constant: uc_property_t UC_PROPERTY_BIDI_PDF
2391 -- Constant: uc_property_t UC_PROPERTY_BIDI_EMBEDDING_OR_OVERRIDE
2392 -- Constant: uc_property_t UC_PROPERTY_BIDI_OTHER_NEUTRAL
2394 The following properties deal with number representations.
2396 -- Constant: uc_property_t UC_PROPERTY_HEX_DIGIT
2397 -- Constant: uc_property_t UC_PROPERTY_ASCII_HEX_DIGIT
2399 The following properties deal with CJK.
2401 -- Constant: uc_property_t UC_PROPERTY_IDEOGRAPHIC
2402 -- Constant: uc_property_t UC_PROPERTY_UNIFIED_IDEOGRAPH
2403 -- Constant: uc_property_t UC_PROPERTY_RADICAL
2404 -- Constant: uc_property_t UC_PROPERTY_IDS_BINARY_OPERATOR
2405 -- Constant: uc_property_t UC_PROPERTY_IDS_TRINARY_OPERATOR
2407 Other miscellaneous properties are:
2409 -- Constant: uc_property_t UC_PROPERTY_ZERO_WIDTH
2410 -- Constant: uc_property_t UC_PROPERTY_SPACE
2411 -- Constant: uc_property_t UC_PROPERTY_NON_BREAK
2412 -- Constant: uc_property_t UC_PROPERTY_ISO_CONTROL
2413 -- Constant: uc_property_t UC_PROPERTY_FORMAT_CONTROL
2414 -- Constant: uc_property_t UC_PROPERTY_DASH
2415 -- Constant: uc_property_t UC_PROPERTY_HYPHEN
2416 -- Constant: uc_property_t UC_PROPERTY_PUNCTUATION
2417 -- Constant: uc_property_t UC_PROPERTY_LINE_SEPARATOR
2418 -- Constant: uc_property_t UC_PROPERTY_PARAGRAPH_SEPARATOR
2419 -- Constant: uc_property_t UC_PROPERTY_QUOTATION_MARK
2420 -- Constant: uc_property_t UC_PROPERTY_SENTENCE_TERMINAL
2421 -- Constant: uc_property_t UC_PROPERTY_TERMINAL_PUNCTUATION
2422 -- Constant: uc_property_t UC_PROPERTY_CURRENCY_SYMBOL
2423 -- Constant: uc_property_t UC_PROPERTY_MATH
2424 -- Constant: uc_property_t UC_PROPERTY_OTHER_MATH
2425 -- Constant: uc_property_t UC_PROPERTY_PAIRED_PUNCTUATION
2426 -- Constant: uc_property_t UC_PROPERTY_LEFT_OF_PAIR
2427 -- Constant: uc_property_t UC_PROPERTY_COMBINING
2428 -- Constant: uc_property_t UC_PROPERTY_COMPOSITE
2429 -- Constant: uc_property_t UC_PROPERTY_DECIMAL_DIGIT
2430 -- Constant: uc_property_t UC_PROPERTY_NUMERIC
2431 -- Constant: uc_property_t UC_PROPERTY_DIACRITIC
2432 -- Constant: uc_property_t UC_PROPERTY_EXTENDER
2433 -- Constant: uc_property_t UC_PROPERTY_IGNORABLE_CONTROL
2435 The following function looks up a property by its name.
2437 -- Function: uc_property_t uc_property_byname (const char
2439 Returns the property given by name, e.g. ‘"White space"’. If a
2440 property with the given name exists, the result will satisfy the
2441 ‘uc_property_is_valid’ predicate. Otherwise the result will not
2442 satisfy this predicate and must not be passed to functions that
2443 expect an ‘uc_property_t’ argument.
2445 This lookup ignores spaces, underscores, or hyphens as word
2446 separators, is case-insensitive, and supports the aliases listed
2447 in Unicode’s ‘PropertyAliases.txt’ file.
2449 This function references a big table of all predefined properties.
2450 Its use can significantly increase the size of your application.
2452 -- Function: bool uc_property_is_valid (uc_property_t property)
2453 Returns ‘true’ when the given property is valid, or ‘false’
2456 The following function views a property as a set of Unicode
2459 -- Function: bool uc_is_property (ucs4_t UC, uc_property_t PROPERTY)
2460 Tests whether the Unicode character UC has the given property.
2463 File: libunistring.info, Node: Properties as functions, Prev: Properties as objects, Up: Properties
2465 8.9.2 Properties as functions – the functional API
2466 --------------------------------------------------
2468 The following are general properties.
2470 -- Function: bool uc_is_property_white_space (ucs4_t UC)
2471 -- Function: bool uc_is_property_alphabetic (ucs4_t UC)
2472 -- Function: bool uc_is_property_other_alphabetic (ucs4_t UC)
2473 -- Function: bool uc_is_property_not_a_character (ucs4_t UC)
2474 -- Function: bool uc_is_property_default_ignorable_code_point (ucs4_t
2476 -- Function: bool uc_is_property_other_default_ignorable_code_point
2478 -- Function: bool uc_is_property_deprecated (ucs4_t UC)
2479 -- Function: bool uc_is_property_logical_order_exception (ucs4_t UC)
2480 -- Function: bool uc_is_property_variation_selector (ucs4_t UC)
2481 -- Function: bool uc_is_property_private_use (ucs4_t UC)
2482 -- Function: bool uc_is_property_unassigned_code_value (ucs4_t UC)
2484 The following properties are related to case folding.
2486 -- Function: bool uc_is_property_uppercase (ucs4_t UC)
2487 -- Function: bool uc_is_property_other_uppercase (ucs4_t UC)
2488 -- Function: bool uc_is_property_lowercase (ucs4_t UC)
2489 -- Function: bool uc_is_property_other_lowercase (ucs4_t UC)
2490 -- Function: bool uc_is_property_titlecase (ucs4_t UC)
2491 -- Function: bool uc_is_property_cased (ucs4_t UC)
2492 -- Function: bool uc_is_property_case_ignorable (ucs4_t UC)
2493 -- Function: bool uc_is_property_changes_when_lowercased (ucs4_t UC)
2494 -- Function: bool uc_is_property_changes_when_uppercased (ucs4_t UC)
2495 -- Function: bool uc_is_property_changes_when_titlecased (ucs4_t UC)
2496 -- Function: bool uc_is_property_changes_when_casefolded (ucs4_t UC)
2497 -- Function: bool uc_is_property_changes_when_casemapped (ucs4_t UC)
2498 -- Function: bool uc_is_property_soft_dotted (ucs4_t UC)
2500 The following properties are related to identifiers.
2502 -- Function: bool uc_is_property_id_start (ucs4_t UC)
2503 -- Function: bool uc_is_property_other_id_start (ucs4_t UC)
2504 -- Function: bool uc_is_property_id_continue (ucs4_t UC)
2505 -- Function: bool uc_is_property_other_id_continue (ucs4_t UC)
2506 -- Function: bool uc_is_property_xid_start (ucs4_t UC)
2507 -- Function: bool uc_is_property_xid_continue (ucs4_t UC)
2508 -- Function: bool uc_is_property_pattern_white_space (ucs4_t UC)
2509 -- Function: bool uc_is_property_pattern_syntax (ucs4_t UC)
2511 The following properties have an influence on shaping and rendering.
2513 -- Function: bool uc_is_property_join_control (ucs4_t UC)
2514 -- Function: bool uc_is_property_grapheme_base (ucs4_t UC)
2515 -- Function: bool uc_is_property_grapheme_extend (ucs4_t UC)
2516 -- Function: bool uc_is_property_other_grapheme_extend (ucs4_t UC)
2517 -- Function: bool uc_is_property_grapheme_link (ucs4_t UC)
2519 The following properties relate to bidirectional reordering.
2521 -- Function: bool uc_is_property_bidi_control (ucs4_t UC)
2522 -- Function: bool uc_is_property_bidi_left_to_right (ucs4_t UC)
2523 -- Function: bool uc_is_property_bidi_hebrew_right_to_left (ucs4_t UC)
2524 -- Function: bool uc_is_property_bidi_arabic_right_to_left (ucs4_t UC)
2525 -- Function: bool uc_is_property_bidi_european_digit (ucs4_t UC)
2526 -- Function: bool uc_is_property_bidi_eur_num_separator (ucs4_t UC)
2527 -- Function: bool uc_is_property_bidi_eur_num_terminator (ucs4_t UC)
2528 -- Function: bool uc_is_property_bidi_arabic_digit (ucs4_t UC)
2529 -- Function: bool uc_is_property_bidi_common_separator (ucs4_t UC)
2530 -- Function: bool uc_is_property_bidi_block_separator (ucs4_t UC)
2531 -- Function: bool uc_is_property_bidi_segment_separator (ucs4_t UC)
2532 -- Function: bool uc_is_property_bidi_whitespace (ucs4_t UC)
2533 -- Function: bool uc_is_property_bidi_non_spacing_mark (ucs4_t UC)
2534 -- Function: bool uc_is_property_bidi_boundary_neutral (ucs4_t UC)
2535 -- Function: bool uc_is_property_bidi_pdf (ucs4_t UC)
2536 -- Function: bool uc_is_property_bidi_embedding_or_override (ucs4_t UC)
2537 -- Function: bool uc_is_property_bidi_other_neutral (ucs4_t UC)
2539 The following properties deal with number representations.
2541 -- Function: bool uc_is_property_hex_digit (ucs4_t UC)
2542 -- Function: bool uc_is_property_ascii_hex_digit (ucs4_t UC)
2544 The following properties deal with CJK.
2546 -- Function: bool uc_is_property_ideographic (ucs4_t UC)
2547 -- Function: bool uc_is_property_unified_ideograph (ucs4_t UC)
2548 -- Function: bool uc_is_property_radical (ucs4_t UC)
2549 -- Function: bool uc_is_property_ids_binary_operator (ucs4_t UC)
2550 -- Function: bool uc_is_property_ids_trinary_operator (ucs4_t UC)
2552 Other miscellaneous properties are:
2554 -- Function: bool uc_is_property_zero_width (ucs4_t UC)
2555 -- Function: bool uc_is_property_space (ucs4_t UC)
2556 -- Function: bool uc_is_property_non_break (ucs4_t UC)
2557 -- Function: bool uc_is_property_iso_control (ucs4_t UC)
2558 -- Function: bool uc_is_property_format_control (ucs4_t UC)
2559 -- Function: bool uc_is_property_dash (ucs4_t UC)
2560 -- Function: bool uc_is_property_hyphen (ucs4_t UC)
2561 -- Function: bool uc_is_property_punctuation (ucs4_t UC)
2562 -- Function: bool uc_is_property_line_separator (ucs4_t UC)
2563 -- Function: bool uc_is_property_paragraph_separator (ucs4_t UC)
2564 -- Function: bool uc_is_property_quotation_mark (ucs4_t UC)
2565 -- Function: bool uc_is_property_sentence_terminal (ucs4_t UC)
2566 -- Function: bool uc_is_property_terminal_punctuation (ucs4_t UC)
2567 -- Function: bool uc_is_property_currency_symbol (ucs4_t UC)
2568 -- Function: bool uc_is_property_math (ucs4_t UC)
2569 -- Function: bool uc_is_property_other_math (ucs4_t UC)
2570 -- Function: bool uc_is_property_paired_punctuation (ucs4_t UC)
2571 -- Function: bool uc_is_property_left_of_pair (ucs4_t UC)
2572 -- Function: bool uc_is_property_combining (ucs4_t UC)
2573 -- Function: bool uc_is_property_composite (ucs4_t UC)
2574 -- Function: bool uc_is_property_decimal_digit (ucs4_t UC)
2575 -- Function: bool uc_is_property_numeric (ucs4_t UC)
2576 -- Function: bool uc_is_property_diacritic (ucs4_t UC)
2577 -- Function: bool uc_is_property_extender (ucs4_t UC)
2578 -- Function: bool uc_is_property_ignorable_control (ucs4_t UC)
2581 File: libunistring.info, Node: Scripts, Next: Blocks, Prev: Properties, Up: unictype.h
2586 The Unicode characters are subdivided into scripts.
2588 The following type is used to represent a script:
2590 -- Type: uc_script_t
2591 This data type is a structure type that refers to statically
2592 allocated read-only data. It contains the following fields:
2595 The ‘name’ field contains the name of the script.
2597 The following functions look up a script.
2599 -- Function: const uc_script_t * uc_script (ucs4_t UC)
2600 Returns the script of a Unicode character. Returns NULL if UC does
2601 not belong to any script.
2603 -- Function: const uc_script_t * uc_script_byname (const char
2605 Returns the script given by its name, e.g. ‘"HAN"’. Returns NULL
2606 if a script with the given name does not exist.
2608 The following function views a script as a set of Unicode characters.
2610 -- Function: bool uc_is_script (ucs4_t UC, const uc_script_t *SCRIPT)
2611 Tests whether a Unicode character belongs to a given script.
2613 The following gives a global picture of all scripts.
2615 -- Function: void uc_all_scripts (const uc_script_t **SCRIPTS, size_t
2617 Get the list of all scripts. Stores a pointer to an array of all
2618 scripts in ‘*SCRIPTS’ and the length of this array in ‘*COUNT’.
2621 File: libunistring.info, Node: Blocks, Next: ISO C and Java syntax, Prev: Scripts, Up: unictype.h
2626 The Unicode characters are subdivided into blocks. A block is an
2627 interval of Unicode code points.
2629 The following type is used to represent a block.
2632 This data type is a structure type that refers to statically
2633 allocated data. It contains the following fields:
2638 The ‘start’ field is the first Unicode code point in the block.
2640 The ‘end’ field is the last Unicode code point in the block.
2642 The ‘name’ field is the name of the block.
2644 The following function looks up a block.
2646 -- Function: const uc_block_t * uc_block (ucs4_t UC)
2647 Returns the block a character belongs to.
2649 The following function views a block as a set of Unicode characters.
2651 -- Function: bool uc_is_block (ucs4_t UC, const uc_block_t *BLOCK)
2652 Tests whether a Unicode character belongs to a given block.
2654 The following gives a global picture of all blocks.
2656 -- Function: void uc_all_blocks (const uc_block_t **BLOCKS, size_t
2658 Get the list of all blocks. Stores a pointer to an array of all
2659 blocks in ‘*BLOCKS’ and the length of this array in ‘*COUNT’.
2662 File: libunistring.info, Node: ISO C and Java syntax, Next: Classifications like in ISO C, Prev: Blocks, Up: unictype.h
2664 8.12 ISO C and Java syntax
2665 ==========================
2667 The following properties are taken from language standards. The
2668 supported language standards are ISO C 99 and Java.
2670 -- Function: bool uc_is_c_whitespace (ucs4_t UC)
2671 Tests whether a Unicode character is considered whitespace in ISO C
2674 -- Function: bool uc_is_java_whitespace (ucs4_t UC)
2675 Tests whether a Unicode character is considered whitespace in Java.
2677 The following enumerated values are the possible return values of the
2678 functions ‘uc_c_ident_category’ and ‘uc_java_ident_category’.
2680 -- Constant: int UC_IDENTIFIER_START
2681 This return value means that the given character is valid as first
2682 or subsequent character in an identifier.
2684 -- Constant: int UC_IDENTIFIER_VALID
2685 This return value means that the given character is valid as
2686 subsequent character only.
2688 -- Constant: int UC_IDENTIFIER_INVALID
2689 This return value means that the given character is not valid in an
2692 -- Constant: int UC_IDENTIFIER_IGNORABLE
2693 This return value (only for Java) means that the given character is
2696 The following functions determine whether a given character can be a
2697 constituent of an identifier in the given programming language.
2699 -- Function: int uc_c_ident_category (ucs4_t UC)
2700 Returns the categorization of a Unicode character with respect to
2701 the ISO C 99 identifier syntax.
2703 -- Function: int uc_java_ident_category (ucs4_t UC)
2704 Returns the categorization of a Unicode character with respect to
2705 the Java identifier syntax.
2708 File: libunistring.info, Node: Classifications like in ISO C, Prev: ISO C and Java syntax, Up: unictype.h
2710 8.13 Classifications like in ISO C
2711 ==================================
2713 The following character classifications mimic those declared in the
2714 ISO C header files ‘<ctype.h>’ and ‘<wctype.h>’. These functions are
2715 deprecated, because this set of functions was designed with ASCII in
2716 mind and cannot reflect the more diverse reality of the Unicode
2717 character set. But they can be a quick-and-dirty porting aid when
2718 migrating from ‘wchar_t’ APIs to Unicode strings.
2720 -- Function: bool uc_is_alnum (ucs4_t UC)
2721 Tests for any character for which ‘uc_is_alpha’ or ‘uc_is_digit’ is
2724 -- Function: bool uc_is_alpha (ucs4_t UC)
2725 Tests for any character for which ‘uc_is_upper’ or ‘uc_is_lower’ is
2726 true, or any character that is one of a locale-specific set of
2727 characters for which none of ‘uc_is_cntrl’, ‘uc_is_digit’,
2728 ‘uc_is_punct’, or ‘uc_is_space’ is true.
2730 -- Function: bool uc_is_cntrl (ucs4_t UC)
2731 Tests for any control character.
2733 -- Function: bool uc_is_digit (ucs4_t UC)
2734 Tests for any character that corresponds to a decimal-digit
2737 -- Function: bool uc_is_graph (ucs4_t UC)
2738 Tests for any character for which ‘uc_is_print’ is true and
2739 ‘uc_is_space’ is false.
2741 -- Function: bool uc_is_lower (ucs4_t UC)
2742 Tests for any character that corresponds to a lowercase letter or
2743 is one of a locale-specific set of characters for which none of
2744 ‘uc_is_cntrl’, ‘uc_is_digit’, ‘uc_is_punct’, or ‘uc_is_space’ is
2747 -- Function: bool uc_is_print (ucs4_t UC)
2748 Tests for any printing character.
2750 -- Function: bool uc_is_punct (ucs4_t UC)
2751 Tests for any printing character that is one of a locale-specific
2752 set of characters for which neither ‘uc_is_space’ nor ‘uc_is_alnum’
2755 -- Function: bool uc_is_space (ucs4_t UC)
2756 Tests for any character that corresponds to a locale-specific set of
2757 characters for which none of ‘uc_is_alnum’, ‘uc_is_graph’, or
2758 ‘uc_is_punct’ is true.
2760 -- Function: bool uc_is_upper (ucs4_t UC)
2761 Tests for any character that corresponds to an uppercase letter or
2762 is one of a locale-specific set of characters for which none of
2763 ‘uc_is_cntrl’, ‘uc_is_digit’, ‘uc_is_punct’, or ‘uc_is_space’ is
2766 -- Function: bool uc_is_xdigit (ucs4_t UC)
2767 Tests for any character that corresponds to a hexadecimal-digit
2770 -- Function: bool uc_is_blank (ucs4_t UC)
2771 Tests for any character that corresponds to a standard blank
2772 character or a locale-specific set of characters for which
2773 ‘uc_is_alnum’ is false.
2776 File: libunistring.info, Node: uniwidth.h, Next: unigbrk.h, Prev: unictype.h, Up: Top
2778 9 Display width ‘<uniwidth.h>’
2779 ******************************
2781 This include file declares functions that return the display width,
2782 measured in columns, of characters or strings, when output to a device
2783 that uses non-proportional fonts.
2785 Note that for some rarely used characters the actual fonts or
2786 terminal emulators can use a different width. There is no mechanism for
2787 communicating the display width of characters across a Unix
2788 pseudo-terminal (tty). Also, there are scripts with complex rendering,
2789 like the Indic scripts. For these scripts, there is no such concept as
2790 non-proportional fonts. Therefore the results of these functions are
2791 accurate for most scripts and most characters, but they can fail to
2792 represent the actual display width.
2794 These functions are locale dependent. The ENCODING argument
2795 identifies the encoding (e.g. ‘"ISO-8859-2"’ for Polish).
2797 -- Function: int uc_width (ucs4_t UC, const char *ENCODING)
2798 Determines and returns the number of column positions required for
2799 UC. Returns -1 if UC is a control character that has an influence
2800 on the column position when output.
2802 -- Function: int u8_width (const uint8_t *S, size_t N, const char
2804 -- Function: int u16_width (const uint16_t *S, size_t N, const char
2806 -- Function: int u32_width (const uint32_t *S, size_t N, const char
2808 Determines and returns the number of column positions required for
2809 the first N units (or fewer if S ends before this) in S. This function
2810 ignores control characters in the string.
2812 -- Function: int u8_strwidth (const uint8_t *S, const char *ENCODING)
2813 -- Function: int u16_strwidth (const uint16_t *S, const char *ENCODING)
2814 -- Function: int u32_strwidth (const uint32_t *S, const char *ENCODING)
2815 Determines and returns the number of column positions required for
2816 S. This function ignores control characters in the string.
2819 File: libunistring.info, Node: unigbrk.h, Next: uniwbrk.h, Prev: uniwidth.h, Up: Top
2821 10 Grapheme cluster breaks in strings ‘<unigbrk.h>’
2822 ***************************************************
2824 This include file declares functions for determining where in a
2825 string “grapheme clusters” start and end. A “grapheme cluster” is an
2826 approximation to a user-perceived character, which sometimes corresponds
2827 to multiple Unicode characters. Editing operations such as mouse
2828 selection, cursor movement, and backspacing often operate on grapheme
2829 clusters as units, not on individual characters.
2831 Some grapheme clusters are built from a base character and a
2832 combining character. The letter ‘é’, for example, is most commonly
2833 represented in Unicode as a single character U+00E9 LATIN SMALL LETTER E
2834 WITH ACUTE. It is, however, equally valid to use the pair of characters
2835 U+0065 LATIN SMALL LETTER E followed by U+0301 COMBINING ACUTE ACCENT.
2836 Since the user would perceive this pair of characters as a single
2837 character, they would be grouped into a single grapheme cluster.
2839 But there are also grapheme clusters that consist of several base
2840 characters. For example, a Devanagari letter and a Devanagari vowel
2841 sign that follows it may form a grapheme cluster. Similarly, some pairs
2842 of Thai characters and Hangul syllables (formed by two or three Hangul
2843 characters) are grapheme clusters.
2847 * Grapheme cluster breaks in a string::
2848 * Grapheme cluster break property::
2851 File: libunistring.info, Node: Grapheme cluster breaks in a string, Next: Grapheme cluster break property, Up: unigbrk.h
2853 10.1 Grapheme cluster breaks in a string
2854 ========================================
2856 The following functions find a single boundary between grapheme
2857 clusters in a string.
2859 -- Function: const uint8_t * u8_grapheme_next (const uint8_t *S, const uint8_t
2861 -- Function: const uint16_t * u16_grapheme_next (const uint16_t *S, const uint16_t
2863 -- Function: const uint32_t * u32_grapheme_next (const uint32_t *S, const uint32_t
2865 Returns the start of the next grapheme cluster following S, or END
2866 if no grapheme cluster break is encountered before it. Returns
2867 NULL if and only if ‘S == END’.
2869 -- Function: const uint8_t * u8_grapheme_prev (const uint8_t *S, const uint8_t
2871 -- Function: const uint16_t * u16_grapheme_prev (const uint16_t *S, const uint16_t
2873 -- Function: const uint32_t * u32_grapheme_prev (const uint32_t *S, const uint32_t
2875 Returns the start of the grapheme cluster preceding S, or START if
2876 no grapheme cluster break is encountered before it. Returns NULL
2877 if and only if ‘S == START’.
2879 The following functions determine all of the grapheme cluster
2880 boundaries in a string.
2882 -- Function: void u8_grapheme_breaks (const uint8_t *S, size_t N, char
2884 -- Function: void u16_grapheme_breaks (const uint16_t *S, size_t N,
2886 -- Function: void u32_grapheme_breaks (const uint32_t *S, size_t N,
2888 -- Function: void ulc_grapheme_breaks (const char *S, size_t N, char
2890 Determines the grapheme cluster break points in S, an array of N
2891 units, and stores the result at ‘P[0..N-1]’.
2893 means that there is a grapheme cluster boundary between
2894 ‘S[i-1]’ and ‘S[i]’.
2896 means that ‘S[i-1]’ and ‘S[i]’ are part of the same grapheme
2898 ‘P[0]’ is always set to 1, because there is always a grapheme
2899 cluster break at start of text.
2902 File: libunistring.info, Node: Grapheme cluster break property, Prev: Grapheme cluster breaks in a string, Up: unigbrk.h
2904 10.2 Grapheme cluster break property
2905 ====================================
2907 This is a more low-level API. The grapheme cluster break property is
2908 a property defined in Unicode Standard Annex #29, section “Grapheme
2909 Cluster Boundaries”, see
2910 <http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries>. It
2911 is used for determining the grapheme cluster breaks in a string.
2913 The following are the possible values of the grapheme cluster break
2914 property. More values may be added in the future.
2916 -- Constant: int GBP_OTHER
2917 -- Constant: int GBP_CR
2918 -- Constant: int GBP_LF
2919 -- Constant: int GBP_CONTROL
2920 -- Constant: int GBP_EXTEND
2921 -- Constant: int GBP_PREPEND
2922 -- Constant: int GBP_SPACINGMARK
2923 -- Constant: int GBP_L
2924 -- Constant: int GBP_V
2925 -- Constant: int GBP_T
2926 -- Constant: int GBP_LV
2927 -- Constant: int GBP_LVT
2929 The following function looks up the grapheme cluster break property
2932 -- Function: int uc_graphemeclusterbreak_property (ucs4_t UC)
2933 Returns the Grapheme_Cluster_Break property of a Unicode character.
2935 The following function determines whether there is a grapheme cluster
2936 break between two Unicode characters. It is the primitive upon which
2937 the higher-level functions in the previous section are directly based.
2939 -- Function: bool uc_is_grapheme_break (ucs4_t A, ucs4_t B)
2940 Returns true if there is a grapheme cluster boundary between
2941 Unicode characters A and B.
2943 There is always a grapheme cluster break at the start or end of
2944 text. You can specify zero for A or B to indicate start of text or
2945 end of text, respectively.
2947 This implements the extended (not legacy) grapheme cluster rules
2948 described in the Unicode standard, because the standard says that
2952 File: libunistring.info, Node: uniwbrk.h, Next: unilbrk.h, Prev: unigbrk.h, Up: Top
2954 11 Word breaks in strings ‘<uniwbrk.h>’
2955 ***************************************
2957 This include file declares functions for determining where in a
2958 string “words” start and end. Here “words” are not necessarily the same
2959 as entities that can be looked up in dictionaries, but rather groups of
2960 consecutive characters that should not be split by text processing
2965 * Word breaks in a string::
2966 * Word break property::
2969 File: libunistring.info, Node: Word breaks in a string, Next: Word break property, Up: uniwbrk.h
2971 11.1 Word breaks in a string
2972 ============================
2974 The following functions determine the word breaks in a string.
2976 -- Function: void u8_wordbreaks (const uint8_t *S, size_t N, char *P)
2977 -- Function: void u16_wordbreaks (const uint16_t *S, size_t N, char *P)
2978 -- Function: void u32_wordbreaks (const uint32_t *S, size_t N, char *P)
2979 -- Function: void ulc_wordbreaks (const char *S, size_t N, char *P)
2980 Determines the word break points in S, an array of N units, and
2981 stores the result at ‘P[0..N-1]’.
2983 means that there is a word boundary between ‘S[i-1]’ and
2986 means that ‘S[i-1]’ and ‘S[i]’ must not be separated.
2987 ‘P[0]’ is always set to 0. If an application wants to consider a
2988 word break to be present at the beginning of the string (before
2989 ‘S[0]’) or at the end of the string (after ‘S[0..N-1]’), it has to
2990 treat these cases explicitly.
2993 File: libunistring.info, Node: Word break property, Prev: Word breaks in a string, Up: uniwbrk.h
2995 11.2 Word break property
2996 ========================
2998 This is a more low-level API. The word break property is a property
2999 defined in Unicode Standard Annex #29, section “Word Boundaries”, see
3000 <http://www.unicode.org/reports/tr29/#Word_Boundaries>. It is used for
3001 determining the word breaks in a string.
3003 The following are the possible values of the word break property.
3004 More values may be added in the future.
3006 -- Constant: int WBP_OTHER
3007 -- Constant: int WBP_CR
3008 -- Constant: int WBP_LF
3009 -- Constant: int WBP_NEWLINE
3010 -- Constant: int WBP_EXTEND
3011 -- Constant: int WBP_FORMAT
3012 -- Constant: int WBP_KATAKANA
3013 -- Constant: int WBP_ALETTER
3014 -- Constant: int WBP_MIDNUMLET
3015 -- Constant: int WBP_MIDLETTER
3016 -- Constant: int WBP_MIDNUM
3017 -- Constant: int WBP_NUMERIC
3018 -- Constant: int WBP_EXTENDNUMLET
3020 The following function looks up the word break property of a
3023 -- Function: int uc_wordbreak_property (ucs4_t UC)
3024 Returns the Word_Break property of a Unicode character.
3027 File: libunistring.info, Node: unilbrk.h, Next: uninorm.h, Prev: uniwbrk.h, Up: Top
3029 12 Line breaking ‘<unilbrk.h>’
3030 ******************************
3032 This include file declares functions for determining where in a
3033 string line breaks could or should be introduced, in order to make the
3034 displayed string fit into a column of given width.
3036 These functions are locale dependent. The ENCODING argument
3037 identifies the encoding (e.g. ‘"ISO-8859-2"’ for Polish).
3039 The following enumerated values indicate whether, at a given
3040 position, a line break is possible or not. Given a string S as an
3041 array ‘S[0..N-1]’ and a position I, the values have the following
3044 -- Constant: int UC_BREAK_MANDATORY
3045 This value indicates that ‘S[I]’ is a line break character.
3047 -- Constant: int UC_BREAK_POSSIBLE
3048 This value indicates that a line break may be inserted between
3049 ‘S[I-1]’ and ‘S[I]’.
3051 -- Constant: int UC_BREAK_HYPHENATION
3052 This value indicates that a hyphen and a line break may be inserted
3053 between ‘S[I-1]’ and ‘S[I]’. But beware of language dependent
3056 -- Constant: int UC_BREAK_PROHIBITED
3057 This value indicates that ‘S[I-1]’ and ‘S[I]’ must not be
3060 -- Constant: int UC_BREAK_UNDEFINED
3061 This value is not used as a return value; rather, in the overriding
3062 argument of the ‘u*_width_linebreaks’ functions, it indicates the
3063 absence of an override.
3065 The following functions determine the positions at which line breaks
3068 -- Function: void u8_possible_linebreaks (const uint8_t *S, size_t N,
3069 const char *ENCODING, char *P)
3070 -- Function: void u16_possible_linebreaks (const uint16_t *S, size_t N,
3071 const char *ENCODING, char *P)
3072 -- Function: void u32_possible_linebreaks (const uint32_t *S, size_t N,
3073 const char *ENCODING, char *P)
3074 -- Function: void ulc_possible_linebreaks (const char *S, size_t N,
3075 const char *ENCODING, char *P)
3076 Determines the line break points in S, and stores the result at
3077 ‘P[0..N-1]’. Every ‘P[I]’ is assigned one of the values
3078 ‘UC_BREAK_MANDATORY’, ‘UC_BREAK_POSSIBLE’, ‘UC_BREAK_HYPHENATION’,
3079 ‘UC_BREAK_PROHIBITED’.
3081 The following functions determine where line breaks should be
3082 inserted so that each line fits in a given width, when output to a
3083 device that uses non-proportional fonts.
3085 -- Function: int u8_width_linebreaks (const uint8_t *S, size_t N, int
3086 WIDTH, int START_COLUMN, int AT_END_COLUMNS, const char
3087 *OVERRIDE, const char *ENCODING, char *P)
3088 -- Function: int u16_width_linebreaks (const uint16_t *S, size_t N, int
3089 WIDTH, int START_COLUMN, int AT_END_COLUMNS, const char
3090 *OVERRIDE, const char *ENCODING, char *P)
3091 -- Function: int u32_width_linebreaks (const uint32_t *S, size_t N, int
3092 WIDTH, int START_COLUMN, int AT_END_COLUMNS, const char
3093 *OVERRIDE, const char *ENCODING, char *P)
3094 -- Function: int ulc_width_linebreaks (const char *S, size_t N, int
3095 WIDTH, int START_COLUMN, int AT_END_COLUMNS, const char
3096 *OVERRIDE, const char *ENCODING, char *P)
3097 Chooses the best line breaks, assuming that every character
3098 occupies a width given by the ‘uc_width’ function (see *note
3101 The string is ‘S[0..N-1]’.
3103 The maximum number of columns per line is given as WIDTH. The
3104 starting column of the string is given as START_COLUMN. If the
3105 algorithm shall keep room after the last piece, this amount of room
3106 can be given as AT_END_COLUMNS.
3108 OVERRIDE is an optional override; if ‘OVERRIDE[I] !=
3109 UC_BREAK_UNDEFINED’, ‘OVERRIDE[I]’ takes precedence over ‘P[I]’ as
3110 returned by the ‘u*_possible_linebreaks’ function.
3112 The given ENCODING is used for disambiguating widths in ‘uc_width’.
3114 Returns the column after the end of the string, and stores the
3115 result at ‘P[0..N-1]’. Every ‘P[I]’ is assigned one of the values
3116 ‘UC_BREAK_MANDATORY’, ‘UC_BREAK_POSSIBLE’, ‘UC_BREAK_HYPHENATION’,
3117 ‘UC_BREAK_PROHIBITED’. Here the value ‘UC_BREAK_POSSIBLE’
3118 indicates that a line break _should_ be inserted.
3121 File: libunistring.info, Node: uninorm.h, Next: unicase.h, Prev: unilbrk.h, Up: Top
3123 13 Normalization forms (composition and decomposition) ‘<uninorm.h>’
3124 ********************************************************************
3126 This include file defines functions for transforming Unicode strings
3127 to one of the four normal forms, known as NFC, NFD, NFKC, NFKD. These
3128 transformations involve decomposition and — for NFC and NFKC —
3129 composition of Unicode characters.
3133 * Decomposition of characters::
3134 * Composition of characters::
3135 * Normalization of strings::
3136 * Normalizing comparisons::
3137 * Normalization of streams::
3140 File: libunistring.info, Node: Decomposition of characters, Next: Composition of characters, Up: uninorm.h
3142 13.1 Decomposition of Unicode characters
3143 ========================================
3145 The following enumerated values are the possible types of
3146 decomposition of a Unicode character.
3148 -- Constant: int UC_DECOMP_CANONICAL
3149 Denotes canonical decomposition.
3151 -- Constant: int UC_DECOMP_FONT
3152 UCD marker: ‘<font>’. Denotes a font variant (e.g. a blackletter
3155 -- Constant: int UC_DECOMP_NOBREAK
3156 UCD marker: ‘<noBreak>’. Denotes a no-break version of a space or
3159 -- Constant: int UC_DECOMP_INITIAL
3160 UCD marker: ‘<initial>’. Denotes an initial presentation form
3163 -- Constant: int UC_DECOMP_MEDIAL
3164 UCD marker: ‘<medial>’. Denotes a medial presentation form
3167 -- Constant: int UC_DECOMP_FINAL
3168 UCD marker: ‘<final>’. Denotes a final presentation form (Arabic).
3170 -- Constant: int UC_DECOMP_ISOLATED
3171 UCD marker: ‘<isolated>’. Denotes an isolated presentation form
3174 -- Constant: int UC_DECOMP_CIRCLE
3175 UCD marker: ‘<circle>’. Denotes an encircled form.
3177 -- Constant: int UC_DECOMP_SUPER
3178 UCD marker: ‘<super>’. Denotes a superscript form.
3180 -- Constant: int UC_DECOMP_SUB
3181 UCD marker: ‘<sub>’. Denotes a subscript form.
3183 -- Constant: int UC_DECOMP_VERTICAL
3184 UCD marker: ‘<vertical>’. Denotes a vertical layout presentation
3187 -- Constant: int UC_DECOMP_WIDE
3188 UCD marker: ‘<wide>’. Denotes a wide (or zenkaku) compatibility
3191 -- Constant: int UC_DECOMP_NARROW
3192 UCD marker: ‘<narrow>’. Denotes a narrow (or hankaku)
3193 compatibility character.
3195 -- Constant: int UC_DECOMP_SMALL
3196 UCD marker: ‘<small>’. Denotes a small variant form (CNS
3199 -- Constant: int UC_DECOMP_SQUARE
3200 UCD marker: ‘<square>’. Denotes a CJK squared font variant.
3202 -- Constant: int UC_DECOMP_FRACTION
3203 UCD marker: ‘<fraction>’. Denotes a vulgar fraction form.
3205 -- Constant: int UC_DECOMP_COMPAT
3206 UCD marker: ‘<compat>’. Denotes an otherwise unspecified
3207 compatibility character.
3209 The following constant denotes the maximum size of decomposition of a
3210 single Unicode character.
3212 -- Macro: unsigned int UC_DECOMPOSITION_MAX_LENGTH
3213 This macro expands to a constant that is the required size of
3214 buffer passed to the ‘uc_decomposition’ and
3215 ‘uc_canonical_decomposition’ functions.
3217 The following functions decompose a Unicode character.
3219 -- Function: int uc_decomposition (ucs4_t UC, int *DECOMP_TAG, ucs4_t
3221 Returns the character decomposition mapping of the Unicode
3222 character UC. DECOMPOSITION must point to an array of at least
3223 ‘UC_DECOMPOSITION_MAX_LENGTH’ ‘ucs4_t’ elements.
3225 When a decomposition exists, ‘DECOMPOSITION[0..N-1]’ and
3226 ‘*DECOMP_TAG’ are filled and N is returned. Otherwise -1 is
3229 -- Function: int uc_canonical_decomposition (ucs4_t UC, ucs4_t
3231 Returns the canonical character decomposition mapping of the
3232 Unicode character UC. DECOMPOSITION must point to an array of at
3233 least ‘UC_DECOMPOSITION_MAX_LENGTH’ ‘ucs4_t’ elements.
3235 When a decomposition exists, ‘DECOMPOSITION[0..N-1]’ is filled and
3236 N is returned. Otherwise -1 is returned.
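The calling convention described above (a caller-supplied buffer, the element count as return value, -1 when no decomposition exists) can be sketched without the library. The function below is a hypothetical stand-in, not ‘uc_canonical_decomposition’: it handles only precomposed Hangul syllables, using the arithmetic from the Unicode Standard, and it uses ‘uint32_t’ in place of ‘ucs4_t’. The names ‘hangul_decomposition’ and ‘HANGUL_DECOMP_MAX’ are assumptions made for this illustration, and the mapping returned by the library function may be shaped differently.

```c
#include <stdint.h>

/* Hypothetical stand-in for the buffer convention; not part of
   libunistring.  A Hangul syllable decomposes into at most 3 jamo.  */
enum { HANGUL_DECOMP_MAX = 3 };

/* Fills decomposition[0..n-1] and returns n if UC is a precomposed
   Hangul syllable (U+AC00..U+D7A3).  Returns -1 otherwise.  */
int
hangul_decomposition (uint32_t uc, uint32_t *decomposition)
{
  const uint32_t s_base = 0xAC00, l_base = 0x1100;
  const uint32_t v_base = 0x1161, t_base = 0x11A7;
  const uint32_t t_count = 28, n_count = 21 * 28, s_count = 19 * 21 * 28;

  if (uc < s_base || uc >= s_base + s_count)
    return -1;

  uint32_t s_index = uc - s_base;
  int n = 0;
  decomposition[n++] = l_base + s_index / n_count;              /* leading consonant */
  decomposition[n++] = v_base + (s_index % n_count) / t_count;  /* vowel */
  if (s_index % t_count != 0)
    decomposition[n++] = t_base + s_index % t_count;            /* trailing consonant */
  return n;
}
```

As with the library functions, the caller allocates the array up front and inspects the return value before reading any element.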
3239 File: libunistring.info, Node: Composition of characters, Next: Normalization of strings, Prev: Decomposition of characters, Up: uninorm.h
3241 13.2 Composition of Unicode characters
3242 ======================================
3244 The following function composes a Unicode character from two Unicode characters.
3247 -- Function: ucs4_t uc_composition (ucs4_t UC1, ucs4_t UC2)
3248 Attempts to combine the Unicode characters UC1, UC2. UC1 is known
3249 to have canonical combining class 0.
3251 Returns the combination of UC1 and UC2, if it exists. Returns 0 otherwise.
3254 Not all decompositions can be recombined using this function. See
3255 the Unicode file ‘CompositionExclusions.txt’ for details.
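The return convention of a pairwise composition (the composed character, or 0 when the pair does not combine) can be illustrated with Hangul, where composition is purely arithmetic. This is a minimal sketch under stated assumptions: ‘hangul_composition’ is a hypothetical name, it is not ‘uc_composition’, it covers only Hangul jamo and syllables, and ‘uint32_t’ stands in for ‘ucs4_t’.

```c
#include <stdint.h>

/* Hypothetical stand-in; not part of libunistring.  Returns the
   composition of UC1 and UC2 for Hangul, or 0 if they do not combine.  */
uint32_t
hangul_composition (uint32_t uc1, uint32_t uc2)
{
  const uint32_t s_base = 0xAC00, l_base = 0x1100;
  const uint32_t v_base = 0x1161, t_base = 0x11A7;
  const uint32_t l_count = 19, v_count = 21, t_count = 28;
  const uint32_t s_count = l_count * v_count * t_count;

  /* Leading consonant + vowel -> LV syllable.  */
  if (uc1 >= l_base && uc1 < l_base + l_count
      && uc2 >= v_base && uc2 < v_base + v_count)
    return s_base + ((uc1 - l_base) * v_count + (uc2 - v_base)) * t_count;

  /* LV syllable + trailing consonant -> LVT syllable.  */
  if (uc1 >= s_base && uc1 < s_base + s_count
      && (uc1 - s_base) % t_count == 0
      && uc2 > t_base && uc2 < t_base + t_count)
    return uc1 + (uc2 - t_base);

  return 0;
}
```

Because 0 doubles as the "no composition" marker, callers compare the result against 0 before using it as a character.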
3258 File: libunistring.info, Node: Normalization of strings, Next: Normalizing comparisons, Prev: Composition of characters, Up: uninorm.h
3260 13.3 Normalization of strings
3261 =============================
3263 The Unicode standard defines four normalization forms for Unicode
3264 strings. The following type is used to denote a normalization form.
3267 An object of type ‘uninorm_t’ denotes a Unicode normalization form.
3268 This is a scalar type; its values can be compared with ‘==’.
3270 The following constants denote the four normalization forms.
3272 -- Macro: uninorm_t UNINORM_NFD
3273 Denotes Normalization form D: canonical decomposition.
3275 -- Macro: uninorm_t UNINORM_NFC
3276 Normalization form C: canonical decomposition, then canonical composition.
3279 -- Macro: uninorm_t UNINORM_NFKD
3280 Normalization form KD: compatibility decomposition.
3282 -- Macro: uninorm_t UNINORM_NFKC
3283 Normalization form KC: compatibility decomposition, then canonical composition.
3286 The following functions operate on ‘uninorm_t’ objects.
3288 -- Function: bool uninorm_is_compat_decomposing (uninorm_t NF)
3289 Tests whether the normalization form NF does compatibility decomposition.
3292 -- Function: bool uninorm_is_composing (uninorm_t NF)
3293 Tests whether the normalization form NF includes canonical composition.
3296 -- Function: uninorm_t uninorm_decomposing_form (uninorm_t NF)
3297 Returns the decomposing variant of the normalization form NF. This
3298 maps NFC,NFD → NFD and NFKC,NFKD → NFKD.
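The relationship between the four forms and the three predicates above can be modelled with a small enumeration. This is only a sketch of the documented mapping: ‘form_t’ and the ‘form_*’ functions are hypothetical names, and the real ‘uninorm_t’ values come from ‘<uninorm.h>’.

```c
#include <stdbool.h>

/* Hypothetical model of the four normalization forms; the real
   uninorm_t is defined by <uninorm.h>.  */
typedef enum { FORM_NFD, FORM_NFC, FORM_NFKD, FORM_NFKC } form_t;

/* True for the forms that use compatibility decomposition.  */
bool
form_is_compat_decomposing (form_t nf)
{
  return nf == FORM_NFKD || nf == FORM_NFKC;
}

/* True for the forms that end with canonical composition.  */
bool
form_is_composing (form_t nf)
{
  return nf == FORM_NFC || nf == FORM_NFKC;
}

/* Maps NFC,NFD -> NFD and NFKC,NFKD -> NFKD.  */
form_t
form_decomposing_form (form_t nf)
{
  return form_is_compat_decomposing (nf) ? FORM_NFKD : FORM_NFD;
}
```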
3300 The following functions apply a Unicode normalization form to a string.
3303 -- Function: uint8_t * u8_normalize (uninorm_t NF, const uint8_t *S,
3304 size_t N, uint8_t *RESULTBUF, size_t *LENGTHP)
3305 -- Function: uint16_t * u16_normalize (uninorm_t NF, const uint16_t *S,
3306 size_t N, uint16_t *RESULTBUF, size_t *LENGTHP)
3307 -- Function: uint32_t * u32_normalize (uninorm_t NF, const uint32_t *S,
3308 size_t N, uint32_t *RESULTBUF, size_t *LENGTHP)
3309 Returns the specified normalization form of a string.
3312 File: libunistring.info, Node: Normalizing comparisons, Next: Normalization of streams, Prev: Normalization of strings, Up: uninorm.h
3314 13.4 Normalizing comparisons
3315 ============================
3317 The following functions compare Unicode strings, ignoring differences in normalization.
3320 -- Function: int u8_normcmp (const uint8_t *S1, size_t N1, const
3321 uint8_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
3322 -- Function: int u16_normcmp (const uint16_t *S1, size_t N1, const
3323 uint16_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
3324 -- Function: int u32_normcmp (const uint32_t *S1, size_t N1, const
3325 uint32_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
3326 Compares S1 and S2, ignoring differences in normalization.
3328 NF must be either ‘UNINORM_NFD’ or ‘UNINORM_NFKD’.
3330 If successful, sets ‘*RESULTP’ to -1 if S1 < S2, 0 if S1 = S2, 1 if
3331 S1 > S2, and returns 0. Upon failure, returns -1 with ‘errno’ set.
3333 -- Function: char * u8_normxfrm (const uint8_t *S, size_t N, uninorm_t
3334 NF, char *RESULTBUF, size_t *LENGTHP)
3335 -- Function: char * u16_normxfrm (const uint16_t *S, size_t N,
3336 uninorm_t NF, char *RESULTBUF, size_t *LENGTHP)
3337 -- Function: char * u32_normxfrm (const uint32_t *S, size_t N,
3338 uninorm_t NF, char *RESULTBUF, size_t *LENGTHP)
3339 Converts the string S of length N to a NUL-terminated byte
3340 sequence, in such a way that comparing ‘u8_normxfrm (S1)’ and
3341 ‘u8_normxfrm (S2)’ with the ‘u8_cmp2’ function is equivalent to
3342 comparing S1 and S2 with the ‘u8_normcoll’ function.
3344 NF must be either ‘UNINORM_NFC’ or ‘UNINORM_NFKC’.
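The equivalence stated for the transformed keys relies on a length-aware, byte-wise comparison. The sketch below shows what such a comparison does, assuming the usual lexicographic rule that a proper prefix sorts first. ‘bytes_cmp2’ is a hypothetical helper written for this illustration; real code would call ‘u8_cmp2’ from ‘<unistr.h>’.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a length-aware byte comparison.
   Returns -1, 0 or 1.  */
int
bytes_cmp2 (const unsigned char *s1, size_t n1,
            const unsigned char *s2, size_t n2)
{
  size_t common = n1 < n2 ? n1 : n2;
  /* Compare the part that both sequences have.  */
  int cmp = common > 0 ? memcmp (s1, s2, common) : 0;
  if (cmp != 0)
    return cmp < 0 ? -1 : 1;
  /* Equal so far: the shorter sequence sorts first.  */
  return (n1 > n2) - (n1 < n2);
}
```

Precomputing the keys once and comparing them this way is useful when the same strings are compared many times, for example while sorting.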
3346 -- Function: int u8_normcoll (const uint8_t *S1, size_t N1, const
3347 uint8_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
3348 -- Function: int u16_normcoll (const uint16_t *S1, size_t N1, const
3349 uint16_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
3350 -- Function: int u32_normcoll (const uint32_t *S1, size_t N1, const
3351 uint32_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
3352 Compares S1 and S2, ignoring differences in normalization, using
3353 the collation rules of the current locale.
3355 NF must be either ‘UNINORM_NFC’ or ‘UNINORM_NFKC’.
3357 If successful, sets ‘*RESULTP’ to -1 if S1 < S2, 0 if S1 = S2, 1 if
3358 S1 > S2, and returns 0. Upon failure, returns -1 with ‘errno’ set.
3361 File: libunistring.info, Node: Normalization of streams, Prev: Normalizing comparisons, Up: uninorm.h
3363 13.5 Normalization of streams of Unicode characters
3364 ===================================================
3366 A “stream of Unicode characters” is essentially a function that
3367 accepts an ‘ucs4_t’ argument repeatedly, optionally combined with a
3368 function that “flushes” the stream.
3370 -- Type: struct uninorm_filter
3371 This is the data type of a stream of Unicode characters that
3372 normalizes its input according to a given normalization form and
3373 passes the normalized character sequence to the encapsulated stream
3374 of Unicode characters.
3376 -- Function: struct uninorm_filter * uninorm_filter_create (uninorm_t
3377 NF, int (*STREAM_FUNC) (void *STREAM_DATA, ucs4_t UC), void
3379 Creates and returns a normalization filter for Unicode characters.
3381 The pair (STREAM_FUNC, STREAM_DATA) is the encapsulated stream.
3382 ‘STREAM_FUNC (STREAM_DATA, UC)’ receives the Unicode character UC
3383 and returns 0 if successful, or -1 with ‘errno’ set upon failure.
3385 Returns the new filter, or NULL with ‘errno’ set upon failure.
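A stream function of the shape described above might look as follows. This is a sketch, not library code: ‘struct code_point_sink’ and ‘sink_write’ are hypothetical names, ‘uint32_t’ stands in for ‘ucs4_t’, and the choice of ‘ENOMEM’ for a full buffer is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sink that collects code points in a caller-owned array.  */
struct code_point_sink
{
  uint32_t *buf;    /* caller-owned storage */
  size_t capacity;  /* number of elements available in buf */
  size_t length;    /* number of elements written so far */
};

/* Has the shape of STREAM_FUNC: receives one character, returns 0 if
   successful, or -1 with errno set upon failure.  */
int
sink_write (void *stream_data, uint32_t uc)
{
  struct code_point_sink *sink = stream_data;
  if (sink->length == sink->capacity)
    {
      errno = ENOMEM;
      return -1;
    }
  sink->buf[sink->length++] = uc;
  return 0;
}
```

With the library, a function like this (declared with ‘ucs4_t’) and a pointer to the sink would form the (STREAM_FUNC, STREAM_DATA) pair given to ‘uninorm_filter_create’.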
3387 -- Function: int uninorm_filter_write (struct uninorm_filter *FILTER,
3389 Stuffs a Unicode character into a normalizing filter. Returns 0 if
3390 successful, or -1 with ‘errno’ set upon failure.
3392 -- Function: int uninorm_filter_flush (struct uninorm_filter *FILTER)
3393 Brings data buffered in the filter to its destination, the
3394 encapsulated stream.
3396 Returns 0 if successful, or -1 with ‘errno’ set upon failure.
3398 Note! If after calling this function, additional characters are
3399 written into the filter, the resulting character sequence in the
3400 encapsulated stream will not necessarily be normalized.
3402 -- Function: int uninorm_filter_free (struct uninorm_filter *FILTER)
3403 Brings data buffered in the filter to its destination, the
3404 encapsulated stream, then closes and frees the filter.
3406 Returns 0 if successful, or -1 with ‘errno’ set upon failure.
3409 File: libunistring.info, Node: unicase.h, Next: uniregex.h, Prev: uninorm.h, Up: Top
3411 14 Case mappings ‘<unicase.h>’
3412 ******************************
3414 This include file defines functions for case mapping for Unicode
3415 strings and case insensitive comparison of Unicode strings and C strings.
3418 These string functions fix the problems that were mentioned in *note
3419 char * strings::, namely, they handle the Croatian LETTER DZ WITH CARON,
3420 the German LATIN SMALL LETTER SHARP S, the Greek sigma and the
3421 Lithuanian i correctly.
3425 * Case mappings of characters::
3426 * Case mappings of strings::
3427 * Case mappings of substrings::
3428 * Case insensitive comparison::
3432 File: libunistring.info, Node: Case mappings of characters, Next: Case mappings of strings, Up: unicase.h
3434 14.1 Case mappings of characters
3435 ================================
3437 The following functions implement case mappings on Unicode characters,
3438 but only for those cases where the result of the mapping is again a
3439 single Unicode character.
3441 These mappings are locale and context independent.
3443 *WARNING!* These functions are not sufficient for languages such as
3444 German, Greek and Lithuanian. Prefer the functions below, which treat
3445 an entire string at once and are language aware.
3447 -- Function: ucs4_t uc_toupper (ucs4_t UC)
3448 Returns the uppercase mapping of the Unicode character UC.
3450 -- Function: ucs4_t uc_tolower (ucs4_t UC)
3451 Returns the lowercase mapping of the Unicode character UC.
3453 -- Function: ucs4_t uc_totitle (ucs4_t UC)
3454 Returns the titlecase mapping of the Unicode character UC.
3456 The titlecase mapping of a character is to be used when the
3457 character should look like upper case and the following characters are lower case.
3460 For most characters, this is the same as the uppercase mapping.
3461 There are only a few characters where the title case variant and the
3462 upper case variant are different. These characters occur in the
3463 Latin writing of the Croatian, Bosnian, and Serbian languages.
3465 Lower case             Title case                                  Upper case
3466 ------------------------------------------------------------------------------------------
3467 LATIN SMALL LETTER LJ  LATIN CAPITAL LETTER L WITH SMALL LETTER J  LATIN CAPITAL LETTER LJ
3470 LATIN SMALL LETTER NJ  LATIN CAPITAL LETTER N WITH SMALL LETTER J  LATIN CAPITAL LETTER NJ
3473 LATIN SMALL LETTER DZ  LATIN CAPITAL LETTER D WITH SMALL LETTER Z  LATIN CAPITAL LETTER DZ
3476 LATIN SMALL LETTER DZ  LATIN CAPITAL LETTER D WITH SMALL LETTER Z  LATIN CAPITAL LETTER DZ
3477 WITH CARON             WITH CARON                                  WITH CARON
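The four digraphs listed above are the code points U+01C4..U+01CC and U+01F1..U+01F3. A table-driven sketch of their title case mapping follows. It is a hypothetical stand-in, not ‘uc_totitle’: ‘digraph_totitle’ covers only these four digraphs and returns every other character unchanged, and ‘uint32_t’ stands in for ‘ucs4_t’.

```c
#include <stddef.h>
#include <stdint.h>

/* One row per digraph: lower case, title case, upper case code point.  */
struct digraph_case { uint32_t lower, title, upper; };

static const struct digraph_case digraphs[] = {
  { 0x01C9, 0x01C8, 0x01C7 },  /* lj, Lj, LJ */
  { 0x01CC, 0x01CB, 0x01CA },  /* nj, Nj, NJ */
  { 0x01F3, 0x01F2, 0x01F1 },  /* dz, Dz, DZ */
  { 0x01C6, 0x01C5, 0x01C4 },  /* dz, Dz, DZ, each with caron */
};

/* Hypothetical stand-in covering only the four digraphs.  Returns the
   title case form if UC is any case form of one of them, otherwise UC
   itself.  */
uint32_t
digraph_totitle (uint32_t uc)
{
  for (size_t i = 0; i < sizeof digraphs / sizeof digraphs[0]; i++)
    if (uc == digraphs[i].lower || uc == digraphs[i].title
        || uc == digraphs[i].upper)
      return digraphs[i].title;
  return uc;
}
```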
3481 File: libunistring.info, Node: Case mappings of strings, Next: Case mappings of substrings, Prev: Case mappings of characters, Up: unicase.h
3483 14.2 Case mappings of strings
3484 =============================
3486 Case mapping should always be performed on entire strings, not on
3487 individual characters. The functions in this section do so.
3489 These functions allow you to apply a normalization after the case
3490 mapping. The reason is that if you want to treat ‘ä’ and ‘Ä’ the same,
3491 you most often also want to treat the composed and decomposed forms of
3492 such a character, U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS and
3493 U+0041 LATIN CAPITAL LETTER A U+0308 COMBINING DIAERESIS the same. The
3494 NF argument designates the normalization.
3496 These functions are locale dependent. The ISO639_LANGUAGE argument
3497 identifies the language (e.g. ‘"tr"’ for Turkish). NULL means to use
3498 locale independent case mappings.
3500 -- Function: const char * uc_locale_language ()
3501 Returns the ISO 639 language code of the current locale. Returns
3502 ‘""’ if it is unknown, or in the "C" locale.
3504 -- Function: uint8_t * u8_toupper (const uint8_t *S, size_t N, const
3505 char *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF,
3507 -- Function: uint16_t * u16_toupper (const uint16_t *S, size_t N, const
3508 char *ISO639_LANGUAGE, uninorm_t NF, uint16_t *RESULTBUF,
3510 -- Function: uint32_t * u32_toupper (const uint32_t *S, size_t N, const
3511 char *ISO639_LANGUAGE, uninorm_t NF, uint32_t *RESULTBUF,
3513 Returns the uppercase mapping of a string.
3515 The NF argument identifies the normalization form to apply after
3516 the case-mapping. It can also be NULL, for no normalization.
3518 -- Function: uint8_t * u8_tolower (const uint8_t *S, size_t N, const
3519 char *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF,
3521 -- Function: uint16_t * u16_tolower (const uint16_t *S, size_t N, const
3522 char *ISO639_LANGUAGE, uninorm_t NF, uint16_t *RESULTBUF,
3524 -- Function: uint32_t * u32_tolower (const uint32_t *S, size_t N, const
3525 char *ISO639_LANGUAGE, uninorm_t NF, uint32_t *RESULTBUF,
3527 Returns the lowercase mapping of a string.
3529 The NF argument identifies the normalization form to apply after
3530 the case-mapping. It can also be NULL, for no normalization.
3532 -- Function: uint8_t * u8_totitle (const uint8_t *S, size_t N, const
3533 char *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF,
3535 -- Function: uint16_t * u16_totitle (const uint16_t *S, size_t N, const
3536 char *ISO639_LANGUAGE, uninorm_t NF, uint16_t *RESULTBUF,
3538 -- Function: uint32_t * u32_totitle (const uint32_t *S, size_t N, const
3539 char *ISO639_LANGUAGE, uninorm_t NF, uint32_t *RESULTBUF,
3541 Returns the titlecase mapping of a string.
3543 Mapping to title case means that, in each word, the first cased
3544 character is being mapped to title case and the remaining
3545 characters of the word are being mapped to lower case.
3547 The NF argument identifies the normalization form to apply after
3548 the case-mapping. It can also be NULL, for no normalization.
3551 File: libunistring.info, Node: Case mappings of substrings, Next: Case insensitive comparison, Prev: Case mappings of strings, Up: unicase.h
3553 14.3 Case mappings of substrings
3554 ================================
3556 Case mapping of a substring cannot simply be performed by extracting
3557 the substring and then applying the case mapping function to it. This
3558 does not work because case mapping requires some information about the
3559 surrounding characters. The following functions allow you to apply case
3560 mappings to substrings of a given string, while taking into account the
3561 characters that precede it (the “prefix”) and the characters that follow it (the “suffix”).
3564 -- Type: casing_prefix_context_t
3565 This data type denotes the case-mapping context that is given by a
3566 prefix string. It is an immediate type that can be copied by
3567 simple assignment, without involving memory allocation. It is not
3570 -- Constant: casing_prefix_context_t unicase_empty_prefix_context
3571 This constant is the case-mapping context that corresponds to an
3572 empty prefix string.
3574 The following functions return ‘casing_prefix_context_t’ objects:
3576 -- Function: casing_prefix_context_t u8_casing_prefix_context (const
3577 uint8_t *S, size_t N)
3578 -- Function: casing_prefix_context_t u16_casing_prefix_context (const
3579 uint16_t *S, size_t N)
3580 -- Function: casing_prefix_context_t u32_casing_prefix_context (const
3581 uint32_t *S, size_t N)
3582 Returns the case-mapping context of a given prefix string.
3584 -- Function: casing_prefix_context_t u8_casing_prefixes_context (const
3585 uint8_t *S, size_t N, casing_prefix_context_t A_CONTEXT)
3586 -- Function: casing_prefix_context_t u16_casing_prefixes_context (const
3587 uint16_t *S, size_t N, casing_prefix_context_t A_CONTEXT)
3588 -- Function: casing_prefix_context_t u32_casing_prefixes_context (const
3589 uint32_t *S, size_t N, casing_prefix_context_t A_CONTEXT)
3590 Returns the case-mapping context of the prefix concat(A, S), given
3591 the case-mapping context of the prefix A.
3593 -- Type: casing_suffix_context_t
3594 This data type denotes the case-mapping context that is given by a
3595 suffix string. It is an immediate type that can be copied by
3596 simple assignment, without involving memory allocation. It is not
3599 -- Constant: casing_suffix_context_t unicase_empty_suffix_context
3600 This constant is the case-mapping context that corresponds to an
3601 empty suffix string.
3603 The following functions return ‘casing_suffix_context_t’ objects:
3605 -- Function: casing_suffix_context_t u8_casing_suffix_context (const
3606 uint8_t *S, size_t N)
3607 -- Function: casing_suffix_context_t u16_casing_suffix_context (const
3608 uint16_t *S, size_t N)
3609 -- Function: casing_suffix_context_t u32_casing_suffix_context (const
3610 uint32_t *S, size_t N)
3611 Returns the case-mapping context of a given suffix string.
3613 -- Function: casing_suffix_context_t u8_casing_suffixes_context (const
3614 uint8_t *S, size_t N, casing_suffix_context_t A_CONTEXT)
3615 -- Function: casing_suffix_context_t u16_casing_suffixes_context (const
3616 uint16_t *S, size_t N, casing_suffix_context_t A_CONTEXT)
3617 -- Function: casing_suffix_context_t u32_casing_suffixes_context (const
3618 uint32_t *S, size_t N, casing_suffix_context_t A_CONTEXT)
3619 Returns the case-mapping context of the suffix concat(S, A), given
3620 the case-mapping context of the suffix A.
3622 The following functions perform a case mapping, considering the
3623 prefix context and the suffix context.
3625 -- Function: uint8_t * u8_ct_toupper (const uint8_t *S, size_t N,
3626 casing_prefix_context_t PREFIX_CONTEXT,
3627 casing_suffix_context_t SUFFIX_CONTEXT, const char
3628 *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF, size_t
3630 -- Function: uint16_t * u16_ct_toupper (const uint16_t *S, size_t N,
3631 casing_prefix_context_t PREFIX_CONTEXT,
3632 casing_suffix_context_t SUFFIX_CONTEXT, const char
3633 *ISO639_LANGUAGE, uninorm_t NF, uint16_t *RESULTBUF, size_t
3635 -- Function: uint32_t * u32_ct_toupper (const uint32_t *S, size_t N,
3636 casing_prefix_context_t PREFIX_CONTEXT,
3637 casing_suffix_context_t SUFFIX_CONTEXT, const char
3638 *ISO639_LANGUAGE, uninorm_t NF, uint32_t *RESULTBUF, size_t
3640 Returns the uppercase mapping of a string that is surrounded by a
3641 prefix and a suffix.
3643 -- Function: uint8_t * u8_ct_tolower (const uint8_t *S, size_t N,
3644 casing_prefix_context_t PREFIX_CONTEXT,
3645 casing_suffix_context_t SUFFIX_CONTEXT, const char
3646 *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF, size_t
3648 -- Function: uint16_t * u16_ct_tolower (const uint16_t *S, size_t N,
3649 casing_prefix_context_t PREFIX_CONTEXT,
3650 casing_suffix_context_t SUFFIX_CONTEXT, const char
3651 *ISO639_LANGUAGE, uninorm_t NF, uint16_t *RESULTBUF, size_t
3653 -- Function: uint32_t * u32_ct_tolower (const uint32_t *S, size_t N,
3654 casing_prefix_context_t PREFIX_CONTEXT,
3655 casing_suffix_context_t SUFFIX_CONTEXT, const char
3656 *ISO639_LANGUAGE, uninorm_t NF, uint32_t *RESULTBUF, size_t
3658 Returns the lowercase mapping of a string that is surrounded by a
3659 prefix and a suffix.
3661 -- Function: uint8_t * u8_ct_totitle (const uint8_t *S, size_t N,
3662 casing_prefix_context_t PREFIX_CONTEXT,
3663 casing_suffix_context_t SUFFIX_CONTEXT, const char
3664 *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF, size_t
3666 -- Function: uint16_t * u16_ct_totitle (const uint16_t *S, size_t N,
3667 casing_prefix_context_t PREFIX_CONTEXT,
3668 casing_suffix_context_t SUFFIX_CONTEXT, const char
3669 *ISO639_LANGUAGE, uninorm_t NF, uint16_t *RESULTBUF, size_t
3671 -- Function: uint32_t * u32_ct_totitle (const uint32_t *S, size_t N,
3672 casing_prefix_context_t PREFIX_CONTEXT,
3673 casing_suffix_context_t SUFFIX_CONTEXT, const char
3674 *ISO639_LANGUAGE, uninorm_t NF, uint32_t *RESULTBUF, size_t
3676 Returns the titlecase mapping of a string that is surrounded by a
3677 prefix and a suffix.
3679 For example, to uppercase the UTF-8 substring between ‘s +
3680 start_index’ and ‘s + end_index’ of a string that extends from ‘s’ to ‘s
3681 + u8_strlen (s)’, you can use the statements
     size_t result_length;
     /* NULL for NF: no normalization.  NULL for RESULTBUF: the result
        is freshly allocated.  */
     uint8_t *result =
       u8_ct_toupper (s + start_index, end_index - start_index,
                      u8_casing_prefix_context (s, start_index),
                      u8_casing_suffix_context (s + end_index,
                                                u8_strlen (s) - end_index),
                      iso639_language, NULL, NULL, &result_length);
3692 File: libunistring.info, Node: Case insensitive comparison, Next: Case detection, Prev: Case mappings of substrings, Up: unicase.h
3694 14.4 Case insensitive comparison
3695 ================================
3697 The following functions implement comparison that ignores differences
3698 in case and normalization.
3700 -- Function: uint8_t * u8_casefold (const uint8_t *S, size_t N, const
3701 char *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF,
3703 -- Function: uint16_t * u16_casefold (const uint16_t *S, size_t N,
3704 const char *ISO639_LANGUAGE, uninorm_t NF, uint16_t
3705 *RESULTBUF, size_t *LENGTHP)
3706 -- Function: uint32_t * u32_casefold (const uint32_t *S, size_t N,
3707 const char *ISO639_LANGUAGE, uninorm_t NF, uint32_t
3708 *RESULTBUF, size_t *LENGTHP)
3709 Returns the case folded string.
3711 Comparing ‘u8_casefold (S1)’ and ‘u8_casefold (S2)’ with the
3712 ‘u8_cmp2’ function is equivalent to comparing S1 and S2 with
3715 The NF argument identifies the normalization form to apply after
3716 the case-mapping. It can also be NULL, for no normalization.
3718 -- Function: uint8_t * u8_ct_casefold (const uint8_t *S, size_t N,
3719 casing_prefix_context_t PREFIX_CONTEXT,
3720 casing_suffix_context_t SUFFIX_CONTEXT, const char
3721 *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF, size_t
3723 -- Function: uint16_t * u16_ct_casefold (const uint16_t *S, size_t N,
3724 casing_prefix_context_t PREFIX_CONTEXT,
3725 casing_suffix_context_t SUFFIX_CONTEXT, const char
3726 *ISO639_LANGUAGE, uninorm_t NF, uint16_t *RESULTBUF, size_t
3728 -- Function: uint32_t * u32_ct_casefold (const uint32_t *S, size_t N,
3729 casing_prefix_context_t PREFIX_CONTEXT,
3730 casing_suffix_context_t SUFFIX_CONTEXT, const char
3731 *ISO639_LANGUAGE, uninorm_t NF, uint32_t *RESULTBUF, size_t
3733 Returns the case folded string. The case folding takes into
3734 account the case mapping contexts of the prefix and suffix strings.
3736 -- Function: int u8_casecmp (const uint8_t *S1, size_t N1, const
3737 uint8_t *S2, size_t N2, const char *ISO639_LANGUAGE, uninorm_t
3739 -- Function: int u16_casecmp (const uint16_t *S1, size_t N1, const
3740 uint16_t *S2, size_t N2, const char *ISO639_LANGUAGE,
3741 uninorm_t NF, int *RESULTP)
3742 -- Function: int u32_casecmp (const uint32_t *S1, size_t N1, const
3743 uint32_t *S2, size_t N2, const char *ISO639_LANGUAGE,
3744 uninorm_t NF, int *RESULTP)
3745 -- Function: int ulc_casecmp (const char *S1, size_t N1, const char
3746 *S2, size_t N2, const char *ISO639_LANGUAGE, uninorm_t NF, int
3748 Compares S1 and S2, ignoring differences in case and normalization.
3750 The NF argument identifies the normalization form to apply after
3751 the case-mapping. It can also be NULL, for no normalization.
3753 If successful, sets ‘*RESULTP’ to -1 if S1 < S2, 0 if S1 = S2, 1 if
3754 S1 > S2, and returns 0. Upon failure, returns -1 with ‘errno’ set.
3756 The following functions additionally take into account the sorting
3757 rules of the current locale.
3759 -- Function: char * u8_casexfrm (const uint8_t *S, size_t N, const char
3760 *ISO639_LANGUAGE, uninorm_t NF, char *RESULTBUF, size_t
3762 -- Function: char * u16_casexfrm (const uint16_t *S, size_t N, const
3763 char *ISO639_LANGUAGE, uninorm_t NF, char *RESULTBUF, size_t
3765 -- Function: char * u32_casexfrm (const uint32_t *S, size_t N, const
3766 char *ISO639_LANGUAGE, uninorm_t NF, char *RESULTBUF, size_t
3768 -- Function: char * ulc_casexfrm (const char *S, size_t N, const char
3769 *ISO639_LANGUAGE, uninorm_t NF, char *RESULTBUF, size_t
3771 Converts the string S of length N to a NUL-terminated byte
3772 sequence, in such a way that comparing ‘u8_casexfrm (S1)’ and
3773 ‘u8_casexfrm (S2)’ with the gnulib function ‘memcmp2’ is equivalent
3774 to comparing S1 and S2 with ‘u8_casecoll’.
3776 NF must be either ‘UNINORM_NFC’, ‘UNINORM_NFKC’, or NULL for no normalization.
3779 -- Function: int u8_casecoll (const uint8_t *S1, size_t N1, const
3780 uint8_t *S2, size_t N2, const char *ISO639_LANGUAGE, uninorm_t
3782 -- Function: int u16_casecoll (const uint16_t *S1, size_t N1, const
3783 uint16_t *S2, size_t N2, const char *ISO639_LANGUAGE,
3784 uninorm_t NF, int *RESULTP)
3785 -- Function: int u32_casecoll (const uint32_t *S1, size_t N1, const
3786 uint32_t *S2, size_t N2, const char *ISO639_LANGUAGE,
3787 uninorm_t NF, int *RESULTP)
3788 -- Function: int ulc_casecoll (const char *S1, size_t N1, const char
3789 *S2, size_t N2, const char *ISO639_LANGUAGE, uninorm_t NF, int
3791 Compares S1 and S2, ignoring differences in case and normalization,
3792 using the collation rules of the current locale.
3794 The NF argument identifies the normalization form to apply after
3795 the case-mapping. It must be either ‘UNINORM_NFC’ or
3796 ‘UNINORM_NFKC’. It can also be NULL, for no normalization.
3798 If successful, sets ‘*RESULTP’ to -1 if S1 < S2, 0 if S1 = S2, 1 if
3799 S1 > S2, and returns 0. Upon failure, returns -1 with ‘errno’ set.
3802 File: libunistring.info, Node: Case detection, Prev: Case insensitive comparison, Up: unicase.h
3804 14.5 Case detection
3805 ===================
3807 The following functions determine whether a Unicode string is
3808 entirely in upper case, or entirely in lower case, or entirely in title
3809 case, or already case-folded.
3811 -- Function: int u8_is_uppercase (const uint8_t *S, size_t N, const
3812 char *ISO639_LANGUAGE, bool *RESULTP)
3813 -- Function: int u16_is_uppercase (const uint16_t *S, size_t N, const
3814 char *ISO639_LANGUAGE, bool *RESULTP)
3815 -- Function: int u32_is_uppercase (const uint32_t *S, size_t N, const
3816 char *ISO639_LANGUAGE, bool *RESULTP)
3817 Sets ‘*RESULTP’ to true if mapping NFD(S) to upper case is a no-op,
3818 or to false otherwise, and returns 0. Upon failure, returns -1 with ‘errno’ set.
3821 -- Function: int u8_is_lowercase (const uint8_t *S, size_t N, const
3822 char *ISO639_LANGUAGE, bool *RESULTP)
3823 -- Function: int u16_is_lowercase (const uint16_t *S, size_t N, const
3824 char *ISO639_LANGUAGE, bool *RESULTP)
3825 -- Function: int u32_is_lowercase (const uint32_t *S, size_t N, const
3826 char *ISO639_LANGUAGE, bool *RESULTP)
3827 Sets ‘*RESULTP’ to true if mapping NFD(S) to lower case is a no-op,
3828 or to false otherwise, and returns 0. Upon failure, returns -1 with ‘errno’ set.
3831 -- Function: int u8_is_titlecase (const uint8_t *S, size_t N, const
3832 char *ISO639_LANGUAGE, bool *RESULTP)
3833 -- Function: int u16_is_titlecase (const uint16_t *S, size_t N, const
3834 char *ISO639_LANGUAGE, bool *RESULTP)
3835 -- Function: int u32_is_titlecase (const uint32_t *S, size_t N, const
3836 char *ISO639_LANGUAGE, bool *RESULTP)
3837 Sets ‘*RESULTP’ to true if mapping NFD(S) to title case is a no-op,
3838 or to false otherwise, and returns 0. Upon failure, returns -1 with ‘errno’ set.
3841 -- Function: int u8_is_casefolded (const uint8_t *S, size_t N, const
3842 char *ISO639_LANGUAGE, bool *RESULTP)
3843 -- Function: int u16_is_casefolded (const uint16_t *S, size_t N, const
3844 char *ISO639_LANGUAGE, bool *RESULTP)
3845 -- Function: int u32_is_casefolded (const uint32_t *S, size_t N, const
3846 char *ISO639_LANGUAGE, bool *RESULTP)
3847 Sets ‘*RESULTP’ to true if applying case folding to NFD(S) is a
3848 no-op, or to false otherwise, and returns 0. Upon failure, returns
3849 -1 with ‘errno’ set.
3851 The following functions determine whether case mappings have any
3852 effect on a Unicode string.
3854 -- Function: int u8_is_cased (const uint8_t *S, size_t N, const char
3855 *ISO639_LANGUAGE, bool *RESULTP)
3856 -- Function: int u16_is_cased (const uint16_t *S, size_t N, const char
3857 *ISO639_LANGUAGE, bool *RESULTP)
3858 -- Function: int u32_is_cased (const uint32_t *S, size_t N, const char
3859 *ISO639_LANGUAGE, bool *RESULTP)
3860 Sets ‘*RESULTP’ to true if case matters for S, that is, if mapping
3861 NFD(S) to either upper case or lower case or title case is not a
3862 no-op. Sets ‘*RESULTP’ to false if NFD(S) maps to itself under the
3863 upper case mapping, under the lower case mapping, and under the
3864 title case mapping; in other words, when NFD(S) consists entirely
3865 of caseless characters. Upon failure, returns -1 with ‘errno’ set.
3868 File: libunistring.info, Node: uniregex.h, Next: Using the library, Prev: unicase.h, Up: Top
3870 15 Regular expressions ‘<uniregex.h>’
3871 *************************************
3873 This include file is not yet implemented.
3876 File: libunistring.info, Node: Using the library, Next: More functionality, Prev: uniregex.h, Up: Top
3878 16 Using the library
3879 ********************
3881 This chapter explains some practical considerations, regarding the
3882 installation and compiler options that are needed in order to use this library.
3888 * Compiler options::
3891 * Reporting problems::
3894 File: libunistring.info, Node: Installation, Next: Compiler options, Up: Using the library
3896 16.1 Installation
3897 =================
3899 Before you can use the library, it must be installed. First, you
3900 have to make sure all dependencies are installed. They are listed in
3901 the file ‘DEPENDENCIES’.
3903 Then you can proceed to build and install the library, as described
3904 in the file ‘INSTALL’. For installation on Windows systems, please
3905 refer to the file ‘README.woe32’.
3908 File: libunistring.info, Node: Compiler options, Next: Include files, Prev: Installation, Up: Using the library
3910 16.2 Compiler options
3911 =====================
3913 Let’s denote as ‘LIBUNISTRING_PREFIX’ the value of the ‘--prefix’
3914 option that you passed to ‘configure’ while installing this package. If
3915 you didn’t pass any ‘--prefix’ option, then the package is installed in
3918 Let’s denote as ‘LIBUNISTRING_INCLUDEDIR’ the directory where the
3919 include files were installed. This is usually the same as
3920 ‘${LIBUNISTRING_PREFIX}/include’, unless you passed an
3921 ‘--includedir’ option to ‘configure’, in which case it is the value of that option.
3923 Let’s further denote as ‘LIBUNISTRING_LIBDIR’ the directory where the
3924 library itself was installed. This is the value that you passed with
3925 the ‘--libdir’ option to ‘configure’, or otherwise the same as
3926 ‘${LIBUNISTRING_PREFIX}/lib’. Recall that when building in 64-bit mode
3927 on a 64-bit GNU/Linux system that supports executables in either 64-bit
3928 mode or 32-bit mode, you should have used the option
3929 ‘--libdir=${LIBUNISTRING_PREFIX}/lib64’.
3931 So that the compiler finds the include files, you have to pass it the
3932 option ‘-I${LIBUNISTRING_INCLUDEDIR}’.
3934 So that the compiler finds the library during its linking pass, you
3935 have to pass it the options ‘-L${LIBUNISTRING_LIBDIR} -lunistring’. On
3936 some systems, in some configurations, you also have to pass options
3937 needed for linking with ‘libiconv’. The autoconf macro
3938 ‘gl_LIBUNISTRING’ (see *note Autoconf macro::) deals with this
3942 File: libunistring.info, Node: Include files, Next: Autoconf macro, Prev: Compiler options, Up: Using the library
3944 16.3 Include files
3945 ==================
3947 Most of the include files have been presented in the introduction,
3948 see *note Introduction::, and subsequent detailed chapters.
3950 Another include file is ‘<unistring/version.h>’. It contains the
3951 version number of the libunistring library.
3953 -- Macro: int _LIBUNISTRING_VERSION
3954 This constant contains the version of libunistring that is being
3955 used at compile time. It encodes the major and minor parts of the
3956 version number only. These parts are encoded in the form
3957 ‘(major<<8) + minor’.
3959 -- Constant: int _libunistring_version
3960 This constant contains the version of libunistring that is being
3961 used at run time. It encodes the major and minor parts of the
3962 version number only. These parts are encoded in the form
3963 ‘(major<<8) + minor’.
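A value encoded as ‘(major<<8) + minor’ can be split back into its parts with a shift and a mask. The helper names below are hypothetical and not provided by the library; with libunistring they would be applied to ‘_LIBUNISTRING_VERSION’ or ‘_libunistring_version’.

```c
/* Hypothetical helpers for a version number encoded in the form
   (major<<8) + minor; not part of libunistring.  */

/* Extracts the major part.  */
int
version_major (int version)
{
  return version >> 8;
}

/* Extracts the minor part.  */
int
version_minor (int version)
{
  return version & 0xFF;
}
```

Because of this encoding, two version values can also be compared directly with ‘<’ and ‘>’: a higher major part always dominates, as long as the minor part stays below 256.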
3965 It is possible that ‘_libunistring_version’ is greater than
3966 ‘_LIBUNISTRING_VERSION’. This can happen when you use ‘libunistring’ as
3967 a shared library, and a newer, binary backward-compatible version has
3968 been installed after your program that uses ‘libunistring’ was
3972 File: libunistring.info, Node: Autoconf macro, Next: Reporting problems, Prev: Include files, Up: Using the library
3974 16.4 Autoconf macro
3975 ===================
3977 GNU Gnulib provides an autoconf macro that tests for the availability
3978 of ‘libunistring’. It is contained in the Gnulib module ‘libunistring’,
3980 <http://www.gnu.org/software/gnulib/MODULES.html#module=libunistring>.
3982 The macro is called ‘gl_LIBUNISTRING’. It searches for an installed
3983 libunistring. If one is found, it sets and AC_SUBSTs ‘HAVE_LIBUNISTRING=yes’
3984 together with the ‘LIBUNISTRING’ and ‘LTLIBUNISTRING’ variables, augments the
3985 ‘CPPFLAGS’ variable, and defines the C macro ‘HAVE_LIBUNISTRING’ to 1.
3986 Otherwise, it sets and AC_SUBSTs ‘HAVE_LIBUNISTRING=no’ and sets
3987 ‘LIBUNISTRING’ and ‘LTLIBUNISTRING’ to empty.
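On the C side, the ‘HAVE_LIBUNISTRING’ macro can be used to select between a libunistring code path and a fallback. The following is a minimal sketch under stated assumptions: the function name ‘utf8_char_count’ is hypothetical, ‘config.h’ is assumed to be the header generated by ‘configure’, and the fallback counts only non-continuation bytes, so it agrees with ‘u8_mbsnlen’ from ‘<unistr.h>’ only for well-formed UTF-8 input.

```c
#ifdef HAVE_CONFIG_H
# include <config.h>   /* generated by configure; may define HAVE_LIBUNISTRING */
#endif

#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined HAVE_LIBUNISTRING && HAVE_LIBUNISTRING
# include <unistr.h>   /* declares u8_mbsnlen */
#endif

/* Count the characters in a NUL terminated, well-formed UTF-8 string.
   Hypothetical helper illustrating conditional use of the library.  */
size_t utf8_char_count(const char *s)
{
    assert(s != NULL);
#if defined HAVE_LIBUNISTRING && HAVE_LIBUNISTRING
    /* Library path: let libunistring count the characters.  */
    return u8_mbsnlen((const uint8_t *) s, strlen(s));
#else
    /* Fallback path: every byte that is not a continuation byte
       (bit pattern 10xxxxxx) starts a new character.  */
    size_t count = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char) *s & 0xC0) != 0x80)
            count++;
    return count;
#endif
}
```

When the macro is defined, the program also has to be linked with the value of the ‘LIBUNISTRING’ (or ‘LTLIBUNISTRING’) variable.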
3989 The complexities that ‘gl_LIBUNISTRING’ deals with are the following:
3991 • On some operating systems, in some configurations, libunistring
3992 depends on ‘libiconv’, and the options for linking with libiconv
3993 must be mentioned explicitly on the link command line.
3995 • GNU ‘libunistring’, if installed, is not necessarily already in the
3996 search path (‘CPPFLAGS’ for the include file search path, ‘LDFLAGS’
3997 for the library search path).
3999 • GNU ‘libunistring’, if installed, is not necessarily already in the
4000 run time library search path. To avoid the need for setting an
4001 environment variable like ‘LD_LIBRARY_PATH’, the macro adds the
4002 appropriate run time search path options to the ‘LIBUNISTRING’
4003 variable. This works on most systems.
4006 File: libunistring.info, Node: Reporting problems, Prev: Autoconf macro, Up: Using the library
4008 16.5 Reporting problems
4009 =======================
4011 If you encounter any problem, please send a detailed bug report
4012 to the ‘bug-libunistring@gnu.org’ mailing list. Alternatively,
4013 you can use the bug tracker at the project page
4014 <https://savannah.gnu.org/projects/libunistring>.
4016 Please always include the version number of this library, and a short
4017 description of your operating system and compilation environment with
4018 corresponding version numbers.
4020 For problems that appear while building and installing
4021 ‘libunistring’, for which you don’t find the remedy in the ‘INSTALL’
4022 file, please include a description of the options that you passed to the ‘configure’ script.
4026 File: libunistring.info, Node: More functionality, Next: Licenses, Prev: Using the library, Up: Top
4028 17 More advanced functionality
4029 ******************************
4031 For bidirectional reordering of strings, we recommend the GNU FriBidi
4032 library: <http://www.fribidi.org/>.
4034 For the rendering of Unicode strings outside of the context of a
4035 given toolkit (KDE/Qt or GNOME/Gtk), we recommend the Pango library:
4036 <http://www.pango.org/>.
4039 File: libunistring.info, Node: Licenses, Next: Index, Prev: More functionality, Up: Top
4044 The files of this package are covered by the licenses indicated in
4045 each particular file or directory. Here is a summary:
4047 • The ‘libunistring’ library is covered by the GNU Lesser General
4048 Public License (LGPL). A copy of the license is included in *note GNU LGPL::.
4051 • This manual is free documentation. It is dually licensed under the
4052 GNU FDL and the GNU GPL. This means that you can redistribute this
4053 manual under either of these two licenses, at your choice.
4054 This manual is covered by the GNU FDL. Permission is granted to
4055 copy, distribute and/or modify this document under the terms of the
4056 GNU Free Documentation License (FDL), either version 1.2 of the
4057 License, or (at your option) any later version published by the
4058 Free Software Foundation (FSF); with no Invariant Sections, with no
4059 Front-Cover Text, and with no Back-Cover Texts. A copy of the
4060 license is included in *note GNU FDL::.
4061 This manual is covered by the GNU GPL. You can redistribute it
4062 and/or modify it under the terms of the GNU General Public License
4063 (GPL), either version 3 of the License, or (at your option) any
4064 later version published by the Free Software Foundation (FSF). A
4065 copy of the license is included in *note GNU GPL::.
4069 * GNU GPL:: GNU General Public License
4070 * GNU LGPL:: GNU Lesser General Public License
4071 * GNU FDL:: GNU Free Documentation License
4074 File: libunistring.info, Node: GNU GPL, Next: GNU LGPL, Up: Licenses
4076 A.1 GNU GENERAL PUBLIC LICENSE
4077 ==============================
4079 Version 3, 29 June 2007
4081 Copyright © 2007 Free Software Foundation, Inc. <http://fsf.org/>
4083 Everyone is permitted to copy and distribute verbatim copies of this
4084 license document, but changing it is not allowed.
4089 The GNU General Public License is a free, copyleft license for
4090 software and other kinds of works.
4092 The licenses for most software and other practical works are designed
4093 to take away your freedom to share and change the works. By contrast,
4094 the GNU General Public License is intended to guarantee your freedom to
4095 share and change all versions of a program—to make sure it remains free
4096 software for all its users. We, the Free Software Foundation, use the
4097 GNU General Public License for most of our software; it applies also to
4098 any other work released this way by its authors. You can apply it to
4101 When we speak of free software, we are referring to freedom, not
4102 price. Our General Public Licenses are designed to make sure that you
4103 have the freedom to distribute copies of free software (and charge for
4104 them if you wish), that you receive source code or can get it if you
4105 want it, that you can change the software or use pieces of it in new
4106 free programs, and that you know you can do these things.
4108 To protect your rights, we need to prevent others from denying you
4109 these rights or asking you to surrender the rights. Therefore, you have
4110 certain responsibilities if you distribute copies of the software, or if
4111 you modify it: responsibilities to respect the freedom of others.
4113 For example, if you distribute copies of such a program, whether
4114 gratis or for a fee, you must pass on to the recipients the same
4115 freedoms that you received. You must make sure that they, too, receive
4116 or can get the source code. And you must show them these terms so they
4119 Developers that use the GNU GPL protect your rights with two steps:
4120 (1) assert copyright on the software, and (2) offer you this License
4121 giving you legal permission to copy, distribute and/or modify it.
4123 For the developers’ and authors’ protection, the GPL clearly explains
4124 that there is no warranty for this free software. For both users’ and
4125 authors’ sake, the GPL requires that modified versions be marked as
4126 changed, so that their problems will not be attributed erroneously to
4127 authors of previous versions.
4129 Some devices are designed to deny users access to install or run
4130 modified versions of the software inside them, although the manufacturer
4131 can do so. This is fundamentally incompatible with the aim of
4132 protecting users’ freedom to change the software. The systematic
4133 pattern of such abuse occurs in the area of products for individuals to
4134 use, which is precisely where it is most unacceptable. Therefore, we
4135 have designed this version of the GPL to prohibit the practice for those
4136 products. If such problems arise substantially in other domains, we
4137 stand ready to extend this provision to those domains in future versions
4138 of the GPL, as needed to protect the freedom of users.
4140 Finally, every program is threatened constantly by software patents.
4141 States should not allow patents to restrict development and use of
4142 software on general-purpose computers, but in those that do, we wish to
4143 avoid the special danger that patents applied to a free program could
4144 make it effectively proprietary. To prevent this, the GPL assures that
4145 patents cannot be used to render the program non-free.
4147 The precise terms and conditions for copying, distribution and
4148 modification follow.
4150 TERMS AND CONDITIONS
4151 ====================
4155 “This License” refers to version 3 of the GNU General Public
4158 “Copyright” also means copyright-like laws that apply to other
4159 kinds of works, such as semiconductor masks.
4161 “The Program” refers to any copyrightable work licensed under this
4162 License. Each licensee is addressed as “you”. “Licensees” and
4163 “recipients” may be individuals or organizations.
4165 To “modify” a work means to copy from or adapt all or part of the
4166 work in a fashion requiring copyright permission, other than the
4167 making of an exact copy. The resulting work is called a “modified
4168 version” of the earlier work or a work “based on” the earlier work.
4170 A “covered work” means either the unmodified Program or a work
4171 based on the Program.
4173 To “propagate” a work means to do anything with it that, without
4174 permission, would make you directly or secondarily liable for
4175 infringement under applicable copyright law, except executing it on
4176 a computer or modifying a private copy. Propagation includes
4177 copying, distribution (with or without modification), making
4178 available to the public, and in some countries other activities as
4181 To “convey” a work means any kind of propagation that enables other
4182 parties to make or receive copies. Mere interaction with a user
4183 through a computer network, with no transfer of a copy, is not
4186 An interactive user interface displays “Appropriate Legal Notices”
4187 to the extent that it includes a convenient and prominently visible
4188 feature that (1) displays an appropriate copyright notice, and (2)
4189 tells the user that there is no warranty for the work (except to
4190 the extent that warranties are provided), that licensees may convey
4191 the work under this License, and how to view a copy of this
4192 License. If the interface presents a list of user commands or
4193 options, such as a menu, a prominent item in the list meets this
4198 The “source code” for a work means the preferred form of the work
4199 for making modifications to it. “Object code” means any non-source
4202 A “Standard Interface” means an interface that either is an
4203 official standard defined by a recognized standards body, or, in
4204 the case of interfaces specified for a particular programming
4205 language, one that is widely used among developers working in that
4208 The “System Libraries” of an executable work include anything,
4209 other than the work as a whole, that (a) is included in the normal
4210 form of packaging a Major Component, but which is not part of that
4211 Major Component, and (b) serves only to enable use of the work with
4212 that Major Component, or to implement a Standard Interface for
4213 which an implementation is available to the public in source code
4214 form. A “Major Component”, in this context, means a major
4215 essential component (kernel, window system, and so on) of the
4216 specific operating system (if any) on which the executable work
4217 runs, or a compiler used to produce the work, or an object code
4218 interpreter used to run it.
4220 The “Corresponding Source” for a work in object code form means all
4221 the source code needed to generate, install, and (for an executable
4222 work) run the object code and to modify the work, including scripts
4223 to control those activities. However, it does not include the
4224 work’s System Libraries, or general-purpose tools or generally
4225 available free programs which are used unmodified in performing
4226 those activities but which are not part of the work. For example,
4227 Corresponding Source includes interface definition files associated
4228 with source files for the work, and the source code for shared
4229 libraries and dynamically linked subprograms that the work is
4230 specifically designed to require, such as by intimate data
4231 communication or control flow between those subprograms and other
4234 The Corresponding Source need not include anything that users can
4235 regenerate automatically from other parts of the Corresponding
4238 The Corresponding Source for a work in source code form is that
4241 2. Basic Permissions.
4243 All rights granted under this License are granted for the term of
4244 copyright on the Program, and are irrevocable provided the stated
4245 conditions are met. This License explicitly affirms your unlimited
4246 permission to run the unmodified Program. The output from running
4247 a covered work is covered by this License only if the output, given
4248 its content, constitutes a covered work. This License acknowledges
4249 your rights of fair use or other equivalent, as provided by
4252 You may make, run and propagate covered works that you do not
4253 convey, without conditions so long as your license otherwise
4254 remains in force. You may convey covered works to others for the
4255 sole purpose of having them make modifications exclusively for you,
4256 or provide you with facilities for running those works, provided
4257 that you comply with the terms of this License in conveying all
4258 material for which you do not control copyright. Those thus making
4259 or running the covered works for you must do so exclusively on your
4260 behalf, under your direction and control, on terms that prohibit
4261 them from making any copies of your copyrighted material outside
4262 their relationship with you.
4264 Conveying under any other circumstances is permitted solely under
4265 the conditions stated below. Sublicensing is not allowed; section
4266 10 makes it unnecessary.
4268 3. Protecting Users’ Legal Rights From Anti-Circumvention Law.
4270 No covered work shall be deemed part of an effective technological
4271 measure under any applicable law fulfilling obligations under
4272 article 11 of the WIPO copyright treaty adopted on 20 December
4273 1996, or similar laws prohibiting or restricting circumvention of
4276 When you convey a covered work, you waive any legal power to forbid
4277 circumvention of technological measures to the extent such
4278 circumvention is effected by exercising rights under this License
4279 with respect to the covered work, and you disclaim any intention to
4280 limit operation or modification of the work as a means of
4281 enforcing, against the work’s users, your or third parties’ legal
4282 rights to forbid circumvention of technological measures.
4284 4. Conveying Verbatim Copies.
4286 You may convey verbatim copies of the Program’s source code as you
4287 receive it, in any medium, provided that you conspicuously and
4288 appropriately publish on each copy an appropriate copyright notice;
4289 keep intact all notices stating that this License and any
4290 non-permissive terms added in accord with section 7 apply to the
4291 code; keep intact all notices of the absence of any warranty; and
4292 give all recipients a copy of this License along with the Program.
4294 You may charge any price or no price for each copy that you convey,
4295 and you may offer support or warranty protection for a fee.
4297 5. Conveying Modified Source Versions.
4299 You may convey a work based on the Program, or the modifications to
4300 produce it from the Program, in the form of source code under the
4301 terms of section 4, provided that you also meet all of these
4304 a. The work must carry prominent notices stating that you
4305 modified it, and giving a relevant date.
4307 b. The work must carry prominent notices stating that it is
4308 released under this License and any conditions added under
4309 section 7. This requirement modifies the requirement in
4310 section 4 to “keep intact all notices”.
4312 c. You must license the entire work, as a whole, under this
4313 License to anyone who comes into possession of a copy. This
4314 License will therefore apply, along with any applicable
4315 section 7 additional terms, to the whole of the work, and all
4316 its parts, regardless of how they are packaged. This License
4317 gives no permission to license the work in any other way, but
4318 it does not invalidate such permission if you have separately
4321 d. If the work has interactive user interfaces, each must display
4322 Appropriate Legal Notices; however, if the Program has
4323 interactive interfaces that do not display Appropriate Legal
4324 Notices, your work need not make them do so.
4326 A compilation of a covered work with other separate and independent
4327 works, which are not by their nature extensions of the covered
4328 work, and which are not combined with it such as to form a larger
4329 program, in or on a volume of a storage or distribution medium, is
4330 called an “aggregate” if the compilation and its resulting
4331 copyright are not used to limit the access or legal rights of the
4332 compilation’s users beyond what the individual works permit.
4333 Inclusion of a covered work in an aggregate does not cause this
4334 License to apply to the other parts of the aggregate.
4336 6. Conveying Non-Source Forms.
4338 You may convey a covered work in object code form under the terms
4339 of sections 4 and 5, provided that you also convey the
4340 machine-readable Corresponding Source under the terms of this
4341 License, in one of these ways:
4343 a. Convey the object code in, or embodied in, a physical product
4344 (including a physical distribution medium), accompanied by the
4345 Corresponding Source fixed on a durable physical medium
4346 customarily used for software interchange.
4348 b. Convey the object code in, or embodied in, a physical product
4349 (including a physical distribution medium), accompanied by a
4350 written offer, valid for at least three years and valid for as
4351 long as you offer spare parts or customer support for that
4352 product model, to give anyone who possesses the object code
4353 either (1) a copy of the Corresponding Source for all the
4354 software in the product that is covered by this License, on a
4355 durable physical medium customarily used for software
4356 interchange, for a price no more than your reasonable cost of
4357 physically performing this conveying of source, or (2) access
4358 to copy the Corresponding Source from a network server at no
4361 c. Convey individual copies of the object code with a copy of the
4362 written offer to provide the Corresponding Source. This
4363 alternative is allowed only occasionally and noncommercially,
4364 and only if you received the object code with such an offer,
4365 in accord with subsection 6b.
4367 d. Convey the object code by offering access from a designated
4368 place (gratis or for a charge), and offer equivalent access to
4369 the Corresponding Source in the same way through the same
4370 place at no further charge. You need not require recipients
4371 to copy the Corresponding Source along with the object code.
4372 If the place to copy the object code is a network server, the
4373 Corresponding Source may be on a different server (operated by
4374 you or a third party) that supports equivalent copying
4375 facilities, provided you maintain clear directions next to the
4376 object code saying where to find the Corresponding Source.
4377 Regardless of what server hosts the Corresponding Source, you
4378 remain obligated to ensure that it is available for as long as
4379 needed to satisfy these requirements.
4381 e. Convey the object code using peer-to-peer transmission,
4382 provided you inform other peers where the object code and
4383 Corresponding Source of the work are being offered to the
4384 general public at no charge under subsection 6d.
4386 A separable portion of the object code, whose source code is
4387 excluded from the Corresponding Source as a System Library, need
4388 not be included in conveying the object code work.
4390 A “User Product” is either (1) a “consumer product”, which means
4391 any tangible personal property which is normally used for personal,
4392 family, or household purposes, or (2) anything designed or sold for
4393 incorporation into a dwelling. In determining whether a product is
4394 a consumer product, doubtful cases shall be resolved in favor of
4395 coverage. For a particular product received by a particular user,
4396 “normally used” refers to a typical or common use of that class of
4397 product, regardless of the status of the particular user or of the
4398 way in which the particular user actually uses, or expects or is
4399 expected to use, the product. A product is a consumer product
4400 regardless of whether the product has substantial commercial,
4401 industrial or non-consumer uses, unless such uses represent the
4402 only significant mode of use of the product.
4404 “Installation Information” for a User Product means any methods,
4405 procedures, authorization keys, or other information required to
4406 install and execute modified versions of a covered work in that
4407 User Product from a modified version of its Corresponding Source.
4408 The information must suffice to ensure that the continued
4409 functioning of the modified object code is in no case prevented or
4410 interfered with solely because modification has been made.
4412 If you convey an object code work under this section in, or with,
4413 or specifically for use in, a User Product, and the conveying
4414 occurs as part of a transaction in which the right of possession
4415 and use of the User Product is transferred to the recipient in
4416 perpetuity or for a fixed term (regardless of how the transaction
4417 is characterized), the Corresponding Source conveyed under this
4418 section must be accompanied by the Installation Information. But
4419 this requirement does not apply if neither you nor any third party
4420 retains the ability to install modified object code on the User
4421 Product (for example, the work has been installed in ROM).
4423 The requirement to provide Installation Information does not
4424 include a requirement to continue to provide support service,
4425 warranty, or updates for a work that has been modified or installed
4426 by the recipient, or for the User Product in which it has been
4427 modified or installed. Access to a network may be denied when the
4428 modification itself materially and adversely affects the operation
4429 of the network or violates the rules and protocols for
4430 communication across the network.
4432 Corresponding Source conveyed, and Installation Information
4433 provided, in accord with this section must be in a format that is
4434 publicly documented (and with an implementation available to the
4435 public in source code form), and must require no special password
4436 or key for unpacking, reading or copying.
4438 7. Additional Terms.
4440 “Additional permissions” are terms that supplement the terms of
4441 this License by making exceptions from one or more of its
4442 conditions. Additional permissions that are applicable to the
4443 entire Program shall be treated as though they were included in
4444 this License, to the extent that they are valid under applicable
4445 law. If additional permissions apply only to part of the Program,
4446 that part may be used separately under those permissions, but the
4447 entire Program remains governed by this License without regard to
4448 the additional permissions.
4450 When you convey a copy of a covered work, you may at your option
4451 remove any additional permissions from that copy, or from any part
4452 of it. (Additional permissions may be written to require their own
4453 removal in certain cases when you modify the work.) You may place
4454 additional permissions on material, added by you to a covered work,
4455 for which you have or can give appropriate copyright permission.
4457 Notwithstanding any other provision of this License, for material
4458 you add to a covered work, you may (if authorized by the copyright
4459 holders of that material) supplement the terms of this License with
4462 a. Disclaiming warranty or limiting liability differently from
4463 the terms of sections 15 and 16 of this License; or
4465 b. Requiring preservation of specified reasonable legal notices
4466 or author attributions in that material or in the Appropriate
4467 Legal Notices displayed by works containing it; or
4469 c. Prohibiting misrepresentation of the origin of that material,
4470 or requiring that modified versions of such material be marked
4471 in reasonable ways as different from the original version; or
4473 d. Limiting the use for publicity purposes of names of licensors
4474 or authors of the material; or
4476 e. Declining to grant rights under trademark law for use of some
4477 trade names, trademarks, or service marks; or
4479 f. Requiring indemnification of licensors and authors of that
4480 material by anyone who conveys the material (or modified
4481 versions of it) with contractual assumptions of liability to
4482 the recipient, for any liability that these contractual
4483 assumptions directly impose on those licensors and authors.
4485 All other non-permissive additional terms are considered “further
4486 restrictions” within the meaning of section 10. If the Program as
4487 you received it, or any part of it, contains a notice stating that
4488 it is governed by this License along with a term that is a further
4489 restriction, you may remove that term. If a license document
4490 contains a further restriction but permits relicensing or conveying
4491 under this License, you may add to a covered work material governed
4492 by the terms of that license document, provided that the further
4493 restriction does not survive such relicensing or conveying.
4495 If you add terms to a covered work in accord with this section, you
4496 must place, in the relevant source files, a statement of the
4497 additional terms that apply to those files, or a notice indicating
4498 where to find the applicable terms.
4500 Additional terms, permissive or non-permissive, may be stated in
4501 the form of a separately written license, or stated as exceptions;
4502 the above requirements apply either way.
4506 You may not propagate or modify a covered work except as expressly
4507 provided under this License. Any attempt otherwise to propagate or
4508 modify it is void, and will automatically terminate your rights
4509 under this License (including any patent licenses granted under the
4510 third paragraph of section 11).
4512 However, if you cease all violation of this License, then your
4513 license from a particular copyright holder is reinstated (a)
4514 provisionally, unless and until the copyright holder explicitly and
4515 finally terminates your license, and (b) permanently, if the
4516 copyright holder fails to notify you of the violation by some
4517 reasonable means prior to 60 days after the cessation.
4519 Moreover, your license from a particular copyright holder is
4520 reinstated permanently if the copyright holder notifies you of the
4521 violation by some reasonable means, this is the first time you have
4522 received notice of violation of this License (for any work) from
4523 that copyright holder, and you cure the violation prior to 30 days
4524 after your receipt of the notice.
4526 Termination of your rights under this section does not terminate
4527 the licenses of parties who have received copies or rights from you
4528 under this License. If your rights have been terminated and not
4529 permanently reinstated, you do not qualify to receive new licenses
4530 for the same material under section 10.
4532 9. Acceptance Not Required for Having Copies.
4534 You are not required to accept this License in order to receive or
4535 run a copy of the Program. Ancillary propagation of a covered work
4536 occurring solely as a consequence of using peer-to-peer
4537 transmission to receive a copy likewise does not require
4538 acceptance. However, nothing other than this License grants you
4539 permission to propagate or modify any covered work. These actions
4540 infringe copyright if you do not accept this License. Therefore,
4541 by modifying or propagating a covered work, you indicate your
4542 acceptance of this License to do so.
4544 10. Automatic Licensing of Downstream Recipients.
4546 Each time you convey a covered work, the recipient automatically
4547 receives a license from the original licensors, to run, modify and
4548 propagate that work, subject to this License. You are not
4549 responsible for enforcing compliance by third parties with this
4552 An “entity transaction” is a transaction transferring control of an
4553 organization, or substantially all assets of one, or subdividing an
4554 organization, or merging organizations. If propagation of a
4555 covered work results from an entity transaction, each party to that
4556 transaction who receives a copy of the work also receives whatever
4557 licenses to the work the party’s predecessor in interest had or
4558 could give under the previous paragraph, plus a right to possession
4559 of the Corresponding Source of the work from the predecessor in
4560 interest, if the predecessor has it or can get it with reasonable
4563 You may not impose any further restrictions on the exercise of the
4564 rights granted or affirmed under this License. For example, you
4565 may not impose a license fee, royalty, or other charge for exercise
4566 of rights granted under this License, and you may not initiate
4567 litigation (including a cross-claim or counterclaim in a lawsuit)
4568 alleging that any patent claim is infringed by making, using,
4569 selling, offering for sale, or importing the Program or any portion
4574 A “contributor” is a copyright holder who authorizes use under this
4575 License of the Program or a work on which the Program is based.
4576 The work thus licensed is called the contributor’s “contributor
4579 A contributor’s “essential patent claims” are all patent claims
4580 owned or controlled by the contributor, whether already acquired or
4581 hereafter acquired, that would be infringed by some manner,
4582 permitted by this License, of making, using, or selling its
4583 contributor version, but do not include claims that would be
4584 infringed only as a consequence of further modification of the
4585 contributor version. For purposes of this definition, “control”
4586 includes the right to grant patent sublicenses in a manner
4587 consistent with the requirements of this License.
4589 Each contributor grants you a non-exclusive, worldwide,
4590 royalty-free patent license under the contributor’s essential
4591 patent claims, to make, use, sell, offer for sale, import and
4592 otherwise run, modify and propagate the contents of its contributor
4595 In the following three paragraphs, a “patent license” is any
4596 express agreement or commitment, however denominated, not to
4597 enforce a patent (such as an express permission to practice a
4598 patent or covenant not to sue for patent infringement). To “grant”
4599 such a patent license to a party means to make such an agreement or
4600 commitment not to enforce a patent against the party.
4602 If you convey a covered work, knowingly relying on a patent
4603 license, and the Corresponding Source of the work is not available
4604 for anyone to copy, free of charge and under the terms of this
4605 License, through a publicly available network server or other
4606 readily accessible means, then you must either (1) cause the
4607 Corresponding Source to be so available, or (2) arrange to deprive
4608 yourself of the benefit of the patent license for this particular
4609 work, or (3) arrange, in a manner consistent with the requirements
4610 of this License, to extend the patent license to downstream
4611 recipients. “Knowingly relying” means you have actual knowledge
4612 that, but for the patent license, your conveying the covered work
4613 in a country, or your recipient’s use of the covered work in a
4614 country, would infringe one or more identifiable patents in that
4615 country that you have reason to believe are valid.
4617 If, pursuant to or in connection with a single transaction or
4618 arrangement, you convey, or propagate by procuring conveyance of, a
4619 covered work, and grant a patent license to some of the parties
4620 receiving the covered work authorizing them to use, propagate,
4621 modify or convey a specific copy of the covered work, then the
4622 patent license you grant is automatically extended to all
4623 recipients of the covered work and works based on it.
4625 A patent license is “discriminatory” if it does not include within
4626 the scope of its coverage, prohibits the exercise of, or is
4627 conditioned on the non-exercise of one or more of the rights that
4628 are specifically granted under this License. You may not convey a
4629 covered work if you are a party to an arrangement with a third
4630 party that is in the business of distributing software, under which
4631 you make payment to the third party based on the extent of your
4632 activity of conveying the work, and under which the third party
4633 grants, to any of the parties who would receive the covered work
4634 from you, a discriminatory patent license (a) in connection with
4635 copies of the covered work conveyed by you (or copies made from
4636 those copies), or (b) primarily for and in connection with specific
4637 products or compilations that contain the covered work, unless you
4638 entered into that arrangement, or that patent license was granted,
4639 prior to 28 March 2007.
4641 Nothing in this License shall be construed as excluding or limiting
4642 any implied license or other defenses to infringement that may
4643 otherwise be available to you under applicable patent law.
4645 12. No Surrender of Others’ Freedom.
4647 If conditions are imposed on you (whether by court order, agreement
4648 or otherwise) that contradict the conditions of this License, they
4649 do not excuse you from the conditions of this License. If you
4650 cannot convey a covered work so as to satisfy simultaneously your
4651 obligations under this License and any other pertinent obligations,
4652 then as a consequence you may not convey it at all. For example,
4653 if you agree to terms that obligate you to collect a royalty for
4654 further conveying from those to whom you convey the Program, the
4655 only way you could satisfy both those terms and this License would
4656 be to refrain entirely from conveying the Program.
4658 13. Use with the GNU Affero General Public License.
4660 Notwithstanding any other provision of this License, you have
4661 permission to link or combine any covered work with a work licensed
4662 under version 3 of the GNU Affero General Public License into a
4663 single combined work, and to convey the resulting work. The terms
4664 of this License will continue to apply to the part which is the
4665 covered work, but the special requirements of the GNU Affero
4666 General Public License, section 13, concerning interaction through
4667 a network will apply to the combination as such.
4669 14. Revised Versions of this License.
4671 The Free Software Foundation may publish revised and/or new
4672 versions of the GNU General Public License from time to time. Such
4673 new versions will be similar in spirit to the present version, but
4674 may differ in detail to address new problems or concerns.
4676 Each version is given a distinguishing version number. If the
4677 Program specifies that a certain numbered version of the GNU
4678 General Public License “or any later version” applies to it, you
4679 have the option of following the terms and conditions either of
4680 that numbered version or of any later version published by the Free
4681 Software Foundation. If the Program does not specify a version
4682 number of the GNU General Public License, you may choose any
4683 version ever published by the Free Software Foundation.
4685 If the Program specifies that a proxy can decide which future
4686 versions of the GNU General Public License can be used, that
4687 proxy’s public statement of acceptance of a version permanently
4688 authorizes you to choose that version for the Program.
4690 Later license versions may give you additional or different
4691 permissions. However, no additional obligations are imposed on any
4692 author or copyright holder as a result of your choosing to follow a later version.
4695 15. Disclaimer of Warranty.
4697 THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
4698 APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE
4699 COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM “AS IS”
4700 WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED,
4701 INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
4702 MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE
4703 RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.
4704 SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL
4705 NECESSARY SERVICING, REPAIR OR CORRECTION.
4707 16. Limitation of Liability.
4709 IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
4710 WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES
4711 AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR
4712 DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR
4713 CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE
4714 THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA
4715 BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
4716 PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
4717 PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF
4718 THE POSSIBILITY OF SUCH DAMAGES.
4720 17. Interpretation of Sections 15 and 16.
4722 If the disclaimer of warranty and limitation of liability provided
4723 above cannot be given local legal effect according to their terms,
4724 reviewing courts shall apply local law that most closely
4725 approximates an absolute waiver of all civil liability in
4726 connection with the Program, unless a warranty or assumption of
4727 liability accompanies a copy of the Program in return for a fee.
4729 END OF TERMS AND CONDITIONS
4730 ===========================
4732 How to Apply These Terms to Your New Programs
4733 =============================================
4735 If you develop a new program, and you want it to be of the greatest
4736 possible use to the public, the best way to achieve this is to make it
4737 free software which everyone can redistribute and change under these terms.
4740 To do so, attach the following notices to the program. It is safest
4741 to attach them to the start of each source file to most effectively
4742 state the exclusion of warranty; and each file should have at least the
4743 “copyright” line and a pointer to where the full notice is found.
4745 ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES.
4746 Copyright (C) YEAR NAME OF AUTHOR
4748 This program is free software: you can redistribute it and/or modify
4749 it under the terms of the GNU General Public License as published by
4750 the Free Software Foundation, either version 3 of the License, or (at
4751 your option) any later version.
4753 This program is distributed in the hope that it will be useful, but
4754 WITHOUT ANY WARRANTY; without even the implied warranty of
4755 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
4756 General Public License for more details.
4758 You should have received a copy of the GNU General Public License
4759 along with this program. If not, see <http://www.gnu.org/licenses/>.
4761 Also add information on how to contact you by electronic and paper mail.
4764 If the program does terminal interaction, make it output a short
4765 notice like this when it starts in an interactive mode:
4767 PROGRAM Copyright (C) YEAR NAME OF AUTHOR
4768 This program comes with ABSOLUTELY NO WARRANTY; for details type ‘show w’.
4769 This is free software, and you are welcome to redistribute it
4770 under certain conditions; type ‘show c’ for details.
4772 The hypothetical commands ‘show w’ and ‘show c’ should show the
4773 appropriate parts of the General Public License. Of course, your
4774 program’s commands might be different; for a GUI interface, you would use an “about box”.
4777 You should also get your employer (if you work as a programmer) or
4778 school, if any, to sign a “copyright disclaimer” for the program, if
4779 necessary. For more information on this, and how to apply and follow
4780 the GNU GPL, see <http://www.gnu.org/licenses/>.
4782 The GNU General Public License does not permit incorporating your
4783 program into proprietary programs. If your program is a subroutine
4784 library, you may consider it more useful to permit linking proprietary
4785 applications with the library. If this is what you want to do, use the
4786 GNU Lesser General Public License instead of this License. But first,
4787 please read <http://www.gnu.org/philosophy/why-not-lgpl.html>.
4790 File: libunistring.info, Node: GNU LGPL, Next: GNU FDL, Prev: GNU GPL, Up: Licenses
4792 A.2 GNU LESSER GENERAL PUBLIC LICENSE
4793 =====================================
4795 Version 3, 29 June 2007
4797 Copyright © 2007 Free Software Foundation, Inc. <http://fsf.org/>
4799 Everyone is permitted to copy and distribute verbatim copies of this
4800 license document, but changing it is not allowed.
4802 This version of the GNU Lesser General Public License incorporates
4803 the terms and conditions of version 3 of the GNU General Public License,
4804 supplemented by the additional permissions listed below.
4806 0. Additional Definitions.
4808 As used herein, “this License” refers to version 3 of the GNU
4809 Lesser General Public License, and the “GNU GPL” refers to version
4810 3 of the GNU General Public License.
4812 “The Library” refers to a covered work governed by this License,
4813 other than an Application or a Combined Work as defined below.
4815 An “Application” is any work that makes use of an interface
4816 provided by the Library, but which is not otherwise based on the
4817 Library. Defining a subclass of a class defined by the Library is
4818 deemed a mode of using an interface provided by the Library.
4820 A “Combined Work” is a work produced by combining or linking an
4821 Application with the Library. The particular version of the
4822 Library with which the Combined Work was made is also called the “Linked Version”.
4825 The “Minimal Corresponding Source” for a Combined Work means the
4826 Corresponding Source for the Combined Work, excluding any source
4827 code for portions of the Combined Work that, considered in
4828 isolation, are based on the Application, and not on the Linked Version.
4831 The “Corresponding Application Code” for a Combined Work means the
4832 object code and/or source code for the Application, including any
4833 data and utility programs needed for reproducing the Combined Work
4834 from the Application, but excluding the System Libraries of the Combined Work.
4837 1. Exception to Section 3 of the GNU GPL.
4839 You may convey a covered work under sections 3 and 4 of this
4840 License without being bound by section 3 of the GNU GPL.
4842 2. Conveying Modified Versions.
4844 If you modify a copy of the Library, and, in your modifications, a
4845 facility refers to a function or data to be supplied by an
4846 Application that uses the facility (other than as an argument
4847 passed when the facility is invoked), then you may convey a copy of
4848 the modified version:
4850 a. under this License, provided that you make a good faith effort
4851 to ensure that, in the event an Application does not supply
4852 the function or data, the facility still operates, and
4853 performs whatever part of its purpose remains meaningful, or
4855 b. under the GNU GPL, with none of the additional permissions of
4856 this License applicable to that copy.
4858 3. Object Code Incorporating Material from Library Header Files.
4860 The object code form of an Application may incorporate material
4861 from a header file that is part of the Library. You may convey
4862 such object code under terms of your choice, provided that, if the
4863 incorporated material is not limited to numerical parameters, data
4864 structure layouts and accessors, or small macros, inline functions
4865 and templates (ten or fewer lines in length), you do both of the following:
4868 a. Give prominent notice with each copy of the object code that
4869 the Library is used in it and that the Library and its use are
4870 covered by this License.
4871 b. Accompany the object code with a copy of the GNU GPL and this license document.
4874 4. Combined Works.
4876 You may convey a Combined Work under terms of your choice that,
4877 taken together, effectively do not restrict modification of the
4878 portions of the Library contained in the Combined Work and reverse
4879 engineering for debugging such modifications, if you also do each of the following:
4882 a. Give prominent notice with each copy of the Combined Work that
4883 the Library is used in it and that the Library and its use are
4884 covered by this License.
4885 b. Accompany the Combined Work with a copy of the GNU GPL and
4886 this license document.
4887 c. For a Combined Work that displays copyright notices during
4888 execution, include the copyright notice for the Library among
4889 these notices, as well as a reference directing the user to
4890 the copies of the GNU GPL and this license document.
4891 d. Do one of the following:
4893 0. Convey the Minimal Corresponding Source under the terms
4894 of this License, and the Corresponding Application Code
4895 in a form suitable for, and under terms that permit, the
4896 user to recombine or relink the Application with a
4897 modified version of the Linked Version to produce a
4898 modified Combined Work, in the manner specified by
4899 section 6 of the GNU GPL for conveying Corresponding Source.
4901 1. Use a suitable shared library mechanism for linking with
4902 the Library. A suitable mechanism is one that (a) uses
4903 at run time a copy of the Library already present on the
4904 user’s computer system, and (b) will operate properly
4905 with a modified version of the Library that is
4906 interface-compatible with the Linked Version.
4908 e. Provide Installation Information, but only if you would
4909 otherwise be required to provide such information under
4910 section 6 of the GNU GPL, and only to the extent that such
4911 information is necessary to install and execute a modified
4912 version of the Combined Work produced by recombining or
4913 relinking the Application with a modified version of the
4914 Linked Version. (If you use option 4d0, the Installation
4915 Information must accompany the Minimal Corresponding Source
4916 and Corresponding Application Code. If you use option 4d1,
4917 you must provide the Installation Information in the manner
4918 specified by section 6 of the GNU GPL for conveying
4919 Corresponding Source.)
4921 5. Combined Libraries.
4923 You may place library facilities that are a work based on the
4924 Library side by side in a single library together with other
4925 library facilities that are not Applications and are not covered by
4926 this License, and convey such a combined library under terms of
4927 your choice, if you do both of the following:
4929 a. Accompany the combined library with a copy of the same work
4930 based on the Library, uncombined with any other library
4931 facilities, conveyed under the terms of this License.
4932 b. Give prominent notice with the combined library that part of
4933 it is a work based on the Library, and explaining where to
4934 find the accompanying uncombined form of the same work.
4936 6. Revised Versions of the GNU Lesser General Public License.
4938 The Free Software Foundation may publish revised and/or new
4939 versions of the GNU Lesser General Public License from time to
4940 time. Such new versions will be similar in spirit to the present
4941 version, but may differ in detail to address new problems or concerns.
4944 Each version is given a distinguishing version number. If the
4945 Library as you received it specifies that a certain numbered
4946 version of the GNU Lesser General Public License “or any later
4947 version” applies to it, you have the option of following the terms
4948 and conditions either of that published version or of any later
4949 version published by the Free Software Foundation. If the Library
4950 as you received it does not specify a version number of the GNU
4951 Lesser General Public License, you may choose any version of the
4952 GNU Lesser General Public License ever published by the Free
4953 Software Foundation.
4955 If the Library as you received it specifies that a proxy can decide
4956 whether future versions of the GNU Lesser General Public License
4957 shall apply, that proxy’s public statement of acceptance of any
4958 version is permanent authorization for you to choose that version for the Library.
4962 File: libunistring.info, Node: GNU FDL, Prev: GNU LGPL, Up: Licenses
4964 A.3 GNU Free Documentation License
4965 ==================================
4967 Version 1.3, 3 November 2008
4969 Copyright © 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.
4972 Everyone is permitted to copy and distribute verbatim copies
4973 of this license document, but changing it is not allowed.
4975 0. PREAMBLE
4977 The purpose of this License is to make a manual, textbook, or other
4978 functional and useful document "free" in the sense of freedom: to
4979 assure everyone the effective freedom to copy and redistribute it,
4980 with or without modifying it, either commercially or
4981 noncommercially. Secondarily, this License preserves for the
4982 author and publisher a way to get credit for their work, while not
4983 being considered responsible for modifications made by others.
4985 This License is a kind of “copyleft”, which means that derivative
4986 works of the document must themselves be free in the same sense.
4987 It complements the GNU General Public License, which is a copyleft
4988 license designed for free software.
4990 We have designed this License in order to use it for manuals for
4991 free software, because free software needs free documentation: a
4992 free program should come with manuals providing the same freedoms
4993 that the software does. But this License is not limited to
4994 software manuals; it can be used for any textual work, regardless
4995 of subject matter or whether it is published as a printed book. We
4996 recommend this License principally for works whose purpose is
4997 instruction or reference.
4999 1. APPLICABILITY AND DEFINITIONS
5001 This License applies to any manual or other work, in any medium,
5002 that contains a notice placed by the copyright holder saying it can
5003 be distributed under the terms of this License. Such a notice
5004 grants a world-wide, royalty-free license, unlimited in duration,
5005 to use that work under the conditions stated herein. The
5006 “Document”, below, refers to any such manual or work. Any member
5007 of the public is a licensee, and is addressed as “you”. You accept
5008 the license if you copy, modify or distribute the work in a way
5009 requiring permission under copyright law.
5011 A “Modified Version” of the Document means any work containing the
5012 Document or a portion of it, either copied verbatim, or with
5013 modifications and/or translated into another language.
5015 A “Secondary Section” is a named appendix or a front-matter section
5016 of the Document that deals exclusively with the relationship of the
5017 publishers or authors of the Document to the Document’s overall
5018 subject (or to related matters) and contains nothing that could
5019 fall directly within that overall subject. (Thus, if the Document
5020 is in part a textbook of mathematics, a Secondary Section may not
5021 explain any mathematics.) The relationship could be a matter of
5022 historical connection with the subject or with related matters, or
5023 of legal, commercial, philosophical, ethical or political position regarding them.
5026 The “Invariant Sections” are certain Secondary Sections whose
5027 titles are designated, as being those of Invariant Sections, in the
5028 notice that says that the Document is released under this License.
5029 If a section does not fit the above definition of Secondary then it
5030 is not allowed to be designated as Invariant. The Document may
5031 contain zero Invariant Sections. If the Document does not identify
5032 any Invariant Sections then there are none.
5034 The “Cover Texts” are certain short passages of text that are
5035 listed, as Front-Cover Texts or Back-Cover Texts, in the notice
5036 that says that the Document is released under this License. A
5037 Front-Cover Text may be at most 5 words, and a Back-Cover Text may
5038 be at most 25 words.
5040 A “Transparent” copy of the Document means a machine-readable copy,
5041 represented in a format whose specification is available to the
5042 general public, that is suitable for revising the document
5043 straightforwardly with generic text editors or (for images composed
5044 of pixels) generic paint programs or (for drawings) some widely
5045 available drawing editor, and that is suitable for input to text
5046 formatters or for automatic translation to a variety of formats
5047 suitable for input to text formatters. A copy made in an otherwise
5048 Transparent file format whose markup, or absence of markup, has
5049 been arranged to thwart or discourage subsequent modification by
5050 readers is not Transparent. An image format is not Transparent if
5051 used for any substantial amount of text. A copy that is not
5052 “Transparent” is called “Opaque”.
5054 Examples of suitable formats for Transparent copies include plain
5055 ASCII without markup, Texinfo input format, LaTeX input format,
5056 SGML or XML using a publicly available DTD, and standard-conforming
5057 simple HTML, PostScript or PDF designed for human modification.
5058 Examples of transparent image formats include PNG, XCF and JPG.
5059 Opaque formats include proprietary formats that can be read and
5060 edited only by proprietary word processors, SGML or XML for which
5061 the DTD and/or processing tools are not generally available, and
5062 the machine-generated HTML, PostScript or PDF produced by some word
5063 processors for output purposes only.
5065 The “Title Page” means, for a printed book, the title page itself,
5066 plus such following pages as are needed to hold, legibly, the
5067 material this License requires to appear in the title page. For
5068 works in formats which do not have any title page as such, “Title
5069 Page” means the text near the most prominent appearance of the
5070 work’s title, preceding the beginning of the body of the text.
5072 The “publisher” means any person or entity that distributes copies
5073 of the Document to the public.
5075 A section “Entitled XYZ” means a named subunit of the Document
5076 whose title either is precisely XYZ or contains XYZ in parentheses
5077 following text that translates XYZ in another language. (Here XYZ
5078 stands for a specific section name mentioned below, such as
5079 “Acknowledgements”, “Dedications”, “Endorsements”, or “History”.)
5080 To “Preserve the Title” of such a section when you modify the
5081 Document means that it remains a section “Entitled XYZ” according to this definition.
5084 The Document may include Warranty Disclaimers next to the notice
5085 which states that this License applies to the Document. These
5086 Warranty Disclaimers are considered to be included by reference in
5087 this License, but only as regards disclaiming warranties: any other
5088 implication that these Warranty Disclaimers may have is void and
5089 has no effect on the meaning of this License.
5091 2. VERBATIM COPYING
5093 You may copy and distribute the Document in any medium, either
5094 commercially or noncommercially, provided that this License, the
5095 copyright notices, and the license notice saying this License
5096 applies to the Document are reproduced in all copies, and that you
5097 add no other conditions whatsoever to those of this License. You
5098 may not use technical measures to obstruct or control the reading
5099 or further copying of the copies you make or distribute. However,
5100 you may accept compensation in exchange for copies. If you
5101 distribute a large enough number of copies you must also follow the
5102 conditions in section 3.
5104 You may also lend copies, under the same conditions stated above,
5105 and you may publicly display copies.
5107 3. COPYING IN QUANTITY
5109 If you publish printed copies (or copies in media that commonly
5110 have printed covers) of the Document, numbering more than 100, and
5111 the Document’s license notice requires Cover Texts, you must
5112 enclose the copies in covers that carry, clearly and legibly, all
5113 these Cover Texts: Front-Cover Texts on the front cover, and
5114 Back-Cover Texts on the back cover. Both covers must also clearly
5115 and legibly identify you as the publisher of these copies. The
5116 front cover must present the full title with all words of the title
5117 equally prominent and visible. You may add other material on the
5118 covers in addition. Copying with changes limited to the covers, as
5119 long as they preserve the title of the Document and satisfy these
5120 conditions, can be treated as verbatim copying in other respects.
5122 If the required texts for either cover are too voluminous to fit
5123 legibly, you should put the first ones listed (as many as fit
5124 reasonably) on the actual cover, and continue the rest onto adjacent pages.
5127 If you publish or distribute Opaque copies of the Document
5128 numbering more than 100, you must either include a machine-readable
5129 Transparent copy along with each Opaque copy, or state in or with
5130 each Opaque copy a computer-network location from which the general
5131 network-using public has access to download using public-standard
5132 network protocols a complete Transparent copy of the Document, free
5133 of added material. If you use the latter option, you must take
5134 reasonably prudent steps, when you begin distribution of Opaque
5135 copies in quantity, to ensure that this Transparent copy will
5136 remain thus accessible at the stated location until at least one
5137 year after the last time you distribute an Opaque copy (directly or
5138 through your agents or retailers) of that edition to the public.
5140 It is requested, but not required, that you contact the authors of
5141 the Document well before redistributing any large number of copies,
5142 to give them a chance to provide you with an updated version of the Document.
5145 4. MODIFICATIONS
5147 You may copy and distribute a Modified Version of the Document
5148 under the conditions of sections 2 and 3 above, provided that you
5149 release the Modified Version under precisely this License, with the
5150 Modified Version filling the role of the Document, thus licensing
5151 distribution and modification of the Modified Version to whoever
5152 possesses a copy of it. In addition, you must do these things in
5153 the Modified Version:
5155 A. Use in the Title Page (and on the covers, if any) a title
5156 distinct from that of the Document, and from those of previous
5157 versions (which should, if there were any, be listed in the
5158 History section of the Document). You may use the same title
5159 as a previous version if the original publisher of that
5160 version gives permission.
5162 B. List on the Title Page, as authors, one or more persons or
5163 entities responsible for authorship of the modifications in
5164 the Modified Version, together with at least five of the
5165 principal authors of the Document (all of its principal
5166 authors, if it has fewer than five), unless they release you
5167 from this requirement.
5169 C. State on the Title page the name of the publisher of the
5170 Modified Version, as the publisher.
5172 D. Preserve all the copyright notices of the Document.
5174 E. Add an appropriate copyright notice for your modifications
5175 adjacent to the other copyright notices.
5177 F. Include, immediately after the copyright notices, a license
5178 notice giving the public permission to use the Modified
5179 Version under the terms of this License, in the form shown in the Addendum below.
5182 G. Preserve in that license notice the full lists of Invariant
5183 Sections and required Cover Texts given in the Document’s license notice.
5186 H. Include an unaltered copy of this License.
5188 I. Preserve the section Entitled “History”, Preserve its Title,
5189 and add to it an item stating at least the title, year, new
5190 authors, and publisher of the Modified Version as given on the
5191 Title Page. If there is no section Entitled “History” in the
5192 Document, create one stating the title, year, authors, and
5193 publisher of the Document as given on its Title Page, then add
5194 an item describing the Modified Version as stated in the previous sentence.
5197 J. Preserve the network location, if any, given in the Document
5198 for public access to a Transparent copy of the Document, and
5199 likewise the network locations given in the Document for
5200 previous versions it was based on. These may be placed in the
5201 “History” section. You may omit a network location for a work
5202 that was published at least four years before the Document
5203 itself, or if the original publisher of the version it refers
5204 to gives permission.
5206 K. For any section Entitled “Acknowledgements” or “Dedications”,
5207 Preserve the Title of the section, and preserve in the section
5208 all the substance and tone of each of the contributor
5209 acknowledgements and/or dedications given therein.
5211 L. Preserve all the Invariant Sections of the Document, unaltered
5212 in their text and in their titles. Section numbers or the
5213 equivalent are not considered part of the section titles.
5215 M. Delete any section Entitled “Endorsements”. Such a section
5216 may not be included in the Modified Version.
5218 N. Do not retitle any existing section to be Entitled
5219 “Endorsements” or to conflict in title with any Invariant Section.
5222 O. Preserve any Warranty Disclaimers.
5224 If the Modified Version includes new front-matter sections or
5225 appendices that qualify as Secondary Sections and contain no
5226 material copied from the Document, you may at your option designate
5227 some or all of these sections as invariant. To do this, add their
5228 titles to the list of Invariant Sections in the Modified Version’s
5229 license notice. These titles must be distinct from any other section titles.
5232 You may add a section Entitled “Endorsements”, provided it contains
5233 nothing but endorsements of your Modified Version by various
5234 parties—for example, statements of peer review or that the text has
5235 been approved by an organization as the authoritative definition of a standard.
5238 You may add a passage of up to five words as a Front-Cover Text,
5239 and a passage of up to 25 words as a Back-Cover Text, to the end of
5240 the list of Cover Texts in the Modified Version. Only one passage
5241 of Front-Cover Text and one of Back-Cover Text may be added by (or
5242 through arrangements made by) any one entity. If the Document
5243 already includes a cover text for the same cover, previously added
5244 by you or by arrangement made by the same entity you are acting on
5245 behalf of, you may not add another; but you may replace the old
5246 one, on explicit permission from the previous publisher that added the old one.
5249 The author(s) and publisher(s) of the Document do not by this
5250 License give permission to use their names for publicity for or to
5251 assert or imply endorsement of any Modified Version.
5253 5. COMBINING DOCUMENTS
5255 You may combine the Document with other documents released under
5256 this License, under the terms defined in section 4 above for
5257 modified versions, provided that you include in the combination all
5258 of the Invariant Sections of all of the original documents,
5259 unmodified, and list them all as Invariant Sections of your
5260 combined work in its license notice, and that you preserve all
5261 their Warranty Disclaimers.
5263 The combined work need only contain one copy of this License, and
5264 multiple identical Invariant Sections may be replaced with a single
5265 copy. If there are multiple Invariant Sections with the same name
5266 but different contents, make the title of each such section unique
5267 by adding at the end of it, in parentheses, the name of the
5268 original author or publisher of that section if known, or else a
5269 unique number. Make the same adjustment to the section titles in
5270 the list of Invariant Sections in the license notice of the combined work.
5273 In the combination, you must combine any sections Entitled
5274 “History” in the various original documents, forming one section
5275 Entitled “History”; likewise combine any sections Entitled
5276 “Acknowledgements”, and any sections Entitled “Dedications”. You
5277 must delete all sections Entitled “Endorsements.”
5279 6. COLLECTIONS OF DOCUMENTS
5281 You may make a collection consisting of the Document and other
5282 documents released under this License, and replace the individual
5283 copies of this License in the various documents with a single copy
5284 that is included in the collection, provided that you follow the
5285 rules of this License for verbatim copying of each of the documents
5286 in all other respects.
5288 You may extract a single document from such a collection, and
5289 distribute it individually under this License, provided you insert
5290 a copy of this License into the extracted document, and follow this
5291 License in all other respects regarding verbatim copying of that document.
5294 7. AGGREGATION WITH INDEPENDENT WORKS
5296 A compilation of the Document or its derivatives with other
5297 separate and independent documents or works, in or on a volume of a
5298 storage or distribution medium, is called an “aggregate” if the
5299 copyright resulting from the compilation is not used to limit the
5300 legal rights of the compilation’s users beyond what the individual
5301 works permit. When the Document is included in an aggregate, this
5302 License does not apply to the other works in the aggregate which
5303 are not themselves derivative works of the Document.
5305 If the Cover Text requirement of section 3 is applicable to these
5306 copies of the Document, then if the Document is less than one half
5307 of the entire aggregate, the Document’s Cover Texts may be placed
5308 on covers that bracket the Document within the aggregate, or the
5309 electronic equivalent of covers if the Document is in electronic
5310 form. Otherwise they must appear on printed covers that bracket
5311 the whole aggregate.
5313 8. TRANSLATION
5315 Translation is considered a kind of modification, so you may
5316 distribute translations of the Document under the terms of section
5317 4. Replacing Invariant Sections with translations requires special
5318 permission from their copyright holders, but you may include
5319 translations of some or all Invariant Sections in addition to the
5320 original versions of these Invariant Sections. You may include a
5321 translation of this License, and all the license notices in the
5322 Document, and any Warranty Disclaimers, provided that you also
5323 include the original English version of this License and the
5324 original versions of those notices and disclaimers. In case of a
5325 disagreement between the translation and the original version of
5326 this License or a notice or disclaimer, the original version will prevail.
5329 If a section in the Document is Entitled “Acknowledgements”,
5330 “Dedications”, or “History”, the requirement (section 4) to
5331 Preserve its Title (section 1) will typically require changing the actual title.
5334 9. TERMINATION
5336 You may not copy, modify, sublicense, or distribute the Document
5337 except as expressly provided under this License. Any attempt
5338 otherwise to copy, modify, sublicense, or distribute it is void,
5339 and will automatically terminate your rights under this License.
5341 However, if you cease all violation of this License, then your
5342 license from a particular copyright holder is reinstated (a)
5343 provisionally, unless and until the copyright holder explicitly and
5344 finally terminates your license, and (b) permanently, if the
5345 copyright holder fails to notify you of the violation by some
5346 reasonable means prior to 60 days after the cessation.
5348 Moreover, your license from a particular copyright holder is
5349 reinstated permanently if the copyright holder notifies you of the
5350 violation by some reasonable means, this is the first time you have
5351 received notice of violation of this License (for any work) from
5352 that copyright holder, and you cure the violation prior to 30 days
5353 after your receipt of the notice.
5355 Termination of your rights under this section does not terminate
5356 the licenses of parties who have received copies or rights from you
5357 under this License. If your rights have been terminated and not
5358 permanently reinstated, receipt of a copy of some or all of the
5359 same material does not give you any rights to use it.
5361 10. FUTURE REVISIONS OF THIS LICENSE
5363 The Free Software Foundation may publish new, revised versions of
5364 the GNU Free Documentation License from time to time. Such new
5365 versions will be similar in spirit to the present version, but may
5366 differ in detail to address new problems or concerns. See
5367 <http://www.gnu.org/copyleft/>.
5369 Each version of the License is given a distinguishing version
5370 number. If the Document specifies that a particular numbered
5371 version of this License “or any later version” applies to it, you
5372 have the option of following the terms and conditions either of
5373 that specified version or of any later version that has been
5374 published (not as a draft) by the Free Software Foundation. If the
5375 Document does not specify a version number of this License, you may
5376 choose any version ever published (not as a draft) by the Free
5377 Software Foundation. If the Document specifies that a proxy can
5378 decide which future versions of this License can be used, that
5379 proxy’s public statement of acceptance of a version permanently
5380 authorizes you to choose that version for the Document.
5382 11. RELICENSING
5384 “Massive Multiauthor Collaboration Site” (or “MMC Site”) means any
5385 World Wide Web server that publishes copyrightable works and also
5386 provides prominent facilities for anybody to edit those works. A
5387 public wiki that anybody can edit is an example of such a server.
5388 A “Massive Multiauthor Collaboration” (or “MMC”) contained in the
5389 site means any set of copyrightable works thus published on the MMC site.
5392 “CC-BY-SA” means the Creative Commons Attribution-Share Alike 3.0
5393 license published by Creative Commons Corporation, a not-for-profit
5394 corporation with a principal place of business in San Francisco,
5395 California, as well as future copyleft versions of that license
5396 published by that same organization.
5398 “Incorporate” means to publish or republish a Document, in whole or
5399 in part, as part of another Document.
5401 An MMC is “eligible for relicensing” if it is licensed under this
5402 License, and if all works that were first published under this
5403 License somewhere other than this MMC, and subsequently
5404 incorporated in whole or in part into the MMC, (1) had no cover
5405 texts or invariant sections, and (2) were thus incorporated prior
5406 to November 1, 2008.
5408 The operator of an MMC Site may republish an MMC contained in the
5409 site under CC-BY-SA on the same site at any time before August 1,
5410 2009, provided the MMC is eligible for relicensing.
5412 ADDENDUM: How to use this License for your documents
5413 ====================================================
5415 To use this License in a document you have written, include a copy of
5416 the License in the document and put the following copyright and license
5417 notices just after the title page:
5419 Copyright (C) YEAR YOUR NAME.
5420 Permission is granted to copy, distribute and/or modify this document
5421 under the terms of the GNU Free Documentation License, Version 1.3
5422 or any later version published by the Free Software Foundation;
5423 with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
5424 Texts. A copy of the license is included in the section entitled ``GNU
5425 Free Documentation License''.
5427 If you have Invariant Sections, Front-Cover Texts and Back-Cover
5428 Texts, replace the “with…Texts.” line with this:
5430 with the Invariant Sections being LIST THEIR TITLES, with
5431 the Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.
5434 If you have Invariant Sections without Cover Texts, or some other
5435 combination of the three, merge those two alternatives to suit the situation.
5438 If your document contains nontrivial examples of program code, we
5439 recommend releasing these examples in parallel under your choice of free
5440 software license, such as the GNU General Public License, to permit
5441 their use in free software.
5444 File: libunistring.info, Node: Index, Prev: Licenses, Up: Top
5452 * ambiguous width: uniwidth.h. (line 10)
5453 * Arabic shaping: Arabic shaping. (line 6)
5454 * argument conventions: Conventions. (line 9)
5455 * autoconf macro: Autoconf macro. (line 6)
5456 * bidi class: Bidi class. (line 6)
5457 * bidirectional category: Bidi class. (line 6)
5458 * bidirectional reordering: More functionality. (line 6)
5459 * block: Blocks. (line 6)
5460 * boundaries, between grapheme clusters: unigbrk.h. (line 6)
5461 * boundaries, between words: uniwbrk.h. (line 6)
5462 * breaks, grapheme cluster: unigbrk.h. (line 6)
5463 * breaks, line: unilbrk.h. (line 6)
5464 * breaks, word: uniwbrk.h. (line 6)
5465 * bug reports: Reporting problems. (line 6)
5466 * bug tracker: Reporting problems. (line 6)
5467 * C string functions: char * strings. (line 6)
5468 * C, programming language: ISO C and Java syntax.
5470 * C-like API: Classifications like in ISO C.
5472 * canonical combining class: Canonical combining class.
5474 * case detection: Case detection. (line 6)
5475 * case mappings: Case mappings of strings.
5477 * casing_prefix_context_t: Case mappings of substrings.
5479 * casing_suffix_context_t: Case mappings of substrings.
5481 * char, type: char * strings. (line 22)
5482 * combining, Unicode characters: Composition of characters.
5484 * comparing: Elementary string functions.
5486 * comparing <1>: Elementary string functions on NUL terminated strings.
5488 * comparing, ignoring case: Case insensitive comparison.
5490 * comparing, ignoring case, with collation rules: Case insensitive comparison.
5492 * comparing, ignoring normalization: Normalizing comparisons.
5494 * comparing, ignoring normalization and case: Case insensitive comparison.
5496 * comparing, ignoring normalization and case, with collation rules: Case insensitive comparison.
5498 * comparing, ignoring normalization, with collation rules: Normalizing comparisons.
5500 * comparing, with collation rules: Elementary string functions on NUL terminated strings.
5502 * comparing, with collation rules, ignoring case: Case insensitive comparison.
5504 * comparing, with collation rules, ignoring normalization: Normalizing comparisons.
5506 * comparing, with collation rules, ignoring normalization and case: Case insensitive comparison.
5508 * compiler options: Compiler options. (line 24)
5509 * composing, Unicode characters: Composition of characters.
5511 * converting: Elementary string conversions.
5513 * converting <1>: uniconv.h. (line 45)
5514 * copying: Elementary string functions.
5516 * copying <1>: Elementary string functions on NUL terminated strings.
5518 * counting: Elementary string functions.
5520 * decomposing: Decomposition of characters.
5522 * dependencies: Installation. (line 6)
5523 * detecting case: Case detection. (line 6)
5524 * duplicating: Elementary string functions with memory allocation.
5526 * duplicating <1>: Elementary string functions on NUL terminated strings.
5528 * enum iconv_ilseq_handler: uniconv.h. (line 29)
5529 * FDL, GNU Free Documentation License: GNU FDL. (line 6)
5530 * formatted output: unistdio.h. (line 6)
5531 * fullwidth: uniwidth.h. (line 22)
5532 * general category: General category. (line 6)
5533 * gl_LIBUNISTRING: Autoconf macro. (line 11)
5534 * GPL, GNU General Public License: GNU GPL. (line 6)
5535 * grapheme cluster boundaries: unigbrk.h. (line 6)
5536 * grapheme cluster breaks: unigbrk.h. (line 6)
5537 * halfwidth: uniwidth.h. (line 22)
5538 * identifiers: ISO C and Java syntax.
5540 * installation: Installation. (line 10)
5541 * internationalization: Unicode and i18n. (line 6)
5542 * iterating: Elementary string functions.
5544 * iterating <1>: Elementary string functions on NUL terminated strings.
5546 * Java, programming language: ISO C and Java syntax.
5548 * joining group: Joining group. (line 6)
5549 * joining of Arabic characters: Arabic shaping. (line 6)
5550 * joining type: Joining type. (line 6)
5551 * LGPL, GNU Lesser General Public License: GNU LGPL. (line 6)
5552 * License, GNU FDL: GNU FDL. (line 6)
5553 * License, GNU GPL: GNU GPL. (line 6)
5554 * License, GNU LGPL: GNU LGPL. (line 6)
5555 * Licenses: Licenses. (line 6)
5556 * line breaks: unilbrk.h. (line 6)
5557 * locale: Locale encodings. (line 6)
5558 * locale categories: Locale encodings. (line 10)
5559 * locale encoding: Locale encodings. (line 23)
5560 * locale encoding <1>: uniconv.h. (line 10)
5561 * locale language: Case mappings of strings.
5563 * locale, multibyte: char * strings. (line 13)
5564 * locale_charset: uniconv.h. (line 12)
5565 * lowercasing: Case mappings of strings.
5567 * mailing list: Reporting problems. (line 6)
5568 * mirroring, of Unicode character: Mirrored character. (line 6)
5569 * normal forms: uninorm.h. (line 6)
5570 * normalizing: uninorm.h. (line 6)
5571 * output, formatted: unistdio.h. (line 6)
5572 * properties, of Unicode character: Properties. (line 6)
5573 * regular expression: uniregex.h. (line 6)
5574 * rendering: More functionality. (line 9)
5575 * return value conventions: Conventions. (line 47)
5576 * scripts: Scripts. (line 6)
5577 * searching, for a character: Elementary string functions.
5579 * searching, for a character <1>: Elementary string functions on NUL terminated strings.
5581 * searching, for a substring: Elementary string functions on NUL terminated strings.
5583 * stream, normalizing a: Normalization of streams.
5585 * struct uninorm_filter: Normalization of streams.
5587 * titlecasing: Case mappings of strings.
5589 * u16_asnprintf: unistdio.h. (line 111)
5590 * u16_asprintf: unistdio.h. (line 109)
5591 * u16_casecmp: Case insensitive comparison.
5593 * u16_casecoll: Case insensitive comparison.
5595 * u16_casefold: Case insensitive comparison.
5597 * u16_casexfrm: Case insensitive comparison.
5599 * u16_casing_prefixes_context: Case mappings of substrings.
5601 * u16_casing_prefix_context: Case mappings of substrings.
5603 * u16_casing_suffixes_context: Case mappings of substrings.
5605 * u16_casing_suffix_context: Case mappings of substrings.
5607 * u16_check: Elementary string checks.
5609 * u16_chr: Elementary string functions.
5611 * u16_cmp: Elementary string functions.
5613 * u16_cmp2: Elementary string functions.
5615 * u16_conv_from_encoding: uniconv.h. (line 51)
5616 * u16_conv_to_encoding: uniconv.h. (line 88)
5617 * u16_cpy: Elementary string functions.
5619 * u16_cpy_alloc: Elementary string functions with memory allocation.
5621 * u16_ct_casefold: Case insensitive comparison.
5623 * u16_ct_tolower: Case mappings of substrings.
5625 * u16_ct_totitle: Case mappings of substrings.
5627 * u16_ct_toupper: Case mappings of substrings.
5629 * u16_endswith: Elementary string functions on NUL terminated strings.
5631 * u16_grapheme_breaks: Grapheme cluster breaks in a string.
5633 * u16_grapheme_next: Grapheme cluster breaks in a string.
5635 * u16_grapheme_prev: Grapheme cluster breaks in a string.
5637 * u16_is_cased: Case detection. (line 55)
5638 * u16_is_casefolded: Case detection. (line 42)
5639 * u16_is_lowercase: Case detection. (line 22)
5640 * u16_is_titlecase: Case detection. (line 32)
5641 * u16_is_uppercase: Case detection. (line 12)
5642 * u16_mblen: Elementary string functions.
5644 * u16_mbsnlen: Elementary string functions.
5646 * u16_mbtouc: Elementary string functions.
5648 * u16_mbtoucr: Elementary string functions.
5650 * u16_mbtouc_unsafe: Elementary string functions.
5652 * u16_move: Elementary string functions.
5654 * u16_next: Elementary string functions on NUL terminated strings.
5656 * u16_normalize: Normalization of strings.
5658 * u16_normcmp: Normalizing comparisons.
5660 * u16_normcoll: Normalizing comparisons.
5662 * u16_normxfrm: Normalizing comparisons.
5664 * u16_possible_linebreaks: unilbrk.h. (line 44)
5665 * u16_prev: Elementary string functions on NUL terminated strings.
5667 * u16_set: Elementary string functions.
5669 * u16_snprintf: unistdio.h. (line 107)
5670 * u16_sprintf: unistdio.h. (line 106)
5671 * u16_startswith: Elementary string functions on NUL terminated strings.
5673 * u16_stpcpy: Elementary string functions on NUL terminated strings.
5675 * u16_stpncpy: Elementary string functions on NUL terminated strings.
5677 * u16_strcat: Elementary string functions on NUL terminated strings.
5679 * u16_strchr: Elementary string functions on NUL terminated strings.
5681 * u16_strcmp: Elementary string functions on NUL terminated strings.
5683 * u16_strcoll: Elementary string functions on NUL terminated strings.
5685 * u16_strconv_from_encoding: uniconv.h. (line 127)
5686 * u16_strconv_from_locale: uniconv.h. (line 156)
5687 * u16_strconv_to_encoding: uniconv.h. (line 140)
5688 * u16_strconv_to_locale: uniconv.h. (line 166)
5689 * u16_strcpy: Elementary string functions on NUL terminated strings.
5691 * u16_strcspn: Elementary string functions on NUL terminated strings.
5693 * u16_strdup: Elementary string functions on NUL terminated strings.
5695 * u16_strlen: Elementary string functions on NUL terminated strings.
5697 * u16_strmblen: Elementary string functions on NUL terminated strings.
5699 * u16_strmbtouc: Elementary string functions on NUL terminated strings.
5701 * u16_strncat: Elementary string functions on NUL terminated strings.
5703 * u16_strncmp: Elementary string functions on NUL terminated strings.
5705 * u16_strncpy: Elementary string functions on NUL terminated strings.
5707 * u16_strnlen: Elementary string functions on NUL terminated strings.
5709 * u16_strpbrk: Elementary string functions on NUL terminated strings.
5711 * u16_strrchr: Elementary string functions on NUL terminated strings.
5713 * u16_strspn: Elementary string functions on NUL terminated strings.
5715 * u16_strstr: Elementary string functions on NUL terminated strings.
5717 * u16_strtok: Elementary string functions on NUL terminated strings.
5719 * u16_strwidth: uniwidth.h. (line 38)
5720 * u16_tolower: Case mappings of strings.
5722 * u16_totitle: Case mappings of strings.
5724 * u16_toupper: Case mappings of strings.
5726 * u16_to_u32: Elementary string conversions.
5728 * u16_to_u8: Elementary string conversions.
5730 * u16_u16_asnprintf: unistdio.h. (line 131)
5731 * u16_u16_asprintf: unistdio.h. (line 129)
5732 * u16_u16_snprintf: unistdio.h. (line 127)
5733 * u16_u16_sprintf: unistdio.h. (line 125)
5734 * u16_u16_vasnprintf: unistdio.h. (line 139)
5735 * u16_u16_vasprintf: unistdio.h. (line 137)
5736 * u16_u16_vsnprintf: unistdio.h. (line 135)
5737 * u16_u16_vsprintf: unistdio.h. (line 133)
5738 * u16_uctomb: Elementary string functions.
5740 * u16_vasnprintf: unistdio.h. (line 119)
5741 * u16_vasprintf: unistdio.h. (line 117)
5742 * u16_vsnprintf: unistdio.h. (line 115)
5743 * u16_vsprintf: unistdio.h. (line 113)
5744 * u16_width: uniwidth.h. (line 29)
5745 * u16_width_linebreaks: unilbrk.h. (line 62)
5746 * u16_wordbreaks: Word breaks in a string.
5748 * u32_asnprintf: unistdio.h. (line 150)
5749 * u32_asprintf: unistdio.h. (line 148)
5750 * u32_casecmp: Case insensitive comparison.
5752 * u32_casecoll: Case insensitive comparison.
5754 * u32_casefold: Case insensitive comparison.
5756 * u32_casexfrm: Case insensitive comparison.
5758 * u32_casing_prefixes_context: Case mappings of substrings.
5760 * u32_casing_prefix_context: Case mappings of substrings.
5762 * u32_casing_suffixes_context: Case mappings of substrings.
5764 * u32_casing_suffix_context: Case mappings of substrings.
5766 * u32_check: Elementary string checks.
5768 * u32_chr: Elementary string functions.
5770 * u32_cmp: Elementary string functions.
5772 * u32_cmp2: Elementary string functions.
5774 * u32_conv_from_encoding: uniconv.h. (line 54)
5775 * u32_conv_to_encoding: uniconv.h. (line 91)
5776 * u32_cpy: Elementary string functions.
5778 * u32_cpy_alloc: Elementary string functions with memory allocation.
5780 * u32_ct_casefold: Case insensitive comparison.
5782 * u32_ct_tolower: Case mappings of substrings.
5784 * u32_ct_totitle: Case mappings of substrings.
5786 * u32_ct_toupper: Case mappings of substrings.
5788 * u32_endswith: Elementary string functions on NUL terminated strings.
5790 * u32_grapheme_breaks: Grapheme cluster breaks in a string.
5792 * u32_grapheme_next: Grapheme cluster breaks in a string.
5794 * u32_grapheme_prev: Grapheme cluster breaks in a string.
5796 * u32_is_cased: Case detection. (line 57)
5797 * u32_is_casefolded: Case detection. (line 44)
5798 * u32_is_lowercase: Case detection. (line 24)
5799 * u32_is_titlecase: Case detection. (line 34)
5800 * u32_is_uppercase: Case detection. (line 14)
5801 * u32_mblen: Elementary string functions.
5803 * u32_mbsnlen: Elementary string functions.
5805 * u32_mbtouc: Elementary string functions.
5807 * u32_mbtoucr: Elementary string functions.
5809 * u32_mbtouc_unsafe: Elementary string functions.
5811 * u32_move: Elementary string functions.
5813 * u32_next: Elementary string functions on NUL terminated strings.
5815 * u32_normalize: Normalization of strings.
5817 * u32_normcmp: Normalizing comparisons.
5819 * u32_normcoll: Normalizing comparisons.
5821 * u32_normxfrm: Normalizing comparisons.
5823 * u32_possible_linebreaks: unilbrk.h. (line 46)
5824 * u32_prev: Elementary string functions on NUL terminated strings.
5826 * u32_set: Elementary string functions.
5828 * u32_snprintf: unistdio.h. (line 146)
5829 * u32_sprintf: unistdio.h. (line 145)
5830 * u32_startswith: Elementary string functions on NUL terminated strings.
5832 * u32_stpcpy: Elementary string functions on NUL terminated strings.
5834 * u32_stpncpy: Elementary string functions on NUL terminated strings.
5836 * u32_strcat: Elementary string functions on NUL terminated strings.
5838 * u32_strchr: Elementary string functions on NUL terminated strings.
5840 * u32_strcmp: Elementary string functions on NUL terminated strings.
5842 * u32_strcoll: Elementary string functions on NUL terminated strings.
5844 * u32_strconv_from_encoding: uniconv.h. (line 129)
5845 * u32_strconv_from_locale: uniconv.h. (line 157)
5846 * u32_strconv_to_encoding: uniconv.h. (line 142)
5847 * u32_strconv_to_locale: uniconv.h. (line 167)
5848 * u32_strcpy: Elementary string functions on NUL terminated strings.
5850 * u32_strcspn: Elementary string functions on NUL terminated strings.
5852 * u32_strdup: Elementary string functions on NUL terminated strings.
5854 * u32_strlen: Elementary string functions on NUL terminated strings.
5856 * u32_strmblen: Elementary string functions on NUL terminated strings.
5858 * u32_strmbtouc: Elementary string functions on NUL terminated strings.
5860 * u32_strncat: Elementary string functions on NUL terminated strings.
5862 * u32_strncmp: Elementary string functions on NUL terminated strings.
5864 * u32_strncpy: Elementary string functions on NUL terminated strings.
5866 * u32_strnlen: Elementary string functions on NUL terminated strings.
5868 * u32_strpbrk: Elementary string functions on NUL terminated strings.
5870 * u32_strrchr: Elementary string functions on NUL terminated strings.
5872 * u32_strspn: Elementary string functions on NUL terminated strings.
5874 * u32_strstr: Elementary string functions on NUL terminated strings.
5876 * u32_strtok: Elementary string functions on NUL terminated strings.
5878 * u32_strwidth: uniwidth.h. (line 39)
5879 * u32_tolower: Case mappings of strings.
5881 * u32_totitle: Case mappings of strings.
5883 * u32_toupper: Case mappings of strings.
5885 * u32_to_u16: Elementary string conversions.
5887 * u32_to_u8: Elementary string conversions.
5889 * u32_u32_asnprintf: unistdio.h. (line 170)
5890 * u32_u32_asprintf: unistdio.h. (line 168)
5891 * u32_u32_snprintf: unistdio.h. (line 166)
5892 * u32_u32_sprintf: unistdio.h. (line 164)
5893 * u32_u32_vasnprintf: unistdio.h. (line 178)
5894 * u32_u32_vasprintf: unistdio.h. (line 176)
5895 * u32_u32_vsnprintf: unistdio.h. (line 174)
5896 * u32_u32_vsprintf: unistdio.h. (line 172)
5897 * u32_uctomb: Elementary string functions.
5899 * u32_vasnprintf: unistdio.h. (line 158)
5900 * u32_vasprintf: unistdio.h. (line 156)
5901 * u32_vsnprintf: unistdio.h. (line 154)
5902 * u32_vsprintf: unistdio.h. (line 152)
5903 * u32_width: uniwidth.h. (line 31)
5904 * u32_width_linebreaks: unilbrk.h. (line 65)
5905 * u32_wordbreaks: Word breaks in a string.
5907 * u8_asnprintf: unistdio.h. (line 72)
5908 * u8_asprintf: unistdio.h. (line 70)
5909 * u8_casecmp: Case insensitive comparison.
5911 * u8_casecoll: Case insensitive comparison.
5913 * u8_casefold: Case insensitive comparison.
5915 * u8_casexfrm: Case insensitive comparison.
5917 * u8_casing_prefixes_context: Case mappings of substrings.
5919 * u8_casing_prefix_context: Case mappings of substrings.
5921 * u8_casing_suffixes_context: Case mappings of substrings.
5923 * u8_casing_suffix_context: Case mappings of substrings.
5925 * u8_check: Elementary string checks.
5927 * u8_chr: Elementary string functions.
5929 * u8_cmp: Elementary string functions.
5931 * u8_cmp2: Elementary string functions.
5933 * u8_conv_from_encoding: uniconv.h. (line 48)
5934 * u8_conv_to_encoding: uniconv.h. (line 85)
5935 * u8_cpy: Elementary string functions.
5937 * u8_cpy_alloc: Elementary string functions with memory allocation.
5939 * u8_ct_casefold: Case insensitive comparison.
5941 * u8_ct_tolower: Case mappings of substrings.
5943 * u8_ct_totitle: Case mappings of substrings.
5945 * u8_ct_toupper: Case mappings of substrings.
5947 * u8_endswith: Elementary string functions on NUL terminated strings.
5949 * u8_grapheme_breaks: Grapheme cluster breaks in a string.
5951 * u8_grapheme_next: Grapheme cluster breaks in a string.
5953 * u8_grapheme_prev: Grapheme cluster breaks in a string.
5955 * u8_is_cased: Case detection. (line 53)
5956 * u8_is_casefolded: Case detection. (line 40)
5957 * u8_is_lowercase: Case detection. (line 20)
5958 * u8_is_titlecase: Case detection. (line 30)
5959 * u8_is_uppercase: Case detection. (line 10)
5960 * u8_mblen: Elementary string functions.
5962 * u8_mbsnlen: Elementary string functions.
5964 * u8_mbtouc: Elementary string functions.
5966 * u8_mbtoucr: Elementary string functions.
5968 * u8_mbtouc_unsafe: Elementary string functions.
5970 * u8_move: Elementary string functions.
5972 * u8_next: Elementary string functions on NUL terminated strings.
5974 * u8_normalize: Normalization of strings.
5976 * u8_normcmp: Normalizing comparisons.
5978 * u8_normcoll: Normalizing comparisons.
5980 * u8_normxfrm: Normalizing comparisons.
5982 * u8_possible_linebreaks: unilbrk.h. (line 42)
5983 * u8_prev: Elementary string functions on NUL terminated strings.
5985 * u8_set: Elementary string functions.
5987 * u8_snprintf: unistdio.h. (line 68)
5988 * u8_sprintf: unistdio.h. (line 67)
5989 * u8_startswith: Elementary string functions on NUL terminated strings.
5991 * u8_stpcpy: Elementary string functions on NUL terminated strings.
5993 * u8_stpncpy: Elementary string functions on NUL terminated strings.
5995 * u8_strcat: Elementary string functions on NUL terminated strings.
5997 * u8_strchr: Elementary string functions on NUL terminated strings.
5999 * u8_strcmp: Elementary string functions on NUL terminated strings.
6001 * u8_strcoll: Elementary string functions on NUL terminated strings.
6003 * u8_strconv_from_encoding: uniconv.h. (line 125)
6004 * u8_strconv_from_locale: uniconv.h. (line 155)
6005 * u8_strconv_to_encoding: uniconv.h. (line 138)
6006 * u8_strconv_to_locale: uniconv.h. (line 165)
6007 * u8_strcpy: Elementary string functions on NUL terminated strings.
6009 * u8_strcspn: Elementary string functions on NUL terminated strings.
6011 * u8_strdup: Elementary string functions on NUL terminated strings.
6013 * u8_strlen: Elementary string functions on NUL terminated strings.
6015 * u8_strmblen: Elementary string functions on NUL terminated strings.
6017 * u8_strmbtouc: Elementary string functions on NUL terminated strings.
6019 * u8_strncat: Elementary string functions on NUL terminated strings.
6021 * u8_strncmp: Elementary string functions on NUL terminated strings.
6023 * u8_strncpy: Elementary string functions on NUL terminated strings.
6025 * u8_strnlen: Elementary string functions on NUL terminated strings.
6027 * u8_strpbrk: Elementary string functions on NUL terminated strings.
6029 * u8_strrchr: Elementary string functions on NUL terminated strings.
6031 * u8_strspn: Elementary string functions on NUL terminated strings.
6033 * u8_strstr: Elementary string functions on NUL terminated strings.
6035 * u8_strtok: Elementary string functions on NUL terminated strings.
6037 * u8_strwidth: uniwidth.h. (line 37)
6038 * u8_tolower: Case mappings of strings.
6040 * u8_totitle: Case mappings of strings.
6042 * u8_toupper: Case mappings of strings.
6044 * u8_to_u16: Elementary string conversions.
6046 * u8_to_u32: Elementary string conversions.
6048 * u8_u8_asnprintf: unistdio.h. (line 92)
6049 * u8_u8_asprintf: unistdio.h. (line 90)
6050 * u8_u8_snprintf: unistdio.h. (line 88)
6051 * u8_u8_sprintf: unistdio.h. (line 86)
6052 * u8_u8_vasnprintf: unistdio.h. (line 100)
6053 * u8_u8_vasprintf: unistdio.h. (line 98)
6054 * u8_u8_vsnprintf: unistdio.h. (line 96)
6055 * u8_u8_vsprintf: unistdio.h. (line 94)
6056 * u8_uctomb: Elementary string functions.
6058 * u8_vasnprintf: unistdio.h. (line 80)
6059 * u8_vasprintf: unistdio.h. (line 78)
6060 * u8_vsnprintf: unistdio.h. (line 76)
6061 * u8_vsprintf: unistdio.h. (line 74)
6062 * u8_width: uniwidth.h. (line 27)
6063 * u8_width_linebreaks: unilbrk.h. (line 59)
6064 * u8_wordbreaks: Word breaks in a string.
6066 * UCS-4: Unicode. (line 14)
6067 * ucs4_t: unitypes.h. (line 15)
6068 * uc_all_blocks: Blocks. (line 36)
6069 * uc_all_scripts: Scripts. (line 35)
6070 * uc_bidi_category: Bidi class. (line 93)
6071 * uc_bidi_category_byname: Bidi class. (line 83)
6072 * uc_bidi_category_name: Bidi class. (line 75)
6073 * uc_bidi_class: Bidi class. (line 92)
6074 * uc_bidi_class_byname: Bidi class. (line 82)
6075 * uc_bidi_class_long_name: Bidi class. (line 79)
6076 * uc_bidi_class_name: Bidi class. (line 74)
6077 * uc_block: Blocks. (line 26)
6078 * uc_block_t: Blocks. (line 11)
6079 * uc_canonical_decomposition: Decomposition of characters.
6081 * uc_combining_class: Canonical combining class.
6083 * uc_combining_class_byname: Canonical combining class.
6085 * uc_combining_class_long_name: Canonical combining class.
6087 * uc_combining_class_name: Canonical combining class.
6089 * uc_composition: Composition of characters.
6091 * uc_c_ident_category: ISO C and Java syntax.
6093 * uc_decimal_value: Decimal digit value. (line 10)
6094 * uc_decomposition: Decomposition of characters.
6096 * uc_digit_value: Digit value. (line 10)
6097 * uc_fraction_t: Numeric value. (line 12)
6098 * uc_general_category: Object oriented API. (line 219)
6099 * uc_general_category_and: Object oriented API. (line 180)
6100 * uc_general_category_and_not: Object oriented API. (line 187)
6101 * uc_general_category_byname: Object oriented API. (line 209)
6102 * uc_general_category_long_name: Object oriented API. (line 203)
6103 * uc_general_category_name: Object oriented API. (line 197)
6104 * uc_general_category_or: Object oriented API. (line 174)
6105 * uc_general_category_t: Object oriented API. (line 6)
6106 * uc_graphemeclusterbreak_property: Grapheme cluster break property.
6108 * uc_is_alnum: Classifications like in ISO C.
6110 * uc_is_alpha: Classifications like in ISO C.
6112 * uc_is_bidi_category: Bidi class. (line 97)
6113 * uc_is_bidi_class: Bidi class. (line 96)
6114 * uc_is_blank: Classifications like in ISO C.
6116 * uc_is_block: Blocks. (line 31)
6117 * uc_is_cntrl: Classifications like in ISO C.
6119 * uc_is_c_whitespace: ISO C and Java syntax.
6121 * uc_is_digit: Classifications like in ISO C.
6123 * uc_is_general_category: Object oriented API. (line 224)
6124 * uc_is_general_category_withtable: Bit mask API. (line 51)
6125 * uc_is_graph: Classifications like in ISO C.
6127 * uc_is_grapheme_break: Grapheme cluster break property.
6129 * uc_is_java_whitespace: ISO C and Java syntax.
6131 * uc_is_lower: Classifications like in ISO C.
6133 * uc_is_print: Classifications like in ISO C.
6135 * uc_is_property: Properties as objects.
6137 * uc_is_property_alphabetic: Properties as functions.
6139 * uc_is_property_ascii_hex_digit: Properties as functions.
6141 * uc_is_property_bidi_arabic_digit: Properties as functions.
6143 * uc_is_property_bidi_arabic_right_to_left: Properties as functions.
6145 * uc_is_property_bidi_block_separator: Properties as functions.
6147 * uc_is_property_bidi_boundary_neutral: Properties as functions.
6149 * uc_is_property_bidi_common_separator: Properties as functions.
6151 * uc_is_property_bidi_control: Properties as functions.
6153 * uc_is_property_bidi_embedding_or_override: Properties as functions.
6155 * uc_is_property_bidi_european_digit: Properties as functions.
6157 * uc_is_property_bidi_eur_num_separator: Properties as functions.
6159 * uc_is_property_bidi_eur_num_terminator: Properties as functions.
6161 * uc_is_property_bidi_hebrew_right_to_left: Properties as functions.
6163 * uc_is_property_bidi_left_to_right: Properties as functions.
6165 * uc_is_property_bidi_non_spacing_mark: Properties as functions.
6167 * uc_is_property_bidi_other_neutral: Properties as functions.
6169 * uc_is_property_bidi_pdf: Properties as functions.
6171 * uc_is_property_bidi_segment_separator: Properties as functions.
6173 * uc_is_property_bidi_whitespace: Properties as functions.
6175 * uc_is_property_cased: Properties as functions.
6177 * uc_is_property_case_ignorable: Properties as functions.
6179 * uc_is_property_changes_when_casefolded: Properties as functions.
6181 * uc_is_property_changes_when_casemapped: Properties as functions.
6183 * uc_is_property_changes_when_lowercased: Properties as functions.
6185 * uc_is_property_changes_when_titlecased: Properties as functions.
6187 * uc_is_property_changes_when_uppercased: Properties as functions.
6189 * uc_is_property_combining: Properties as functions.
6191 * uc_is_property_composite: Properties as functions.
6193 * uc_is_property_currency_symbol: Properties as functions.
6195 * uc_is_property_dash: Properties as functions.
6197 * uc_is_property_decimal_digit: Properties as functions.
6199 * uc_is_property_default_ignorable_code_point: Properties as functions.
6201 * uc_is_property_deprecated: Properties as functions.
6203 * uc_is_property_diacritic: Properties as functions.
6205 * uc_is_property_extender: Properties as functions.
6207 * uc_is_property_format_control: Properties as functions.
6209 * uc_is_property_grapheme_base: Properties as functions.
6211 * uc_is_property_grapheme_extend: Properties as functions.
6213 * uc_is_property_grapheme_link: Properties as functions.
6215 * uc_is_property_hex_digit: Properties as functions.
6217 * uc_is_property_hyphen: Properties as functions.
6219 * uc_is_property_ideographic: Properties as functions.
6221 * uc_is_property_ids_binary_operator: Properties as functions.
6223 * uc_is_property_ids_trinary_operator: Properties as functions.
6225 * uc_is_property_id_continue: Properties as functions.
6227 * uc_is_property_id_start: Properties as functions.
6229 * uc_is_property_ignorable_control: Properties as functions.
6231 * uc_is_property_iso_control: Properties as functions.
6233 * uc_is_property_join_control: Properties as functions.
6235 * uc_is_property_left_of_pair: Properties as functions.
6237 * uc_is_property_line_separator: Properties as functions.
6239 * uc_is_property_logical_order_exception: Properties as functions.
6241 * uc_is_property_lowercase: Properties as functions.
6243 * uc_is_property_math: Properties as functions.
6245 * uc_is_property_non_break: Properties as functions.
6247 * uc_is_property_not_a_character: Properties as functions.
6249 * uc_is_property_numeric: Properties as functions.
6251 * uc_is_property_other_alphabetic: Properties as functions.
6253 * uc_is_property_other_default_ignorable_code_point: Properties as functions.
6255 * uc_is_property_other_grapheme_extend: Properties as functions.
6257 * uc_is_property_other_id_continue: Properties as functions.
6259 * uc_is_property_other_id_start: Properties as functions.
6261 * uc_is_property_other_lowercase: Properties as functions.
6263 * uc_is_property_other_math: Properties as functions.
6265 * uc_is_property_other_uppercase: Properties as functions.
6267 * uc_is_property_paired_punctuation: Properties as functions.
6269 * uc_is_property_paragraph_separator: Properties as functions.
6271 * uc_is_property_pattern_syntax: Properties as functions.
6273 * uc_is_property_pattern_white_space: Properties as functions.
6275 * uc_is_property_private_use: Properties as functions.
6277 * uc_is_property_punctuation: Properties as functions.
6279 * uc_is_property_quotation_mark: Properties as functions.
6281 * uc_is_property_radical: Properties as functions.
6283 * uc_is_property_sentence_terminal: Properties as functions.
6285 * uc_is_property_soft_dotted: Properties as functions.
6287 * uc_is_property_space: Properties as functions.
6289 * uc_is_property_terminal_punctuation: Properties as functions.
6291 * uc_is_property_titlecase: Properties as functions.
6293 * uc_is_property_unassigned_code_value: Properties as functions.
6295 * uc_is_property_unified_ideograph: Properties as functions.
6297 * uc_is_property_uppercase: Properties as functions.
6299 * uc_is_property_variation_selector: Properties as functions.
6301 * uc_is_property_white_space: Properties as functions.
6303 * uc_is_property_xid_continue: Properties as functions.
6305 * uc_is_property_xid_start: Properties as functions.
6307 * uc_is_property_zero_width: Properties as functions.
6309 * uc_is_punct: Classifications like in ISO C.
6311 * uc_is_script: Scripts. (line 30)
6312 * uc_is_space: Classifications like in ISO C.
6314 * uc_is_upper: Classifications like in ISO C.
6316 * uc_is_xdigit: Classifications like in ISO C.
6318 * uc_java_ident_category: ISO C and Java syntax.
6320 * uc_joining_group: Joining group. (line 85)
6321 * uc_joining_group_byname: Joining group. (line 76)
6322 * uc_joining_group_name: Joining group. (line 73)
6323 * uc_joining_type: Joining type. (line 54)
6324 * uc_joining_type_byname: Joining type. (line 45)
6325 * uc_joining_type_long_name: Joining type. (line 42)
6326 * uc_joining_type_name: Joining type. (line 39)
6327 * uc_locale_language: Case mappings of strings.
6329 * uc_mirror_char: Mirrored character. (line 13)
6330 * uc_numeric_value: Numeric value. (line 21)
6331 * uc_property_byname: Properties as objects.
6333 * uc_property_is_valid: Properties as objects.
6335 * uc_property_t: Properties as objects.
6337 * uc_script: Scripts. (line 19)
6338 * uc_script_byname: Scripts. (line 23)
6339 * uc_script_t: Scripts. (line 10)
6340 * uc_tolower: Case mappings of characters.
6342 * uc_totitle: Case mappings of characters.
6344 * uc_toupper: Case mappings of characters.
6346 * uc_width: uniwidth.h. (line 22)
6347 * uc_wordbreak_property: Word break property. (line 31)
6348 * uint16_t: unitypes.h. (line 9)
6349 * uint32_t: unitypes.h. (line 10)
6350 * uint8_t: unitypes.h. (line 8)
6351 * ulc_asnprintf: unistdio.h. (line 49)
6352 * ulc_asprintf: unistdio.h. (line 47)
6353 * ulc_casecmp: Case insensitive comparison.
6355 * ulc_casecoll: Case insensitive comparison.
6357 * ulc_casexfrm: Case insensitive comparison.
6359 * ulc_fprintf: unistdio.h. (line 184)
6360 * ulc_grapheme_breaks: Grapheme cluster breaks in a string.
6362 * ulc_possible_linebreaks: unilbrk.h. (line 48)
6363 * ulc_snprintf: unistdio.h. (line 44)
6364 * ulc_sprintf: unistdio.h. (line 42)
6365 * ulc_vasnprintf: unistdio.h. (line 61)
6366 * ulc_vasprintf: unistdio.h. (line 58)
6367 * ulc_vfprintf: unistdio.h. (line 185)
6368 * ulc_vsnprintf: unistdio.h. (line 55)
6369 * ulc_vsprintf: unistdio.h. (line 52)
6370 * ulc_width_linebreaks: unilbrk.h. (line 68)
6371 * ulc_wordbreaks: Word breaks in a string.
6373 * Unicode: Unicode. (line 6)
6374 * Unicode character, bidi class: Bidi class. (line 6)
6375 * Unicode character, bidirectional category: Bidi class. (line 6)
6376 * Unicode character, block: Blocks. (line 24)
6377 * Unicode character, canonical combining class: Canonical combining class.
6379 * Unicode character, case mappings: Case mappings of characters.
6381 * Unicode character, classification: General category. (line 6)
6382 * Unicode character, classification like in C: Classifications like in ISO C.
6384 * Unicode character, general category: General category. (line 6)
6385 * Unicode character, mirroring: Mirrored character. (line 6)
6386 * Unicode character, name: uniname.h. (line 6)
6387 * Unicode character, properties: Properties. (line 6)
6388 * Unicode character, script: Scripts. (line 17)
6389 * Unicode character, validity in C identifiers: ISO C and Java syntax.
6391 * Unicode character, validity in Java identifiers: ISO C and Java syntax.
6393 * Unicode character, value: Decimal digit value. (line 6)
6394 * Unicode character, value <1>: Digit value. (line 6)
6395 * Unicode character, value <2>: Numeric value. (line 6)
6396 * Unicode character, width: uniwidth.h. (line 22)
6397 * unicode_character_name: uniname.h. (line 18)
6398 * unicode_name_character: uniname.h. (line 24)
6399 * uninorm_decomposing_form: Normalization of strings.
6401 * uninorm_filter_create: Normalization of streams.
6403 * uninorm_filter_flush: Normalization of streams.
6405 * uninorm_filter_free: Normalization of streams.
6407 * uninorm_filter_write: Normalization of streams.
6409 * uninorm_is_compat_decomposing: Normalization of strings.
6411 * uninorm_is_composing: Normalization of strings.
6413 * uninorm_t: Normalization of strings.
6415 * uppercasing: Case mappings of strings.
6417 * use cases: Introduction. (line 36)
6418 * UTF-16: Unicode. (line 14)
6419 * UTF-16, strings: Unicode strings. (line 6)
6420 * UTF-32: Unicode. (line 14)
6421 * UTF-32, strings: Unicode strings. (line 6)
6422 * UTF-8: Unicode. (line 14)
6423 * UTF-8, strings: Unicode strings. (line 6)
6424 * validity: Elementary string checks.
6426 * value, of libunistring: Introduction. (line 36)
6427 * value, of Unicode character: Decimal digit value. (line 6)
6428 * value, of Unicode character <1>: Digit value. (line 6)
6429 * value, of Unicode character <2>: Numeric value. (line 6)
6430 * verification: Elementary string checks.
6432 * wchar_t, type: The wchar_t mess. (line 6)
6433 * well-formed: Elementary string checks.
6435 * width: uniwidth.h. (line 6)
6436 * word boundaries: uniwbrk.h. (line 6)
6437 * word breaks: uniwbrk.h. (line 6)
6438 * wrapping: unilbrk.h. (line 6)
6444 Node: Introduction\7f3400
6445 Node: Unicode\7f5493
6446 Node: Unicode and i18n\7f7378
6447 Node: Locale encodings\7f8848
6448 Node: In-memory representation\7f11113
6449 Node: char * strings\7f12239
6450 Node: The wchar_t mess\7f17727
6451 Node: Unicode strings\7f20035
6452 Node: Conventions\7f21220
6453 Node: unitypes.h\7f23512
6454 Node: unistr.h\7f24096
6455 Node: Elementary string checks\7f24661
6456 Node: Elementary string conversions\7f25283
6457 Node: Elementary string functions\7f26585
6458 Node: Elementary string functions with memory allocation\7f33644
6459 Node: Elementary string functions on NUL terminated strings\7f34266
6460 Node: uniconv.h\7f46494
6461 Node: unistdio.h\7f54447
6462 Node: uniname.h\7f62700
6463 Node: unictype.h\7f64059
6464 Node: General category\7f64987
6465 Node: Object oriented API\7f66042
6466 Node: Bit mask API\7f75276
6467 Node: Canonical combining class\7f77571
6468 Node: Bidi class\7f81805
6469 Node: Decimal digit value\7f85218
6470 Node: Digit value\7f85775
6471 Node: Numeric value\7f86336
6472 Node: Mirrored character\7f87238
6473 Node: Arabic shaping\7f87931
6474 Node: Joining type\7f88404
6475 Node: Joining group\7f90554
6476 Node: Properties\7f93992
6477 Node: Properties as objects\7f94683
6478 Node: Properties as functions\7f101705
6479 Node: Scripts\7f107721
6480 Node: Blocks\7f109126
6481 Node: ISO C and Java syntax\7f110469
6482 Node: Classifications like in ISO C\7f112187
6483 Node: uniwidth.h\7f114999
6484 Node: unigbrk.h\7f117045
6485 Node: Grapheme cluster breaks in a string\7f118539
6486 Node: Grapheme cluster break property\7f120644
6487 Node: uniwbrk.h\7f122545
6488 Node: Word breaks in a string\7f123083
6489 Node: Word break property\7f124175
6490 Node: unilbrk.h\7f125274
6491 Node: uninorm.h\7f129570
6492 Node: Decomposition of characters\7f130207
6493 Node: Composition of characters\7f133684
6494 Node: Normalization of strings\7f134397
6495 Node: Normalizing comparisons\7f136474
6496 Node: Normalization of streams\7f138876
6497 Node: unicase.h\7f141001
6498 Node: Case mappings of characters\7f141690
6499 Node: Case mappings of strings\7f143839
6500 Node: Case mappings of substrings\7f147190
6501 Node: Case insensitive comparison\7f154112
6502 Node: Case detection\7f159517
6503 Node: uniregex.h\7f162831
6504 Node: Using the library\7f163058
6505 Node: Installation\7f163469
6506 Node: Compiler options\7f163954
6507 Node: Include files\7f165594
6508 Node: Autoconf macro\7f166847
6509 Node: Reporting problems\7f168487
6510 Node: More functionality\7f169305
6511 Node: Licenses\7f169748
6512 Node: GNU GPL\7f171386
6513 Node: GNU LGPL\7f209130
6514 Node: GNU FDL\7f217612
6515 Node: Index\7f242916