the UTF-8, UTF-16 and UCS-4 encodings of Unicode.
</para>
+<para>
+The implementations of the Unicode functions in GLib are based
+on the Unicode Character Data tables, which are available from
+<ulink url="http://www.unicode.org">www.unicode.org</ulink>.
+GLib 2.8 supports Unicode 4.0, GLib 2.10 supports Unicode 4.1,
+and GLib 2.12 supports Unicode 5.0.
+</para>
+
<!-- ##### SECTION See_Also ##### -->
<para>
<variablelist>
</variablelist>
</para>
+<!-- ##### SECTION Stability_Level ##### -->
+
+
<!-- ##### TYPEDEF gunichar ##### -->
<para>
A type which can hold any UCS-4 character code.
<!-- ##### TYPEDEF gunichar2 ##### -->
<para>
-A type which can hold any UTF-16 character code.
+A type which can hold any UTF-16 code
+unit<footnote id="utf16_surrogate_pairs">UTF-16 also has so-called
+<firstterm>surrogate pairs</firstterm> to encode characters beyond the
+BMP as pairs of 16-bit numbers. Surrogate pairs cannot be stored in a
+single gunichar2 field, but all GLib functions accepting gunichar2 arrays
+will correctly interpret surrogate pairs.</footnote>.
</para>
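The surrogate pair arithmetic mentioned in the footnote can be sketched in plain C. This is only an illustration of the UTF-16 encoding form, not GLib's implementation: the types my_unichar and my_unichar2 and both functions are hypothetical stand-ins (for gunichar and gunichar2) so that the sketch compiles without GLib.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for gunichar and gunichar2, so that this
 * sketch compiles without GLib. */
typedef uint32_t my_unichar;
typedef uint16_t my_unichar2;

/* Encode one code point as UTF-16. Writes one or two 16-bit units
 * to out and returns the number of units written. */
size_t
my_utf16_encode (my_unichar c, my_unichar2 out[2])
{
  assert (c <= 0x10FFFF);             /* outside the Unicode range */
  assert (c < 0xD800 || c > 0xDFFF);  /* surrogates are not characters */

  if (c < 0x10000)
    {
      /* Characters in the BMP fit in a single 16-bit unit. */
      out[0] = (my_unichar2) c;
      return 1;
    }

  /* Beyond the BMP: split the 20-bit offset into two 10-bit halves. */
  c -= 0x10000;
  out[0] = (my_unichar2) (0xD800 + (c >> 10));    /* high surrogate */
  out[1] = (my_unichar2) (0xDC00 + (c & 0x3FF));  /* low surrogate */
  return 2;
}

/* Combine a high/low surrogate pair back into one code point. */
my_unichar
my_utf16_decode_pair (my_unichar2 high, my_unichar2 low)
{
  assert (high >= 0xD800 && high <= 0xDBFF);
  assert (low >= 0xDC00 && low <= 0xDFFF);

  return 0x10000
         + ((my_unichar) (high - 0xD800) << 10)
         + (my_unichar) (low - 0xDC00);
}
```

For example, U+1D11E lies beyond the BMP and encodes as the pair 0xD834, 0xDD1E, which is why it needs two array elements rather than a single 16-bit field.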
@Returns:
+<!-- ##### FUNCTION g_unichar_iswide_cjk ##### -->
+<para>
+
+</para>
+
+@c:
+@Returns:
+
+
<!-- ##### FUNCTION g_unichar_toupper ##### -->
<para>
<!-- ##### ENUM GUnicodeBreakType ##### -->
<para>
These are the possible line break classifications.
+The five Hangul types were added in Unicode 4.1 and were therefore
+introduced in GLib 2.10. Note that new types may be added in the future.
+Applications should be ready to handle unknown values, which
+may be treated as %G_UNICODE_BREAK_UNKNOWN.
See <ulink url="http://www.unicode.org/unicode/reports/tr14/"
>http://www.unicode.org/unicode/reports/tr14/</ulink>.
</para>
@G_UNICODE_BREAK_UNKNOWN:
@G_UNICODE_BREAK_NEXT_LINE:
@G_UNICODE_BREAK_WORD_JOINER:
+@G_UNICODE_BREAK_HANGUL_L_JAMO:
+@G_UNICODE_BREAK_HANGUL_V_JAMO:
+@G_UNICODE_BREAK_HANGUL_T_JAMO:
+@G_UNICODE_BREAK_HANGUL_LV_SYLLABLE:
+@G_UNICODE_BREAK_HANGUL_LVT_SYLLABLE:
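One way to follow the advice about unknown values is to route every classification through a switch whose default case falls back to the unknown type. The sketch below is a minimal illustration under stated assumptions: MyBreakType is a hypothetical, shortened stand-in for #GUnicodeBreakType, and my_known_break_type() is not a GLib function.

```c
#include <assert.h>

/* Hypothetical, shortened stand-in for GUnicodeBreakType, listing
 * only the values this application was written to understand. */
typedef enum
{
  MY_BREAK_MANDATORY,
  MY_BREAK_SPACE,
  MY_BREAK_UNKNOWN
} MyBreakType;

/* Map a raw classification value to one this code knows how to
 * handle. Anything not listed, including values added by a newer
 * library version, is treated as MY_BREAK_UNKNOWN. */
MyBreakType
my_known_break_type (int raw)
{
  switch (raw)
    {
    case MY_BREAK_MANDATORY:
    case MY_BREAK_SPACE:
    case MY_BREAK_UNKNOWN:
      return (MyBreakType) raw;
    default:
      return MY_BREAK_UNKNOWN;
    }
}
```

Keeping the fallback in one place means that code built against an older version of the enumeration keeps working, in a degraded but predictable way, when it encounters newer values.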
<!-- ##### FUNCTION g_unichar_break_type ##### -->
<para>
@Returns:
+<!-- ##### FUNCTION g_unichar_get_mirror_char ##### -->
+<para>
+
+</para>
+
+@ch:
+@mirrored_ch:
+@Returns:
+
+
+<!-- ##### ENUM GUnicodeScript ##### -->
+<para>
+The #GUnicodeScript enumeration identifies different writing
+systems. The values correspond to the names as defined in the
+Unicode standard. The enumeration was added in GLib 2.14.
+Note that new types may be added in the future. Applications
+should be ready to handle unknown values.
+See <ulink
+url="http://www.unicode.org/reports/tr24/">Unicode Standard Annex
+#24: Script names</ulink>.
+</para>
+
+@G_UNICODE_SCRIPT_INVALID_CODE: a value never returned from g_unichar_get_script()
+@G_UNICODE_SCRIPT_COMMON: a character used by multiple different scripts
+@G_UNICODE_SCRIPT_INHERITED: a mark glyph that takes its script from the
+ base glyph to which it is attached
+@G_UNICODE_SCRIPT_ARABIC: Arabic
+@G_UNICODE_SCRIPT_ARMENIAN: Armenian
+@G_UNICODE_SCRIPT_BENGALI: Bengali
+@G_UNICODE_SCRIPT_BOPOMOFO: Bopomofo
+@G_UNICODE_SCRIPT_CHEROKEE: Cherokee
+@G_UNICODE_SCRIPT_COPTIC: Coptic
+@G_UNICODE_SCRIPT_CYRILLIC: Cyrillic
+@G_UNICODE_SCRIPT_DESERET: Deseret
+@G_UNICODE_SCRIPT_DEVANAGARI: Devanagari
+@G_UNICODE_SCRIPT_ETHIOPIC: Ethiopic
+@G_UNICODE_SCRIPT_GEORGIAN: Georgian
+@G_UNICODE_SCRIPT_GOTHIC: Gothic
+@G_UNICODE_SCRIPT_GREEK: Greek
+@G_UNICODE_SCRIPT_GUJARATI: Gujarati
+@G_UNICODE_SCRIPT_GURMUKHI: Gurmukhi
+@G_UNICODE_SCRIPT_HAN: Han
+@G_UNICODE_SCRIPT_HANGUL: Hangul
+@G_UNICODE_SCRIPT_HEBREW: Hebrew
+@G_UNICODE_SCRIPT_HIRAGANA: Hiragana
+@G_UNICODE_SCRIPT_KANNADA: Kannada
+@G_UNICODE_SCRIPT_KATAKANA: Katakana
+@G_UNICODE_SCRIPT_KHMER: Khmer
+@G_UNICODE_SCRIPT_LAO: Lao
+@G_UNICODE_SCRIPT_LATIN: Latin
+@G_UNICODE_SCRIPT_MALAYALAM: Malayalam
+@G_UNICODE_SCRIPT_MONGOLIAN: Mongolian
+@G_UNICODE_SCRIPT_MYANMAR: Myanmar
+@G_UNICODE_SCRIPT_OGHAM: Ogham
+@G_UNICODE_SCRIPT_OLD_ITALIC: Old Italic
+@G_UNICODE_SCRIPT_ORIYA: Oriya
+@G_UNICODE_SCRIPT_RUNIC: Runic
+@G_UNICODE_SCRIPT_SINHALA: Sinhala
+@G_UNICODE_SCRIPT_SYRIAC: Syriac
+@G_UNICODE_SCRIPT_TAMIL: Tamil
+@G_UNICODE_SCRIPT_TELUGU: Telugu
+@G_UNICODE_SCRIPT_THAANA: Thaana
+@G_UNICODE_SCRIPT_THAI: Thai
+@G_UNICODE_SCRIPT_TIBETAN: Tibetan
+@G_UNICODE_SCRIPT_CANADIAN_ABORIGINAL:
+ Canadian Aboriginal
+@G_UNICODE_SCRIPT_YI: Yi
+@G_UNICODE_SCRIPT_TAGALOG: Tagalog
+@G_UNICODE_SCRIPT_HANUNOO: Hanunoo
+@G_UNICODE_SCRIPT_BUHID: Buhid
+@G_UNICODE_SCRIPT_TAGBANWA: Tagbanwa
+@G_UNICODE_SCRIPT_BRAILLE: Braille
+@G_UNICODE_SCRIPT_CYPRIOT: Cypriot
+@G_UNICODE_SCRIPT_LIMBU: Limbu
+@G_UNICODE_SCRIPT_OSMANYA: Osmanya
+@G_UNICODE_SCRIPT_SHAVIAN: Shavian
+@G_UNICODE_SCRIPT_LINEAR_B: Linear B
+@G_UNICODE_SCRIPT_TAI_LE: Tai Le
+@G_UNICODE_SCRIPT_UGARITIC: Ugaritic
+@G_UNICODE_SCRIPT_NEW_TAI_LUE: New Tai Lue
+@G_UNICODE_SCRIPT_BUGINESE: Buginese
+@G_UNICODE_SCRIPT_GLAGOLITIC: Glagolitic
+@G_UNICODE_SCRIPT_TIFINAGH: Tifinagh
+@G_UNICODE_SCRIPT_SYLOTI_NAGRI: Syloti Nagri
+@G_UNICODE_SCRIPT_OLD_PERSIAN: Old Persian
+@G_UNICODE_SCRIPT_KHAROSHTHI: Kharoshthi
+@G_UNICODE_SCRIPT_UNKNOWN: an unassigned code point
+@G_UNICODE_SCRIPT_BALINESE: Balinese
+@G_UNICODE_SCRIPT_CUNEIFORM: Cuneiform
+@G_UNICODE_SCRIPT_PHOENICIAN: Phoenician
+@G_UNICODE_SCRIPT_PHAGS_PA: Phags-pa
+@G_UNICODE_SCRIPT_NKO: N'Ko
+
+
+<!-- ##### FUNCTION g_unichar_get_script ##### -->
+<para>
+
+</para>
+
+@ch:
+@Returns:
+
+
<!-- ##### MACRO g_utf8_next_char ##### -->
<para>
Skips to the next character in a UTF-8 string. The string must be
@p:
@end:
@Returns:
-<!-- # Unused Parameters # -->
-@bound:
<!-- ##### FUNCTION g_utf8_find_prev_char ##### -->
@len:
@c:
@Returns:
-<!-- # Unused Parameters # -->
-@ch:
<!-- ##### FUNCTION g_utf8_strrchr ##### -->
@len:
@c:
@Returns:
-<!-- # Unused Parameters # -->
-@ch:
<!-- ##### FUNCTION g_utf8_strreverse ##### -->
@Returns:
+<!-- ##### FUNCTION g_utf8_collate_key_for_filename ##### -->
+<para>
+
+</para>
+
+@str:
+@len:
+@Returns:
+
+
<!-- ##### FUNCTION g_utf8_to_utf16 ##### -->
<para>