<!-- ##### SECTION Title ##### -->
Character Set Conversion

<!-- ##### SECTION Short_Description ##### -->
convert strings between different character sets using <function>iconv()</function>.

<!-- ##### SECTION Long_Description ##### -->

<refsect2 id="file-name-encodings">
<title>File Name Encodings</title>

<para>
Historically, Unix has not had a defined encoding for file
names: a file name is valid as long as it does not contain path
separators ("/"). However, displaying file names may
require conversion from the character set in which they were
created to the character set in which the application
operates. Consider the Spanish file name
"<filename>Presentación.sxi</filename>". If the
application which created it uses ISO-8859-1 for its encoding,
then the actual file name on disk would look like this:
</para>

<programlisting id="filename-iso8859-1">
Character:  P  r  e  s  e  n  t  a  c  i  ó  n  .  s  x  i
Hex code:   50 72 65 73 65 6e 74 61 63 69 f3 6e 2e 73 78 69
</programlisting>

<para>
However, if the application uses UTF-8, the actual file name on
disk would look like this:
</para>

<programlisting id="filename-utf-8">
Character:  P  r  e  s  e  n  t  a  c  i  ó     n  .  s  x  i
Hex code:   50 72 65 73 65 6e 74 61 63 69 c3 b3 6e 2e 73 78 69
</programlisting>
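
<para>
The difference between the two listings can be reproduced with
<function>iconv()</function> directly. The following is a minimal
sketch, not code that GLib ships: the helper name
<function>latin1_to_utf8()</function> is hypothetical, and the sketch
assumes that the platform's iconv implementation recognizes the
charset names "ISO-8859-1" and "UTF-8".
</para>

```c
#include <assert.h>
#include <iconv.h>
#include <string.h>

/* Convert a NUL-terminated ISO-8859-1 string to UTF-8 using iconv(3).
 * Writes at most out_size - 1 bytes plus a terminating NUL into out.
 * Returns the number of UTF-8 bytes written, or -1 on failure
 * (unsupported conversion, or output buffer too small). */
long
latin1_to_utf8 (const char *in, char *out, size_t out_size)
{
  assert (in != NULL && out != NULL && out_size > 0);

  iconv_t cd = iconv_open ("UTF-8", "ISO-8859-1");
  if (cd == (iconv_t) -1)
    return -1;

  char *in_ptr = (char *) in;        /* iconv() takes a non-const pointer */
  size_t in_left = strlen (in);
  char *out_ptr = out;
  size_t out_left = out_size - 1;    /* reserve room for the NUL */

  size_t rc = iconv (cd, &in_ptr, &in_left, &out_ptr, &out_left);
  iconv_close (cd);
  if (rc == (size_t) -1)
    return -1;

  *out_ptr = '\0';
  return (long) (out_ptr - out);
}
```

<para>
Given the 16-byte ISO-8859-1 name shown above, this helper should
write 17 bytes, with the single byte f3 replaced by the pair c3 b3.
</para>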

<para>
GLib uses UTF-8 for its strings, and GUI toolkits like GTK+
that use GLib do the same thing. If you get a file name from
the file system, for example, from
<function>readdir(3)</function> or from <link
linkend="g_dir_read_name"><function>g_dir_read_name()</function></link>,
and you wish to display the file name to the user, you
<emphasis>will</emphasis> need to convert it into UTF-8. The
opposite case is when the user types the name of a file to
save: the toolkit will give you that string in
UTF-8 encoding, and you will need to convert it to the
character set used for file names before you can create the
file with <function>open(2)</function> or
<function>fopen(3)</function>.
</para>

<para>
By default, GLib assumes that file names on disk are in UTF-8
encoding. This is a valid assumption for file systems which
were created relatively recently: most applications use UTF-8
encoding for their strings, and that is also what they use for
the file names they create. However, older file systems may
still contain file names created in "older" encodings, such as
ISO-8859-1. In this case, for compatibility reasons, you may
want to instruct GLib to use that particular encoding for file
names rather than UTF-8. You can do this by specifying the
encoding for file names in the <link
linkend="G_FILENAME_ENCODING"><envar>G_FILENAME_ENCODING</envar></link>
environment variable. For example, if your installation uses
ISO-8859-1 for file names, you can put this in your
<filename>~/.profile</filename>:
</para>

<programlisting>
export G_FILENAME_ENCODING=ISO-8859-1
</programlisting>

<para>
GLib provides the functions <link
linkend="g_filename_to_utf8"><function>g_filename_to_utf8()</function></link>
and <link
linkend="g_filename_from_utf8"><function>g_filename_from_utf8()</function></link>
to perform the necessary conversions. These functions convert
file names from the encoding specified in
<envar>G_FILENAME_ENCODING</envar> to UTF-8 and vice versa.
<xref linkend="file-name-encodings-diagram"/> illustrates how
these functions are used to convert between UTF-8 and the
encoding for file names in the file system.
</para>

<figure id="file-name-encodings-diagram">
<title>Conversion between File Name Encodings</title>
<graphic fileref="file-name-encodings.png" format="PNG"/>
</figure>

<refsect3 id="file-name-encodings-checklist">
<title>Checklist for Application Writers</title>

<para>
This section is a practical summary of the detailed
description above. You can use this as a checklist of
things to do to make sure your applications process file
name encodings correctly.
</para>

<orderedlist>
<listitem>
<para>
If you get a file name from the file system from a
function such as <function>readdir(3)</function> or
<function>gtk_file_chooser_get_filename()</function>,
you do not need to do any conversion to pass that
file name to functions like <function>open(2)</function>,
<function>rename(2)</function>, or
<function>fopen(3)</function>; those are "raw"
file names which the file system understands.
</para>
</listitem>

<listitem>
<para>
If you need to display a file name, convert it to UTF-8
first by using <link
linkend="g_filename_to_utf8"><function>g_filename_to_utf8()</function></link>.
If conversion fails, display a string like
"<literal>Unknown file name</literal>". <emphasis>Do
not</emphasis> convert this string back into the
encoding used for file names if you wish to pass it to
the file system; use the original file name instead.
For example, the document window of a word processor
could display "Unknown file name" in its title bar but
still let the user save the file, as it would keep the
raw file name internally. Conversion can fail if the user
has not set the <envar>G_FILENAME_ENCODING</envar>
environment variable even though some of their files have
names that are not encoded in UTF-8.
</para>
</listitem>

<listitem>
<para>
If your user interface lets the user type a file name
for saving or renaming, convert it to the encoding used
for file names in the file system by using <link
linkend="g_filename_from_utf8"><function>g_filename_from_utf8()</function></link>.
Pass the converted file name to functions like
<function>fopen(3)</function>. If conversion fails, ask
the user to enter a different file name. Conversion can
fail if the user types Japanese characters when
<envar>G_FILENAME_ENCODING</envar> is set to
<literal>ISO-8859-1</literal>, for example.
</para>
</listitem>
</orderedlist>

</refsect3>
</refsect2>

<!-- ##### SECTION See_Also ##### -->

<!-- ##### FUNCTION g_convert ##### -->

<!-- ##### FUNCTION g_convert_with_fallback ##### -->

<!-- ##### STRUCT GIConv ##### -->
<para>
The <structname>GIConv</structname> struct wraps an
<function>iconv()</function> conversion descriptor. It contains private data
and should only be accessed using the following functions.
</para>

<!-- ##### FUNCTION g_convert_with_iconv ##### -->

<!-- ##### MACRO G_CONVERT_ERROR ##### -->
<para>
Error domain for character set conversions. Errors in this domain will
be from the #GConvertError enumeration. See #GError for information on
error domains.
</para>

<!-- ##### FUNCTION g_iconv_open ##### -->

<!-- ##### FUNCTION g_iconv ##### -->

<!-- ##### FUNCTION g_iconv_close ##### -->

<!-- ##### FUNCTION g_locale_to_utf8 ##### -->

<!-- ##### FUNCTION g_filename_to_utf8 ##### -->

<!-- ##### FUNCTION g_filename_from_utf8 ##### -->

<!-- ##### FUNCTION g_filename_from_uri ##### -->

<!-- ##### FUNCTION g_filename_to_uri ##### -->

<!-- ##### FUNCTION g_get_filename_charsets ##### -->

<!-- ##### FUNCTION g_filename_display_name ##### -->

<!-- ##### FUNCTION g_uri_list_extract_uris ##### -->

<!-- ##### FUNCTION g_locale_from_utf8 ##### -->

<!-- ##### ENUM GConvertError ##### -->
<para>
Error codes returned by character set conversion routines.
</para>

@G_CONVERT_ERROR_NO_CONVERSION: Conversion between the requested character sets
is not supported.
@G_CONVERT_ERROR_ILLEGAL_SEQUENCE: Invalid byte sequence in conversion input.
@G_CONVERT_ERROR_FAILED: Conversion failed for some reason.
@G_CONVERT_ERROR_PARTIAL_INPUT: Partial character sequence at end of input.
@G_CONVERT_ERROR_BAD_URI: URI is invalid.
@G_CONVERT_ERROR_NOT_ABSOLUTE_PATH: Pathname is not an absolute path.

<!-- ##### FUNCTION g_get_charset ##### -->