3 <!-- This HTML file has been created by texi2html 1.52b
4 from gettext.texi on 28 December 2015 -->
6 <META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
7 <TITLE>GNU gettext utilities - 15 Other Programming Languages</TITLE>
14 <H1><A NAME="SEC248" HREF="gettext_toc.html#TOC248">15 Other Programming Languages</A></H1>
17 While the presentation of <CODE>gettext</CODE> focuses mostly on C and
18 implicitly applies to C++ as well, its scope is far broader than that:
19 Many programming languages, scripting languages and other textual data
like GUI resources or package descriptions can make use of the gettext
approach.
27 <H2><A NAME="SEC249" HREF="gettext_toc.html#TOC249">15.1 The Language Implementor's View</A></H2>
29 <A NAME="IDX1217"></A>
30 <A NAME="IDX1218"></A>
All programming and scripting languages that have the notion of strings
are eligible to support <CODE>gettext</CODE>. Supporting <CODE>gettext</CODE>
means the following:
43 You should add to the language a syntax for translatable strings. In
44 principle, a function call of <CODE>gettext</CODE> would do, but a shorthand
syntax helps keep internationalized programs legible. For
46 example, in C we use the syntax <CODE>_("string")</CODE>, and in GNU awk we use
47 the shorthand <CODE>_"string"</CODE>.
You should arrange that evaluation of such a translatable string at
runtime calls the <CODE>gettext</CODE> function, or performs equivalent
processing.
57 Similarly, you should make the functions <CODE>ngettext</CODE>,
58 <CODE>dcgettext</CODE>, <CODE>dcngettext</CODE> available from within the language.
59 These functions are less often used, but are nevertheless necessary for
60 particular purposes: <CODE>ngettext</CODE> for correct plural handling, and
61 <CODE>dcgettext</CODE> and <CODE>dcngettext</CODE> for obeying other locale-related
62 environment variables than <CODE>LC_MESSAGES</CODE>, such as <CODE>LC_TIME</CODE> or
63 <CODE>LC_MONETARY</CODE>. For these latter functions, you need to make the
<CODE>LC_*</CODE> constants, available in the C header <CODE>&lt;locale.h&gt;</CODE>,
65 referenceable from within the language, usually either as enumeration
70 You should allow the programmer to designate a message domain, either by
71 making the <CODE>textdomain</CODE> function available from within the
72 language, or by introducing a magic variable called <CODE>TEXTDOMAIN</CODE>.
Similarly, you should allow the programmer to designate where to search
for message catalogs, by providing access to the <CODE>bindtextdomain</CODE>
function.
79 You should either perform a <CODE>setlocale (LC_ALL, "")</CODE> call during
80 the startup of your language runtime, or allow the programmer to do so.
81 Remember that gettext will act as a no-op if the <CODE>LC_MESSAGES</CODE> and
82 <CODE>LC_CTYPE</CODE> locale categories are not both set.
86 A programmer should have a way to extract translatable strings from a
87 program into a PO file. The GNU <CODE>xgettext</CODE> program is being
88 extended to support very different programming languages. Please
contact the GNU <CODE>gettext</CODE> maintainers to help them do this. If
90 the string extractor is best integrated into your language's parser, GNU
91 <CODE>xgettext</CODE> can function as a front end to your string extractor.
95 The language's library should have a string formatting facility where
96 the arguments of a format string are denoted by a positional number or a
97 name. This is needed because for some languages and some messages with
98 more than one substitutable argument, the translation will need to
99 output the substituted arguments in different order. See section <A HREF="gettext_4.html#SEC22">4.6 Special Comments preceding Keywords</A>.
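To illustrate why positional or named arguments matter, the following Python sketch renders one message with two word orders. The second and third templates stand in for what a translation might need; they are illustrative strings, not entries from a real catalog, and the helper <CODE>render</CODE> is hypothetical.

```python
# Original word order, arguments referenced by position.
english = "'{0}' has only {1} bytes free."

# A translation may need the arguments in the opposite order. Because the
# arguments are numbered, the calling code does not have to change.
reordered = "Only {1} bytes free on '{0}'."

# The same idea with named arguments and the % operator.
named = "Only %(free)d bytes free on '%(name)s'."


def render(template, name, free):
    # The call site always passes the arguments in the same order.
    return template.format(name, free)


if __name__ == "__main__":
    print(render(english, "/tmp", 512))
    print(render(reordered, "/tmp", 512))
    print(named % {"name": "/tmp", "free": 512})
```

With purely sequential directives such as <CODE>%s</CODE> and <CODE>%d</CODE>, the second word order could not be expressed without changing the program.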
103 If the language has more than one implementation, and not all of the
104 implementations use <CODE>gettext</CODE>, but the programs should be portable
105 across implementations, you should provide a no-i18n emulation, that
106 makes the other implementations accept programs written for yours,
107 without actually translating the strings.
111 To help the programmer in the task of marking translatable strings,
112 which is sometimes performed using the Emacs PO mode (see section <A HREF="gettext_4.html#SEC21">4.5 Marking Translatable Strings</A>),
114 contact the GNU <CODE>gettext</CODE> maintainers, so they can add support for
115 your language to <TT>‘po-mode.el’</TT>.
119 On the implementation side, three approaches are possible, with
120 different effects on portability and copyright:
127 You may integrate the GNU <CODE>gettext</CODE>'s <TT>‘intl/’</TT> directory in
128 your package, as described in section <A HREF="gettext_13.html#SEC213">13 The Maintainer's View</A>. This allows you to
129 have internationalization on all kinds of platforms. Note that when you
130 then distribute your package, it legally falls under the GNU General
131 Public License, and the GNU project will be glad about your contribution
132 to the Free Software pool.
136 You may link against GNU <CODE>gettext</CODE> functions if they are found in
137 the C library. For example, an autoconf test for <CODE>gettext()</CODE> and
138 <CODE>ngettext()</CODE> will detect this situation. For the moment, this test
139 will succeed on GNU systems and not on other platforms. No severe
140 copyright restrictions apply.
144 You may emulate or reimplement the GNU <CODE>gettext</CODE> functionality.
145 This has the advantage of full portability and no copyright
146 restrictions, but also the drawback that you have to reimplement the GNU
147 <CODE>gettext</CODE> features (such as the <CODE>LANGUAGE</CODE> environment
148 variable, the locale aliases database, the automatic charset conversion,
149 and plural handling).
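As one example of what a reimplementation has to cover, here is a simplified Python sketch of how the <CODE>LANGUAGE</CODE> variable interacts with the locale variables. It assumes the usual precedence (<CODE>LC_ALL</CODE>, then <CODE>LC_MESSAGES</CODE>, then <CODE>LANG</CODE>) and that <CODE>LANGUAGE</CODE> is ignored in the C locale. The function name is hypothetical, and the sketch deliberately omits the locale aliases database and the expansion of a name such as <CODE>de_DE.UTF-8</CODE> into its fallback variants.

```python
def candidate_languages(environ):
    """Return the language names to try, most preferred first."""
    # The first non-empty locale variable decides the message locale.
    locale_name = ""
    for variable in ("LC_ALL", "LC_MESSAGES", "LANG"):
        if environ.get(variable):
            locale_name = environ[variable]
            break

    # In the C/POSIX locale no translation takes place at all, and the
    # LANGUAGE variable is ignored.
    if locale_name in ("", "C", "POSIX"):
        return []

    # LANGUAGE is a colon-separated priority list that overrides the locale.
    language = environ.get("LANGUAGE", "")
    if language:
        return [name for name in language.split(":") if name]
    return [locale_name]


if __name__ == "__main__":
    import os
    print(candidate_languages(os.environ))
```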
154 <H2><A NAME="SEC250" HREF="gettext_toc.html#TOC250">15.2 The Programmer's View</A></H2>
157 For the programmer, the general procedure is the same as for the C
158 language. The Emacs PO mode marking supports other languages, and the GNU
159 <CODE>xgettext</CODE> string extractor recognizes other languages based on the
160 file extension or a command-line option. In some languages,
161 <CODE>setlocale</CODE> is not needed because it is already performed by the
162 underlying language runtime.
167 <H2><A NAME="SEC251" HREF="gettext_toc.html#TOC251">15.3 The Translator's View</A></H2>
170 The translator works exactly as in the C language case. The only
171 difference is that when translating format strings, she has to be aware
of the language's particular syntax for positional arguments in format
strings.
179 <H3><A NAME="SEC252" HREF="gettext_toc.html#TOC252">15.3.1 C Format Strings</A></H3>
182 C format strings are described in POSIX (IEEE P1003.1 2001), section
184 <A HREF="http://www.opengroup.org/onlinepubs/007904975/functions/fprintf.html">http://www.opengroup.org/onlinepubs/007904975/functions/fprintf.html</A>.
185 See also the fprintf() manual page,
186 <A HREF="http://www.linuxvalley.it/encyclopedia/ldp/manpage/man3/printf.3.php">http://www.linuxvalley.it/encyclopedia/ldp/manpage/man3/printf.3.php</A>,
187 <A HREF="http://informatik.fh-wuerzburg.de/student/i510/man/printf.html">http://informatik.fh-wuerzburg.de/student/i510/man/printf.html</A>.
191 Although format strings with positions that reorder arguments, such as
196 "Only %2$d bytes free on '%1$s'."
200 which is semantically equivalent to
205 "'%s' has only %d bytes free."
209 are a POSIX/XSI feature and not specified by ISO C 99, translators can rely
210 on this reordering ability: On the few platforms where <CODE>printf()</CODE>,
211 <CODE>fprintf()</CODE> etc. don't support this feature natively, <TT>‘libintl.a’</TT>
or <TT>‘libintl.so’</TT> provides replacement functions, and GNU <CODE>&lt;libintl.h&gt;</CODE>
213 activates these replacement functions automatically.
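What "semantically equivalent" means for the two strings above can be checked mechanically: both must consume the same arguments with the same conversions, regardless of order. The following Python sketch does this for a subset of C directives. The regular expression and the function name are assumptions made for illustration; the sketch does not handle <CODE>*</CODE> widths and is not the check that <CODE>msgfmt</CODE> itself performs.

```python
import re

# A simplified C format directive: optional "N$" position, flags, width,
# precision, length modifier, and a conversion character.
_DIRECTIVE = re.compile(
    r"%(?:(?P<pos>[1-9][0-9]*)\$)?"
    r"[-+ #0'I]*"
    r"(?:[0-9]+)?(?:\.[0-9]+)?"
    r"(?:hh|h|ll|l|L|j|z|t)?"
    r"(?P<conv>[diouxXeEfFgGaAcsp%])"
)


def c_format_arguments(fmt):
    """Map each argument position to the conversion that consumes it."""
    arguments = {}
    next_position = 1
    for match in _DIRECTIVE.finditer(fmt):
        if match.group("conv") == "%":
            continue  # "%%" is a literal percent sign and takes no argument
        if match.group("pos"):
            position = int(match.group("pos"))
        else:
            # Unnumbered directives consume arguments sequentially.
            position = next_position
            next_position += 1
        arguments[position] = match.group("conv")
    return arguments


if __name__ == "__main__":
    print(c_format_arguments("Only %2$d bytes free on '%1$s'."))
    print(c_format_arguments("'%s' has only %d bytes free."))
```

Both strings yield the same mapping (argument 1 is a string, argument 2 an integer), which is why a translator may substitute one for the other.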
217 <A NAME="IDX1219"></A>
218 <A NAME="IDX1220"></A>
219 As a special feature for Farsi (Persian) and maybe Arabic, translators can
220 insert an <SAMP>‘I’</SAMP> flag into numeric format directives. For example, the
221 translation of <CODE>"%d"</CODE> can be <CODE>"%Id"</CODE>. The effect of this flag,
222 on systems with GNU <CODE>libc</CODE>, is that in the output, the ASCII digits are
223 replaced with the <SAMP>‘outdigits’</SAMP> defined in the <CODE>LC_CTYPE</CODE> locale
224 category. On other systems, the <CODE>gettext</CODE> function removes this flag,
225 so that it has no effect.
229 Note that the programmer should <EM>not</EM> put this flag into the
230 untranslated string. (Putting the <SAMP>‘I’</SAMP> format directive flag into an
231 <VAR>msgid</VAR> string would lead to undefined behaviour on platforms without
232 glibc when NLS is disabled.)
237 <H3><A NAME="SEC253" HREF="gettext_toc.html#TOC253">15.3.2 Objective C Format Strings</A></H3>
240 Objective C format strings are like C format strings. They support an
241 additional format directive: "%@", which when executed consumes an argument
242 of type <CODE>Object *</CODE>.
247 <H3><A NAME="SEC254" HREF="gettext_toc.html#TOC254">15.3.3 Shell Format Strings</A></H3>
250 Shell format strings, as supported by GNU gettext and the <SAMP>‘envsubst’</SAMP>
251 program, are strings with references to shell variables in the form
252 <CODE>$<VAR>variable</VAR></CODE> or <CODE>${<VAR>variable</VAR>}</CODE>. References of the form
253 <CODE>${<VAR>variable</VAR>-<VAR>default</VAR>}</CODE>,
254 <CODE>${<VAR>variable</VAR>:-<VAR>default</VAR>}</CODE>,
255 <CODE>${<VAR>variable</VAR>=<VAR>default</VAR>}</CODE>,
256 <CODE>${<VAR>variable</VAR>:=<VAR>default</VAR>}</CODE>,
257 <CODE>${<VAR>variable</VAR>+<VAR>replacement</VAR>}</CODE>,
258 <CODE>${<VAR>variable</VAR>:+<VAR>replacement</VAR>}</CODE>,
259 <CODE>${<VAR>variable</VAR>?<VAR>ignored</VAR>}</CODE>,
260 <CODE>${<VAR>variable</VAR>:?<VAR>ignored</VAR>}</CODE>,
261 that would be valid inside shell scripts, are not supported. The
262 <VAR>variable</VAR> names must consist solely of alphanumeric or underscore
263 ASCII characters, not start with a digit and be nonempty; otherwise such
264 a variable reference is ignored.
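The naming rule can be written down as a regular expression. The following Python sketch lists the variable references that would be recognized in a shell format string under the rule just stated; the function name is hypothetical, and the sketch is not taken from the <CODE>envsubst</CODE> sources.

```python
import re

# $name or ${name}, where name is a nonempty run of ASCII letters, digits
# and underscores that does not start with a digit.
_REFERENCE = re.compile(
    r"\$(?:([A-Za-z_][A-Za-z0-9_]*)|\{([A-Za-z_][A-Za-z0-9_]*)\})"
)


def shell_format_variables(msgid):
    """Return the names of the variables referenced in msgid, in order."""
    return [m.group(1) or m.group(2) for m in _REFERENCE.finditer(msgid)]


if __name__ == "__main__":
    print(shell_format_variables("Remaining files: $filecount"))
    # ${dir-/tmp}, $1 and $? do not match the rule and are ignored.
    print(shell_format_variables("${user}: ${dir-/tmp} $1 $?"))
```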
269 <H3><A NAME="SEC255" HREF="gettext_toc.html#TOC255">15.3.4 Python Format Strings</A></H3>
272 There are two kinds of format strings in Python: those acceptable to
273 the Python built-in format operator <CODE>%</CODE>, labelled as
274 <SAMP>‘python-format’</SAMP>, and those acceptable to the <CODE>format</CODE> method
275 of the <SAMP>‘str’</SAMP> object.
279 Python <CODE>%</CODE> format strings are described in
280 Python Library reference /
282 5.6. Sequence Types /
283 5.6.2. String Formatting Operations.
284 <A HREF="http://docs.python.org/2/library/stdtypes.html#string-formatting-operations">http://docs.python.org/2/library/stdtypes.html#string-formatting-operations</A>.
288 Python brace format strings are described in PEP 3101 -- Advanced
289 String Formatting, <A HREF="http://www.python.org/dev/peps/pep-3101/">http://www.python.org/dev/peps/pep-3101/</A>.
294 <H3><A NAME="SEC256" HREF="gettext_toc.html#TOC256">15.3.5 Lisp Format Strings</A></H3>
297 Lisp format strings are described in the Common Lisp HyperSpec,
298 chapter 22.3 Formatted Output,
299 <A HREF="http://www.lisp.org/HyperSpec/Body/sec_22-3.html">http://www.lisp.org/HyperSpec/Body/sec_22-3.html</A>.
304 <H3><A NAME="SEC257" HREF="gettext_toc.html#TOC257">15.3.6 Emacs Lisp Format Strings</A></H3>
307 Emacs Lisp format strings are documented in the Emacs Lisp reference,
308 section Formatting Strings,
309 <A HREF="http://www.gnu.org/manual/elisp-manual-21-2.8/html_chapter/elisp_4.html#SEC75">http://www.gnu.org/manual/elisp-manual-21-2.8/html_chapter/elisp_4.html#SEC75</A>.
310 Note that as of version 21, XEmacs supports numbered argument specifications
311 in format strings while FSF Emacs doesn't.
316 <H3><A NAME="SEC258" HREF="gettext_toc.html#TOC258">15.3.7 librep Format Strings</A></H3>
319 librep format strings are documented in the librep manual, section
321 <A HREF="http://librep.sourceforge.net/librep-manual.html#Formatted%20Output">http://librep.sourceforge.net/librep-manual.html#Formatted%20Output</A>,
322 <A HREF="http://www.gwinnup.org/research/docs/librep.html#SEC122">http://www.gwinnup.org/research/docs/librep.html#SEC122</A>.
327 <H3><A NAME="SEC259" HREF="gettext_toc.html#TOC259">15.3.8 Scheme Format Strings</A></H3>
330 Scheme format strings are documented in the SLIB manual, section
331 Format Specification.
336 <H3><A NAME="SEC260" HREF="gettext_toc.html#TOC260">15.3.9 Smalltalk Format Strings</A></H3>
339 Smalltalk format strings are described in the GNU Smalltalk documentation,
340 class <CODE>CharArray</CODE>, methods <SAMP>‘bindWith:’</SAMP> and
341 <SAMP>‘bindWithArguments:’</SAMP>.
342 <A HREF="http://www.gnu.org/software/smalltalk/gst-manual/gst_68.html#SEC238">http://www.gnu.org/software/smalltalk/gst-manual/gst_68.html#SEC238</A>.
343 In summary, a directive starts with <SAMP>‘%’</SAMP> and is followed by <SAMP>‘%’</SAMP>
344 or a nonzero digit (<SAMP>‘1’</SAMP> to <SAMP>‘9’</SAMP>).
349 <H3><A NAME="SEC261" HREF="gettext_toc.html#TOC261">15.3.10 Java Format Strings</A></H3>
352 Java format strings are described in the JDK documentation for class
353 <CODE>java.text.MessageFormat</CODE>,
354 <A HREF="http://java.sun.com/j2se/1.4/docs/api/java/text/MessageFormat.html">http://java.sun.com/j2se/1.4/docs/api/java/text/MessageFormat.html</A>.
355 See also the ICU documentation
356 <A HREF="http://oss.software.ibm.com/icu/apiref/classMessageFormat.html">http://oss.software.ibm.com/icu/apiref/classMessageFormat.html</A>.
361 <H3><A NAME="SEC262" HREF="gettext_toc.html#TOC262">15.3.11 C# Format Strings</A></H3>
364 C# format strings are described in the .NET documentation for class
365 <CODE>System.String</CODE> and in
366 <A HREF="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpConFormattingOverview.asp">http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpConFormattingOverview.asp</A>.
371 <H3><A NAME="SEC263" HREF="gettext_toc.html#TOC263">15.3.12 awk Format Strings</A></H3>
374 awk format strings are described in the gawk documentation, section
376 <A HREF="http://www.gnu.org/manual/gawk/html_node/Printf.html#Printf">http://www.gnu.org/manual/gawk/html_node/Printf.html#Printf</A>.
381 <H3><A NAME="SEC264" HREF="gettext_toc.html#TOC264">15.3.13 Object Pascal Format Strings</A></H3>
384 Object Pascal format strings are described in the documentation of the
385 Free Pascal runtime library, section Format,
386 <A HREF="http://www.freepascal.org/docs-html/rtl/sysutils/format.html">http://www.freepascal.org/docs-html/rtl/sysutils/format.html</A>.
391 <H3><A NAME="SEC265" HREF="gettext_toc.html#TOC265">15.3.14 YCP Format Strings</A></H3>
394 YCP sformat strings are described in the libycp documentation
395 <A HREF="file:/usr/share/doc/packages/libycp/YCP-builtins.html">file:/usr/share/doc/packages/libycp/YCP-builtins.html</A>.
396 In summary, a directive starts with <SAMP>‘%’</SAMP> and is followed by <SAMP>‘%’</SAMP>
397 or a nonzero digit (<SAMP>‘1’</SAMP> to <SAMP>‘9’</SAMP>).
402 <H3><A NAME="SEC266" HREF="gettext_toc.html#TOC266">15.3.15 Tcl Format Strings</A></H3>
405 Tcl format strings are described in the <TT>‘format.n’</TT> manual page,
406 <A HREF="http://www.scriptics.com/man/tcl8.3/TclCmd/format.htm">http://www.scriptics.com/man/tcl8.3/TclCmd/format.htm</A>.
411 <H3><A NAME="SEC267" HREF="gettext_toc.html#TOC267">15.3.16 Perl Format Strings</A></H3>
There are two kinds of format strings in Perl: those acceptable to the
415 Perl built-in function <CODE>printf</CODE>, labelled as <SAMP>‘perl-format’</SAMP>,
416 and those acceptable to the <CODE>libintl-perl</CODE> function <CODE>__x</CODE>,
417 labelled as <SAMP>‘perl-brace-format’</SAMP>.
421 Perl <CODE>printf</CODE> format strings are described in the <CODE>sprintf</CODE>
422 section of <SAMP>‘man perlfunc’</SAMP>.
426 Perl brace format strings are described in the
427 <TT>‘Locale::TextDomain(3pm)’</TT> manual page of the CPAN package
428 libintl-perl. In brief, Perl format uses placeholders put between
429 braces (<SAMP>‘{’</SAMP> and <SAMP>‘}’</SAMP>). The placeholder must have the syntax
430 of simple identifiers.
435 <H3><A NAME="SEC268" HREF="gettext_toc.html#TOC268">15.3.17 PHP Format Strings</A></H3>
438 PHP format strings are described in the documentation of the PHP function
439 <CODE>sprintf</CODE>, in <TT>‘phpdoc/manual/function.sprintf.html’</TT> or
440 <A HREF="http://www.php.net/manual/en/function.sprintf.php">http://www.php.net/manual/en/function.sprintf.php</A>.
445 <H3><A NAME="SEC269" HREF="gettext_toc.html#TOC269">15.3.18 GCC internal Format Strings</A></H3>
448 These format strings are used inside the GCC sources. In such a format
449 string, a directive starts with <SAMP>‘%’</SAMP>, is optionally followed by a
450 size specifier <SAMP>‘l’</SAMP>, an optional flag <SAMP>‘+’</SAMP>, another optional flag
451 <SAMP>‘#’</SAMP>, and is finished by a specifier: <SAMP>‘%’</SAMP> denotes a literal
452 percent sign, <SAMP>‘c’</SAMP> denotes a character, <SAMP>‘s’</SAMP> denotes a string,
453 <SAMP>‘i’</SAMP> and <SAMP>‘d’</SAMP> denote an integer, <SAMP>‘o’</SAMP>, <SAMP>‘u’</SAMP>, <SAMP>‘x’</SAMP>
454 denote an unsigned integer, <SAMP>‘.*s’</SAMP> denotes a string preceded by a
455 width specification, <SAMP>‘H’</SAMP> denotes a <SAMP>‘location_t *’</SAMP> pointer,
456 <SAMP>‘D’</SAMP> denotes a general declaration, <SAMP>‘F’</SAMP> denotes a function
457 declaration, <SAMP>‘T’</SAMP> denotes a type, <SAMP>‘A’</SAMP> denotes a function argument,
458 <SAMP>‘C’</SAMP> denotes a tree code, <SAMP>‘E’</SAMP> denotes an expression, <SAMP>‘L’</SAMP>
459 denotes a programming language, <SAMP>‘O’</SAMP> denotes a binary operator,
460 <SAMP>‘P’</SAMP> denotes a function parameter, <SAMP>‘Q’</SAMP> denotes an assignment
461 operator, <SAMP>‘V’</SAMP> denotes a const/volatile qualifier.
466 <H3><A NAME="SEC270" HREF="gettext_toc.html#TOC270">15.3.19 GFC internal Format Strings</A></H3>
469 These format strings are used inside the GNU Fortran Compiler sources,
470 that is, the Fortran frontend in the GCC sources. In such a format
471 string, a directive starts with <SAMP>‘%’</SAMP> and is finished by a
472 specifier: <SAMP>‘%’</SAMP> denotes a literal percent sign, <SAMP>‘C’</SAMP> denotes the
473 current source location, <SAMP>‘L’</SAMP> denotes a source location, <SAMP>‘c’</SAMP>
474 denotes a character, <SAMP>‘s’</SAMP> denotes a string, <SAMP>‘i’</SAMP> and <SAMP>‘d’</SAMP>
475 denote an integer, <SAMP>‘u’</SAMP> denotes an unsigned integer. <SAMP>‘i’</SAMP>,
476 <SAMP>‘d’</SAMP>, and <SAMP>‘u’</SAMP> may be preceded by a size specifier <SAMP>‘l’</SAMP>.
481 <H3><A NAME="SEC271" HREF="gettext_toc.html#TOC271">15.3.20 Qt Format Strings</A></H3>
484 Qt format strings are described in the documentation of the QString class
485 <A HREF="file:/usr/lib/qt-4.3.0/doc/html/qstring.html">file:/usr/lib/qt-4.3.0/doc/html/qstring.html</A>.
486 In summary, a directive consists of a <SAMP>‘%’</SAMP> followed by a digit. The same
487 directive cannot occur more than once in a format string.
<H3><A NAME="SEC272" HREF="gettext_toc.html#TOC272">15.3.21 Qt Plural Format Strings</A></H3>
Qt plural format strings are described in the documentation of the QObject::tr method
496 <A HREF="file:/usr/lib/qt-4.3.0/doc/html/qobject.html">file:/usr/lib/qt-4.3.0/doc/html/qobject.html</A>.
497 In summary, the only allowed directive is <SAMP>‘%n’</SAMP>.
502 <H3><A NAME="SEC273" HREF="gettext_toc.html#TOC273">15.3.22 KDE Format Strings</A></H3>
505 KDE 4 format strings are defined as follows:
506 A directive consists of a <SAMP>‘%’</SAMP> followed by a non-zero decimal number.
If a <SAMP>‘%n’</SAMP> occurs in a format string, all of <SAMP>‘%1’</SAMP>, ..., <SAMP>‘%(n-1)’</SAMP>
508 must occur as well, except possibly one of them.
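The rule can be read as: take the highest directive number that occurs, and allow at most one of the lower numbers to be absent. A minimal Python sketch of that check follows; the function name is hypothetical, and the sketch only looks at a <SAMP>‘%’</SAMP> followed by a non-zero decimal number, as described above.

```python
import re


def kde_format_is_valid(fmt):
    """Check that at most one of %1 .. %(n-1) is missing below the highest %n."""
    numbers = {int(n) for n in re.findall(r"%([1-9][0-9]*)", fmt)}
    if not numbers:
        return True  # a string without directives is trivially fine
    highest = max(numbers)
    missing = [k for k in range(1, highest) if k not in numbers]
    return len(missing) <= 1


if __name__ == "__main__":
    print(kde_format_is_valid("%1 of %2"))   # all directives present
    print(kde_format_is_valid("%1 and %3"))  # only %2 is missing
    print(kde_format_is_valid("%4 only"))    # %1, %2 and %3 are missing
```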
513 <H3><A NAME="SEC274" HREF="gettext_toc.html#TOC274">15.3.23 KUIT Format Strings</A></H3>
516 KUIT (KDE User Interface Text) is compatible with KDE 4 format strings,
517 while it also allows programmers to add semantic information to a format
518 string, through XML markup tags. For example, if the first format
519 directive in a string is a filename, programmers could indicate that
with a <SAMP>‘filename’</SAMP> tag, like <SAMP>‘&lt;filename&gt;%1&lt;/filename&gt;’</SAMP>.
524 KUIT format strings are described in
525 <A HREF="http://api.kde.org/frameworks-api/frameworks5-apidocs/ki18n/html/prg_guide.html#kuit_markup">http://api.kde.org/frameworks-api/frameworks5-apidocs/ki18n/html/prg_guide.html#kuit_markup</A>.
530 <H3><A NAME="SEC275" HREF="gettext_toc.html#TOC275">15.3.24 Boost Format Strings</A></H3>
533 Boost format strings are described in the documentation of the
534 <CODE>boost::format</CODE> class, at
535 <A HREF="http://www.boost.org/libs/format/doc/format.html">http://www.boost.org/libs/format/doc/format.html</A>.
536 In summary, a directive has either the same syntax as in a C format string,
537 such as <SAMP>‘%1$+5d’</SAMP>, or may be surrounded by vertical bars, such as
538 <SAMP>‘%|1$+5d|’</SAMP> or <SAMP>‘%|1$+5|’</SAMP>, or consists of just an argument number
539 between percent signs, such as <SAMP>‘%1%’</SAMP>.
544 <H3><A NAME="SEC276" HREF="gettext_toc.html#TOC276">15.3.25 Lua Format Strings</A></H3>
547 Lua format strings are described in the Lua reference manual, section String Manipulation,
548 <A HREF="http://www.lua.org/manual/5.1/manual.html#pdf-string.format">http://www.lua.org/manual/5.1/manual.html#pdf-string.format</A>.
553 <H3><A NAME="SEC277" HREF="gettext_toc.html#TOC277">15.3.26 JavaScript Format Strings</A></H3>
Although the JavaScript specification itself does not define any format
557 strings, many JavaScript implementations provide printf-like
558 functions. <CODE>xgettext</CODE> understands a set of common format strings
559 used in popular JavaScript implementations including Gjs, Seed, and
560 Node.JS. In such a format string, a directive starts with <SAMP>‘%’</SAMP>
561 and is finished by a specifier: <SAMP>‘%’</SAMP> denotes a literal percent
562 sign, <SAMP>‘c’</SAMP> denotes a character, <SAMP>‘s’</SAMP> denotes a string,
563 <SAMP>‘b’</SAMP>, <SAMP>‘d’</SAMP>, <SAMP>‘o’</SAMP>, <SAMP>‘x’</SAMP>, <SAMP>‘X’</SAMP> denote an integer,
<SAMP>‘f’</SAMP> denotes a floating-point number, <SAMP>‘j’</SAMP> denotes a JSON
571 <H2><A NAME="SEC278" HREF="gettext_toc.html#TOC278">15.4 The Maintainer's View</A></H2>
574 For the maintainer, the general procedure differs from the C language
582 For those languages that don't use GNU gettext, the <TT>‘intl/’</TT> directory
583 is not needed and can be omitted. This means that the maintainer calls the
584 <CODE>gettextize</CODE> program without the <SAMP>‘--intl’</SAMP> option, and that he
585 invokes the <CODE>AM_GNU_GETTEXT</CODE> autoconf macro via
586 <SAMP>‘AM_GNU_GETTEXT([external])’</SAMP>.
590 If only a single programming language is used, the <CODE>XGETTEXT_OPTIONS</CODE>
591 variable in <TT>‘po/Makevars’</TT> (see section <A HREF="gettext_13.html#SEC220">13.4.3 <TT>‘Makevars’</TT> in <TT>‘po/’</TT></A>) should be adjusted to
592 match the <CODE>xgettext</CODE> options for that particular programming language.
593 If the package uses more than one programming language with <CODE>gettext</CODE>
594 support, it becomes necessary to change the POT file construction rule
595 in <TT>‘po/Makefile.in.in’</TT>. It is recommended to make one <CODE>xgettext</CODE>
596 invocation per programming language, each with the options appropriate for
597 that language, and to combine the resulting files using <CODE>msgcat</CODE>.
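The recommended rule (one <CODE>xgettext</CODE> run per language, then one <CODE>msgcat</CODE> run) can be sketched as a list of commands. The Python helper below only builds the command lines and does not execute them; the helper name, the domain <CODE>hello</CODE> and all file names are hypothetical, and a real rule in <TT>‘po/Makefile.in.in’</TT> would add the package's usual <CODE>xgettext</CODE> options.

```python
import shlex


def pot_commands(sources_by_language, domain):
    """Build one xgettext command per language plus a final msgcat command."""
    commands = []
    partial_pots = []
    for language, files in sorted(sources_by_language.items()):
        pot = f"{domain}-{language.lower()}.pot"
        # Each invocation names its language explicitly and writes its own file.
        commands.append(["xgettext", f"--language={language}", "-o", pot, *files])
        partial_pots.append(pot)
    # msgcat merges the per-language files into the package's POT file.
    commands.append(["msgcat", "-o", f"{domain}.pot", *partial_pots])
    return commands


if __name__ == "__main__":
    sources = {"C": ["hello.c"], "Shell": ["hello.sh"]}
    for command in pot_commands(sources, "hello"):
        print(shlex.join(command))
```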
602 <H2><A NAME="SEC279" HREF="gettext_toc.html#TOC279">15.5 Individual Programming Languages</A></H2>
606 <H3><A NAME="SEC280" HREF="gettext_toc.html#TOC280">15.5.1 C, C++, Objective C</A></H3>
608 <A NAME="IDX1221"></A>
615 gcc, gpp, gobjc, glibc, gettext
619 For C: <CODE>c</CODE>, <CODE>h</CODE>.
620 <BR>For C++: <CODE>C</CODE>, <CODE>c++</CODE>, <CODE>cc</CODE>, <CODE>cxx</CODE>, <CODE>cpp</CODE>, <CODE>hpp</CODE>.
621 <BR>For Objective C: <CODE>m</CODE>.
627 <DT>gettext shorthand
629 <CODE>_("abc")</CODE>
631 <DT>gettext/ngettext functions
633 <CODE>gettext</CODE>, <CODE>dgettext</CODE>, <CODE>dcgettext</CODE>, <CODE>ngettext</CODE>,
634 <CODE>dngettext</CODE>, <CODE>dcngettext</CODE>
638 <CODE>textdomain</CODE> function
642 <CODE>bindtextdomain</CODE> function
646 Programmer must call <CODE>setlocale (LC_ALL, "")</CODE>
<CODE>#include &lt;libintl.h&gt;</CODE>
<BR><CODE>#include &lt;locale.h&gt;</CODE>
652 <BR><CODE>#define _(string) gettext (string)</CODE>
654 <DT>Use or emulate GNU gettext
660 <CODE>xgettext -k_</CODE>
662 <DT>Formatting with positions
664 <CODE>fprintf "%2$d %1$d"</CODE>
665 <BR>In C++: <CODE>autosprintf "%2$d %1$d"</CODE>
666 (see section ‘Introduction’ in <CITE>GNU autosprintf</CITE>)
670 autoconf (gettext.m4) and #if ENABLE_NLS
678 The following examples are available in the <TT>‘examples’</TT> directory:
679 <CODE>hello-c</CODE>, <CODE>hello-c-gnome</CODE>, <CODE>hello-c++</CODE>, <CODE>hello-c++-qt</CODE>,
680 <CODE>hello-c++-kde</CODE>, <CODE>hello-c++-gnome</CODE>, <CODE>hello-c++-wxwidgets</CODE>,
681 <CODE>hello-objc</CODE>, <CODE>hello-objc-gnustep</CODE>, <CODE>hello-objc-gnome</CODE>.
686 <H3><A NAME="SEC281" HREF="gettext_toc.html#TOC281">15.5.2 sh - Shell Script</A></H3>
688 <A NAME="IDX1222"></A>
703 <CODE>"abc"</CODE>, <CODE>'abc'</CODE>, <CODE>abc</CODE>
705 <DT>gettext shorthand
707 <CODE>"`gettext \"abc\"`"</CODE>
709 <DT>gettext/ngettext functions
711 <A NAME="IDX1223"></A>
712 <A NAME="IDX1224"></A>
713 <CODE>gettext</CODE>, <CODE>ngettext</CODE> programs
714 <BR><CODE>eval_gettext</CODE>, <CODE>eval_ngettext</CODE> shell functions
718 <A NAME="IDX1225"></A>
719 environment variable <CODE>TEXTDOMAIN</CODE>
723 <A NAME="IDX1226"></A>
724 environment variable <CODE>TEXTDOMAINDIR</CODE>
732 <CODE>. gettext.sh</CODE>
734 <DT>Use or emulate GNU gettext
740 <CODE>xgettext</CODE>
742 <DT>Formatting with positions
756 An example is available in the <TT>‘examples’</TT> directory: <CODE>hello-sh</CODE>.
762 <H4><A NAME="SEC282" HREF="gettext_toc.html#TOC282">15.5.2.1 Preparing Shell Scripts for Internationalization</A></H4>
764 <A NAME="IDX1227"></A>
768 Preparing a shell script for internationalization is conceptually similar
769 to the steps described in section <A HREF="gettext_4.html#SEC16">4 Preparing Program Sources</A>. The concrete steps for shell
770 scripts are as follows.
784 near the top of the script. <CODE>gettext.sh</CODE> is a shell function library
785 that provides the functions
786 <CODE>eval_gettext</CODE> (see section <A HREF="gettext_15.html#SEC287">15.5.2.6 Invoking the <CODE>eval_gettext</CODE> function</A>) and
787 <CODE>eval_ngettext</CODE> (see section <A HREF="gettext_15.html#SEC288">15.5.2.7 Invoking the <CODE>eval_ngettext</CODE> function</A>).
788 You have to ensure that <CODE>gettext.sh</CODE> can be found in the <CODE>PATH</CODE>.
792 Set and export the <CODE>TEXTDOMAIN</CODE> and <CODE>TEXTDOMAINDIR</CODE> environment
793 variables. Usually <CODE>TEXTDOMAIN</CODE> is the package or program name, and
794 <CODE>TEXTDOMAINDIR</CODE> is the absolute pathname corresponding to
795 <CODE>$prefix/share/locale</CODE>, where <CODE>$prefix</CODE> is the installation location.
801 TEXTDOMAINDIR=@LOCALEDIR@
807 Prepare the strings for translation, as described in section <A HREF="gettext_4.html#SEC19">4.3 Preparing Translatable Strings</A>.
811 Simplify translatable strings so that they don't contain command substitution
812 (<CODE>"`...`"</CODE> or <CODE>"$(...)"</CODE>), variable access with defaulting (like
813 <CODE>${<VAR>variable</VAR>-<VAR>default</VAR>}</CODE>), access to positional arguments
814 (like <CODE>$0</CODE>, <CODE>$1</CODE>, ...) or highly volatile shell variables (like
815 <CODE>$?</CODE>). This can always be done through simple local code restructuring.
820 echo "Usage: $0 [OPTION] FILE..."
828 echo "Usage: $program_name [OPTION] FILE..."
835 echo "Remaining files: `ls | wc -l`"
842 filecount="`ls | wc -l`"
843 echo "Remaining files: $filecount"
848 For each translatable string, change the output command <SAMP>‘echo’</SAMP> or
849 <SAMP>‘$echo’</SAMP> to <SAMP>‘gettext’</SAMP> (if the string contains no references to
850 shell variables) or to <SAMP>‘eval_gettext’</SAMP> (if it refers to shell variables),
851 followed by a no-argument <SAMP>‘echo’</SAMP> command (to account for the terminating
852 newline). Similarly, for cases with plural handling, replace a conditional
853 <SAMP>‘echo’</SAMP> command with an invocation of <SAMP>‘ngettext’</SAMP> or
854 <SAMP>‘eval_ngettext’</SAMP>, followed by a no-argument <SAMP>‘echo’</SAMP> command.
856 When doing this, you also need to add an extra backslash before the dollar
857 sign in references to shell variables, so that the <SAMP>‘eval_gettext’</SAMP>
858 function receives the translatable string before the variable values are
859 substituted into it. For example,
863 echo "Remaining files: $filecount"
870 eval_gettext "Remaining files: \$filecount"; echo
873 If the output command is not <SAMP>‘echo’</SAMP>, you can make it use <SAMP>‘echo’</SAMP>
874 nevertheless, through the use of backquotes. However, note that inside
875 backquotes, backslashes must be doubled to be effective (because the
876 backquoting eats one level of backslashes). For example, assuming that
877 <SAMP>‘error’</SAMP> is a shell function that signals an error,
881 error "file not found: $filename"
884 is first transformed into
888 error "`echo \"file not found: \$filename\"`"
895 error "`eval_gettext \"file not found: \\\$filename\"`"
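The reason for the extra backslash is the order of operations: the catalog is searched for the literal string, including the variable reference, and the variable values are substituted only afterwards. The Python sketch below models that order with a plain dictionary standing in for the message catalog. The function name, the catalog entry and its German text are made up for illustration; the real <CODE>eval_gettext</CODE> is a shell function, and this is not its implementation.

```python
import re

_VARIABLE = re.compile(
    r"\$(?:([A-Za-z_][A-Za-z0-9_]*)|\{([A-Za-z_][A-Za-z0-9_]*)\})"
)


def eval_gettext_sketch(msgid, catalog, variables):
    """Look up the literal msgid first, then substitute variable values."""
    # Only variables referenced in the msgid itself are substituted.
    allowed = {m.group(1) or m.group(2) for m in _VARIABLE.finditer(msgid)}
    translated = catalog.get(msgid, msgid)

    def expand(match):
        name = match.group(1) or match.group(2)
        if name not in allowed:
            return match.group(0)
        return variables.get(name, "")

    return _VARIABLE.sub(expand, translated)


# A hypothetical catalog with a single entry, keyed by the literal msgid.
catalog = {"Remaining files: $filecount": "Verbleibende Dateien: $filecount"}
variables = {"filecount": "3"}

if __name__ == "__main__":
    # Escaped form: the function receives the literal msgid and finds it.
    print(eval_gettext_sketch("Remaining files: $filecount", catalog, variables))
    # Unescaped form: the shell has already substituted the value, so the
    # lookup key no longer matches any catalog entry.
    print(eval_gettext_sketch("Remaining files: 3", catalog, variables))
```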
902 <H4><A NAME="SEC283" HREF="gettext_toc.html#TOC283">15.5.2.2 Contents of <CODE>gettext.sh</CODE></A></H4>
905 <CODE>gettext.sh</CODE>, contained in the run-time package of GNU gettext, provides
913 The variable <CODE>echo</CODE> is set to a command that outputs its first argument
914 and a newline, without interpreting backslashes in the argument string.
918 See section <A HREF="gettext_15.html#SEC287">15.5.2.6 Invoking the <CODE>eval_gettext</CODE> function</A>.
922 See section <A HREF="gettext_15.html#SEC288">15.5.2.7 Invoking the <CODE>eval_ngettext</CODE> function</A>.
927 <H4><A NAME="SEC284" HREF="gettext_toc.html#TOC284">15.5.2.3 Invoking the <CODE>gettext</CODE> program</A></H4>
930 <A NAME="IDX1228"></A>
931 <A NAME="IDX1229"></A>
934 gettext [<VAR>option</VAR>] [[<VAR>textdomain</VAR>] <VAR>msgid</VAR>]
935 gettext [<VAR>option</VAR>] -s [<VAR>msgid</VAR>]...
939 <A NAME="IDX1230"></A>
940 The <CODE>gettext</CODE> program displays the native language translation of a
945 <STRONG>Arguments</STRONG>
950 <DT><SAMP>‘-d <VAR>textdomain</VAR>’</SAMP>
952 <DT><SAMP>‘--domain=<VAR>textdomain</VAR>’</SAMP>
954 <A NAME="IDX1231"></A>
955 <A NAME="IDX1232"></A>
956 Retrieve translated messages from <VAR>textdomain</VAR>. Usually a <VAR>textdomain</VAR>
957 corresponds to a package, a program, or a module of a program.
959 <DT><SAMP>‘-e’</SAMP>
961 <A NAME="IDX1233"></A>
962 Enable expansion of some escape sequences. This option is for compatibility
963 with the <SAMP>‘echo’</SAMP> program or shell built-in. The escape sequences
964 <SAMP>‘\a’</SAMP>, <SAMP>‘\b’</SAMP>, <SAMP>‘\c’</SAMP>, <SAMP>‘\f’</SAMP>, <SAMP>‘\n’</SAMP>, <SAMP>‘\r’</SAMP>, <SAMP>‘\t’</SAMP>,
965 <SAMP>‘\v’</SAMP>, <SAMP>‘\\’</SAMP>, and <SAMP>‘\’</SAMP> followed by one to three octal digits, are
interpreted the same way the System V <SAMP>‘echo’</SAMP> program interpreted them.
968 <DT><SAMP>‘-E’</SAMP>
970 <A NAME="IDX1234"></A>
971 This option is only for compatibility with the <SAMP>‘echo’</SAMP> program or shell
972 built-in. It has no effect.
974 <DT><SAMP>‘-h’</SAMP>
976 <DT><SAMP>‘--help’</SAMP>
978 <A NAME="IDX1235"></A>
979 <A NAME="IDX1236"></A>
980 Display this help and exit.
982 <DT><SAMP>‘-n’</SAMP>
984 <A NAME="IDX1237"></A>
985 Suppress trailing newline. By default, <CODE>gettext</CODE> adds a newline to
988 <DT><SAMP>‘-V’</SAMP>
990 <DT><SAMP>‘--version’</SAMP>
992 <A NAME="IDX1238"></A>
993 <A NAME="IDX1239"></A>
994 Output version information and exit.
996 <DT><SAMP>‘[<VAR>textdomain</VAR>] <VAR>msgid</VAR>’</SAMP>
998 Retrieve translated message corresponding to <VAR>msgid</VAR> from <VAR>textdomain</VAR>.
1003 If the <VAR>textdomain</VAR> parameter is not given, the domain is determined from
1004 the environment variable <CODE>TEXTDOMAIN</CODE>. If the message catalog is not
1005 found in the regular directory, another location can be specified with the
1006 environment variable <CODE>TEXTDOMAINDIR</CODE>.
1010 When used with the <CODE>-s</CODE> option, the program behaves like the <SAMP>‘echo’</SAMP>
1011 command, except that it does not simply copy its arguments to stdout:
1012 arguments that have a translation in the selected catalog are output translated.
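As a rough illustration of the lookup rules above, the following Python sketch mirrors them with the standard <CODE>gettext</CODE> module. The helper names <CODE>lookup</CODE> and <CODE>echo_style</CODE> and the domain <CODE>hello-sketch</CODE> are hypothetical. The sketch only approximates the command-line program, and it assumes that <CODE>-s</CODE> joins the translated arguments with single spaces, as <SAMP>‘echo’</SAMP> does.

```python
import gettext
import os


def lookup(msgid, textdomain=None):
    """Return the translation of msgid, or msgid itself if none is found."""
    # Without an explicit domain, fall back to the TEXTDOMAIN variable.
    domain = textdomain or os.environ.get("TEXTDOMAIN")
    if not domain:
        # Sketch choice: with no domain at all, leave the message untranslated.
        return msgid
    # TEXTDOMAINDIR, when set, names an alternative catalog directory;
    # None lets the gettext module use its default location.
    localedir = os.environ.get("TEXTDOMAINDIR")
    catalog = gettext.translation(domain, localedir=localedir, fallback=True)
    return catalog.gettext(msgid)


def echo_style(msgids, textdomain=None, newline=True):
    """Approximate 'gettext -s': translate every argument, then join them."""
    line = " ".join(lookup(m, textdomain) for m in msgids)
    return line + ("\n" if newline else "")


if __name__ == "__main__":
    # "hello-sketch" is a made-up domain, so the msgids come back unchanged.
    print(echo_style(["Hello", "world"], textdomain="hello-sketch"), end="")
```

Because <CODE>fallback=True</CODE> is passed, a missing catalog is not an error in this sketch: the untranslated <VAR>msgid</VAR> is returned instead.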
1016 Note: <CODE>xgettext</CODE> supports only the one-argument form of the
1017 <CODE>gettext</CODE> invocation, where no options are present and the
1018 <VAR>textdomain</VAR> is implicit, from the environment.
1023 <H4><A NAME="SEC285" HREF="gettext_toc.html#TOC285">15.5.2.4 Invoking the <CODE>ngettext</CODE> program</A></H4>
1026 <A NAME="IDX1240"></A>
1027 <A NAME="IDX1241"></A>
1030 ngettext [<VAR>option</VAR>] [<VAR>textdomain</VAR>] <VAR>msgid</VAR> <VAR>msgid-plural</VAR> <VAR>count</VAR>
1034 <A NAME="IDX1242"></A>
1035 The <CODE>ngettext</CODE> program displays the native language translation of a
1036 textual message whose grammatical form depends on a number.
1040 <STRONG>Arguments</STRONG>
1045 <DT><SAMP>‘-d <VAR>textdomain</VAR>’</SAMP>
1047 <DT><SAMP>‘--domain=<VAR>textdomain</VAR>’</SAMP>
1049 <A NAME="IDX1243"></A>
1050 <A NAME="IDX1244"></A>
1051 Retrieve translated messages from <VAR>textdomain</VAR>. Usually a <VAR>textdomain</VAR>
1052 corresponds to a package, a program, or a module of a program.
1054 <DT><SAMP>‘-e’</SAMP>
1056 <A NAME="IDX1245"></A>
1057 Enable expansion of some escape sequences. This option is for compatibility
1058 with the <SAMP>‘gettext’</SAMP> program. The escape sequences
1059 <SAMP>‘\a’</SAMP>, <SAMP>‘\b’</SAMP>, <SAMP>‘\c’</SAMP>, <SAMP>‘\f’</SAMP>, <SAMP>‘\n’</SAMP>, <SAMP>‘\r’</SAMP>, <SAMP>‘\t’</SAMP>,
1060 <SAMP>‘\v’</SAMP>, <SAMP>‘\\’</SAMP>, and <SAMP>‘\’</SAMP> followed by one to three octal digits, are
1061 interpreted as the System V <SAMP>‘echo’</SAMP> program interpreted them.
1063 <DT><SAMP>‘-E’</SAMP>
1065 <A NAME="IDX1246"></A>
1066 This option is only for compatibility with the <SAMP>‘gettext’</SAMP> program. It has
1069 <DT><SAMP>‘-h’</SAMP>
1071 <DT><SAMP>‘--help’</SAMP>
1073 <A NAME="IDX1247"></A>
1074 <A NAME="IDX1248"></A>
1075 Display this help and exit.
1077 <DT><SAMP>‘-V’</SAMP>
1079 <DT><SAMP>‘--version’</SAMP>
1081 <A NAME="IDX1249"></A>
1082 <A NAME="IDX1250"></A>
1083 Output version information and exit.
1085 <DT><SAMP>‘<VAR>textdomain</VAR>’</SAMP>
1087 Retrieve translated message from <VAR>textdomain</VAR>.
1089 <DT><SAMP>‘<VAR>msgid</VAR> <VAR>msgid-plural</VAR>’</SAMP>
1091 Translate <VAR>msgid</VAR> (English singular) / <VAR>msgid-plural</VAR> (English plural).
1093 <DT><SAMP>‘<VAR>count</VAR>’</SAMP>
1095 Choose singular/plural form based on this value.
1100 If the <VAR>textdomain</VAR> parameter is not given, the domain is determined from
1101 the environment variable <CODE>TEXTDOMAIN</CODE>. If the message catalog is not
1102 found in the regular directory, another location can be specified with the
1103 environment variable <CODE>TEXTDOMAINDIR</CODE>.
1107 Note: <CODE>xgettext</CODE> supports only the three-argument form of the
1108 <CODE>ngettext</CODE> invocation, where no options are present and the
1109 <VAR>textdomain</VAR> is implicit, from the environment.
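The role of <VAR>count</VAR> can be sketched in Python with the standard <CODE>gettext</CODE> module. This is an illustration rather than the <CODE>ngettext</CODE> program itself: the function name <CODE>describe</CODE> and the message text are made up, and no catalog is loaded, so only the English singular/plural rule applies. A real catalog may apply a different rule for its language.

```python
import gettext

# NullTranslations has no catalog, so it applies the English rule:
# the singular form for count == 1, the plural form otherwise.
catalog = gettext.NullTranslations()


def describe(count):
    """Pick the grammatical form for count, then fill in the number."""
    template = catalog.ngettext("%(n)d file removed", "%(n)d files removed", count)
    return template % {"n": count}


if __name__ == "__main__":
    for count in (0, 1, 5):
        print(describe(count))
```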
1114 <H4><A NAME="SEC286" HREF="gettext_toc.html#TOC286">15.5.2.5 Invoking the <CODE>envsubst</CODE> program</A></H4>
1117 <A NAME="IDX1251"></A>
1118 <A NAME="IDX1252"></A>
1121 envsubst [<VAR>option</VAR>] [<VAR>shell-format</VAR>]
1125 <A NAME="IDX1253"></A>
1126 <A NAME="IDX1254"></A>
1127 <A NAME="IDX1255"></A>
1128 The <CODE>envsubst</CODE> program substitutes the values of environment variables.
1132 <STRONG>Operation mode</STRONG>
1137 <DT><SAMP>‘-v’</SAMP>
1139 <DT><SAMP>‘--variables’</SAMP>
1141 <A NAME="IDX1256"></A>
1142 <A NAME="IDX1257"></A>
1143 Output the variables occurring in <VAR>shell-format</VAR>.
1148 <STRONG>Informative output</STRONG>
1153 <DT><SAMP>‘-h’</SAMP>
1155 <DT><SAMP>‘--help’</SAMP>
1157 <A NAME="IDX1258"></A>
1158 <A NAME="IDX1259"></A>
1159 Display this help and exit.
1161 <DT><SAMP>‘-V’</SAMP>
1163 <DT><SAMP>‘--version’</SAMP>
1165 <A NAME="IDX1260"></A>
1166 <A NAME="IDX1261"></A>
1167 Output version information and exit.
1172 In normal operation mode, standard input is copied to standard output,
1173 with references to environment variables of the form <CODE>$VARIABLE</CODE> or
1174 <CODE>${VARIABLE}</CODE> being replaced with the corresponding values. If a
1175 <VAR>shell-format</VAR> is given, only those environment variables that are
1176 referenced in <VAR>shell-format</VAR> are substituted; otherwise all environment
1177 variable references occurring in standard input are substituted.
1181 These substitutions are a subset of the substitutions that a shell performs
1182 on unquoted and double-quoted strings. Other kinds of substitutions done
1183 by a shell, such as <CODE>${<VAR>variable</VAR>-<VAR>default</VAR>}</CODE> or
1184 <CODE>$(<VAR>command-list</VAR>)</CODE> or <CODE>`<VAR>command-list</VAR>`</CODE>, are not performed
1185 by the <CODE>envsubst</CODE> program, for security reasons.
1189 When <CODE>--variables</CODE> is used, standard input is ignored, and the output
1190 consists of the environment variables that are referenced in
1191 <VAR>shell-format</VAR>, one per line.
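The substitution rules described above can be sketched in Python as follows. This is not the <CODE>envsubst</CODE> source: the function names <CODE>variables</CODE> and <CODE>substitute</CODE> are hypothetical, the environment is passed in as a plain dictionary, and two points the text does not settle are decided by assumption: an unset variable is replaced by the empty string, and a name that occurs several times is listed once.

```python
import re

# Matches $VARIABLE or ${VARIABLE}; exactly one of the two groups is set.
_REFERENCE = re.compile(
    r"\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))"
)


def variables(shell_format):
    """List the variable names referenced in shell_format, each name once."""
    names = []
    for match in _REFERENCE.finditer(shell_format):
        name = match.group(1) or match.group(2)
        if name not in names:
            names.append(name)
    return names


def substitute(text, env, shell_format=None):
    """Replace variable references in text with values taken from env."""
    allowed = None if shell_format is None else set(variables(shell_format))

    def replace(match):
        name = match.group(1) or match.group(2)
        if allowed is not None and name not in allowed:
            return match.group(0)  # not listed in shell_format: keep as is
        # Assumption of this sketch: an unset variable becomes the empty string.
        return env.get(name, "")

    return _REFERENCE.sub(replace, text)
```

Forms such as <CODE>${X-default}</CODE>, <CODE>$(ls)</CODE> or backquotes do not match the pattern, so the sketch leaves them untouched, in line with the restriction described above.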
1196 <H4><A NAME="SEC287" HREF="gettext_toc.html#TOC287">15.5.2.6 Invoking the <CODE>eval_gettext</CODE> function</A></H4>
1199 <A NAME="IDX1262"></A>
1202 eval_gettext <VAR>msgid</VAR>
1206 <A NAME="IDX1263"></A>
1207 This function outputs the native language translation of a textual message,
1208 performing dollar-substitution on the result. Note that only shell variables
1209 mentioned in <VAR>msgid</VAR> will be dollar-substituted in the result.
1214 <H4><A NAME="SEC288" HREF="gettext_toc.html#TOC288">15.5.2.7 Invoking the <CODE>eval_ngettext</CODE> function</A></H4>
1217 <A NAME="IDX1264"></A>
1220 eval_ngettext <VAR>msgid</VAR> <VAR>msgid-plural</VAR> <VAR>count</VAR>
1224 <A NAME="IDX1265"></A>
1225 This function outputs the native language translation of a textual message
1226 whose grammatical form depends on a number, performing dollar-substitution
1227 on the result. Note that only shell variables mentioned in <VAR>msgid</VAR> or
1228 <VAR>msgid-plural</VAR> will be dollar-substituted in the result.
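The restriction stated for <CODE>eval_gettext</CODE> and <CODE>eval_ngettext</CODE> can be sketched in Python. The code below is an illustration, not the shell implementation: the translation step is passed in as a hypothetical callable (<CODE>translate</CODE>, <CODE>ntranslate</CODE>) that defaults to returning the English text, and shell variables are modelled as a dictionary.

```python
import re

_REFERENCE = re.compile(r"\$(?:\{([A-Za-z_]\w*)\}|([A-Za-z_]\w*))", re.ASCII)


def _names(*msgids):
    """Collect the variable names mentioned in the given msgids."""
    return {m.group(1) or m.group(2) for s in msgids for m in _REFERENCE.finditer(s)}


def _expand(text, allowed, variables):
    """Dollar-substitute only the allowed names; leave everything else alone."""
    def replace(match):
        name = match.group(1) or match.group(2)
        return str(variables.get(name, "")) if name in allowed else match.group(0)
    return _REFERENCE.sub(replace, text)


def eval_gettext(msgid, variables, translate=lambda s: s):
    """Translate msgid, then substitute only variables that msgid mentions."""
    return _expand(translate(msgid), _names(msgid), variables)


def eval_ngettext(msgid, msgid_plural, count, variables,
                  ntranslate=lambda s, p, n: s if n == 1 else p):
    """Plural-aware variant: names may come from msgid or msgid_plural."""
    chosen = ntranslate(msgid, msgid_plural, count)
    return _expand(chosen, _names(msgid, msgid_plural), variables)
```

The point of the design is that a translation cannot pull in variables the programmer never mentioned: a reference such as <CODE>$PATH</CODE> that appears only in the translated text is left as literal text.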
1233 <H3><A NAME="SEC289" HREF="gettext_toc.html#TOC289">15.5.3 bash - Bourne-Again Shell Script</A></H3>
1235 <A NAME="IDX1266"></A>
1239 GNU <CODE>bash</CODE> 2.0 or newer has a special shorthand for translating a
1240 string and substituting variable values in it: <CODE>$"msgid"</CODE>. But
1241 the use of this construct is <STRONG>discouraged</STRONG>, due to the security
1242 holes it opens and due to its portability problems.
1246 The security holes of <CODE>$"..."</CODE> come from the fact that after looking up
1247 the translation of the string, <CODE>bash</CODE> processes it like it processes
1248 any double-quoted string: dollar and backquote processing, like <SAMP>‘eval’</SAMP>
1256 In a locale whose encoding is one of BIG5, BIG5-HKSCS, GBK, GB18030, SHIFT_JIS,
1257 JOHAB, some double-byte characters have a second byte whose value is
1258 <CODE>0x60</CODE>. For example, the byte sequence <CODE>\xe0\x60</CODE> is a single
1259 character in these locales. Many versions of <CODE>bash</CODE> (all versions
1260 up to bash-2.05, and newer versions on platforms without the <CODE>mbsrtowcs()</CODE>
1261 function) don't know about character boundaries and see a backquote character
1262 where there is only a particular Chinese character. Thus it can start
1263 executing part of the translation as a command list. This situation can occur
1264 even without the translator being aware of it: if the translator provides
1265 translations in the UTF-8 encoding, it is the <CODE>gettext()</CODE> function which
1266 will, during its conversion from the translator's encoding to the user's
1267 locale's encoding, produce the dangerous <CODE>\x60</CODE> bytes.
1271 A translator could - voluntarily or inadvertently - use backquotes
1272 <CODE>"`...`"</CODE> or dollar-parentheses <CODE>"$(...)"</CODE> in her translations.
1273 The enclosed strings would be executed as command lists by the shell.
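The first of these problems can be observed from Python, which is used here only as a convenient way to inspect bytes. The helper name <CODE>contains_backquote_byte</CODE> is made up for this sketch, and GBK is chosen as one of the encodings listed above.

```python
def contains_backquote_byte(text, encoding):
    """True if encoding text produces a 0x60 byte (ASCII backquote)."""
    return b"\x60" in text.encode(encoding)


# The byte pair 0xE0 0x60 is one double-byte character in GBK.
character = b"\xe0\x60".decode("gbk")

if __name__ == "__main__":
    print(len(character))                                # number of characters
    print("`" in character)                              # is it a backquote?
    print(contains_backquote_byte(character, "gbk"))     # 0x60 byte in GBK?
    print(contains_backquote_byte(character, "utf-8"))   # 0x60 byte in UTF-8?
```

The text holds no backquote character, and its UTF-8 form holds no <CODE>0x60</CODE> byte, yet the GBK form does. A program that scans bytes instead of characters would therefore see a backquote only after the conversion to the user's locale encoding.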
1277 The portability problem is that <CODE>bash</CODE> must be built with
1278 internationalization support; this is normally not the case on systems
1279 that don't have the <CODE>gettext()</CODE> function in libc.
1284 <H3><A NAME="SEC290" HREF="gettext_toc.html#TOC290">15.5.4 Python</A></H3>
1286 <A NAME="IDX1267"></A>
1301 <CODE>'abc'</CODE>, <CODE>u'abc'</CODE>, <CODE>r'abc'</CODE>, <CODE>ur'abc'</CODE>,
1302 <BR><CODE>"abc"</CODE>, <CODE>u"abc"</CODE>, <CODE>r"abc"</CODE>, <CODE>ur"abc"</CODE>,
1303 <BR><CODE>'''abc'''</CODE>, <CODE>u'''abc'''</CODE>, <CODE>r'''abc'''</CODE>, <CODE>ur'''abc'''</CODE>,
1304 <BR><CODE>"""abc"""</CODE>, <CODE>u"""abc"""</CODE>, <CODE>r"""abc"""</CODE>, <CODE>ur"""abc"""</CODE>
1306 <DT>gettext shorthand
1308 <CODE>_('abc')</CODE> etc.
1310 <DT>gettext/ngettext functions
1312 <CODE>gettext.gettext</CODE>, <CODE>gettext.dgettext</CODE>,
1313 <CODE>gettext.ngettext</CODE>, <CODE>gettext.dngettext</CODE>,
1314 also <CODE>ugettext</CODE>, <CODE>ungettext</CODE>
1318 <CODE>gettext.textdomain</CODE> function, or
1319 <CODE>gettext.install(<VAR>domain</VAR>)</CODE> function
1323 <CODE>gettext.bindtextdomain</CODE> function, or
1324 <CODE>gettext.install(<VAR>domain</VAR>,<VAR>localedir</VAR>)</CODE> function
1328 not used by the gettext emulation
1332 <CODE>import gettext</CODE>
1334 <DT>Use or emulate GNU gettext
1340 <CODE>xgettext</CODE>
1342 <DT>Formatting with positions
1344 <CODE>'...%(ident)d...' % { 'ident': value }</CODE>
1356 An example is available in the <TT>‘examples’</TT> directory: <CODE>hello-python</CODE>.
1360 A note about format strings: Python supports format strings with unnamed
1361 arguments, such as <CODE>'...%d...'</CODE>, and format strings with named arguments,
1362 such as <CODE>'...%(ident)d...'</CODE>. The latter are preferable for
1363 internationalized programs, for two reasons:
1370 When a format string takes more than one argument, the translator can provide
1371 a translation that uses the arguments in a different order, if the format
1372 string uses named arguments. For example, the translator can reformulate
1375 "'%(volume)s' has only %(freespace)d bytes free."
1381 "Only %(freespace)d bytes free on '%(volume)s'."
1384 Additionally, the identifiers also provide some context to the translator.
1388 In the context of plural forms, the format string used for the singular form
1389 does not use the numeric argument in many languages. Even in English, one
1390 prefers to write <CODE>"one hour"</CODE> instead of <CODE>"1 hour"</CODE>. Omitting
1391 individual arguments from format strings like this is only possible with
1392 the named argument syntax. (With unnamed arguments, Python -- unlike C --
1393 verifies that the format string uses all supplied arguments.)
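Both reasons can be tried out directly. The following sketch uses the two format strings quoted above. The dictionary <CODE>values</CODE> and its contents are made-up sample data.

```python
values = {"volume": "/dev/sda1", "freespace": 512}

original = "'%(volume)s' has only %(freespace)d bytes free."
reordered = "Only %(freespace)d bytes free on '%(volume)s'."

# Both strings accept the same mapping, whatever the argument order.
print(original % values)
print(reordered % values)

# A singular form may leave out the number when arguments are named...
print("one hour" % {"count": 1})

# ...whereas an unnamed argument that is not consumed raises TypeError.
try:
    "one hour" % 1
except TypeError as error:
    print("TypeError:", error)
```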
1398 <H3><A NAME="SEC291" HREF="gettext_toc.html#TOC291">15.5.5 GNU clisp - Common Lisp</A></H3>
1400 <A NAME="IDX1268"></A>
1401 <A NAME="IDX1269"></A>
1402 <A NAME="IDX1270"></A>
1419 <DT>gettext shorthand
1421 <CODE>(_ "abc")</CODE>, <CODE>(ENGLISH "abc")</CODE>
1423 <DT>gettext/ngettext functions
1425 <CODE>i18n:gettext</CODE>, <CODE>i18n:ngettext</CODE>
1429 <CODE>i18n:textdomain</CODE>
1433 <CODE>i18n:textdomaindir</CODE>
1443 <DT>Use or emulate GNU gettext
1449 <CODE>xgettext -k_ -kENGLISH</CODE>
1451 <DT>Formatting with positions
1453 <CODE>format "~1@*~D ~0@*~D"</CODE>
1457 On platforms without gettext, no translation.
1465 An example is available in the <TT>‘examples’</TT> directory: <CODE>hello-clisp</CODE>.
1470 <H3><A NAME="SEC292" HREF="gettext_toc.html#TOC292">15.5.6 GNU clisp C sources</A></H3>
1472 <A NAME="IDX1271"></A>
1489 <DT>gettext shorthand
1491 <CODE>ENGLISH ? "abc" : ""</CODE>
1492 <BR><CODE>GETTEXT("abc")</CODE>
1493 <BR><CODE>GETTEXTL("abc")</CODE>
1495 <DT>gettext/ngettext functions
1497 <CODE>clgettext</CODE>, <CODE>clgettextl</CODE>
1513 <CODE>#include "lispbibl.c"</CODE>
1515 <DT>Use or emulate GNU gettext
1521 <CODE>clisp-xgettext</CODE>
1523 <DT>Formatting with positions
1525 <CODE>fprintf "%2$d %1$d"</CODE>
1529 On platforms without gettext, no translation.
1538 <H3><A NAME="SEC293" HREF="gettext_toc.html#TOC293">15.5.7 Emacs Lisp</A></H3>
1540 <A NAME="IDX1272"></A>
1557 <DT>gettext shorthand
1559 <CODE>(_"abc")</CODE>
1561 <DT>gettext/ngettext functions
1563 <CODE>gettext</CODE>, <CODE>dgettext</CODE> (xemacs only)
1567 <CODE>domain</CODE> special form (xemacs only)
1571 <CODE>bind-text-domain</CODE> function (xemacs only)
1581 <DT>Use or emulate GNU gettext
1587 <CODE>xgettext</CODE>
1589 <DT>Formatting with positions
1591 <CODE>format "%2$d %1$d"</CODE>
1595 Only XEmacs. Without <CODE>I18N3</CODE> defined at build time, no translation.
1604 <H3><A NAME="SEC294" HREF="gettext_toc.html#TOC294">15.5.8 librep</A></H3>
1606 <A NAME="IDX1273"></A>
1613 librep 0.15.3 or newer
1623 <DT>gettext shorthand
1625 <CODE>(_"abc")</CODE>
1627 <DT>gettext/ngettext functions
1629 <CODE>gettext</CODE>
1633 <CODE>textdomain</CODE> function
1637 <CODE>bindtextdomain</CODE> function
1645 <CODE>(require 'rep.i18n.gettext)</CODE>
1647 <DT>Use or emulate GNU gettext
1653 <CODE>xgettext</CODE>
1655 <DT>Formatting with positions
1657 <CODE>format "%2$d %1$d"</CODE>
1661 On platforms without gettext, no translation.
1669 An example is available in the <TT>‘examples’</TT> directory: <CODE>hello-librep</CODE>.
1674 <H3><A NAME="SEC295" HREF="gettext_toc.html#TOC295">15.5.9 GNU guile - Scheme</A></H3>
1676 <A NAME="IDX1274"></A>
1677 <A NAME="IDX1275"></A>
1694 <DT>gettext shorthand
1696 <CODE>(_ "abc")</CODE>, <CODE>_"abc"</CODE> (GIMP script-fu extension)
1698 <DT>gettext/ngettext functions
1700 <CODE>gettext</CODE>, <CODE>ngettext</CODE>
1704 <CODE>textdomain</CODE>
1708 <CODE>bindtextdomain</CODE>
1712 <CODE>(catch #t (lambda () (setlocale LC_ALL "")) (lambda args #f))</CODE>
1716 <CODE>(use-modules (ice-9 format))</CODE>
1718 <DT>Use or emulate GNU gettext
1724 <CODE>xgettext -k_</CODE>
1726 <DT>Formatting with positions
1732 On platforms without gettext, no translation.
1740 An example is available in the <TT>‘examples’</TT> directory: <CODE>hello-guile</CODE>.
1745 <H3><A NAME="SEC296" HREF="gettext_toc.html#TOC296">15.5.10 GNU Smalltalk</A></H3>
1747 <A NAME="IDX1276"></A>
1764 <DT>gettext shorthand
1766 <CODE>NLS ? 'abc'</CODE>
1768 <DT>gettext/ngettext functions
1770 <CODE>LcMessagesDomain>>#at:</CODE>, <CODE>LcMessagesDomain>>#at:plural:with:</CODE>
1774 <CODE>LcMessages>>#domain:localeDirectory:</CODE> (returns a <CODE>LcMessagesDomain</CODE>
1776 Example: <CODE>I18N Locale default messages domain: 'gettext' localeDirectory: '/usr/local/share/locale'</CODE>
1780 <CODE>LcMessages>>#domain:localeDirectory:</CODE>, see above.
1784 Automatic if you use <CODE>I18N Locale default</CODE>.
1788 <CODE>PackageLoader fileInPackage: 'I18N'!</CODE>
1790 <DT>Use or emulate GNU gettext
1796 <CODE>xgettext</CODE>
1798 <DT>Formatting with positions
1800 <CODE>'%1 %2' bindWith: 'Hello' with: 'world'</CODE>
1812 An example is available in the <TT>‘examples’</TT> directory:
1813 <CODE>hello-smalltalk</CODE>.
1818 <H3><A NAME="SEC297" HREF="gettext_toc.html#TOC297">15.5.11 Java</A></H3>
1820 <A NAME="IDX1277"></A>
1837 <DT>gettext shorthand
1841 <DT>gettext/ngettext functions
1843 <CODE>GettextResource.gettext</CODE>, <CODE>GettextResource.ngettext</CODE>,
1844 <CODE>GettextResource.pgettext</CODE>, <CODE>GettextResource.npgettext</CODE>
1848 ---, use <CODE>ResourceBundle.getBundle</CODE> instead
1852 ---, use CLASSPATH instead
1862 <DT>Use or emulate GNU gettext
1864 ---, uses a Java specific message catalog format
1868 <CODE>xgettext -k_</CODE>
1870 <DT>Formatting with positions
1872 <CODE>MessageFormat.format "{1,number} {0,number}"</CODE>
1884 Before marking strings as internationalizable, uses of the string
1885 concatenation operator need to be converted to <CODE>MessageFormat</CODE>
1886 applications. For example, <CODE>"file "+filename+" not found"</CODE> becomes
1887 <CODE>MessageFormat.format("file {0} not found", new Object[] { filename })</CODE>.
1888 Only after this is done can the strings be marked and extracted.
1892 GNU gettext uses the native Java internationalization mechanism, namely
1893 <CODE>ResourceBundle</CODE>s. There are two formats of <CODE>ResourceBundle</CODE>s:
1894 <CODE>.properties</CODE> files and <CODE>.class</CODE> files. The <CODE>.properties</CODE>
1895 format is a text file which the translators can directly edit, like PO
1896 files, but which doesn't support plural forms. Whereas the <CODE>.class</CODE>
1897 format is compiled from <CODE>.java</CODE> source code and can support plural
1898 forms (provided it is accessed through an appropriate API, see below).
1902 To convert a PO file to a <CODE>.properties</CODE> file, the <CODE>msgcat</CODE>
1903 program can be used with the option <CODE>--properties-output</CODE>. To convert
1904 a <CODE>.properties</CODE> file back to a PO file, the <CODE>msgcat</CODE> program
1905 can be used with the option <CODE>--properties-input</CODE>. All the tools
1906 that manipulate PO files can work with <CODE>.properties</CODE> files as well,
1907 if given the <CODE>--properties-input</CODE> and/or <CODE>--properties-output</CODE>
1912 To convert a PO file to a ResourceBundle class, the <CODE>msgfmt</CODE> program
1913 can be used with the option <CODE>--java</CODE> or <CODE>--java2</CODE>. To convert a
1914 ResourceBundle back to a PO file, the <CODE>msgunfmt</CODE> program can be used
1915 with the option <CODE>--java</CODE>.
1919 Two different programmatic APIs can be used to access ResourceBundles.
1920 Note that both APIs work with all kinds of ResourceBundles, whether
1921 GNU gettext generated classes, or other <CODE>.class</CODE> or <CODE>.properties</CODE>
1929 The <CODE>java.util.ResourceBundle</CODE> API.
1931 In particular, its <CODE>getString</CODE> function returns a string translation.
1932 Note that a missing translation yields a <CODE>MissingResourceException</CODE>.
1934 This has the advantage of being the standard API, and it does not require
1935 any additional libraries, only the <CODE>msgcat</CODE>-generated <CODE>.properties</CODE>
1936 files or the <CODE>msgfmt</CODE>-generated <CODE>.class</CODE> files. But it cannot do
1937 plural handling, even if the resource was generated by <CODE>msgfmt</CODE> from
1938 a PO file with plural handling.
1942 The <CODE>gnu.gettext.GettextResource</CODE> API.
1944 Reference documentation in Javadoc 1.1 style format is in the
1945 <A HREF="javadoc2/index.html">javadoc2 directory</A>.
1947 Its <CODE>gettext</CODE> function returns a string translation. Note that when
1948 a translation is missing, the <VAR>msgid</VAR> argument is returned unchanged.
1950 This has the advantage of having the <CODE>ngettext</CODE> function for plural
1951 handling and the <CODE>pgettext</CODE> and <CODE>npgettext</CODE> functions for strings
1952 constrained to a particular context.
1954 <A NAME="IDX1278"></A>
1955 To use this API, one needs the <CODE>libintl.jar</CODE> file which is part of
1956 the GNU gettext package and distributed under the LGPL.
1960 Four examples, using the second API, are available in the <TT>‘examples’</TT>
1961 directory: <CODE>hello-java</CODE>, <CODE>hello-java-awt</CODE>, <CODE>hello-java-swing</CODE>,
1962 <CODE>hello-java-qtjambi</CODE>.
1966 Now, to make use of the API and define a shorthand for <SAMP>‘getString’</SAMP>,
1967 there are three idioms that you can choose from:
1974 (This one assumes Java 1.5 or newer.)
1975 In a unique class of your project, say <SAMP>‘Util’</SAMP>, define a static variable
1976 holding the <CODE>ResourceBundle</CODE> instance and the shorthand:
1980 private static ResourceBundle myResources =
1981 ResourceBundle.getBundle("domain-name");
1982 public static String _(String s) {
1983 return myResources.getString(s);
1987 All classes containing internationalized strings then contain
1991 import static Util._;
1994 and the shorthand is used like this:
1998 System.out.println(_("Operation completed."));
2003 In a unique class of your project, say <SAMP>‘Util’</SAMP>, define a static variable
2004 holding the <CODE>ResourceBundle</CODE> instance:
2008 public static ResourceBundle myResources =
2009 ResourceBundle.getBundle("domain-name");
2012 All classes containing internationalized strings then contain
2016 private static ResourceBundle res = Util.myResources;
2017 private static String _(String s) { return res.getString(s); }
2020 and the shorthand is used like this:
2024 System.out.println(_("Operation completed."));
2029 You add a class with a very short name, say <SAMP>‘S’</SAMP>, containing just the
2030 definition of the resource bundle and of the shorthand:
2035 public static ResourceBundle myResources =
2036 ResourceBundle.getBundle("domain-name");
2037 public static String _(String s) {
2038 return myResources.getString(s);
2043 and the shorthand is used like this:
2047 System.out.println(S._("Operation completed."));
2053 Which of the three idioms you choose will depend on whether your project
2054 requires portability to Java versions prior to Java 1.5 and, if so, whether
2055 copying two lines of code into every class is more acceptable in your project
2056 than a class with a single-letter name. Note that Java 9 and newer no longer
2056 accept <CODE>_</CODE> as an identifier, so there the shorthand needs another name.
2061 <H3><A NAME="SEC298" HREF="gettext_toc.html#TOC298">15.5.12 C#</A></H3>
2063 <A NAME="IDX1279"></A>
2070 pnet, pnetlib 0.6.2 or newer, or mono 0.29 or newer
2078 <CODE>"abc"</CODE>, <CODE>@"abc"</CODE>
2080 <DT>gettext shorthand
2084 <DT>gettext/ngettext functions
2086 <CODE>GettextResourceManager.GetString</CODE>,
2087 <CODE>GettextResourceManager.GetPluralString</CODE>,
2088 <CODE>GettextResourceManager.GetParticularString</CODE>,
2089 <CODE>GettextResourceManager.GetParticularPluralString</CODE>
2093 <CODE>new GettextResourceManager(domain)</CODE>
2097 ---, compiled message catalogs are located in subdirectories of the directory
2098 containing the executable
2108 <DT>Use or emulate GNU gettext
2110 ---, uses a C# specific message catalog format
2114 <CODE>xgettext -k_</CODE>
2116 <DT>Formatting with positions
2118 <CODE>String.Format "{1} {0}"</CODE>
2130 Before marking strings as internationalizable, uses of the string
2131 concatenation operator need to be converted to <CODE>String.Format</CODE>
2132 invocations. For example, <CODE>"file "+filename+" not found"</CODE> becomes
2133 <CODE>String.Format("file {0} not found", filename)</CODE>.
2134 Only after this is done can the strings be marked and extracted.
2138 GNU gettext uses the native C#/.NET internationalization mechanism, namely
2139 the classes <CODE>ResourceManager</CODE> and <CODE>ResourceSet</CODE>. Applications
2140 use the <CODE>ResourceManager</CODE> methods to retrieve the native language
2141 translation of strings. An instance of <CODE>ResourceSet</CODE> is the in-memory
2142 representation of a message catalog file. The <CODE>ResourceManager</CODE> loads
2143 and accesses <CODE>ResourceSet</CODE> instances as needed to look up the
2148 There are two formats of <CODE>ResourceSet</CODE>s that can be directly loaded by
2149 the C# runtime: <CODE>.resources</CODE> files and <CODE>.dll</CODE> files.
2156 The <CODE>.resources</CODE> format is a binary file usually generated through the
2157 <CODE>resgen</CODE> or <CODE>monoresgen</CODE> utility, but which doesn't support plural
2158 forms. <CODE>.resources</CODE> files can also be embedded in .NET <CODE>.exe</CODE> files.
2159 This only affects whether a file system access is performed to load the message
2160 catalog; it doesn't affect the contents of the message catalog.
2164 On the other hand, the <CODE>.dll</CODE> format is a binary file that is compiled
2165 from <CODE>.cs</CODE> source code and can support plural forms (provided it is
2166 accessed through the GNU gettext API, see below).
2170 Note that these .NET <CODE>.dll</CODE> and <CODE>.exe</CODE> files are not tied to a
2171 particular platform; their file format and GNU gettext for C# can be used
2176 To convert a PO file to a <CODE>.resources</CODE> file, the <CODE>msgfmt</CODE> program
2177 can be used with the option <SAMP>‘--csharp-resources’</SAMP>. To convert a
2178 <CODE>.resources</CODE> file back to a PO file, the <CODE>msgunfmt</CODE> program can be
2179 used with the option <SAMP>‘--csharp-resources’</SAMP>. You can also, in some cases,
2180 use the <CODE>resgen</CODE> program (from the <CODE>pnet</CODE> package) or the
2181 <CODE>monoresgen</CODE> program (from the <CODE>mono</CODE>/<CODE>mcs</CODE> package). These
2182 programs can also convert a <CODE>.resources</CODE> file back to a PO file. But
2183 beware: as of this writing (January 2004), the <CODE>monoresgen</CODE> converter is
2184 quite buggy and the <CODE>resgen</CODE> converter ignores the encoding of the PO
2189 To convert a PO file to a <CODE>.dll</CODE> file, the <CODE>msgfmt</CODE> program can be
2190 used with the option <CODE>--csharp</CODE>. The result will be a <CODE>.dll</CODE> file
2191 containing a subclass of <CODE>GettextResourceSet</CODE>, which itself is a subclass
2192 of <CODE>ResourceSet</CODE>. To convert a <CODE>.dll</CODE> file containing a
2193 <CODE>GettextResourceSet</CODE> subclass back to a PO file, the <CODE>msgunfmt</CODE>
2194 program can be used with the option <CODE>--csharp</CODE>.
2198 The advantages of the <CODE>.dll</CODE> format over the <CODE>.resources</CODE> format
2206 Freedom to localize: Users can add their own translations to an application
2207 after it has been built and distributed. Whereas when the programmer uses
2208 a <CODE>ResourceManager</CODE> constructor provided by the system, the set of
2209 <CODE>.resources</CODE> files for an application must be specified when the
2210 application is built and cannot be extended afterwards.
2214 Plural handling: A message catalog in <CODE>.dll</CODE> format supports the plural
2215 handling function <CODE>GetPluralString</CODE>. Whereas <CODE>.resources</CODE> files can
2216 only contain data and only support lookups that depend on a single string.
2220 Context handling: A message catalog in <CODE>.dll</CODE> format supports the
2221 query-with-context functions <CODE>GetParticularString</CODE> and
2222 <CODE>GetParticularPluralString</CODE>. Whereas <CODE>.resources</CODE> files can
2223 only contain data and only support lookups that depend on a single string.
2227 The <CODE>GettextResourceManager</CODE> that loads the message catalogs in
2228 <CODE>.dll</CODE> format also provides for inheritance on a per-message basis.
2229 For example, in Austrian (<CODE>de_AT</CODE>) locale, translations from the German
2230 (<CODE>de</CODE>) message catalog will be used for messages not found in the
2231 Austrian message catalog. This has the consequence that the Austrian
2232 translators need only translate those few messages for which the translation
2233 into Austrian differs from the German one. Whereas when working with
2234 <CODE>.resources</CODE> files, each message catalog must provide the translations
2235 of all messages by itself.
2239 The <CODE>GettextResourceManager</CODE> that loads the message catalogs in
2240 <CODE>.dll</CODE> format also provides for a fallback: The English <VAR>msgid</VAR> is
2241 returned when no translation can be found. Whereas when working with
2242 <CODE>.resources</CODE> files, a language-neutral <CODE>.resources</CODE> file must
2243 explicitly be provided as a fallback.
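The per-message inheritance and the final fallback described above amount to a short lookup chain. The Python sketch below shows the idea only. It is not the <CODE>GettextResourceManager</CODE> implementation, and the names <CODE>CATALOGS</CODE>, <CODE>lookup_chain</CODE> and <CODE>get_string</CODE> as well as the sample messages are hypothetical.

```python
# Hypothetical catalogs: the Austrian one holds only the messages that differ.
CATALOGS = {
    "de": {"January": "Januar", "Cancel": "Abbrechen"},
    "de_AT": {"January": "Jänner"},
}


def lookup_chain(locale):
    """Yield the locale and then its parent, e.g. de_AT followed by de."""
    yield locale
    if "_" in locale:
        yield locale.split("_", 1)[0]


def get_string(msgid, locale):
    """Look msgid up along the chain; fall back to msgid when nothing matches."""
    for name in lookup_chain(locale):
        translation = CATALOGS.get(name, {}).get(msgid)
        if translation is not None:
            return translation
    return msgid  # final fallback: the untranslated English msgid


if __name__ == "__main__":
    for msgid in ("January", "Cancel", "Help"):
        print(get_string(msgid, "de_AT"))
```

With this arrangement the <CODE>de_AT</CODE> catalog stays small, and a message missing from every catalog still produces readable output.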
2247 On the side of the programmatic APIs, the programmer can use either the
2248 standard <CODE>ResourceManager</CODE> API or the GNU <CODE>GettextResourceManager</CODE>
2249 API. The latter is an extension of the former, because
2250 <CODE>GettextResourceManager</CODE> is a subclass of <CODE>ResourceManager</CODE>.
2257 The <CODE>System.Resources.ResourceManager</CODE> API.
2259 This API works with resources in <CODE>.resources</CODE> format.
2261 The creation of the <CODE>ResourceManager</CODE> is done through
2264 new ResourceManager(domainname, Assembly.GetExecutingAssembly())
2268 The <CODE>GetString</CODE> function returns a string's translation. Note that this
2269 function returns null when a translation is missing (i.e. not even found in
2270 the fallback resource file).
2274 The <CODE>GNU.Gettext.GettextResourceManager</CODE> API.
2276 This API works with resources in <CODE>.dll</CODE> format.
2278 Reference documentation is in the
2279 <A HREF="csharpdoc/index.html">csharpdoc directory</A>.
2281 The creation of the <CODE>ResourceManager</CODE> is done through
2284 new GettextResourceManager(domainname)
2287 The <CODE>GetString</CODE> function returns a string's translation. Note that when
2288 a translation is missing, the <VAR>msgid</VAR> argument is returned unchanged.
2290 The <CODE>GetPluralString</CODE> function returns a string translation with plural
2291 handling, like the <CODE>ngettext</CODE> function in C.
2293 The <CODE>GetParticularString</CODE> function returns a string's translation,
2294 specific to a particular context, like the <CODE>pgettext</CODE> function in C.
2295 Note that when a translation is missing, the <VAR>msgid</VAR> argument is returned
2298 The <CODE>GetParticularPluralString</CODE> function returns a string translation,
2299 specific to a particular context, with plural handling, like the
2300 <CODE>npgettext</CODE> function in C.
2302 <A NAME="IDX1280"></A>
2303 To use this API, one needs the <CODE>GNU.Gettext.dll</CODE> file which is part of
2304 the GNU gettext package and distributed under the LGPL.
2308 You can also mix both approaches: use the
2309 <CODE>GNU.Gettext.GettextResourceManager</CODE> constructor, but otherwise use
2310 only the <CODE>ResourceManager</CODE> type and only the <CODE>GetString</CODE> method.
2311 This is appropriate when you want to profit from the tools for PO files,
2312 but don't want to change existing source code that uses
2313 <CODE>ResourceManager</CODE> and don't (yet) need the <CODE>GetPluralString</CODE> method.
2317 Two examples, using the second API, are available in the <TT>‘examples’</TT>
2318 directory: <CODE>hello-csharp</CODE>, <CODE>hello-csharp-forms</CODE>.
2322 Now, to make use of the API and define a shorthand for <SAMP>‘GetString’</SAMP>,
2323 there are two idioms that you can choose from:
2330 In a unique class of your project, say <SAMP>‘Util’</SAMP>, define a static variable
2331 holding the <CODE>ResourceManager</CODE> instance:
2335 public static GettextResourceManager MyResourceManager =
2336 new GettextResourceManager("domain-name");
2339 All classes containing internationalized strings then contain
2343 private static GettextResourceManager Res = Util.MyResourceManager;
2344 private static String _(String s) { return Res.GetString(s); }
2347 and the shorthand is used like this:
2351 Console.WriteLine(_("Operation completed."));
2356 You add a class with a very short name, say <SAMP>‘S’</SAMP>, containing just the
2357 definition of the resource manager and of the shorthand:
2362 public static GettextResourceManager MyResourceManager =
2363 new GettextResourceManager("domain-name");
2364 public static String _(String s) {
2365 return MyResourceManager.GetString(s);
2370 and the shorthand is used like this:
2374 Console.WriteLine(S._("Operation completed."));
Which of the two idioms you choose will depend on whether copying two lines
of code into every class is more acceptable in your project than a class
with a single-letter name.
2387 <H3><A NAME="SEC299" HREF="gettext_toc.html#TOC299">15.5.13 GNU awk</A></H3>
2389 <A NAME="IDX1281"></A>
2390 <A NAME="IDX1282"></A>
2401 <CODE>awk</CODE>, <CODE>gawk</CODE>, <CODE>twjr</CODE>.
2402 The file extension <CODE>twjr</CODE> is used by TexiWeb Jr
2403 (<A HREF="https://github.com/arnoldrobbins/texiwebjr">https://github.com/arnoldrobbins/texiwebjr</A>).
2409 <DT>gettext shorthand
2413 <DT>gettext/ngettext functions
2415 <CODE>dcgettext</CODE>, missing <CODE>dcngettext</CODE> in gawk-3.1.0
2419 <CODE>TEXTDOMAIN</CODE> variable
2423 <CODE>bindtextdomain</CODE> function
2427 automatic, but missing <CODE>setlocale (LC_MESSAGES, "")</CODE> in gawk-3.1.0
2433 <DT>Use or emulate GNU gettext
2439 <CODE>xgettext</CODE>
2441 <DT>Formatting with positions
2443 <CODE>printf "%2$d %1$d"</CODE> (GNU awk only)
2447 On platforms without gettext, no translation. On non-GNU awks, you must
2448 define <CODE>dcgettext</CODE>, <CODE>dcngettext</CODE> and <CODE>bindtextdomain</CODE>
2457 An example is available in the <TT>‘examples’</TT> directory: <CODE>hello-gawk</CODE>.
2462 <H3><A NAME="SEC300" HREF="gettext_toc.html#TOC300">15.5.14 Pascal - Free Pascal Compiler</A></H3>
2464 <A NAME="IDX1283"></A>
2465 <A NAME="IDX1284"></A>
2466 <A NAME="IDX1285"></A>
2477 <CODE>pp</CODE>, <CODE>pas</CODE>
2483 <DT>gettext shorthand
2487 <DT>gettext/ngettext functions
2489 ---, use <CODE>ResourceString</CODE> data type instead
2493 ---, use <CODE>TranslateResourceStrings</CODE> function instead
2497 ---, use <CODE>TranslateResourceStrings</CODE> function instead
2501 automatic, but uses only LANG, not LC_MESSAGES or LC_ALL
2505 <CODE>{$mode delphi}</CODE> or <CODE>{$mode objfpc}</CODE><BR><CODE>uses gettext;</CODE>
2507 <DT>Use or emulate GNU gettext
2513 <CODE>ppc386</CODE> followed by <CODE>xgettext</CODE> or <CODE>rstconv</CODE>
2515 <DT>Formatting with positions
2517 <CODE>uses sysutils;</CODE><BR><CODE>format "%1:d %0:d"</CODE>
2529 The Pascal compiler has special support for the <CODE>ResourceString</CODE> data
2530 type. It generates a <CODE>.rst</CODE> file. This is then converted to a
2531 <CODE>.pot</CODE> file by use of <CODE>xgettext</CODE> or <CODE>rstconv</CODE>. At runtime,
2532 a <CODE>.mo</CODE> file corresponding to translations of this <CODE>.pot</CODE> file
2533 can be loaded using the <CODE>TranslateResourceStrings</CODE> function in the
2534 <CODE>gettext</CODE> unit.
2538 An example is available in the <TT>‘examples’</TT> directory: <CODE>hello-pascal</CODE>.
2543 <H3><A NAME="SEC301" HREF="gettext_toc.html#TOC301">15.5.15 wxWidgets library</A></H3>
2545 <A NAME="IDX1286"></A>
2562 <DT>gettext shorthand
2564 <CODE>_("abc")</CODE>
2566 <DT>gettext/ngettext functions
2568 <CODE>wxLocale::GetString</CODE>, <CODE>wxGetTranslation</CODE>
2572 <CODE>wxLocale::AddCatalog</CODE>
2576 <CODE>wxLocale::AddCatalogLookupPathPrefix</CODE>
2580 <CODE>wxLocale::Init</CODE>, <CODE>wxSetLocale</CODE>
2584 <CODE>#include <wx/intl.h></CODE>
2586 <DT>Use or emulate GNU gettext
2588 emulate, see <CODE>include/wx/intl.h</CODE> and <CODE>src/common/intl.cpp</CODE>
2592 <CODE>xgettext</CODE>
2594 <DT>Formatting with positions
2596 wxString::Format supports positions if and only if the system has
2597 <CODE>wprintf()</CODE>, <CODE>vswprintf()</CODE> functions and they support positions
2611 <H3><A NAME="SEC302" HREF="gettext_toc.html#TOC302">15.5.16 YCP - YaST2 scripting language</A></H3>
2613 <A NAME="IDX1287"></A>
2614 <A NAME="IDX1288"></A>
2621 libycp, libycp-devel, yast2-core, yast2-core-devel
2631 <DT>gettext shorthand
2633 <CODE>_("abc")</CODE>
2635 <DT>gettext/ngettext functions
2637 <CODE>_()</CODE> with 1 or 3 arguments
2641 <CODE>textdomain</CODE> statement
2655 <DT>Use or emulate GNU gettext
2661 <CODE>xgettext</CODE>
2663 <DT>Formatting with positions
2665 <CODE>sformat "%2 %1"</CODE>
2677 An example is available in the <TT>‘examples’</TT> directory: <CODE>hello-ycp</CODE>.
2682 <H3><A NAME="SEC303" HREF="gettext_toc.html#TOC303">15.5.17 Tcl - Tk's scripting language</A></H3>
2684 <A NAME="IDX1289"></A>
2685 <A NAME="IDX1290"></A>
2702 <DT>gettext shorthand
2704 <CODE>[_ "abc"]</CODE>
2706 <DT>gettext/ngettext functions
2708 <CODE>::msgcat::mc</CODE>
2716 ---, use <CODE>::msgcat::mcload</CODE> instead
2720 automatic, uses LANG, but ignores LC_MESSAGES and LC_ALL
2724 <CODE>package require msgcat</CODE>
2725 <BR><CODE>proc _ {s} {return [::msgcat::mc $s]}</CODE>
2727 <DT>Use or emulate GNU gettext
2729 ---, uses a Tcl specific message catalog format
2733 <CODE>xgettext -k_</CODE>
2735 <DT>Formatting with positions
2737 <CODE>format "%2\$d %1\$d"</CODE>
2749 Two examples are available in the <TT>‘examples’</TT> directory:
2750 <CODE>hello-tcl</CODE>, <CODE>hello-tcl-tk</CODE>.
2754 Before marking strings as internationalizable, substitutions of variables
2755 into the string need to be converted to <CODE>format</CODE> applications. For
2756 example, <CODE>"file $filename not found"</CODE> becomes
2757 <CODE>[format "file %s not found" $filename]</CODE>.
Only after this is done can the strings be marked and extracted.
2759 After marking, this example becomes
2760 <CODE>[format [_ "file %s not found"] $filename]</CODE> or
2761 <CODE>[msgcat::mc "file %s not found" $filename]</CODE>. Note that the
2762 <CODE>msgcat::mc</CODE> function implicitly calls <CODE>format</CODE> when more than one
2768 <H3><A NAME="SEC304" HREF="gettext_toc.html#TOC304">15.5.18 Perl</A></H3>
2770 <A NAME="IDX1291"></A>
2781 <CODE>pl</CODE>, <CODE>PL</CODE>, <CODE>pm</CODE>, <CODE>perl</CODE>, <CODE>cgi</CODE>
2788 <LI><CODE>"abc"</CODE>
2790 <LI><CODE>'abc'</CODE>
2792 <LI><CODE>qq (abc)</CODE>
2794 <LI><CODE>q (abc)</CODE>
2796 <LI><CODE>qr /abc/</CODE>
2798 <LI><CODE>qx (/bin/date)</CODE>
2800 <LI><CODE>/pattern match/</CODE>
2802 <LI><CODE>?pattern match?</CODE>
2804 <LI><CODE>s/substitution/operators/</CODE>
2806 <LI><CODE>$tied_hash{"message"}</CODE>
2808 <LI><CODE>$tied_hash_reference->{"message"}</CODE>
2810 <LI>etc., issue the command <SAMP>‘man perlsyn’</SAMP> for details
2814 <DT>gettext shorthand
2816 <CODE>__</CODE> (double underscore)
2818 <DT>gettext/ngettext functions
2820 <CODE>gettext</CODE>, <CODE>dgettext</CODE>, <CODE>dcgettext</CODE>, <CODE>ngettext</CODE>,
2821 <CODE>dngettext</CODE>, <CODE>dcngettext</CODE>
2825 <CODE>textdomain</CODE> function
2829 <CODE>bindtextdomain</CODE> function
2831 <DT>bind_textdomain_codeset
2833 <CODE>bind_textdomain_codeset</CODE> function
2837 Use <CODE>setlocale (LC_ALL, "");</CODE>
2841 <CODE>use POSIX;</CODE>
2842 <BR><CODE>use Locale::TextDomain;</CODE> (included in the package libintl-perl
2843 which is available on the Comprehensive Perl Archive Network CPAN,
2844 http://www.cpan.org/).
2846 <DT>Use or emulate GNU gettext
2848 platform dependent: gettext_pp emulates, gettext_xs uses GNU gettext
2852 <CODE>xgettext -k__ -k\$__ -k%__ -k__x -k__n:1,2 -k__nx:1,2 -k__xn:1,2 -kN__ -k</CODE>
2854 <DT>Formatting with positions
2856 Both kinds of format strings support formatting with positions.
2857 <BR><CODE>printf "%2\$d %1\$d", ...</CODE> (requires Perl 5.8.0 or newer)
2858 <BR><CODE>__expand("[new] replaces [old]", old => $oldvalue, new => $newvalue)</CODE>
2862 The <CODE>libintl-perl</CODE> package is platform independent but is not
2863 part of the Perl core. The programmer is responsible for
2864 providing a dummy implementation of the required functions if the
2865 package is not installed on the target system.
2873 Included in <CODE>libintl-perl</CODE>, available on CPAN
2874 (http://www.cpan.org/).
2879 An example is available in the <TT>‘examples’</TT> directory: <CODE>hello-perl</CODE>.
2883 <A NAME="IDX1292"></A>
2887 The <CODE>xgettext</CODE> parser backend for Perl differs significantly from
2888 the parser backends for other programming languages, just as Perl
2889 itself differs significantly from other programming languages. The
2890 Perl parser backend offers many more string marking facilities than
2891 the other backends but it also has some Perl specific limitations, the
worst probably being its imperfection.
2898 <H4><A NAME="SEC305" HREF="gettext_toc.html#TOC305">15.5.18.1 General Problems Parsing Perl Code</A></H4>
It is often heard that only Perl can parse Perl. This is not true.
Perl cannot be <EM>parsed</EM> at all; it can only be <EM>executed</EM>.
2903 Perl has various built-in ambiguities that can only be resolved at runtime.
2907 The following example may illustrate one common problem:
2912 print gettext "Hello World!";
2916 Although this example looks like a bullet-proof case of a function
2917 invocation, it is not:
2922 open gettext, ">testfile" or die;
2923 print gettext "Hello world!"
2927 In this context, the string <CODE>gettext</CODE> looks more like a
2928 file handle. But not necessarily:
2933 use Locale::Messages qw (:libintl_h);
2934 open gettext ">testfile" or die;
2935 print gettext "Hello world!";
2939 Now, the file is probably syntactically incorrect, provided that the module
2940 <CODE>Locale::Messages</CODE> found first in the Perl include path exports a
2941 function <CODE>gettext</CODE>. But what if the module
2942 <CODE>Locale::Messages</CODE> really looks like this?
2947 use vars qw (*gettext);
2953 In this case, the string <CODE>gettext</CODE> will be interpreted as a file
2954 handle again, and the above example will create a file <TT>‘testfile’</TT>
2955 and write the string “Hello world!” into it. Even advanced
2956 control flow analysis will not really help:
2961 if (0.5 < rand) {
2966 print gettext "Hello world!";
2970 If the module <CODE>Sane</CODE> exports a function <CODE>gettext</CODE> that does
2971 what we expect, and the module <CODE>InSane</CODE> opens a file for writing
2972 and associates the <EM>handle</EM> <CODE>gettext</CODE> with this output
2973 stream, we are clueless again about what will happen at runtime. It is
2974 completely unpredictable. The truth is that Perl has so many ways to
2975 fill its symbol table at runtime that it is impossible to interpret a
2976 particular piece of code without executing it.
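<P>
The point about the symbol table can be made concrete with a small sketch.
It installs a function named <CODE>gettext</CODE> at runtime, and only in
some runs; the coin flip and the replacement function are assumptions
made purely for illustration.
</P>
<PRE>
use strict;
use warnings;

# Install a sub named "gettext" only at runtime, and only sometimes.
# The name and the coin flip are illustrative assumptions.
if (rand() > 0.5) {
    no strict 'refs';
    *{"main::gettext"} = sub { return "translated: $_[0]" };
}

# Whether main::gettext exists can only be known by running the code.
if (my $code = main->can('gettext')) {
    print $code->("Hello world!"), "\n";
} else {
    print "no gettext function was defined in this run\n";
}
</PRE>
<P>
A static scanner that reads this file has no way of telling which branch
a given run will take.
</P>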
2980 Of course, <CODE>xgettext</CODE> will not execute your Perl sources while
2981 scanning for translatable strings, but rather use heuristics in order
2982 to guess what you meant.
2986 Another problem is the ambiguity of the slash and the question mark.
2987 Their interpretation depends on the context:
2993 print "OK\n" if /foobar/;
2998 # Another pattern match.
2999 print "OK\n" if ?foobar?;
3002 print $x ? "foo" : "bar";
3006 The slash may either act as the division operator or introduce a
3007 pattern match, whereas the question mark may act as the ternary
3008 conditional operator or as a pattern match, too. Other programming
3009 languages like <CODE>awk</CODE> present similar problems, but the consequences of a
3010 misinterpretation are particularly nasty with Perl sources. In <CODE>awk</CODE>
3011 for instance, a statement can never exceed one line and the parser
3012 can recover from a parsing error at the next newline and interpret
3013 the rest of the input stream correctly. Perl is different, as a
3014 pattern match is terminated by the next appearance of the delimiter
3015 (the slash or the question mark) in the input stream, regardless of
3016 the semantic context. If a slash is really a division sign but
3017 mis-interpreted as a pattern match, the rest of the input file is most
3018 probably parsed incorrectly.
There are certain cases where the ambiguity cannot be resolved at all:
3027 $x = wantarray ? 1 : 0;
3031 The Perl built-in function <CODE>wantarray</CODE> does not accept any arguments.
3032 The Perl parser therefore knows that the question mark does not start
3033 a regular expression but is the ternary conditional operator.
3039 $x = wantarrays ? 1 : 0;
3043 Now the situation is different. The function <CODE>wantarrays</CODE> takes
3044 a variable number of arguments (like any non-prototyped Perl function).
3045 The question mark is now the delimiter of a pattern match, and hence
3046 the piece of code does not compile.
3052 $x = wantarrays ? 1 : 0;
3056 Now the function is prototyped, Perl knows that it does not accept any
3057 arguments, and the question mark is therefore interpreted as the
ternary operator again. But that unfortunately outsmarts <CODE>xgettext</CODE>.
3062 The Perl parser in <CODE>xgettext</CODE> cannot know whether a function has
3063 a prototype and what that prototype would look like. It therefore makes
3064 an educated guess. If a function is known to be a Perl built-in and
3065 this function does not accept any arguments, a following question mark
3066 or slash is treated as an operator, otherwise as the delimiter of a
3067 following regular expression. The Perl built-ins that do not accept
3068 arguments are <CODE>wantarray</CODE>, <CODE>fork</CODE>, <CODE>time</CODE>, <CODE>times</CODE>,
3069 <CODE>getlogin</CODE>, <CODE>getppid</CODE>, <CODE>getpwent</CODE>, <CODE>getgrent</CODE>,
3070 <CODE>gethostent</CODE>, <CODE>getnetent</CODE>, <CODE>getprotoent</CODE>, <CODE>getservent</CODE>,
3071 <CODE>setpwent</CODE>, <CODE>setgrent</CODE>, <CODE>endpwent</CODE>, <CODE>endgrent</CODE>,
3072 <CODE>endhostent</CODE>, <CODE>endnetent</CODE>, <CODE>endprotoent</CODE>, and
3073 <CODE>endservent</CODE>.
3077 If you find that <CODE>xgettext</CODE> fails to extract strings from
3078 portions of your sources, you should therefore look out for slashes
3079 and/or question marks preceding these sections. You may have come
3080 across a bug in <CODE>xgettext</CODE>'s Perl parser (and of course you
should report that bug). In the meantime you should consider
reformulating your code in a manner less challenging to <CODE>xgettext</CODE>.
3086 In particular, if the parser is too dumb to see that a function
3087 does not accept arguments, use parentheses:
3092 $x = somefunc() ? 1 : 0;
3093 $y = (somefunc) ? 1 : 0;
3097 In fact the Perl parser itself has similar problems and warns you
3098 about such constructs.
3103 <H4><A NAME="SEC306" HREF="gettext_toc.html#TOC306">15.5.18.2 Which keywords will xgettext look for?</A></H4>
3105 <A NAME="IDX1293"></A>
3109 Unless you instruct <CODE>xgettext</CODE> otherwise by invoking it with one
3110 of the options <CODE>--keyword</CODE> or <CODE>-k</CODE>, it will recognize the
3111 following keywords in your Perl sources:
3117 <LI><CODE>gettext</CODE>
3119 <LI><CODE>dgettext</CODE>
3121 <LI><CODE>dcgettext</CODE>
3123 <LI><CODE>ngettext:1,2</CODE>
3125 The first (singular) and the second (plural) argument will be
3128 <LI><CODE>dngettext:1,2</CODE>
3130 The first (singular) and the second (plural) argument will be
3133 <LI><CODE>dcngettext:1,2</CODE>
3135 The first (singular) and the second (plural) argument will be
3138 <LI><CODE>gettext_noop</CODE>
3140 <LI><CODE>%gettext</CODE>
3142 The keys of lookups into the hash <CODE>%gettext</CODE> will be extracted.
3144 <LI><CODE>$gettext</CODE>
3146 The keys of lookups into the hash reference <CODE>$gettext</CODE> will be extracted.
3152 <H4><A NAME="SEC307" HREF="gettext_toc.html#TOC307">15.5.18.3 How to Extract Hash Keys</A></H4>
3154 <A NAME="IDX1294"></A>
3158 Translating messages at runtime is normally performed by looking up the
3159 original string in the translation database and returning the
3160 translated version. The “natural” Perl implementation is a hash
3161 lookup, and, of course, <CODE>xgettext</CODE> supports such practice.
3166 print __"Hello world!";
3167 print $__{"Hello world!"};
3168 print $__->{"Hello world!"};
3169 print $$__{"Hello world!"};
3173 The above four lines all do the same thing. The Perl module
3174 <CODE>Locale::TextDomain</CODE> exports by default a hash <CODE>%__</CODE> that
3175 is tied to the function <CODE>__()</CODE>. It also exports a reference
3176 <CODE>$__</CODE> to <CODE>%__</CODE>.
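<P>
As a rough illustration of how a hash can be tied to a lookup function,
the following sketch forwards every read of a tied hash to a translation
routine. It is not the actual <CODE>Locale::TextDomain</CODE> code: the class
<CODE>TranslatingHash</CODE>, the function <CODE>translate</CODE> and the
in-memory <CODE>%catalog</CODE> are assumptions made for this example.
</P>
<PRE>
use strict;
use warnings;

# Hypothetical stand-in for a real message catalog lookup.
my %catalog = ("Hello world!" => "Hallo Welt!");

sub translate {
    my ($msgid) = @_;
    # Fall back to the untranslated msgid when no translation exists.
    return exists $catalog{$msgid} ? $catalog{$msgid} : $msgid;
}

# Minimal tie class: every hash read is forwarded to translate().
package TranslatingHash;
sub TIEHASH { my ($class) = @_; return bless {}, $class }
sub FETCH   { my ($self, $key) = @_; return main::translate($key) }

package main;

tie my %t, 'TranslatingHash';
my $t = \%t;                       # reference to the tied hash

print $t{"Hello world!"}, "\n";    # lookup through the hash
print $t->{"Hello world!"}, "\n";  # lookup through the reference
print $t{"No such entry"}, "\n";   # untranslated strings pass through
</PRE>
<P>
With such a tie in place, a hash lookup behaves like a function call at
runtime, so both spellings return the same translation.
</P>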
If an argument to the <CODE>xgettext</CODE> option <CODE>--keyword</CODE>
or <CODE>-k</CODE> starts with a percent sign, the rest of the keyword is
3182 interpreted as the name of a hash. If it starts with a dollar
3183 sign, the rest of the keyword is interpreted as a reference to a
3188 Note that you can omit the quotation marks (single or double) around
3189 the hash key (almost) whenever Perl itself allows it:
3194 print $gettext{Error};
3198 The exact rule is: You can omit the surrounding quotes, when the hash
3199 key is a valid C (!) identifier, i.e. when it starts with an
3200 underscore or an ASCII letter and is followed by an arbitrary number
3201 of underscores, ASCII letters or digits. Other Unicode characters
3202 are <EM>not</EM> allowed, regardless of the <CODE>use utf8</CODE> pragma.
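<P>
The rule can be restated as a pattern. The helper below is only a sketch
of the rule as worded above, not code taken from <CODE>xgettext</CODE>; the
name <CODE>is_bare_key</CODE> is made up for this example.
</P>
<PRE>
use strict;
use warnings;

# Returns 1 if $key is a valid C identifier made of ASCII characters,
# i.e. if it may be written without quotes according to the rule above.
sub is_bare_key {
    my ($key) = @_;
    return $key =~ /\A[A-Za-z_][A-Za-z0-9_]*\z/ ? 1 : 0;
}

for my $key ("Error", "_private", "file_2", "2nd", "Hello world") {
    printf "%-12s %s\n", $key,
           is_bare_key($key) ? "quotes optional" : "quotes required";
}
</PRE>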
3207 <H4><A NAME="SEC308" HREF="gettext_toc.html#TOC308">15.5.18.4 What are Strings And Quote-like Expressions?</A></H4>
3209 <A NAME="IDX1295"></A>
3213 Perl offers a plethora of different string constructs. Those that can
3214 be used either as arguments to functions or inside braces for hash
3215 lookups are generally supported by <CODE>xgettext</CODE>.
3220 <LI><STRONG>double-quoted strings</STRONG>
3225 print gettext "Hello World!";
3228 <LI><STRONG>single-quoted strings</STRONG>
3233 print gettext 'Hello World!';
3236 <LI><STRONG>the operator qq</STRONG>
3241 print gettext qq |Hello World!|;
3242 print gettext qq <E-mail: <guido\@imperia.net>>;
3245 The operator <CODE>qq</CODE> is fully supported. You can use arbitrary
3246 delimiters, including the four bracketing delimiters (round, angle,
3247 square, curly) that nest.
3249 <LI><STRONG>the operator q</STRONG>
3254 print gettext q |Hello World!|;
3255 print gettext q <E-mail: <guido@imperia.net>>;
3258 The operator <CODE>q</CODE> is fully supported. You can use arbitrary
3259 delimiters, including the four bracketing delimiters (round, angle,
3260 square, curly) that nest.
3262 <LI><STRONG>the operator qx</STRONG>
3267 print gettext qx ;LANGUAGE=C /bin/date;
3268 print gettext qx [/usr/bin/ls | grep '^[A-Z]*'];
3271 The operator <CODE>qx</CODE> is fully supported. You can use arbitrary
3272 delimiters, including the four bracketing delimiters (round, angle,
3273 square, curly) that nest.
3275 The example is actually a useless use of <CODE>gettext</CODE>. It will
3276 invoke the <CODE>gettext</CODE> function on the output of the command
3277 specified with the <CODE>qx</CODE> operator. The feature was included
3278 in order to make the interface consistent (the parser will extract
3279 all strings and quote-like expressions).
3281 <LI><STRONG>here documents</STRONG>
3286 print gettext <<'EOF';
3287 program not found in $PATH
3290 print ngettext <<EOF, <<"EOF";
3293 several files deleted
3297 Here-documents are recognized. If the delimiter is enclosed in single
3298 quotes, the string is not interpolated. If it is enclosed in double
3299 quotes or has no quotes at all, the string is interpolated.
3301 Delimiters that start with a digit are not supported!
3307 <H4><A NAME="SEC309" HREF="gettext_toc.html#TOC309">15.5.18.5 Invalid Uses Of String Interpolation</A></H4>
3309 <A NAME="IDX1296"></A>
3313 Perl is capable of interpolating variables into strings. This offers
3314 some nice features in localized programs but can also lead to
3319 A common error is a construct like the following:
3324 print gettext "This is the program $0!\n";
3328 Perl will interpolate at runtime the value of the variable <CODE>$0</CODE>
3329 into the argument of the <CODE>gettext()</CODE> function. Hence, this
3330 argument is not a string constant but a variable argument (<CODE>$0</CODE>
3331 is a global variable that holds the name of the Perl script being
3332 executed). The interpolation is performed by Perl before the string
3333 argument is passed to <CODE>gettext()</CODE> and will therefore depend on
3334 the name of the script which can only be determined at runtime.
3335 Consequently, it is almost impossible that a translation can be looked
3336 up at runtime (except if, by accident, the interpolated string is found
3337 in the message catalog).
3341 The <CODE>xgettext</CODE> program will therefore terminate parsing with a fatal
3342 error if it encounters a variable inside of an extracted string. In
3343 general, this will happen for all kinds of string interpolations that
3344 cannot be safely performed at compile time. If you absolutely know
3345 what you are doing, you can always circumvent this behavior:
3350 my $know_what_i_am_doing = "This is program $0!\n";
3351 print gettext $know_what_i_am_doing;
3355 Since the parser only recognizes strings and quote-like expressions,
3356 but not variables or other terms, the above construct will be
3357 accepted. You will have to find another way, however, to let your
3358 original string make it into your message catalog.
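<P>
In many cases the variable can instead be moved out of the message, so
that the argument becomes a constant string again. The sketch below
assumes a stub <CODE>gettext</CODE> function so that it runs on its own; a
real program would import the function from its gettext binding.
</P>
<PRE>
use strict;
use warnings;

# Hypothetical stub so that the sketch runs without libintl-perl.
sub gettext { my ($msgid) = @_; return $msgid }

# The msgid is a constant, so it can be extracted and translated;
# the script name is filled in only after the lookup.
printf(gettext("This is the program %s!\n"), $0);
</PRE>
<P>
The translator then sees the message with a <CODE>%s</CODE> placeholder
and can move it to wherever the target language needs it.
</P>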
If <CODE>xgettext</CODE> is invoked with the option <CODE>--extract-all</CODE>
or <CODE>-a</CODE>, variable interpolation will be accepted. Rationale: You will
3364 generally use this option in order to prepare your sources for
3365 internationalization.
3369 Please see the manual page <SAMP>‘man perlop’</SAMP> for details of strings and
3370 quote-like expressions that are subject to interpolation and those
3371 that are not. Safe interpolations (that will not lead to a fatal
3378 <LI>the escape sequences <CODE>\t</CODE> (tab, HT, TAB), <CODE>\n</CODE>
3380 (newline, NL), <CODE>\r</CODE> (return, CR), <CODE>\f</CODE> (form feed, FF),
3381 <CODE>\b</CODE> (backspace, BS), <CODE>\a</CODE> (alarm, bell, BEL), and <CODE>\e</CODE>
3384 <LI>octal chars, like <CODE>\033</CODE>
3387 Note that octal escapes in the range of 400-777 are translated into a
3388 UTF-8 representation, regardless of the presence of the <CODE>use utf8</CODE> pragma.
3390 <LI>hex chars, like <CODE>\x1b</CODE>
3392 <LI>wide hex chars, like <CODE>\x{263a}</CODE>
3395 Note that this escape is translated into a UTF-8 representation,
3396 regardless of the presence of the <CODE>use utf8</CODE> pragma.
3398 <LI>control chars, like <CODE>\c[</CODE> (CTRL-[)
3400 <LI>named Unicode chars, like <CODE>\N{LATIN CAPITAL LETTER C WITH CEDILLA}</CODE>
3403 Note that this escape is translated into a UTF-8 representation,
3404 regardless of the presence of the <CODE>use utf8</CODE> pragma.
3408 The following escapes are considered partially safe:
3414 <LI><CODE>\l</CODE> lowercase next char
3416 <LI><CODE>\u</CODE> uppercase next char
3418 <LI><CODE>\L</CODE> lowercase till \E
3420 <LI><CODE>\U</CODE> uppercase till \E
3422 <LI><CODE>\E</CODE> end case modification
3424 <LI><CODE>\Q</CODE> quote non-word characters till \E
3429 These escapes are only considered safe if the string consists of
3430 ASCII characters only. Translation of characters outside the range
3431 defined by ASCII is locale-dependent and can actually only be performed
3432 at runtime; <CODE>xgettext</CODE> doesn't do these locale-dependent translations
3437 Except for the modifier <CODE>\Q</CODE>, these translations, albeit valid,
3438 are generally useless and only obfuscate your sources. If a
3439 translation can be safely performed at compile time you can just as
3440 well write what you mean.
3445 <H4><A NAME="SEC310" HREF="gettext_toc.html#TOC310">15.5.18.6 Valid Uses Of String Interpolation</A></H4>
3447 <A NAME="IDX1297"></A>
3451 Perl is often used to generate sources for other programming languages
3452 or arbitrary file formats. Web applications that output HTML code
are a prominent example of such usage.
3457 You will often come across situations where you want to intersperse
3458 code written in the target (programming) language with translatable
3459 messages, like in the following HTML example:
3464 print gettext <<EOF;
3465 <h1>My Homepage</h1>
3466 <script language="JavaScript"><!--
3467 for (i = 0; i < 100; ++i) {
3468 alert ("Thank you so much for visiting my homepage!");
3470 //--></script>
3475 The parser will extract the entire here document, and it will appear
3476 entirely in the resulting PO file, including the JavaScript snippet
embedded in the HTML code. If you overuse constructs like
the above, you run the risk that the translators of your package
will look for a less challenging project. You should consider an
alternative formulation here:
3485 print <<EOF;
3486 <h1>$gettext{"My Homepage"}</h1>
3487 <script language="JavaScript"><!--
3488 for (i = 0; i < 100; ++i) {
3489 alert ("$gettext{'Thank you so much for visiting my homepage!'}");
3491 //--></script>
3496 Only the translatable portions of the code will be extracted here, and
3497 the resulting PO file will begrudgingly improve in terms of readability.
3501 You can interpolate hash lookups in all strings or quote-like
3502 expressions that are subject to interpolation (see the manual page
3503 <SAMP>‘man perlop’</SAMP> for details). Double interpolation is invalid, however:
3508 # TRANSLATORS: Replace "the earth" with the name of your planet.
3509 print gettext qq{Welcome to $gettext->{"the earth"}};
The <CODE>qq</CODE>-quoted string is recognized as an argument to <CODE>gettext</CODE> in
3514 the first place, and checked for invalid variable interpolation. The
3515 dollar sign of hash-dereferencing will therefore terminate the parser
3516 with an “invalid interpolation” error.
3520 It is valid to interpolate hash lookups in regular expressions:
3525 if ($var =~ /$gettext{"the earth"}/) {
3526 print gettext "Match!\n";
3528 s/$gettext{"U. S. A."}/$gettext{"U. S. A."} $gettext{"(dial +0)"}/g;
3533 <H4><A NAME="SEC311" HREF="gettext_toc.html#TOC311">15.5.18.7 When To Use Parentheses</A></H4>
3535 <A NAME="IDX1298"></A>
3539 In Perl, parentheses around function arguments are mostly optional.
3540 <CODE>xgettext</CODE> will always assume that all
3541 recognized keywords (except for hashes and hash references) are names
3542 of properly prototyped functions, and will (hopefully) only require
3543 parentheses where Perl itself requires them. All constructs in the
3544 following example are therefore ok to use:
3549 print gettext ("Hello World!\n");
3550 print gettext "Hello World!\n";
3551 print dgettext ($package => "Hello World!\n");
3552 print dgettext $package, "Hello World!\n";
3554 # The "fat comma" => turns the left-hand side argument into a
3555 # single-quoted string!
3556 print dgettext smellovision => "Hello World!\n";
3558 # The following assignment only works with prototyped functions.
3559 # Otherwise, the functions will act as "greedy" list operators and
3560 # eat up all following arguments.
3561 my $anonymous_hash = {
3562 planet => gettext "earth",
3563 cakes => ngettext "one cake", "several cakes", $n,
3564 still => $works,
3566 # The same without fat comma:
3568 'planet', gettext "earth",
3569 'cakes', ngettext "one cake", "several cakes", $n,
3573 # Parentheses are only significant for the first argument.
3574 print dngettext 'package', ("one cake", "several cakes", $n), $discarded;
3579 <H4><A NAME="SEC312" HREF="gettext_toc.html#TOC312">15.5.18.8 How To Grok with Long Lines</A></H4>
3581 <A NAME="IDX1299"></A>
3585 The necessity of long messages can often lead to a cumbersome or
3586 unreadable coding style. Perl has several options that may prevent
3587 you from writing unreadable code, and
3588 <CODE>xgettext</CODE> does its best to do likewise. This is where the dot
3589 operator (the string concatenation operator) may come in handy:
3594 print gettext ("This is a very long"
3595 . " message that is still"
3596 . " readable, because"
3597 . " it is split into"
3598 . " multiple lines.\n");
3602 Perl is smart enough to concatenate these constant string fragments
3603 into one long string at compile time, and so is
3604 <CODE>xgettext</CODE>. You will only find one long message in the resulting
Note that Perl 6 uses the tilde
(<SAMP>‘~’</SAMP>) as the string concatenation operator, and the dot
(<SAMP>‘.’</SAMP>) for method calls. This syntax is not yet supported by
<CODE>xgettext</CODE>.
3616 If embedded newline characters are not an issue, or even desired, you
3617 may also insert newline characters inside quoted strings wherever you
3623 print gettext ("<em>In HTML output
3624 embedded newlines are generally no
3625 problem, since adjacent whitespace
3626 is always rendered into a single
3627 space character.</em>");
You may also consider using here documents:
3636 print gettext <<EOF;
3637 <em>In HTML output
3638 embedded newlines are generally no
3639 problem, since adjacent whitespace
3640 is always rendered into a single
3641 space character.</em>
3646 Please do not forget that the line breaks are real, i.e. they
3647 translate into newline characters that will consequently show up in
3648 the resulting POT file.
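<P>
The difference between wrapped and concatenated strings is easy to check.
The following sketch, with made-up message texts, counts the newline
characters in a string that spans two source lines and in one that is
joined with the dot operator.
</P>
<PRE>
use strict;
use warnings;

# A quoted string that spans two source lines contains a real newline.
my $wrapped = "embedded newlines are
part of the string";

# Fragments joined with the dot operator contain none.
my $joined = "embedded newlines are"
           . " not added by concatenation";

# tr/\n// counts newline characters without modifying the string.
printf "wrapped: %d newline(s)\n", ($wrapped =~ tr/\n//);
printf "joined:  %d newline(s)\n", ($joined  =~ tr/\n//);
</PRE>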
3653 <H4><A NAME="SEC313" HREF="gettext_toc.html#TOC313">15.5.18.9 Bugs, Pitfalls, And Things That Do Not Work</A></H4>
3655 <A NAME="IDX1300"></A>
3659 The foregoing sections should have proven that
3660 <CODE>xgettext</CODE> is quite smart in extracting translatable strings from
Perl sources. Yet some more or less exotic constructs that could be
expected to work actually do not.
3666 One of the more relevant limitations can be found in the
3667 implementation of variable interpolation inside quoted strings. Only
3668 simple hash lookups can be used there:
3673 print <<EOF;
3674 $gettext{"The dot operator"
3677 Likewise, you cannot @{[ gettext ("interpolate function calls") ]}
3678 inside quoted strings or quote-like expressions.
3683 This is valid Perl code and will actually trigger invocations of the
3684 <CODE>gettext</CODE> function at runtime. Yet, the Perl parser in
3685 <CODE>xgettext</CODE> will fail to recognize the strings. A less obvious
3686 example can be found in the interpolation of regular expressions:
3691 s/<!--START_OF_WEEK-->/gettext ("Sunday")/e;
3695 The modifier <CODE>e</CODE> will cause the substitution to be interpreted as
3696 an evaluable statement. Consequently, at runtime the function
3697 <CODE>gettext()</CODE> is called, but again, the parser fails to extract the
3698 string “Sunday”. Use a temporary variable as a simple workaround if
3699 you really happen to need this feature:
3704 my $sunday = gettext "Sunday";
3705 s/<!--START_OF_WEEK-->/$sunday/;
3709 Hash slices would also be handy but are not recognized:
3714 my @weekdays = @gettext{'Sunday', 'Monday', 'Tuesday', 'Wednesday',
3715 'Thursday', 'Friday', 'Saturday'};
3717 @weekdays = @gettext{qw (Sunday Monday Tuesday Wednesday Thursday
3722 This is perfectly valid usage of the tied hash <CODE>%gettext</CODE> but the
3723 strings are not recognized and therefore will not be extracted.
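<P>
One way around this limitation is to spell out each lookup, so that every
string literal directly follows a recognized keyword. The sketch assumes a
stub <CODE>gettext</CODE> function in place of the real one.
</P>
<PRE>
use strict;
use warnings;

# Hypothetical stub standing in for the real gettext function.
sub gettext { my ($msgid) = @_; return $msgid }

# One explicit call per string instead of a hash slice.
my @weekdays = (
    gettext("Sunday"),    gettext("Monday"),   gettext("Tuesday"),
    gettext("Wednesday"), gettext("Thursday"), gettext("Friday"),
    gettext("Saturday"),
);

print join(", ", @weekdays), "\n";
</PRE>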
3727 Another caveat of the current version is its rudimentary support for
3728 non-ASCII characters in identifiers. You may encounter serious
3729 problems if you use identifiers with characters outside the range of
3730 'A'-'Z', 'a'-'z', '0'-'9' and the underscore '_'.
3734 Maybe some of these missing features will be implemented in future
3735 versions, but since you can always make do without them at minimal effort,
3736 these todos have very low priority.
Brace format strings that already contain braces as part of the normal
text are a nasty problem, for example the usage strings typically
encountered in programs:
3747 die "usage: $0 {OPTIONS} FILENAME...\n";
3751 If you want to internationalize this code with Perl brace format strings,
3752 you will run into a problem:
3757 die __x ("usage: {program} {OPTIONS} FILENAME...\n", program => $0);
3761 Whereas <SAMP>‘{program}’</SAMP> is a placeholder, <SAMP>‘{OPTIONS}’</SAMP>
3762 is not and should probably be translated. Yet, there is no way to teach
3763 the Perl parser in <CODE>xgettext</CODE> to recognize the first one, and leave
3764 the other one alone.
There are two possible workarounds for this problem. If you are
sure that your program will run under Perl 5.8.0 or newer (these
Perl versions handle positional parameters in <CODE>printf()</CODE>), or
if you are sure that the translator will not have to reorder the arguments
in her translation (for example, if you have only one brace placeholder
in your string, or if it describes a syntax, as in this one), you can
mark the string as <CODE>no-perl-brace-format</CODE> and use <CODE>printf()</CODE>
or <CODE>sprintf()</CODE>:
3779 # xgettext: no-perl-brace-format
die sprintf (__ ("usage: %s {OPTIONS} FILENAME...\n"), $0);
If you want to use the more portable Perl brace format, you will have to
put placeholders in place of the literal braces:
3790 die __x ("usage: {program} {[}OPTIONS{]} FILENAME...\n",
3791 program => $0, '[' => '{', ']' => '}');
Perl brace format strings have no escaping mechanism. Whatever such an
escaping mechanism looked like, it would either give the programmer a
hard time, make translating Perl brace format strings heavy-going, or
result in a performance penalty at runtime, when the format directives
get executed. Most of the time you will happily get along with
<CODE>printf()</CODE> for this special case.
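<P>
The placeholder trick shown above can be modelled in a few lines. The
following Python sketch is only an illustration, not the implementation
used by any Perl module: the function name <CODE>expand_braces</CODE> is
hypothetical, and it assumes that a brace group whose name has no supplied
value is left untouched.
</P>

```python
import re


def expand_braces(template, values):
    """Replace each {name} found in `values`; leave every other brace group alone.

    Hypothetical helper for illustration only (assumed semantics, see lead-in).
    """
    def substitute(match):
        name = match.group(1)
        # Unknown names are returned unchanged, braces included.
        return str(values[name]) if name in values else match.group(0)

    return re.sub(r"\{([^{}]+)\}", substitute, template)


# A literal {OPTIONS} survives only because no value named OPTIONS is supplied.
plain = expand_braces("usage: {program} {OPTIONS} FILENAME...",
                      {"program": "prog"})

# The workaround: '{[}' and '{]}' are ordinary placeholders that expand
# to the literal brace characters.
escaped = expand_braces("usage: {program} {[}OPTIONS{]} FILENAME...",
                        {"program": "prog", "[": "{", "]": "}"})
```

<P>
In this model both calls yield the same text, but only the second string
consists entirely of placeholders that the caller supplies explicitly.
</P>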
3805 <H3><A NAME="SEC314" HREF="gettext_toc.html#TOC314">15.5.19 PHP Hypertext Preprocessor</A></H3>
3807 <A NAME="IDX1301"></A>
3814 mod_php4, mod_php4-core, phpdoc
3818 <CODE>php</CODE>, <CODE>php3</CODE>, <CODE>php4</CODE>
3822 <CODE>"abc"</CODE>, <CODE>'abc'</CODE>
3824 <DT>gettext shorthand
3826 <CODE>_("abc")</CODE>
3828 <DT>gettext/ngettext functions
3830 <CODE>gettext</CODE>, <CODE>dgettext</CODE>, <CODE>dcgettext</CODE>; starting with PHP 4.2.0
3831 also <CODE>ngettext</CODE>, <CODE>dngettext</CODE>, <CODE>dcngettext</CODE>
3835 <CODE>textdomain</CODE> function
3839 <CODE>bindtextdomain</CODE> function
3843 Programmer must call <CODE>setlocale (LC_ALL, "")</CODE>
3849 <DT>Use or emulate GNU gettext
3855 <CODE>xgettext</CODE>
3857 <DT>Formatting with positions
3859 <CODE>printf "%2\$d %1\$d"</CODE>
3863 On platforms without gettext, the functions are not available.
3871 An example is available in the <TT>‘examples’</TT> directory: <CODE>hello-php</CODE>.
3876 <H3><A NAME="SEC315" HREF="gettext_toc.html#TOC315">15.5.20 Pike</A></H3>
3878 <A NAME="IDX1302"></A>
3895 <DT>gettext shorthand
3899 <DT>gettext/ngettext functions
3901 <CODE>gettext</CODE>, <CODE>dgettext</CODE>, <CODE>dcgettext</CODE>
3905 <CODE>textdomain</CODE> function
3909 <CODE>bindtextdomain</CODE> function
3913 <CODE>setlocale</CODE> function
3917 <CODE>import Locale.Gettext;</CODE>
3919 <DT>Use or emulate GNU gettext
3927 <DT>Formatting with positions
3933 On platforms without gettext, the functions are not available.
3942 <H3><A NAME="SEC316" HREF="gettext_toc.html#TOC316">15.5.21 GNU Compiler Collection sources</A></H3>
3944 <A NAME="IDX1303"></A>
3955 <CODE>c</CODE>, <CODE>h</CODE>.
3961 <DT>gettext shorthand
3963 <CODE>_("abc")</CODE>
3965 <DT>gettext/ngettext functions
3967 <CODE>gettext</CODE>, <CODE>dgettext</CODE>, <CODE>dcgettext</CODE>, <CODE>ngettext</CODE>,
3968 <CODE>dngettext</CODE>, <CODE>dcngettext</CODE>
3972 <CODE>textdomain</CODE> function
3976 <CODE>bindtextdomain</CODE> function
3980 Programmer must call <CODE>setlocale (LC_ALL, "")</CODE>
3984 <CODE>#include "intl.h"</CODE>
3986 <DT>Use or emulate GNU gettext
3992 <CODE>xgettext -k_</CODE>
3994 <DT>Formatting with positions
4000 Uses autoconf macros
4009 <H3><A NAME="SEC317" HREF="gettext_toc.html#TOC317">15.5.22 Lua</A></H3>
4026 <LI><CODE>"abc"</CODE>
4028 <LI><CODE>'abc'</CODE>
4030 <LI><CODE>[[abc]]</CODE>
4032 <LI><CODE>[=[abc]=]</CODE>
4034 <LI><CODE>[==[abc]==]</CODE>
4040 <DT>gettext shorthand
4042 <CODE>_("abc")</CODE>
4044 <DT>gettext/ngettext functions
4046 <CODE>gettext.gettext</CODE>, <CODE>gettext.dgettext</CODE>, <CODE>gettext.dcgettext</CODE>,
4047 <CODE>gettext.ngettext</CODE>, <CODE>gettext.dngettext</CODE>, <CODE>gettext.dcngettext</CODE>
4051 <CODE>textdomain</CODE> function
4055 <CODE>bindtextdomain</CODE> function
4063 <CODE>require 'gettext'</CODE> or running lua interpreter with <CODE>-l gettext</CODE> option
4065 <DT>Use or emulate GNU gettext
4071 <CODE>xgettext</CODE>
4073 <DT>Formatting with positions
4079 On platforms without gettext, the functions are not available.
4088 <H3><A NAME="SEC318" HREF="gettext_toc.html#TOC318">15.5.23 JavaScript</A></H3>
4105 <LI><CODE>"abc"</CODE>
4107 <LI><CODE>'abc'</CODE>
4111 <DT>gettext shorthand
4113 <CODE>_("abc")</CODE>
4115 <DT>gettext/ngettext functions
4117 <CODE>gettext</CODE>, <CODE>dgettext</CODE>, <CODE>dcgettext</CODE>, <CODE>ngettext</CODE>,
4118 <CODE>dngettext</CODE>
4122 <CODE>textdomain</CODE> function
4126 <CODE>bindtextdomain</CODE> function
4136 <DT>Use or emulate GNU gettext
4142 <CODE>xgettext</CODE>
4144 <DT>Formatting with positions
4150 On platforms without gettext, the functions are not available.
4159 <H3><A NAME="SEC319" HREF="gettext_toc.html#TOC319">15.5.24 Vala</A></H3>
4176 <LI><CODE>"abc"</CODE>
4178 <LI><CODE>"""abc"""</CODE>
4182 <DT>gettext shorthand
4184 <CODE>_("abc")</CODE>
4186 <DT>gettext/ngettext functions
4188 <CODE>gettext</CODE>, <CODE>dgettext</CODE>, <CODE>dcgettext</CODE>, <CODE>ngettext</CODE>,
4189 <CODE>dngettext</CODE>, <CODE>dpgettext</CODE>, <CODE>dpgettext2</CODE>
4193 <CODE>textdomain</CODE> function, defined under the <CODE>Intl</CODE> namespace
4197 <CODE>bindtextdomain</CODE> function, defined under the <CODE>Intl</CODE> namespace
4201 Programmer must call <CODE>Intl.setlocale (LocaleCategory.ALL, "")</CODE>
4207 <DT>Use or emulate GNU gettext
4213 <CODE>xgettext</CODE>
4215 <DT>Formatting with positions
4217 Same as for the C language.
4221 autoconf (gettext.m4) and #if ENABLE_NLS
4230 <H2><A NAME="SEC320" HREF="gettext_toc.html#TOC320">15.6 Internationalizable Data</A></H2>
4233 Here is a list of other data formats which can be internationalized
4240 <H3><A NAME="SEC321" HREF="gettext_toc.html#TOC321">15.6.1 POT - Portable Object Template</A></H3>
4250 <CODE>pot</CODE>, <CODE>po</CODE>
4254 <CODE>xgettext</CODE>
4259 <H3><A NAME="SEC322" HREF="gettext_toc.html#TOC322">15.6.2 Resource String Table</A></H3>
4261 <A NAME="IDX1304"></A>
4276 <CODE>xgettext</CODE>, <CODE>rstconv</CODE>
4281 <H3><A NAME="SEC323" HREF="gettext_toc.html#TOC323">15.6.3 Glade - GNOME user interface description</A></H3>
4287 glade, libglade, glade2, libglade2, intltool
4291 <CODE>glade</CODE>, <CODE>glade2</CODE>, <CODE>ui</CODE>
4295 <CODE>xgettext</CODE>, <CODE>libglade-xgettext</CODE>, <CODE>xml-i18n-extract</CODE>, <CODE>intltool-extract</CODE>
4300 <H3><A NAME="SEC324" HREF="gettext_toc.html#TOC324">15.6.4 GSettings - GNOME user configuration schema</A></H3>
4310 <CODE>gschema.xml</CODE>
4314 <CODE>xgettext</CODE>, <CODE>intltool-extract</CODE>
4319 <H3><A NAME="SEC325" HREF="gettext_toc.html#TOC325">15.6.5 AppData - freedesktop.org application description</A></H3>
4325 appdata-tools, appstream, libappstream-glib, libappstream-glib-builder
4329 <CODE>appdata.xml</CODE>
4333 <CODE>xgettext</CODE>, <CODE>intltool-extract</CODE>, <CODE>itstool</CODE>
4338 <H3><A NAME="SEC326" HREF="gettext_toc.html#TOC326">15.6.6 Preparing Rules for XML Internationalization</A></H3>
4340 <A NAME="IDX1305"></A>
4344 Marking translatable strings in an XML file is done through a separate
4345 "rule" file, making use of the Internationalization Tag Set standard
4346 (ITS, <A HREF="http://www.w3.org/TR/its20/">http://www.w3.org/TR/its20/</A>). The currently supported ITS
4347 data categories are: <SAMP>‘Translate’</SAMP>, <SAMP>‘Localization Note’</SAMP>,
4348 <SAMP>‘Elements Within Text’</SAMP>, and <SAMP>‘Preserve Space’</SAMP>. In addition to
4349 them, <CODE>xgettext</CODE> also recognizes the following extended data
4355 <DT><SAMP>‘Context’</SAMP>
This data category associates a <CODE>msgctxt</CODE> with the extracted text. In
the global rule, the <CODE>contextRule</CODE> element contains the following:
4364 A required <CODE>selector</CODE> attribute. It contains an absolute selector
4365 that selects the nodes to which this rule applies.
4369 A required <CODE>contextPointer</CODE> attribute that contains a relative
4370 selector pointing to a node that holds the <CODE>msgctxt</CODE> value.
4374 An optional <CODE>textPointer</CODE> attribute that contains a relative
4375 selector pointing to a node that holds the <CODE>msgid</CODE> value.
4378 <DT><SAMP>‘Escape Special Characters’</SAMP>
This data category indicates whether the special XML characters
(<CODE>&lt;</CODE>, <CODE>&gt;</CODE>, <CODE>&amp;</CODE>, <CODE>&quot;</CODE>) are escaped with entity
references. In the global rule, the <CODE>escapeRule</CODE> element contains
4389 A required <CODE>selector</CODE> attribute. It contains an absolute selector
4390 that selects the nodes to which this rule applies.
4394 A required <CODE>escape</CODE> attribute with the value <CODE>yes</CODE> or <CODE>no</CODE>.
4397 <DT><SAMP>‘Extended Preserve Space’</SAMP>
4399 This data category extends the standard <SAMP>‘Preserve Space’</SAMP> data
category with the additional value <SAMP>‘trim’</SAMP>. This value removes
leading and trailing whitespace from the content, but does not
normalize whitespace in the middle. In the global rule, the
4403 <CODE>preserveSpaceRule</CODE> element contains the following:
4409 A required <CODE>selector</CODE> attribute. It contains an absolute selector
4410 that selects the nodes to which this rule applies.
4414 A required <CODE>space</CODE> attribute with the value <CODE>default</CODE>,
4415 <CODE>preserve</CODE>, or <CODE>trim</CODE>.
All of these extended data categories can only be expressed with global
rules, and the rule elements must be in the
<CODE>https://www.gnu.org/s/gettext/ns/its/extensions/1.0</CODE> namespace.
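<P>
The difference between the three values of the <CODE>space</CODE> attribute
can be sketched as follows. This Python fragment is an illustration only:
the function name <CODE>apply_space</CODE> is hypothetical, and the handling
of <CODE>default</CODE> (collapsing every run of whitespace to one space)
is an assumption about what "normalize" means, not a description of
<CODE>xgettext</CODE>'s exact behaviour.
</P>

```python
def apply_space(text, mode):
    """Return `text` roughly as it might be extracted under a space mode."""
    if mode == "preserve":
        return text                      # keep every whitespace character
    if mode == "trim":
        return text.strip()              # drop only leading/trailing whitespace
    if mode == "default":
        return " ".join(text.split())    # assumption: collapse whitespace runs
    raise ValueError(f"unknown space mode: {mode!r}")


sample = "  Hello,\n   world!  "
trimmed = apply_space(sample, "trim")
normalized = apply_space(sample, "default")
```

<P>
With <CODE>trim</CODE>, the line break and indentation in the middle of the
sample survive; only the outer whitespace is removed.
</P>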
4427 Given the following XML document in a file <TT>‘messages.xml’</TT>:
4432 <?xml version="1.0"?>
4435 <p>A translatable string</p>
4438 <p translatable="no">A non-translatable string</p>
4444 To extract the first text content ("A translatable string"), but not the
4445 second ("A non-translatable string"), the following ITS rules can be used:
4450 <?xml version="1.0"?>
4451 <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
4452 <its:translateRule selector="/messages" translate="no"/>
4453 <its:translateRule selector="//message/p" translate="yes"/>
4455 <!-- If 'p' has an attribute 'translatable' with the value 'no', then
4456 the content is not translatable. -->
4457 <its:translateRule selector="//message/p[@translatable = 'no']"
4458 translate="no"/>
4459 </its:rules>
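<P>
How the three rules interact can be sketched in Python. This is a
simplified model, not how <CODE>xgettext</CODE> evaluates ITS rules: the
sample document is an assumption reconstructed from the selectors, the
selectors are hand-translated into the XPath subset that
<CODE>xml.etree.ElementTree</CODE> understands, and inheritance of the
<CODE>translate</CODE> flag to child elements is ignored. The only point
shown is that a later rule overrides an earlier one for the same node.
</P>

```python
import xml.etree.ElementTree as ET

# Assumed sample document, shaped to fit the selectors in the rules above.
DOCUMENT = """<messages>
  <message><p>A translatable string</p></message>
  <message><p translatable="no">A non-translatable string</p></message>
</messages>"""

# (ElementTree path relative to the root element, translate flag),
# listed in the same order as the rules in the ITS file.
RULES = [
    (".", False),                                 # selector="/messages"
    (".//message/p", True),                       # selector="//message/p"
    (".//message/p[@translatable='no']", False),  # the attribute test
]


def translatable_texts(xml_text, rules):
    """Collect the text of elements whose last matching rule says 'yes'."""
    root = ET.fromstring(xml_text)
    decision = {}
    for path, flag in rules:
        for node in root.findall(path):
            decision[node] = flag  # a later rule overrides an earlier one
    return [node.text for node in root.iter()
            if decision.get(node) and node.text and node.text.strip()]


extracted = translatable_texts(DOCUMENT, RULES)
```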
<SAMP>‘xgettext’</SAMP> needs another file, called a "locating rule" file, to associate
an ITS rule file with an XML file. If the above ITS file is saved as
<TT>‘messages.its’</TT>, the locating rule file would look like:
4470 <?xml version="1.0"?>
4471 <locatingRules>
4472 <locatingRule name="Messages" pattern="*.xml">
4473 <documentRule localName="messages" target="messages.its"/>
4474 </locatingRule>
4475 <locatingRule name="Messages" pattern="*.msg" target="messages.its"/>
4476 </locatingRules>
4480 The <CODE>locatingRule</CODE> element must have a <CODE>pattern</CODE> attribute,
4481 which denotes either a literal file name or a wildcard pattern of the
XML file. The <CODE>locatingRule</CODE> element can have a child
4483 <CODE>documentRule</CODE> element, which adds checks on the content of the XML
4488 The first rule matches any file with the <TT>‘.xml’</TT> file extension, but
it only applies to XML files whose root element is <SAMP>‘&lt;messages&gt;’</SAMP>.
The second rule indicates that the same ITS rule file is also
applicable to any file with the <TT>‘.msg’</TT> file extension. The
optional <CODE>name</CODE> attribute of <CODE>locatingRule</CODE> allows rules to be
chosen by name, typically with <CODE>xgettext</CODE>'s <CODE>-L</CODE> option.
4500 The associated ITS rule file is indicated by the <CODE>target</CODE> attribute
4501 of <CODE>locatingRule</CODE> or <CODE>documentRule</CODE>. If it is specified in a
4502 <CODE>documentRule</CODE> element, the parent <CODE>locatingRule</CODE> shouldn't
4503 have the <CODE>target</CODE> attribute.
4507 Locating rule files must have the <TT>‘.loc’</TT> file extension. Both ITS
4508 rule files and locating rule files must be installed in the
4509 <TT>‘$prefix/share/gettext/its’</TT> directory. Once those files are
4510 properly installed, <CODE>xgettext</CODE> can extract translatable strings
4511 from the matching XML files.
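<P>
The lookup described above can be approximated in a short Python sketch.
It is a rough model under stated assumptions, not <CODE>xgettext</CODE>'s
actual algorithm: the function name <CODE>find_its_target</CODE> is
hypothetical, patterns are matched with shell-style wildcards against the
bare file name, and only the <CODE>localName</CODE> check of
<CODE>documentRule</CODE> is modelled.
</P>

```python
import fnmatch
import xml.etree.ElementTree as ET

# The locating rule file from the example above.
LOCATING_RULES = """<locatingRules>
  <locatingRule name="Messages" pattern="*.xml">
    <documentRule localName="messages" target="messages.its"/>
  </locatingRule>
  <locatingRule name="Messages" pattern="*.msg" target="messages.its"/>
</locatingRules>"""


def find_its_target(file_name, root_local_name, rules_xml=LOCATING_RULES):
    """Return the ITS file for `file_name`, or None if no rule applies."""
    for rule in ET.fromstring(rules_xml).iter("locatingRule"):
        if not fnmatch.fnmatchcase(file_name, rule.get("pattern")):
            continue
        if rule.get("target"):
            # The target is given directly on the locatingRule.
            return rule.get("target")
        for doc_rule in rule.iter("documentRule"):
            # Otherwise the root element of the document must match.
            if doc_rule.get("localName") == root_local_name:
                return doc_rule.get("target")
    return None
```

<P>
In this model, a file named <TT>‘a.xml’</TT> is associated with
<TT>‘messages.its’</TT> only when its root element is
<SAMP>‘messages’</SAMP>, whereas any <TT>‘.msg’</TT> file is associated
with it regardless of content.
</P>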
4516 <H4><A NAME="SEC327" HREF="gettext_toc.html#TOC327">15.6.6.1 Two Use-cases of Translated Strings in XML</A></H4>
For XML, there are two use-cases of translated strings. One is the case
where the translated strings are directly consumed by programs, and the
other is the case where the translated strings are merged back into the
original XML document. In the former case, special characters in the
extracted strings shouldn't be escaped, while they should in the latter
case. To control whether to escape special characters, the <SAMP>‘Escape
Special Characters’</SAMP> data category can be used.
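<P>
The effect of the two settings on an extracted string can be illustrated
with Python's standard <CODE>xml.sax.saxutils.escape</CODE> function. This
is only a sketch of the idea; the helper name <CODE>extracted_text</CODE>
is hypothetical, and the exact set of entity references that
<CODE>xgettext</CODE> emits is assumed here.
</P>

```python
from xml.sax.saxutils import escape


def extracted_text(raw, escape_special):
    """Return `raw` as it would be stored for one of the two use-cases."""
    if escape_special:
        # For merging back into XML: &, < and > are escaped by default,
        # and the double quote is added through the extra mapping.
        return escape(raw, {'"': "&quot;"})
    # For direct consumption by a program: keep the characters as they are.
    return raw


raw = 'Tom & "Jerry" <b>'
for_program = extracted_text(raw, escape_special=False)
for_merge = extracted_text(raw, escape_special=True)
```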
4529 To merge the translations, the <SAMP>‘msgfmt’</SAMP> program can be used with
4530 the option <CODE>--xml</CODE>. See section <A HREF="gettext_10.html#SEC157">10.1 Invoking the <CODE>msgfmt</CODE> Program</A>, for more details
4531 about how one calls the <SAMP>‘msgfmt’</SAMP> program. <SAMP>‘msgfmt’</SAMP>'s
4532 <CODE>--xml</CODE> option doesn't perform character escaping, so translated
4533 strings can have arbitrary XML constructs, such as elements for markup.