1 \input texinfo @c -*-texinfo-*-
3 @setfilename gettext.info
4 @c The @ifset makeinfo ... @end ifset conditional evaluates to true in makeinfo
5 @c for info and html output, but to false in texi2html.
11 @c The @documentencoding is needed for makeinfo; texi2html 1.52
12 @c doesn't recognize it.
14 @documentencoding UTF-8
16 @settitle GNU @code{gettext} utilities
19 @c am = autoconf macro @amindex
20 @c cp = concept @cindex
21 @c ef = emacs function @efindex
22 @c em = emacs mode @emindex
23 @c ev = emacs variable @evindex
24 @c fn = function @findex
25 @c kw = keyword @kwindex
26 @c op = option @opindex
27 @c pg = program @pindex
28 @c vr = variable @vindex
29 @c Unused predefined indices:
31 @c ky = keystroke @kindex
43 @firstparagraphindent insert
50 @dircategory GNU Gettext Utilities
52 * gettext: (gettext). GNU gettext utilities.
53 * autopoint: (gettext)autopoint Invocation. Copy gettext infrastructure.
54 * envsubst: (gettext)envsubst Invocation. Expand environment variables.
55 * gettextize: (gettext)gettextize Invocation. Prepare a package for gettext.
56 * msgattrib: (gettext)msgattrib Invocation. Select part of a PO file.
57 * msgcat: (gettext)msgcat Invocation. Combine several PO files.
58 * msgcmp: (gettext)msgcmp Invocation. Compare a PO file and template.
59 * msgcomm: (gettext)msgcomm Invocation. Match two PO files.
60 * msgconv: (gettext)msgconv Invocation. Convert PO file to encoding.
61 * msgen: (gettext)msgen Invocation. Create an English PO file.
62 * msgexec: (gettext)msgexec Invocation. Process a PO file.
63 * msgfilter: (gettext)msgfilter Invocation. Pipe a PO file through a filter.
64 * msgfmt: (gettext)msgfmt Invocation. Make MO files out of PO files.
65 * msggrep: (gettext)msggrep Invocation. Select part of a PO file.
66 * msginit: (gettext)msginit Invocation. Create a fresh PO file.
67 * msgmerge: (gettext)msgmerge Invocation. Update a PO file from template.
68 * msgunfmt: (gettext)msgunfmt Invocation. Uncompile MO file into PO file.
69 * msguniq: (gettext)msguniq Invocation. Unify duplicates for PO file.
70 * ngettext: (gettext)ngettext Invocation. Translate a message with plural.
71 * xgettext: (gettext)xgettext Invocation. Extract strings into a PO file.
72 * ISO639: (gettext)Language Codes. ISO 639 language codes.
73 * ISO3166: (gettext)Country Codes. ISO 3166 country codes.
78 This file provides documentation for GNU @code{gettext} utilities.
79 It also serves as a reference for the free Translation Project.
82 Copyright (C) 1995-1998, 2001-2012 Free Software Foundation, Inc.
84 This manual is free documentation. It is dually licensed under the
85 GNU FDL and the GNU GPL. This means that you can redistribute this
86 manual under either of these two licenses, at your choice.
88 This manual is covered by the GNU FDL. Permission is granted to copy,
89 distribute and/or modify this document under the terms of the
90 GNU Free Documentation License (FDL), either version 1.2 of the
91 License, or (at your option) any later version published by the
92 Free Software Foundation (FSF); with no Invariant Sections, with no
93 Front-Cover Text, and with no Back-Cover Texts.
94 A copy of the license is included in @ref{GNU FDL}.
96 This manual is covered by the GNU GPL. You can redistribute it and/or
97 modify it under the terms of the GNU General Public License (GPL), either
98 version 2 of the License, or (at your option) any later version published
99 by the Free Software Foundation (FSF).
100 A copy of the license is included in @ref{GNU GPL}.
105 @title GNU gettext tools, version @value{VERSION}
106 @subtitle Native Language Support Library and Tools
107 @subtitle Edition @value{EDITION}, @value{UPDATED}
108 @author Ulrich Drepper
110 @author Fran@,{c}ois Pinard
115 @vskip 0pt plus 1filll
117 Copyright (C) 1995-1998, 2001-2012 Free Software Foundation, Inc.
119 This manual is free documentation. It is dually licensed under the
120 GNU FDL and the GNU GPL. This means that you can redistribute this
121 manual under either of these two licenses, at your choice.
123 This manual is covered by the GNU FDL. Permission is granted to copy,
124 distribute and/or modify this document under the terms of the
125 GNU Free Documentation License (FDL), either version 1.2 of the
126 License, or (at your option) any later version published by the
127 Free Software Foundation (FSF); with no Invariant Sections, with no
128 Front-Cover Text, and with no Back-Cover Texts.
129 A copy of the license is included in @ref{GNU FDL}.
131 This manual is covered by the GNU GPL. You can redistribute it and/or
132 modify it under the terms of the GNU General Public License (GPL), either
133 version 2 of the License, or (at your option) any later version published
134 by the Free Software Foundation (FSF).
135 A copy of the license is included in @ref{GNU GPL}.
143 @node Top, Introduction, (dir), (dir)
144 @top GNU @code{gettext} utilities
146 This manual documents the GNU gettext tools and the GNU libintl library,
147 version @value{VERSION}.
150 * Introduction:: Introduction
151 * Users:: The User's View
152 * PO Files:: The Format of PO Files
153 * Sources:: Preparing Program Sources
154 * Template:: Making the PO Template File
155 * Creating:: Creating a New PO File
156 * Updating:: Updating Existing PO Files
157 * Editing:: Editing PO Files
158 * Manipulating:: Manipulating PO Files
159 * Binaries:: Producing Binary MO Files
160 * Programmers:: The Programmer's View
161 * Translators:: The Translator's View
162 * Maintainers:: The Maintainer's View
163 * Installers:: The Installer's and Distributor's View
164 * Programming Languages:: Other Programming Languages
165 * Conclusion:: Concluding Remarks
167 * Language Codes:: ISO 639 language codes
168 * Country Codes:: ISO 3166 country codes
169 * Licenses:: Licenses
171 * Program Index:: Index of Programs
172 * Option Index:: Index of Command-Line Options
173 * Variable Index:: Index of Environment Variables
174 * PO Mode Index:: Index of Emacs PO Mode Commands
175 * Autoconf Macro Index:: Index of Autoconf Macros
176 * Index:: General Index
179 --- The Detailed Node Listing ---
183 * Why:: The Purpose of GNU @code{gettext}
184 * Concepts:: I18n, L10n, and Such
185 * Aspects:: Aspects in Native Language Support
186 * Files:: Files Conveying Translations
187 * Overview:: Overview of GNU @code{gettext}
191 * System Installation:: Questions During Operating System Installation
192 * Setting the GUI Locale:: How to Specify the Locale Used by GUI Programs
193 * Setting the POSIX Locale:: How to Specify the Locale According to POSIX
194 * Installing Localizations:: How to Install Additional Translations
196 Setting the POSIX Locale
* Locale Names:: What a Locale Specification Looks Like
* Locale Environment Variables:: Which Environment Variable Specifies What
200 * The LANGUAGE variable:: How to Specify a Priority List of Languages
202 Preparing Program Sources
204 * Importing:: Importing the @code{gettext} declaration
205 * Triggering:: Triggering @code{gettext} Operations
206 * Preparing Strings:: Preparing Translatable Strings
207 * Mark Keywords:: How Marks Appear in Sources
208 * Marking:: Marking Translatable Strings
209 * c-format Flag:: Telling something about the following string
210 * Special cases:: Special Cases of Translatable Strings
211 * Bug Report Address:: Letting Users Report Translation Bugs
212 * Names:: Marking Proper Names for Translation
213 * Libraries:: Preparing Library Sources
215 Making the PO Template File
217 * xgettext Invocation:: Invoking the @code{xgettext} Program
219 Creating a New PO File
221 * msginit Invocation:: Invoking the @code{msginit} Program
222 * Header Entry:: Filling in the Header Entry
224 Updating Existing PO Files
226 * msgmerge Invocation:: Invoking the @code{msgmerge} Program
230 * KBabel:: KDE's PO File Editor
231 * Gtranslator:: GNOME's PO File Editor
232 * PO Mode:: Emacs's PO File Editor
233 * Compendium:: Using Translation Compendia
235 Emacs's PO File Editor
237 * Installation:: Completing GNU @code{gettext} Installation
238 * Main PO Commands:: Main Commands
239 * Entry Positioning:: Entry Positioning
240 * Normalizing:: Normalizing Strings in Entries
241 * Translated Entries:: Translated Entries
242 * Fuzzy Entries:: Fuzzy Entries
243 * Untranslated Entries:: Untranslated Entries
244 * Obsolete Entries:: Obsolete Entries
245 * Modifying Translations:: Modifying Translations
246 * Modifying Comments:: Modifying Comments
247 * Subedit:: Mode for Editing Translations
248 * C Sources Context:: C Sources Context
249 * Auxiliary:: Consulting Auxiliary PO Files
251 Using Translation Compendia
253 * Creating Compendia:: Merging translations for later use
254 * Using Compendia:: Using older translations if they fit
256 Manipulating PO Files
258 * msgcat Invocation:: Invoking the @code{msgcat} Program
259 * msgconv Invocation:: Invoking the @code{msgconv} Program
260 * msggrep Invocation:: Invoking the @code{msggrep} Program
261 * msgfilter Invocation:: Invoking the @code{msgfilter} Program
262 * msguniq Invocation:: Invoking the @code{msguniq} Program
263 * msgcomm Invocation:: Invoking the @code{msgcomm} Program
264 * msgcmp Invocation:: Invoking the @code{msgcmp} Program
265 * msgattrib Invocation:: Invoking the @code{msgattrib} Program
266 * msgen Invocation:: Invoking the @code{msgen} Program
267 * msgexec Invocation:: Invoking the @code{msgexec} Program
268 * Colorizing:: Highlighting parts of PO files
269 * libgettextpo:: Writing your own programs that process PO files
271 Highlighting parts of PO files
273 * The --color option:: Triggering colorized output
274 * The TERM variable:: The environment variable @code{TERM}
275 * The --style option:: The @code{--style} option
276 * Style rules:: Style rules for PO files
277 * Customizing less:: Customizing @code{less} for viewing PO files
279 Producing Binary MO Files
281 * msgfmt Invocation:: Invoking the @code{msgfmt} Program
282 * msgunfmt Invocation:: Invoking the @code{msgunfmt} Program
283 * MO Files:: The Format of GNU MO Files
285 The Programmer's View
287 * catgets:: About @code{catgets}
288 * gettext:: About @code{gettext}
289 * Comparison:: Comparing the two interfaces
290 * Using libintl.a:: Using libintl.a in own programs
291 * gettext grok:: Being a @code{gettext} grok
292 * Temp Programmers:: Temporary Notes for the Programmers Chapter
296 * Interface to catgets:: The interface
297 * Problems with catgets:: Problems with the @code{catgets} interface?!
301 * Interface to gettext:: The interface
302 * Ambiguities:: Solving ambiguities
303 * Locating Catalogs:: Locating message catalog files
304 * Charset conversion:: How to request conversion to Unicode
305 * Contexts:: Solving ambiguities in GUI programs
306 * Plural forms:: Additional functions for handling plurals
307 * Optimized gettext:: Optimization of the *gettext functions
309 Temporary Notes for the Programmers Chapter
311 * Temp Implementations:: Temporary - Two Possible Implementations
312 * Temp catgets:: Temporary - About @code{catgets}
313 * Temp WSI:: Temporary - Why a single implementation
314 * Temp Notes:: Temporary - Notes
316 The Translator's View
318 * Trans Intro 0:: Introduction 0
319 * Trans Intro 1:: Introduction 1
320 * Discussions:: Discussions
321 * Organization:: Organization
322 * Information Flow:: Information Flow
323 * Translating plural forms:: How to fill in @code{msgstr[0]}, @code{msgstr[1]}
324 * Prioritizing messages:: How to find which messages to translate first
328 * Central Coordination:: Central Coordination
329 * National Teams:: National Teams
330 * Mailing Lists:: Mailing Lists
334 * Sub-Cultures:: Sub-Cultures
335 * Organizational Ideas:: Organizational Ideas
337 The Maintainer's View
339 * Flat and Non-Flat:: Flat or Non-Flat Directory Structures
340 * Prerequisites:: Prerequisite Works
341 * gettextize Invocation:: Invoking the @code{gettextize} Program
342 * Adjusting Files:: Files You Must Create or Alter
343 * autoconf macros:: Autoconf macros for use in @file{configure.ac}
344 * CVS Issues:: Integrating with CVS
345 * Release Management:: Creating a Distribution Tarball
347 Files You Must Create or Alter
349 * po/POTFILES.in:: @file{POTFILES.in} in @file{po/}
350 * po/LINGUAS:: @file{LINGUAS} in @file{po/}
351 * po/Makevars:: @file{Makevars} in @file{po/}
352 * po/Rules-*:: Extending @file{Makefile} in @file{po/}
353 * configure.ac:: @file{configure.ac} at top level
354 * config.guess:: @file{config.guess}, @file{config.sub} at top level
355 * mkinstalldirs:: @file{mkinstalldirs} at top level
356 * aclocal:: @file{aclocal.m4} at top level
357 * acconfig:: @file{acconfig.h} at top level
358 * config.h.in:: @file{config.h.in} at top level
359 * Makefile:: @file{Makefile.in} at top level
360 * src/Makefile:: @file{Makefile.in} in @file{src/}
361 * lib/gettext.h:: @file{gettext.h} in @file{lib/}
363 Autoconf macros for use in @file{configure.ac}
365 * AM_GNU_GETTEXT:: AM_GNU_GETTEXT in @file{gettext.m4}
366 * AM_GNU_GETTEXT_VERSION:: AM_GNU_GETTEXT_VERSION in @file{gettext.m4}
367 * AM_GNU_GETTEXT_NEED:: AM_GNU_GETTEXT_NEED in @file{gettext.m4}
368 * AM_GNU_GETTEXT_INTL_SUBDIR:: AM_GNU_GETTEXT_INTL_SUBDIR in @file{intldir.m4}
369 * AM_PO_SUBDIRS:: AM_PO_SUBDIRS in @file{po.m4}
370 * AM_ICONV:: AM_ICONV in @file{iconv.m4}
374 * Distributed CVS:: Avoiding version mismatch in distributed development
375 * Files under CVS:: Files to put under CVS version control
376 * autopoint Invocation:: Invoking the @code{autopoint} Program
378 Other Programming Languages
380 * Language Implementors:: The Language Implementor's View
381 * Programmers for other Languages:: The Programmer's View
382 * Translators for other Languages:: The Translator's View
383 * Maintainers for other Languages:: The Maintainer's View
384 * List of Programming Languages:: Individual Programming Languages
385 * List of Data Formats:: Internationalizable Data
387 The Translator's View
389 * c-format:: C Format Strings
390 * objc-format:: Objective C Format Strings
391 * sh-format:: Shell Format Strings
392 * python-format:: Python Format Strings
393 * lisp-format:: Lisp Format Strings
394 * elisp-format:: Emacs Lisp Format Strings
395 * librep-format:: librep Format Strings
396 * scheme-format:: Scheme Format Strings
397 * smalltalk-format:: Smalltalk Format Strings
398 * java-format:: Java Format Strings
399 * csharp-format:: C# Format Strings
400 * awk-format:: awk Format Strings
401 * object-pascal-format:: Object Pascal Format Strings
402 * ycp-format:: YCP Format Strings
403 * tcl-format:: Tcl Format Strings
404 * perl-format:: Perl Format Strings
405 * php-format:: PHP Format Strings
406 * gcc-internal-format:: GCC internal Format Strings
407 * gfc-internal-format:: GFC internal Format Strings
408 * qt-format:: Qt Format Strings
409 * qt-plural-format:: Qt Plural Format Strings
410 * kde-format:: KDE Format Strings
411 * boost-format:: Boost Format Strings
412 * lua-format:: Lua Format Strings
413 * javascript-format:: JavaScript Format Strings
415 Individual Programming Languages
417 * C:: C, C++, Objective C
418 * sh:: sh - Shell Script
419 * bash:: bash - Bourne-Again Shell Script
421 * Common Lisp:: GNU clisp - Common Lisp
422 * clisp C:: GNU clisp C sources
423 * Emacs Lisp:: Emacs Lisp
425 * Scheme:: GNU guile - Scheme
426 * Smalltalk:: GNU Smalltalk
430 * Pascal:: Pascal - Free Pascal Compiler
431 * wxWidgets:: wxWidgets library
432 * YCP:: YCP - YaST2 scripting language
433 * Tcl:: Tcl - Tk's scripting language
435 * PHP:: PHP Hypertext Preprocessor
437 * GCC-source:: GNU Compiler Collection sources
439 * JavaScript:: JavaScript
443 * Preparing Shell Scripts:: Preparing Shell Scripts for Internationalization
444 * gettext.sh:: Contents of @code{gettext.sh}
445 * gettext Invocation:: Invoking the @code{gettext} program
446 * ngettext Invocation:: Invoking the @code{ngettext} program
447 * envsubst Invocation:: Invoking the @code{envsubst} program
448 * eval_gettext Invocation:: Invoking the @code{eval_gettext} function
449 * eval_ngettext Invocation:: Invoking the @code{eval_ngettext} function
453 * General Problems:: General Problems Parsing Perl Code
454 * Default Keywords:: Which Keywords Will xgettext Look For?
455 * Special Keywords:: How to Extract Hash Keys
456 * Quote-like Expressions:: What are Strings And Quote-like Expressions?
457 * Interpolation I:: Invalid String Interpolation
458 * Interpolation II:: Valid String Interpolation
459 * Parentheses:: When To Use Parentheses
460 * Long Lines:: How To Grok with Long Lines
461 * Perl Pitfalls:: Bugs, Pitfalls, and Things That Do Not Work
463 Internationalizable Data
465 * POT:: POT - Portable Object Template
466 * RST:: Resource String Table
467 * Glade:: Glade - GNOME user interface description
471 * History:: History of GNU @code{gettext}
472 * References:: Related Readings
476 * Usual Language Codes:: Two-letter ISO 639 language codes
477 * Rare Language Codes:: Three-letter ISO 639 language codes
481 * GNU GPL:: GNU General Public License
482 * GNU LGPL:: GNU Lesser General Public License
483 * GNU FDL:: GNU Free Documentation License
490 @node Introduction, Users, Top, Top
491 @chapter Introduction
493 This chapter explains the goals sought in the creation
494 of GNU @code{gettext} and the free Translation Project.
495 Then, it explains a few broad concepts around
496 Native Language Support, and positions message translation with regard
497 to other aspects of national and cultural variance, as they apply
498 to programs. It also surveys those files used to convey the
499 translations. It explains how the various tools interact in the
500 initial generation of these files, and later, how the maintenance
501 cycle should usually operate.
504 @cindex he, she, and they
505 @cindex she, he, and they
506 In this manual, we use @emph{he} when speaking of the programmer or
507 maintainer, @emph{she} when speaking of the translator, and @emph{they}
508 when speaking of the installers or end users of the translated program.
509 This is only a convenience for clarifying the documentation. It is
510 @emph{absolutely} not meant to imply that some roles are more appropriate
511 to males or females. Besides, as you might guess, GNU @code{gettext}
512 is meant to be useful for people using computers, whatever their sex,
513 race, religion or nationality!
515 @cindex bug report address
516 Please send suggestions and corrections to:
520 @r{Internet address:}
521 bug-gnu-gettext@@gnu.org
526 Please include the manual's edition number and update date in your messages.
529 * Why:: The Purpose of GNU @code{gettext}
530 * Concepts:: I18n, L10n, and Such
531 * Aspects:: Aspects in Native Language Support
532 * Files:: Files Conveying Translations
533 * Overview:: Overview of GNU @code{gettext}
536 @node Why, Concepts, Introduction, Introduction
537 @section The Purpose of GNU @code{gettext}
539 Usually, programs are written and documented in English, and use
540 English at execution time to interact with users. This is true
541 not only of GNU software, but also of a great deal of proprietary
542 and free software. Using a common language is quite handy for
543 communication between developers, maintainers and users from all
544 countries. On the other hand, most people are less comfortable with
545 English than with their own native language, and would prefer to
use their mother tongue for day-to-day work, as far as possible.
Many would simply @emph{love} to see their computer screen showing
a lot less English, and far more of their own language.
550 @cindex Translation Project
However, to many people, this dream might appear so far-fetched that
they may believe it is not even worth spending time thinking about
it. They have no confidence at all that the dream might ever
come true. Yet some have not lost hope, and have organized themselves.
555 The Translation Project is a formalization of this hope into a
556 workable structure, which has a good chance to get all of us nearer
557 the achievement of a truly multi-lingual set of programs.
GNU @code{gettext} is an important step for the Translation Project,
as it is an asset on which we may build many other steps. This package
offers programmers, translators, and even users a well-integrated
set of tools and documentation. Specifically, the GNU @code{gettext}
utilities are a set of tools that provides a framework within which
other free packages may produce multi-lingual messages. These tools
include:

@itemize @bullet
@item
A set of conventions about how programs should be written to support
message catalogs.

@item
A directory and file naming organization for the message catalogs
themselves.

@item
A runtime library supporting the retrieval of translated messages.

@item
A few stand-alone programs to massage the sets of translatable
strings, or already translated strings, in various ways.

@item
A library supporting the parsing and creation of files containing
translated messages.

@item
A special mode for Emacs@footnote{In this manual, all mentions of Emacs
refer to either GNU Emacs or to XEmacs, which people sometimes call FSF
Emacs and Lucid Emacs, respectively.} which helps in preparing these sets
and bringing them up to date.
@end itemize

594 GNU @code{gettext} is designed to minimize the impact of
595 internationalization on program sources, keeping this impact as small
and hardly noticeable as possible. Internationalization has better
chances of succeeding if it is very lightweight, or at least
appears to be so, when looking at program sources.
600 The Translation Project also uses the GNU @code{gettext} distribution
as a vehicle for documenting its structure and methods. This goes
beyond the strict technicalities of documenting GNU @code{gettext}
proper. This way, translators will find in a single place, as
far as possible, all they need to know to do their translating
work properly. This supplemental documentation might also
help programmers, and even curious users, understand how GNU
@code{gettext} is related to the remainder of the Translation
Project, and consequently get a glimpse of the @emph{big picture}.
610 @node Concepts, Aspects, Why, Introduction
611 @section I18n, L10n, and Such
615 Two long words appear all the time when we discuss support of native
616 language in programs, and these words have a precise meaning, worth
617 being explained here, once and for all in this document. The words are
@emph{internationalization} and @emph{localization}. Many people,
tired of writing these long words over and over again, have taken up
the habit of writing @dfn{i18n} and @dfn{l10n} instead, quoting the first
and last letter of each word, and replacing the run of intermediate
letters by a number merely telling how many such letters there are.
But in this manual, for the sake of clarity, we will patiently write
the names in full, each time@dots{}
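
The abbreviation rule just described is mechanical enough to sketch in
a few lines. The following Python fragment is only an illustration;
the function name @code{numeronym} is our own invention and is not part
of GNU @code{gettext}.

```python
def numeronym(word):
    """Abbreviate WORD as first letter, count of inner letters, last letter."""
    # Words of one or two letters have no run of intermediate letters
    # to count, so this sketch returns them unchanged.
    if len(word) < 3:
        return word
    return word[0] + str(len(word) - 2) + word[-1]


print(numeronym("internationalization"))
print(numeronym("localization"))
```

Applied to the two long words, the rule gives back the familiar
abbreviations i18n and l10n.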
626 @cindex internationalization
627 By @dfn{internationalization}, one refers to the operation by which a
628 program, or a set of programs turned into a package, is made aware of and
629 able to support multiple languages. This is a generalization process,
630 by which the programs are untied from calling only English strings or
631 other English specific habits, and connected to generic ways of doing
632 the same, instead. Program developers may use various techniques to
633 internationalize their programs. Some of these have been standardized.
634 GNU @code{gettext} offers one of these standards. @xref{Programmers}.
637 By @dfn{localization}, one means the operation by which, in a set
638 of programs already internationalized, one gives the program all
639 needed information so that it can adapt itself to handle its input
640 and output in a fashion which is correct for some native language and
cultural habits. This is a particularization process, by which generic
methods already implemented in an internationalized program are used
in specific ways. The programming environment puts several functions
at the programmer's disposal which allow this runtime configuration.
The formal description of a specific set of cultural habits for some
country, together with all associated translations targeted to the
same native language, is called the @dfn{locale} for this language
or country. Users achieve localization of programs by setting
special environment variables to proper values, prior to executing
those programs, identifying which locale should be used.
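
To make the role of these environment variables more concrete, here is
a small Python sketch of the usual POSIX precedence: @code{LC_ALL}
overrides the category-specific variable (such as @code{LC_MESSAGES}),
which in turn overrides @code{LANG}. The function name
@code{effective_locale} and the fallback to the @samp{C} locale are
assumptions of this sketch, and the GNU-specific @code{LANGUAGE}
variable is deliberately left out.

```python
import os


def effective_locale(env, category="LC_MESSAGES"):
    """Return the locale name selected for CATEGORY by the mapping ENV."""
    # POSIX precedence: LC_ALL first, then the category variable, then LANG.
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # unset and empty variables are both skipped
            return value
    # Assumption of this sketch: fall back to the "C" locale.
    return "C"


print(effective_locale(os.environ))
```

The C library performs this selection itself when a program initializes
its locale from the environment; the sketch only mirrors the order in
which the variables are consulted.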
652 In fact, locale message support is only one component of the cultural
data that makes up a particular locale. There is a whole host of
routines and functions that aid programmers in developing
internationalized software and allow them to access the data
stored in a particular locale. When someone presently refers to a
657 particular locale, they are obviously referring to the data stored
658 within that particular locale. Similarly, if a programmer is referring
659 to ``accessing the locale routines'', they are referring to the
660 complete suite of routines that access all of the locale's information.
663 @cindex Native Language Support
664 @cindex Natural Language Support
665 One uses the expression @dfn{Native Language Support}, or merely NLS,
666 for speaking of the overall activity or feature encompassing both
667 internationalization and localization, allowing for multi-lingual
interactions in a program. In a nutshell, one could say that
internationalization is the operation by which further localizations
are made possible.
672 Also, very roughly said, when it comes to multi-lingual messages,
673 internationalization is usually taken care of by programmers, and
674 localization is usually taken care of by translators.
676 @node Aspects, Files, Concepts, Introduction
677 @section Aspects in Native Language Support
679 @cindex translation aspects
680 For a totally multi-lingual distribution, there are many things to
681 translate beyond output messages.

@itemize @bullet
@item
As of today, GNU @code{gettext} offers a complete toolset for
translating messages output by C programs. Perl scripts and shell
scripts will also need to be translated. Even if there are today some hooks
by which this can be done, these hooks are not integrated as well as they
should be.

@item
Some programs, like @code{autoconf} or @code{bison}, are able
to produce other programs (or scripts). Even if the generating
programs themselves are internationalized, the generated programs they
produce may need internationalization on their own, and this indirect
internationalization could be automated right from the generating
program. In fact, quite usually, generating and generated programs
could be internationalized independently, as the effort needed is
fairly orthogonal.

@item
A few programs include textual tables which might need translation
themselves, independently of the strings contained in the program
itself. For example, @w{RFC 1345} gives an English description for each
character which the @code{recode} program is able to reconstruct at execution.
Since these descriptions are extracted from the RFC by mechanical means,
translating them properly would require a prior translation of the RFC
itself.

@item
Almost all programs accept options, which are often worded so as to
be descriptive for English readers; one might want to consider
offering translated versions for program options as well.

@item
Many programs read, interpret, compile, or are somewhat driven by
input files which are texts containing keywords, identifiers, or
replies which are inherently translatable. For example, one may want
@code{gcc} to allow diacriticized characters in identifiers or use
translated keywords; @samp{rm -i} might accept something other than
@samp{y} or @samp{n} for replies, etc. Even if the program will
eventually produce most of its output in foreign languages, one has
to decide whether the input syntax, option values, etc., are to be
translated or not.

@item
The manual accompanying a package, as well as all documentation files
in the distribution, could surely be translated, too. Translating a
manual, with the intent of later keeping up with updates, is generally
a major undertaking in itself.
@end itemize

734 As we already stressed, translation is only one aspect of locales.
735 Other internationalization aspects are system services and are handled
736 in GNU @code{libc}. There
737 are many attributes that are needed to define a country's cultural
conventions. These attributes include, besides the country's native
language, the formatting of the date and time, the representation of
numbers, the symbols for currency, etc. These local @dfn{rules} are
741 termed the country's locale. The locale represents the knowledge
742 needed to support the country's native attributes.
744 @cindex locale categories
745 There are a few major areas which may vary between countries and
746 hence, define what a locale must describe. The following list helps
747 putting multi-lingual messages into the proper context of other tasks
748 related to locales. See the GNU @code{libc} manual for details.
752 @item Characters and Codesets
755 @cindex character encoding
756 @cindex locale category, LC_CTYPE
The codeset most commonly used throughout the USA and most English
speaking parts of the world is the ASCII codeset. However, there are
many characters needed by various locales that are not found within
this codeset. The 8-bit @w{ISO 8859-1} codeset has most of the special
characters needed to handle the major European languages. However, in
many cases, choosing @w{ISO 8859-1} is nevertheless not adequate: it
doesn't even handle the major European currency. Hence each locale
will need to specify which codeset it needs to use and will need
to have the appropriate character handling routines to cope with
it.
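
The limits of each codeset are easy to check by experiment. The
following Python sketch asks whether a given text can be represented
in a given codeset; the helper name @code{encodable} is hypothetical,
and @w{ISO 8859-15} is mentioned here only as one example of a later
8-bit codeset that does include the euro sign.

```python
def encodable(text, codeset):
    """Return True if every character of TEXT exists in CODESET."""
    try:
        text.encode(codeset)
        return True
    except UnicodeEncodeError:
        return False


accented = "d\u00e9j\u00e0 vu"   # Latin small letters e-acute and a-grave
euro = "\u20ac"                  # the euro sign

print(encodable(accented, "ascii"))
print(encodable(accented, "iso8859-1"))
print(encodable(euro, "iso8859-1"))
print(encodable(euro, "iso8859-15"))
```

The accented text does not fit into ASCII but fits into
@w{ISO 8859-1}, while the euro sign fits into neither of them.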
770 @cindex currency symbols
771 @cindex locale category, LC_MONETARY
The symbols used vary from country to country, as does the position
of the symbol. Software needs to be able to transparently
display currency figures in the native mode for each locale.
779 @cindex locale category, LC_TIME
The format of dates varies between locales. For example, Christmas day
782 in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia.
783 Other countries might use @w{ISO 8601} dates, etc.
785 Time of the day may be noted as @var{hh}:@var{mm}, @var{hh}.@var{mm},
786 or otherwise. Some locales require time to be specified in 24-hour
787 mode rather than as AM or PM. Further, the nature and yearly extent
788 of the Daylight Saving correction vary widely between countries.
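
The date example above can be reproduced with fixed format strings.
This Python sketch is only meant to show the three notations side by
side; the function name @code{date_styles} is made up for the
occasion, and a properly localized program would let the locale choose
the format rather than spelling it out by hand.

```python
import datetime


def date_styles(day):
    """Return DAY written month-first, day-first, and as an ISO 8601 date."""
    month_first = day.strftime("%m/%d/%y")   # as written in the USA
    day_first = day.strftime("%d/%m/%y")     # as written in Australia
    iso_8601 = day.isoformat()               # year-month-day
    return month_first, day_first, iso_8601


christmas = datetime.date(1994, 12, 25)
print(date_styles(christmas))
```

Note that the first two notations become ambiguous for days up to the
twelfth of a month, which is one reason why the format must come from
the locale and not be guessed.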
791 @cindex number format
792 @cindex locale category, LC_NUMERIC
794 Numbers can be represented differently in different locales.
795 For example, the following numbers are all written correctly for
796 their respective locales:
Some programs could go further and use different unit systems, like
English units or Metric units, or even take into account variants
in how numbers are spelled out in full.
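
As an illustration of how the same value may be written with
different separators, here is a small Python sketch. The function
@code{format_number} and its two separator parameters are hypothetical;
a real program would take these values from the locale (in C, through
@code{localeconv}) instead of passing them by hand.

```python
def format_number(value, thousands_sep, decimal_point):
    """Format VALUE with two decimals and the given separators."""
    # Start from a fixed notation with "," for grouping and "." as the
    # decimal point, then substitute the requested separators.
    text = format(value, ",.2f")
    integer_part, _, fraction = text.rpartition(".")
    return integer_part.replace(",", thousands_sep) + decimal_point + fraction


print(format_number(12345.67, ",", "."))   # common in English text
print(format_number(12345.67, ".", ","))   # common in German text
print(format_number(12345.67, "", ","))    # no grouping, decimal comma
```

Only the separators change here; some locales also group digits
differently, which this sketch does not attempt to handle.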
811 @cindex locale category, LC_MESSAGES
The most obvious area is the language support within a locale. This is
where GNU @code{gettext} provides the means for developers and users to
easily change the language that the software uses to communicate with
the user.
820 @cindex locale categories
821 These areas of cultural conventions are called @emph{locale categories}.
It is an unfortunate term; @emph{locale aspects} or @emph{locale feature
categories} would be better terms, because each ``locale category''
824 describes an area or task that requires localization. The concrete data
825 that describes the cultural conventions for such an area and for a particular
826 culture is also called a @emph{locale category}. In this sense, a locale
827 is composed of several locale categories: the locale category describing
828 the codeset, the locale category describing the formatting of numbers,
829 the locale category containing the translated messages, and so on.
832 Components of locale outside of message handling are standardized in
833 the ISO C standard and the POSIX:2001 standard (also known as the SUSV3
834 specification). GNU @code{libc}
835 fully implements this, and most other modern systems provide a more
836 or less reasonable support for at least some of the missing components.
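The effect of individual locale categories can be observed directly from C. The following is a minimal sketch, not part of GNU @code{gettext}: the function names @code{format_christmas_1994} and @code{decimal_separator} are made up for illustration, and only the standard C functions @code{setlocale}, @code{strftime} and @code{localeconv} are assumed. Each function switches a single category and then queries it.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Format 25 December 1994 with the date convention (%x) of the given
   LC_TIME locale.  Returns the number of characters written, or 0 if
   the locale is unavailable or the buffer is too small.  */
size_t
format_christmas_1994 (const char *time_locale, char *buf, size_t size)
{
  struct tm when = { 0 };

  assert (buf != NULL && size > 0);
  when.tm_year = 94;   /* years since 1900 */
  when.tm_mon = 11;    /* months since January */
  when.tm_mday = 25;

  /* Only the LC_TIME category is switched; the others are untouched.  */
  if (setlocale (LC_TIME, time_locale) == NULL)
    return 0;
  return strftime (buf, size, "%x", &when);
}

/* Return the decimal separator of the given LC_NUMERIC locale, or NULL
   if the locale is unavailable.  */
const char *
decimal_separator (const char *numeric_locale)
{
  if (setlocale (LC_NUMERIC, numeric_locale) == NULL)
    return NULL;
  return localeconv ()->decimal_point;
}
```

In the @samp{C} locale the date comes out in the month/day/year order mentioned above for the USA. With any other locale name, the result depends on which locales are installed on the system. Note that both functions change the process-wide locale as a side effect.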
838 @node Files, Overview, Aspects, Introduction
839 @section Files Conveying Translations
841 @cindex files, @file{.po} and @file{.mo}
842 The letters PO in @file{.po} files mean Portable Object, to
843 distinguish them from @file{.mo} files, where MO stands for Machine
844 Object. This paradigm, as well as the PO file format, is inspired
845 by the NLS standard developed by Uniforum, and first implemented by
846 Sun in their Solaris system.
848 PO files are meant to be read and edited by humans, and associate each
849 original, translatable string of a given package with its translation
850 in a particular target language. A single PO file is dedicated to
851 a single target language. If a package supports many languages,
852 there is one such PO file per language supported, and each package
853 has its own set of PO files. These PO files are best created by
854 the @code{xgettext} program, and later updated or refreshed through
855 the @code{msgmerge} program. Program @code{xgettext} extracts all
856 marked messages from a set of C files and initializes a PO file with
857 empty translations. Program @code{msgmerge} takes care of adjusting
858 PO files between releases of the corresponding sources, commenting
859 obsolete entries, initializing new ones, and updating all source
860 line references. Files ending with @file{.pot} are template files,
861 in PO file format, that distributions ship as the base for new translations.
863 MO files are meant to be read by programs, and are binary in nature.
864 A few systems already offer tools for creating and handling MO files
865 as part of the Native Language Support coming with the system, but the
866 format of these MO files is often different from system to system,
867 and non-portable. The tools already provided with these systems don't
868 support all the features of GNU @code{gettext}. Therefore GNU
869 @code{gettext} uses its own format for MO files. Files ending with
870 @file{.gmo} are really MO files, when it is known that these files use
871 the GNU format.
873 @node Overview, , Files, Introduction
874 @section Overview of GNU @code{gettext}
876 @cindex overview of @code{gettext}
878 @cindex tutorial of @code{gettext} usage
879 The following diagram summarizes the relation between the files
880 handled by GNU @code{gettext} and the tools acting on these files.
881 It is followed by somewhat detailed explanations, which you should
882 read while keeping an eye on the diagram. Having a clear understanding
883 of these interrelations will surely help programmers, translators
884 and maintainers.
887 @example
888 @group
889 Original C Sources ───> Preparation ───> Marked C Sources ───╮
890                                                              │
891               ╭─────────<─── GNU gettext Library             │
892 ╭─── make <───┤                                              │
893 │             ╰─────────<────────────────────┬───────────────╯
894 │                                            │
895 │   ╭─────<─── PACKAGE.pot <─── xgettext <───╯   ╭───<─── PO Compendium
896 │   │                                            │              ↑
897 │   │                                            ╰───╮          │
898 │   ╰───╮                                            ├───> PO editor ───╮
899 │       ├────> msgmerge ──────> LANG.po ────>────────╯                  │
900 │   ╭───╯                                                               │
901 │   │                                                                   │
902 │   ╰─────────────<───────────────╮                                     │
903 │                                 ├─── New LANG.po <────────────────────╯
904 │   ╭─── LANG.gmo <─── msgfmt <───╯
905 │   │
906 │   ╰───> install ───> /.../LANG/PACKAGE.mo ───╮
907 │                                              ├───> "Hello world!"
908 ╰───────> install ───> /.../bin/PROGRAM ───────╯
909 @end group
910 @end example
913 @example
914 @group
915 Original C Sources ---> Preparation ---> Marked C Sources ---.
916                                                              |
917               .---------<--- GNU gettext Library             |
918 .--- make <---+                                              |
919 |             `---------<--------------------+---------------'
920 |                                            |
921 |   .-----<--- PACKAGE.pot <--- xgettext <---'   .---<--- PO Compendium
922 |   |                                            |              ^
923 |   |                                            `---.          |
924 |   `---.                                            +---> PO editor ---.
925 |       +----> msgmerge ------> LANG.po ---->--------'                  |
926 |   .---'                                                               |
927 |   |                                                                   |
928 |   `-------------<---------------.                                     |
929 |                                 +--- New LANG.po <--------------------'
930 |   .--- LANG.gmo <--- msgfmt <---'
931 |   |
932 |   `---> install ---> /.../LANG/PACKAGE.mo ---.
933 |                                              +---> "Hello world!"
934 `-------> install ---> /.../bin/PROGRAM -------'
935 @end group
936 @end example
939 @cindex marking translatable strings
940 As a programmer, the first step to bringing GNU @code{gettext}
941 into your package is identifying, right in the C sources, those strings
942 which are meant to be translatable, and those which are untranslatable.
943 This tedious job can be done a little more comfortably using Emacs PO
944 mode, but you can use any means familiar to you for modifying your
945 C sources. Besides this, some other simple, standard changes are needed to
946 properly initialize the translation library. @xref{Sources}, for
947 more information about all this.
949 For newly written software the strings of course can and should be
950 marked while writing it. The @code{gettext} approach makes this
951 very easy. Simply put the following lines at the beginning of each file
952 or in a central header file:
954 @example
955 @group
956 #define _(String) (String)
957 #define N_(String) String
958 #define textdomain(Domain)
959 #define bindtextdomain(Package, Directory)
960 @end group
961 @end example
964 Doing this allows you to prepare the sources for internationalization.
965 Later, when you feel ready to start using the @code{gettext} library,
966 simply replace these definitions by the following:
968 @cindex include file @file{libintl.h}
969 @example
970 @group
971 #include <libintl.h>
972 #define _(String) gettext (String)
973 #define gettext_noop(String) String
974 #define N_(String) gettext_noop (String)
975 @end group
976 @end example
978 @cindex link with @file{libintl}
981 and link against @file{libintl.a} or @file{libintl.so}. Note that on
982 GNU systems, you don't need to link with @code{libintl} because the
983 @code{gettext} library functions are already contained in GNU libc.
984 That is all you have to change.
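To illustrate why two different markers exist, here is a small sketch that uses the stub definitions from the first step, so it compiles without any translation library. The table @code{weekday_names} and the function @code{weekday_name} are hypothetical names chosen for this example. @code{N_} marks strings inside a static initializer, where a function call would not be allowed, and @code{_} performs the lookup at the point of use.

```c
#include <assert.h>
#include <stddef.h>

/* Stub definitions: the sources are marked, but no translation
   library is involved yet.  */
#define _(String) (String)
#define N_(String) String

/* N_() only marks the strings for extraction; it does not translate
   them, which makes it usable in a static initializer.  */
static const char *const weekday_names[] = {
  N_("Sunday"), N_("Monday"), N_("Tuesday"), N_("Wednesday"),
  N_("Thursday"), N_("Friday"), N_("Saturday")
};

/* Return the (possibly translated) name of a weekday, 0 = Sunday.  */
const char *
weekday_name (size_t index)
{
  assert (index < sizeof weekday_names / sizeof weekday_names[0]);
  /* _() is where the translation would be looked up at run time.  */
  return _(weekday_names[index]);
}
```

Once the stub definitions are replaced by the @code{gettext}-based ones, the same function returns translated names without further changes to its body.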
986 @cindex template PO file
987 @cindex files, @file{.pot}
988 Once the C sources have been modified, the @code{xgettext} program
989 is used to find and extract all translatable strings, and create a
990 PO template file out of all these. This @file{@var{package}.pot} file
991 contains all original program strings. It has sets of pointers to
992 exactly where in C sources each string is used. All translations
993 are set to empty. The letter @code{t} in @file{.pot} marks this as
994 a Template PO file, not yet oriented towards any particular language.
995 @xref{xgettext Invocation}, for more details about how one calls the
996 @code{xgettext} program. If you are @emph{really} lazy, you might
997 be interested in working a lot more right away, and preparing the
998 whole distribution setup (@pxref{Maintainers}). By doing so, you
999 spare yourself typing the @code{xgettext} command, as @code{make}
1000 should now generate the proper things automatically for you!
1002 The first time through, there is no @file{@var{lang}.po} yet, so the
1003 @code{msgmerge} step may be skipped and replaced by a mere copy of
1004 @file{@var{package}.pot} to @file{@var{lang}.po}, where @var{lang}
1005 represents the target language. See @ref{Creating} for details.
1007 Then comes the initial translation of messages. Translation in
1008 itself is a whole matter, still exclusively meant for humans,
1009 and whose complexity far overwhelms the level of this manual.
1010 Nevertheless, a few hints are given in some other chapter of this
1011 manual (@pxref{Translators}). You will also find there indications
1012 about how to contact translating teams, or become part of them,
1013 for sharing your translating concerns with others who target the same
1014 native language.
1016 While adding the translated messages into the @file{@var{lang}.po}
1017 PO file, if you are not using one of the dedicated PO file editors
1018 (@pxref{Editing}), you are on your own
1019 for ensuring that your efforts fully respect the PO file format, and quoting
1020 conventions (@pxref{PO Files}). This is surely not an impossible task,
1021 as this is the way many people handled PO files around 1995.
1022 On the other hand, by using a PO file editor, most details
1023 of PO file format are taken care of for you, but you have to acquire
1024 some familiarity with the PO file editor itself.
1026 If some common translations have already been saved into a compendium
1027 PO file, translators may use PO mode for initializing untranslated
1028 entries from the compendium, and also save selected translations into
1029 the compendium, updating it (@pxref{Compendium}). Compendium files
1030 are meant to be exchanged between members of a given translation team.
1032 Programs, or packages of programs, are dynamic in nature: users write
1033 bug reports and suggestions for improvement, maintainers react by
1034 modifying programs in various ways. The fact that a package has
1035 already been internationalized should not make maintainers shy
1036 of adding new strings, or modifying strings already translated.
1037 They just do their job the best they can. For the Translation
1038 Project to work smoothly, it is important that maintainers do not
1039 carry translation concerns on their already loaded shoulders, and that
1040 translators be kept as free as possible of programming concerns.
1042 The only concern maintainers should have is carefully marking new
1043 strings as translatable, when they should be; they need not otherwise
1044 worry about them being translated, as this will come in proper time.
1045 Consequently, when programs and their strings are adjusted in various
1046 ways by maintainers, and for matters usually unrelated to translation,
1047 @code{xgettext} constructs @file{@var{package}.pot} files which
1048 evolve over time, so the translations carried by @file{@var{lang}.po}
1049 slowly go out of date.
1051 @cindex evolution of packages
1052 It is important for translators (and even maintainers) to understand
1053 that package translation is a continuous process in the lifetime of a
1054 package, and not something which is done once and for all at the start.
1055 After an initial burst of translation activity for a given package,
1056 interventions are needed once in a while, because here and there,
1057 translated entries become obsolete, and new untranslated entries
1058 appear, needing translation.
1060 The @code{msgmerge} program has the purpose of refreshing an already
1061 existing @file{@var{lang}.po} file, by comparing it with a newer
1062 @file{@var{package}.pot} template file, extracted by @code{xgettext}
1063 out of recent C sources. The refreshing operation adjusts all
1064 references to C source locations for strings, since these strings
1065 move as programs are modified. Also, @code{msgmerge} comments out as
1066 obsolete, in @file{@var{lang}.po}, those already translated entries
1067 which are no longer used in the program sources (@pxref{Obsolete
1068 Entries}). It finally discovers new strings and inserts them in
1069 the resulting PO file as untranslated entries (@pxref{Untranslated
1070 Entries}). @xref{msgmerge Invocation}, for more information about what
1071 @code{msgmerge} really does.
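The bookkeeping described above can be pictured with a deliberately simplified sketch. This is not how @code{msgmerge} is implemented, and it ignores fuzzy matching and source references entirely. It only compares two lists of @code{msgid} strings and counts which entries are kept, which would be commented out as obsolete, and which would be added as untranslated. The type @code{merge_summary} and the function @code{summarize_merge} are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct merge_summary
{
  size_t kept;      /* msgids present in both files */
  size_t obsolete;  /* only in the old LANG.po */
  size_t added;     /* only in the new PACKAGE.pot */
};

/* Return 1 if MSGID occurs among the N strings in LIST, else 0.  */
static int
contains (const char *const *list, size_t n, const char *msgid)
{
  for (size_t i = 0; i < n; i++)
    {
      if (strcmp (list[i], msgid) == 0)
        return 1;
    }
  return 0;
}

/* Compare the msgids of an existing PO file with those of a newer
   template, using exact string matches only.  */
struct merge_summary
summarize_merge (const char *const *old_ids, size_t n_old,
                 const char *const *new_ids, size_t n_new)
{
  struct merge_summary s = { 0, 0, 0 };

  assert (old_ids != NULL && new_ids != NULL);
  for (size_t i = 0; i < n_old; i++)
    {
      if (contains (new_ids, n_new, old_ids[i]))
        s.kept++;
      else
        s.obsolete++;
    }
  for (size_t i = 0; i < n_new; i++)
    {
      if (!contains (old_ids, n_old, new_ids[i]))
        s.added++;
    }
  return s;
}
```

The quadratic comparison keeps the sketch short; it is meant to convey the three outcomes for an entry, not to suggest an efficient algorithm.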
1073 Whatever route or means taken, the goal is to obtain an updated
1074 @file{@var{lang}.po} file offering translations for all strings.
1076 The temporal mobility, or fluidity of PO files, is an integral part of
1077 the translation game, and should be well understood, and accepted.
1078 People resisting it will have a hard time participating in the
1079 Translation Project, or will give a hard time to other participants! In
1080 particular, maintainers should relax and include all available official
1081 PO files in their distributions, even if these have not recently been
1082 updated, without exerting pressure on the translator teams to get the
1083 job done. The pressure should rather come
1084 from the community of users speaking a particular language, and
1085 maintainers should consider themselves fairly relieved of any concern
1086 about the adequacy of translation files. On the other hand, translators
1087 should reasonably try updating the PO files they are responsible for,
1088 while the package is undergoing pretest, prior to an official
1089 distribution.
1091 Once the PO file is complete and dependable, the @code{msgfmt} program
1092 is used for turning the PO file into a machine-oriented format, which
1093 may yield efficient retrieval of translations by the programs of the
1094 package, whenever needed at runtime (@pxref{MO Files}). @xref{msgfmt
1095 Invocation}, for more information about all modes of execution
1096 for the @code{msgfmt} program.
1098 Finally, the modified and marked C sources are compiled and linked
1099 with the GNU @code{gettext} library, usually through the operation of
1100 @code{make}, given a suitable @file{Makefile} exists for the project,
1101 and the resulting executable is installed somewhere users will find it.
1102 The MO files themselves should also be properly installed. Given the
1103 appropriate environment variables are set (@pxref{Setting the POSIX Locale}),
1104 the program should localize itself automatically, whenever it executes.
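As a rough sketch of what this final step looks like inside the program, the function below performs the usual initialization and then looks up one message. The domain name @samp{hello-sketch}, the directory @file{/usr/local/share/locale} and the function name @code{localized_greeting} are assumptions made for this example only. On systems other than GNU, linking with @code{libintl} may be required.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

#define _(String) gettext (String)

/* Hypothetical names, used only for this sketch.  */
#define SKETCH_PACKAGE "hello-sketch"
#define SKETCH_LOCALEDIR "/usr/local/share/locale"

/* Initialize the locale and the message domain, then return the
   greeting in the user's language when a catalog is installed, or
   the original string otherwise.  */
const char *
localized_greeting (void)
{
  const char *message;

  /* Adopt the locale selected by the user's environment variables.  */
  setlocale (LC_ALL, "");
  /* Tell gettext where the MO files of this package are installed.  */
  bindtextdomain (SKETCH_PACKAGE, SKETCH_LOCALEDIR);
  /* Select the package's message catalog for subsequent lookups.  */
  textdomain (SKETCH_PACKAGE);

  message = _("Hello world!");
  /* Without a translation, gettext returns its argument unchanged.  */
  assert (message != NULL);
  return message;
}
```

When no @file{hello-sketch.mo} catalog exists for the user's language, the function simply returns the original English string, which is why an internationalized program keeps working before any translation has been installed.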
1106 The remainder of this manual has the purpose of explaining in depth the various
1107 steps outlined above.
1109 @node Users, PO Files, Introduction, Top
1110 @chapter The User's View
1112 Nowadays, when users log into a computer, they usually find that all
1113 their programs show messages in their native language -- at least for
1114 users of languages with an active free software community, like French or
1115 German; to a lesser extent for languages with a smaller participation in
1116 free software and the GNU project, like Hindi and Filipino.
1118 How does this work? How can the user influence the language that is used
1119 by the programs? This chapter answers these questions.
1121 @menu
1122 * System Installation:: Questions During Operating System Installation
1123 * Setting the GUI Locale:: How to Specify the Locale Used by GUI Programs
1124 * Setting the POSIX Locale:: How to Specify the Locale According to POSIX
1125 * Installing Localizations:: How to Install Additional Translations
1126 @end menu
1128 @node System Installation, Setting the GUI Locale, Users, Users
1129 @section Operating System Installation
1131 The default language is often already specified during operating system
1132 installation. When the operating system is installed, the installer
1133 typically asks for the language used for the installation process and,
1134 separately, for the language to use in the installed system. Some OS
1135 installers only ask for the language once.
1137 This determines the system-wide default language for all users. But the
1138 installers often give the possibility to install extra localizations for
1139 additional languages. For example, the localizations of KDE (the K
1140 Desktop Environment) and OpenOffice.org are often bundled separately,
1141 as one installable package per language.
1143 At this point it is good to consider the intended use of the machine: If
1144 it is a machine designated for personal use, additional localizations are
1145 probably not necessary. If, however, the machine is in use in an
1146 organization or company that has international relationships, one can
1147 consider the needs of guest users. If you have a guest from abroad, for
1148 a week, what could be his preferred locales? It may be worth installing
1149 these additional localizations ahead of time, since they cost only a bit
1150 of disk space at this point.
1152 The system-wide default language is the locale configuration that is used
1153 when a new user account is created. But the user can have his own locale
1154 configuration that is different from the one of the other users of the
1155 same machine. He can specify it, typically after the first login, as
1156 described in the next section.
1158 @node Setting the GUI Locale, Setting the POSIX Locale, System Installation, Users
1159 @section Setting the Locale Used by GUI Programs
1161 The immediately available programs in a user's desktop come from a group
1162 of programs called a ``desktop environment''; it usually includes the window
1163 manager, a web browser, a text editor, and more. The most common free
1164 desktop environments are KDE, GNOME, and Xfce.
1166 The locale used by GUI programs of the desktop environment can be specified
1167 in a configuration screen called ``control center'', ``language settings''
1168 or ``country settings''.
1170 Individual GUI programs that are not part of the desktop environment can
1171 have their locale specified either in a settings panel, or through environment
1172 variables.
1174 For some programs, it is possible to specify the locale through environment
1175 variables, possibly even to a different locale than the desktop's locale.
1176 This means, instead of starting a program through a menu or from the file
1177 system, you can start it from the command-line, after having set some
1178 environment variables. The environment variables can be those specified
1179 in the next section (@ref{Setting the POSIX Locale}); for some versions of
1180 KDE, however, the locale is specified through a variable @code{KDE_LANG},
1181 rather than @code{LANG} or @code{LC_ALL}.
1183 @node Setting the POSIX Locale, Installing Localizations, Setting the GUI Locale, Users
1184 @section Setting the Locale through Environment Variables
1186 As a user, if your language has been installed for this package, in the
1187 simplest case, you only have to set the @code{LANG} environment variable
1188 to the appropriate @samp{@var{ll}_@var{CC}} combination. For example,
1189 let's suppose that you speak German and live in Germany. At the shell
1190 prompt, merely execute
1191 @w{@samp{setenv LANG de_DE}} (in @code{csh}),
1192 @w{@samp{export LANG; LANG=de_DE}} (in @code{sh}) or
1193 @w{@samp{export LANG=de_DE}} (in @code{bash}). This can be done from your
1194 @file{.login} or @file{.profile} file, once and for all.
1196 @menu
1197 * Locale Names:: What a Locale Specification Looks Like
1198 * Locale Environment Variables:: Which Environment Variable Specifies What
1199 * The LANGUAGE variable:: How to Specify a Priority List of Languages
1200 @end menu
1202 @node Locale Names, Locale Environment Variables, Setting the POSIX Locale, Setting the POSIX Locale
1203 @subsection Locale Names
1205 A locale name usually has the form @samp{@var{ll}_@var{CC}}. Here
1206 @samp{@var{ll}} is an @w{ISO 639} two-letter language code, and
1207 @samp{@var{CC}} is an @w{ISO 3166} two-letter country code. For example,
1208 for German in Germany, @var{ll} is @code{de}, and @var{CC} is @code{DE}.
1209 You can find a list of the language codes in appendix @ref{Language Codes} and
1210 a list of the country codes in appendix @ref{Country Codes}.
1212 You might think that the country code specification is redundant. But in
1213 fact, some languages have dialects in different countries. For example,
1214 @samp{de_AT} is used for Austria, and @samp{pt_BR} for Brazil. The country
1215 code serves to distinguish the dialects.
1217 Many locale names have an extended syntax
1218 @samp{@var{ll}_@var{CC}.@var{encoding}} that also specifies the character
1219 encoding. These are in use because between 2000 and 2005, most users have
1220 switched to locales in UTF-8 encoding. For example, the German locale on
1221 glibc systems is nowadays @samp{de_DE.UTF-8}. The older name @samp{de_DE}
1222 still refers to the German locale as of 2000 that stores characters in
1223 ISO-8859-1 encoding -- a text encoding that cannot even accommodate the Euro
1224 currency sign.
1226 Some locale names use @samp{@var{ll}_@var{CC}@@@var{variant}} instead of
1227 @samp{@var{ll}_@var{CC}}. The @samp{@@@var{variant}} can denote any kind of
1228 characteristic that is not already implied by the language @var{ll} and
1229 the country @var{CC}. It can denote a particular monetary unit. For example,
1230 on glibc systems, @samp{de_DE@@euro} denotes the locale that uses the Euro
1231 currency, in contrast to the older locale @samp{de_DE} which implies the use
1232 of the currency before 2002. It can also denote a dialect of the language,
1233 or the script used to write text (for example, @samp{sr_RS@@latin} uses the
1234 Latin script, whereas @samp{sr_RS} uses the Cyrillic script to write Serbian),
1235 or the orthography rules, or similar.
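The naming scheme described so far can be summarized as @var{ll}, optionally followed by @samp{_@var{CC}}, @samp{.@var{encoding}} and @samp{@@@var{variant}}. The following sketch splits a name into these parts. The structure @code{locale_parts} and the functions @code{copy_field} and @code{parse_locale_name} are hypothetical; the code performs no validation against the ISO code lists and silently truncates overlong fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct locale_parts
{
  char language[16];   /* ll */
  char territory[16];  /* CC, empty if absent */
  char encoding[32];   /* empty if absent */
  char variant[32];    /* empty if absent */
};

/* Copy LEN bytes of SRC into DST (capacity CAP), truncating if needed.  */
static void
copy_field (char *dst, size_t cap, const char *src, size_t len)
{
  if (len >= cap)
    len = cap - 1;
  memcpy (dst, src, len);
  dst[len] = '\0';
}

/* Split a name of the form ll[_CC][.encoding][@variant].  */
void
parse_locale_name (const char *name, struct locale_parts *out)
{
  assert (name != NULL && out != NULL);
  memset (out, 0, sizeof *out);

  /* The variant, if any, starts after the '@' and runs to the end.  */
  size_t end = strcspn (name, "@");
  if (name[end] == '@')
    copy_field (out->variant, sizeof out->variant,
                name + end + 1, strlen (name + end + 1));

  /* The encoding sits between the '.' and the variant.  */
  size_t dot = strcspn (name, ".@");
  if (name[dot] == '.')
    copy_field (out->encoding, sizeof out->encoding,
                name + dot + 1, end - dot - 1);

  /* What remains at the front is ll or ll_CC.  */
  size_t underscore = strcspn (name, "_.@");
  copy_field (out->language, sizeof out->language, name, underscore);
  if (name[underscore] == '_')
    copy_field (out->territory, sizeof out->territory,
                name + underscore + 1, dot - underscore - 1);
}
```

For @samp{de_DE.UTF-8} this yields the language @samp{de}, the territory @samp{DE} and the encoding @samp{UTF-8}; for @samp{sr_RS@@latin} it yields the variant @samp{latin} and an empty encoding.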
1237 On other systems, some variations of this scheme are used, such as
1238 @samp{@var{ll}}. You can get the list of locales supported by your system
1239 for your language by running the command @samp{locale -a | grep '^@var{ll}'}.
1241 There is also a special locale, called @samp{C}.
1242 @c Don't mention that this locale also has the name "POSIX". When we talk about
1243 @c the "POSIX locale", we mean the "locale as specified in the POSIX way", and
1244 @c mentioning a locale called "POSIX" would bring total confusion.
1245 When it is used, it disables all localization: in this locale, all programs
1246 standardized by POSIX use English messages and an unspecified character
1247 encoding (often US-ASCII, but sometimes also ISO-8859-1 or UTF-8, depending on
1248 the operating system).
1250 @node Locale Environment Variables, The LANGUAGE variable, Locale Names, Setting the POSIX Locale
1251 @subsection Locale Environment Variables
1252 @cindex setting up @code{gettext} at run time
1253 @cindex selecting message language
1254 @cindex language selection
1256 A locale is composed of several @emph{locale categories}, see @ref{Aspects}.
1257 When a program looks up locale dependent values, it does this according to
1258 the following environment variables, in priority order:
1260 @enumerate
1261 @vindex LANGUAGE@r{, environment variable}
1262 @item @code{LANGUAGE}
1263 @vindex LC_ALL@r{, environment variable}
1264 @item @code{LC_ALL}
1265 @vindex LC_CTYPE@r{, environment variable}
1266 @vindex LC_NUMERIC@r{, environment variable}
1267 @vindex LC_TIME@r{, environment variable}
1268 @vindex LC_COLLATE@r{, environment variable}
1269 @vindex LC_MONETARY@r{, environment variable}
1270 @vindex LC_MESSAGES@r{, environment variable}
1271 @item @code{LC_xxx}, according to selected locale category:
1272 @code{LC_CTYPE}, @code{LC_NUMERIC}, @code{LC_TIME}, @code{LC_COLLATE},
1273 @code{LC_MONETARY}, @code{LC_MESSAGES}, ...
1274 @vindex LANG@r{, environment variable}
1275 @item @code{LANG}
1276 @end enumerate
1278 Variables whose value is set but is empty are ignored in this lookup.
1280 @code{LANG} is the normal environment variable for specifying a locale.
1281 As a user, you normally set this variable (unless some of the other variables
1282 have already been set by the system, in @file{/etc/profile} or similar
1283 initialization files).
1285 @code{LC_CTYPE}, @code{LC_NUMERIC}, @code{LC_TIME}, @code{LC_COLLATE},
1286 @code{LC_MONETARY}, @code{LC_MESSAGES}, and so on, are the environment
1287 variables meant to override @code{LANG} and affecting a single locale
1288 category only. For example, assume you are a Swedish user in Spain, and you
1289 want your programs to handle numbers and dates according to Spanish
1290 conventions, and only the messages should be in Swedish. Then you could
1291 create a locale named @samp{sv_ES} or @samp{sv_ES.UTF-8} by use of the
1292 @code{localedef} program. But it is simpler, and achieves the same effect,
1293 to set the @code{LANG} variable to @code{es_ES.UTF-8} and the
1294 @code{LC_MESSAGES} variable to @code{sv_SE.UTF-8}; these two locales come
1295 already preinstalled with the operating system.
1297 @code{LC_ALL} is an environment variable that overrides all of these.
1298 It is typically used in scripts that run particular programs. For example,
1299 @code{configure} scripts generated by GNU autoconf use @code{LC_ALL} to make
1300 sure that the configuration tests don't operate in locale dependent ways.
1302 Some systems, unfortunately, set @code{LC_ALL} in @file{/etc/profile} or in
1303 similar initialization files. As a user, you therefore have to unset this
1304 variable if you want to set @code{LANG} and optionally some of the other
1305 @code{LC_xxx} variables.
1307 The @code{LANGUAGE} variable is described in the next subsection.
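The lookup order for a single locale category, as described in this subsection, can be sketched as follows. @code{LANGUAGE} is left out because it holds a list rather than a single locale name. The function names @code{nonempty}, @code{choose_locale} and @code{effective_locale} are hypothetical, and falling back to @samp{C} when nothing is set is an assumption of this sketch, not a statement about any particular system.

```c
#include <assert.h>
#include <stdlib.h>

/* Treat an unset variable and an empty value alike: both are ignored.  */
static const char *
nonempty (const char *value)
{
  return (value != NULL && value[0] != '\0') ? value : NULL;
}

/* Choose among the values of LC_ALL, one LC_xxx variable and LANG, in
   that order.  Falling back to "C" when none is set is an assumption
   of this sketch.  */
const char *
choose_locale (const char *lc_all, const char *lc_xxx, const char *lang)
{
  if (nonempty (lc_all) != NULL)
    return lc_all;
  if (nonempty (lc_xxx) != NULL)
    return lc_xxx;
  if (nonempty (lang) != NULL)
    return lang;
  return "C";
}

/* Look up the locale name in effect for one category variable, for
   example "LC_MESSAGES", from the process environment.  */
const char *
effective_locale (const char *category_variable)
{
  assert (category_variable != NULL);
  return choose_locale (getenv ("LC_ALL"), getenv (category_variable),
                        getenv ("LANG"));
}
```

With @code{LANG} set to @samp{es_ES.UTF-8} and @code{LC_MESSAGES} set to @samp{sv_SE.UTF-8}, as in the example of the Swedish user in Spain, @code{effective_locale ("LC_MESSAGES")} selects the Swedish locale, while the same call for @code{"LC_TIME"} selects the Spanish one.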
1309 @node The LANGUAGE variable, , Locale Environment Variables, Setting the POSIX Locale
1310 @subsection Specifying a Priority List of Languages
1312 Not all programs have translations for all languages. By default, an
1313 English message is shown in place of a nonexistent translation. If you
1314 understand other languages, you can set up a priority list of languages.
1315 This is done through a different environment variable, called
1316 @code{LANGUAGE}. GNU @code{gettext} gives preference to @code{LANGUAGE}
1317 over @code{LC_ALL} and @code{LANG} for the purpose of message handling,
1318 but you still need to have @code{LANG} (or @code{LC_ALL}) set to the primary
1319 language; this is required by other parts of the system libraries.
1320 For example, some Swedish users who would rather read translations in
1321 German than in English when Swedish is not available set @code{LANGUAGE}
1322 to @samp{sv:de} while leaving @code{LANG} set to @samp{sv_SE}.
1324 Special advice for Norwegian users: The language code for Norwegian
1325 bokm@ringaccent{a}l changed from @samp{no} to @samp{nb} recently (in 2003).
1326 During the transition period, while some message catalogs for this language
1327 are installed under @samp{nb} and some older ones under @samp{no}, it is
1328 recommended for Norwegian users to set @code{LANGUAGE} to @samp{nb:no} so that
1329 both newer and older translations are used.
1331 In the @code{LANGUAGE} environment variable, but not in the other
1332 environment variables, @samp{@var{ll}_@var{CC}} combinations can be
1333 abbreviated as @samp{@var{ll}} to denote the language's main dialect.
1334 For example, @samp{de} is equivalent to @samp{de_DE} (German as spoken in
1335 Germany), and @samp{pt} to @samp{pt_PT} (Portuguese as spoken in Portugal)
1336 in this context.
1338 Note: The variable @code{LANGUAGE} is ignored if the locale is set to
1339 @samp{C}. In other words, you have to first enable localization, by setting
1340 @code{LANG} (or @code{LC_ALL}) to a value other than @samp{C}, before you can
1341 use a language priority list through the @code{LANGUAGE} variable.
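A priority list such as @samp{sv:de} is a colon-separated sequence of language names. The sketch below splits such a value and applies the rule from the note above, namely that the list is ignored when the locale is @samp{C}. The function name @code{language_priority_list} and the constant @code{ENTRY_SIZE} are hypothetical, and entries too long for a slot are skipped; this is an illustration, not the code used inside GNU @code{gettext}.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ENTRY_SIZE 32

/* Split the colon-separated LANGUAGE value into at most MAX entries.
   LOCALE is the locale in effect for messages; when it is "C", the
   list is ignored and 0 is returned.  Returns the number of entries
   stored.  */
size_t
language_priority_list (const char *language, const char *locale,
                        char entries[][ENTRY_SIZE], size_t max)
{
  size_t count = 0;

  assert (locale != NULL && entries != NULL);
  if (language == NULL || strcmp (locale, "C") == 0)
    return 0;

  while (*language != '\0' && count < max)
    {
      size_t len = strcspn (language, ":");

      /* Skip empty entries and entries that do not fit a slot.  */
      if (len > 0 && len < ENTRY_SIZE)
        {
          memcpy (entries[count], language, len);
          entries[count][len] = '\0';
          count++;
        }
      language += len;
      if (*language == ':')
        language++;
    }
  return count;
}
```

A caller would typically pass the value of @code{getenv ("LANGUAGE")} together with the locale name in effect for @code{LC_MESSAGES}, and then try the message catalogs of the returned languages in order.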
1343 @node Installing Localizations, , Setting the POSIX Locale, Users
1344 @section Installing Translations for Particular Programs
1345 @cindex Translation Matrix
1346 @cindex available translations
1348 Languages are not equally well supported in all packages using GNU
1349 @code{gettext}, and more translations are added over time. Usually, you
1350 use the translations that are shipped with the operating system
1351 or with particular packages that you install afterwards. But you can also
1352 install newer localizations directly. To do this, you need to
1353 understand where each localization file is stored on the file system.
1355 @cindex @file{ABOUT-NLS} file
1356 For programs that participate in the Translation Project, you can start
1357 looking for translations here:
1358 @url{http://translationproject.org/team/index.html}.
1359 A snapshot of this information is also found in the @file{ABOUT-NLS} file
1360 that is shipped with GNU gettext.
1362 For programs that are part of the KDE project, the starting point is:
1363 @url{http://i18n.kde.org/}.
1365 For programs that are part of the GNOME project, the starting point is:
1366 @url{http://www.gnome.org/i18n/}.
1368 For other programs, you may check whether the program's source code package
1369 contains some @file{@var{ll}.po} files; often they are kept together in a
1370 directory called @file{po/}. Each @file{@var{ll}.po} file contains the
1371 message translations for the language whose abbreviation is @var{ll}.
1373 @node PO Files, Sources, Users, Top
1374 @chapter The Format of PO Files
1375 @cindex PO files' format
1376 @cindex file format, @file{.po}
1378 The GNU @code{gettext} toolset helps programmers and translators
1379 in producing, updating and using translation files, mainly those
1380 PO files which are textual, editable files. This chapter explains
1381 the format of PO files.
1383 A PO file is made up of many entries, each entry holding the relation
1384 between an original untranslated string and its corresponding
1385 translation. All entries in a given PO file usually pertain
1386 to a single project, and all translations are expressed in a single
1387 target language. One PO file @dfn{entry} has the following schematic
1388 structure:
1390 @example
1391 @var{white-space}
1392 # @var{translator-comments}
1393 #. @var{extracted-comments}
1394 #: @var{reference}@dots{}
1395 #, @var{flag}@dots{}
1396 #| msgid @var{previous-untranslated-string}
1397 msgid @var{untranslated-string}
1398 msgstr @var{translated-string}
1399 @end example
1401 The general structure of a PO file should be well understood by
1402 the translator. When using PO mode, very little has to be known
1403 about the format details, as PO mode takes care of them for her.
1405 A simple entry can look like this:
1407 @example
1409 msgid "Unknown system error"
1410 msgstr "Error desconegut del sistema"
1411 @end example
1413 @cindex comments, translator
1414 @cindex comments, automatic
1415 @cindex comments, extracted
1416 Entries begin with some optional white space. Usually, when generated
1417 through GNU @code{gettext} tools, there is exactly one blank line
1418 between entries. Then comments follow, on lines all starting with the
1419 character @code{#}. There are two kinds of comments: those which have
1420 some white space immediately following the @code{#}, called @var{translator
1421 comments}, which are created and maintained exclusively by the
1422 translator, and those which have some non-white character just after the
1423 @code{#}, called @var{automatic comments}, which are created and
1424 maintained automatically by GNU @code{gettext} tools. Comment lines
1425 starting with @code{#.} contain comments given by the programmer, directed
1426 at the translator; these comments are called @var{extracted comments}
1427 because the @code{xgettext} program extracts them from the program's
1428 source code. Comment lines starting with @code{#:} contain references to
1429 the program's source code. Comment lines starting with @code{#,} contain
1430 flags; more about these below. Comment lines starting with @code{#|}
1431 contain the previous untranslated string for which the translator gave
1432 a translation.
1434 All comments, of either kind, are optional.
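The comment kinds listed above are distinguished solely by the character that follows the @code{#}. A minimal classifier could look like the sketch below. The enumeration @code{po_comment_kind} and the function @code{classify_po_line} are hypothetical names, and treating a bare @code{#} line as a translator comment is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

enum po_comment_kind
{
  PO_NOT_A_COMMENT,
  PO_TRANSLATOR_COMMENT,  /* "# "  written by the translator */
  PO_EXTRACTED_COMMENT,   /* "#."  from the programmer, via xgettext */
  PO_REFERENCE,           /* "#:"  references to the source code */
  PO_FLAGS,               /* "#,"  flags */
  PO_PREVIOUS,            /* "#|"  previous untranslated string */
  PO_OTHER_AUTOMATIC      /* '#' followed by another non-white character */
};

/* Classify one line of a PO file by the character after the '#'.  */
enum po_comment_kind
classify_po_line (const char *line)
{
  assert (line != NULL);
  if (line[0] != '#')
    return PO_NOT_A_COMMENT;
  switch (line[1])
    {
    case '.':
      return PO_EXTRACTED_COMMENT;
    case ':':
      return PO_REFERENCE;
    case ',':
      return PO_FLAGS;
    case '|':
      return PO_PREVIOUS;
    case ' ':
    case '\t':
    case '\n':
    case '\0':
      /* White space, or nothing at all, after the '#'.  */
      return PO_TRANSLATOR_COMMENT;
    default:
      return PO_OTHER_AUTOMATIC;
    }
}
```

A tool that preserves translator comments while regenerating the automatic ones only needs this one-character distinction to tell the two groups apart.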
1438 After white space and comments, entries show two strings, namely
1439 first the untranslated string as it appears in the original program
1440 sources, and then, the translation of this string. The original
1441 string is introduced by the keyword @code{msgid}, and the translation,
1442 by @code{msgstr}. The two strings, untranslated and translated,
1443 are quoted in various ways in the PO file, using @code{"}
1444 delimiters and @code{\} escapes, but the translator does not really
1445 have to pay attention to the precise quoting format, as PO mode fully
1446 takes care of quoting for her.
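For readers who edit PO files without such support, the quoting amounts to enclosing the string in @code{"} delimiters and escaping special characters with @code{\}. The following sketch handles only the double quote, the backslash and the newline. The function name @code{po_quote} is hypothetical, and the buffer check is deliberately conservative, so the function may refuse a buffer that would just barely have been large enough.

```c
#include <assert.h>
#include <stddef.h>

/* Write TEXT into OUT as a double-quoted PO string, escaping '"',
   '\\' and newline.  Returns the length written (excluding the
   terminating NUL), or 0 if OUT is considered too small.  */
size_t
po_quote (const char *text, char *out, size_t size)
{
  size_t n = 0;

  assert (text != NULL && out != NULL);
  if (size < 3)
    return 0;
  out[n++] = '"';
  for (; *text != '\0'; text++)
    {
      /* Reserve room for a two-character escape, the closing quote
         and the terminating NUL.  */
      if (n + 4 > size)
        return 0;
      switch (*text)
        {
        case '"':
          out[n++] = '\\';
          out[n++] = '"';
          break;
        case '\\':
          out[n++] = '\\';
          out[n++] = '\\';
          break;
        case '\n':
          out[n++] = '\\';
          out[n++] = 'n';
          break;
        default:
          out[n++] = *text;
          break;
        }
    }
  out[n++] = '"';
  out[n] = '\0';
  return n;
}
```

Writing @code{msgid} followed by a space and the quoted string then gives a line in the form shown in the entries above.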
1448 The @code{msgid} strings, as well as automatic comments, are produced
1449 and managed by other GNU @code{gettext} tools, and PO mode does not
1450 provide means for the translator to alter these. The most she can
1451 do is merely delete them, and only by deleting the whole entry.
1452 On the other hand, the @code{msgstr} string, as well as translator
1453 comments, are really meant for the translator, and PO mode gives her
1454 the full control she needs.
1456 The comment lines beginning with @code{#,} are special because they are
1457 not completely ignored by the programs as comments generally are. The
1458 comma separated list of @var{flag}s is used by the @code{msgfmt}
1459 program to give the user some better diagnostic messages. Currently
1460 there are two forms of flags defined:
1463 @item fuzzy
1464 @kwindex fuzzy@r{ flag}
1465 This flag can be generated by the @code{msgmerge} program or it can be
1466 inserted by the translator herself. It shows that the @code{msgstr}
1467 string might not be a correct translation (anymore). Only the translator
1468 can judge if the translation requires further modification, or is
1469 acceptable as is. Once satisfied with the translation, she then removes
1470 this @code{fuzzy} attribute. The @code{msgmerge} program inserts this
1471 when it combined the @code{msgid} and @code{msgstr} entries after fuzzy
1472 search only. @xref{Fuzzy Entries}.
1475 @kwindex c-format@r{ flag}
1477 @kwindex no-c-format@r{ flag}
1478 These flags should not be added by a human. Instead only the
1479 @code{xgettext} program adds them. In an automated PO file processing
1480 system as proposed here, the user's changes would be thrown away again as
1481 soon as the @code{xgettext} program generates a new template file.
1483 The @code{c-format} flag indicates that the untranslated string and the
1484 translation are supposed to be C format strings. The @code{no-c-format}
1485 flag indicates that they are not C format strings, even though the untranslated
1486 string happens to look like a C format string (with @samp{%} directives).
1488 When the @code{c-format} flag is given for a string the @code{msgfmt}
1489 program does some more tests to check the validity of the translation.
1490 @xref{msgfmt Invocation}, @ref{c-format Flag} and @ref{c-format}.
1493 @kwindex objc-format@r{ flag}
1494 @itemx no-objc-format
1495 @kwindex no-objc-format@r{ flag}
1496 Likewise for Objective C, see @ref{objc-format}.
1499 @kwindex sh-format@r{ flag}
1501 @kwindex no-sh-format@r{ flag}
1502 Likewise for Shell, see @ref{sh-format}.
1505 @kwindex python-format@r{ flag}
1506 @itemx no-python-format
1507 @kwindex no-python-format@r{ flag}
1508 Likewise for Python, see @ref{python-format}.
1510 @item python-brace-format
1511 @kwindex python-brace-format@r{ flag}
1512 @itemx no-python-brace-format
1513 @kwindex no-python-brace-format@r{ flag}
1514 Likewise for Python brace, see @ref{python-format}.
1517 @kwindex lisp-format@r{ flag}
1518 @itemx no-lisp-format
1519 @kwindex no-lisp-format@r{ flag}
1520 Likewise for Lisp, see @ref{lisp-format}.
1523 @kwindex elisp-format@r{ flag}
1524 @itemx no-elisp-format
1525 @kwindex no-elisp-format@r{ flag}
1526 Likewise for Emacs Lisp, see @ref{elisp-format}.
1529 @kwindex librep-format@r{ flag}
1530 @itemx no-librep-format
1531 @kwindex no-librep-format@r{ flag}
1532 Likewise for librep, see @ref{librep-format}.
1535 @kwindex scheme-format@r{ flag}
1536 @itemx no-scheme-format
1537 @kwindex no-scheme-format@r{ flag}
1538 Likewise for Scheme, see @ref{scheme-format}.
1540 @item smalltalk-format
1541 @kwindex smalltalk-format@r{ flag}
1542 @itemx no-smalltalk-format
1543 @kwindex no-smalltalk-format@r{ flag}
1544 Likewise for Smalltalk, see @ref{smalltalk-format}.
1547 @kwindex java-format@r{ flag}
1548 @itemx no-java-format
1549 @kwindex no-java-format@r{ flag}
1550 Likewise for Java, see @ref{java-format}.
1553 @kwindex csharp-format@r{ flag}
1554 @itemx no-csharp-format
1555 @kwindex no-csharp-format@r{ flag}
1556 Likewise for C#, see @ref{csharp-format}.
1559 @kwindex awk-format@r{ flag}
1560 @itemx no-awk-format
1561 @kwindex no-awk-format@r{ flag}
1562 Likewise for awk, see @ref{awk-format}.
1564 @item object-pascal-format
1565 @kwindex object-pascal-format@r{ flag}
1566 @itemx no-object-pascal-format
1567 @kwindex no-object-pascal-format@r{ flag}
1568 Likewise for Object Pascal, see @ref{object-pascal-format}.
1571 @kwindex ycp-format@r{ flag}
1572 @itemx no-ycp-format
1573 @kwindex no-ycp-format@r{ flag}
1574 Likewise for YCP, see @ref{ycp-format}.
1577 @kwindex tcl-format@r{ flag}
1578 @itemx no-tcl-format
1579 @kwindex no-tcl-format@r{ flag}
1580 Likewise for Tcl, see @ref{tcl-format}.
1583 @kwindex perl-format@r{ flag}
1584 @itemx no-perl-format
1585 @kwindex no-perl-format@r{ flag}
1586 Likewise for Perl, see @ref{perl-format}.
1588 @item perl-brace-format
1589 @kwindex perl-brace-format@r{ flag}
1590 @itemx no-perl-brace-format
1591 @kwindex no-perl-brace-format@r{ flag}
1592 Likewise for Perl brace, see @ref{perl-format}.
1595 @kwindex php-format@r{ flag}
1596 @itemx no-php-format
1597 @kwindex no-php-format@r{ flag}
1598 Likewise for PHP, see @ref{php-format}.
1600 @item gcc-internal-format
1601 @kwindex gcc-internal-format@r{ flag}
1602 @itemx no-gcc-internal-format
1603 @kwindex no-gcc-internal-format@r{ flag}
1604 Likewise for the GCC sources, see @ref{gcc-internal-format}.
1606 @item gfc-internal-format
1607 @kwindex gfc-internal-format@r{ flag}
1608 @itemx no-gfc-internal-format
1609 @kwindex no-gfc-internal-format@r{ flag}
1610 Likewise for the GNU Fortran Compiler sources, see @ref{gfc-internal-format}.
1613 @kwindex qt-format@r{ flag}
1615 @kwindex no-qt-format@r{ flag}
1616 Likewise for Qt, see @ref{qt-format}.
1618 @item qt-plural-format
1619 @kwindex qt-plural-format@r{ flag}
1620 @itemx no-qt-plural-format
1621 @kwindex no-qt-plural-format@r{ flag}
1622 Likewise for Qt plural forms, see @ref{qt-plural-format}.
1625 @kwindex kde-format@r{ flag}
1626 @itemx no-kde-format
1627 @kwindex no-kde-format@r{ flag}
1628 Likewise for KDE, see @ref{kde-format}.
1631 @kwindex boost-format@r{ flag}
1632 @itemx no-boost-format
1633 @kwindex no-boost-format@r{ flag}
1634 Likewise for Boost, see @ref{boost-format}.
1637 @kwindex lua-format@r{ flag}
1638 @itemx no-lua-format
1639 @kwindex no-lua-format@r{ flag}
1640 Likewise for Lua, see @ref{lua-format}.
1642 @item javascript-format
1643 @kwindex javascript-format@r{ flag}
1644 @itemx no-javascript-format
1645 @kwindex no-javascript-format@r{ flag}
1646 Likewise for JavaScript, see @ref{javascript-format}.
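As an illustration of the flags described above, here is a sketch of how
a flagged entry might look.  The source reference, the message and its
translation are invented for this example.  The @code{fuzzy} flag signals
that the translation still awaits review by the translator; once she is
satisfied, only @code{c-format} would remain on the flag line.

@example
#: src/hello.c:42
#, fuzzy, c-format
msgid "Found %d matching file.\n"
msgstr "%d fichier correspondant.\n"
@end example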
1651 @cindex context, in PO files
1652 It is also possible to have entries with a context specifier. They look like
1653 this:
1655 @example
1656 @var{white-space}
1657 # @var{translator-comments}
1658 #. @var{extracted-comments}
1659 #: @var{reference}@dots{}
1660 #, @var{flag}@dots{}
1661 #| msgctxt @var{previous-context}
1662 #| msgid @var{previous-untranslated-string}
1663 msgctxt @var{context}
1664 msgid @var{untranslated-string}
1665 msgstr @var{translated-string}
1666 @end example
1668 The context serves to disambiguate messages with the same
1669 @var{untranslated-string}. It is possible to have several entries with
1670 the same @var{untranslated-string} in a PO file, provided that they each
1671 have a different @var{context}. Note that an empty @var{context} string
1672 and an absent @code{msgctxt} line do not mean the same thing.
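A minimal sketch of two such entries follows.  The contexts, the message
and the French translations are invented for illustration; the point is
only that the same @code{msgid} may appear twice when the @code{msgctxt}
lines differ, and that each occurrence can then be translated differently.

@example
msgctxt "menu item"
msgid "Open"
msgstr "Ouvrir"

msgctxt "door status"
msgid "Open"
msgstr "Ouverte"
@end example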
1674 @kwindex msgid_plural
1675 @cindex plural forms, in PO files
1676 A different kind of entry is used for translations which involve
1677 plural forms.
1679 @example
1680 @var{white-space}
1681 # @var{translator-comments}
1682 #. @var{extracted-comments}
1683 #: @var{reference}@dots{}
1684 #, @var{flag}@dots{}
1685 #| msgid @var{previous-untranslated-string-singular}
1686 #| msgid_plural @var{previous-untranslated-string-plural}
1687 msgid @var{untranslated-string-singular}
1688 msgid_plural @var{untranslated-string-plural}
1689 msgstr[0] @var{translated-string-case-0}
1690 ...
1691 msgstr[N] @var{translated-string-case-n}
1692 @end example
1694 Such an entry can look like this:
1696 @example
1697 #: src/msgcmp.c:338 src/po-lex.c:699
1698 #, c-format
1699 msgid "found %d fatal error"
1700 msgid_plural "found %d fatal errors"
1701 msgstr[0] "s'ha trobat %d error fatal"
1702 msgstr[1] "s'han trobat %d errors fatals"
1703 @end example
1705 Here also, a @code{msgctxt} context can be specified before @code{msgid},
1706 like above.
1708 Here, additional kinds of flags can be used:
1710 @table @code
1711 @item range:
1712 @kwindex range:@r{ flag}
1713 This flag is followed by a range of non-negative numbers, using the syntax
1714 @code{range: @var{minimum-value}..@var{maximum-value}}. It designates the
1715 possible values that the numeric parameter of the message can take. In some
1716 languages, translators may produce slightly better translations if they know
1717 that the value can only take on values between 0 and 10, for example.
1718 @end table
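For illustration, a plural entry carrying a @code{range:} flag might look
like the following sketch.  The message is invented and the translations
are left empty; the flag tells the translator that the number shown never
exceeds 10.

@example
#, c-format, range: 0..10
msgid "%d player joined"
msgid_plural "%d players joined"
msgstr[0] ""
msgstr[1] ""
@end example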
1720 The @var{previous-untranslated-string} is optionally inserted by the
1721 @code{msgmerge} program, at the same time as it marks a message fuzzy.
1722 It helps the translator to see which changes the developers made
1723 to the @var{untranslated-string}.
1725 It happens that some lines, usually whitespace or comments, follow the
1726 very last entry of a PO file. Such lines are not part of any entry,
1727 and will be dropped when the PO file is processed by the tools, or may
1728 disturb some PO file editors.
1730 The remainder of this section may be safely skipped by those using
1731 a PO file editor, yet it may be interesting for everybody to have a better
1732 idea of the precise format of a PO file. On the other hand, those
1733 wishing to modify PO files by hand should carefully continue reading on.
1735 An empty @var{untranslated-string} is reserved to contain the header
1736 entry with the meta information (@pxref{Header Entry}). This header
1737 entry should be the first entry of the file. The empty
1738 @var{untranslated-string} is reserved for this purpose and must
1739 not be used anywhere else.
1741 Each of @var{untranslated-string} and @var{translated-string} respects
1742 the C syntax for a character string, including the surrounding quotes
1743 and embedded backslashed escape sequences. When the time comes
1744 to write multi-line strings, one should not use escaped newlines.
1745 Instead, a closing quote should follow the last character on the
1746 line to be continued, and an opening quote should resume the string
1747 at the beginning of the following PO file line. For example:
1749 @example
1750 msgid ""
1751 "Here is an example of how one might continue a very long string\n"
1752 "for the common case the string represents multi-line output.\n"
1753 @end example
1756 In this example, the empty string is used on the first line, to
1757 allow better alignment of the @code{H} from the word @samp{Here}
1758 over the @code{f} from the word @samp{for}. In this example, the
1759 @code{msgid} keyword is followed by three strings, which are meant
1760 to be concatenated. Concatenating the empty string does not change
1761 the resulting overall string, but it is a way for us to comply with
1762 the necessity of @code{msgid} to be followed by a string on the same
1763 line, while keeping the multi-line presentation left-justified, as
1764 we find this to be a cleaner disposition. The empty string could have
1765 been omitted, but only if the string starting with @samp{Here} was
1766 promoted on the first line, right after @code{msgid}.@footnote{This
1767 limitation is not imposed by GNU @code{gettext}, but is for compatibility
1768 with the @code{msgfmt} implementation on Solaris.} Nor was it really necessary
1769 to switch between the last two quoted strings immediately after
1770 the newline @samp{\n}; the switch could have occurred after @emph{any}
1771 other character. We just did it this way because it is neater.
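Since PO strings follow the C syntax for character strings, the
concatenation rule can be checked with a small C sketch.  The variable
and function names below are invented for illustration, and the text is
shortened.

@example
#include <string.h>

/* Three adjacent string literals, laid out like the PO example:
   an empty string first, then one literal per output line.  */
static const char continued[] =
  ""
  "first line\n"
  "second line\n";

/* The same text written as a single literal.  */
static const char single[] = "first line\nsecond line\n";

/* Returns 1 when both spellings denote the same string, which is
   the case here because the empty literal contributes nothing.  */
int
strings_are_identical (void)
@{
  return strcmp (continued, single) == 0;
@}
@end example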
1773 @cindex newlines in PO files
1774 One should carefully distinguish between end of lines marked as
1775 @samp{\n} @emph{inside} quotes, which are part of the represented
1776 string, and end of lines in the PO file itself, outside string quotes,
1777 which have no effect on the represented string.
1779 @cindex comments in PO files
1780 Outside strings, blank lines and comments may be used freely.
1781 Comments start at the beginning of a line with @samp{#} and extend
1782 until the end of the PO file line. Comments written by translators
1783 should have the initial @samp{#} immediately followed by some white
1784 space. If the @samp{#} is not immediately followed by white space,
1785 this comment is most likely generated and managed by specialized GNU
1786 tools, and might disappear or be replaced unexpectedly when the PO
1787 file is given to @code{msgmerge}.
1789 @node Sources, Template, PO Files, Top
1790 @chapter Preparing Program Sources
1791 @cindex preparing programs for translation
1793 @c FIXME: Rewrite (the whole chapter).
1795 For the programmer, changes to the C source code fall into three
1796 categories. First, you have to make the localization functions
1797 known to all modules needing message translation. Second, you should
1798 properly trigger the operation of GNU @code{gettext} when the program
1799 initializes, usually from the @code{main} function. Last, you should
1800 identify, adjust and mark all constant strings in your program
1801 needing translation.
1803 @menu
1804 * Importing:: Importing the @code{gettext} declaration
1805 * Triggering:: Triggering @code{gettext} Operations
1806 * Preparing Strings:: Preparing Translatable Strings
1807 * Mark Keywords:: How Marks Appear in Sources
1808 * Marking:: Marking Translatable Strings
1809 * c-format Flag:: Telling something about the following string
1810 * Special cases:: Special Cases of Translatable Strings
1811 * Bug Report Address:: Letting Users Report Translation Bugs
1812 * Names:: Marking Proper Names for Translation
1813 * Libraries:: Preparing Library Sources
1814 @end menu
1816 @node Importing, Triggering, Sources, Sources
1817 @section Importing the @code{gettext} declaration
1819 Presuming that your set of programs, or package, has been adjusted
1820 so all needed GNU @code{gettext} files are available, and your
1821 @file{Makefile} files are adjusted (@pxref{Maintainers}), each C module
1822 having translated C strings should contain the line:
1824 @cindex include file @file{libintl.h}
1825 @example
1826 #include <libintl.h>
1827 @end example
1829 Similarly, each C module containing @code{printf()}/@code{fprintf()}/...
1830 calls with a format string that could be a translated C string (even if
1831 the C string comes from a different C module) should contain the line:
1833 @example
1834 #include <libintl.h>
1835 @end example
1837 @node Triggering, Preparing Strings, Importing, Sources
1838 @section Triggering @code{gettext} Operations
1840 @cindex initialization
1841 The initialization of locale data should be done with more or less
1842 the same code in every program, as demonstrated below:
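1844 @example
1845 @group
1846 int
1847 main (int argc, char *argv[])
1848 @{
1849   @dots{}
1850   setlocale (LC_ALL, "");
1851   bindtextdomain (PACKAGE, LOCALEDIR);
1852   textdomain (PACKAGE);
1853   @dots{}
1854 @}
1855 @end group
1856 @end example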
1858 @var{PACKAGE} and @var{LOCALEDIR} should be provided either by
1859 @file{config.h} or by the Makefile. For now consult the @code{gettext}
1860 or @code{hello} sources for more information.
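As a rough, self-contained sketch of the initialization described above,
the three calls can be collected in one function.  The function name
@code{init_i18n} and the values chosen for @code{PACKAGE} and
@code{LOCALEDIR} are assumptions made for this example only; a real
package would take them from @file{config.h} or from the Makefile.

@example
#include <locale.h>
#include <libintl.h>

/* Hypothetical values, normally supplied by the build system.  */
#define PACKAGE "hello"
#define LOCALEDIR "/usr/share/locale"

/* Sets up the locale and the message domain.  Returns the name of
   the current message domain as reported by textdomain().  */
const char *
init_i18n (void)
@{
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  return textdomain (PACKAGE);
@}
@end example

A program would typically call such a function once, near the beginning
of @code{main}, before producing any translatable output.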
1862 @cindex locale category, LC_ALL
1863 @cindex locale category, LC_CTYPE
1864 The use of @code{LC_ALL} might not be appropriate for you.
1865 @code{LC_ALL} includes all locale categories and especially
1866 @code{LC_CTYPE}. This latter category is responsible for determining
1867 character classes with the @code{isalnum} etc. functions from
1868 @file{ctype.h}, which can give wrong results, especially in programs that
1869 process some kind of input language. For example, this would mean that
1870 source code using the @,{c} (c-cedilla character) is runnable in
1871 France but not in the U.S.
1873 Some systems also have problems with parsing numbers using the
1874 @code{scanf} functions if a locale category other than @code{LC_ALL} is
1875 used. The standards say that formats in addition to the one known in the
1876 @code{"C"} locale might be recognized. But some systems seem to reject
1877 numbers in the @code{"C"} locale format. In some situations, the
1878 notation itself may also be a problem, because it can make it impossible to
1879 recognize whether the number is in the @code{"C"} locale or the local
1880 format. This can happen if thousands separator characters are used.
1881 Some locales define this character, according to the national
1882 conventions, as @code{'.'}, which is the same character used in the
1883 @code{"C"} locale to denote the decimal point.
1885 So it is sometimes necessary to replace the @code{LC_ALL} line in the
1886 code above by a sequence of @code{setlocale} lines
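1888 @example
1889 @group
1890 @{
1891   @dots{}
1892   setlocale (LC_CTYPE, "");
1893   setlocale (LC_MESSAGES, "");
1894   @dots{}
1895 @}
1896 @end group
1897 @end example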
1899 @cindex locale category, LC_CTYPE
1900 @cindex locale category, LC_COLLATE
1901 @cindex locale category, LC_MONETARY
1902 @cindex locale category, LC_NUMERIC
1903 @cindex locale category, LC_TIME
1904 @cindex locale category, LC_MESSAGES
1905 @cindex locale category, LC_RESPONSES
1907 On all POSIX conformant systems the locale categories @code{LC_CTYPE},
1908 @code{LC_MESSAGES}, @code{LC_COLLATE}, @code{LC_MONETARY},
1909 @code{LC_NUMERIC}, and @code{LC_TIME} are available. On some systems
1910 which are only ISO C compliant, @code{LC_MESSAGES} is missing, but
1911 a substitute for it is defined in GNU gettext's @code{<libintl.h>} and
1912 in GNU gnulib's @code{<locale.h>}.
1914 Note that changing the @code{LC_CTYPE} also affects the functions
1915 declared in the @code{<ctype.h>} standard header and some functions
1916 declared in the @code{<string.h>} and @code{<stdlib.h>} standard headers.
1917 If this is not
1918 desirable in your application (for example in a compiler's parser),
1919 you can use a set of substitute functions which hardwire the C locale,
1920 such as found in the modules @samp{c-ctype}, @samp{c-strcase},
1921 @samp{c-strcasestr}, @samp{c-strtod}, @samp{c-strtold} in the GNU gnulib
1922 source distribution.
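To illustrate what hardwiring the C locale means, here is a minimal
sketch of such a substitute function.  It is not the gnulib
implementation; the name @code{my_c_isalnum} is invented, and the sketch
assumes an ASCII-compatible execution character set.

@example
#include <stdbool.h>

/* Locale-independent replacement for isalnum(): only ASCII letters
   and digits qualify, whatever LC_CTYPE is set to.  */
bool
my_c_isalnum (int c)
@{
  return (c >= '0' && c <= '9')
         || (c >= 'a' && c <= 'z')
         || (c >= 'A' && c <= 'Z');
@}
@end example

A parser built on such a function classifies a byte like 0xE7 the same
way in every locale, which is usually what a compiler wants.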
1924 It is also possible to switch the locale back and forth between the
1925 environment dependent locale and the C locale, but this approach is
1926 normally avoided because a @code{setlocale} call is expensive,
1927 because it is tedious to determine the places where a locale switch
1928 is needed in a large program's source, and because switching a locale
1929 is not multithread-safe.
1931 @node Preparing Strings, Mark Keywords, Triggering, Sources
1932 @section Preparing Translatable Strings
1934 @cindex marking strings, preparations
1935 Before strings can be marked for translation, they sometimes need to
1936 be adjusted. Usually preparing a string for translation is done right
1937 before marking it, during the marking phase which is described in the
1938 next sections. What you have to keep in mind while doing that is the
1939 following.
1941 @itemize @bullet
1942 @item
1943 Decent English style.
1945 @item
1946 Entire sentences.
1948 @item
1949 Split at paragraphs.
1951 @item
1952 Use format strings instead of string concatenation.
1954 @item
1955 Avoid unusual markup and unusual control characters.
1956 @end itemize
1959 Let's look at some examples of these guidelines.
1962 Translatable strings should be in good English style. If slang language
1963 with abbreviations and shortcuts is used, often translators will not
1964 understand the message and will produce very inappropriate translations.
1966 @example
1967 "%s: is parameter\n"
1968 @end example
1971 This is nearly untranslatable: Is the displayed item @emph{a} parameter or
1972 @emph{the} parameter?
1974 @example
1975 "No match"
1976 @end example
1979 The ambiguity in this message makes it unintelligible: Is the program
1980 attempting to set something on fire? Does it mean "The given object does
1981 not match the template"? Does it mean "The template does not fit for any
1982 of the objects"?
1985 In both cases, adding more words to the message will help both the
1986 translator and the English speaking user.
1989 Translatable strings should be entire sentences. It is often not possible
1990 to translate single verbs or adjectives in a substitutable way.
1992 @example
1993 printf ("File %s is %s protected", filename, rw ? "write" : "read");
1994 @end example
1997 Most translators will not look at the source and will thus only see the
1998 string @code{"File %s is %s protected"}, which is unintelligible. Change
1999 this to
2001 @example
2002 printf (rw ? "File %s is write protected" : "File %s is read protected",
2003         filename);
2004 @end example
2007 This way the translator will not only understand the message, she will
2008 also be able to find the appropriate grammatical construction. A French
2009 translator for example translates "write protected" like "protected
2010 against writing".
2012 Entire sentences are also important because in many languages, the
2013 declension of some word in a sentence depends on the gender or the
2014 number (singular/plural) of another part of the sentence. There are
2015 usually more interdependencies between words than in English. The
2016 consequence is that asking a translator to translate two half-sentences
2017 and then combining these two half-sentences through dumb string concatenation
2018 will not work, for many languages, even though it would work for English.
2019 That's why translators need to handle entire sentences.
2021 Often sentences don't fit into a single line. If a sentence is output
2022 using two subsequent @code{printf} statements, like this
2024 @example
2025 printf ("Locale charset \"%s\" is different from\n", lcharset);
2026 printf ("input file charset \"%s\".\n", fcharset);
2027 @end example
2030 the translator would have to translate two half sentences, but nothing
2031 in the POT file would tell her that the two half sentences belong together.
2032 It is necessary to merge the two @code{printf} statements so that the
2033 translator can handle the entire sentence at once and decide at which
2034 place to insert a line break in the translation (if at all):
2036 @example
2037 printf ("Locale charset \"%s\" is different from\n\
2038 input file charset \"%s\".\n", lcharset, fcharset);
2039 @end example
2041 You may now ask: how about two or more adjacent sentences? Like in this case:
2043 @example
2044 puts ("Apollo 13 scenario: Stack overflow handling failed.");
2045 puts ("On the next stack overflow we will crash!!!");
2046 @end example
2049 Should these two statements be merged into a single one? I would recommend
2050 merging them if the two sentences are related to each other, because then it
2051 makes it easier for the translator to understand and translate both. On
2052 the other hand, if one of the two messages is a stereotypic one, occurring
2053 in other places as well, you will do a favour to the translator by not
2054 merging the two. (Identical messages occurring in several places are
2055 combined by xgettext, so the translator has to handle them once only.)
2058 Translatable strings should be limited to one paragraph; don't let a
2059 single message be longer than ten lines. The reason is that when the
2060 translatable string changes, the translator is faced with the task of
2061 updating the entire translated string. Maybe only a single word will
2062 have changed in the English string, but the translator doesn't see that
2063 (with the current translation tools), therefore she has to proofread
2064 the entire message.
2067 Many GNU programs have a @samp{--help} output that extends over several
2068 screen pages. It is a courtesy towards the translators to split such a
2069 message into several ones of five to ten lines each. While doing that,
2070 you can also attempt to split the documented options into groups,
2071 such as the input options, the output options, and the informative
2072 output options. This will help every user to find the option he is
2073 looking for.
2075 @cindex string concatenation
2076 @cindex concatenation of strings
2077 Hardcoded string concatenation is sometimes used to construct English
2078 strings:
2080 @example
2081 strcpy (s, "Replace ");
2082 strcat (s, object1);
2083 strcat (s, " with ");
2084 strcat (s, object2);
2085 strcat (s, "?");
2086 @end example
2089 In order to present to the translator only entire sentences, and also
2090 because in some languages the translator might want to swap the order
2091 of @code{object1} and @code{object2}, it is necessary to change this
2092 to use a format string:
2094 @example
2095 sprintf (s, "Replace %s with %s?", object1, object2);
2096 @end example
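The following sketch shows why a format string helps when the order of
the two objects has to change.  It assumes a @code{printf} implementation
that supports the POSIX numbered directives such as @samp{%1$s}; the
function name @code{format_question} is invented, and its @var{format}
argument stands in for whatever a translation would supply.

@example
#include <stdio.h>

/* Formats a question about replacing OBJECT1 with OBJECT2 using
   FORMAT, which plays the role of the (possibly translated) format
   string.  Returns the value of snprintf().  */
int
format_question (char *buf, size_t size, const char *format,
                 const char *object1, const char *object2)
@{
  return snprintf (buf, size, format, object1, object2);
@}
@end example

With the format @samp{Replace %s with %s?} the objects appear in their
original order, while a format such as @samp{%2$s <- %1$s} puts
@code{object2} first without any change to the C code.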
2098 @cindex @code{inttypes.h}
2099 A similar case is compile time concatenation of strings. The ISO C 99
2100 include file @code{<inttypes.h>} contains a macro @code{PRId64} that
2101 can be used as a formatting directive for outputting an @samp{int64_t}
2102 integer through @code{printf}. It expands to a constant string, usually
2103 "d" or "ld" or "lld" or something like this, depending on the platform.
2104 Assume you have code like
2106 @example
2107 printf ("The amount is %0" PRId64 "\n", number);
2108 @end example
2111 The @code{gettext} tools and library have special support for these
2112 @code{<inttypes.h>} macros. You can therefore simply write
2114 @example
2115 printf (gettext ("The amount is %0" PRId64 "\n"), number);
2116 @end example
2119 The PO file will contain the string "The amount is %0<PRId64>\n".
2120 The translators will provide a translation containing "%0<PRId64>"
2121 as well, and at runtime the @code{gettext} function's result will
2122 contain the appropriate constant string, "d" or "ld" or "lld".
2124 This works only for the predefined @code{<inttypes.h>} macros. If
2125 you have defined your own similar macros, let's say @samp{MYPRId64},
2126 that are not known to @code{xgettext}, the solution for this problem
2127 is to change the code like this:
2129 @example
2130 char buf1[100];
2131 sprintf (buf1, "%0" MYPRId64, number);
2132 printf (gettext ("The amount is %s\n"), buf1);
2133 @end example
2135 This means that you put the platform dependent code in one statement, and the
2136 internationalization code in a different statement. Note that a buffer length
2137 of 100 is safe, because all available hardware integer types are limited to
2138 128 bits, and to print a 128-bit integer one needs at most 54 characters,
2139 regardless of whether it is printed in decimal, octal or hexadecimal.
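A self-contained sketch of this two-step approach could look as follows.
The macro @code{MYPRId64} is defined here in terms of @code{PRId64}
purely for the sake of the example, and the function name
@code{format_amount} is likewise invented.

@example
#include <inttypes.h>
#include <stdio.h>
#include <libintl.h>

/* Hypothetical project-specific macro, unknown to xgettext.  */
#define MYPRId64 PRId64

/* Writes the message for NUMBER into RESULT and returns the value
   of the final snprintf() call.  */
int
format_amount (char *result, size_t size, int64_t number)
@{
  char buf1[100];

  /* Platform dependent part: no translatable text involved.  */
  snprintf (buf1, sizeof buf1, "%" MYPRId64, number);
  /* Internationalized part: xgettext sees a plain format string.  */
  return snprintf (result, size, gettext ("The amount is %s\n"), buf1);
@}
@end example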
2141 @cindex Java, string concatenation
2142 @cindex C#, string concatenation
2143 All this applies to other programming languages as well. For example, in
2144 Java and C#, string concatenation is very frequently used, because it is a
2145 compiler built-in operator. Like in C, in Java, you would change
2147 @example
2148 System.out.println("Replace "+object1+" with "+object2+"?");
2149 @end example
2152 into a statement involving a format string:
2154 @example
2155 System.out.println(
2156     MessageFormat.format("Replace @{0@} with @{1@}?",
2157                          new Object[] @{ object1, object2 @}));
2158 @end example
2161 Similarly, in C#, you would change
2163 @example
2164 Console.WriteLine("Replace "+object1+" with "+object2+"?");
2165 @end example
2168 into a statement involving a format string:
2170 @example
2171 Console.WriteLine(
2172     String.Format("Replace @{0@} with @{1@}?", object1, object2));
2173 @end example
2176 @cindex control characters
2177 Unusual markup or control characters should not be used in translatable
2178 strings. Translators will likely not understand the particular meaning
2179 of the markup or control characters.
2181 For example, if you have a convention that @samp{|} delimits the
2182 left-hand and right-hand part of some GUI elements, translators will
2183 often not understand it without specific comments. It might be
2184 better to have the translator translate the left-hand and right-hand
2185 part separately.
2187 Another example is the @samp{argp} convention to use a single @samp{\v}
2188 (vertical tab) control character to delimit two sections inside a
2189 string. This is flawed. Some translators may convert it to a simple
2190 newline, some to blank lines. With some PO file editors it may not be
2191 easy to even enter a vertical tab control character. So, you cannot
2192 be sure that the translation will contain a @samp{\v} character, at the
2193 corresponding position. The solution is, again, to let the translator
2194 translate two separate strings and combine at run-time the two translated
2195 strings with the @samp{\v} required by the convention.
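The run-time combination suggested above can be sketched as follows.
The function name @code{join_argp_doc} and its parameters are invented
for illustration; a caller would pass the results of two separate
@code{gettext} calls as the two sections.

@example
#include <stdio.h>

/* Joins two separately translated sections with the vertical tab
   that the argp convention expects between them.  Returns the value
   of snprintf().  */
int
join_argp_doc (char *buf, size_t size,
               const char *before_options, const char *after_options)
@{
  return snprintf (buf, size, "%s\v%s", before_options, after_options);
@}
@end example

This way the @samp{\v} character never appears in a translatable string,
so no translator can lose or alter it.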
2197 HTML markup, however, is common enough that it's probably ok to use in
2198 translatable strings. But please bear in mind that the GNU gettext tools
2199 don't verify that the translations are well-formed HTML.
2201 @node Mark Keywords, Marking, Preparing Strings, Sources
2202 @section How Marks Appear in Sources
2203 @cindex marking strings that require translation
2205 All strings requiring translation should be marked in the C sources. Marking
2206 is done in such a way that each translatable string appears to be
2207 the sole argument of some function or preprocessor macro. There are
2208 only a few such possible functions or macros meant for translation,
2209 and their names are said to be marking keywords. The marking is
2210 attached to strings themselves, rather than to what we do with them.
2211 This approach has more uses. A blatant example is an error message
2212 produced by formatting. The format string needs translation, as
2213 well as some strings inserted through some @samp{%s} specification
2214 in the format, while the result from @code{sprintf} may have so many
2215 different instances that it is impractical to list them all in some
2216 @samp{error_string_out()} routine, say.
2218 This marking operation has two goals. The first goal of marking
2219 is for triggering the retrieval of the translation, at run time.
2220 The keyword is possibly resolved into a routine able to dynamically
2221 return the proper translation, as far as possible or wanted, for the
2222 argument string. Most localizable strings are found in executable
2223 positions, that is, attached to variables or given as parameters to
2224 functions. But this is not universal usage, and some translatable
2225 strings appear in structured initializations. @xref{Special cases}.
2227 The second goal of the marking operation is to help @code{xgettext}
2228 properly extract all translatable strings when it scans a set
2229 of program sources and produces PO file templates.
2231 The canonical keyword for marking translatable strings is
2232 @samp{gettext}; it gave its name to the whole GNU @code{gettext}
2233 package. For packages making only light use of the @samp{gettext}
2234 keyword, macro or function, it is easily used @emph{as is}. However,
2235 for packages using the @code{gettext} interface more heavily, it
2236 is usually more convenient to give the main keyword a shorter, less
2237 obtrusive name. Indeed, the keyword might appear on a lot of strings
2238 all over the package, and programmers usually do not want nor need
2239 their program sources to remind them forcefully, all the time, that they
2240 are internationalized. Further, a long keyword has the disadvantage
2241 of using more horizontal space, forcing more indentation work on
2242 sources for those trying to keep them within 79 or 80 columns.
2244 @cindex @code{_}, a macro to mark strings for translation
2245 Many packages use @samp{_} (a simple underline) as a keyword,
2246 and write @samp{_("Translatable string")} instead of @samp{gettext
2247 ("Translatable string")}. Further, the rule from the GNU coding standards
2248 that requires a space between the keyword and the opening
2249 parenthesis is relaxed, in practice, for this particular usage.
2250 So, the textual overhead per translatable string is reduced to
2251 only three characters: the underline and the two parentheses.
2252 However, even if GNU @code{gettext} uses this convention internally,
2253 it does not offer it officially. The real, genuine keyword is truly
2254 @samp{gettext} indeed. It is fairly easy for those wanting to use
2255 @samp{_} instead of @samp{gettext} to declare:
2257 @example
2258 #include <libintl.h>
2259 #define _(String) gettext (String)
2260 @end example
2263 instead of merely using @samp{#include <libintl.h>}.
2265 The marking keywords @samp{gettext} and @samp{_} take the translatable
2266 string as sole argument. It is also possible to define marking functions
2267 that take it at another argument position. It is even possible to make
2268 the marked argument position depend on the total number of arguments of
2269 the function call; this is useful in C++. All this is achieved using
2270 @code{xgettext}'s @samp{--keyword} option. How to pass such an option
2271 to @code{xgettext}, assuming that @code{gettextize} is used, is described
2272 in @ref{po/Makevars} and @ref{AM_XGETTEXT_OPTION}.
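As a sketch of a marking function that takes the translatable string at
another argument position, consider the following.  The function
@code{report} and the file name mentioned in the comment are invented
for this example; the @samp{--keyword=report:2} specification tells
@code{xgettext} that the second argument is the one to extract.

@example
#include <stdio.h>
#include <libintl.h>

/* Hypothetical marking function: the translatable string is the
   second argument, so xgettext would be invoked with something like
     xgettext --keyword=report:2 prog.c
   Returns the value of fprintf().  */
int
report (int severity, const char *msgid)
@{
  return fprintf (stderr, "[%d] %s\n", severity, gettext (msgid));
@}
@end example

A call such as @code{report (1, "disk full")} then both marks the string
for extraction and triggers its translation at run time.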
2274 Note also that long strings can be split across lines, into multiple
2275 adjacent string tokens. Automatic string concatenation is performed
2276 at compile time according to ISO C and ISO C++; @code{xgettext} also
2277 supports this syntax.
2279 Later on, the maintenance is relatively easy. If, as a programmer,
2280 you add or modify a string, you will have to ask yourself if the
2281 new or altered string requires translation, and include it within
2282 @samp{_()} if you think it should be translated. For example, @samp{"%s"}
2283 is a string @emph{not} requiring translation. But
2284 @samp{"%s: %d"} @emph{does} require translation, because in French, unlike
2285 in English, it's customary to put a space before a colon.
2287 @node Marking, c-format Flag, Mark Keywords, Sources
2288 @section Marking Translatable Strings
2289 @emindex marking strings for translation
2291 In PO mode, one set of features is meant more for the programmer than
2292 for the translator, and allows him to interactively mark which strings,
2293 in a set of program sources, are translatable, and which are not.
2294 Even if it is a fairly easy job for a programmer to find and mark
2295 such strings by other means, using any editor of his choice, PO mode
2296 makes this work more comfortable. Further, this gives translators
2297 who feel a little like programmers, or programmers who feel a little
2298 like translators, a tool letting them work at marking translatable
2299 strings in the program sources, while simultaneously producing a set of
translations in some language, for the package being internationalized.
2302 @emindex @code{etags}, using for marking strings
The set of program sources, targeted by the PO mode commands described
2304 here, should have an Emacs tags table constructed for your project,
2305 prior to using these PO file commands. This is easy to do. In any
2306 shell window, change the directory to the root of your project, then
2307 execute a command resembling:
2310 etags src/*.[hc] lib/*.[hc]
2314 presuming here you want to process all @file{.h} and @file{.c} files
2315 from the @file{src/} and @file{lib/} directories. This command will
2316 explore all said files and create a @file{TAGS} file in your root
2317 directory, somewhat summarizing the contents using a special file
2318 format Emacs can understand.
2320 @emindex @file{TAGS}, and marking translatable strings
2321 For packages following the GNU coding standards, there is
2322 a make goal @code{tags} or @code{TAGS} which constructs the tag files in
2323 all directories and for all files containing source code.
2325 Once your @file{TAGS} file is ready, the following commands assist
the programmer in marking translatable strings in his set of sources.
2327 But these commands are necessarily driven from within a PO file
2328 window, and it is likely that you do not even have such a PO file yet.
2329 This is not a problem at all, as you may safely open a new, empty PO
2330 file, mainly for using these commands. This empty PO file will slowly
2331 fill in while you mark strings as translatable in your program sources.
2335 @efindex ,@r{, PO Mode command}
2336 Search through program sources for a string which looks like a
2337 candidate for translation (@code{po-tags-search}).
2340 @efindex M-,@r{, PO Mode command}
2341 Mark the last string found with @samp{_()} (@code{po-mark-translatable}).
2344 @efindex M-.@r{, PO Mode command}
2345 Mark the last string found with a keyword taken from a set of possible
2346 keywords. This command with a prefix allows some management of these
2347 keywords (@code{po-select-mark-and-mark}).
2351 @efindex po-tags-search@r{, PO Mode command}
2352 The @kbd{,} (@code{po-tags-search}) command searches for the next
2353 occurrence of a string which looks like a possible candidate for
2354 translation, and displays the program source in another Emacs window,
2355 positioned in such a way that the string is near the top of this other
2356 window. If the string is too big to fit whole in this window, it is
2357 positioned so only its end is shown. In any case, the cursor
2358 is left in the PO file window. If the shown string would be better
2359 presented differently in different native languages, you may mark it
2360 using @kbd{M-,} or @kbd{M-.}. Otherwise, you might rather ignore it
2361 and skip to the next string by merely repeating the @kbd{,} command.
2363 A string is a good candidate for translation if it contains a sequence
2364 of three or more letters. A string containing at most two letters in
2365 a row will be considered as a candidate if it has more letters than
2366 non-letters. The command disregards strings containing no letters,
2367 or isolated letters only. It also disregards strings within comments,
2368 or strings already marked with some keyword PO mode knows (see below).
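The candidate rule described above can be sketched in C as follows. This is only an illustration of the rule as stated in this section, not the code PO mode actually runs (PO mode is written in Emacs Lisp), and the function name @code{looks_translatable} is an assumption. It does not model the exclusion of comments or of already marked strings.

```c
#include <ctype.h>
#include <stdbool.h>

/* Sketch of the stated rule:
   - a run of three or more letters makes the string a candidate;
   - with runs of at most two letters, the string is a candidate only
     if it has more letters than non-letters;
   - no letters, or isolated letters only, is never a candidate.  */
static bool
looks_translatable (const char *s)
{
  int letters = 0, others = 0, run = 0, longest = 0;

  for (; *s != '\0'; s++)
    {
      if (isalpha ((unsigned char) *s))
        {
          letters++;
          if (++run > longest)
            longest = run;
        }
      else
        {
          others++;
          run = 0;
        }
    }
  if (longest >= 3)
    return true;
  if (longest == 2)
    return letters > others;
  return false;
}
```

Under this sketch, @samp{"%s"} is rejected because its only letter is isolated, while @samp{"ok"} is accepted because it has two letters and no non-letters.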
2370 If you have never told Emacs about some @file{TAGS} file to use, the
2371 command will request that you specify one from the minibuffer, the
2372 first time you use the command. You may later change your @file{TAGS}
2373 file by using the regular Emacs command @w{@kbd{M-x visit-tags-table}},
2374 which will ask you to name the precise @file{TAGS} file you want
2375 to use. @xref{Tags, , Tag Tables, emacs, The Emacs Editor}.
2377 Each time you use the @kbd{,} command, the search resumes from where it was
2378 left by the previous search, and goes through all program sources,
2379 obeying the @file{TAGS} file, until all sources have been processed.
2380 However, by giving a prefix argument to the command @w{(@kbd{C-u
2381 ,})}, you may request that the search be restarted all over again
2382 from the first program source; but in this case, strings that you
2383 recently marked as translatable will be automatically skipped.
Using this @kbd{,} command does not prevent the use of other regular
2386 Emacs tags commands. For example, regular @code{tags-search} or
2387 @code{tags-query-replace} commands may be used without disrupting the
2388 independent @kbd{,} search sequence. However, as implemented, the
@emph{initial} @kbd{,} command (or the @kbd{,} command used with a
prefix) might also reinitialize the regular Emacs tags searching to the
first tags file; this reinitialization might be considered spurious.
2393 @efindex po-mark-translatable@r{, PO Mode command}
2394 @efindex po-select-mark-and-mark@r{, PO Mode command}
2395 The @kbd{M-,} (@code{po-mark-translatable}) command will mark the
2396 recently found string with the @samp{_} keyword. The @kbd{M-.}
2397 (@code{po-select-mark-and-mark}) command will request that you type
2398 one keyword from the minibuffer and use that keyword for marking
2399 the string. Both commands will automatically create a new PO file
2400 untranslated entry for the string being marked, and make it the
2401 current entry (making it easy for you to immediately proceed to its
2402 translation, if you feel like doing it right away). It is possible
2403 that the modifications made to the program source by @kbd{M-,} or
2404 @kbd{M-.} render some source line longer than 80 columns, forcing you
2405 to break and re-indent this line differently. You may use the @kbd{O}
2406 command from PO mode, or any other window changing command from
2407 Emacs, to break out into the program source window, and do any
2408 needed adjustments. You will have to use some regular Emacs command
2409 to return the cursor to the PO file window, if you want command
2410 @kbd{,} for the next string, say.
2412 The @kbd{M-.} command has a few built-in speedups, so you do not
2413 have to explicitly type all keywords all the time. The first such
2414 speedup is that you are presented with a @emph{preferred} keyword,
2415 which you may accept by merely typing @kbd{@key{RET}} at the prompt.
2416 The second speedup is that you may type any non-ambiguous prefix of the
2417 keyword you really mean, and the command will complete it automatically
2418 for you. This also means that PO mode has to @emph{know} all
2419 your possible keywords, and that it will not accept mistyped keywords.
2421 If you reply @kbd{?} to the keyword request, the command gives a
2422 list of all known keywords, from which you may choose. When the
2423 command is prefixed by an argument @w{(@kbd{C-u M-.})}, it inhibits
2424 updating any program source or PO file buffer, and does some simple
2425 keyword management instead. In this case, the command asks for a
2426 keyword, written in full, which becomes a new allowed keyword for
2427 later @kbd{M-.} commands. Moreover, this new keyword automatically
2428 becomes the @emph{preferred} keyword for later commands. By typing
2429 an already known keyword in response to @w{@kbd{C-u M-.}}, one merely
2430 changes the @emph{preferred} keyword and does nothing more.
2432 All keywords known for @kbd{M-.} are recognized by the @kbd{,} command
2433 when scanning for strings, and strings already marked by any of those
2434 known keywords are automatically skipped. If many PO files are opened
2435 simultaneously, each one has its own independent set of known keywords.
There is no provision in PO mode, currently, for deleting a known
keyword; you have to quit the file (maybe using @kbd{q}) and reopen
it afresh. When a PO file is newly brought up in an Emacs window, only
@samp{gettext} and @samp{_} are known as keywords, and @samp{gettext}
is preferred for the @kbd{M-.} command. In fact, it is not useful to
prefer @samp{_}, as this one is already built into the @kbd{M-,} command.
2443 @node c-format Flag, Special cases, Marking, Sources
2444 @section Special Comments preceding Keywords
2446 @c FIXME document c-format and no-c-format.
2448 @cindex format strings
2449 In C programs strings are often used within calls of functions from the
2450 @code{printf} family. The special thing about these format strings is
that they can contain format specifiers introduced with @kbd{%}. Assume
we have the following code:
2455 printf (gettext ("String `%s' has %d characters\n"), s, strlen (s));
2459 A possible German translation for the above string might be:
2462 "%d Zeichen lang ist die Zeichenkette `%s'"
2465 A C programmer, even if he cannot speak German, will recognize that
2466 there is something wrong here. The order of the two format specifiers
has changed, but of course the order of the arguments in the @code{printf} call has not.
2468 This will most probably lead to problems because now the length of the
2469 string is regarded as the address.
2471 To prevent errors at runtime caused by translations the @code{msgfmt}
2472 tool can check statically whether the arguments in the original and the
2473 translation string match in type and number. If this is not the case
2474 and the @samp{-c} option has been passed to @code{msgfmt}, @code{msgfmt}
will give an error and refuse to produce a MO file. Thus consistent
use of @samp{msgfmt -c} will catch the error, so that it cannot
cause problems at runtime.
If the word order in the above German translation is to be kept, one
would have to write:
2484 "%2$d Zeichen lang ist die Zeichenkette `%1$s'"
2488 The routines in @code{msgfmt} know about this special notation.
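The effect of the positional notation can be tried out with a small sketch like the one below. It assumes a C library whose @code{printf} family supports the POSIX @samp{%@var{n}$} notation (GNU libc does); the helper name @code{describe} is made up for the illustration, and the format is passed in directly where a real program would obtain it from @code{gettext}.

```c
#include <stdio.h>
#include <string.h>

/* Formats a description of S into BUF using FORMAT.  The arguments are
   always passed in the same order (string first, length second); a
   translation may reorder them with the %1$ and %2$ notation.  */
static int
describe (char *buf, size_t size, const char *format, const char *s)
{
  return snprintf (buf, size, format, s, (int) strlen (s));
}
```

With the original format @samp{"String `%s' has %d characters"} and with the translated format @samp{"%2$d Zeichen lang ist die Zeichenkette `%1$s'"}, the same two arguments end up in the right places.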
Because not all strings in a program are format strings, it is not
2491 useful for @code{msgfmt} to test all the strings in the @file{.po} file.
2492 This might cause problems because the string might contain what looks
2493 like a format specifier, but the string is not used in @code{printf}.
Therefore @code{xgettext} adds a special tag to those messages it
2496 thinks might be a format string. There is no absolute rule for this,
2497 only a heuristic. In the @file{.po} file the entry is marked using the
2498 @code{c-format} flag in the @code{#,} comment line (@pxref{PO Files}).
2500 @kwindex c-format@r{, and @code{xgettext}}
2501 @kwindex no-c-format@r{, and @code{xgettext}}
2502 The careful reader now might say that this again can cause problems.
2503 The heuristic might guess it wrong. This is true and therefore
2504 @code{xgettext} knows about a special kind of comment which lets
2505 the programmer take over the decision. If in the same line as or
2506 the immediately preceding line to the @code{gettext} keyword
2507 the @code{xgettext} program finds a comment containing the words
2508 @code{xgettext:c-format}, it will mark the string in any case with
2509 the @code{c-format} flag. This kind of comment should be used when
2510 @code{xgettext} does not recognize the string as a format string but
2511 it really is one and it should be tested. Please note that when the
2512 comment is in the same line as the @code{gettext} keyword, it must be
2513 before the string to be translated.
2515 This situation happens quite often. The @code{printf} function is often
2516 called with strings which do not contain a format specifier. Of course
2517 one would normally use @code{fputs} but it does happen. In this case
2518 @code{xgettext} does not recognize this as a format string but what
2519 happens if the translation introduces a valid format specifier? The
2520 @code{printf} function will try to access one of the parameters but none
2521 exists because the original code does not pass any parameters.
2523 @code{xgettext} of course could make a wrong decision the other way
2524 round, i.e.@: a string marked as a format string actually is not a format
string. In this case @code{msgfmt} might give too many warnings and
2526 would prevent translating the @file{.po} file. The method to prevent
2527 this wrong decision is similar to the one used above, only the comment
2528 to use must contain the string @code{xgettext:no-c-format}.
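As a sketch of where the two special comments go, the functions below place each comment on the line immediately preceding the @code{gettext} call. The function names and messages are invented for this illustration. The first string contains no format specifier, yet is meant to be used as a format string; the second contains a percent sign that is not meant as a directive, although the sequence @samp{% o} could be parsed as one.

```c
#include <libintl.h>

/* Used as a printf format elsewhere, so request checking explicitly.  */
static const char *
banner_format (void)
{
  /* xgettext:c-format */
  return gettext ("Processing complete\n");
}

/* Printed with fputs, never with printf: the percent sign is literal.  */
static const char *
discount_text (void)
{
  /* xgettext:no-c-format */
  return gettext ("Save 10% of your time");
}
```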
2530 If a string is marked with @code{c-format} and this is not correct the
2531 user can find out who is responsible for the decision. See
2532 @ref{xgettext Invocation} to see how the @code{--debug} option can be
2533 used for solving this problem.
2535 @node Special cases, Bug Report Address, c-format Flag, Sources
2536 @section Special Cases of Translatable Strings
2538 @cindex marking string initializers
2539 The attentive reader might now point out that it is not always possible
to mark a translatable string with @code{gettext} or something like this.
2541 Consider the following case:
2546 static const char *messages[] = @{
2547 "some very meaningful message",
2553 = index > 1 ? "a default message" : messages[index];
2561 While it is no problem to mark the string @code{"a default message"} it
2562 is not possible to mark the string initializers for @code{messages}.
2563 What is to be done? We have to fulfill two tasks. First we have to mark the
2564 strings so that the @code{xgettext} program (@pxref{xgettext Invocation})
2565 can find them, and second we have to translate the string at runtime
2566 before printing them.
2568 The first task can be fulfilled by creating a new keyword, which names a
2569 no-op. For the second we have to mark all access points to a string
2570 from the array. So one solution can look like this:
2574 #define gettext_noop(String) String
2577 static const char *messages[] = @{
2578 gettext_noop ("some very meaningful message"),
2579 gettext_noop ("and another one")
2584 = index > 1 ? gettext ("a default message") : gettext (messages[index]);
2592 Please convince yourself that the string which is written by
@code{fputs} is translated in any case. How to make @code{xgettext} aware of
the additional keyword @code{gettext_noop} is explained in @ref{xgettext Invocation}.
The above is of course not the only solution. You could also come up
2598 with the following one:
2602 #define gettext_noop(String) String
2605 static const char *messages[] = @{
    gettext_noop ("some very meaningful message"),
2607 gettext_noop ("and another one")
2612 = index > 1 ? gettext_noop ("a default message") : messages[index];
2614 fputs (gettext (string));
2620 But this has a drawback. The programmer has to take care that
2621 he uses @code{gettext_noop} for the string @code{"a default message"}.
2622 A use of @code{gettext} could have in rare cases unpredictable results.
One advantage is that you need not perform control flow analysis to make
sure the output is really translated in any case. But this analysis is
generally not very difficult. If it should be difficult in some situation,
you can use this second method there.
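For reference, the second method can be written as one complete, compilable unit, roughly as sketched here. The wrapper function @code{message_for} is an assumption added so that the fragment stands on its own; the macro and the strings are those used in this section.

```c
#include <libintl.h>

/* Marks a string for extraction without translating it.  */
#define gettext_noop(String) String

static const char *const messages[] = {
  gettext_noop ("some very meaningful message"),
  gettext_noop ("and another one")
};

/* Every string reaching the single gettext call below was marked with
   gettext_noop, so each one is looked up exactly once.  */
static const char *
message_for (unsigned int index)
{
  const char *string
    = index > 1 ? gettext_noop ("a default message") : messages[index];

  return gettext (string);
}
```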
2629 @node Bug Report Address, Names, Special cases, Sources
2630 @section Letting Users Report Translation Bugs
2632 Code sometimes has bugs, but translations sometimes have bugs too. The
2633 users need to be able to report them. Reporting translation bugs to the
2634 programmer or maintainer of a package is not very useful, since the
2635 maintainer must never change a translation, except on behalf of the
translator. Hence the translation bugs must be reported to the
translators.
2639 Here is a way to organize this so that the maintainer does not need to
2640 forward translation bug reports, nor even keep a list of the addresses of
2641 the translators or their translation teams.
Every program has a place where it shows the bug report address. For
2644 GNU programs, it is the code which handles the ``--help'' option,
2645 typically in a function called ``usage''. In this place, instruct the
2646 translator to add her own bug reporting address. For example, if that
2647 code has a statement
2651 printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT);
2655 you can add some translator instructions like this:
2659 /* TRANSLATORS: The placeholder indicates the bug-reporting address
2660 for this package. Please add _another line_ saying
2661 "Report translation bugs to <...>\n" with the address for translation
2662 bugs (typically your translation team's web or email address). */
2663 printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT);
These will be extracted by @samp{xgettext}, leading to a @file{.pot} file that
contains this:
2672 #. TRANSLATORS: The placeholder indicates the bug-reporting address
2673 #. for this package. Please add _another line_ saying
2674 #. "Report translation bugs to <...>\n" with the address for translation
2675 #. bugs (typically your translation team's web or email address).
2678 msgid "Report bugs to <%s>.\n"
2683 @node Names, Libraries, Bug Report Address, Sources
2684 @section Marking Proper Names for Translation
2686 Should names of persons, cities, locations etc. be marked for translation
2687 or not? People who only know languages that can be written with Latin
2688 letters (English, Spanish, French, German, etc.) are tempted to say ``no'',
2689 because names usually do not change when transported between these languages.
2690 However, in general when translating from one script to another, names
2691 are translated too, usually phonetically or by transliteration. For
2692 example, Russian or Greek names are converted to the Latin alphabet when
2693 being translated to English, and English or French names are converted
2694 to the Katakana script when being translated to Japanese. This is
2695 necessary because the speakers of the target language in general cannot
2696 read the script the name is originally written in.
2698 As a programmer, you should therefore make sure that names are marked
2699 for translation, with a special comment telling the translators that it
is a proper name and how to pronounce it. In its simple form, it looks
like this:
2705 printf (_("Written by %s.\n"),
2706 /* TRANSLATORS: This is a proper name. See the gettext
2707 manual, section Names. Note this is actually a non-ASCII
2708 name: The first name is (with Unicode escapes)
     "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
2710 Pronunciation is like "fraa-swa pee-nar". */
2711 _("Francois Pinard"));
2716 The GNU gnulib library offers a module @samp{propername}
2717 (@url{http://www.gnu.org/software/gnulib/MODULES.html#module=propername})
2718 which takes care to automatically append the original name, in parentheses,
2719 to the translated name. For names that cannot be written in ASCII, it
2720 also frees the translator from the task of entering the appropriate non-ASCII
characters if no script change is needed. In this more comfortable form,
it looks like this:
2726 printf (_("Written by %s and %s.\n"),
2727 proper_name ("Ulrich Drepper"),
2728 /* TRANSLATORS: This is a proper name. See the gettext
2729 manual, section Names. Note this is actually a non-ASCII
2730 name: The first name is (with Unicode escapes)
        "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
2732 Pronunciation is like "fraa-swa pee-nar". */
2733 proper_name_utf8 ("Francois Pinard", "Fran\303\247ois Pinard"));
2738 You can also write the original name directly in Unicode (rather than with
2739 Unicode escapes or HTML entities) and denote the pronunciation using the
2740 International Phonetic Alphabet (see
2741 @url{http://www.wikipedia.org/wiki/International_Phonetic_Alphabet}).
2743 As a translator, you should use some care when translating names, because
2744 it is frustrating if people see their names mutilated or distorted.
2746 If your language uses the Latin script, all you need to do is to reproduce
2747 the name as perfectly as you can within the usual character set of your
2748 language. In this particular case, this means to provide a translation
2749 containing the c-cedilla character. If your language uses a different
2750 script and the people speaking it don't usually read Latin words, it means
2751 transliteration. If the programmer used the simple case, you should still
2752 give, in parentheses, the original writing of the name -- for the sake of
2753 the people that do read the Latin script. If the programmer used the
2754 @samp{propername} module mentioned above, you don't need to give the original
2755 writing of the name in parentheses, because the program will already do so.
2756 Here is an example, using Greek as the target script:
2760 #. This is a proper name. See the gettext
2761 #. manual, section Names. Note this is actually a non-ASCII
2762 #. name: The first name is (with Unicode escapes)
#. "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
2764 #. Pronunciation is like "fraa-swa pee-nar".
2765 msgid "Francois Pinard"
2766 msgstr "\phi\rho\alpha\sigma\omicron\alpha \pi\iota\nu\alpha\rho"
2767 " (Francois Pinard)"
2771 Because translation of names is such a sensitive domain, it is a good
2772 idea to test your translation before submitting it.
2774 @node Libraries, , Names, Sources
2775 @section Preparing Library Sources
2777 When you are preparing a library, not a program, for the use of
2778 @code{gettext}, only a few details are different. Here we assume that
2779 the library has a translation domain and a POT file of its own. (If
2780 it uses the translation domain and POT file of the main program, then
2781 the previous sections apply without changes.)
2785 The library code doesn't call @code{setlocale (LC_ALL, "")}. It's the
2786 responsibility of the main program to set the locale. The library's
2787 documentation should mention this fact, so that developers of programs
2788 using the library are aware of it.
2791 The library code doesn't call @code{textdomain (PACKAGE)}, because it
2792 would interfere with the text domain set by the main program.
2795 The initialization code for a program was
2798 setlocale (LC_ALL, "");
2799 bindtextdomain (PACKAGE, LOCALEDIR);
2800 textdomain (PACKAGE);
2804 For a library it is reduced to
2807 bindtextdomain (PACKAGE, LOCALEDIR);
2811 If your library's API doesn't already have an initialization function,
2812 you need to create one, containing at least the @code{bindtextdomain}
2813 invocation. However, you usually don't need to export and document this
2814 initialization function: It is sufficient that all entry points of the
2815 library call the initialization function if it hasn't been called before.
2816 The typical idiom used to achieve this is a static boolean variable that
2817 indicates whether the initialization function has been called. Like this:
2821 static bool libfoo_initialized;
2824 libfoo_initialize (void)
2826 bindtextdomain (PACKAGE, LOCALEDIR);
2827 libfoo_initialized = true;
2830 /* This function is part of the exported API. */
2834 /* Must ensure the initialization is performed. */
2835 if (!libfoo_initialized)
2836 libfoo_initialize ();
2840 /* This function is part of the exported API. The argument must be
2841 non-NULL and have been created through create_foo(). */
2843 foo_refcount (struct foo *argument)
2845 /* No need to invoke the initialization function here, because
2846 create_foo() must already have been called before. */
2853 The usual declaration of the @samp{_} macro in each source file was
2856 #include <libintl.h>
2857 #define _(String) gettext (String)
for a program. For a library, which has its own translation domain,
it reads like this:
2865 #include <libintl.h>
2866 #define _(String) dgettext (PACKAGE, String)
2869 In other words, @code{dgettext} is used instead of @code{gettext}.
2870 Similarly, the @code{dngettext} function should be used in place of the
2871 @code{ngettext} function.
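Putting these points together, a library source file might look roughly like the sketch below. The domain name @samp{libfoo}, the locale directory, and the entry point @code{libfoo_strerror} are assumptions chosen for the illustration; a real package would normally get @code{PACKAGE} and @code{LOCALEDIR} from its build system. The plain boolean guard is not synchronized, so a multithreaded library may want a stronger mechanism.

```c
#include <libintl.h>
#include <stdbool.h>

/* Assumed values for this sketch only.  */
#define LIBFOO_DOMAIN "libfoo"
#define LIBFOO_LOCALEDIR "/usr/share/locale"

/* The library looks up messages in its own domain.  */
#define _(String) dgettext (LIBFOO_DOMAIN, String)

static bool libfoo_initialized;

/* Binds the library's domain; calls neither setlocale nor textdomain.  */
static void
libfoo_initialize (void)
{
  bindtextdomain (LIBFOO_DOMAIN, LIBFOO_LOCALEDIR);
  libfoo_initialized = true;
}

/* Hypothetical exported entry point.  */
const char *
libfoo_strerror (int code)
{
  /* Must ensure the initialization is performed.  */
  if (!libfoo_initialized)
    libfoo_initialize ();
  return code == 0 ? _("success") : _("unknown error");
}
```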
2874 @node Template, Creating, Sources, Top
2875 @chapter Making the PO Template File
2876 @cindex PO template file
2878 After preparing the sources, the programmer creates a PO template file.
2879 This section explains how to use @code{xgettext} for this purpose.
2881 @code{xgettext} creates a file named @file{@var{domainname}.po}. You
2882 should then rename it to @file{@var{domainname}.pot}. (Why doesn't
2883 @code{xgettext} create it under the name @file{@var{domainname}.pot}
2884 right away? The answer is: for historical reasons. When @code{xgettext}
2885 was specified, the distinction between a PO file and PO file template
2886 was fuzzy, and the suffix @samp{.pot} wasn't in use at that time.)
2891 * xgettext Invocation:: Invoking the @code{xgettext} Program
2894 @node xgettext Invocation, , Template, Template
2895 @section Invoking the @code{xgettext} Program
2897 @include xgettext.texi
2899 @node Creating, Updating, Template, Top
2900 @chapter Creating a New PO File
2901 @cindex creating a new PO file
2903 When starting a new translation, the translator creates a file called
2904 @file{@var{LANG}.po}, as a copy of the @file{@var{package}.pot} template
2905 file with modifications in the initial comments (at the beginning of the file)
2906 and in the header entry (the first entry, near the beginning of the file).
2908 The easiest way to do so is by use of the @samp{msginit} program.
2912 $ cd @var{PACKAGE}-@var{VERSION}
2917 The alternative way is to do the copy and modifications by hand.
2918 To do so, the translator copies @file{@var{package}.pot} to
2919 @file{@var{LANG}.po}. Then she modifies the initial comments and
2920 the header entry of this file.
2923 * msginit Invocation:: Invoking the @code{msginit} Program
2924 * Header Entry:: Filling in the Header Entry
2927 @node msginit Invocation, Header Entry, Creating, Creating
2928 @section Invoking the @code{msginit} Program
2930 @include msginit.texi
2932 @node Header Entry, , msginit Invocation, Creating
2933 @section Filling in the Header Entry
2934 @cindex header entry of a PO file
2936 The initial comments "SOME DESCRIPTIVE TITLE", "YEAR" and
2937 "FIRST AUTHOR <EMAIL@@ADDRESS>, YEAR" ought to be replaced by sensible
2938 information. This can be done in any text editor; if Emacs is used
2939 and it switched to PO mode automatically (because it has recognized
2940 the file's suffix), you can disable it by typing @kbd{M-x fundamental-mode}.
2942 Modifying the header entry can already be done using PO mode: in Emacs,
2943 type @kbd{M-x po-mode RET} and then @kbd{RET} again to start editing the
2944 entry. You should fill in the following fields.
2947 @item Project-Id-Version
2948 This is the name and version of the package. Fill it in if it has not
2949 already been filled in by @code{xgettext}.
2951 @item Report-Msgid-Bugs-To
2952 This has already been filled in by @code{xgettext}. It contains an email
2953 address or URL where you can report bugs in the untranslated strings:
2956 @item Strings which are not entire sentences, see the maintainer guidelines
2957 in @ref{Preparing Strings}.
@item Strings which use unclear terms or require additional context to be
understood.
@item Strings which make invalid assumptions about notation of date, time or
money.
2962 @item Pluralisation problems.
2963 @item Incorrect English spelling.
2964 @item Incorrect formatting.
2967 @item POT-Creation-Date
2968 This has already been filled in by @code{xgettext}.
2970 @item PO-Revision-Date
2971 You don't need to fill this in. It will be filled by the PO file editor
2972 when you save the file.
2974 @item Last-Translator
2975 Fill in your name and email address (without double quotes).
2978 Fill in the English name of the language, and the email address or
2979 homepage URL of the language team you are part of.
2981 Before starting a translation, it is a good idea to get in touch with
2982 your translation team, not only to make sure you don't do duplicated work,
2983 but also to coordinate difficult linguistic issues.
2985 @cindex list of translation teams, where to find
2986 In the Free Translation Project, each translation team has its own mailing
2987 list. The up-to-date list of teams can be found at the Free Translation
Project's homepage, @uref{http://translationproject.org/}, in the "Teams"
area.
2992 @c The purpose of this field is to make it possible to automatically
2993 @c - convert PO files to translation memory,
2994 @c - initialize a spell checker based on the PO file,
2995 @c - perform language specific checks.
Fill in the language code of the language. This can be in one of three
forms:
3001 @samp{@var{ll}}, an @w{ISO 639} two-letter language code (lowercase).
3002 See @ref{Language Codes} for the list of codes.
3005 @samp{@var{ll}_@var{CC}}, where @samp{@var{ll}} is an @w{ISO 639} two-letter
3006 language code (lowercase) and @samp{@var{CC}} is an @w{ISO 3166} two-letter
3007 country code (uppercase). The country code specification is not redundant:
3008 Some languages have dialects in different countries. For example,
3009 @samp{de_AT} is used for Austria, and @samp{pt_BR} for Brazil. The country
3010 code serves to distinguish the dialects. See @ref{Language Codes} and
3011 @ref{Country Codes} for the lists of codes.
3014 @samp{@var{ll}_@var{CC}@@@var{variant}}, where @samp{@var{ll}} is an
3015 @w{ISO 639} two-letter language code (lowercase), @samp{@var{CC}} is an
3016 @w{ISO 3166} two-letter country code (uppercase), and @samp{@var{variant}} is
3017 a variant designator. The variant designator (lowercase) can be a script
3018 designator, such as @samp{latin} or @samp{cyrillic}.
3021 The naming convention @samp{@var{ll}_@var{CC}} is also the way locales are
3022 named on systems based on GNU libc. But there are three important differences:
3026 In this PO file field, but not in locale names, @samp{@var{ll}_@var{CC}}
3027 combinations denoting a language's main dialect are abbreviated as
3028 @samp{@var{ll}}. For example, @samp{de} is equivalent to @samp{de_DE}
3029 (German as spoken in Germany), and @samp{pt} to @samp{pt_PT} (Portuguese as
3030 spoken in Portugal) in this context.
3033 In this PO file field, suffixes like @samp{.@var{encoding}} are not used.
3036 In this PO file field, variant designators that are not relevant to message
3037 translation, such as @samp{@@euro}, are not used.
3040 So, if your locale name is @samp{de_DE.UTF-8}, the language specification in
3041 PO files is just @samp{de}.
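The three differences above amount to a small transformation from a locale name to the value of this field, which could be sketched as follows. The function name @code{po_language_from_locale} is hypothetical, and the table of main dialects is a deliberately tiny, illustrative subset containing only the two examples given above. Only @samp{@@euro} is treated as an irrelevant variant here.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative subset only: ll_CC names abbreviated to ll.  */
static const char *const main_dialects[] = { "de_DE", "pt_PT" };

/* Derives a PO Language value from a locale name of the form
   ll_CC[.encoding][@variant] and stores it in OUT.  */
static void
po_language_from_locale (const char *locale, char *out, size_t size)
{
  char base[32];
  const char *modifier = strchr (locale, '@');
  size_t base_len = strcspn (locale, ".@");
  size_t i;

  /* Keep only ll_CC; this drops the .encoding suffix.  */
  if (base_len >= sizeof base)
    base_len = sizeof base - 1;
  memcpy (base, locale, base_len);
  base[base_len] = '\0';

  /* Abbreviate a language's main dialect to ll.  */
  for (i = 0; i < sizeof main_dialects / sizeof main_dialects[0]; i++)
    if (strcmp (base, main_dialects[i]) == 0)
      base[2] = '\0';

  /* Keep a variant unless it is irrelevant to message translation.  */
  if (modifier != NULL && strcmp (modifier, "@euro") != 0)
    snprintf (out, size, "%s%s", base, modifier);
  else
    snprintf (out, size, "%s", base);
}
```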
3044 @cindex encoding of PO files
3045 @cindex charset of PO files
3046 Replace @samp{CHARSET} with the character encoding used for your language,
3047 in your locale, or UTF-8. This field is needed for correct operation of the
3048 @code{msgmerge} and @code{msgfmt} programs, as well as for users whose
3049 locale's character encoding differs from yours (see @ref{Charset conversion}).
3051 @cindex @code{locale} program
3052 You get the character encoding of your locale by running the shell command
3053 @samp{locale charmap}. If the result is @samp{C} or @samp{ANSI_X3.4-1968},
3054 which is equivalent to @samp{ASCII} (= @samp{US-ASCII}), it means that your
3055 locale is not correctly configured. In this case, ask your translation
3056 team which charset to use. @samp{ASCII} is not usable for any language
3059 @cindex encoding list
3060 Because the PO files must be portable to operating systems with less advanced
3061 internationalization facilities, the character encodings that can be used
3062 are limited to those supported by both GNU @code{libc} and GNU
3063 @code{libiconv}. These are:
3064 @code{ASCII}, @code{ISO-8859-1}, @code{ISO-8859-2}, @code{ISO-8859-3},
3065 @code{ISO-8859-4}, @code{ISO-8859-5}, @code{ISO-8859-6}, @code{ISO-8859-7},
3066 @code{ISO-8859-8}, @code{ISO-8859-9}, @code{ISO-8859-13}, @code{ISO-8859-14},
3068 @code{KOI8-R}, @code{KOI8-U}, @code{KOI8-T},
3069 @code{CP850}, @code{CP866}, @code{CP874},
3070 @code{CP932}, @code{CP949}, @code{CP950}, @code{CP1250}, @code{CP1251},
3071 @code{CP1252}, @code{CP1253}, @code{CP1254}, @code{CP1255}, @code{CP1256},
3072 @code{CP1257}, @code{GB2312}, @code{EUC-JP}, @code{EUC-KR}, @code{EUC-TW},
3073 @code{BIG5}, @code{BIG5-HKSCS}, @code{GBK}, @code{GB18030}, @code{SHIFT_JIS},
3074 @code{JOHAB}, @code{TIS-620}, @code{VISCII}, @code{GEORGIAN-PS}, @code{UTF-8}.
3076 @c This data is taken from glibc/localedata/SUPPORTED.
3078 In the GNU system, the following encodings are frequently used for the
3079 corresponding languages.
3081 @cindex encoding for your language
3083 @item @code{ISO-8859-1} for
3084 Afrikaans, Albanian, Basque, Breton, Catalan, Cornish, Danish, Dutch,
3085 English, Estonian, Faroese, Finnish, French, Galician, German,
3086 Greenlandic, Icelandic, Indonesian, Irish, Italian, Malay, Manx,
3087 Norwegian, Occitan, Portuguese, Spanish, Swedish, Tagalog, Uzbek,
3089 @item @code{ISO-8859-2} for
3090 Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak,
3092 @item @code{ISO-8859-3} for Maltese,
3093 @item @code{ISO-8859-5} for Macedonian, Serbian,
3094 @item @code{ISO-8859-6} for Arabic,
3095 @item @code{ISO-8859-7} for Greek,
3096 @item @code{ISO-8859-8} for Hebrew,
3097 @item @code{ISO-8859-9} for Turkish,
3098 @item @code{ISO-8859-13} for Latvian, Lithuanian, Maori,
3099 @item @code{ISO-8859-14} for Welsh,
3100 @item @code{ISO-8859-15} for
3101 Basque, Catalan, Dutch, English, Finnish, French, Galician, German, Irish,
3102 Italian, Portuguese, Spanish, Swedish, Walloon,
3103 @item @code{KOI8-R} for Russian,
3104 @item @code{KOI8-U} for Ukrainian,
3105 @item @code{KOI8-T} for Tajik,
3106 @item @code{CP1251} for Bulgarian, Belarusian,
3107 @item @code{GB2312}, @code{GBK}, @code{GB18030}
3108 for simplified writing of Chinese,
3109 @item @code{BIG5}, @code{BIG5-HKSCS}
3110 for traditional writing of Chinese,
3111 @item @code{EUC-JP} for Japanese,
3112 @item @code{EUC-KR} for Korean,
3113 @item @code{TIS-620} for Thai,
3114 @item @code{GEORGIAN-PS} for Georgian,
3115 @item @code{UTF-8} for any language, including those listed above.
3118 @cindex quote characters, use in PO files
3119 @cindex quotation marks
3120 When single quote characters or double quote characters are used in
3121 translations for your language, and your locale's encoding is one of the
3122 ISO-8859-* charsets, it is best if you create your PO files in UTF-8
3123 encoding, instead of your locale's encoding. This is because in UTF-8
3124 the real quote characters can be represented (single quote characters:
U+2018, U+2019, double quote characters: U+201C, U+201D), whereas none of
the ISO-8859-* charsets has them all. Users in UTF-8 locales will see the
3127 real quote characters, whereas users in ISO-8859-* locales will see the
3128 vertical apostrophe and the vertical double quote instead (because that's
3129 what the character set conversion will transliterate them to).
3131 @cindex @code{xmodmap} program, and typing quotation marks
3132 To enter such quote characters under X11, you can change your keyboard
3133 mapping using the @code{xmodmap} program. The X11 names of the quote
3134 characters are "leftsinglequotemark", "rightsinglequotemark",
3135 "leftdoublequotemark", "rightdoublequotemark", "singlelowquotemark",
3136 "doublelowquotemark".
3138 Note that only recent versions of GNU Emacs support the UTF-8 encoding:
3139 Emacs 20 with Mule-UCS, and Emacs 21. As of January 2001, XEmacs doesn't
3140 support the UTF-8 encoding.
3142 The character encoding name can be written in either upper or lower case.
3143 Usually upper case is preferred.
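
As a rough illustration of two points made above (the charset name is
matched without regard to case, and an @samp{ASCII}-like answer from
@samp{locale charmap} suggests a misconfigured locale), here is a minimal
Python sketch. The function names are hypothetical, and the set holds
only a sample of the encodings named in this section, not the whole list.

@example
# Hypothetical helpers; not part of GNU gettext.
# A deliberately partial sample of the encodings named in this section.
SAMPLE_CHARSETS = frozenset([
    "ASCII", "ISO-8859-1", "ISO-8859-2", "ISO-8859-15",
    "KOI8-R", "CP1251", "EUC-JP", "BIG5", "GB18030", "UTF-8",
])

# Answers from "locale charmap" that point to an unconfigured locale.
UNCONFIGURED = frozenset(["C", "ANSI_X3.4-1968", "ASCII", "US-ASCII"])

def normalize_charset(name):
    """Return the charset name in the preferred upper-case spelling."""
    return name.strip().upper()

def is_listed(name):
    """True if the name, in any letter case, is in the sample set."""
    return normalize_charset(name) in SAMPLE_CHARSETS

def looks_unconfigured(charmap_output):
    """True if the locale charmap answer suggests a misconfigured locale."""
    return normalize_charset(charmap_output) in UNCONFIGURED

print(normalize_charset("utf-8"))
print(is_listed("koi8-r"))
print(looks_unconfigured("ANSI_X3.4-1968"))
@end example

A real check would use the complete list given above; the sample only
keeps the sketch short.
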
3145 @item Content-Transfer-Encoding
3146 Set this to @code{8bit}.
3149 This field is optional. It is only needed if the PO file has plural forms.
3150 You can find them by searching for the @samp{msgid_plural} keyword. The
3151 format of the plural forms field is described in @ref{Plural forms} and
3152 @ref{Translating plural forms}.
3155 @node Updating, Editing, Creating, Top
3156 @chapter Updating Existing PO Files
3159 * msgmerge Invocation:: Invoking the @code{msgmerge} Program
3162 @node msgmerge Invocation, , Updating, Updating
3163 @section Invoking the @code{msgmerge} Program
3165 @include msgmerge.texi
3167 @node Editing, Manipulating, Updating, Top
3168 @chapter Editing PO Files
3169 @cindex Editing PO Files
3172 * KBabel:: KDE's PO File Editor
3173 * Gtranslator:: GNOME's PO File Editor
3174 * PO Mode:: Emacs's PO File Editor
3175 * Compendium:: Using Translation Compendia
3178 @node KBabel, Gtranslator, Editing, Editing
3179 @section KDE's PO File Editor
3180 @cindex KDE PO file editor
3182 @node Gtranslator, PO Mode, KBabel, Editing
3183 @section GNOME's PO File Editor
3184 @cindex GNOME PO file editor
3186 @node PO Mode, Compendium, Gtranslator, Editing
3187 @section Emacs's PO File Editor
3188 @cindex Emacs PO Mode
For those of you who are
lucky enough to use Emacs, PO mode was created specifically
to provide a cozy environment for editing or modifying PO files.
3195 While editing a PO file, PO mode allows for the easy browsing of
3196 auxiliary and compendium PO files, as well as for following references into
3197 the set of C program sources from which PO files have been derived.
3198 It has a few special features, among which are the interactive marking
3199 of program strings as translatable, and the validation of PO files
3200 with easy repositioning to PO file lines showing errors.
To begin with, besides the main PO mode commands
3203 (@pxref{Main PO Commands}), you should know how to move between entries
3204 (@pxref{Entry Positioning}), and how to handle untranslated entries
3205 (@pxref{Untranslated Entries}).
3208 * Installation:: Completing GNU @code{gettext} Installation
3209 * Main PO Commands:: Main Commands
3210 * Entry Positioning:: Entry Positioning
3211 * Normalizing:: Normalizing Strings in Entries
3212 * Translated Entries:: Translated Entries
3213 * Fuzzy Entries:: Fuzzy Entries
3214 * Untranslated Entries:: Untranslated Entries
3215 * Obsolete Entries:: Obsolete Entries
3216 * Modifying Translations:: Modifying Translations
3217 * Modifying Comments:: Modifying Comments
3218 * Subedit:: Mode for Editing Translations
3219 * C Sources Context:: C Sources Context
3220 * Auxiliary:: Consulting Auxiliary PO Files
3223 @node Installation, Main PO Commands, PO Mode, PO Mode
3224 @subsection Completing GNU @code{gettext} Installation
3226 @cindex installing @code{gettext}
3227 @cindex @code{gettext} installation
3228 Once you have received, unpacked, configured and compiled the GNU
3229 @code{gettext} distribution, the @samp{make install} command puts in
3230 place the programs @code{xgettext}, @code{msgfmt}, @code{gettext}, and
3231 @code{msgmerge}, as well as their available message catalogs. To
3232 top off a comfortable installation, you might also want to make the
3233 PO mode available to your Emacs users.
3235 @emindex @file{.emacs} customizations
3236 @emindex installing PO mode
3237 During the installation of the PO mode, you might want to modify your
3238 file @file{.emacs}, once and for all, so it contains a few lines looking
3242 (setq auto-mode-alist
3243 (cons '("\\.po\\'\\|\\.po\\." . po-mode) auto-mode-alist))
3244 (autoload 'po-mode "po-mode" "Major mode for translators to edit PO files" t)
3247 Later, whenever you edit some @file{.po}
3248 file, or any file having the string @samp{.po.} within its name,
3249 Emacs loads @file{po-mode.elc} (or @file{po-mode.el}) as needed, and
3250 automatically activates PO mode commands for the associated buffer.
3251 The string @emph{PO} appears in the mode line for any buffer for
3252 which PO mode is active. Many PO files may be active at once in a
3253 single Emacs session.
3255 If you are using Emacs version 20 or newer, and have already installed
3256 the appropriate international fonts on your system, you may also tell
3257 Emacs how to determine automatically the coding system of every PO file.
3258 This will often (but not always) cause the necessary fonts to be loaded
3259 and used for displaying the translations on your Emacs screen. For this
3260 to happen, add the lines:
3263 (modify-coding-system-alist 'file "\\.po\\'\\|\\.po\\."
3264 'po-find-file-coding-system)
3265 (autoload 'po-find-file-coding-system "po-mode")
3269 to your @file{.emacs} file. If, with this, you still see boxes instead
3270 of international characters, try a different font set (via Shift Mouse
3273 @node Main PO Commands, Entry Positioning, Installation, PO Mode
3274 @subsection Main PO mode Commands
3276 @cindex PO mode (Emacs) commands
3278 After setting up Emacs with something similar to the lines in
3279 @ref{Installation}, PO mode is activated for a window when Emacs finds a
PO file in that window. This makes the window read-only and establishes
@code{po-mode-map}. PO mode is a genuine Emacs mode, not derived
from text mode in any way. Functions found on @code{po-mode-hook},
3283 if any, will be executed.
3285 When PO mode is active in a window, the letters @samp{PO} appear
3286 in the mode line for that window. The mode line also displays how
3287 many entries of each kind are held in the PO file. For example,
the string @samp{132t+3f+10u+2o} would tell the translator that the
PO file contains 132 translated entries (@pxref{Translated Entries}),
3 fuzzy entries (@pxref{Fuzzy Entries}), 10 untranslated entries
(@pxref{Untranslated Entries}) and 2 obsolete entries (@pxref{Obsolete
Entries}). Zero-coefficient items are not shown. So, in this example, if
3293 the fuzzy entries were unfuzzied, the untranslated entries were translated
3294 and the obsolete entries were deleted, the mode line would merely display
3295 @samp{145t} for the counters.
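
The arithmetic of this example can be checked with a small Python sketch.
It is not part of PO mode: the function names are hypothetical, and the
counter syntax is inferred only from the sample string shown above.

@example
import re

def parse_counters(mode_line):
    """Parse a counter string such as "132t+3f+10u+2o" into counts."""
    counts = dict(t=0, f=0, u=0, o=0)
    for number, kind in re.findall(r"(\d+)([tfuo])", mode_line):
        counts[kind] = int(number)
    return counts

def format_counters(counts):
    """Rebuild the string, leaving out kinds whose count is zero."""
    parts = [str(counts[k]) + k for k in "tfuo" if counts[k] > 0]
    return "+".join(parts)

counts = parse_counters("132t+3f+10u+2o")
# Unfuzzy the fuzzy entries, translate the untranslated ones,
# and delete the obsolete ones.
counts["t"] += counts["f"] + counts["u"]
counts["f"] = counts["u"] = counts["o"] = 0
print(format_counters(counts))   # prints 145t
@end example

The obsolete entries are deleted rather than translated, so they do not
add to the final count: 132 + 3 + 10 = 145.
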
3297 The main PO commands are those which do not fit into the other categories of
3298 subsequent sections. These allow for quitting PO mode or for managing windows
3303 @efindex _@r{, PO Mode command}
3304 Undo last modification to the PO file (@code{po-undo}).
3307 @efindex Q@r{, PO Mode command}
3308 Quit processing and save the PO file (@code{po-quit}).
3311 @efindex q@r{, PO Mode command}
3312 Quit processing, possibly after confirmation (@code{po-confirm-and-quit}).
3315 @efindex 0@r{, PO Mode command}
Temporarily leave the PO file window (@code{po-other-window}).
3320 @efindex ?@r{, PO Mode command}
3321 @efindex h@r{, PO Mode command}
3322 Show help about PO mode (@code{po-help}).
3325 @efindex =@r{, PO Mode command}
3326 Give some PO file statistics (@code{po-statistics}).
3329 @efindex V@r{, PO Mode command}
3330 Batch validate the format of the whole PO file (@code{po-validate}).
3334 @efindex _@r{, PO Mode command}
3335 @efindex po-undo@r{, PO Mode command}
3336 The command @kbd{_} (@code{po-undo}) interfaces to the Emacs
3337 @emph{undo} facility. @xref{Undo, , Undoing Changes, emacs, The Emacs
Editor}. Each time @kbd{_} is typed, modifications which the translator
made to the PO file are undone a little more. For the purpose of
undoing, each PO mode command is atomic. This is especially true for
the @kbd{@key{RET}} command: the whole edit made by a single
use of this command is undone at once, even if the edit itself
involved several actions. However, while in the editing window, one
can undo the editing work in much finer steps.
3346 @efindex Q@r{, PO Mode command}
3347 @efindex q@r{, PO Mode command}
3348 @efindex po-quit@r{, PO Mode command}
3349 @efindex po-confirm-and-quit@r{, PO Mode command}
3350 The commands @kbd{Q} (@code{po-quit}) and @kbd{q}
3351 (@code{po-confirm-and-quit}) are used when the translator is done with the
3352 PO file. The former is a bit less verbose than the latter. If the file
3353 has been modified, it is saved to disk first. In both cases, and prior to
3354 all this, the commands check if any untranslated messages remain in the
3355 PO file and, if so, the translator is asked if she really wants to leave
3356 off working with this PO file. This is the preferred way of getting rid
3357 of an Emacs PO file buffer. Merely killing it through the usual command
3358 @w{@kbd{C-x k}} (@code{kill-buffer}) is not the tidiest way to proceed.
3360 @efindex 0@r{, PO Mode command}
3361 @efindex po-other-window@r{, PO Mode command}
The command @kbd{0} (@code{po-other-window}) is another, softer way
to leave PO mode temporarily. It just moves the cursor to some other
Emacs window, and pops one up if necessary. For example, if the translator
just got PO mode to show some source context in some other window, she might
discover some apparent bug in the program source that needs correction.
This command allows the translator to switch roles, become a programmer,
and place the cursor right in the window containing the program she
wants to modify. By later getting the cursor back
in the PO file window, or by asking Emacs to edit this file once again,
PO mode is then recovered.
3373 @efindex ?@r{, PO Mode command}
3374 @efindex h@r{, PO Mode command}
3375 @efindex po-help@r{, PO Mode command}
3376 The command @kbd{h} (@code{po-help}) displays a summary of all available PO
3377 mode commands. The translator should then type any character to resume
3378 normal PO mode operations. The command @kbd{?} has the same effect
3381 @efindex =@r{, PO Mode command}
3382 @efindex po-statistics@r{, PO Mode command}
3383 The command @kbd{=} (@code{po-statistics}) computes the total number of
3384 entries in the PO file, the ordinal of the current entry (counted from
3385 1), the number of untranslated entries, the number of obsolete entries,
3386 and displays all these numbers.
3388 @efindex V@r{, PO Mode command}
3389 @efindex po-validate@r{, PO Mode command}
3390 The command @kbd{V} (@code{po-validate}) launches @code{msgfmt} in
3391 checking and verbose
3392 mode over the current PO file. This command first offers to save the
3393 current PO file on disk. The @code{msgfmt} tool, from GNU @code{gettext},
3394 has the purpose of creating a MO file out of a PO file, and PO mode uses
3395 the features of this program for checking the overall format of a PO file,
3396 as well as all individual entries.
3398 @efindex next-error@r{, stepping through PO file validation results}
3399 The program @code{msgfmt} runs asynchronously with Emacs, so the
3400 translator regains control immediately while her PO file is being studied.
3401 Error output is collected in the Emacs @samp{*compilation*} buffer,
displayed in another window. The regular Emacs command @kbd{C-x `}
(@code{next-error}), as well as other usual compile commands, allows the
translator to reposition quickly to the offending parts of the PO file.
Once the cursor is on the line in error, the translator may decide on
any PO mode action which would help correct the error.
3409 @subsection Entry Positioning
3411 @emindex current entry of a PO file
3412 The cursor in a PO file window is almost always part of
3413 an entry. The only exceptions are the special case when the cursor
3414 is after the last entry in the file, or when the PO file is
empty. The entry where the cursor is located is said to be the
current entry. Many PO mode commands operate on the current entry,
so moving the cursor does more than allow the translator to browse
the PO file: it also selects the entry on which commands operate.
3420 @emindex moving through a PO file
Some PO mode commands alter the position of the cursor in a specialized
way. A few of those special-purpose positioning commands are described here;
3423 the others are described in following sections (for a complete list try
3429 @efindex .@r{, PO Mode command}
3430 Redisplay the current entry (@code{po-current-entry}).
3433 @efindex n@r{, PO Mode command}
3434 Select the entry after the current one (@code{po-next-entry}).
3437 @efindex p@r{, PO Mode command}
3438 Select the entry before the current one (@code{po-previous-entry}).
3441 @efindex <@r{, PO Mode command}
3442 Select the first entry in the PO file (@code{po-first-entry}).
3445 @efindex >@r{, PO Mode command}
3446 Select the last entry in the PO file (@code{po-last-entry}).
3449 @efindex m@r{, PO Mode command}
3450 Record the location of the current entry for later use
3451 (@code{po-push-location}).
3454 @efindex r@r{, PO Mode command}
3455 Return to a previously saved entry location (@code{po-pop-location}).
3458 @efindex x@r{, PO Mode command}
3459 Exchange the current entry location with the previously saved one
3460 (@code{po-exchange-location}).
3464 @efindex .@r{, PO Mode command}
3465 @efindex po-current-entry@r{, PO Mode command}
3466 Any Emacs command able to reposition the cursor may be used
3467 to select the current entry in PO mode, including commands which
3468 move by characters, lines, paragraphs, screens or pages, and search
3469 commands. However, there is a kind of standard way to display the
3470 current entry in PO mode, which usual Emacs commands moving
3471 the cursor do not especially try to enforce. The command @kbd{.}
3472 (@code{po-current-entry}) has the sole purpose of redisplaying the
3473 current entry properly, after the current entry has been changed by
3474 means external to PO mode, or the Emacs screen otherwise altered.
3476 It is yet to be decided if PO mode helps the translator, or otherwise
3477 irritates her, by forcing a rigid window disposition while she
3478 is doing her work. We originally had quite precise ideas about
3479 how windows should behave, but on the other hand, anyone used to
3480 Emacs is often happy to keep full control. Maybe a fixed window
3481 disposition might be offered as a PO mode option that the translator
3482 might activate or deactivate at will, so it could be offered on an
3483 experimental basis. If nobody feels a real need for using it, or
3484 a compulsion for writing it, we should drop this whole idea.
3485 The incentive for doing it should come from translators rather than
programmers, as opinions from an experienced translator are surely
worth more to me than opinions from programmers @emph{thinking} about
3488 how @emph{others} should do translation.
3490 @efindex n@r{, PO Mode command}
3491 @efindex po-next-entry@r{, PO Mode command}
3492 @efindex p@r{, PO Mode command}
3493 @efindex po-previous-entry@r{, PO Mode command}
3494 The commands @kbd{n} (@code{po-next-entry}) and @kbd{p}
(@code{po-previous-entry}) move the cursor to the entry following,
3496 or preceding, the current one. If @kbd{n} is given while the
3497 cursor is on the last entry of the PO file, or if @kbd{p}
3498 is given while the cursor is on the first entry, no move is done.
3500 @efindex <@r{, PO Mode command}
3501 @efindex po-first-entry@r{, PO Mode command}
3502 @efindex >@r{, PO Mode command}
3503 @efindex po-last-entry@r{, PO Mode command}
3504 The commands @kbd{<} (@code{po-first-entry}) and @kbd{>}
3505 (@code{po-last-entry}) move the cursor to the first entry, or last
3506 entry, of the PO file. When the cursor is located past the last
3507 entry in a PO file, most PO mode commands will return an error saying
@samp{After last entry}. Moreover, the commands @kbd{<} and @kbd{>}
have the special property of being able to work even when the cursor
is not within any PO file entry, and one may use them to
correct this situation. But even these commands will fail on a
truly empty PO file. There are development plans for PO mode
to interactively fill an empty PO file from sources. @xref{Marking}.
3515 The translator may decide, before working at the translation of
3516 a particular entry, that she needs to browse the remainder of the
3517 PO file, maybe for finding the terminology or phraseology used
3518 in related entries. She can of course use the standard Emacs idioms
3519 for saving the current cursor location in some register, and use that
3520 register for getting back, or else, use the location ring.
3522 @efindex m@r{, PO Mode command}
3523 @efindex po-push-location@r{, PO Mode command}
3524 @efindex r@r{, PO Mode command}
3525 @efindex po-pop-location@r{, PO Mode command}
3526 PO mode offers another approach, by which cursor locations may be saved
3527 onto a special stack. The command @kbd{m} (@code{po-push-location})
3528 merely adds the location of current entry to the stack, pushing
3529 the already saved locations under the new one. The command
3530 @kbd{r} (@code{po-pop-location}) consumes the top stack element and
3531 repositions the cursor to the entry associated with that top element.
3532 This position is then lost, for the next @kbd{r} will move the cursor
3533 to the previously saved location, and so on until no locations remain
3536 If the translator wants the position to be kept on the location stack,
3537 maybe for taking a look at the entry associated with the top
3538 element, then go elsewhere with the intent of getting back later, she
3539 ought to use @kbd{m} immediately after @kbd{r}.
3541 @efindex x@r{, PO Mode command}
3542 @efindex po-exchange-location@r{, PO Mode command}
3543 The command @kbd{x} (@code{po-exchange-location}) simultaneously
3544 repositions the cursor to the entry associated with the top element of
3545 the stack of saved locations, and replaces that top element with the
location of the current entry before the move. Consequently, repeating
the @kbd{x} command alternates between two entries.
To achieve this, the translator positions the cursor on the
first entry, uses @kbd{m}, then moves to the second entry, and
simply uses @kbd{x} to make the switch.
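
The behaviour of @kbd{m}, @kbd{r} and @kbd{x} described above can be
modelled with an ordinary list used as a stack. The following Python
sketch is only a toy model under that assumption: the class and method
names are hypothetical, entries are represented by plain integers, and it
does not reproduce what PO mode does when the stack is empty.

@example
class LocationStack:
    """Toy model of the stack of saved entry locations."""

    def __init__(self, current):
        self.current = current   # index of the current entry
        self.saved = []          # the top of the stack is the last item

    def push(self):
        """Model of m: save the current location on top of the stack."""
        self.saved.append(self.current)

    def pop(self):
        """Model of r: go to the top location and drop it from the stack."""
        self.current = self.saved.pop()

    def exchange(self):
        """Model of x: swap the current location with the top one."""
        top = self.saved.pop()
        self.saved.append(self.current)
        self.current = top

stack = LocationStack(current=10)
stack.push()            # m while on entry 10
stack.current = 42      # the translator moves to entry 42
stack.exchange()        # x: now on entry 10, entry 42 is saved
stack.exchange()        # x again: back on entry 42, entry 10 is saved
print(stack.current, stack.saved)
@end example

In this model, using @code{push} right after @code{pop} puts the same
location back on the stack, which corresponds to typing @kbd{m}
immediately after @kbd{r}.
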
3552 @node Normalizing, Translated Entries, Entry Positioning, PO Mode
3553 @subsection Normalizing Strings in Entries
3554 @cindex string normalization in entries
There are many different ways of encoding a particular string into a
PO file entry, because there are so many different ways to split and
quote multi-line strings, and even to represent special characters
by backslash escape sequences. Some features of PO mode rely on
the ability of PO mode to scan an already existing PO file for a
3561 particular string encoded into the @code{msgid} field of some entry.
3562 Even if PO mode has internally all the built-in machinery for
3563 implementing this recognition easily, doing it fast is technically
3564 difficult. To facilitate a solution to this efficiency problem,
3565 we decided on a canonical representation for strings.
3567 A conventional representation of strings in a PO file is currently
3568 under discussion, and PO mode experiments with a canonical representation.
3569 Having both @code{xgettext} and PO mode converging towards a uniform
3570 way of representing equivalent strings would be useful, as the internal
3571 normalization needed by PO mode could be automatically satisfied
3572 when using @code{xgettext} from GNU @code{gettext}. An explicit
3573 PO mode normalization should then be only necessary for PO files
3574 imported from elsewhere, or for when the convention itself evolves.
3576 So, for achieving normalization of at least the strings of a given
3577 PO file needing a canonical representation, the following PO mode
3578 command is available:
3580 @emindex string normalization in entries
3582 @item M-x po-normalize
3583 @efindex po-normalize@r{, PO Mode command}
3584 Tidy the whole PO file by making entries more uniform.
3588 The special command @kbd{M-x po-normalize}, which has no associated
3589 keys, revises all entries, ensuring that strings of both original
3590 and translated entries use uniform internal quoting in the PO file.
3591 It also removes any crumb after the last entry. This command may be
3592 useful for PO files freshly imported from elsewhere, or if we ever
3593 improve on the canonical quoting format we use. This canonical format
3594 is not only meant for getting cleaner PO files, but also for greatly
3595 speeding up @code{msgid} string lookup for some other PO mode commands.
3597 @kbd{M-x po-normalize} presently makes three passes over the entries.
3598 The first implements heuristics for converting PO files for GNU
3599 @code{gettext} 0.6 and earlier, in which @code{msgid} and @code{msgstr}
3600 fields were using K&R style C string syntax for multi-line strings.
3601 These heuristics may fail for comments not related to obsolete
3602 entries and ending with a backslash; they also depend on subsequent
3603 passes for finalizing the proper commenting of continued lines for
obsolete entries. This first pass might disappear once all old PO
files have been adjusted. The second and third passes normalize
3606 all @code{msgid} and @code{msgstr} strings respectively. They also
3607 clean out those trailing backslashes used by XView's @code{msgfmt}
3608 for continued lines.
3610 @cindex importing PO files
3611 Having such an explicit normalizing command allows for importing PO
3612 files from other sources, but also eases the evolution of the current
3613 convention, evolution driven mostly by aesthetic concerns, as of now.
It is easy to make suggested adjustments at a later time, as the
normalizing command and, eventually, other GNU @code{gettext} tools
3616 should greatly automate conformance. A description of the canonical
3617 string format is given below, for the particular benefit of those not
3618 having Emacs handy, and who would nevertheless want to handcraft
3619 their PO files in nice ways.
3621 @cindex multi-line strings
3622 Right now, in PO mode, strings are single line or multi-line. A string
3623 goes multi-line if and only if it has @emph{embedded} newlines, that
3624 is, if it matches @samp{[^\n]\n+[^\n]}. So, we would have:
3627 msgstr "\n\nHello, world!\n\n\n"
3630 but, replacing the space by a newline, this becomes:
3642 We are deliberately using a caricatural example, here, to make the
3643 point clearer. Usually, multi-lines are not that bad looking.
3644 It is probable that we will implement the following suggestion.
3645 We might lump together all initial newlines into the empty string,
3646 and also all newlines introducing empty lines (that is, for @w{@var{n}
3647 > 1}, the @var{n}-1'th last newlines would go together on a separate
3648 string), so making the previous example appear:
3657 There are a few yet undecided little points about string normalization,
3658 to be documented in this manual, once these questions settle.
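
The multi-line criterion stated earlier in this section, a match of
@samp{[^\n]\n+[^\n]}, can be tried out with the following Python sketch.
It only tests the criterion on an unquoted string; it does not reproduce
how PO mode splits or quotes the string, and the function name is
hypothetical.

@example
import re

# A run of newlines with some other character on each side.
EMBEDDED_NEWLINE = re.compile(r"[^\n]\n+[^\n]")

def is_multi_line(text):
    """True if the text contains an embedded newline."""
    return EMBEDDED_NEWLINE.search(text) is not None

print(is_multi_line("\n\nHello, world!\n\n\n"))    # prints False
print(is_multi_line("\n\nHello,\nworld!\n\n\n"))   # prints True
@end example

Leading and trailing newlines alone never satisfy the pattern, because it
requires a character other than a newline on both sides of the run.
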
3660 @node Translated Entries, Fuzzy Entries, Normalizing, PO Mode
3661 @subsection Translated Entries
3662 @cindex translated entries
3664 Each PO file entry for which the @code{msgstr} field has been filled with
3665 a translation, and which is not marked as fuzzy (@pxref{Fuzzy Entries}),
3666 is said to be a @dfn{translated} entry. Only translated entries will
3667 later be compiled by GNU @code{msgfmt} and become usable in programs.
3668 Other entry types will be excluded; translation will not occur for them.
3670 @emindex moving by translated entries
3671 Some commands are more specifically related to translated entry processing.
3675 @efindex t@r{, PO Mode command}
3676 Find the next translated entry (@code{po-next-translated-entry}).
3679 @efindex T@r{, PO Mode command}
3680 Find the previous translated entry (@code{po-previous-translated-entry}).
3684 @efindex t@r{, PO Mode command}
3685 @efindex po-next-translated-entry@r{, PO Mode command}
3686 @efindex T@r{, PO Mode command}
3687 @efindex po-previous-translated-entry@r{, PO Mode command}
3688 The commands @kbd{t} (@code{po-next-translated-entry}) and @kbd{T}
3689 (@code{po-previous-translated-entry}) move forwards or backwards, chasing
for a translated entry. If none is found, the search is extended and
3691 wraps around in the PO file buffer.
3693 @evindex po-auto-fuzzy-on-edit@r{, PO Mode variable}
3694 Translated entries usually result from the translator having edited in
a translation for them (@pxref{Modifying Translations}). However, if the
3696 variable @code{po-auto-fuzzy-on-edit} is not @code{nil}, the entry having
3697 received a new translation first becomes a fuzzy entry, which ought to
3698 be later unfuzzied before becoming an official, genuine translated entry.
3699 @xref{Fuzzy Entries}.
3701 @node Fuzzy Entries, Untranslated Entries, Translated Entries, PO Mode
3702 @subsection Fuzzy Entries
3703 @cindex fuzzy entries
3705 @cindex attributes of a PO file entry
3706 @cindex attribute, fuzzy
3707 Each PO file entry may have a set of @dfn{attributes}, which are
3708 qualities given a name and explicitly associated with the translation,
3709 using a special system comment. One of these attributes
3710 has the name @code{fuzzy}, and entries having this attribute are said
3711 to have a fuzzy translation. They are called fuzzy entries, for short.
3713 Fuzzy entries, even if they account for translated entries for
3714 most other purposes, usually call for revision by the translator.
Those may be produced by applying the program @code{msgmerge} to
update an older translated PO file according to a new PO template
file, when this tool hypothesizes that some new @code{msgid} has
been modified only slightly from an older one, and chooses to pair
what it thinks to be the old translation with the new modified entry.
3720 The slight alteration in the original string (the @code{msgid} string)
3721 should often be reflected in the translated string, and this requires
3722 the intervention of the translator. For this reason, @code{msgmerge}
3723 might mark some entries as being fuzzy.
3725 @emindex moving by fuzzy entries
3726 Also, the translator may decide herself to mark an entry as fuzzy
3727 for her own convenience, when she wants to remember that the entry
3728 has to be later revisited. So, some commands are more specifically
3729 related to fuzzy entry processing.
3733 @efindex f@r{, PO Mode command}
3734 @c better append "-entry" all the time. -ke-
3735 Find the next fuzzy entry (@code{po-next-fuzzy-entry}).
3738 @efindex F@r{, PO Mode command}
3739 Find the previous fuzzy entry (@code{po-previous-fuzzy-entry}).
3742 @efindex TAB@r{, PO Mode command}
3743 Remove the fuzzy attribute of the current entry (@code{po-unfuzzy}).
3747 @efindex f@r{, PO Mode command}
3748 @efindex po-next-fuzzy-entry@r{, PO Mode command}
3749 @efindex F@r{, PO Mode command}
3750 @efindex po-previous-fuzzy-entry@r{, PO Mode command}
3751 The commands @kbd{f} (@code{po-next-fuzzy-entry}) and @kbd{F}
3752 (@code{po-previous-fuzzy-entry}) move forwards or backwards, chasing for
3753 a fuzzy entry. If none is found, the search is extended and wraps
3754 around in the PO file buffer.
3756 @efindex TAB@r{, PO Mode command}
3757 @efindex po-unfuzzy@r{, PO Mode command}
3758 @evindex po-auto-select-on-unfuzzy@r{, PO Mode variable}
3759 The command @kbd{@key{TAB}} (@code{po-unfuzzy}) removes the fuzzy
3760 attribute associated with an entry, usually leaving it translated.
Further, if the variable @code{po-auto-select-on-unfuzzy} is not
@code{nil}, the @kbd{@key{TAB}} command will automatically chase
for another interesting entry to work on. The initial value of
3764 @code{po-auto-select-on-unfuzzy} is @code{nil}.
3766 The initial value of @code{po-auto-fuzzy-on-edit} is @code{nil}. However,
3767 if the variable @code{po-auto-fuzzy-on-edit} is set to @code{t}, any entry
3768 edited through the @kbd{@key{RET}} command is marked fuzzy, as a way to
3769 ensure some kind of double check, later. In this case, the usual paradigm
3770 is that an entry becomes fuzzy (if not already) whenever the translator
3771 modifies it. If she is satisfied with the translation, she then uses
3772 @kbd{@key{TAB}} to pick another entry to work on, clearing the fuzzy attribute
3773 at the same time. If she is not satisfied yet, she merely uses @kbd{@key{SPC}}
3774 to chase another entry, leaving the entry fuzzy.
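
As an illustration of the paradigm just described, a translator who wants
every edited entry to be double checked might put the following in her
Emacs initialization file. This is only a sketch: the values shown are one
possible choice, not a recommendation.

@lisp
;; Mark any entry edited with RET as fuzzy, so it gets a second look.
(setq po-auto-fuzzy-on-edit t)
;; After TAB clears the fuzzy mark, move on to another entry.
(setq po-auto-select-on-unfuzzy t)
@end lisp
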
3776 @efindex DEL@r{, PO Mode command}
3777 @efindex po-fade-out-entry@r{, PO Mode command}
3778 The translator may also use the @kbd{@key{DEL}} command
3779 (@code{po-fade-out-entry}) over any translated entry to mark it as being
3780 fuzzy, as an easy way to leave a reminder that she wants to return
3781 to this entry later.
3783 Also, when time comes to quit working on a PO file buffer with the @kbd{q}
3784 command, the translator is asked for confirmation, if some fuzzy string
still exists.
3787 @node Untranslated Entries, Obsolete Entries, Fuzzy Entries, PO Mode
3788 @subsection Untranslated Entries
3789 @cindex untranslated entries
3791 When @code{xgettext} originally creates a PO file, unless told
3792 otherwise, it initializes the @code{msgid} field with the untranslated
3793 string, and leaves the @code{msgstr} string to be empty. Such entries,
3794 having an empty translation, are said to be @dfn{untranslated} entries.
3795 Later, when the programmer slightly modifies some string right in
3796 the program, this change is later reflected in the PO file
3797 by the appearance of a new untranslated entry for the modified string.
3799 The usual commands moving from entry to entry consider untranslated
3800 entries on the same level as active entries. Untranslated entries
3801 are easily recognizable by the fact they end with @w{@samp{msgstr ""}}.
3803 @emindex moving by untranslated entries
3804 The work of the translator might be (quite naively) seen as the process
3805 of seeking for an untranslated entry, editing a translation for
3806 it, and repeating these actions until no untranslated entries remain.
3807 Some commands are more specifically related to untranslated entry
processing.
3812 @efindex u@r{, PO Mode command}
3813 Find the next untranslated entry (@code{po-next-untranslated-entry}).
3816 @efindex U@r{, PO Mode command}
3817 Find the previous untranslated entry (@code{po-previous-untranslated-entry}).
3820 @efindex k@r{, PO Mode command}
3821 Turn the current entry into an untranslated one (@code{po-kill-msgstr}).
3825 @efindex u@r{, PO Mode command}
3826 @efindex po-next-untranslated-entry@r{, PO Mode command}
3827 @efindex U@r{, PO Mode command}
3828 @efindex po-previous-untranslated-entry@r{, PO Mode command}
3829 The commands @kbd{u} (@code{po-next-untranslated-entry}) and @kbd{U}
3830 (@code{po-previous-untranslated-entry}) move forwards or backwards,
3831 chasing for an untranslated entry. If none is found, the search is
3832 extended and wraps around in the PO file buffer.
3834 @efindex k@r{, PO Mode command}
3835 @efindex po-kill-msgstr@r{, PO Mode command}
3836 An entry can be turned back into an untranslated entry by
3837 merely emptying its translation, using the command @kbd{k}
3838 (@code{po-kill-msgstr}). @xref{Modifying Translations}.
3840 Also, when time comes to quit working on a PO file buffer
3841 with the @kbd{q} command, the translator is asked for confirmation,
3842 if some untranslated string still exists.
3844 @node Obsolete Entries, Modifying Translations, Untranslated Entries, PO Mode
3845 @subsection Obsolete Entries
3846 @cindex obsolete entries
3848 By @dfn{obsolete} PO file entries, we mean those entries which are
3849 commented out, usually by @code{msgmerge} when it found that the
3850 translation is not needed anymore by the package being localized.
3852 The usual commands moving from entry to entry consider obsolete
3853 entries on the same level as active entries. Obsolete entries are
3854 easily recognizable by the fact that all their lines start with
3855 @code{#}, even those lines containing @code{msgid} or @code{msgstr}.
3857 Commands exist for emptying the translation or reinitializing it
3858 to the original untranslated string. Commands interfacing with the
3859 kill ring may force some previously saved text into the translation.
3860 The user may interactively edit the translation. All these commands
3861 may apply to obsolete entries, carefully leaving the entry obsolete.
3864 @emindex moving by obsolete entries
3865 Moreover, some commands are more specifically related to obsolete
entry processing.
3870 @efindex o@r{, PO Mode command}
3871 Find the next obsolete entry (@code{po-next-obsolete-entry}).
3874 @efindex O@r{, PO Mode command}
3875 Find the previous obsolete entry (@code{po-previous-obsolete-entry}).
3878 @efindex DEL@r{, PO Mode command}
3879 Make an active entry obsolete, or zap out an obsolete entry
3880 (@code{po-fade-out-entry}).
3884 @efindex o@r{, PO Mode command}
3885 @efindex po-next-obsolete-entry@r{, PO Mode command}
3886 @efindex O@r{, PO Mode command}
3887 @efindex po-previous-obsolete-entry@r{, PO Mode command}
3888 The commands @kbd{o} (@code{po-next-obsolete-entry}) and @kbd{O}
3889 (@code{po-previous-obsolete-entry}) move forwards or backwards,
3890 chasing for an obsolete entry. If none is found, the search is
3891 extended and wraps around in the PO file buffer.
3893 PO mode does not provide ways for un-commenting an obsolete entry
3894 and making it active, because this would reintroduce an original
3895 untranslated string which does not correspond to any marked string
3896 in the program sources. This goes with the philosophy of never
3897 introducing useless @code{msgid} values.
3899 @efindex DEL@r{, PO Mode command}
3900 @efindex po-fade-out-entry@r{, PO Mode command}
3901 @emindex obsolete active entry
3902 @emindex comment out PO file entry
3903 However, it is possible to comment out an active entry, so making
3904 it obsolete. GNU @code{gettext} utilities will later react to the
3905 disappearance of a translation by using the untranslated string.
3906 The command @kbd{@key{DEL}} (@code{po-fade-out-entry}) pushes the current entry
3907 a little further towards annihilation. If the entry is active (it is a
3908 translated entry), then it is first made fuzzy. If it is already fuzzy,
3909 then the entry is merely commented out, with confirmation. If the entry
3910 is already obsolete, then it is completely deleted from the PO file.
3911 It is easy to recycle the translation so deleted into some other PO file
3912 entry, usually one which is untranslated. @xref{Modifying Translations}.
3914 Here is a quite interesting problem to solve for later development of
3915 PO mode, for those nights you are not sleepy. The idea would be that
3916 PO mode might become bright enough, one of these days, to make good
3917 guesses at retrieving the most probable candidate, among all obsolete
3918 entries, for initializing the translation of a newly appeared string.
3919 I think it might be a quite hard problem to do this algorithmically, as
3920 we have to develop good and efficient measures of string similarity.
3921 Right now, PO mode leaves the decision entirely to the translator
3922 when the time comes to find the adequate obsolete translation; it
3923 merely tries to provide handy tools for helping her to do so.
3925 @node Modifying Translations, Modifying Comments, Obsolete Entries, PO Mode
3926 @subsection Modifying Translations
3927 @cindex editing translations
3928 @emindex editing translations
3930 PO mode prevents direct modification of the PO file, by the usual
3931 means Emacs gives for altering a buffer's contents. By doing so,
3932 it aims to help the translator avoid little clerical errors
3933 about the overall file format, or the proper quoting of strings,
3934 as those errors would be easily made. Other kinds of errors are
3935 still possible, but some may be caught and diagnosed by the batch
3936 validation process, which the translator may always trigger by the
3937 @kbd{V} command. For all other errors, the translator has to rely on
3938 her own judgment, and also on the linguistic reports submitted to her
3939 by the users of the translated package, having the same mother tongue.
3941 When the time comes to create a translation, correct an error diagnosed
3942 mechanically or reported by a user, the translators have to resort to
3943 using the following commands for modifying the translations.
3947 @efindex RET@r{, PO Mode command}
3948 Interactively edit the translation (@code{po-edit-msgstr}).
3952 @efindex LFD@r{, PO Mode command}
3953 @efindex C-j@r{, PO Mode command}
3954 Reinitialize the translation with the original, untranslated string
3955 (@code{po-msgid-to-msgstr}).
3958 @efindex k@r{, PO Mode command}
3959 Save the translation on the kill ring, and delete it (@code{po-kill-msgstr}).
3962 @efindex w@r{, PO Mode command}
3963 Save the translation on the kill ring, without deleting it
3964 (@code{po-kill-ring-save-msgstr}).
3967 @efindex y@r{, PO Mode command}
3968 Replace the translation, taking the new from the kill ring
3969 (@code{po-yank-msgstr}).
3973 @efindex RET@r{, PO Mode command}
3974 @efindex po-edit-msgstr@r{, PO Mode command}
3975 The command @kbd{@key{RET}} (@code{po-edit-msgstr}) opens a new Emacs
3976 window meant to edit in a new translation, or to modify an already existing
3977 translation. The new window contains a copy of the translation taken from
3978 the current PO file entry, all ready for editing, expunged of all quoting
3979 marks, fully modifiable and with the complete extent of Emacs modifying
3980 commands. When the translator is done with her modifications, she may use
3981 @w{@kbd{C-c C-c}} to close the subedit window with the automatically requoted
3982 results, or @w{@kbd{C-c C-k}} to abort her modifications. @xref{Subedit},
3983 for more information.
3985 @efindex LFD@r{, PO Mode command}
3986 @efindex C-j@r{, PO Mode command}
3987 @efindex po-msgid-to-msgstr@r{, PO Mode command}
3988 The command @kbd{@key{LFD}} (@code{po-msgid-to-msgstr}) initializes, or
3989 reinitializes the translation with the original string. This command is
3990 normally used when the translator wants to redo a fresh translation of
3991 the original string, disregarding any previous work.
3993 @evindex po-auto-edit-with-msgid@r{, PO Mode variable}
3994 It is possible to arrange that, whenever an untranslated entry is
3995 edited, the @kbd{@key{LFD}} command is executed automatically. If you set
3996 @code{po-auto-edit-with-msgid} to @code{t}, the translation gets
3997 initialized with the original string, in case none exists already.
3998 The default value for @code{po-auto-edit-with-msgid} is @code{nil}.
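
For example, a translator who prefers to start from a copy of the original
string could add this line to her Emacs initialization file. It is shown
only as a sketch of one possible setting.

@lisp
;; Start editing untranslated entries from a copy of the msgid.
(setq po-auto-edit-with-msgid t)
@end lisp
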
4000 @emindex starting a string translation
4001 In fact, whether it is best to start a translation with an empty
4002 string, or rather with a copy of the original string, is a matter of
4003 taste or habit. Sometimes, the source language and the
4004 target language are so different that it is simply best to start writing
4005 on an empty page. At other times, the source and target languages
4006 are so close that it would be a waste to retype a number of words
4007 already being written in the original string. A translator may also
4008 like having the original string right under her eyes, as she will
4009 progressively overwrite the original text with the translation, even
4010 if this requires some extra editing work to get rid of the original.
4012 @emindex cut and paste for translated strings
4013 @efindex k@r{, PO Mode command}
4014 @efindex po-kill-msgstr@r{, PO Mode command}
4015 @efindex w@r{, PO Mode command}
4016 @efindex po-kill-ring-save-msgstr@r{, PO Mode command}
4017 The command @kbd{k} (@code{po-kill-msgstr}) merely empties the
4018 translation string, so turning the entry into an untranslated
4019 one. But while doing so, its previous contents are set aside in
4020 a special place, known as the kill ring. The command @kbd{w}
4021 (@code{po-kill-ring-save-msgstr}) has also the effect of taking a
4022 copy of the translation onto the kill ring, but it otherwise leaves
4023 the entry alone, and does @emph{not} remove the translation from the
4024 entry. Both commands use exactly the Emacs kill ring, which is shared
4025 between buffers, and which is well known already to Emacs lovers.
4027 The translator may use @kbd{k} or @kbd{w} many times in the course
4028 of her work, as the kill ring may hold several saved translations.
4029 From the kill ring, strings may later be reinserted in various
4030 Emacs buffers. In particular, the kill ring may be used for moving
4031 translation strings between different entries of a single PO file
4032 buffer, or if the translator is handling many such buffers at once,
4033 even between PO files.
4035 To facilitate exchanges with buffers which are not in PO mode, the
4036 translation string put on the kill ring by the @kbd{k} command is fully
4037 unquoted before being saved: external quotes are removed, multi-line
4038 strings are concatenated, and backslash escaped sequences are turned
4039 into their corresponding characters. In the special case of obsolete
4040 entries, the translation is also uncommented prior to saving.
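
The unquoting steps just listed can be pictured with a small Emacs Lisp
function. This is a hypothetical helper, not the code PO mode itself uses:
the name @code{my-po-unquote} is invented for the illustration, and it
relies on the Lisp reader, whose escape sequences coincide with those of
PO files only for common cases such as @code{\n}, @code{\t}, @code{\"}
and @code{\\}. It does not handle the uncommenting of obsolete entries.

@lisp
(defun my-po-unquote (quoted)
  "Unquote QUOTED, a msgstr value holding one quoted string per line."
  (mapconcat (lambda (line)
               ;; The Lisp reader drops the external quotes and
               ;; converts the backslash escapes.
               (car (read-from-string line)))
             ;; One element per non-empty line, surrounding blanks trimmed.
             (split-string quoted "\n" t "[ \t]+")
             ""))

(my-po-unquote "\"\"\n\"Hello,\\n\"\n\"world\\n\"")
     @result{} "Hello,\nworld\n"
@end lisp
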
4042 @efindex y@r{, PO Mode command}
4043 @efindex po-yank-msgstr@r{, PO Mode command}
4044 The command @kbd{y} (@code{po-yank-msgstr}) completely replaces the
4045 translation of the current entry by a string taken from the kill ring.
4046 Following Emacs terminology, we then say that the replacement
4047 string is @dfn{yanked} into the PO file buffer.
4048 @xref{Yanking, , , emacs, The Emacs Editor}.
4049 The first time @kbd{y} is used, the translation receives the value of
4050 the most recent addition to the kill ring. If @kbd{y} is typed once
4051 again, immediately, without intervening keystrokes, the translation
4052 just inserted is taken away and replaced by the second most recent
4053 addition to the kill ring. By repeating @kbd{y} many times in a row,
4054 the translator may travel along the kill ring for saved strings,
4055 until she finds the string she really wanted.
4057 When a string is yanked into a PO file entry, it is fully and
4058 automatically requoted for complying with the format PO files should
4059 have. Further, if the entry is obsolete, PO mode then appropriately
4060 pushes the inserted string inside comments. Once again, translators
4061 should not burden themselves with quoting considerations beyond, of
4062 course, whatever the translated string itself requires with respect to
4063 the program using it.
4065 Note that @kbd{k} or @kbd{w} are not the only commands pushing strings
4066 on the kill ring, as almost any PO mode command replacing translation
4067 strings (or the translator comments) automatically saves the old string
4068 on the kill ring. The main exceptions to this general rule are the
4069 yanking commands themselves.
4071 @emindex using obsolete translations to make new entries
4072 To better illustrate the operation of killing and yanking, let's
4073 use an actual example, taken from a common situation. When the
4074 programmer slightly modifies some string right in the program, his
4075 change is later reflected in the PO file by the appearance
4076 of a new untranslated entry for the modified string, and the fact
4077 that the entry translating the original or unmodified string becomes
4078 obsolete. In many cases, the translator might spare herself some work
4079 by retrieving the unmodified translation from the obsolete entry,
4080 then initializing the untranslated entry @code{msgstr} field with
4081 this retrieved translation. Once this is done, the obsolete entry is
4082 not wanted anymore, and may be safely deleted.
4084 When the translator finds an untranslated entry and suspects that a
4085 slight variant of the translation exists, she immediately uses @kbd{m}
4086 to mark the current entry location, then starts chasing obsolete
4087 entries with @kbd{o}, hoping to find some translation corresponding
4088 to the unmodified string. Once found, she uses the @kbd{@key{DEL}} command
4089 for deleting the obsolete entry, knowing that @kbd{@key{DEL}} also @emph{kills}
4090 the translation, that is, pushes the translation on the kill ring.
4091 Then, @kbd{r} returns to the initial untranslated entry, and @kbd{y}
4092 then @emph{yanks} the saved translation right into the @code{msgstr}
4093 field. The translator is then free to use @kbd{@key{RET}} for fine
4094 tuning the translation contents, and maybe to later use @kbd{u},
4095 then @kbd{m} again, for going on with the next untranslated string.
4097 When some sequence of keys has to be typed over and over again, the
4098 translator may find it useful to become better acquainted with the Emacs
4099 capability of learning these sequences and playing them back under request.
4100 @xref{Keyboard Macros, , , emacs, The Emacs Editor}.
4102 @node Modifying Comments, Subedit, Modifying Translations, PO Mode
4103 @subsection Modifying Comments
4104 @cindex editing comments in PO files
4105 @emindex editing comments
4107 Any translation work done seriously will raise many linguistic
4108 difficulties, for which decisions have to be made, and the choices
4109 further documented. These documents may be saved within the
4110 PO file in form of translator comments, which the translator
4111 is free to create, delete, or modify at will. These comments may
4112 be useful to herself when she returns to this PO file after a while.
4114 Comments not having whitespace after the initial @samp{#}, for example,
4115 those beginning with @samp{#.} or @samp{#:}, are @emph{not} translator
4116 comments; they are exclusively created by other @code{gettext} tools.
4117 So, the commands below will never alter such system-added comments,
4118 which are not meant for the translator to modify. @xref{PO Files}.
4120 The following commands are somewhat similar to those modifying translations,
4121 so the general indications given for those apply here. @xref{Modifying
Translations}.
4127 @efindex #@r{, PO Mode command}
4128 Interactively edit the translator comments (@code{po-edit-comment}).
4131 @efindex K@r{, PO Mode command}
4132 Save the translator comments on the kill ring, and delete them
4133 (@code{po-kill-comment}).
4136 @efindex W@r{, PO Mode command}
4137 Save the translator comments on the kill ring, without deleting them
4138 (@code{po-kill-ring-save-comment}).
4141 @efindex Y@r{, PO Mode command}
4142 Replace the translator comments, taking the new from the kill ring
4143 (@code{po-yank-comment}).
4147 These commands parallel PO mode commands for modifying the translation
4148 strings, and behave much the same way as they do, except that they handle
4149 this part of PO file comments meant for translator usage, rather
4150 than the translation strings. So, if the descriptions given below are
4151 slightly succinct, it is because the full details have already been given.
4152 @xref{Modifying Translations}.
4154 @efindex #@r{, PO Mode command}
4155 @efindex po-edit-comment@r{, PO Mode command}
4156 The command @kbd{#} (@code{po-edit-comment}) opens a new Emacs window
4157 containing a copy of the translator comments on the current PO file entry.
4158 If there are no such comments, PO mode understands that the translator wants
4159 to add a comment to the entry, and she is presented with an empty screen.
4160 Comment marks (@code{#}) and the space following them are automatically
4161 removed before editing, and reinstated afterwards. For translator comments
4162 pertaining to obsolete entries, the uncommenting and recommenting operations
4163 are done twice. Once in the editing window, the keys @w{@kbd{C-c C-c}}
4164 allow the translator to signal that she has finished editing the comment.
4165 @xref{Subedit}, for further details.
4167 @evindex po-subedit-mode-hook@r{, PO Mode variable}
4168 Functions found on @code{po-subedit-mode-hook}, if any, are executed after
4169 the string has been inserted in the edit buffer.
4171 @efindex K@r{, PO Mode command}
4172 @efindex po-kill-comment@r{, PO Mode command}
4173 @efindex W@r{, PO Mode command}
4174 @efindex po-kill-ring-save-comment@r{, PO Mode command}
4175 @efindex Y@r{, PO Mode command}
4176 @efindex po-yank-comment@r{, PO Mode command}
4177 The command @kbd{K} (@code{po-kill-comment}) gets rid of all
4178 translator comments, while saving those comments on the kill ring.
4179 The command @kbd{W} (@code{po-kill-ring-save-comment}) takes
4180 a copy of the translator comments on the kill ring, but leaves
4181 them undisturbed in the current entry. The command @kbd{Y}
4182 (@code{po-yank-comment}) completely replaces the translator comments
4183 by a string taken at the front of the kill ring. When this command
4184 is immediately repeated, the comments just inserted are withdrawn,
4185 and replaced by other strings taken along the kill ring.
4187 On the kill ring, all strings have the same nature. There is no
4188 distinction between @emph{translation} strings and @emph{translator
4189 comments} strings. So, for example, let's presume the translator
4190 has just finished editing a translation, and wants to create a new
4191 translator comment to document why the previous translation was
4192 not good, just to remember what was the problem. Foreseeing that she
4193 will do that in her documentation, the translator may want to quote
4194 the previous translation in her translator comments. To do so, she
4195 may initialize the translator comments with the previous translation,
4196 still at the head of the kill ring. Because editing already pushed the
4197 previous translation on the kill ring, she merely has to type @kbd{M-w}
4198 prior to @kbd{#}, and the previous translation will be right there,
4199 all ready for being introduced by some explanatory text.
4201 On the other hand, presume there are some translator comments already
4202 and that the translator wants to add to those comments, instead
4203 of wholly replacing them. Then, she should edit the comment right
4204 away with @kbd{#}. Once inside the editing window, she can use the
4205 regular Emacs commands @kbd{C-y} (@code{yank}) and @kbd{M-y}
4206 (@code{yank-pop}) to get the previous translation where she likes.
4208 @node Subedit, C Sources Context, Modifying Comments, PO Mode
4209 @subsection Details of Sub Edition
4210 @emindex subedit minor mode
4212 The PO subedit minor mode has a few peculiarities worth being described
4213 in fuller detail. It installs a few commands over the usual editing set
4214 of Emacs, which are described below.
4218 @efindex C-c C-c@r{, PO Mode command}
4219 Finish editing (@code{po-subedit-exit}).
4222 @efindex C-c C-k@r{, PO Mode command}
4223 Abort editing (@code{po-subedit-abort}).
4226 @efindex C-c C-a@r{, PO Mode command}
4227 Consult auxiliary PO files (@code{po-subedit-cycle-auxiliary}).
4231 @emindex exiting PO subedit
4232 @efindex C-c C-c@r{, PO Mode command}
4233 @efindex po-subedit-exit@r{, PO Mode command}
4234 The window's contents represent a translation for a given message,
4235 or a translator comment. The translator may modify this window to
4236 her heart's content. Once this is done, the command @w{@kbd{C-c C-c}}
4237 (@code{po-subedit-exit}) may be used to return the edited translation into
4238 the PO file, replacing the original translation, even if it moved out of
4239 sight or if buffers were switched.
4241 @efindex C-c C-k@r{, PO Mode command}
4242 @efindex po-subedit-abort@r{, PO Mode command}
4243 If the translator becomes unsatisfied with her translation or comment,
4244 to the extent she prefers keeping what was existent prior to the
4245 @kbd{@key{RET}} or @kbd{#} command, she may use the command @w{@kbd{C-c C-k}}
4246 (@code{po-subedit-abort}) to simply discard her edits, while preserving
4247 the original translation or comment. Another way would be for her to exit
4248 normally with @w{@kbd{C-c C-c}}, then type @kbd{_} (@code{po-undo}) once to undo the
4249 whole effect of the last edit.
4251 @efindex C-c C-a@r{, PO Mode command}
4252 @efindex po-subedit-cycle-auxiliary@r{, PO Mode command}
4253 The command @w{@kbd{C-c C-a}} (@code{po-subedit-cycle-auxiliary})
4254 allows for glancing through translations
4255 already achieved in other languages, directly while editing the current
4256 translation. This may be quite convenient when the translator is fluent
4257 in many languages, but of course, only makes sense when such completed
4258 auxiliary PO files are already available to her (@pxref{Auxiliary}).
4260 Functions found on @code{po-subedit-mode-hook}, if any, are executed after
4261 the string has been inserted in the edit buffer.
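
As a sketch of how this hook might be used, the following line asks Emacs
to turn on Flyspell, its standard on-the-fly spell checker, in every
subedit buffer. The choice of @code{flyspell-mode} is only an example; any
function taking no arguments can be installed in the same way.

@lisp
;; Check spelling while editing a translation or a comment.
(add-hook 'po-subedit-mode-hook 'flyspell-mode)
@end lisp
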
4263 While editing her translation, the translator should take care not to
4264 insert unwanted @kbd{@key{RET}} (newline) characters at the end of
4265 the translated string if those are not meant to be there, nor to remove
4266 such characters when they are required. Since these characters are not
4267 visible in the editing buffer, they are easily introduced by mistake.
4268 To help her, @kbd{@key{RET}} automatically puts the character @code{<}
4269 at the end of the string being edited, but this @code{<} is not really
4270 part of the string. On exiting the editing window with @w{@kbd{C-c C-c}},
4271 PO mode automatically removes such @kbd{<} and all whitespace added after
4272 it. If the translator adds characters after the terminating @code{<}, it
4273 loses its delimiting property and becomes fully part of the string.
4274 If she removes the delimiting @code{<}, then the edited string is taken
4275 @emph{as is}, with all trailing newlines, even if invisible. Also, if
4276 the translated string ought to end itself with a genuine @code{<}, then
4277 the delimiting @code{<} may not be removed; so the string should appear,
4278 in the editing window, as ending with two @code{<} in a row.
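
The treatment of the delimiting @code{<} on exit can be summarized by the
following Emacs Lisp sketch. The function name is hypothetical and the
code is not taken from PO mode; it only restates the rules above.

@lisp
(defun my-po-strip-terminator (text)
  "Return TEXT without a final `<' and the whitespace following it."
  (if (string-match "<[ \t\n]*\\'" text)
      ;; A delimiter ends the buffer: drop it and what follows.
      (substring text 0 (match-beginning 0))
    ;; No delimiter at the end: take the text as is.
    text))

(my-po-strip-terminator "Done.\n<\n")   @result{} "Done.\n"
(my-po-strip-terminator "a < b <<")     @result{} "a < b <"
(my-po-strip-terminator "Done.\n")      @result{} "Done.\n"
(my-po-strip-terminator "<b>bold")      @result{} "<b>bold"
@end lisp
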
4280 @emindex editing multiple entries
4281 When a translation (or a comment) is being edited, the translator may move
4282 the cursor back into the PO file buffer and freely move to other entries,
4283 browsing at will. If, with an edit pending, the translator wanders in the
4284 PO file buffer, she may decide to start modifying another entry. Each entry
4285 being edited has its own subedit buffer. It is possible to simultaneously
4286 edit the translation @emph{and} the comment of a single entry, or to
4287 edit entries in different PO files, all at once. Typing @kbd{@key{RET}}
4288 on a field already being edited merely resumes that particular edit. Yet,
4289 the translator had better be comfortable handling many Emacs windows!
4291 @emindex pending subedits
4292 Pending subedits may be completed or aborted in any order, regardless
4293 of how or when they were started. When many subedits are pending and the
4294 translator asks for quitting the PO file (with the @kbd{q} command), subedits
4295 are automatically resumed one at a time, so she may decide for each of them.
4297 @node C Sources Context, Auxiliary, Subedit, PO Mode
4298 @subsection C Sources Context
4299 @emindex consulting program sources
4300 @emindex looking at the source to aid translation
4301 @emindex use the source, Luke
4303 PO mode is particularly powerful when used with PO files
4304 created through GNU @code{gettext} utilities, as those utilities
4305 insert special comments in the PO files they generate.
4306 Some of these special comments relate the PO file entry to
4307 exactly where the untranslated string appears in the program sources.
4309 When the translator gets to an untranslated entry, she is fairly
4310 often faced with an original string which is not as informative as
4311 it normally should be, being succinct, cryptic, or otherwise ambiguous.
4312 Before choosing how to translate the string, she needs to understand
4313 better what the string really means and how tight the translation has
4314 to be. Most of the time, when problems arise, the only way left to make
4315 her judgment is looking at the true program sources from where this
4316 string originated, searching for surrounding comments the programmer
4317 might have put in there, and looking around for helpful clues of
any kind.
4320 Surely, when looking at program sources, the translator will receive
4321 more help if she is a fluent programmer. However, even if she is
4322 not versed in programming and feels a little lost in C code, the
4323 translator should not be shy at taking a look, once in a while.
4324 It is most probable that she will still be able to find some of the
4325 hints she needs. She will quickly learn to feel at ease
4326 in program code, paying more attention to the programmer's comments,
4327 variable and function names (if he dared to choose them well), and
4328 overall organization, than to the program code itself.
4330 @emindex find source fragment for a PO file entry
4331 The following commands are meant to help the translator get
4332 program source context for a PO file entry.
4336 @efindex s@r{, PO Mode command}
4337 Resume the display of a program source context, or cycle through them
4338 (@code{po-cycle-source-reference}).
4341 @efindex M-s@r{, PO Mode command}
4342 Display a program source context selected by menu
4343 (@code{po-select-source-reference}).
4346 @efindex S@r{, PO Mode command}
4347 Add a directory to the search path for source files
4348 (@code{po-consider-source-path}).
4351 @efindex M-S@r{, PO Mode command}
4352 Delete a directory from the search path for source files
4353 (@code{po-ignore-source-path}).
4357 @efindex s@r{, PO Mode command}
4358 @efindex po-cycle-source-reference@r{, PO Mode command}
4359 @efindex M-s@r{, PO Mode command}
4360 @efindex po-select-source-reference@r{, PO Mode command}
4361 The commands @kbd{s} (@code{po-cycle-source-reference}) and @kbd{M-s}
4362 (@code{po-select-source-reference}) both open another window displaying
4363 some source program file, and already positioned in such a way that
4364 it shows an actual use of the string to be translated. By doing
4365 so, the command gives source program context for the string. But if
4366 the entry has no source context references, or if all references
4367 are unresolved along the search path for program sources, then the
4368 command diagnoses this as an error.
4370 Even if @kbd{s} (or @kbd{M-s}) opens a new window, the cursor stays
4371 in the PO file window. If the translator really wants to
4372 get into the program source window, she ought to do it explicitly,
4373 maybe by using the standard Emacs command @kbd{C-x o} (@code{other-window}).
4375 When @kbd{s} is typed for the first time, or for a PO file entry which
4376 is different from the last one used for getting source context, then the
4377 command reacts by giving the first context available for this entry,
4378 if any. If some context has already been recently displayed for the
4379 current PO file entry, and the translator wandered off to do other
4380 things, typing @kbd{s} again will merely resume, in another window,
4381 the context last displayed. In particular, if the translator moved
4382 the cursor away from the context in the source file, the command will
4383 bring the cursor back to the context. By using @kbd{s} many times
4384 in a row, with no other commands intervening, PO mode will cycle to
4385 the next available contexts for this particular entry, getting back
4386 to the first context once the last has been shown.
4388 The command @kbd{M-s} behaves differently. Instead of cycling through
4389 references, it lets the translator choose a particular reference among
4390 many, and displays that reference. It is best used with completion:
4391 if the translator types @kbd{@key{TAB}} immediately after @kbd{M-s}, in
4392 response to the question, she will be offered a menu of all possible
4393 references, as a reminder of which are the acceptable answers.
4394 This command is useful only where there are really many contexts
4395 available for a single string to translate.
4397 @efindex S@r{, PO Mode command}
4398 @efindex po-consider-source-path@r{, PO Mode command}
4399 @efindex M-S@r{, PO Mode command}
4400 @efindex po-ignore-source-path@r{, PO Mode command}
4401 Program source files are usually found relative to where the PO
4402 file stands. As a special provision, when this fails, the file is
4403 also looked for, but relative to the directory immediately above it.
4404 Those two cases take proper care of most PO files. However, it might
4405 happen that a PO file has been moved, or is edited in a different
4406 place than its normal location. When this happens, the translator
4407 should tell PO mode in which directory the genuine PO file normally
4408 sits. Many such directories may be specified, and all together, they
4409 constitute what is called the @dfn{search path} for program sources.
4410 The command @kbd{S} (@code{po-consider-source-path}) is used to interactively
4411 enter a new directory at the front of the search path, and the command
4412 @kbd{M-S} (@code{po-ignore-source-path}) is used to select, with completion,
4413 one of the directories she does not want anymore on the search path.
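The lookup along a search path can be pictured with a small C sketch.
PO mode itself is written in Emacs Lisp, so this is not its
implementation; the function name @code{find_source} and its parameters
are hypothetical. The caller is assumed to pass the directories in the
order they should be tried, with the most recently added one first.

```c
#include <stdio.h>
#include <string.h>

/* Try REFERENCE relative to each of the NDIRS directories in DIRS,
   front to back.  On success store the first readable path in OUT and
   return 1; otherwise store an empty string and return 0.  */
int
find_source (const char *reference, const char *const *dirs, size_t ndirs,
             char *out, size_t outsize)
{
  size_t i;

  for (i = 0; i < ndirs; i++)
    {
      FILE *fp;
      int len = snprintf (out, outsize, "%s/%s", dirs[i], reference);

      if (len < 0 || (size_t) len >= outsize)
        continue;               /* path does not fit in the buffer */
      fp = fopen (out, "r");
      if (fp != NULL)
        {
          fclose (fp);
          return 1;
        }
    }
  if (outsize > 0)
    out[0] = '\0';
  return 0;
}
```

When every directory fails, the caller is left with the situation
described earlier: the reference is unresolved along the search path.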
4415 @node Auxiliary, , C Sources Context, PO Mode
4416 @subsection Consulting Auxiliary PO Files
4417 @emindex consulting translations to other languages
4419 PO mode is able to help the knowledgeable translator, fluent in
4420 many languages, take advantage of translations already made
4421 in other languages she happens to know. It provides these other
4422 language translations as additional context for her own work. Moreover,
4423 it has features to ease the production of translations for many languages
4424 at once, for translators preferring to work in this way.
4426 @cindex auxiliary PO file
4427 @emindex auxiliary PO file
4428 An @dfn{auxiliary} PO file is an existing PO file meant for the same
4429 package the translator is working on, but targeted to a different
4430 language. Commands exist for declaring and handling auxiliary
4431 PO files, and also for showing contexts for the entry under work.
4433 Here are the auxiliary file commands available in PO mode.
4437 @efindex a@r{, PO Mode command}
4438 Seek auxiliary files for another translation for the same entry
4439 (@code{po-cycle-auxiliary}).
4442 @efindex C-c C-a@r{, PO Mode command}
4443 Switch to a particular auxiliary file (@code{po-select-auxiliary}).
4446 @efindex A@r{, PO Mode command}
4447 Declare this PO file as an auxiliary file (@code{po-consider-as-auxiliary}).
4450 @efindex M-A@r{, PO Mode command}
4451 Remove this PO file from the list of auxiliary files
4452 (@code{po-ignore-as-auxiliary}).
4456 @efindex A@r{, PO Mode command}
4457 @efindex po-consider-as-auxiliary@r{, PO Mode command}
4458 @efindex M-A@r{, PO Mode command}
4459 @efindex po-ignore-as-auxiliary@r{, PO Mode command}
4460 Command @kbd{A} (@code{po-consider-as-auxiliary}) adds the current
4461 PO file to the list of auxiliary files, while command @kbd{M-A}
4462 (@code{po-ignore-as-auxiliary}) just removes it.
4464 @efindex a@r{, PO Mode command}
4465 @efindex po-cycle-auxiliary@r{, PO Mode command}
4466 The command @kbd{a} (@code{po-cycle-auxiliary}) seeks all auxiliary PO
4467 files, round-robin, searching for a translated entry in some other language
4468 having an @code{msgid} field identical to the one for the current entry.
4469 The found PO file, if any, takes the place of the current PO file in
4470 the display (its window gets on top). Before doing so, the current PO
4471 file is also made into an auxiliary file, if not already. So, @kbd{a}
4472 in this newly displayed PO file will seek another PO file, and so on,
4473 so repeating @kbd{a} will eventually yield back the original PO file.
4475 @efindex C-c C-a@r{, PO Mode command}
4476 @efindex po-select-auxiliary@r{, PO Mode command}
4477 The command @kbd{C-c C-a} (@code{po-select-auxiliary}) asks the translator
4478 for her choice of a particular auxiliary file, with completion, and
4479 then switches to that selected PO file. The command also checks if
4480 the selected file has an @code{msgid} field identical to the one for
4481 the current entry, and if yes, this entry becomes current. Otherwise,
4482 the cursor of the selected file is left undisturbed.
4484 For all this to work fully, auxiliary PO files will have to be normalized,
4485 in the sense that @code{msgid} fields should be written @emph{exactly}
4486 the same way. Since it is possible to write @code{msgid} fields in various
4487 ways to represent the same string, differences in writing would break the
4488 proper behaviour of the auxiliary file commands of PO mode. This is not
4489 expected to be much of a problem in practice, as most existing PO files have
4490 their @code{msgid} entries written by the same GNU @code{gettext} tools.
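To see why two spellings of one @code{msgid} may fail to match, consider
the following C sketch. It joins the quoted segments of a PO string and
compares the results, so that a string split over several lines compares
equal to the same string written on one line. The helper names
@code{po_join_segments} and @code{po_same_string} are invented for this
example, and the 512-byte buffers are an arbitrary assumption; escape
sequences are copied verbatim rather than interpreted.

```c
#include <string.h>

/* Copy the text found between double quotes in SRC into OUT, joining
   adjacent segments.  Escape sequences are kept as written.  Return 0
   on success, -1 if OUT is too small.  */
int
po_join_segments (const char *src, char *out, size_t outsize)
{
  size_t n = 0;
  int inside = 0;

  for (; *src != '\0'; src++)
    {
      if (!inside)
        {
          if (*src == '"')
            inside = 1;         /* opening delimiter */
          continue;
        }
      if (*src == '"')
        {
          inside = 0;           /* closing delimiter */
          continue;
        }
      if (*src == '\\' && src[1] != '\0')
        {
          if (n + 2 >= outsize)
            return -1;
          out[n++] = *src++;    /* backslash */
          out[n++] = *src;      /* escaped character, even a quote */
          continue;
        }
      if (n + 1 >= outsize)
        return -1;
      out[n++] = *src;
    }
  if (n >= outsize)
    return -1;
  out[n] = '\0';
  return 0;
}

/* Return 1 if A and B denote the same string once their segments are
   joined, 0 otherwise (including when a buffer is too small).  */
int
po_same_string (const char *a, const char *b)
{
  char x[512], y[512];

  if (po_join_segments (a, x, sizeof x) != 0
      || po_join_segments (b, y, sizeof y) != 0)
    return 0;
  return strcmp (x, y) == 0;
}
```

A plain textual comparison of the two spellings would report a mismatch;
comparing the joined contents does not.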
4492 @efindex normalize@r{, PO Mode command}
4493 However, PO files initially created by PO mode itself, while marking
4494 strings in source files, are normalised differently. So are PO
4495 files resulting from the @samp{M-x normalize} command. Until these
4496 discrepancies between PO mode and other GNU @code{gettext} tools get
4497 fully resolved, the translator should stay aware of normalisation issues.
4499 @node Compendium, , PO Mode, Editing
4500 @section Using Translation Compendia
4501 @emindex using translation compendia
4504 A @dfn{compendium} is a special PO file containing a set of
4505 translations recurring in many different packages. The translator can
4506 use gettext tools to build a new compendium, to add entries to her
4507 compendium, and to initialize untranslated entries, or to update
4508 already translated entries, from translations kept in the compendium.
4511 * Creating Compendia:: Merging translations for later use
4512 * Using Compendia:: Using older translations if they fit
4515 @node Creating Compendia, Using Compendia, Compendium, Compendium
4516 @subsection Creating Compendia
4517 @cindex creating compendia
4518 @cindex compendium, creating
4520 Basically every PO file consisting of translated entries only can be
4521 declared as a valid compendium. Often the translator wants to have
4522 special compendia; let's consider two cases: @cite{concatenating PO
4523 files} and @cite{extracting a message subset from a PO file}.
4525 @subsubsection Concatenate PO Files
4527 @cindex concatenating PO files into a compendium
4528 @cindex accumulating translations
4529 To concatenate several valid PO files into one compendium file you can
4530 use @samp{msgcomm} or @samp{msgcat} (the latter preferred):
4533 msgcat -o compendium.po file1.po file2.po
4536 By default, @code{msgcat} will accumulate divergent translations
4537 for the same string. Those occurrences will be marked as @code{fuzzy}
4538 and decorated in a highly visible way; calling @code{msgcat} on @file{file1.po}:
4544 msgid "Report bugs to <%s>.\n"
4545 msgstr "Comunicar `bugs' a <%s>.\n"
4549 and @file{file2.po}:
4554 msgid "Report bugs to <%s>.\n"
4555 msgstr "Comunicar \"bugs\" a <%s>.\n"
yields:
4562 #: src/hello.c:200 src/bye.c:100
4564 msgid "Report bugs to <%s>.\n"
4566 "#-#-#-#-# file1.po #-#-#-#-#\n"
4567 "Comunicar `bugs' a <%s>.\n"
4568 "#-#-#-#-# file2.po #-#-#-#-#\n"
4569 "Comunicar \"bugs\" a <%s>.\n"
4573 The translator will have to resolve this ``conflict'' manually; she
4574 has to decide whether the first or the second version is appropriate
4575 (or provide a new translation), to delete the ``marker lines'', and
4576 finally to remove the @code{fuzzy} mark.
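Before declaring a concatenated file to be a compendium, it can help to
check that no ``marker lines'' were left behind. The C sketch below
counts them in a buffer holding PO file text. The function names are
hypothetical, and the test simply looks for quoted lines beginning with
@samp{#-#-#-#-#}, which is an assumption based on the sample output shown
above rather than a documented format guarantee.

```c
#include <string.h>

#define CONFLICT_MARKER "#-#-#-#-#"

/* Return 1 if LINE is a quoted string line whose content starts with
   the conflict marker, 0 otherwise.  */
int
is_conflict_marker (const char *line)
{
  while (*line == ' ' || *line == '\t')
    line++;
  if (*line != '"')
    return 0;
  return strncmp (line + 1, CONFLICT_MARKER, strlen (CONFLICT_MARKER)) == 0;
}

/* Count the marker lines in TEXT, a newline-separated buffer.  */
int
count_conflict_markers (const char *text)
{
  int count = 0;

  while (*text != '\0')
    {
      const char *eol = strchr (text, '\n');

      count += is_conflict_marker (text);
      if (eol == NULL)
        break;
      text = eol + 1;
    }
  return count;
}
```

A nonzero count means that at least one conflict still awaits a manual
decision.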
4578 If the translator knows in advance that the first found translation of a
4579 message is always the best translation, she can make use of the
4580 @samp{--use-first} switch:
4583 msgcat --use-first -o compendium.po file1.po file2.po
4586 A good compendium file must not contain @code{fuzzy} or untranslated
4587 entries. If input files are ``dirty'' you must preprocess the input
4588 files or postprocess the result using @samp{msgattrib --translated --no-fuzzy}.
4590 @subsubsection Extract a Message Subset from a PO File
4591 @cindex extracting parts of a PO file into a compendium
4593 Nobody wants to translate the same messages again and again; thus you
4594 may wish to have a compendium file containing @file{getopt.c} messages.
4596 To extract a message subset (e.g., all @file{getopt.c} messages) from an
4597 existing PO file into one compendium file you can use @samp{msggrep}:
4600 msggrep --location src/getopt.c -o compendium.po file.po
4603 @node Using Compendia, , Creating Compendia, Compendium
4604 @subsection Using Compendia
4606 You can use a compendium file to initialize a translation from scratch
4607 or to update an already existing translation.
4609 @subsubsection Initialize a New Translation File
4610 @cindex initialize translations from a compendium
4612 Since a PO file with translations does not exist yet, the translator can
4613 simply use @file{/dev/null} to fake the ``old'' translation file.
4616 msgmerge --compendium compendium.po -o file.po /dev/null file.pot
4619 @subsubsection Update an Existing Translation File
4620 @cindex update translations from a compendium
4622 Concatenate the compendium file(s) and the existing PO, merge the
4623 result with the POT file and remove the obsolete entries (optional,
4624 here done using @samp{msgattrib}):
4627 msgcat --use-first -o update.po compendium1.po compendium2.po file.po
4628 msgmerge update.po file.pot | msgattrib --no-obsolete > file.po
4631 @node Manipulating, Binaries, Editing, Top
4632 @chapter Manipulating PO Files
4633 @cindex manipulating PO files
4635 Sometimes it is necessary to manipulate PO files in a way that is better
4636 performed automatically than by hand. GNU @code{gettext} includes a
4637 complete set of tools for this purpose.
4639 @cindex merging two PO files
4640 When merging two packages into a single package, the resulting POT file
4641 will be the concatenation of the two packages' POT files. Thus the
4642 maintainer must concatenate the two existing package translations into
4643 a single translation catalog, for each language. This is best performed
4644 using @samp{msgcat}. It is then the translators' duty to deal with any
4645 possible conflicts that arose during the merge.
4647 @cindex encoding conversion
4648 When a translator takes over the translation job from another translator,
4649 but she uses a different character encoding in her locale, she will
4650 convert the catalog to her character encoding. This is best done through
4651 the @samp{msgconv} program.
4653 When a maintainer takes a source file with tagged messages from another
4654 package, he should also take the existing translations for this source
4655 file (and not let the translators do the same job twice). One way to do
4656 this is through @samp{msggrep}, another is to create a POT file for
4657 that source file and use @samp{msgmerge}.
4661 When a translator wants to adjust some translation catalog for a special
4662 dialect or orthography --- for example, German as written in Switzerland
4663 versus German as written in Germany --- she needs to apply some text
4664 processing to every message in the catalog. The tool for doing this is @samp{msgfilter}.
4667 Another use of @code{msgfilter} is to produce approximately the POT file for
4668 which a given PO file was made. This can be done through a filter command
4669 like @samp{msgfilter sed -e d | sed -e '/^# /d'}. Note that the original
4670 POT file may have had different comments and different plural message counts;
4671 that is why it is better to use the original POT file if available.
4673 @cindex checking of translations
4674 When a translator wants to check her translations, for example according
4675 to orthography rules or using a non-interactive spell checker, she can do
4676 so using the @samp{msgexec} program.
4678 @cindex duplicate elimination
4679 When third party tools create PO or POT files, sometimes duplicates cannot
4680 be avoided. But the GNU @code{gettext} tools give an error when they
4681 encounter duplicate msgids in the same file and in the same domain.
4682 To merge duplicates, the @samp{msguniq} program can be used.
4684 @samp{msgcomm} is a more general tool for keeping or throwing away
4685 duplicates, occurring in different files.
4687 @samp{msgcmp} can be used to check whether a translation catalog is
4688 completely translated.
4690 @cindex attributes, manipulating
4691 @samp{msgattrib} can be used to select and extract only the fuzzy
4692 or untranslated messages of a translation catalog.
4694 @samp{msgen} is useful as a first step for preparing English translation
4695 catalogs. It copies each message's msgid to its msgstr.
4697 Finally, for those applications where all these various programs are not
4698 sufficient, a library @samp{libgettextpo} is provided that can be used to
4699 write other specialized programs that process PO files.
4702 * msgcat Invocation:: Invoking the @code{msgcat} Program
4703 * msgconv Invocation:: Invoking the @code{msgconv} Program
4704 * msggrep Invocation:: Invoking the @code{msggrep} Program
4705 * msgfilter Invocation:: Invoking the @code{msgfilter} Program
4706 * msguniq Invocation:: Invoking the @code{msguniq} Program
4707 * msgcomm Invocation:: Invoking the @code{msgcomm} Program
4708 * msgcmp Invocation:: Invoking the @code{msgcmp} Program
4709 * msgattrib Invocation:: Invoking the @code{msgattrib} Program
4710 * msgen Invocation:: Invoking the @code{msgen} Program
4711 * msgexec Invocation:: Invoking the @code{msgexec} Program
4712 * Colorizing:: Highlighting parts of PO files
4713 * libgettextpo:: Writing your own programs that process PO files
4716 @node msgcat Invocation, msgconv Invocation, Manipulating, Manipulating
4717 @section Invoking the @code{msgcat} Program
4719 @include msgcat.texi
4721 @node msgconv Invocation, msggrep Invocation, msgcat Invocation, Manipulating
4722 @section Invoking the @code{msgconv} Program
4724 @include msgconv.texi
4726 @node msggrep Invocation, msgfilter Invocation, msgconv Invocation, Manipulating
4727 @section Invoking the @code{msggrep} Program
4729 @include msggrep.texi
4731 @node msgfilter Invocation, msguniq Invocation, msggrep Invocation, Manipulating
4732 @section Invoking the @code{msgfilter} Program
4734 @include msgfilter.texi
4736 @node msguniq Invocation, msgcomm Invocation, msgfilter Invocation, Manipulating
4737 @section Invoking the @code{msguniq} Program
4739 @include msguniq.texi
4741 @node msgcomm Invocation, msgcmp Invocation, msguniq Invocation, Manipulating
4742 @section Invoking the @code{msgcomm} Program
4744 @include msgcomm.texi
4746 @node msgcmp Invocation, msgattrib Invocation, msgcomm Invocation, Manipulating
4747 @section Invoking the @code{msgcmp} Program
4749 @include msgcmp.texi
4751 @node msgattrib Invocation, msgen Invocation, msgcmp Invocation, Manipulating
4752 @section Invoking the @code{msgattrib} Program
4754 @include msgattrib.texi
4756 @node msgen Invocation, msgexec Invocation, msgattrib Invocation, Manipulating
4757 @section Invoking the @code{msgen} Program
4761 @node msgexec Invocation, Colorizing, msgen Invocation, Manipulating
4762 @section Invoking the @code{msgexec} Program
4764 @include msgexec.texi
4766 @node Colorizing, libgettextpo, msgexec Invocation, Manipulating
4767 @section Highlighting parts of PO files
4769 Translators are usually only interested in seeing the untranslated and
4770 fuzzy messages of a PO file. Also, when a message is set fuzzy because
4771 the msgid changed, they want to see the differences between the previous
4772 msgid and the current one (especially if the msgid is long and only a few
4773 words in it have changed). Finally, it's always welcome to highlight the
4774 different sections of a message in a PO file (comments, msgid, msgstr, etc.).
4776 Such highlighting is possible through the @code{msgcat} options
4777 @samp{--color} and @samp{--style}.
4780 * The --color option:: Triggering colorized output
4781 * The TERM variable:: The environment variable @code{TERM}
4782 * The --style option:: The @code{--style} option
4783 * Style rules:: Style rules for PO files
4784 * Customizing less:: Customizing @code{less} for viewing PO files
4787 @node The --color option, The TERM variable, , Colorizing
4788 @subsection The @code{--color} option
4790 @opindex --color@r{, @code{msgcat} option}
4791 The @samp{--color=@var{when}} option specifies under which conditions
4792 colorized output should be generated. The @var{when} part can be one of
4798 The output will be colorized.
4802 The output will not be colorized.
4806 The output will be colorized if the output device is a tty, i.e.@: when the
4807 output goes directly to a text screen or terminal emulator window.
4810 The output will be colorized and be in HTML format.
4814 @samp{--color} is equivalent to @samp{--color=yes}. The default is
4815 @samp{--color=auto}.
4817 Thus, a command like @samp{msgcat vi.po} will produce colorized output
4818 when called by itself in a command window, whereas in a pipe, such as
4819 @samp{msgcat vi.po | less -R}, it will not produce colorized output. To
4820 get colorized output in this situation nevertheless, use the command
4821 @samp{msgcat --color vi.po | less -R}.
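The decision made for @samp{--color=auto} can be sketched in C with the
POSIX function @code{isatty}. This is an illustration of the rule stated
above, not the code @code{msgcat} actually uses; the enumeration
@code{color_when} and the function @code{should_colorize} are names made
up for the example.

```c
#include <unistd.h>

/* The three non-HTML behaviours described for the --color option.  */
enum color_when
{
  COLOR_YES,                    /* always colorize */
  COLOR_NO,                     /* never colorize */
  COLOR_AUTO                    /* colorize only on a tty */
};

/* Return 1 if output written to file descriptor FD should carry color
   escape sequences, 0 otherwise.  */
int
should_colorize (enum color_when when, int fd)
{
  switch (when)
    {
    case COLOR_YES:
      return 1;
    case COLOR_NO:
      return 0;
    case COLOR_AUTO:
    default:
      return isatty (fd) != 0;
    }
}
```

A pipe is not a tty, so in the @code{COLOR_AUTO} case the function returns 0
for it, which matches the behaviour of @samp{msgcat vi.po | less -R}.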
4823 The @samp{--color=html} option will produce output that can be viewed in
4824 a browser. This can be useful, for example, for Indic languages,
4825 because the rendering of Indic scripts in browsers is usually better than
4826 in terminal emulators.
4828 Note that the output produced with the @code{--color} option is @emph{not}
4829 a valid PO file in itself. It contains additional terminal-specific escape
4830 sequences or HTML tags. A PO file reader will give a syntax error when
4831 confronted with such content. Except for the @samp{--color=html} case,
4832 you therefore normally don't need to save output produced with the
4833 @code{--color} option in a file.
4835 @node The TERM variable, The --style option, The --color option, Colorizing
4836 @subsection The environment variable @code{TERM}
4838 @vindex TERM@r{, environment variable}
4839 The environment variable @code{TERM} contains an identifier for the text
4840 window's capabilities. You can get a detailed list of these capabilities
4841 by using the @samp{infocmp} command, using @samp{man 5 terminfo} as a reference.
4844 When producing text with embedded color directives, @code{msgcat} looks
4845 at the @code{TERM} variable. Text windows today typically support at least
4846 8 colors. Often, however, the text window supports 16 or more colors,
4847 even though the @code{TERM} variable is set to an identifier denoting only
4848 8 supported colors. It can be worth setting the @code{TERM} variable to
4849 a different value in these cases:
4853 @code{xterm} is in most cases built with support for 16 colors. It can also
4854 be built with support for 88 or 256 colors (but not both). You can try to
4855 set @code{TERM} to either @code{xterm-16color}, @code{xterm-88color}, or
4856 @code{xterm-256color}.
4859 @code{rxvt} is often built with support for 16 colors. You can try to set
4860 @code{TERM} to @code{rxvt-16color}.
4863 @code{konsole} too is often built with support for 16 colors. You can try to
4864 set @code{TERM} to @code{konsole-16color} or @code{xterm-16color}.
4867 After setting @code{TERM}, you can verify it by invoking
4868 @samp{msgcat --color=test} and seeing whether the output looks like a
4869 reasonable color map.
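As a rough aid when choosing a @code{TERM} value, the C sketch below
reads the color count out of names such as @code{xterm-256color}. It is
only a heuristic on the name: the real capabilities come from the
terminal database, which this code does not consult. The function name
@code{colors_from_term_name} is hypothetical, and falling back to 8 colors
for names without a @samp{-@var{n}color} suffix is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Guess the number of colors from a TERM value of the form
   "name-<N>color".  Return 0 if TERM is NULL, and an assumed default of
   8 when the name carries no such suffix.  */
int
colors_from_term_name (const char *term)
{
  const char *dash;
  char *end;
  long n;

  if (term == NULL)
    return 0;
  dash = strrchr (term, '-');
  if (dash == NULL)
    return 8;
  n = strtol (dash + 1, &end, 10);
  if (end == dash + 1           /* no digits after the last dash */
      || strcmp (end, "color") != 0
      || n <= 0 || n > (1L << 24))
    return 8;
  return (int) n;
}
```

Calling it with the result of @code{getenv ("TERM")} gives a quick hint;
@samp{msgcat --color=test} remains the way to see what the terminal really
displays.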
4871 @node The --style option, Style rules, The TERM variable, Colorizing
4872 @subsection The @code{--style} option
4874 @opindex --style@r{, @code{msgcat} option}
4875 The @samp{--style=@var{style_file}} option specifies the style file to use
4876 when colorizing. It has an effect only when the @code{--color} option is also given.
4879 @vindex PO_STYLE@r{, environment variable}
4880 If the @code{--style} option is not specified, the environment variable
4881 @code{PO_STYLE} is considered. It is meant to point to the user's
4882 preferred style for PO files.
4884 The default style file is @file{$prefix/share/gettext/styles/po-default.css},
4885 where @code{$prefix} is the installation location.
4887 A few style files are predefined:
4890 This style imitates the look used by vim 7.
4892 @item po-emacs-x.css
4893 This style imitates the look used by GNU Emacs 21 and 22 in an X11 window.
4895 @item po-emacs-xterm.css
4896 @itemx po-emacs-xterm16.css
4897 @itemx po-emacs-xterm256.css
4898 This style imitates the look used by GNU Emacs 22 in a terminal of type
4899 @samp{xterm} (8 colors) or @samp{xterm-16color} (16 colors) or
4900 @samp{xterm-256color} (256 colors), respectively.
4904 You can use these styles without specifying a directory. They are actually
4905 located in @file{$prefix/share/gettext/styles/}, where @code{$prefix} is the
4906 installation location.
4908 You can also design your own styles. This is described in the next section.
4911 @node Style rules, Customizing less, The --style option, Colorizing
4912 @subsection Style rules for PO files
4914 The same style file can be used for styling of a PO file, for terminal
4915 output and for HTML output. It is written in CSS (Cascading Style Sheet)
4916 syntax. See @url{http://www.w3.org/TR/css2/cover.html} for a formal
4917 definition of CSS. Many HTML authoring tutorials also contain explanations of CSS.
4920 In the case of HTML output, the style file is embedded in the HTML output.
4921 In the case of text output, the style file is interpreted by the
4922 @code{msgcat} program. This means, in particular, that when
4923 @code{@@import} is used with relative file names, the file names are
4927 relative to the resulting HTML file, in the case of HTML output,
4930 relative to the style sheet containing the @code{@@import}, in the case of
4931 text output. (Actually, @code{@@import}s are not yet supported in this case,
4932 due to a limitation in @code{libcroco}.)
4935 CSS rules are built up from selectors and declarations. The declarations
4936 specify graphical properties; the selectors specify when they apply.
4938 In PO files, the following simple selectors (based on ``CSS classes'', see
4939 the CSS2 spec, section 5.8.3) are supported.
4943 Selectors that apply to entire messages:
4947 This matches the header entry of a PO file.
4950 This matches a translated message.
4953 This matches an untranslated message (i.e.@: a message with empty translation).
4956 This matches a fuzzy message (i.e.@: a message which has a translation that
4957 needs review by the translator).
4960 This matches an obsolete message (i.e.@: a message that was translated but is
4961 not needed by the current POT file any more).
4965 Selectors that apply to parts of a message in PO syntax. Recall the general
4966 structure of a message in PO syntax:
4970 # @var{translator-comments}
4971 #. @var{extracted-comments}
4972 #: @var{reference}@dots{}
4973 #, @var{flag}@dots{}
4974 #| msgid @var{previous-untranslated-string}
4975 msgid @var{untranslated-string}
4976 msgstr @var{translated-string}
4981 This matches all comments (translator comments, extracted comments,
4982 source file reference comments, flag comments, previous message comments,
4983 as well as the entire obsolete messages).
4985 @item .translator-comment
4986 This matches the translator comments.
4988 @item .extracted-comment
4989 This matches the extracted comments, i.e.@: the comments placed by the
4990 programmer at the attention of the translator.
4992 @item .reference-comment
4993 This matches the source file reference comments (entire lines).
4996 This matches the individual source file references inside the source file
4997 reference comment lines.
5000 This matches the flag comment lines (entire lines).
5003 This matches the individual flags inside flag comment lines.
5006 This matches the `fuzzy' flag inside flag comment lines.
5008 @item .previous-comment
5009 This matches the comments containing the previous untranslated string (entire lines).
5013 This matches the previous untranslated string including the string delimiters,
5014 the associated keywords (@code{msgid} etc.) and the spaces between them.
5017 This matches the untranslated string including the string delimiters,
5018 the associated keywords (@code{msgid} etc.) and the spaces between them.
5021 This matches the translated string including the string delimiters,
5022 the associated keywords (@code{msgstr} etc.) and the spaces between them.
5025 This matches the keywords (@code{msgid}, @code{msgstr}, etc.).
5028 This matches strings, including the string delimiters (double quotes).
5032 Selectors that apply to parts of strings:
5036 This matches the entire contents of a string (excluding the string delimiters,
5037 i.e.@: the double quotes).
5039 @item .escape-sequence
5040 This matches an escape sequence (starting with a backslash).
5042 @item .format-directive
5043 This matches a format string directive (starting with a @samp{%} sign in the
5044 case of most programming languages, with a @samp{@{} in the case of
5045 @code{java-format} and @code{csharp-format}, with a @samp{~} in the case of
5046 @code{lisp-format} and @code{scheme-format}, or with @samp{$} in the case of
5049 @item .invalid-format-directive
5050 This matches an invalid format string directive.
5053 In an untranslated string, this matches a part of the string that was not
5054 present in the previous untranslated string. (Not yet implemented in this release.)
5058 In an untranslated string or in a previous untranslated string, this matches
5059 a part of the string that is changed or replaced. (Not yet implemented in this release.)
5063 In a previous untranslated string, this matches a part of the string that
5064 is not present in the current untranslated string. (Not yet implemented in this release.)
5069 These selectors can be combined into hierarchical selectors. For example,
5072 .msgstr .invalid-format-directive @{ color: red; @}
5076 will highlight the invalid format directives in the translated strings.
5078 In text mode, pseudo-classes (CSS2 spec, section 5.11) and pseudo-elements
5079 (CSS2 spec, section 5.12) are not supported.
5081 The declarations in HTML mode are not limited; any graphical attribute
5082 supported by the browsers can be used.
5084 The declarations in text mode are limited to the following properties. Other
5085 properties will be silently ignored.
5088 @item @code{color} (CSS2 spec, section 14.1)
5089 @itemx @code{background-color} (CSS2 spec, section 14.2.1)
5090 These properties are supported. Colors will be adjusted to match the terminal's
5091 capabilities. Note that many terminals support only 8 colors.
5093 @item @code{font-weight} (CSS2 spec, section 15.2.3)
5094 This property is supported, but most terminals can only render two different
5095 weights: @code{normal} and @code{bold}. Values >= 600 are rendered as @code{bold}.
5098 @item @code{font-style} (CSS2 spec, section 15.2.3)
5099 This property is supported. The values @code{italic} and @code{oblique} are
5100 rendered the same way.
5102 @item @code{text-decoration} (CSS2 spec, section 16.3.1)
5103 This property is supported, limited to the values @code{none} and
5107 @node Customizing less, , Style rules, Colorizing
5108 @subsection Customizing @code{less} for viewing PO files
5110 The @samp{less} program is a popular text file browser for use in a text
5111 screen or terminal emulator. It also supports text with embedded escape
5112 sequences for colors and text decorations.
5114 You can use @code{less} to view a PO file like this (assuming a UTF-8 environment):
5118 msgcat --to-code=UTF-8 --color xyz.po | less -R
5121 You can shorten this to the simple command @samp{less xyz.po}
5128 after these three preparations:
5132 Add the options @samp{-R} and @samp{-f} to the @code{LESS} environment
5133 variable. In sh shells:
5135 $ LESS="$LESS -R -f"
5140 If your system does not already have the @file{lessopen.sh} and
5141 @file{lessclose.sh} scripts, create them and set the @code{LESSOPEN} and
5142 @code{LESSCLOSE} environment variables, as indicated in the manual page
5146 Add to @file{lessopen.sh} a piece of script that recognizes PO files
5147 through their file extension and invokes @code{msgcat} on them, producing
5148 a temporary file. Like this:
5153 tmpfile=`mktemp "$@{TMPDIR-/tmp@}/less.XXXXXX"`
5154 msgcat --to-code=UTF-8 --color "$1" > "$tmpfile"
5162 @node libgettextpo, , Colorizing, Manipulating
5163 @section Writing your own programs that process PO files
5165 For the tasks for which a combination of @samp{msgattrib}, @samp{msgcat} etc.
5166 is not sufficient, a set of C functions is provided in a library, to make it
5167 possible to process PO files in your own programs. When you use this library,
5168 you don't need to write routines to parse the PO file; instead, you retrieve
5169 a pointer in memory to each of the messages contained in the PO file. Functions
5170 for writing PO files are not provided at this time.
5172 The functions are declared in the header file @samp{<gettext-po.h>}, and are
5173 defined in a library called @samp{libgettextpo}.
5175 @deftp {Data Type} po_file_t
5176 This is a pointer type that refers to the contents of a PO file, after it has
5177 been read into memory.
5180 @deftp {Data Type} po_message_iterator_t
5181 This is a pointer type that refers to an iterator that produces a sequence of messages.
5185 @deftp {Data Type} po_message_t
5186 This is a pointer type that refers to a message of a PO file, including its translation.
5190 @deftypefun po_file_t po_file_read (const char *@var{filename})
5191 The @code{po_file_read} function reads a PO file into memory. The file name
5192 is given as argument. The return value is a handle to the PO file's contents,
5193 valid until @code{po_file_free} is called on it. In case of error, the return
5194 value is @code{NULL}, and @code{errno} is set.
5197 @deftypefun void po_file_free (po_file_t @var{file})
5198 The @code{po_file_free} function frees a PO file's contents from memory,
5199 including all messages that are only implicitly accessible through iterators.
5202 @deftypefun {const char * const *} po_file_domains (po_file_t @var{file})
5203 The @code{po_file_domains} function returns the domains for which the given
5204 PO file has messages. The return value is a @code{NULL} terminated array
5205 which is valid as long as the @var{file} handle is valid. For PO files which
5206 contain no @samp{domain} directive, the return value contains only one domain,
5207 namely the default domain @code{"messages"}.
5210 @deftypefun po_message_iterator_t po_message_iterator (po_file_t @var{file}, const char *@var{domain})
5211 The @code{po_message_iterator} returns an iterator that will produce the
5212 messages of @var{file} that belong to the given @var{domain}. If @var{domain}
5213 is @code{NULL}, the default domain is used instead. To list the messages,
5214 use the function @code{po_next_message} repeatedly.
5217 @deftypefun void po_message_iterator_free (po_message_iterator_t @var{iterator})
5218 The @code{po_message_iterator_free} function frees an iterator previously
5219 allocated through the @code{po_message_iterator} function.
5222 @deftypefun po_message_t po_next_message (po_message_iterator_t @var{iterator})
5223 The @code{po_next_message} function returns the next message from
5224 @var{iterator} and advances the iterator. It returns @code{NULL} when the
5225 iterator has reached the end of its message list.
5228 The following functions return details of a @code{po_message_t}. Recall
5229 that the results are valid as long as the @var{file} handle is valid.
5231 @deftypefun {const char *} po_message_msgid (po_message_t @var{message})
5232 The @code{po_message_msgid} function returns the @code{msgid} (untranslated
5233 English string) of a message. This is guaranteed to be non-@code{NULL}.
5236 @deftypefun {const char *} po_message_msgid_plural (po_message_t @var{message})
5237 The @code{po_message_msgid_plural} function returns the @code{msgid_plural}
5238 (untranslated English plural string) of a message with plurals, or @code{NULL}
5239 for a message without plural.
5242 @deftypefun {const char *} po_message_msgstr (po_message_t @var{message})
The @code{po_message_msgstr} function returns the @code{msgstr} (translation)
of a message. For an untranslated message, the return value is an empty
string.
5248 @deftypefun {const char *} po_message_msgstr_plural (po_message_t @var{message}, int @var{index})
5249 The @code{po_message_msgstr_plural} function returns the
5250 @code{msgstr[@var{index}]} of a message with plurals, or @code{NULL} when
5251 the @var{index} is out of range or for a message without plural.
Here is an example of how these functions can be used.
const char *filename = @dots{};
po_file_t file = po_file_read (filename);
if (file == NULL)
  error (EXIT_FAILURE, errno, "couldn't open the PO file %s", filename);
@{
  const char * const *domains = po_file_domains (file);
  const char * const *domainp;
  for (domainp = domains; *domainp; domainp++)
    @{
      const char *domain = *domainp;
      po_message_iterator_t iterator = po_message_iterator (file, domain);
      for (;;)
        @{
          /* po_next_message returns a po_message_t, not a pointer to one.  */
          po_message_t message = po_next_message (iterator);
          if (message == NULL)
            break;
          @{
            const char *msgid = po_message_msgid (message);
            const char *msgstr = po_message_msgstr (message);
            @dots{}
          @}
        @}
      po_message_iterator_free (iterator);
    @}
@}
po_file_free (file);
5290 @node Binaries, Programmers, Manipulating, Top
5291 @chapter Producing Binary MO Files
5296 * msgfmt Invocation:: Invoking the @code{msgfmt} Program
5297 * msgunfmt Invocation:: Invoking the @code{msgunfmt} Program
5298 * MO Files:: The Format of GNU MO Files
5301 @node msgfmt Invocation, msgunfmt Invocation, Binaries, Binaries
5302 @section Invoking the @code{msgfmt} Program
5304 @include msgfmt.texi
5306 @node msgunfmt Invocation, MO Files, msgfmt Invocation, Binaries
5307 @section Invoking the @code{msgunfmt} Program
5309 @include msgunfmt.texi
5311 @node MO Files, , msgunfmt Invocation, Binaries
5312 @section The Format of GNU MO Files
5313 @cindex MO file's format
5314 @cindex file format, @file{.mo}
5316 The format of the generated MO files is best described by a picture,
5317 which appears below.
5319 @cindex magic signature of MO files
The first two words serve to identify the file. The magic
5321 number will always signal GNU MO files. The number is stored in the
5322 byte order of the generating machine, so the magic number really is
5323 two numbers: @code{0x950412de} and @code{0xde120495}.
5325 The second word describes the current revision of the file format,
5326 composed of a major and a minor revision number. The revision numbers
5327 ensure that the readers of MO files can distinguish new formats from
5328 old ones and handle their contents, as far as possible. For now the
5329 major revision is 0 or 1, and the minor revision is also 0 or 1. More
5330 revisions might be added in the future. A program seeing an unexpected
5331 major revision number should stop reading the MO file entirely; whereas
5332 an unexpected minor revision number means that the file can be read but
5333 will not reveal its full contents, when parsed by a program that
5334 supports only smaller minor revision numbers.
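The checks a reader performs on the first two words can be sketched in C
as follows. This is only an illustration, not the code GNU @code{gettext}
uses: all function names are made up for this example, and the split of
the revision word into a major part (upper 16 bits) and a minor part
(lower 16 bits) is an assumption that the text above does not spell out.

```c
#include <assert.h>
#include <stdint.h>

/* Magic number as seen by a reader with the writer's byte order.  */
#define MO_MAGIC         0x950412deu
/* The same number as seen by a reader with the opposite byte order.  */
#define MO_MAGIC_SWAPPED 0xde120495u

/* Reverse the four bytes of a 32-bit word.  */
static uint32_t mo_swap32 (uint32_t w)
{
  return (w >> 24) | ((w >> 8) & 0x0000ff00u)
         | ((w << 8) & 0x00ff0000u) | (w << 24);
}

/* Return 0 if the words can be used as they are, 1 if every word must
   be byte-swapped, and -1 if this is not a GNU MO file.  */
static int mo_check_magic (uint32_t first_word)
{
  if (first_word == MO_MAGIC)
    return 0;
  if (first_word == MO_MAGIC_SWAPPED)
    return 1;
  return -1;
}

/* Fetch one of the seven fixed header words, swapping if needed.  */
static uint32_t mo_header_word (const uint32_t *header, int index,
                                int must_swap)
{
  assert (index >= 0 && index < 7);
  return must_swap ? mo_swap32 (header[index]) : header[index];
}

/* ASSUMPTION: major revision in the upper half, minor in the lower.  */
static unsigned int mo_major (uint32_t revision) { return revision >> 16; }
static unsigned int mo_minor (uint32_t revision) { return revision & 0xffffu; }

/* A reader that knows majors 0 and 1 stops on any other major; an
   unknown minor revision does not prevent reading.  */
static int mo_revision_readable (uint32_t revision)
{
  return mo_major (revision) <= 1;
}
```

A reader would call @code{mo_check_magic} on the first word, remember
whether swapping is needed, and only then interpret the revision word.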
The version is kept
separate from the magic number, instead of using different magic
numbers for different formats, mainly because @file{/etc/magic} is
not updated often.
A number of pointers to later tables in the file follow, allowing
for the extension of the prefix part of MO files without having to
recompile programs reading them. This might become useful for later
inserting a few flag bits, an indication of the charset used, new
tables, or other things.
5347 Then, at offset @var{O} and offset @var{T} in the picture, two tables
5348 of string descriptors can be found. In both tables, each string
descriptor uses two 32-bit integers, one for the string length,
5350 another for the offset of the string in the MO file, counting in bytes
5351 from the start of the file. The first table contains descriptors
5352 for the original strings, and is sorted so the original strings
5353 are in increasing lexicographical order. The second table contains
5354 descriptors for the translated strings, and is parallel to the first
5355 table: to find the corresponding translation one has to access the
5356 array slot in the second array with the same index.
Having the original strings sorted enables the use of simple binary
search, for when the MO file does not contain a hashing table, or
for when it is not practical to use the hashing table provided in
the MO file. This also has another advantage: the empty string
in a PO file is usually @emph{translated} by GNU @code{gettext} into
some system information attached to that particular MO file, and the
empty string necessarily becomes the first in both the original and
translated tables, making the system information very easy to find.
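As an illustration of the two parallel tables, here is a minimal sketch of
a lookup by binary search over an MO file held in memory. It is not the
lookup code of GNU @code{gettext}: the names @code{rd32} and
@code{mo_lookup} are invented, the image is assumed to be little-endian
and well-formed (no bounds checks), and the hash table is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian word at byte offset OFF of the image.
   ASSUMPTION: little-endian file; a real reader would choose the byte
   order from the magic number.  */
static uint32_t rd32 (const unsigned char *mo, size_t off)
{
  return (uint32_t) mo[off] | ((uint32_t) mo[off + 1] << 8)
         | ((uint32_t) mo[off + 2] << 16) | ((uint32_t) mo[off + 3] << 24);
}

/* Return the translation of MSGID, or NULL if MSGID is not present.  */
static const char *mo_lookup (const unsigned char *mo, const char *msgid)
{
  uint32_t n = rd32 (mo, 8);           /* N: number of strings */
  uint32_t orig_tab = rd32 (mo, 12);   /* O: table of originals */
  uint32_t trans_tab = rd32 (mo, 16);  /* T: table of translations */
  uint32_t lo = 0, hi = n;

  assert (rd32 (mo, 0) == 0x950412deu);
  while (lo < hi)
    {
      uint32_t mid = lo + (hi - lo) / 2;
      /* A descriptor is (length, offset); we need the offset here.  */
      const char *orig
        = (const char *) mo + rd32 (mo, orig_tab + 8 * mid + 4);
      int cmp = strcmp (msgid, orig);

      if (cmp == 0)  /* same index in the second table */
        return (const char *) mo + rd32 (mo, trans_tab + 8 * mid + 4);
      if (cmp < 0)
        hi = mid;
      else
        lo = mid + 1;
    }
  return NULL;
}
```

Looking up the empty string with such a function returns the system
information mentioned above, since the empty string sorts first.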
5367 @cindex hash table, inside MO files
5368 The size @var{S} of the hash table can be zero. In this case, the
5369 hash table itself is not contained in the MO file. Some people might
5370 prefer this because a precomputed hashing table takes disk space, and
5371 does not win @emph{that} much speed. The hash table contains indices
5372 to the sorted array of strings in the MO file. Conflict resolution is
5373 done by double hashing. The precise hashing algorithm used is fairly
5374 dependent on GNU @code{gettext} code, and is not documented here.
As for the strings themselves, they follow the hash table, and each
5377 is terminated with a @key{NUL}, and this @key{NUL} is not counted in
5378 the length which appears in the string descriptor. The @code{msgfmt}
5379 program has an option selecting the alignment for MO file strings.
5380 With this option, each string is separately aligned so it starts at
5381 an offset which is a multiple of the alignment value. On some RISC
5382 machines, a correct alignment will speed things up.
5384 @cindex context, in MO files
Contexts are stored by storing the concatenation of the context, an
@key{EOT} byte, and the original string, instead of the original string.
5388 @cindex plural forms, in MO files
5389 Plural forms are stored by letting the plural of the original string
5390 follow the singular of the original string, separated through a
5391 @key{NUL} byte. The length which appears in the string descriptor
5392 includes both. However, only the singular of the original string
5393 takes part in the hash table lookup. The plural variants of the
5394 translation are all stored consecutively, separated through a
5395 @key{NUL} byte. Here also, the length in the string descriptor
5396 includes all of them.
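To make the plural layout concrete, the following sketch picks one variant
out of a translation whose variants are separated by @key{NUL} bytes. The
function name @code{mo_plural_variant} is hypothetical; the sketch relies
only on the two facts stated above, namely that the descriptor length
covers all variants and that the final @key{NUL} is not counted in it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the INDEX-th NUL-separated variant inside DATA, where LENGTH
   is the length from the string descriptor (final NUL not counted).
   Return NULL if there are fewer variants than requested.  */
static const char *mo_plural_variant (const char *data, size_t length,
                                      unsigned int index)
{
  const char *p = data;
  const char *end = data + length;

  assert (data != NULL);
  while (index > 0)
    {
      /* Skip the current variant and the NUL that follows it.  */
      p += strlen (p) + 1;
      if (p > end)
        return NULL;
      index--;
    }
  return p;
}
```

Which index to use for a given number is decided elsewhere, by the plural
rule of the language; this sketch only shows the storage side.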
Nothing prevents an MO file from having embedded @key{NUL}s in strings.
5399 However, the program interface currently used already presumes
5400 that strings are @key{NUL} terminated, so embedded @key{NUL}s are
5401 somewhat useless. But the MO file format is general enough so other
5402 interfaces would be later possible, if for example, we ever want to
5403 implement wide characters right in MO files, where @key{NUL} bytes may
5404 accidentally appear. (No, we don't want to have wide characters in MO
5405 files. They would make the file unnecessarily large, and the
5406 @samp{wchar_t} type being platform dependent, MO files would be
5407 platform dependent as well.)
5409 This particular issue has been strongly debated in the GNU
@code{gettext} development forum, and it is to be expected that the MO file
format will evolve or change over time. It is even possible that many
5412 formats may later be supported concurrently. But surely, we have to
5413 start somewhere, and the MO file format described here is a good start.
5414 Nothing is cast in concrete, and the format may later evolve fairly
5415 easily, so we should feel comfortable with the current approach.
        byte
             +------------------------------------------+
          0  | magic number = 0x950412de                |
             |                                          |
          4  | file format revision = 0                 |
             |                                          |
          8  | number of strings                        |  == N
             |                                          |
         12  | offset of table with original strings    |  == O
             |                                          |
         16  | offset of table with translation strings |  == T
             |                                          |
         20  | size of hashing table                    |  == S
             |                                          |
         24  | offset of hashing table                  |  == H
             |                                          |
             .                                          .
             .    (possibly more entries later)         .
             .                                          .
             |                                          |
          O  | length & offset 0th string  ----------------.
      O + 8  | length & offset 1st string  ------------------.
              ...                                    ...   | |
O + ((N-1)*8)| length & offset (N-1)th string           |  | |
             |                                          |  | |
          T  | length & offset 0th translation  ---------------.
      T + 8  | length & offset 1st translation  -----------------.
              ...                                    ...   | | | |
T + ((N-1)*8)| length & offset (N-1)th translation      |  | | | |
             |                                          |  | | | |
          H  | start hash table                         |  | | | |
              ...                                    ...   | | | |
  H + S * 4  | end hash table                           |  | | | |
             |                                          |  | | | |
             | NUL terminated 0th string  <----------------' | | |
             |                                          |    | | |
             | NUL terminated 1st string  <------------------' | |
             |                                          |      | |
              ...                                    ...       | |
             |                                          |      | |
             | NUL terminated 0th translation  <---------------' |
             |                                          |        |
             | NUL terminated 1st translation  <-----------------'
             |                                          |
              ...                                    ...
             |                                          |
             +------------------------------------------+
5469 @node Programmers, Translators, Binaries, Top
5470 @chapter The Programmer's View
5472 @c FIXME: Reorganize whole chapter.
5474 One aim of the current message catalog implementation provided by
5475 GNU @code{gettext} was to use the system's message catalog handling, if the
5476 installer wishes to do so. So we perhaps should first take a look at
5477 the solutions we know about. The people in the POSIX committee did not
5478 manage to agree on one of the semi-official standards which we'll
5479 describe below. In fact they couldn't agree on anything, so they decided
5480 only to include an example of an interface. The major Unix vendors
5481 are split in the usage of the two most important specifications: X/Open's
5482 catgets vs. Uniforum's gettext interface. We'll describe them both and
5483 later explain our solution of this dilemma.
5486 * catgets:: About @code{catgets}
5487 * gettext:: About @code{gettext}
5488 * Comparison:: Comparing the two interfaces
5489 * Using libintl.a:: Using libintl.a in own programs
5490 * gettext grok:: Being a @code{gettext} grok
5491 * Temp Programmers:: Temporary Notes for the Programmers Chapter
5494 @node catgets, gettext, Programmers, Programmers
5495 @section About @code{catgets}
5496 @cindex @code{catgets}, X/Open specification
5498 The @code{catgets} implementation is defined in the X/Open Portability
5499 Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the
5500 process of creating this standard seemed to be too slow for some of
5501 the Unix vendors so they created their implementations on preliminary
5502 versions of the standard. Of course this leads again to problems while
5503 writing platform independent programs: even the usage of @code{catgets}
5504 does not guarantee a unique interface.
Another, personal comment on this: only a bunch of committee members
could have made this interface. They never really tried to program
using this interface. It is a fast, memory-saving implementation; a
user can happily live with it. But programmers hate it (at least I and
some others do@dots{}).
But we must not forget one point: after all the trouble with transferring
the rights on Unix(tm) they at last came to X/Open, the very same who
published this specification. This leads me to making the prediction
that this interface will be in future Unix standards (e.g.@: Spec1170) and
therefore part of all Unix implementations (implementations which are
@emph{allowed} to bear this name).
5520 * Interface to catgets:: The interface
5521 * Problems with catgets:: Problems with the @code{catgets} interface?!
5524 @node Interface to catgets, Problems with catgets, catgets, catgets
5525 @subsection The Interface
5526 @cindex interface to @code{catgets}
5528 The interface to the @code{catgets} implementation consists of three
5529 functions which correspond to those used in file access: @code{catopen}
5530 to open the catalog for using, @code{catgets} for accessing the message
5531 tables, and @code{catclose} for closing after work is done. Prototypes
5532 for the functions and the needed definitions are in the
5533 @code{<nl_types.h>} header file.
5535 @cindex @code{catopen}, a @code{catgets} function
@code{catopen} is used like this:
5539 nl_catd catd = catopen ("catalog_name", 0);
The function takes as the argument the name of the catalog. This usually
refers to the name of the program or the package. The second parameter
is not further specified in the standard. I don't even know whether it
is implemented consistently among various systems. So the common advice
is to use @code{0} as the value. The return value is a handle to the
message catalog, equivalent to the file handles returned by @code{open}.
5549 @cindex @code{catgets}, a @code{catgets} function
This handle is of course used in the @code{catgets} function, which can
be used like this:
5554 char *translation = catgets (catd, set_no, msg_id, "original string");
5557 The first parameter is this catalog descriptor. The second parameter
5558 specifies the set of messages in this catalog, in which the message
5559 described by @code{msg_id} is obtained. @code{catgets} therefore uses a
5560 three-stage addressing:
5563 catalog name @result{} set number @result{} message ID @result{} translation
5566 @c Anybody else loving Haskell??? :-) -- Uli
The fourth argument is not used to address the translation. It is given
as a default value in case one of the addressing stages fails. One
important thing to remember is that although the return type of @code{catgets}
is @code{char *} the resulting string @emph{must not} be changed. It
should better be @code{const char *}, but the standard was published in
1988, one year before ANSI C.
5576 @cindex @code{catclose}, a @code{catgets} function
The last of these functions, @code{catclose}, is used and behaves as expected:

catclose (catd);

5583 After this no @code{catgets} call using the descriptor is legal anymore.
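Putting the three calls together, a complete lookup might look like the
following sketch. The helper name @code{catalog_message} and the idea of
copying the result into a caller-supplied buffer are choices made for this
example only; the copy is there because the string returned by
@code{catgets} should not be relied upon once the catalog has been closed.
Opening the catalog on every call is wasteful and merely keeps the example
short.

```c
#include <assert.h>
#include <nl_types.h>
#include <stddef.h>
#include <string.h>

/* Copy message MSG_ID of set SET_NO from the catalog named CATALOG into
   BUF, or FALLBACK if the catalog or the message cannot be found.  */
static void catalog_message (const char *catalog, int set_no, int msg_id,
                             const char *fallback,
                             char *buf, size_t bufsize)
{
  const char *text = fallback;
  nl_catd catd = catopen (catalog, 0);
  size_t len;

  assert (buf != NULL && bufsize > 0);
  if (catd != (nl_catd) -1)
    text = catgets (catd, set_no, msg_id, fallback);

  /* Copy (and truncate if necessary) before the catalog is closed.  */
  len = strlen (text);
  if (len >= bufsize)
    len = bufsize - 1;
  memcpy (buf, text, len);
  buf[len] = '\0';

  if (catd != (nl_catd) -1)
    catclose (catd);
}
```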
5585 @node Problems with catgets, , Interface to catgets, catgets
5586 @subsection Problems with the @code{catgets} Interface?!
5587 @cindex problems with @code{catgets} interface
Now that this description seems really easy, where are the
problems we speak of? In fact the interface could be used in a
reasonable way, but constructing the message catalogs is a pain. The
reason for this lies in the third argument of @code{catgets}: the unique
message ID. This has to be a numeric value for all messages in a single
set. Perhaps you can imagine the problems of keeping such a list while
changing the source code. Add a new message here, remove one there. Of
course, many tools have been developed to help organize this
chaos, but each of them fails in one aspect or another. We don't
want to say that the other approach has no problems, but they are far
easier to manage.
5601 @node gettext, Comparison, catgets, Programmers
5602 @section About @code{gettext}
5603 @cindex @code{gettext}, a programmer's view
5605 The definition of the @code{gettext} interface comes from a Uniforum
5606 proposal. It was submitted there by Sun, who had implemented the
5607 @code{gettext} function in SunOS 4, around 1990. Nowadays, the
5608 @code{gettext} interface is specified by the OpenI18N standard.
5610 The main point about this solution is that it does not follow the
5611 method of normal file handling (open-use-close) and that it does not
5612 burden the programmer with so many tasks, especially the unique key handling.
5613 Of course here also a unique key is needed, but this key is the message
itself (however long or short it is). See @ref{Comparison} for a more
5615 detailed comparison of the two methods.
5617 The following section contains a rather detailed description of the
5618 interface. We make it that detailed because this is the interface
5619 we chose for the GNU @code{gettext} Library. Programmers interested
5620 in using this library will be interested in this description.
5623 * Interface to gettext:: The interface
5624 * Ambiguities:: Solving ambiguities
5625 * Locating Catalogs:: Locating message catalog files
5626 * Charset conversion:: How to request conversion to Unicode
5627 * Contexts:: Solving ambiguities in GUI programs
5628 * Plural forms:: Additional functions for handling plurals
5629 * Optimized gettext:: Optimization of the *gettext functions
5632 @node Interface to gettext, Ambiguities, gettext, gettext
5633 @subsection The Interface
5634 @cindex @code{gettext} interface
5636 The minimal functionality an interface must have is a) to select a
5637 domain the strings are coming from (a single domain for all programs is
5638 not reasonable because its construction and maintenance is difficult,
5639 perhaps impossible) and b) to access a string in a selected domain.
5641 This is principally the description of the @code{gettext} interface. It
5642 has a global domain which unqualified usages reference. Of course this
5643 domain is selectable by the user.
5646 char *textdomain (const char *domain_name);
This provides the possibility to change or query the current status of
the current global domain of the @code{LC_MESSAGES} category. The
argument is a null-terminated string, whose characters must be legal in
filenames. If the @var{domain_name} argument is @code{NULL},
the function returns the current value. If no value has been set
before, the name of the default domain is returned: @emph{messages}.
Please note that although the return value of @code{textdomain} is of
type @code{char *}, it must not be modified. It is also important to know
that no checks of the availability are made. If the name is not
available you will see this by the fact that no translations are provided.
5661 To use a domain set by @code{textdomain} the function
5664 char *gettext (const char *msgid);
5668 is to be used. This is the simplest reasonable form one can imagine.
5669 The translation of the string @var{msgid} is returned if it is available
5670 in the current domain. If it is not available, the argument itself is
5671 returned. If the argument is @code{NULL} the result is undefined.
One thing to keep in mind is that there is no explicit dependency on
the domain used. The current value of the domain is used.
If this changes between two
executions of the same @code{gettext} call in the program, the two calls
reference different message catalogs.
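A minimal sketch of the usual pattern follows. The function names
@code{init_i18n} and @code{greeting} and the domain name used with them
are placeholders, and the call to @code{setlocale} is an addition of this
example: without it the program stays in the @code{"C"} locale and no
translation is looked up.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* One-time setup, to be called early in main.  PACKAGE becomes the
   global domain that unqualified gettext calls refer to.  */
static int init_i18n (const char *package)
{
  /* The domain name ends up in a file name, so reject a slash.  */
  assert (package != NULL && strchr (package, '/') == NULL);
  setlocale (LC_ALL, "");
  return textdomain (package) != NULL ? 0 : -1;
}

/* Every user-visible string is passed through gettext.  Without a
   matching catalog, the argument itself comes back.  */
static const char *greeting (void)
{
  return gettext ("Hello, world!");
}
```

If @code{textdomain} were called again with another name between two
calls of @code{greeting}, the second call would consult a different
catalog, which is the dependency described above.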
For the easiest case, which is normally used in internationalized
packages, once at the beginning of execution a call to @code{textdomain}
is issued, setting the domain to a unique name, normally the package
name. In the following code all strings which have to be translated are
filtered through the @code{gettext} function. That's all, the package speaks
your language.
5686 @node Ambiguities, Locating Catalogs, Interface to gettext, gettext
5687 @subsection Solving Ambiguities
5688 @cindex several domains
5689 @cindex domain ambiguities
5690 @cindex large package
While this single name domain works well for most applications there
might be the need to get translations from more than one domain. Of
course one could switch between different domains with calls to
@code{textdomain}, but this is neither convenient nor fast. A
possible situation could be one case subject to discussion during this
development phase: all
error messages of functions in the set of commonly used functions should
go into a separate domain @code{error}. By this means we would only need
to translate them once.
Another case is messages from a library, as these @emph{have} to be
independent of the current domain set by the application.
For these reasons there are two more functions to retrieve strings:
char *dgettext (const char *domain_name, const char *msgid);
char *dcgettext (const char *domain_name, const char *msgid,
                 int category);
Both take an additional argument at the first place, which corresponds
to the argument of @code{textdomain}. The third argument of
@code{dcgettext} allows the use of a locale category other than @code{LC_MESSAGES}.
But I really don't know where this can be useful. If the
@var{domain_name} is @code{NULL} or @var{category} has a value other than
the known ones, the result is undefined. It should also be noted that
this function is not part of the second known implementation of this
function family, the one found in Solaris.
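The library case can be sketched like this. The library, its domain name
@code{libfoo} and the function @code{libfoo_strerror} are all made up for
the illustration; the point is only that the library names its own domain
in every call and never touches the application's global domain.

```c
#include <assert.h>
#include <libintl.h>

/* Hypothetical domain name of an imaginary library.  */
#define LIBFOO_DOMAIN "libfoo"

/* Return a translated description of CODE.  The lookup goes to the
   library's own domain, whatever textdomain the application chose.  */
static const char *libfoo_strerror (int code)
{
  assert (code >= 0);
  switch (code)
    {
    case 0:
      return dgettext (LIBFOO_DOMAIN, "no error");
    case 1:
      return dgettext (LIBFOO_DOMAIN, "file not found");
    default:
      return dgettext (LIBFOO_DOMAIN, "unknown error");
    }
}
```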
A second ambiguity can arise from the fact that perhaps more than one
domain has the same name. This can be solved by specifying where the
needed message catalog files can be found.
5727 char *bindtextdomain (const char *domain_name,
5728 const char *dir_name);
Calling this function binds the given domain to a file in the specified
directory (how this file is determined follows below). In particular, a
file in the system's default place is no longer preferred over the specified
file (as it would be by solely using @code{textdomain}). A
@code{NULL} pointer for the @var{dir_name} parameter returns the binding
associated with @var{domain_name}. If @var{domain_name} itself is
@code{NULL} nothing happens and a @code{NULL} pointer is returned. Here
again, as for all the other functions, the return
value must not be changed!
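Both uses of @code{bindtextdomain}, setting and querying, can be sketched
as follows. The wrapper names are invented for this example, and insisting
on a leading slash is a design choice of the sketch (it assumes POSIX-style
path names), not something the function itself enforces.

```c
#include <assert.h>
#include <libintl.h>
#include <stddef.h>
#include <string.h>

/* Bind DOMAIN to the absolute directory ABS_DIR.  Return 0 if the
   binding is in effect afterwards, -1 otherwise.  */
static int bind_catalog_dir (const char *domain, const char *abs_dir)
{
  const char *bound;

  /* Insist on an absolute path name for the binding.  */
  assert (domain != NULL && abs_dir != NULL && abs_dir[0] == '/');
  bound = bindtextdomain (domain, abs_dir);
  return bound != NULL && strcmp (bound, abs_dir) == 0 ? 0 : -1;
}

/* Query only: a NULL directory returns the current binding.  The
   result must not be modified by the caller.  */
static const char *catalog_dir (const char *domain)
{
  return bindtextdomain (domain, NULL);
}
```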
It is important to remember that relative path names for the
@var{dir_name} parameter can be trouble. Since the path is always
computed relative to the current directory, different results will be
achieved when the program executes a @code{chdir} command. Relative
paths should always be avoided to avoid dependencies and
unreliabilities.
5748 @node Locating Catalogs, Charset conversion, Ambiguities, gettext
5749 @subsection Locating Message Catalog Files
5750 @cindex message catalog files location
Because many different languages for many different packages have to be
stored, we need some way to encode this information in the message catalog
files. The way usually used in Unix environments is to have this encoding
in the file name. This is also done here. The directory name given in
@code{bindtextdomain}'s second argument (or the default directory),
followed by the name of the locale, the locale category, and the domain name
are concatenated:
5761 @var{dir_name}/@var{locale}/LC_@var{category}/@var{domain_name}.mo
5764 The default value for @var{dir_name} is system specific. For the GNU
5765 library, and for packages adhering to its conventions, it's:
5767 /usr/local/share/locale
5771 @var{locale} is the name of the locale category which is designated by
5772 @code{LC_@var{category}}. For @code{gettext} and @code{dgettext} this
@code{LC_@var{category}} is always @code{LC_MESSAGES}.@footnote{Some
systems, e.g.@: mingw, don't have @code{LC_MESSAGES}. Here we use a more or
5775 less arbitrary value for it, namely 1729, the smallest positive integer
5776 which can be represented in two different ways as the sum of two cubes.}
5777 The name of the locale category is determined through
5778 @code{setlocale (LC_@var{category}, NULL)}.
@footnote{When the system does not support @code{setlocale} its behavior
in setting the locale values is simulated by looking at the environment
variables.}
5782 When using the function @code{dcgettext}, you can specify the locale category
5783 through the third argument.
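The concatenation rule can be written down as a small helper. This is a
sketch of the naming scheme only: the function @code{catalog_path} is
hypothetical, it takes the category as a full name such as
@code{"LC_MESSAGES"}, and it does not reproduce any fallback search an
implementation may perform when the exact file does not exist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Compose DIR_NAME/LOCALE/CATEGORY/DOMAIN_NAME.mo into BUF.
   Return 0 on success, -1 if BUF is too small.  */
static int catalog_path (char *buf, size_t bufsize, const char *dir_name,
                         const char *locale, const char *category,
                         const char *domain_name)
{
  int n;

  assert (buf != NULL && bufsize > 0);
  /* The domain name is used as a file name, so it cannot hold a slash.  */
  assert (strchr (domain_name, '/') == NULL);
  n = snprintf (buf, bufsize, "%s/%s/%s/%s.mo",
                dir_name, locale, category, domain_name);
  return n >= 0 && (size_t) n < bufsize ? 0 : -1;
}
```

For the default directory given above, the locale @code{de} and a domain
called @code{hello}, this yields
@file{/usr/local/share/locale/de/LC_MESSAGES/hello.mo}.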
5785 @node Charset conversion, Contexts, Locating Catalogs, gettext
5786 @subsection How to specify the output character set @code{gettext} uses
5787 @cindex charset conversion at runtime
5788 @cindex encoding conversion at runtime
@code{gettext} not only looks up a translation in a message catalog. It
also converts the translation on the fly to the desired output character
set. This is useful if the user is working in a different character set
than the translator who created the message catalog, because it avoids
distributing variants of message catalogs which differ only in the
character set.
5797 The output character set is, by default, the value of @code{nl_langinfo
5798 (CODESET)}, which depends on the @code{LC_CTYPE} part of the current
5799 locale. But programs which store strings in a locale independent way
5800 (e.g.@: UTF-8) can request that @code{gettext} and related functions
5801 return the translations in that encoding, by use of the
5802 @code{bind_textdomain_codeset} function.
5804 Note that the @var{msgid} argument to @code{gettext} is not subject to
5805 character set conversion. Also, when @code{gettext} does not find a
5806 translation for @var{msgid}, it returns @var{msgid} unchanged --
5807 independently of the current output character set. It is therefore
5808 recommended that all @var{msgid}s be US-ASCII strings.
5810 @deftypefun {char *} bind_textdomain_codeset (const char *@var{domainname}, const char *@var{codeset})
5811 The @code{bind_textdomain_codeset} function can be used to specify the
5812 output character set for message catalogs for domain @var{domainname}.
5813 The @var{codeset} argument must be a valid codeset name which can be used
5814 for the @code{iconv_open} function, or a null pointer.
5816 If the @var{codeset} parameter is the null pointer,
5817 @code{bind_textdomain_codeset} returns the currently selected codeset
5818 for the domain with the name @var{domainname}. It returns @code{NULL} if
5819 no codeset has yet been selected.
5821 The @code{bind_textdomain_codeset} function can be used several times.
5822 If used multiple times with the same @var{domainname} argument, the
5823 later call overrides the settings made by the earlier one.
The @code{bind_textdomain_codeset} function returns a pointer to a
string containing the name of the selected codeset. The string is
allocated internally in the function and must not be changed by the
user. If the system ran out of memory during the execution of
@code{bind_textdomain_codeset}, the return value is @code{NULL} and the
global variable @var{errno} is set accordingly.
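A program that keeps all its strings in UTF-8 might use the function as in
the following sketch. The wrapper name @code{request_utf8} is invented
here, and comparing the returned name with the requested one assumes that
the implementation hands back the codeset name as it was given.

```c
#include <assert.h>
#include <libintl.h>
#include <stddef.h>
#include <string.h>

/* Ask that translations from DOMAIN be returned in UTF-8, whatever the
   character set of the current locale.  Return 0 on success, -1 on
   failure (for example when memory is exhausted).  */
static int request_utf8 (const char *domain)
{
  const char *codeset;

  assert (domain != NULL);
  codeset = bind_textdomain_codeset (domain, "UTF-8");
  return codeset != NULL && strcmp (codeset, "UTF-8") == 0 ? 0 : -1;
}
```

Passing a null pointer as the second argument afterwards returns the
selected name without changing it.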
5833 @node Contexts, Plural forms, Charset conversion, gettext
5834 @subsection Using contexts for solving ambiguities
5836 @cindex GUI programs
5837 @cindex translating menu entries
5838 @cindex menu entries
One place where the @code{gettext} functions, if used normally, have big
problems is within programs with graphical user interfaces (GUIs). The
problem is that many of the strings which have to be translated are very
short. They have to appear in pull-down menus, which restricts the
length. But strings which do not contain entire sentences or at
least large fragments of a sentence may appear in more than one
situation in the program but might have different translations. This is
especially true for the one-word strings which are frequently used in
GUI programs.
5850 As a consequence many people say that the @code{gettext} approach is
5851 wrong and instead @code{catgets} should be used which indeed does not
have this problem. But there is a very simple and powerful method to
handle this kind of problem with the @code{gettext} functions.
Contexts can be added to strings to be translated. A context-dependent
translation lookup searches for the translation of a given string
within a given context only. The translation for the same string
in a different context can be different. The different translations of
the same string in different contexts can be stored in the same
MO file, and can be edited by the translator in the same PO file.
5862 The @file{gettext.h} include file contains the lookup macros for strings
5863 with contexts. They are implemented as thin macros and inline functions
5864 over the functions from @code{<libintl.h>}.
5868 const char *pgettext (const char *msgctxt, const char *msgid);
5871 In a call of this macro, @var{msgctxt} and @var{msgid} must be string
5872 literals. The macro returns the translation of @var{msgid}, restricted
5873 to the context given by @var{msgctxt}.
5875 The @var{msgctxt} string is visible in the PO file to the translator.
You should try to make it somehow canonical and never changing, because
every time you change an @var{msgctxt}, the translator will have to review
the translation of @var{msgid}.
5880 Finding a canonical @var{msgctxt} string that doesn't change over time can
5881 be hard. But you shouldn't use the file name or class name containing the
5882 @code{pgettext} call -- because it is a common development task to rename
5883 a file or a class, and it shouldn't cause translator work. Also you shouldn't
5884 use a comment in the form of a complete English sentence as @var{msgctxt} --
5885 because orthography or grammar changes are often applied to such sentences,
5886 and again, it shouldn't force the translator to do a review.
5888 The @samp{p} in @samp{pgettext} stands for ``particular'': @code{pgettext}
5889 fetches a particular translation of the @var{msgid}.
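How such a lookup can be layered over plain @code{gettext} is sketched
below. This is not the code of @file{gettext.h}: the function name
@code{lookup_with_context} and the fixed buffer size are assumptions of
the example. It relies on two facts stated elsewhere in this manual: the
catalog key of a message with context is the context, an @key{EOT} byte,
and the @var{msgid}; and @code{gettext} returns its argument when no
translation is found.

```c
#include <assert.h>
#include <libintl.h>
#include <stddef.h>
#include <string.h>

/* Look up MSGID within context MSGCTXT in the current domain.
   Return the translation, or MSGID itself if there is none.  */
static const char *lookup_with_context (const char *msgctxt,
                                        const char *msgid)
{
  char key[256];  /* arbitrary limit chosen for this sketch */
  size_t ctxt_len, id_len;
  const char *translation;

  assert (msgctxt != NULL && msgid != NULL);
  ctxt_len = strlen (msgctxt);
  id_len = strlen (msgid);
  if (ctxt_len + 1 + id_len + 1 > sizeof key)
    return msgid;  /* key too long: give up on translating */

  /* Build "msgctxt EOT msgid", including the final NUL.  */
  memcpy (key, msgctxt, ctxt_len);
  key[ctxt_len] = '\004';
  memcpy (key + ctxt_len + 1, msgid, id_len + 1);

  translation = gettext (key);
  /* An untranslated key comes back as the very pointer we passed in;
     the caller must see MSGID then, never the key with its EOT byte.  */
  return translation == key ? msgid : translation;
}
```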
const char *dpgettext (const char *domain_name,
                       const char *msgctxt, const char *msgid);
const char *dcpgettext (const char *domain_name,
                        const char *msgctxt, const char *msgid,
                        int category);
5901 These are generalizations of @code{pgettext}. They behave similarly to
5902 @code{dgettext} and @code{dcgettext}, respectively. The @var{domain_name}
argument defines the translation domain. The @var{category} argument
allows the use of a locale category other than @code{LC_MESSAGES}.
As an example consider the following fictional situation. A GUI program
5907 has a menu bar with the following entries:
+------------+------------+--------------------------------------+
| File       | Printer    |                                      |
+------------+------------+--------------------------------------+
| Open     | | Select   |
| New      | | Open     |
+----------+ | Connect  |
             +----------+
5919 To have the strings @code{File}, @code{Printer}, @code{Open},
5920 @code{New}, @code{Select}, and @code{Connect} translated there has to be
5921 at some point in the code a call to a function of the @code{gettext}
5922 family. But in two places the string passed into the function would be
5923 @code{Open}. The translations might not be the same and therefore we
5924 are in the dilemma described above.
5926 What distinguishes the two places is the menu path from the menu root to
5927 the particular menu entries:
Menu|File
Menu|Printer
Menu|File|Open
Menu|File|New
Menu|Printer|Select
Menu|Printer|Open
Menu|Printer|Connect
The context is thus the menu path without its last part. So, the calls
look like this:
5943 pgettext ("Menu|", "File")
5944 pgettext ("Menu|", "Printer")
5945 pgettext ("Menu|File|", "Open")
5946 pgettext ("Menu|File|", "New")
5947 pgettext ("Menu|Printer|", "Select")
5948 pgettext ("Menu|Printer|", "Open")
5949 pgettext ("Menu|Printer|", "Connect")
Whether or not to use the @samp{|} character at the end of the context is a
matter of style.
5955 For more complex cases, where the @var{msgctxt} or @var{msgid} are not
5956 string literals, more general macros are available:
5958 @findex pgettext_expr
5959 @findex dpgettext_expr
5960 @findex dcpgettext_expr
const char *pgettext_expr (const char *msgctxt, const char *msgid);
const char *dpgettext_expr (const char *domain_name,
                            const char *msgctxt, const char *msgid);
const char *dcpgettext_expr (const char *domain_name,
                             const char *msgctxt, const char *msgid,
                             int category);
Here @var{msgctxt} and @var{msgid} can be arbitrary string-valued expressions.
These macros are more general. But in the case that both argument expressions
are string literals, the macros without the @samp{_expr} suffix are more
efficient.
5975 @node Plural forms, Optimized gettext, Contexts, gettext
5976 @subsection Additional functions for plural forms
5977 @cindex plural forms
The functions of the @code{gettext} family described so far (and all the
@code{catgets} functions as well) have one problem in the real world
which has been neglected completely in all existing approaches. What
is meant here is the handling of plural forms.
5984 Looking through Unix source code before the time anybody thought about
5985 internationalization (and, sadly, even afterwards) one can often find
5986 code similar to the following:
5989 printf ("%d file%s deleted", n, n == 1 ? "" : "s");
After the first complaints from people internationalizing the code, people
either completely avoided formulations like this or used strings like
@code{"file(s)"}. Both look unnatural and should be avoided. The first
attempts to solve the problem correctly looked like this:
6000 printf ("%d file deleted", n);
6002 printf ("%d files deleted", n);
6005 But this does not solve the problem. It helps languages where the
6006 plural form of a noun is not simply constructed by adding an
6013 but that is all. Once again people fell into the trap of believing the
6014 rules their language is using are universal. But the handling of plural
6015 forms differs widely between the language families. For example,
6016 Rafal Maszkowski @code{<rzm@@mat.uni.torun.pl>} reports:
6019 In Polish we use e.g.@: plik (file) this way:
6027 and so on (o' means 8859-2 oacute which should be rather okreska,
6028 similar to aogonek).
6031 There are two things which can differ between languages (and even inside
The way in which plural forms are built differs. This is a problem with
6037 languages which have many irregularities. German, for instance, is a
6038 drastic case. Though English and German are part of the same language
6039 family (Germanic), the almost regular forming of plural noun forms
6047 is hardly found in German.
The number of plural forms differs. This is somewhat surprising for
those who only have experience with Romanic and Germanic languages
since here the number is the same (there are two).
But other language families have only one form or many forms. More
information on this is given in a separate section.
6058 The consequence of this is that application writers should not try to
6059 solve the problem in their code. This would be localization since it is
6060 only usable for certain, hardcoded language environments. Instead the
6061 extended @code{gettext} interface should be used.
These extra functions take two strings and a numerical argument instead
of the single key string. The idea behind this is that, using the
numerical argument and the first string as a key, the implementation
can select the right plural form using rules specified by the
translator. The two string arguments are then used to provide a return
value in case no message catalog is found (similar to the normal
@code{gettext} behavior). In this case the rules for Germanic languages
are used, and it is assumed that the first string argument is the
singular form and the second the plural form.
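The fallback rule just described can be written down in a few lines of C.
This is only a minimal sketch of the no-catalog case; the name
@code{fallback_ngettext} is hypothetical and is not part of @code{libintl}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the case where no message catalog is found:
   apply the Germanic rule, i.e. the singular form for exactly one and
   the plural form for every other value (including zero).  */
const char *
fallback_ngettext (const char *msgid1, const char *msgid2,
                   unsigned long int n)
{
  assert (msgid1 != NULL && msgid2 != NULL);
  return n == 1 ? msgid1 : msgid2;
}
```

Note that zero selects the plural form here, which is right for English
but not for every language; this is why the catalog, when present, decides.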
This has the consequence that programs without language catalogs can
display the correct strings only if the program itself is written using
a Germanic language. This is a limitation, but since the GNU C library
(as well as the GNU @code{gettext} package) is written as part of the
GNU package and the coding standards for the GNU project require programs
to be written in English, this solution nevertheless fulfills its
6081 @deftypefun {char *} ngettext (const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
6082 The @code{ngettext} function is similar to the @code{gettext} function
6083 as it finds the message catalogs in the same way. But it takes two
6084 extra arguments. The @var{msgid1} parameter must contain the singular
6085 form of the string to be converted. It is also used as the key for the
6086 search in the catalog. The @var{msgid2} parameter is the plural form.
6087 The parameter @var{n} is used to determine the plural form. If no
6088 message catalog is found @var{msgid1} is returned if @code{n == 1},
6089 otherwise @code{msgid2}.
6091 An example for the use of this function is:
6094 printf (ngettext ("%d file removed", "%d files removed", n), n);
6097 Please note that the numeric value @var{n} has to be passed to the
6098 @code{printf} function as well. It is not sufficient to pass it only to
6101 In the English singular case, the number -- always 1 -- can be replaced with
6105 printf (ngettext ("One file removed", "%d files removed", n), n);
6109 This works because the @samp{printf} function discards excess arguments that
6110 are not consumed by the format string.
If this function is meant to yield a format string that takes two or more
arguments, you cannot use it like this:
6116 printf (ngettext ("%d file removed from directory %s",
6117 "%d files removed from directory %s",
6123 because in many languages the translators want to replace the @samp{%d}
6124 with an explicit word in the singular case, just like ``one'' in English,
6125 and C format strings cannot consume the second argument but skip the first
6126 argument. Instead, you have to reorder the arguments so that @samp{n}
printf (ngettext ("%2$d file removed from directory %1$s",
                  "%2$d files removed from directory %1$s",
6137 See @ref{c-format} for details about this argument reordering syntax.
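The effect of putting the count last can be tried without any message
catalog. The following sketch assumes a @code{printf} family that supports
POSIX numbered arguments (@samp{%1$s}, @samp{%2$lu}); the helpers
@code{pick_form} and @code{format_removed} are hypothetical names, and
@code{pick_form} merely imitates what @code{ngettext} returns when no
catalog is found.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for ngettext() in the no-catalog case.  */
static const char *
pick_form (const char *msgid1, const char *msgid2, unsigned long int n)
{
  return n == 1 ? msgid1 : msgid2;
}

/* Format the message into buf.  The arguments are passed as (dir, n),
   so the count comes last and a form may leave it unused.
   Returns the result of snprintf().  */
int
format_removed (char *buf, size_t size, const char *dir,
                unsigned long int n)
{
  const char *format;

  assert (buf != NULL && dir != NULL);
  format = pick_form ("One file removed from directory %1$s",
                      "%2$lu files removed from directory %1$s", n);
  return snprintf (buf, size, format, dir, n);
}
```

The singular form consumes only the first argument; the trailing count is
an excess argument and is ignored, which is what the reordering is for.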
6139 When you know that the value of @code{n} is within a given range, you can
6140 specify it as a comment directed to the @code{xgettext} tool. This
6141 information may help translators to use more adequate translations. Like
6145 if (days > 7 && days < 14)
6146 /* xgettext: range: 1..6 */
6147 printf (ngettext ("one week and one day", "one week and %d days",
6152 It is also possible to use this function when the strings don't contain a
6156 puts (ngettext ("Delete the selected file?",
6157 "Delete the selected files?",
6161 In this case the number @var{n} is only used to choose the plural form.
6164 @deftypefun {char *} dngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
The @code{dngettext} function is similar to the @code{dgettext} function in
the way the message catalog is selected. The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way @code{ngettext} handles them.
6171 @deftypefun {char *} dcngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}, int @var{category})
The @code{dcngettext} function is similar to the @code{dcgettext} function in
the way the message catalog is selected. The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way @code{ngettext} handles them.
Now, how do these functions solve the problem of the plural forms?
Without the input of linguists (which was not available) it was not
possible to determine whether there are only a few different ways in
which plural forms are formed or whether the number can increase with
every new supported language.
6184 Therefore the solution implemented is to allow the translator to specify
6185 the rules of how to select the plural form. Since the formula varies
6186 with every language this is the only viable solution except for
6187 hardcoding the information in the code (which still would require the
6188 possibility of extensions to not prevent the use of new languages).
6190 @cindex specifying plural form in a PO file
6191 @kwindex nplurals@r{, in a PO file header}
6192 @kwindex plural@r{, in a PO file header}
6193 The information about the plural form selection has to be stored in the
6194 header entry of the PO file (the one with the empty @code{msgid} string).
6195 The plural form information looks like this:
6198 Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
The @code{nplurals} value must be a decimal number which specifies how
many different plural forms exist for this language. The string
following @code{plural} is an expression which uses the C language
syntax. Exceptions are that no negative numbers are allowed, numbers
must be decimal, and the only variable allowed is @code{n}. Spaces are
allowed in the expression, but backslash-newlines are not; in the
examples below the backslash-newlines are present for formatting purposes
only. This expression will be evaluated whenever one of the functions
@code{ngettext}, @code{dngettext}, or @code{dcngettext} is called. The
numeric value passed to these functions is then substituted for all uses
of the variable @code{n} in the expression. The resulting value must
then be greater than or equal to zero and smaller than the value given as
the value of @code{nplurals}.
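To make the evaluation concrete, the following sketch hard-codes the
example header @samp{nplurals=2; plural=n == 1 ? 0 : 1;} as plain C. A real
implementation parses the expression from the PO file header at run time;
the names @code{NPLURALS}, @code{plural_index} and @code{select_plural} are
hypothetical and serve only as an illustration.

```c
#include <assert.h>
#include <stddef.h>

/* "nplurals=2" from the example header.  */
enum { NPLURALS = 2 };

/* "plural=n == 1 ? 0 : 1" from the example header, with the value
   passed to ngettext() substituted for n.  */
static unsigned long int
plural_index (unsigned long int n)
{
  return n == 1 ? 0 : 1;
}

/* Hypothetical helper: pick one of the NPLURALS translations for n.  */
const char *
select_plural (const char *const msgstr[NPLURALS], unsigned long int n)
{
  unsigned long int idx = plural_index (n);

  /* The result must lie in the range 0 .. nplurals-1.  */
  assert (idx < NPLURALS);
  return msgstr[idx];
}
```

With the two German forms @samp{Datei} and @samp{Dateien} stored as
@code{msgstr[0]} and @code{msgstr[1]}, a count of 1 selects the first and
any other count selects the second.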
6216 @cindex plural form formulas
The following rules are known at this point. The languages are listed
with their families, but this does not necessarily mean that the
information can be generalized for the whole family (as can be easily
seen in the table
6220 below).@footnote{Additions are welcome. Send appropriate information to
6221 @email{bug-gnu-gettext@@gnu.org} and @email{bug-glibc-manual@@gnu.org}.}
6224 @item Only one form:
6225 Some languages only require one single form. There is no distinction
6226 between the singular and plural form. An appropriate header entry
6227 would look like this:
6230 Plural-Forms: nplurals=1; plural=0;
6234 Languages with this property include:
6238 Japanese, @c 122.1 million speakers
6239 Vietnamese, @c 68.6 million speakers
6240 Korean @c 66.3 million speakers
6243 @item Two forms, singular used for one only
6244 This is the form used in most existing programs since it is what English
6245 is using. A header entry would look like this:
6248 Plural-Forms: nplurals=2; plural=n != 1;
(Note: this uses the feature of C expressions that boolean expressions
have the value zero or one.)
6255 Languages with this property include:
6258 @item Germanic family
6259 English, @c 328.0 million speakers
6260 German, @c 96.9 million speakers
6261 Dutch, @c 21.7 million speakers
6262 Swedish, @c 8.3 million speakers
6263 Danish, @c 5.6 million speakers
6264 Norwegian, @c 4.6 million speakers
6265 Faroese @c 0.05 million speakers
6266 @item Romanic family
6267 Spanish, @c 328.5 million speakers
6268 Portuguese, @c 178.0 million speakers - 163 million Brazilian Portuguese
Italian @c 61.7 million speakers
@item Slavic family
Bulgarian @c 9.1 million speakers
6271 @item Latin/Greek family
6272 Greek @c 13.1 million speakers
6273 @item Finno-Ugric family
6274 Finnish, @c 5.0 million speakers
6275 Estonian @c 1.0 million speakers
6276 @item Semitic family
6277 Hebrew @c 5.3 million speakers
6279 Esperanto @c 2 million speakers
6283 Other languages using the same header entry are:
6286 @item Finno-Ugric family
6287 Hungarian @c 12.5 million speakers
6288 @item Turkic/Altaic family
6289 Turkish @c 50.8 million speakers
6292 Hungarian does not appear to have a plural if you look at sentences involving
6293 cardinal numbers. For example, ``1 apple'' is ``1 alma'', and ``123 apples'' is
6294 ``123 alma''. But when the number is not explicit, the distinction between
6295 singular and plural exists: ``the apple'' is ``az alma'', and ``the apples'' is
6296 ``az alm@'{a}k''. Since @code{ngettext} has to support both types of sentences,
6297 it is classified here, under ``two forms''.
6299 The same holds for Turkish: ``1 apple'' is ``1 elma'', and ``123 apples'' is
6300 ``123 elma''. But when the number is omitted, the distinction between singular
6301 and plural exists: ``the apple'' is ``elma'', and ``the apples'' is
6304 @item Two forms, singular used for zero and one
An exceptional case within the Romanic language family. The header entry would be:
6308 Plural-Forms: nplurals=2; plural=n>1;
6312 Languages with this property include:
6315 @item Romanic family
6316 Brazilian Portuguese, @c 163 million speakers
6317 French @c 67.8 million speakers
6320 @item Three forms, special case for zero
6321 The header entry would be:
6324 Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;
6328 Languages with this property include:
6332 Latvian @c 1.5 million speakers
6335 @item Three forms, special cases for one and two
6336 The header entry would be:
6339 Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
6343 Languages with this property include:
6347 Gaeilge (Irish) @c 0.4 million speakers
6350 @item Three forms, special case for numbers ending in 00 or [2-9][0-9]
6351 The header entry would be:
6354 Plural-Forms: nplurals=3; \
6355 plural=n==1 ? 0 : (n==0 || (n%100 > 0 && n%100 < 20)) ? 1 : 2;
6359 Languages with this property include:
6362 @item Romanic family
6363 Romanian @c 23.4 million speakers
6366 @item Three forms, special case for numbers ending in 1[2-9]
6367 The header entry would look like this:
6370 Plural-Forms: nplurals=3; \
6371 plural=n%10==1 && n%100!=11 ? 0 : \
6372 n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;
6376 Languages with this property include:
6380 Lithuanian @c 3.2 million speakers
6383 @item Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]
6384 The header entry would look like this:
6387 Plural-Forms: nplurals=3; \
6388 plural=n%10==1 && n%100!=11 ? 0 : \
6389 n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
6393 Languages with this property include:
6397 Russian, @c 143.6 million speakers
6398 Ukrainian, @c 37.0 million speakers
6399 Belarusian, @c 8.6 million speakers
6400 Serbian, @c 7.0 million speakers
6401 Croatian @c 5.5 million speakers
6404 @item Three forms, special cases for 1 and 2, 3, 4
6405 The header entry would look like this:
6408 Plural-Forms: nplurals=3; \
6409 plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2;
6413 Languages with this property include:
6417 Czech, @c 9.5 million speakers
6418 Slovak @c 5.0 million speakers
6421 @item Three forms, special case for one and some numbers ending in 2, 3, or 4
6422 The header entry would look like this:
6425 Plural-Forms: nplurals=3; \
6427 n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
6431 Languages with this property include:
6435 Polish @c 40.0 million speakers
6438 @item Four forms, special case for one and all numbers ending in 02, 03, or 04
6439 The header entry would look like this:
6442 Plural-Forms: nplurals=4; \
6443 plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
6447 Languages with this property include:
6451 Slovenian @c 1.9 million speakers
6455 You might now ask, @code{ngettext} handles only numbers @var{n} of type
6456 @samp{unsigned long}. What about larger integer types? What about negative
6457 numbers? What about floating-point numbers?
6459 About larger integer types, such as @samp{uintmax_t} or
6460 @samp{unsigned long long}: they can be handled by reducing the value to a
6461 range that fits in an @samp{unsigned long}. Simply casting the value to
6462 @samp{unsigned long} would not do the right thing, since it would treat
6463 @code{ULONG_MAX + 1} like zero, @code{ULONG_MAX + 2} like singular, and
6464 the like. Here you can exploit the fact that all mentioned plural form
6465 formulas eventually become periodic, with a period that is a divisor of 100
6466 (or 1000 or 1000000). So, when you reduce a large value to another one in
6467 the range [1000000, 1999999] that ends in the same 6 decimal digits, you
6468 can assume that it will lead to the same plural form selection. This code
6472 #include <inttypes.h>
6473 uintmax_t nbytes = ...;
6474 printf (ngettext ("The file has %"PRIuMAX" byte.",
6475 "The file has %"PRIuMAX" bytes.",
6477 ? (nbytes % 1000000) + 1000000
6482 Negative and floating-point values usually represent physical entities for
6483 which singular and plural don't clearly apply. In such cases, there is no
6484 need to use @code{ngettext}; a simple @code{gettext} call with a form suitable
6485 for all values will do. For example:
6488 printf (gettext ("Time elapsed: %.3f seconds"),
6489 num_milliseconds * 0.001);
6493 Even if @var{num_milliseconds} happens to be a multiple of 1000, the output
6495 Time elapsed: 1.000 seconds
6498 is acceptable in English, and similarly for other languages.
6500 The translators' perspective regarding plural forms is explained in
6501 @ref{Translating plural forms}.
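The reduction of large counts described earlier in this subsection can be
sketched as a small helper. The names @code{keep_last_six_digits},
@code{plural_arg} and @code{plural_ru} are hypothetical; @code{plural_ru} is
simply the Russian formula from the table above written out in C, included
here only to check that the reduction leaves the selected form unchanged.

```c
#include <limits.h>
#include <stdint.h>

/* Map any value to one in [1000000, 1999999] that ends in the same
   six decimal digits.  */
uintmax_t
keep_last_six_digits (uintmax_t n)
{
  return n % 1000000 + 1000000;
}

/* Value to pass to ngettext() for a count of type uintmax_t.  */
unsigned long int
plural_arg (uintmax_t n)
{
  return n > ULONG_MAX ? (unsigned long int) keep_last_six_digits (n)
                       : (unsigned long int) n;
}

/* The Russian formula (nplurals=3) from the table, written out as C.  */
unsigned int
plural_ru (uintmax_t n)
{
  return n % 10 == 1 && n % 100 != 11 ? 0
         : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1
         : 2;
}
```

The target range starts at 1000000 rather than 0 so that a large value
ending in @samp{000001} is not mistaken for the literal count 1, which
several formulas treat specially.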
6503 @node Optimized gettext, , Plural forms, gettext
6504 @subsection Optimization of the *gettext functions
6505 @cindex optimization of @code{gettext} functions
At this point of the discussion we should talk about an advantage of the
GNU @code{gettext} implementation. Some readers might point out
that an internationalized program may perform poorly if some
string has to be translated in an inner loop. While this is unavoidable
when the string varies from one run of the loop to the other, it is
simply a waste of time when the string is always the same. Take the
6520 puts (gettext ("Hello world"));
6527 When the locale selection does not change between two runs the resulting
6528 string is always the same. One way to use this is:
6533 str = gettext ("Hello world");
But this solution is not usable in all situations (e.g.@: when the locale
selection changes) nor does it lead to legible code.
6546 For this reason, GNU @code{gettext} caches previous translation results.
6547 When the same translation is requested twice, with no new message
6548 catalogs being loaded in between, @code{gettext} will, the second time,
6549 find the result through a single cache lookup.
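The general idea of such a cache can be pictured with a single remembered
entry. The following is only a sketch of the technique under simplifying
assumptions, not the code used inside GNU @code{gettext}; the names
@code{catalog_generation}, @code{slow_lookups}, @code{slow_lookup} and
@code{cached_gettext} are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical counter, incremented whenever a new catalog is loaded.  */
unsigned int catalog_generation = 0;

/* Counts how often the slow path runs (for demonstration only).  */
unsigned int slow_lookups = 0;

/* Hypothetical stand-in for the full catalog search.  */
static const char *
slow_lookup (const char *msgid)
{
  slow_lookups++;
  return msgid;                 /* No catalog here: return the key.  */
}

/* Remember the last result and reuse it as long as the same msgid
   pointer is requested and no catalog was loaded in between.  */
const char *
cached_gettext (const char *msgid)
{
  static const char *last_msgid = NULL;
  static const char *last_result = NULL;
  static unsigned int last_generation = 0;

  if (msgid != last_msgid || last_generation != catalog_generation)
    {
      last_result = slow_lookup (msgid);
      last_msgid = msgid;
      last_generation = catalog_generation;
    }
  return last_result;
}
```

Calling @code{cached_gettext} twice with the same string takes the slow
path once; incrementing @code{catalog_generation} invalidates the entry.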
6551 @node Comparison, Using libintl.a, gettext, Programmers
6552 @section Comparing the Two Interfaces
6553 @cindex @code{gettext} vs @code{catgets}
6554 @cindex comparison of interfaces
6556 @c FIXME: arguments to catgets vs. gettext
6557 @c Partly done 950718 -- drepper
6559 The following discussion is perhaps a little bit colored. As said
6560 above we implemented GNU @code{gettext} following the Uniforum
6561 proposal and this surely has its reasons. But it should show how we
6562 came to this decision.
6564 First we take a look at the developing process. When we write an
6565 application using NLS provided by @code{gettext} we proceed as always.
6566 Only when we come to a string which might be seen by the users and thus
6567 has to be translated we use @code{gettext("@dots{}")} instead of
6568 @code{"@dots{}"}. At the beginning of each source file (or in a central
6569 header file) we define
6572 #define gettext(String) (String)
6575 Even this definition can be avoided when the system supports the
6576 @code{gettext} function in its C library. When we compile this code the
6577 result is the same as if no NLS code is used. When you take a look at
6578 the GNU @code{gettext} code you will see that we use @code{_("@dots{}")}
6579 instead of @code{gettext("@dots{}")}. This reduces the number of
6580 additional characters per translatable string to @emph{3} (in words:
6583 When now a production version of the program is needed we simply replace
6587 #define _(String) (String)
6593 @cindex include file @file{libintl.h}
6595 #include <libintl.h>
6596 #define _(String) gettext (String)
Additionally we run the program @file{xgettext} on all source code files
which contain translatable strings, and that's it: we have a running
program which does not depend on translations to be available, but which
can use any that becomes available.
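Both variants of the @code{_} macro can live in one header if the choice
is made by a preprocessor symbol. The sketch below assumes the symbol is
called @code{ENABLE_NLS}, the name conventionally defined by the Autoconf
macros; treat it as an assumption if your build system uses another name.
The function @code{greeting} is hypothetical and only shows a use of the
macro.

```c
/* ENABLE_NLS is assumed to be defined by the build system when
   native language support is wanted.  */
#ifdef ENABLE_NLS
# include <libintl.h>
# define _(String) gettext (String)
#else
# define _(String) (String)
#endif

/* Hypothetical function using a translatable string.  */
const char *
greeting (void)
{
  return _("Hello world");
}
```

Without @code{ENABLE_NLS} the macro expands to the string itself, so the
compiled code is the same as if no NLS code were used.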
6605 @cindex @code{N_}, a convenience macro
6606 The same procedure can be done for the @code{gettext_noop} invocations
6607 (@pxref{Special cases}). One usually defines @code{gettext_noop} as a
6608 no-op macro. So you should consider the following code for your project:
6611 #define gettext_noop(String) String
6612 #define N_(String) gettext_noop (String)
6615 @code{N_} is a short form similar to @code{_}. The @file{Makefile} in
6616 the @file{po/} directory of GNU @code{gettext} knows by default both of the
6617 mentioned short forms so you are invited to follow this proposal for
Now to @code{catgets}. The main problem is the work for the
programmer. Every time he comes to a translatable string he has to
define a number (or a symbolic constant) which also has to be defined in
the message catalog file. He also has to take care of duplicate
entries, duplicate message IDs, etc. If he wants to have the same
quality in the message catalog as the GNU @code{gettext} program
provides, he also has to put the descriptive comments for the strings and
the location in all source code files in the message catalog. This is
nearly a Mission: Impossible.
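For comparison, a @code{catgets} lookup might look as follows. This is a
sketch assuming a system that provides the X/Open interface in
@code{<nl_types.h>}; the constants @code{SET_MAIN}, @code{MSG_HELLO} and
@code{MSG_BYE} and the wrapper @code{cat_message} are hypothetical.

```c
#include <nl_types.h>

/* Hypothetical numbering.  It has to be kept in sync by hand with
   the message catalog source file.  */
enum { SET_MAIN = 1 };
enum { MSG_HELLO = 1, MSG_BYE = 2 };

/* Look up message msg_id in set SET_MAIN, or return fallback.  */
const char *
cat_message (nl_catd catd, int msg_id, const char *fallback)
{
  /* catopen() returns (nl_catd) -1 on failure; use the fallback then.  */
  if (catd == (nl_catd) -1)
    return fallback;
  return catgets (catd, SET_MAIN, msg_id, fallback);
}
```

A caller would obtain @code{catd} from @code{catopen} and write
@code{cat_message (catd, MSG_HELLO, "Hello")}: the number and the default
string both appear at every call site, whereas @code{gettext} needs the
string only.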
6630 But there are also some points people might call advantages speaking for
6631 @code{catgets}. If you have a single word in a string and this string
6632 is used in different contexts it is likely that in one or the other
6633 language the word has different translations. Example:
printf ("%s: %d", gettext ("number"), number_of_errors);
printf ("you should see %d %s", number_count,
        number_count == 1 ? gettext ("number") : gettext ("numbers"));
Here we have to translate the string @code{"number"} twice. Even
if you do not speak a language besides English, it might be possible to
recognize that the two words have a different meaning. In German the
first appearance has to be translated to @code{"Anzahl"} and the second
Now you can say that this example is really esoteric. And you are
right! This is exactly how we felt about this problem, and we decided that
it does not weigh that much. The solution for the above problem could
printf ("%s %d", gettext ("number:"), number_of_errors);
6656 printf (number_count == 1 ? gettext ("you should see %d number")
6657 : gettext ("you should see %d numbers"),
We believe that we can solve all conflicts with this method. If it is
difficult one can also consider changing one of the conflicting strings a
little bit. But it is not impossible to overcome.
@code{catgets} allows the same original entry to have different translations,
but @code{gettext} has another, scalable approach for solving ambiguities
of this kind; see @ref{Ambiguities}.
6669 @node Using libintl.a, gettext grok, Comparison, Programmers
6670 @section Using libintl.a in own programs
Starting with version 0.9.4 the library @file{libintl.a} should be
self-contained. I.e., you can use it in your own programs without
providing additional functions. The @file{Makefile} will put the header
and the library in directories selected using @code{$(prefix)}.
6677 @node gettext grok, Temp Programmers, Using libintl.a, Programmers
6678 @section Being a @code{gettext} grok
6680 @strong{ NOTE: } This documentation section is outdated and needs to be
6683 To fully exploit the functionality of the GNU @code{gettext} library it
6684 is surely helpful to read the source code. But for those who don't want
6685 to spend that much time in reading the (sometimes complicated) code here
6689 @item Changing the language at runtime
6690 @cindex language selection at runtime
For interactive programs it might be useful to offer a selection of the
language used at runtime. To understand how to do this one needs to know
how the language used is determined while executing the @code{gettext}
function. The method which is presented here only works correctly
with the GNU implementation of the @code{gettext} functions.
6698 In the function @code{dcgettext} at every call the current setting of
6699 the highest priority environment variable is determined and used.
6700 Highest priority means here the following list with decreasing
6704 @vindex LANGUAGE@r{, environment variable}
6705 @item @code{LANGUAGE}
6706 @vindex LC_ALL@r{, environment variable}
6708 @vindex LC_CTYPE@r{, environment variable}
6709 @vindex LC_NUMERIC@r{, environment variable}
6710 @vindex LC_TIME@r{, environment variable}
6711 @vindex LC_COLLATE@r{, environment variable}
6712 @vindex LC_MONETARY@r{, environment variable}
6713 @vindex LC_MESSAGES@r{, environment variable}
6714 @item @code{LC_xxx}, according to selected locale category
6715 @vindex LANG@r{, environment variable}
6719 Afterwards the path is constructed using the found value and the
6720 translation file is loaded if available.
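The priority order can be sketched as a loop over the variable names.
This is a simplified illustration, not the lookup code of GNU
@code{gettext}: the function name @code{locale_from_env} is hypothetical,
and the real lookup has additional rules (for instance, @code{LANGUAGE}
may hold a colon-separated list of languages).

```c
#include <stddef.h>
#include <stdlib.h>

/* Return the value of the first variable in the priority list that is
   set and non-empty, or NULL if there is none.  category_var names
   the variable of the selected locale category, e.g. "LC_MESSAGES".  */
const char *
locale_from_env (const char *category_var)
{
  const char *names[] = { "LANGUAGE", "LC_ALL", category_var, "LANG" };
  size_t i;

  for (i = 0; i < sizeof names / sizeof names[0]; i++)
    {
      const char *value = getenv (names[i]);

      if (value != NULL && value[0] != '\0')
        return value;
    }
  return NULL;
}
```

Because the environment is consulted on every call, a later change to one
of these variables is picked up by the next lookup.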
6722 What happens now when the value for, say, @code{LANGUAGE} changes? According
6723 to the process explained above the new value of this variable is found
6724 as soon as the @code{dcgettext} function is called. But this also means
6725 the (perhaps) different message catalog file is loaded. In other
6726 words: the used language is changed.
But there is one little catch. The code for gcc-2.7.0 and up provides
some optimization. This optimization normally prevents the calling of
the @code{dcgettext} function as long as no new catalog is loaded. But
if @code{dcgettext} is not called, the program also cannot detect that the
@code{LANGUAGE} variable has changed (@pxref{Optimized gettext}). The
solution for this is very easy. Include the following code in the
language switching function.
6737 /* Change language. */
6738 setenv ("LANGUAGE", "fr", 1);
6740 /* Make change known. */
6742 extern int _nl_msg_cat_cntr;
6747 @cindex @code{_nl_msg_cat_cntr}
The variable @code{_nl_msg_cat_cntr} is defined in @file{loadmsgcat.c}.
You don't need to know what this is for. But it can be used to detect
whether a @code{gettext} implementation is GNU gettext and not a non-GNU
system's native gettext implementation.
6755 @node Temp Programmers, , gettext grok, Programmers
6756 @section Temporary Notes for the Programmers Chapter
6758 @strong{ NOTE: } This documentation section is outdated and needs to be
6762 * Temp Implementations:: Temporary - Two Possible Implementations
6763 * Temp catgets:: Temporary - About @code{catgets}
6764 * Temp WSI:: Temporary - Why a single implementation
6765 * Temp Notes:: Temporary - Notes
6768 @node Temp Implementations, Temp catgets, Temp Programmers, Temp Programmers
6769 @subsection Temporary - Two Possible Implementations
6771 There are two competing methods for language independent messages:
6772 the X/Open @code{catgets} method, and the Uniforum @code{gettext}
6773 method. The @code{catgets} method indexes messages by integers; the
6774 @code{gettext} method indexes them by their English translations.
6775 The @code{catgets} method has been around longer and is supported
6776 by more vendors. The @code{gettext} method is supported by Sun,
6777 and it has been heard that the COSE multi-vendor initiative is
6778 supporting it. Neither method is a POSIX standard; the POSIX.1
6779 committee had a lot of disagreement in this area.
6781 Neither one is in the POSIX standard. There was much disagreement
6782 in the POSIX.1 committee about using the @code{gettext} routines
6783 vs. @code{catgets} (XPG). In the end the committee couldn't
6784 agree on anything, so no messaging system was included as part
6785 of the standard. I believe the informative annex of the standard
6786 includes the XPG3 messaging interfaces, ``@dots{}as an example of
6787 a messaging system that has been implemented@dots{}''
6789 They were very careful not to say anywhere that you should use one
6790 set of interfaces over the other. For more on this topic please
6791 see the Programming for Internationalization FAQ.
6793 @node Temp catgets, Temp WSI, Temp Implementations, Temp Programmers
6794 @subsection Temporary - About @code{catgets}
6796 There have been a few discussions of late on the use of
6797 @code{catgets} as a base. I think it important to present both
6798 sides of the argument and hence am opting to play devil's advocate
6801 I'll not deny the fact that @code{catgets} could have been designed
6802 a lot better. It currently has quite a number of limitations and
6803 these have already been pointed out.
6805 However there is a great deal to be said for consistency and
6806 standardization. A common recurring problem when writing Unix
6807 software is the myriad portability problems across Unix platforms.
6808 It seems as if every Unix vendor had a look at the operating system
6809 and found parts they could improve upon. Undoubtedly, these
6810 modifications are probably innovative and solve real problems.
6811 However, software developers have a hard time keeping up with all
6812 these changes across so many platforms.
6814 And this has prompted the Unix vendors to begin to standardize their
6815 systems. Hence the impetus for Spec1170. Every major Unix vendor
6816 has committed to supporting this standard and every Unix software
developer awaits with glee the day they can write software to this
6818 standard and simply recompile (without having to use autoconf)
6819 across different platforms.
6821 As I understand it, Spec1170 is roughly based upon version 4 of the
6822 X/Open Portability Guidelines (XPG4). Because @code{catgets} and
6823 friends are defined in XPG4, I'm led to believe that @code{catgets}
6824 is a part of Spec1170 and hence will become a standardized component
6825 of all Unix systems.
6827 @node Temp WSI, Temp Notes, Temp catgets, Temp Programmers
6828 @subsection Temporary - Why a single implementation
6830 Now it seems kind of wasteful to me to have two different systems
6831 installed for accessing message catalogs. If we do want to remedy
6832 @code{catgets} deficiencies why don't we try to expand @code{catgets}
6833 (in a compatible manner) rather than implement an entirely new system.
6834 Otherwise, we'll end up with two message catalog access systems installed
6835 with an operating system - one set of routines for packages using GNU
6836 @code{gettext} for their internationalization, and another set of routines
(@code{catgets}) for all other software. Bloated?
6839 Supposing another catalog access system is implemented. Which do
6840 we recommend? At least for Linux, we need to attract as many
6841 software developers as possible. Hence we need to make it as easy
6842 for them to port their software as possible. Which means supporting
6843 @code{catgets}. We will be implementing the @code{libintl} code
6844 within our @code{libc}, but does this mean we also have to incorporate
6845 another message catalog access scheme within our @code{libc} as well?
6846 And what about people who are going to be using the @code{libintl}
6847 + non-@code{catgets} routines. When they port their software to
6848 other platforms, they're now going to have to include the front-end
6849 (@code{libintl}) code plus the back-end code (the non-@code{catgets}
6850 access routines) with their software instead of just including the
6851 @code{libintl} code with their software.
6853 Message catalog support is however only the tip of the iceberg.
6854 What about the data for the other locale categories? They also have
6855 a number of deficiencies. Are we going to abandon them as well and
6856 develop another duplicate set of routines (should @code{libintl}
6857 expand beyond message catalog support)?
6859 Like many parts of Unix that can be improved upon, we're stuck with balancing
6860 compatibility with the past with useful improvements and innovations for
6863 @node Temp Notes, , Temp WSI, Temp Programmers
6864 @subsection Temporary - Notes
6866 X/Open agreed very late on the standard form so that many
implementations differ from the final form. Both of my systems (old
Linux catgets and Ultrix-4) have a strange variation.
6870 OK. After incorporating the last changes I have to spend some time on
6871 making the GNU/Linux @code{libc} @code{gettext} functions. So in future
6872 Solaris is not the only system having @code{gettext}.
6874 @node Translators, Maintainers, Programmers, Top
6875 @chapter The Translator's View
6877 @c FIXME: Reorganize whole chapter.
6880 * Trans Intro 0:: Introduction 0
6881 * Trans Intro 1:: Introduction 1
6882 * Discussions:: Discussions
6883 * Organization:: Organization
6884 * Information Flow:: Information Flow
6885 * Translating plural forms:: How to fill in @code{msgstr[0]}, @code{msgstr[1]}
6886 * Prioritizing messages:: How to find which messages to translate first
6889 @node Trans Intro 0, Trans Intro 1, Translators, Translators
6890 @section Introduction 0
6892 @strong{ NOTE: } This documentation section is outdated and needs to be revised.
6895 Free software is going international! The Translation Project is a way
6896 to get maintainers, translators and users all together, so free software
6897 will gradually become able to speak many native languages.
6899 The GNU @code{gettext} tool set contains @emph{everything} maintainers
6900 need for internationalizing their packages for messages. It also
6901 contains quite useful tools for helping translators localize
6902 messages to their native language, once a package has already been internationalized.
6905 To achieve the Translation Project, we need many interested
6906 people who like their own language and write it well, and who are also
6907 able to synergize with other translators speaking the same language.
6908 If you'd like to volunteer to @emph{work} at translating messages,
6909 please send mail to your translating team.
6911 Each team has its own mailing list, courtesy of Linux
6912 International. You may reach your translating team at the address
6913 @file{@var{ll}@@li.org}, replacing @var{ll} by the two-letter @w{ISO 639}
6914 code for your language. Language codes are @emph{not} the same as
6915 country codes given in @w{ISO 3166}. The following translating teams
6919 Chinese @code{zh}, Czech @code{cs}, Danish @code{da}, Dutch @code{nl},
6920 Esperanto @code{eo}, Finnish @code{fi}, French @code{fr}, Irish
6921 @code{ga}, German @code{de}, Greek @code{el}, Italian @code{it},
6922 Japanese @code{ja}, Indonesian @code{in}, Norwegian @code{no}, Polish
6923 @code{pl}, Portuguese @code{pt}, Russian @code{ru}, Spanish @code{es},
6924 Swedish @code{sv} and Turkish @code{tr}.
6928 For example, you may reach the Chinese translating team by writing to
6929 @file{zh@@li.org}. When you become a member of the translating team
6930 for your own language, you may subscribe to its list. For example,
6931 Swedish people can send a message to @w{@file{sv-request@@li.org}},
6932 having this message body:
6938 Keep in mind that team members should be interested in @emph{working}
6939 at translations, or at solving translational difficulties, rather than
6940 merely lurking around. If your team does not exist yet and you want to
6941 start one, please write to @w{@file{coordinator@@translationproject.org}};
6942 you will then reach the coordinator for all translator teams.
6944 A handful of GNU packages have already been adapted and provided
6945 with message translations for several languages. Translation
6946 teams have begun to organize, using these packages as a starting
6947 point. But there are many more packages and many languages for
6948 which we have no volunteer translators. If you would like to
6949 volunteer to work at translating messages, please send mail to
6950 @file{coordinator@@translationproject.org} indicating what language(s)
6953 @node Trans Intro 1, Discussions, Trans Intro 0, Translators
6954 @section Introduction 1
6956 @strong{ NOTE: } This documentation section is outdated and needs to be revised.
6959 This is now official, GNU is going international! Here is the
6960 announcement submitted for the January 1995 GNU Bulletin:
6963 A handful of GNU packages have already been adapted and provided
6964 with message translations for several languages. Translation
6965 teams have begun to organize, using these packages as a starting
6966 point. But there are many more packages and many languages
6967 for which we have no volunteer translators. If you'd like to
6968 volunteer to work at translating messages, please send mail to
6969 @samp{coordinator@@translationproject.org} indicating what language(s)
6973 This document should answer many questions for those who are curious about
6974 the process or would like to contribute. Please at least skim over it;
6975 we hope this will cut down a little on the high volume of e-mail generated by this
6976 collective effort towards internationalization of free software.
6978 Most free programming which is widely shared is done in English, and
6979 currently, English is used as the main communicating language between
6980 national communities collaborating on free software. This very document
6981 is written in English. This will not change in the foreseeable future.
6983 However, there is a strong appetite from national communities for
6984 having more software able to write using national language and habits,
6985 and there is an on-going effort to modify free software in such a way
6986 that it becomes able to do so. The experiments conducted so far drew
6987 an enthusiastic response from pretesters, so we believe that
6988 internationalization of free software is destined to succeed.
6990 To suggest clarifications, additions or corrections to this
6991 document, please e-mail @file{coordinator@@translationproject.org}.
6993 @node Discussions, Organization, Trans Intro 1, Translators
6994 @section Discussions
6996 @strong{ NOTE: } This documentation section is outdated and needs to be revised.
6999 Facing this internationalization effort, a few users expressed their
7000 concerns. Some of these doubts are presented and discussed here.
7003 @item Smaller groups
7005 Some languages are not spoken by a very large number of people, so people
7006 speaking them sometimes consider that there may not be all that much
7007 demand for such versions of free software packages. Moreover, many people
7008 being @emph{into computers}, in some countries, generally seem to prefer
7009 English versions of their software.
7011 On the other hand, people might enjoy their own language a lot, and be
7012 very motivated to give themselves the pleasure of having their
7013 beloved free software speaking their mother tongue. They do themselves
7014 a personal favor, and do not pay that much attention to the number of
7015 people benefiting from their work.
7017 @item Misinterpretation
7019 Other users are shy to push forward their own language, seeing in this
7020 some kind of misplaced propaganda. Someone thought there must be some
7021 users of the language over the networks pestering other people with it.
7023 But any spoken language is worth localization, because there are
7024 people behind the language for whom the language is important and
7025 dear to their hearts.
7027 @item Odd translations
7029 The biggest problem is to find the right translations so that
7030 everybody can understand the messages. Translations are usually a
7031 little odd. Some people get used to English, to the extent they may
7032 find translations into their own language ``rather pushy, obnoxious
7033 and sometimes even hilarious.'' As a French speaking man, I have
7034 the experience of those instruction manuals for goods, so poorly
7035 translated into French in Korea or Taiwan@dots{}
7037 The fact is that we sometimes have to create a kind of national
7038 computer culture, and this is not easy without the collaboration of
7039 many people liking their mother tongue. This is why translations are
7040 better achieved by people knowing and loving their own language, and
7041 ready to work together at improving the results they obtain.
7043 @item Dependencies over the GPL or LGPL
7045 Some people wonder if using GNU @code{gettext} necessarily brings their
7046 package under the protective wing of the GNU General Public License or
7047 the GNU Lesser General Public License, when they do not want to make
7048 their program free, or want other kinds of freedom. The simplest
7049 answer is ``normally not''.
7051 The @code{gettext-runtime} part of GNU @code{gettext}, i.e.@: the
7052 contents of @code{libintl}, is covered by the GNU Lesser General Public
7053 License. The @code{gettext-tools} part of GNU @code{gettext}, i.e.@: the
7054 rest of the GNU @code{gettext} package, is covered by the GNU General Public License.
7057 The mere marking of localizable strings in a package, or conditional
7058 inclusion of a few lines for initialization, is not really including
7059 GPL'ed or LGPL'ed code. However, since the localization routines in
7060 @code{libintl} are under the LGPL, the LGPL needs to be considered.
7061 It gives the right to distribute the complete unmodified source of
7062 @code{libintl} even with non-free programs. It also gives the right
7063 to use @code{libintl} as a shared library, even for non-free programs.
7064 But it gives the right to use @code{libintl} as a static library or
7065 to incorporate @code{libintl} into another library only to free
7070 @node Organization, Information Flow, Discussions, Translators
7071 @section Organization
7073 @strong{ NOTE: } This documentation section is outdated and needs to be revised.
7076 On a larger scale, the true solution would be to organize some kind of
7077 fairly precise set up in which volunteers could participate. I gave
7078 some thought to this idea lately, and realize there will be some
7079 touchy points. I thought of writing to Richard Stallman to launch
7080 such a project, but feel it might be good to shake out the ideas
7081 between ourselves first. Most probably Linux International has
7082 some experience in the field already, or would like to orchestrate
7083 the volunteer work, maybe. Food for thought, in any case!
7085 I guess we have to set up something early, somehow, that will help
7086 many possible contributors of the same language to interlock and avoid
7087 work duplication, and further be put in contact for solving together
7088 problems particular to their tongue (in most languages, there are many
7089 difficulties peculiar to translating technical English). My Swedish
7090 contributor acknowledged these difficulties, and I'm well aware of
7093 This is surely not a technical issue, but we should manage so the
7094 effort of locale contributors be maximally useful, despite the national
7095 team layer interface between contributors and maintainers.
7097 The Translation Project needs some setup for coordinating language
7098 coordinators. Localizing evolving programs will surely
7099 become a permanent and continuous activity in the free software community,
7101 The setup should be minimally completed and tested before GNU
7102 @code{gettext} becomes an official reality. The e-mail address
7103 @file{coordinator@@translationproject.org} has been set up for receiving
7104 offers from volunteers and general e-mail on these topics. This address
7105 reaches the Translation Project coordinator.
7108 * Central Coordination:: Central Coordination
7109 * National Teams:: National Teams
7110 * Mailing Lists:: Mailing Lists
7113 @node Central Coordination, National Teams, Organization, Organization
7114 @subsection Central Coordination
7116 I also think GNU will need, sooner than it thinks, someone to set up
7117 a way to organize and coordinate these groups. Some kind of group
7118 of groups. My opinion is that it would be good that GNU delegates
7119 this task to a small group of collaborating volunteers, shortly.
7120 Perhaps in @file{gnu.announce} a list of this national committee's
7123 My role as coordinator would simply be to refer to Ulrich any German
7124 speaking volunteer interested in localization of free software packages, and
7125 maybe helping national groups to initially organize, while maintaining
7126 national registries until national groups are ready to take over.
7127 In fact, the coordinator should help volunteers get in contact with
7128 one another for creating national teams, which should then select
7129 one coordinator per language, or country (regionalized language).
7130 If well done, the coordination should be useful without being an
7131 overwhelming task, the time to put delegations in place.
7133 @node National Teams, Mailing Lists, Central Coordination, Organization
7134 @subsection National Teams
7136 I suggest we look for volunteer coordinators/editors for individual
7137 languages. These people will scan contributions of translation files
7138 for various programs, for their own languages, and will ensure high
7139 and uniform standards of diction.
7141 From my current experience with other people in these days, those who
7142 provide localizations are very enthusiastic about the process, and are
7143 more interested in the localization process than in the program they
7144 localize, and want to do many programs, not just one. This seems
7145 to confirm that having a coordinator/editor for each language is a
7148 We need to choose someone who is good at writing clear and concise
7149 prose in the language in question. That is hard---we can't check
7150 it ourselves. So we need to ask a few people to judge each others'
7151 writing and select the one who is best.
7153 I announced my prerelease to a few dozen people, and you would not
7154 believe all the discussions it generated already. I shudder to think
7155 what will happen when this will be launched, for true, officially,
7156 world wide. Who am I to arbitrate between two Czechoslovak users
7157 contradicting each other, for example?
7159 I assume that your German is not much better than my French so that
7160 I would not be able to judge about these formulations. What I would
7161 suggest is that for each language there is a group of people who
7162 maintain the PO files and judge about changes. I suspect there will
7163 be cultural differences between how such groups of people will behave.
7164 Some will have relaxed ways, reach consensus easily, and have anyone
7165 of the group relate to the maintainers, while others will fight to
7166 death, organize heavy administrations up to national standards, and
7167 use strict channels.
7169 The German team is putting out a good example. Right now, they are
7170 maybe half a dozen people revising translations of each other and
7171 discussing the linguistic issues. I do not even have all the names.
7172 Ulrich Drepper is taking care of coordinating the German team.
7173 He subscribed to all my pretest lists, so I do not even have to warn
7174 him specifically of incoming releases.
7176 I'm sure that it is a good idea to get teams for each language working
7177 on translations. That will make the translations better and more
7181 * Sub-Cultures:: Sub-Cultures
7182 * Organizational Ideas:: Organizational Ideas
7185 @node Sub-Cultures, Organizational Ideas, National Teams, National Teams
7186 @subsubsection Sub-Cultures
7188 Taking French for example, there are a few sub-cultures around computers
7189 which developed diverging vocabularies. Picking volunteers here and
7190 there without addressing this problem in an organized way, soon in the
7191 project, might produce a distasteful mix of internationalized programs,
7192 and possibly trigger endless quarrels among those who really care.
7194 Keeping some kind of unity in the way French localization of
7195 internationalized programs is achieved is a difficult (and delicate) job.
7196 Knowing the Latin character of French people (:-), if we take this
7197 the wrong way, we could end up nowhere, or waste a lot of energy.
7198 Maybe we should begin to address this problem seriously @emph{before}
7199 GNU @code{gettext} becomes officially published. And I suspect that this
7202 @node Organizational Ideas, , Sub-Cultures, National Teams
7203 @subsubsection Organizational Ideas
7205 I expect the next big changes after the official release. Please note
7206 that I use the German translation of the short GPL message. We need
7207 to set a few good examples before the localization goes out for true
7208 in the free software community. Here are a few points to discuss:
7212 Each group should have one FTP server (at least one master).
7215 The files on the server should reflect the latest version (of
7216 course!) and it should also contain a RCS directory with the
7217 corresponding archives (I don't have this now).
7220 There should also be a ChangeLog file (this is more useful than the
7221 RCS archive but can be generated automatically from the latter by
7225 A @dfn{core group} should judge about questionable changes (for now
7226 this group consists solely of me but I ask some others occasionally;
7227 this also seems to work).
7231 @node Mailing Lists, , National Teams, Organization
7232 @subsection Mailing Lists
7234 If we get any inquiries about GNU @code{gettext}, send them on to:
7237 @file{coordinator@@translationproject.org}
7240 The @file{*-pretest} lists are quite useful to me, maybe the idea could
7241 be generalized to many GNU, and non-GNU packages. But each maintainer
7244 Fran@,{c}ois, we have a mechanism in place here at
7245 @file{gnu.ai.mit.edu} to track teams, support mailing lists for
7246 them and log members. We have a slight preference that you use it.
7247 If this is OK with you, I can get you clued in.
7249 Things are changing! A few years ago, when Daniel Fekete and I
7250 asked for a mailing list for GNU localization, nested at the FSF, we
7251 were politely invited to organize it anywhere else, and so we did.
7252 For communicating with my pretesters, I later made a handful of
7253 mailing lists located at iro.umontreal.ca and administrated by
7254 @code{majordomo}. These lists have been @emph{very} dependable
7257 I suspect that the German team will organize itself a mailing list
7258 located in Germany, and so forth for other countries. But before they
7259 organize for true, it could surely be useful to offer mailing lists
7260 located at the FSF to each national team. So yes, please explain to me
7261 how I should proceed to create and handle them.
7263 We should create temporary mailing lists, one per country, to help
7264 people organize. Temporary, because once regrouped and structured, it
7265 would be fair the volunteers from country bring back @emph{their} list
7266 in there and manage it as they want. My feeling is that, in the long
7267 run, each team should run its own list, from within their country.
7268 There also should be some central list to which all teams could
7269 subscribe as they see fit, as long as each team is represented in it.
7271 @node Information Flow, Translating plural forms, Organization, Translators
7272 @section Information Flow
7274 @strong{ NOTE: } This documentation section is outdated and needs to be revised.
7277 There will surely be some discussion about these messages after the
7278 packages are finally released. If people now send you some proposals
7279 for better messages, how do you proceed? Jim, please note that
7280 right now, as I put forward nearly a dozen localizable programs, I
7281 receive both the translations and the coordination concerns about them.
7283 If I put one of my things to pretest, Ulrich receives the announcement
7284 and passes it on to the German team, who make last minute revisions.
7285 Then he submits the translation files to me @emph{as the maintainer}.
7286 For free packages I do not maintain, I would not even hear about it.
7287 This scheme could be made to work for the whole Translation Project,
7288 I think. For security reasons, maybe Ulrich (national coordinators,
7289 in fact) should update the central registry kept at the Translation Project
7290 (Jim, me, or Len's recruits) once in a while.
7292 In December/January, I was aggressively ready to internationalize
7293 all of GNU, giving myself the duty of one small GNU package per week
7294 or so, taking many weeks or months for bigger packages. But it does
7295 not work this way. I first did all the things I'm responsible for.
7296 I've nothing against some missionary work on other maintainers, but
7297 I'm also losing a lot of energy over it---same debates over again.
7299 And when the first localized packages are released we'll get a lot of
7300 responses about ugly translations :-). Surely, and we need to have
7301 beforehand a fairly good idea about how to handle the information
7302 flow between the national teams and the package maintainers.
7304 Please start saving somewhere a quick history of each PO file. I know
7305 for sure that the file format will change, allowing for comments.
7306 It would be nice that each file has a kind of log, and references for
7307 those who want to submit comments or gripes, or otherwise contribute.
7308 I sent a proposal for a fast and flexible format, but it has not
7309 yet been accepted by the GNU deciders. I'll tell you when I
7310 have more information about this.
7312 @node Translating plural forms, Prioritizing messages, Information Flow, Translators
7313 @section Translating plural forms
7315 @cindex plural forms, translating
7316 Suppose you are translating a PO file, and it contains an entry like this:
7320 msgid "One file removed"
7321 msgid_plural "%d files removed"
7327 What does this mean? How do you fill it in?
7329 Such an entry denotes a message with plural forms, that is, a message where
7330 the text depends on a cardinal number. The general form of the message,
7331 in English, is the @code{msgid_plural} line. The @code{msgid} line is the
7332 English singular form, that is, the form for when the number is equal to 1.
7333 More details about plural forms are explained in @ref{Plural forms}.
7335 The first thing you need to look at is the @code{Plural-Forms} line in the
7336 header entry of the PO file. It contains the number of plural forms and a
7337 formula. If the PO file does not yet have such a line, you have to add it.
7338 It only depends on the language into which you are translating. You can
7339 get this info by using the @code{msginit} command (see @ref{Creating}) --
7340 it contains a database of known plural formulas -- or by asking other
7341 members of your translation team.
7343 Suppose the line looks as follows:
7346 "Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n"
7347 "%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
7350 It's logically one line; recall that the PO file formatting is allowed to
7351 break long lines so that each physical line fits in 80 monospaced columns.
7353 The value of @code{nplurals} here tells you that there are three plural
7354 forms. The first thing you need to do is to ensure that the entry contains
7355 an @code{msgstr} line for each of the forms:
7359 msgid "One file removed"
7360 msgid_plural "%d files removed"
7366 Then translate the @code{msgid_plural} line and fill it into each @code{msgstr} line:
7371 msgid "One file removed"
7372 msgid_plural "%d files removed"
7373 msgstr[0] "%d slika uklonjenih"
7374 msgstr[1] "%d slika uklonjenih"
7375 msgstr[2] "%d slika uklonjenih"
7378 Now you can refine the translation so that it matches the plural form.
7379 According to the formula above, @code{msgstr[0]} is used when the number
7380 ends in 1 but does not end in 11; @code{msgstr[1]} is used when the number
7381 ends in 2, 3, 4, but not in 12, 13, 14; and @code{msgstr[2]} is used in
7382 all other cases. With this knowledge, you can refine the translations:
7386 msgid "One file removed"
7387 msgid_plural "%d files removed"
7388 msgstr[0] "%d slika je uklonjena"
7389 msgstr[1] "%d datoteke uklonjenih"
7390 msgstr[2] "%d slika uklonjenih"
7393 You noticed that in the English singular form (@code{msgid}) the number
7394 placeholder could be omitted and replaced by the numeral word ``one''.
7395 Can you do this in your translation as well?
7398 msgstr[0] "jednom datotekom je uklonjen"
7402 Well, it depends on whether @code{msgstr[0]} applies only to the number 1,
7403 or to other numbers as well. If, according to the plural formula,
7404 @code{msgstr[0]} applies only to @code{n == 1}, then you can use the
7405 specialized translation without the number placeholder. In our case,
7406 however, @code{msgstr[0]} also applies to the numbers 21, 31, 41, etc.,
7407 and therefore you cannot omit the placeholder.
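
To check which numbers select which @code{msgstr} entry, the plural
formula can be evaluated directly. The following is a minimal sketch,
assuming a POSIX shell whose arithmetic expansion accepts the C
operators used in the formula; the sample numbers and the variable
names @code{n}, @code{idx} and @code{result} are chosen only for
illustration.

@example
# Evaluate the three-form plural formula from the Plural-Forms line
# above for a few sample numbers (the numbers are arbitrary).
result=""
for n in 1 2 5 11 12 21 22 25 101 111; do
  idx=$(( n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2 ))
  echo "$n -> msgstr[$idx]"
  result="$result $idx"
done
@end example

With this formula, 1, 21 and 101 select @code{msgstr[0]}, 2 and 22
select @code{msgstr[1]}, and 5, 11, 12, 25 and 111 select
@code{msgstr[2]}.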
7409 @node Prioritizing messages, , Translating plural forms, Translators
7410 @section Prioritizing messages: How to determine which messages to translate first
7412 A translator sometimes has only a limited amount of time per week to
7413 spend on a package, and some packages have quite large message catalogs
7414 (over 1000 messages). Therefore she wishes to translate the messages
7415 first that are the most visible to the user, or that occur most frequently.
7416 This section describes how to determine these ``most urgent'' messages.
7417 It also applies to determining the ``next most urgent'' messages after the
7418 message catalog has already been partially translated.
7420 In a first step, she uses the programs like a user would do. While she
7421 does this, the GNU @code{gettext} library logs into a file the not yet
7422 translated messages for which a translation was requested from the program.
7424 In a second step, she uses the PO mode to translate precisely this set
7427 @vindex GETTEXT_LOG_UNTRANSLATED@r{, environment variable}
7428 Here are more details. The GNU @code{libintl} library (but not the
7429 corresponding functions in GNU @code{libc}) supports an environment variable
7430 @code{GETTEXT_LOG_UNTRANSLATED}. The GNU @code{libintl} library will
7431 log into the file named by this variable the messages for which @code{gettext()} and related
7432 functions couldn't find the translation. If the file doesn't exist, it
7433 will be created as needed. On systems with GNU @code{libc} a shared library
7434 @samp{preloadable_libintl.so} is provided that can be used with the ELF
7435 @samp{LD_PRELOAD} mechanism.
7437 So, in the first step, the translator uses these commands on systems with
7441 $ LD_PRELOAD=/usr/local/lib/preloadable_libintl.so
7443 $ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused
7444 $ export GETTEXT_LOG_UNTRANSLATED
7448 and these commands on other systems:
7451 $ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused
7452 $ export GETTEXT_LOG_UNTRANSLATED
7455 Then she uses and peruses the programs. (It is a good and recommended
7456 practice to use the programs for which you provide translations: it
7457 gives you the needed context.) When done, she removes the environment
7462 $ unset GETTEXT_LOG_UNTRANSLATED
7465 The second step starts with removing duplicates:
7468 $ msguniq $HOME/gettextlogused > missing.po
7471 The result is a PO file, but it needs some preprocessing before a PO file editor
7472 can be used with it. First, it is a multi-domain PO file, containing
7473 messages from many translation domains. Second, it lacks all translator
7474 comments and source references. Here is how to get a list of the affected
7475 translation domains:
7478 $ sed -n -e 's,^domain "\(.*\)"$,\1,p' < missing.po | sort | uniq
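
To see what this command does, it can be tried on a small hand-written
log. The sketch below is only an illustration: the file name
@file{sample-missing.po} and its three entries are made up, and the
layout of the entries (a @code{domain} line followed by an untranslated
entry) is an assumption about what the log typically looks like.

@example
# Build a tiny stand-in for missing.po (hypothetical entries).
printf '%s\n' \
  'domain "coreutils"' 'msgid "missing operand"' 'msgstr ""' '' \
  'domain "tar"' 'msgid "Not enough memory"' 'msgstr ""' '' \
  'domain "coreutils"' 'msgid "extra operand"' 'msgstr ""' \
  > sample-missing.po
# Same sed expression as above: keep only the quoted domain names.
domains=$(sed -n -e 's,^domain "\(.*\)"$,\1,p' < sample-missing.po | sort | uniq)
echo "$domains"
@end example

This prints @code{coreutils} and @code{tar}, each on its own line:
every domain is listed once, however many entries it has in the log.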
7481 Then the translator can handle the domains one by one. For simplicity,
7482 let's use environment variables to denote the language, domain and source
7486 $ lang=nl # your language
7487 $ domain=coreutils # the name of the domain to be handled
7488 $ package=/usr/src/gnu/coreutils-4.5.4 # the package where it comes from
7491 She takes the latest copy of @file{$lang.po} from the Translation Project,
7492 or from the package (in most cases, @file{$package/po/$lang.po}), or
7493 creates a fresh one if she's the first translator (see @ref{Creating}).
7494 She then uses the following commands to mark the non-urgent messages as
7495 ``obsolete''. (This doesn't mean that these messages - translated and
7496 untranslated ones - will go away. It simply means that the PO file editor
7497 will ignore them in the following editing session.)
7500 $ msggrep --domain=$domain missing.po | grep -v '^domain' \
7501 > $domain-missing.po
7502 $ msgattrib --set-obsolete --ignore-file $domain-missing.po $domain.$lang.po \
7503 > $domain.$lang-urgent.po
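
For orientation, an obsolete entry in a PO file is one whose lines carry
the @code{#~} prefix. The fragment below is a hypothetical illustration;
the message and its Dutch translation are made up and not taken from a
real catalog.

@example
#~ msgid "invalid argument"
#~ msgstr "ongeldig argument"
@end example

Entries written this way stay in the file, so earlier translations are
not lost while the editor skips over them.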
7506 Then she translates @file{$domain.$lang-urgent.po} by use of a PO file editor
7508 (FIXME: I don't know whether @code{KBabel} and @code{gtranslator} also
7509 preserve obsolete messages, as they should.)
7510 Finally she restores the non-urgent messages (with their earlier
7511 translations, for those which were already translated) through this command:
7514 $ msgmerge --no-fuzzy-matching $domain.$lang-urgent.po $package/po/$domain.pot \
7518 Then she can submit @file{$domain.$lang.po} and proceed to the next domain.
7520 @node Maintainers, Installers, Translators, Top
7521 @chapter The Maintainer's View
7522 @cindex package maintainer's view of @code{gettext}
7524 The maintainer of a package has many responsibilities. One of them
7525 is ensuring that the package will install easily on many platforms,
7526 and that the magic we described earlier (@pxref{Users}) will work
7527 for installers and end users.
7529 Of course, there are many possible ways by which GNU @code{gettext}
7530 might be integrated in a distribution, and this chapter does not cover
7531 them in all generality. Instead, it details one possible approach which
7532 is especially adequate for many free software distributions following GNU
7533 standards, or even better, Gnits standards, because GNU @code{gettext}
7534 is designed to help the internationalization of the whole GNU
7535 project, and as many other good free packages as possible. So, the
7536 maintainer's view presented here presumes that the package already has
7537 a @file{configure.ac} file and uses GNU Autoconf.
7539 Nevertheless, GNU @code{gettext} may surely be useful for free packages
7540 not following GNU standards and conventions, but the maintainers of such
7541 packages might have to show imagination and initiative in organizing
7542 their distributions so @code{gettext} works for them in all situations.
7543 There are surely many, out there.
7545 Even if @code{gettext} methods are now stabilizing, slight adjustments
7546 might be needed between successive @code{gettext} versions, so you
7547 should ideally revise this chapter in subsequent releases, looking
7551 * Flat and Non-Flat:: Flat or Non-Flat Directory Structures
7552 * Prerequisites:: Prerequisite Works
7553 * gettextize Invocation:: Invoking the @code{gettextize} Program
7554 * Adjusting Files:: Files You Must Create or Alter
7555 * autoconf macros:: Autoconf macros for use in @file{configure.ac}
7556 * CVS Issues:: Integrating with CVS
7557 * Release Management:: Creating a Distribution Tarball
7560 @node Flat and Non-Flat, Prerequisites, Maintainers, Maintainers
7561 @section Flat or Non-Flat Directory Structures
7563 Some free software packages are distributed as @code{tar} files which unpack
7564 in a single directory; these are said to be @dfn{flat} distributions.
7565 Other free software packages have a one-level hierarchy of subdirectories, using
7566 for example a subdirectory named @file{doc/} for the Texinfo manual and
7567 man pages, another called @file{lib/} for holding functions meant to
7568 replace or complement C libraries, and a subdirectory @file{src/} for
7569 holding the proper sources for the package. These other distributions
7570 are said to be @dfn{non-flat}.
7572 We cannot say much about flat distributions. A flat
7573 directory structure has the disadvantage of increasing the difficulty
7574 of updating to a new version of GNU @code{gettext}. Also, if you have
7575 many PO files, this could somewhat pollute your single directory.
7576 Also, GNU @code{gettext}'s libintl sources consist of C sources, shell
7577 scripts, @code{sed} scripts and complicated Makefile rules, which don't
7578 fit well into an existing flat structure. For these reasons, we
7579 recommend using a non-flat approach in this case as well.
7581 Maybe because GNU @code{gettext} itself has a non-flat structure,
7582 we have more experience with this approach, and this is what will be
7583 described in the remainder of this chapter. Some maintainers might
7584 use this as an opportunity to unflatten their package structure.
7586 @node Prerequisites, gettextize Invocation, Flat and Non-Flat, Maintainers
7587 @section Prerequisite Works
7588 @cindex converting a package to use @code{gettext}
7589 @cindex migration from earlier versions of @code{gettext}
7590 @cindex upgrading to new versions of @code{gettext}
Some work is required before you can use GNU @code{gettext}
in one of your packages. These tasks are general in nature and
do not fit the point-by-point descriptions used in the remainder
of this chapter, so we describe them here.
7599 Before attempting to use @code{gettextize} you should install some
7600 other packages first.
7601 Ensure that recent versions of GNU @code{m4}, GNU Autoconf and GNU
7602 @code{gettext} are already installed at your site, and if not, proceed
7603 to do this first. If you get to install these things, beware that
7604 GNU @code{m4} must be fully installed before GNU Autoconf is even
7607 To further ease the task of a package maintainer the @code{automake}
7608 package was designed and implemented. GNU @code{gettext} now uses this
tool and the @file{Makefile}s in the @file{intl/} and @file{po/} directories
7610 therefore know about all the goals necessary for using @code{automake}
7611 and @file{libintl} in one project.
7613 Those four packages are only needed by you, as a maintainer; the
7614 installers of your own package and end users do not really need any of
7615 GNU @code{m4}, GNU Autoconf, GNU @code{gettext}, or GNU @code{automake}
7616 for successfully installing and running your package, with messages
7617 properly translated. But this is not completely true if you provide
7618 internationalized shell scripts within your own package: GNU
@code{gettext} must then be installed at the user site if the end users
7620 want to see the translation of shell script messages.
7623 Your package should use Autoconf and have a @file{configure.ac} or
7624 @file{configure.in} file.
If it does not, you have to learn how. The Autoconf documentation
is quite well written; it is a good idea to print it and get
7630 Your C sources should have already been modified according to
7631 instructions given earlier in this manual. @xref{Sources}.
7634 Your @file{po/} directory should receive all PO files submitted to you
7635 by the translator teams, each having @file{@var{ll}.po} as a name.
It is usually not easy to get translation
work done before your package gets internationalized and available!
7638 Since the cycle has to start somewhere, the easiest for the maintainer
7639 is to start with absolutely no PO files, and wait until various
7640 translator teams get interested in your package, and submit PO files.
It is worth adding here a few words about how the maintainer should
ideally handle PO file submissions. As a maintainer, your role is
to authenticate the origin of the submission as coming from a representative
of the appropriate translation team of the Translation Project (forward
7648 the submission to @file{coordinator@@translationproject.org} in case of doubt),
7649 to ensure that the PO file format is not severely broken and does not
7650 prevent successful installation, and for the rest, to merely put these
7651 PO files in @file{po/} for distribution.
7653 As a maintainer, you do not have to take on your shoulders the
7654 responsibility of checking if the translations are adequate or
7655 complete, and should avoid diving into linguistic matters. Translation
teams drive themselves and are fully responsible for their linguistic
choices for the Translation Project. Keep in mind that translator teams are @emph{not}
driven by maintainers. You can help by carefully redirecting all
communications and reports from users about linguistic matters to the
appropriate translation team, or by explaining to users how to reach or join
7661 their team. The simplest might be to send them the @file{ABOUT-NLS} file.
7663 Maintainers should @emph{never ever} apply PO file bug reports
7664 themselves, short-cutting translation teams. If some translator has
7665 difficulty to get some of her points through her team, it should not be
7666 an option for her to directly negotiate translations with maintainers.
7667 Teams ought to settle their problems themselves, if any. If you, as
7668 a maintainer, ever think there is a real problem with a team, please
7669 never try to @emph{solve} a team's problem on your own.
7671 @node gettextize Invocation, Adjusting Files, Prerequisites, Maintainers
7672 @section Invoking the @code{gettextize} Program
7674 @include gettextize.texi
7676 @node Adjusting Files, autoconf macros, gettextize Invocation, Maintainers
7677 @section Files You Must Create or Alter
7678 @cindex @code{gettext} files
7680 Besides files which are automatically added through @code{gettextize},
7681 there are many files needing revision for properly interacting with
7682 GNU @code{gettext}. If you are closely following GNU standards for
7683 Makefile engineering and auto-configuration, the adaptations should
7684 be easier to achieve. Here is a point by point description of the
7685 changes needed in each.
So, here comes a list of files, each one followed by a description of
all alterations it needs. Many examples are taken from the GNU
@code{gettext} @value{VERSION} distribution itself, or from the GNU
@code{hello} distribution (@uref{http://www.franken.de/users/gnu/ke/hello}
or @uref{http://www.gnu.franken.de/ke/hello/}). You may indeed
refer to the source code of the GNU @code{gettext} and GNU @code{hello}
packages, as they are intended to be good examples for using GNU
@code{gettext} functionality.
7697 * po/POTFILES.in:: @file{POTFILES.in} in @file{po/}
7698 * po/LINGUAS:: @file{LINGUAS} in @file{po/}
7699 * po/Makevars:: @file{Makevars} in @file{po/}
7700 * po/Rules-*:: Extending @file{Makefile} in @file{po/}
7701 * configure.ac:: @file{configure.ac} at top level
7702 * config.guess:: @file{config.guess}, @file{config.sub} at top level
7703 * mkinstalldirs:: @file{mkinstalldirs} at top level
7704 * aclocal:: @file{aclocal.m4} at top level
7705 * acconfig:: @file{acconfig.h} at top level
7706 * config.h.in:: @file{config.h.in} at top level
7707 * Makefile:: @file{Makefile.in} at top level
7708 * src/Makefile:: @file{Makefile.in} in @file{src/}
7709 * lib/gettext.h:: @file{gettext.h} in @file{lib/}
7712 @node po/POTFILES.in, po/LINGUAS, Adjusting Files, Adjusting Files
7713 @subsection @file{POTFILES.in} in @file{po/}
7714 @cindex @file{POTFILES.in} file
7716 The @file{po/} directory should receive a file named
7717 @file{POTFILES.in}. This file tells which files, among all program
7718 sources, have marked strings needing translation. Here is an example
7723 # List of source files containing translatable strings.
7724 # Copyright (C) 1995 Free Software Foundation, Inc.
7726 # Common library files
7731 # Package source files
Hash-marked comments and blank lines are ignored. All other lines
7740 list those source files containing strings marked for translation
7741 (@pxref{Mark Keywords}), in a notation relative to the top level
7742 of your whole distribution, rather than the location of the
7743 @file{POTFILES.in} file itself.
7745 When a C file is automatically generated by a tool, like @code{flex} or
7746 @code{bison}, that doesn't introduce translatable strings by itself,
7747 it is recommended to list in @file{po/POTFILES.in} the real source file
7748 (ending in @file{.l} in the case of @code{flex}, or in @file{.y} in the
7749 case of @code{bison}), not the generated C file.
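
As an illustration only, a @file{po/POTFILES.in} for a small package might
look like the sketch below. The file names are hypothetical; note that the
@code{bison} grammar is listed by its @file{.y} source, not by the
generated C file.

@example
# List of source files containing translatable strings.

# Common library files
lib/error.c

# Package source files
src/main.c
src/parser.y
@end example
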
7751 @node po/LINGUAS, po/Makevars, po/POTFILES.in, Adjusting Files
7752 @subsection @file{LINGUAS} in @file{po/}
7753 @cindex @file{LINGUAS} file
7755 The @file{po/} directory should also receive a file named
7756 @file{LINGUAS}. This file contains the list of available translations.
It is a whitespace separated list. Hash-marked comments and blank lines
7758 are ignored. Here is an example file:
7762 # Set of available languages.
7768 This example means that German and French PO files are available, so
7769 that these languages are currently supported by your package. If you
7770 want to further restrict, at installation time, the set of installed
7771 languages, this should not be done by modifying the @file{LINGUAS} file,
7772 but rather by using the @code{LINGUAS} environment variable
7773 (@pxref{Installers}).
It is recommended that you add the ``languages'' @samp{en@@quot} and
@samp{en@@boldquot} to the @file{LINGUAS} file. @code{en@@quot} is a
7777 variant of English message catalogs (@code{en}) which uses real quotation
7778 marks instead of the ugly looking asymmetric ASCII substitutes @samp{`}
7779 and @samp{'}. @code{en@@boldquot} is a variant of @code{en@@quot} that
7780 additionally outputs quoted pieces of text in a bold font, when used in
7781 a terminal emulator which supports the VT100 escape sequences (such as
7782 @code{xterm} or the Linux console, but not Emacs in @kbd{M-x shell} mode).
7784 These extra message catalogs @samp{en@@quot} and @samp{en@@boldquot}
7785 are constructed automatically, not by translators; to support them, you
7786 need the files @file{Rules-quot}, @file{quot.sed}, @file{boldquot.sed},
7787 @file{en@@quot.header}, @file{en@@boldquot.header}, @file{insert-header.sin}
7788 in the @file{po/} directory. You can copy them from GNU gettext's @file{po/}
7789 directory; they are also installed by running @code{gettextize}.
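
A @file{po/LINGUAS} file that enables both variants next to two ordinary
translations could look like the following sketch. The choice of
@samp{de} and @samp{fr} is only an assumption for illustration.

@example
# Set of available languages.
de fr
en@@quot en@@boldquot
@end example
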
7791 @node po/Makevars, po/Rules-*, po/LINGUAS, Adjusting Files
7792 @subsection @file{Makevars} in @file{po/}
7793 @cindex @file{Makevars} file
7795 The @file{po/} directory also has a file named @file{Makevars}. It
7796 contains variables that are specific to your project. @file{po/Makevars}
7797 gets inserted into the @file{po/Makefile} when the latter is created.
7798 The variables thus take effect when the POT file is created or updated,
7799 and when the message catalogs get installed.
7801 The first three variables can be left unmodified if your package has a
7802 single message domain and, accordingly, a single @file{po/} directory.
7803 Only packages which have multiple @file{po/} directories at different
locations need to adjust the first three variables defined in
As an alternative to the @code{XGETTEXT_OPTIONS} variable, it is also
7808 possible to specify @code{xgettext} options through the
7809 @code{AM_XGETTEXT_OPTION} autoconf macro. See @ref{AM_XGETTEXT_OPTION}.
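
For orientation, here is a sketch of a @file{po/Makevars} for a package
with a single @file{po/} directory. The variable names follow the
@file{Makevars} template of recent GNU @code{gettext} releases as far as
we know; compare them with the copy that @code{gettextize} installed in
your package. The copyright holder and the bug report address are
hypothetical values.

@example
# The first three variables; usually left as they are.
DOMAIN = $(PACKAGE)
subdir = po
top_builddir = ..

# Options passed to xgettext when the POT file is created.
XGETTEXT_OPTIONS = --keyword=_ --keyword=N_

# Hypothetical values; replace them with your own.
COPYRIGHT_HOLDER = Your Name
MSGID_BUGS_ADDRESS = bug-hello@@example.org
@end example
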
7811 @node po/Rules-*, configure.ac, po/Makevars, Adjusting Files
7812 @subsection Extending @file{Makefile} in @file{po/}
7813 @cindex @file{Makefile.in.in} extensions
7815 All files called @file{Rules-*} in the @file{po/} directory get appended to
7816 the @file{po/Makefile} when it is created. They present an opportunity to
7817 add rules for special PO files to the Makefile, without needing to mess
7818 with @file{po/Makefile.in.in}.
7820 @cindex quotation marks
7821 @vindex LANGUAGE@r{, environment variable}
7822 GNU gettext comes with a @file{Rules-quot} file, containing rules for
7823 building catalogs @file{en@@quot.po} and @file{en@@boldquot.po}. The
7824 effect of @file{en@@quot.po} is that people who set their @code{LANGUAGE}
7825 environment variable to @samp{en@@quot} will get messages with proper
7826 looking symmetric Unicode quotation marks instead of abusing the ASCII
7827 grave accent and the ASCII apostrophe for indicating quotations. To
7828 enable this catalog, simply add @code{en@@quot} to the @file{po/LINGUAS}
7829 file. The effect of @file{en@@boldquot.po} is that people who set
7830 @code{LANGUAGE} to @samp{en@@boldquot} will get not only proper quotation
7831 marks, but also the quoted text will be shown in a bold font on terminals
7832 and consoles. This catalog is useful only for command-line programs, not
7833 GUI programs. To enable it, similarly add @code{en@@boldquot} to the
7834 @file{po/LINGUAS} file.
7836 Similarly, you can create rules for building message catalogs for the
7837 @file{sr@@latin} locale -- Serbian written with the Latin alphabet --
7838 from those for the @file{sr} locale -- Serbian written with Cyrillic
7839 letters. See @ref{msgfilter Invocation}.
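
A minimal sketch of such a rule, placed in a hypothetical file
@file{po/Rules-sr-latin}, is shown below. It assumes the
@code{recode-sr-latin} filter program that comes with GNU @code{gettext},
and it leaves out the integration with the PO file update goals that
@file{Rules-quot} performs. The command line must start with a tab
character.

@example
# Derive the Latin-script Serbian catalog from the Cyrillic one.
sr@@latin.po: sr.po
	msgfilter recode-sr-latin < sr.po > sr@@latin.po
@end example
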
7841 @node configure.ac, config.guess, po/Rules-*, Adjusting Files
7842 @subsection @file{configure.ac} at top level
7844 @file{configure.ac} or @file{configure.in} - this is the source from which
7845 @code{autoconf} generates the @file{configure} script.
7848 @item Declare the package and version.
7849 @cindex package and version declaration in @file{configure.ac}
7851 This is done by a set of lines like these:
7855 VERSION=@value{VERSION}
7856 AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE")
7857 AC_DEFINE_UNQUOTED(VERSION, "$VERSION")
7863 or, if you are using GNU @code{automake}, by a line like this:
7866 AM_INIT_AUTOMAKE(gettext, @value{VERSION})
Of course, you replace @samp{gettext} with the name of your package,
and @samp{@value{VERSION}} with its version number, exactly as they
7872 should appear in the packaged @code{tar} file name of your distribution
7873 (@file{gettext-@value{VERSION}.tar.gz}, here).
7875 @item Check for internationalization support.
7877 Here is the main @code{m4} macro for triggering internationalization
7878 support. Just add this line to @file{configure.ac}:
This call is purposely simple, even if it generates a lot of configure-time
checks and actions.
7888 If you have suppressed the @file{intl/} subdirectory by calling
@code{gettextize} without the @samp{--intl} option, this call should read
7892 AM_GNU_GETTEXT([external])
7895 @item Have output files created.
7897 The @code{AC_OUTPUT} directive, at the end of your @file{configure.ac}
7898 file, needs to be modified in two ways:
7901 AC_OUTPUT([@var{existing configuration files} intl/Makefile po/Makefile.in],
7902 [@var{existing additional actions}])
7905 The modification to the first argument to @code{AC_OUTPUT} asks
7906 for substitution in the @file{intl/} and @file{po/} directories.
7907 Note the @samp{.in} suffix used for @file{po/} only. This is because
7908 the distributed file is really @file{po/Makefile.in.in}.
7910 If you have suppressed the @file{intl/} subdirectory by calling
@code{gettextize} without the @samp{--intl} option, then you don't need to
7912 add @code{intl/Makefile} to the @code{AC_OUTPUT} line.
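
Putting the three steps together, a complete but minimal
@file{configure.ac} for a package without an @file{intl/} subdirectory
might read as follows. This is only a sketch, written in the older
two-argument @code{AM_INIT_AUTOMAKE} style used above; the package name
@samp{hello}, the version @samp{1.0} and the file names are hypothetical.

@example
AC_INIT(src/hello.c)
AM_INIT_AUTOMAKE(hello, 1.0)
AC_PROG_CC
AM_GNU_GETTEXT([external])
AC_OUTPUT([Makefile src/Makefile po/Makefile.in])
@end example
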
7916 If, after doing the recommended modifications, a command like
7917 @samp{aclocal -I m4} or @samp{autoconf} or @samp{autoreconf} fails with
7918 a trace similar to this:
7921 configure.ac:44: warning: AC_COMPILE_IFELSE was called before AC_GNU_SOURCE
7922 ../../lib/autoconf/specific.m4:335: AC_GNU_SOURCE is expanded from...
7923 m4/lock.m4:224: gl_LOCK is expanded from...
7924 m4/gettext.m4:571: gt_INTL_SUBDIR_CORE is expanded from...
7925 m4/gettext.m4:472: AM_INTL_SUBDIR is expanded from...
7926 m4/gettext.m4:347: AM_GNU_GETTEXT is expanded from...
7927 configure.ac:44: the top level
7928 configure.ac:44: warning: AC_RUN_IFELSE was called before AC_GNU_SOURCE
7932 you need to add an explicit invocation of @samp{AC_GNU_SOURCE} in the
7933 @file{configure.ac} file - after @samp{AC_PROG_CC} but before
7934 @samp{AM_GNU_GETTEXT}, most likely very close to the @samp{AC_PROG_CC}
7935 invocation. This is necessary because of ordering restrictions imposed
7938 @node config.guess, mkinstalldirs, configure.ac, Adjusting Files
7939 @subsection @file{config.guess}, @file{config.sub} at top level
7941 If you haven't suppressed the @file{intl/} subdirectory,
7942 you need to add the GNU @file{config.guess} and @file{config.sub} files
7943 to your distribution. They are needed because the @file{intl/} directory
7944 has platform dependent support for determining the locale's character
7945 encoding and therefore needs to identify the platform.
7947 You can obtain the newest version of @file{config.guess} and
7948 @file{config.sub} from the @samp{config} project at
7949 @file{http://savannah.gnu.org/}. The commands to fetch them are
7951 $ wget -O config.guess 'http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD'
7952 $ wget -O config.sub 'http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub;hb=HEAD'
7955 Less recent versions are also contained in the GNU @code{automake} and
7956 GNU @code{libtool} packages.
7958 Normally, @file{config.guess} and @file{config.sub} are put at the
7959 top level of a distribution. But it is also possible to put them in a
subdirectory, together with other configuration support files like
7961 @file{install-sh}, @file{ltconfig}, @file{ltmain.sh} or @file{missing}.
7962 All you need to do, other than moving the files, is to add the following line
7963 to your @file{configure.ac}.
7966 AC_CONFIG_AUX_DIR([@var{subdir}])
7969 @node mkinstalldirs, aclocal, config.guess, Adjusting Files
7970 @subsection @file{mkinstalldirs} at top level
7971 @cindex @file{mkinstalldirs} file
7973 With earlier versions of GNU gettext, you needed to add the GNU
7974 @file{mkinstalldirs} script to your distribution. This is not needed any
more. You can remove it if you are not also using an automake version older than
7978 @node aclocal, acconfig, mkinstalldirs, Adjusting Files
7979 @subsection @file{aclocal.m4} at top level
7980 @cindex @file{aclocal.m4} file
7982 If you do not have an @file{aclocal.m4} file in your distribution,
7983 the simplest is to concatenate the files @file{codeset.m4}, @file{fcntl-o.m4},
7984 @file{gettext.m4}, @file{glibc2.m4}, @file{glibc21.m4}, @file{iconv.m4},
7985 @file{intdiv0.m4}, @file{intl.m4}, @file{intldir.m4}, @file{intlmacosx.m4},
7986 @file{intmax.m4}, @file{inttypes_h.m4}, @file{inttypes-pri.m4},
7987 @file{lcmessage.m4}, @file{lib-ld.m4}, @file{lib-link.m4},
7988 @file{lib-prefix.m4}, @file{lock.m4}, @file{longlong.m4}, @file{nls.m4},
7989 @file{po.m4}, @file{printf-posix.m4}, @file{progtest.m4}, @file{size_max.m4},
7990 @file{stdint_h.m4}, @file{threadlib.m4}, @file{uintmax_t.m4},
7991 @file{visibility.m4}, @file{wchar_t.m4}, @file{wint_t.m4}, @file{xsize.m4}
7992 from GNU @code{gettext}'s
7993 @file{m4/} directory into a single file. If you have suppressed the
7994 @file{intl/} directory, only @file{gettext.m4}, @file{iconv.m4},
7995 @file{lib-ld.m4}, @file{lib-link.m4}, @file{lib-prefix.m4},
7996 @file{nls.m4}, @file{po.m4}, @file{progtest.m4} need to be concatenated.
7998 If you are not using GNU @code{automake} 1.8 or newer, you will need to
7999 add a file @file{mkdirp.m4} from a newer automake distribution to the
8000 list of files above.
8002 If you already have an @file{aclocal.m4} file, then you will have
8003 to merge the said macro files into your @file{aclocal.m4}. Note that if
8004 you are upgrading from a previous release of GNU @code{gettext}, you
8005 should most probably @emph{replace} the macros (@code{AM_GNU_GETTEXT},
8006 etc.), as they usually
8007 change a little from one release of GNU @code{gettext} to the next.
8008 Their contents may vary as we get more experience with strange systems
8011 If you are using GNU @code{automake} 1.5 or newer, it is enough to put
8012 these macro files into a subdirectory named @file{m4/} and add the line
8015 ACLOCAL_AMFLAGS = -I m4
8019 to your top level @file{Makefile.am}.
8021 If you are using GNU @code{automake} 1.10 or newer, it is even easier:
8025 ACLOCAL_AMFLAGS = --install -I m4
8029 to your top level @file{Makefile.am}, and run @samp{aclocal --install -I m4}.
8030 This will copy the needed files to the @file{m4/} subdirectory automatically,
8031 before updating @file{aclocal.m4}.
8033 These macros check for the internationalization support functions
and related information. Hopefully, once stabilized, these macros
8035 might be integrated in the standard Autoconf set, because this
8036 piece of @code{m4} code will be the same for all projects using GNU
8039 @node acconfig, config.h.in, aclocal, Adjusting Files
8040 @subsection @file{acconfig.h} at top level
8041 @cindex @file{acconfig.h} file
Earlier GNU @code{gettext} releases required you to put definitions for
@code{ENABLE_NLS}, @code{HAVE_GETTEXT}, @code{HAVE_LC_MESSAGES},
@code{HAVE_STPCPY}, @code{PACKAGE} and @code{VERSION} into an
8046 @file{acconfig.h} file. This is not needed any more; you can remove
8047 them from your @file{acconfig.h} file unless your package uses them
8048 independently from the @file{intl/} directory.
8050 @node config.h.in, Makefile, acconfig, Adjusting Files
8051 @subsection @file{config.h.in} at top level
8052 @cindex @file{config.h.in} file
8054 The include file template that holds the C macros to be defined by
8055 @code{configure} is usually called @file{config.h.in} and may be
8056 maintained either manually or automatically.
8058 If @code{gettextize} has created an @file{intl/} directory, this file
8059 must be called @file{config.h.in} and must be at the top level. If,
8060 however, you have suppressed the @file{intl/} directory by calling
@code{gettextize} without the @samp{--intl} option, then you can choose the
8062 name of this file and its location freely.
8064 If it is maintained automatically, by use of the @samp{autoheader}
8065 program, you need to do nothing about it. This is the case in particular
8066 if you are using GNU @code{automake}.
8068 If it is maintained manually, and if @code{gettextize} has created an
8069 @file{intl/} directory, you should switch to using @samp{autoheader}.
8070 The list of C macros to be added for the sake of the @file{intl/}
8071 directory is just too long to be maintained manually; it also changes
8072 between different versions of GNU @code{gettext}.
8074 If it is maintained manually, and if on the other hand you have
8075 suppressed the @file{intl/} directory by calling @code{gettextize}
without the @samp{--intl} option, then you can get away with adding the
8077 following lines to @file{config.h.in}:
8080 /* Define to 1 if translation of program messages to the user's
8081 native language is requested. */
8085 @node Makefile, src/Makefile, config.h.in, Adjusting Files
8086 @subsection @file{Makefile.in} at top level
8088 Here are a few modifications you need to make to your main, top-level
8089 @file{Makefile.in} file.
8093 Add the following lines near the beginning of your @file{Makefile.in},
8094 so the @samp{dist:} goal will work properly (as explained further down):
8097 PACKAGE = @@PACKAGE@@
8098 VERSION = @@VERSION@@
8102 Add file @file{ABOUT-NLS} to the @code{DISTFILES} definition, so the file gets
8106 Wherever you process subdirectories in your @file{Makefile.in}, be sure
8107 you also process the subdirectories @samp{intl} and @samp{po}. Special
rules in the @file{Makefile}s take care of the case where no
8109 internationalization is wanted.
8111 If you are using Makefiles, either generated by automake, or hand-written
so they carefully follow the GNU coding standards, the affected goals for
8113 which the new subdirectories must be handled include @samp{installdirs},
8114 @samp{install}, @samp{uninstall}, @samp{clean}, @samp{distclean}.
8116 Here is an example of a canonical order of processing. In this
8117 example, we also define @code{SUBDIRS} in @code{Makefile.in} for it
8118 to be further used in the @samp{dist:} goal.
8121 SUBDIRS = doc intl lib src po
8124 Note that you must arrange for @samp{make} to descend into the
8125 @code{intl} directory before descending into other directories containing
code which makes use of the @code{libintl.h} header file. For this
8127 reason, here we mention @code{intl} before @code{lib} and @code{src}.
8130 A delicate point is the @samp{dist:} goal, as both
8131 @file{intl/Makefile} and @file{po/Makefile} will later assume that the
8132 proper directory has been set up from the main @file{Makefile}. Here is
an example of what the @samp{dist:} goal might look like:
8136 distdir = $(PACKAGE)-$(VERSION)
8140 chmod 777 $(distdir)
8141 for file in $(DISTFILES); do \
8142 ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir); \
8144 for subdir in $(SUBDIRS); do \
8145 mkdir $(distdir)/$$subdir || exit 1; \
8146 chmod 777 $(distdir)/$$subdir; \
8147 (cd $$subdir && $(MAKE) $@@) || exit 1; \
8149 tar chozf $(distdir).tar.gz $(distdir)
8155 Note that if you are using GNU @code{automake}, @file{Makefile.in} is
8156 automatically generated from @file{Makefile.am}, and all needed changes
8157 to @file{Makefile.am} are already made by running @samp{gettextize}.
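
For a package using GNU @code{automake}, the relevant part of the
top-level @file{Makefile.am} then comes down to something like the
following sketch. The directory list is the one from the example above;
whether @file{intl} appears in it depends on how @code{gettextize} was
invoked.

@example
# intl must come before the directories whose code includes libintl.h.
SUBDIRS = doc intl lib src po

# Let aclocal find the gettext macros in m4/.
ACLOCAL_AMFLAGS = -I m4
@end example
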
8159 @node src/Makefile, lib/gettext.h, Makefile, Adjusting Files
8160 @subsection @file{Makefile.in} in @file{src/}
8162 Some of the modifications made in the main @file{Makefile.in} will
8163 also be needed in the @file{Makefile.in} from your package sources,
8164 which we assume here to be in the @file{src/} subdirectory. Here are
8165 all the modifications needed in @file{src/Makefile.in}:
8169 In view of the @samp{dist:} goal, you should have these lines near the
8170 beginning of @file{src/Makefile.in}:
8173 PACKAGE = @@PACKAGE@@
8174 VERSION = @@VERSION@@
8178 If not done already, you should guarantee that @code{top_srcdir}
8179 gets defined. This will serve for @code{cpp} include files. Just add
8183 top_srcdir = @@top_srcdir@@
8187 You might also want to define @code{subdir} as @samp{src}, later
8188 allowing for almost uniform @samp{dist:} goals in all your
@file{Makefile.in}. At least, the @samp{dist:} goal below assumes that
8197 The @code{main} function of your program will normally call
@code{bindtextdomain} (@pxref{Triggering}), like this:
8201 bindtextdomain (@var{PACKAGE}, LOCALEDIR);
8202 textdomain (@var{PACKAGE});
8205 To make LOCALEDIR known to the program, add the following lines to
8206 @file{Makefile.in} if you are using Autoconf version 2.60 or newer:
8209 datadir = @@datadir@@
datarootdir = @@datarootdir@@
8211 localedir = @@localedir@@
8212 DEFS = -DLOCALEDIR=\"$(localedir)\" @@DEFS@@
8215 or these lines if your version of Autoconf is older than 2.60:
8218 datadir = @@datadir@@
8219 localedir = $(datadir)/locale
8220 DEFS = -DLOCALEDIR=\"$(localedir)\" @@DEFS@@
8223 Note that @code{@@datadir@@} defaults to @samp{$(prefix)/share}, thus
8224 @code{$(localedir)} defaults to @samp{$(prefix)/share/locale}.
8227 You should ensure that the final linking will use @code{@@LIBINTL@@} or
8228 @code{@@LTLIBINTL@@} as a library. @code{@@LIBINTL@@} is for use without
8229 @code{libtool}, @code{@@LTLIBINTL@@} is for use with @code{libtool}. An
easy way to achieve this is to arrange for it to get into @code{LIBS}, like
8234 LIBS = @@LIBINTL@@ @@LIBS@@
8237 In most packages internationalized with GNU @code{gettext}, one will
8238 find a directory @file{lib/} in which a library containing some helper
functions will be built. (You need at least the few functions which the
GNU @code{gettext} Library itself needs.) However some of the functions
in the @file{lib/} directory also give messages to the user which of course should be
translated, too. To take care of this, the support library (say
8243 @file{libsupport.a}) should be placed before @code{@@LIBINTL@@} and
8244 @code{@@LIBS@@} in the above example. So one has to write this:
8247 LIBS = ../lib/libsupport.a @@LIBINTL@@ @@LIBS@@
8251 You should also ensure that directory @file{intl/} will be searched for
C preprocessor include files in all circumstances. So, you have to
arrange for both @samp{-I../intl} and @samp{-I$(top_srcdir)/intl} to
be given to the C compiler.
Your @samp{dist:} goal has to conform to the others. Here is a
8258 reasonable definition for it:
8261 distdir = ../$(PACKAGE)-$(VERSION)/$(subdir)
8262 dist: Makefile $(DISTFILES)
8263 for file in $(DISTFILES); do \
8264 ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir) || exit 1; \
8270 Note that if you are using GNU @code{automake}, @file{Makefile.in} is
8271 automatically generated from @file{Makefile.am}, and the first three
8272 changes and the last change are not necessary. The remaining needed
8273 @file{Makefile.am} modifications are the following:
8277 To make LOCALEDIR known to the program, add the following to
8281 <module>_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\"
8285 for each specific module or compilation unit, or
8288 AM_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\"
8291 for all modules and compilation units together. Furthermore, if you are
using an Autoconf version older than 2.60, add this line to define
8296 localedir = $(datadir)/locale
8300 To ensure that the final linking will use @code{@@LIBINTL@@} or
8301 @code{@@LTLIBINTL@@} as a library, add the following to
8305 <program>_LDADD = @@LIBINTL@@
8309 for each specific program, or
8315 for all programs together. Remember that when you use @code{libtool}
to link a program, you need to use @code{@@LTLIBINTL@@} instead of @code{@@LIBINTL@@}
If you have an @file{intl/} directory, whose contents are created by
8321 @code{gettextize}, then to ensure that it will be searched for
8322 C preprocessor include files in all circumstances, add something like
8323 this to @file{Makefile.am}:
8326 AM_CPPFLAGS = -I../intl -I$(top_srcdir)/intl
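
Taken together, a @file{src/Makefile.am} for a package with an
@file{intl/} directory might look like the sketch below. The program name
@samp{hello} is hypothetical. Both sets of preprocessor flags are given in
a single @code{AM_CPPFLAGS} assignment, since the variable should be
defined only once.

@example
bin_PROGRAMS = hello
hello_SOURCES = hello.c

# Search intl/ for headers and tell the program where the catalogs are.
AM_CPPFLAGS = -I../intl -I$(top_srcdir)/intl -DLOCALEDIR=\"$(localedir)\"

# Use @@LTLIBINTL@@ instead when linking with libtool.
hello_LDADD = @@LIBINTL@@
@end example
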
8331 @node lib/gettext.h, , src/Makefile, Adjusting Files
8332 @subsection @file{gettext.h} in @file{lib/}
8333 @cindex @file{gettext.h} file
8334 @cindex turning off NLS support
8335 @cindex disabling NLS
8337 Internationalization of packages, as provided by GNU @code{gettext}, is
8338 optional. It can be turned off in two situations:
8342 When the installer has specified @samp{./configure --disable-nls}. This
8343 can be useful when small binaries are more important than features, for
8344 example when building utilities for boot diskettes. It can also be useful
8345 in order to get some specific C compiler warnings about code quality with
8346 some older versions of GCC (older than 3.0).
8349 When the package does not include the @code{intl/} subdirectory, and the
@file{libintl.h} header (with its associated libintl library, if any) is not
already installed on the system, it is preferable that the package builds
without internationalization support, rather than giving a compilation
A C preprocessor macro can be used to detect these two cases. Usually,
when @code{libintl.h} was found and NLS was not explicitly disabled, the
8358 @code{ENABLE_NLS} macro will be defined to 1 in the autoconf generated
8359 configuration file (usually called @file{config.h}). In the two negative
8360 situations, however, this macro will not be defined, thus it will evaluate
8361 to 0 in C preprocessor expressions.
8363 @cindex include file @file{libintl.h}
8364 @file{gettext.h} is a convenience header file for conditional use of
8365 @file{<libintl.h>}, depending on the @code{ENABLE_NLS} macro. If
8366 @code{ENABLE_NLS} is set, it includes @file{<libintl.h>}; otherwise it
8367 defines no-op substitutes for the libintl.h functions. We recommend
8368 the use of @code{"gettext.h"} over direct use of @file{<libintl.h>},
8369 so that portability to older systems is guaranteed and installers can
8370 turn off internationalization if they want to. In the C code, you will
8374 #include "gettext.h"
8381 #include <libintl.h>
8384 The location of @code{gettext.h} is usually in a directory containing
8385 auxiliary include files. In many GNU packages, there is a directory
8386 @file{lib/} containing helper functions; @file{gettext.h} fits there.
8387 In other packages, it can go into the @file{src} directory.
Do not install the @code{gettext.h} file in public locations. Every
package that needs it should contain its own copy.
8392 @node autoconf macros, CVS Issues, Adjusting Files, Maintainers
8393 @section Autoconf macros for use in @file{configure.ac}
8394 @cindex autoconf macros for @code{gettext}
8396 GNU @code{gettext} installs macros for use in a package's
8397 @file{configure.ac} or @file{configure.in}.
8398 @xref{Top, , Introduction, autoconf, The Autoconf Manual}.
8399 The primary macro is, of course, @code{AM_GNU_GETTEXT}.

@menu
* AM_GNU_GETTEXT:: AM_GNU_GETTEXT in @file{gettext.m4}
* AM_GNU_GETTEXT_VERSION:: AM_GNU_GETTEXT_VERSION in @file{gettext.m4}
* AM_GNU_GETTEXT_NEED:: AM_GNU_GETTEXT_NEED in @file{gettext.m4}
* AM_GNU_GETTEXT_INTL_SUBDIR:: AM_GNU_GETTEXT_INTL_SUBDIR in @file{intldir.m4}
* AM_PO_SUBDIRS:: AM_PO_SUBDIRS in @file{po.m4}
* AM_XGETTEXT_OPTION:: AM_XGETTEXT_OPTION in @file{po.m4}
* AM_ICONV:: AM_ICONV in @file{iconv.m4}
@end menu

8411 @node AM_GNU_GETTEXT, AM_GNU_GETTEXT_VERSION, autoconf macros, autoconf macros
8412 @subsection AM_GNU_GETTEXT in @file{gettext.m4}
8414 @amindex AM_GNU_GETTEXT
8415 The @code{AM_GNU_GETTEXT} macro tests for the presence of the GNU gettext
8416 function family in either the C library or a separate @code{libintl}
8417 library (shared or static libraries are both supported) or in the package's
8418 @file{intl/} directory. It also invokes @code{AM_PO_SUBDIRS}, thus preparing
8419 the @file{po/} directories of the package for building.
@code{AM_GNU_GETTEXT} accepts up to three optional arguments. The general
syntax is

@example
AM_GNU_GETTEXT([@var{intlsymbol}], [@var{needsymbol}], [@var{intldir}])
@end example

8428 @c We don't document @var{intlsymbol} = @samp{use-libtool} here, because
8429 @c it is of no use for packages other than GNU gettext itself. (Such packages
8430 @c are not allowed to install the shared libintl. But if they use libtool,
8431 @c then it is in order to install shared libraries that depend on libintl.)
8432 @var{intlsymbol} can be @samp{external} or @samp{no-libtool}. The default
8433 (if it is not specified or empty) is @samp{no-libtool}. @var{intlsymbol}
8434 should be @samp{external} for packages with no @file{intl/} directory.
8435 For packages with an @file{intl/} directory, you can either use an
8436 @var{intlsymbol} equal to @samp{no-libtool}, or you can use @samp{external}
8437 and override by using the macro @code{AM_GNU_GETTEXT_INTL_SUBDIR} elsewhere.
8438 The two ways to specify the existence of an @file{intl/} directory are
8439 equivalent. At build time, a static library
8440 @code{$(top_builddir)/intl/libintl.a} will then be created.
8442 If @var{needsymbol} is specified and is @samp{need-ngettext}, then GNU
8443 gettext implementations (in libc or libintl) without the @code{ngettext()}
8444 function will be ignored. If @var{needsymbol} is specified and is
8445 @samp{need-formatstring-macros}, then GNU gettext implementations that don't
8446 support the ISO C 99 @file{<inttypes.h>} formatstring macros will be ignored.
8447 Only one @var{needsymbol} can be specified. These requirements can also be
8448 specified by using the macro @code{AM_GNU_GETTEXT_NEED} elsewhere. To specify
8449 more than one requirement, just specify the strongest one among them, or
8450 invoke the @code{AM_GNU_GETTEXT_NEED} macro several times. The hierarchy
8451 among the various alternatives is as follows: @samp{need-formatstring-macros}
8452 implies @samp{need-ngettext}.
8454 @var{intldir} is used to find the intl libraries. If empty, the value
8455 @samp{$(top_builddir)/intl/} is used.
8457 The @code{AM_GNU_GETTEXT} macro determines whether GNU gettext is
8458 available and should be used. If so, it sets the @code{USE_NLS} variable
8459 to @samp{yes}; it defines @code{ENABLE_NLS} to 1 in the autoconf
8460 generated configuration file (usually called @file{config.h}); it sets
8461 the variables @code{LIBINTL} and @code{LTLIBINTL} to the linker options
8462 for use in a Makefile (@code{LIBINTL} for use without libtool,
8463 @code{LTLIBINTL} for use with libtool); it adds an @samp{-I} option to
8464 @code{CPPFLAGS} if necessary. In the negative case, it sets
8465 @code{USE_NLS} to @samp{no}; it sets @code{LIBINTL} and @code{LTLIBINTL}
8466 to empty and doesn't change @code{CPPFLAGS}.
8468 The complexities that @code{AM_GNU_GETTEXT} deals with are the following:

@itemize @bullet
@item
@cindex @code{libintl} library
Some operating systems have @code{gettext} in the C library, for example
glibc. Some have it in a separate library @code{libintl}. GNU @code{libintl}
might have been installed as part of the GNU @code{gettext} package.

@item
GNU @code{libintl}, if installed, is not necessarily already in the search
path (@code{CPPFLAGS} for the include file search path, @code{LDFLAGS} for
the library search path).

@item
Except for glibc, the operating system's native @code{gettext} cannot
exploit the GNU mo files, doesn't have the necessary locale dependency
features, and cannot convert messages from the catalog's text encoding
to the user's locale encoding.

@item
GNU @code{libintl}, if installed, is not necessarily already in the
run time library search path. To avoid the need for setting an environment
variable like @code{LD_LIBRARY_PATH}, the macro adds the appropriate
run time search path options to the @code{LIBINTL} and @code{LTLIBINTL}
variables. This works on most systems, but not on some operating systems
with limited shared library support, like SCO.

@item
GNU @code{libintl} relies on POSIX/XSI @code{iconv}. The macro checks for
linker options needed to use iconv and appends them to the @code{LIBINTL}
and @code{LTLIBINTL} variables.
@end itemize

8502 @node AM_GNU_GETTEXT_VERSION, AM_GNU_GETTEXT_NEED, AM_GNU_GETTEXT, autoconf macros
8503 @subsection AM_GNU_GETTEXT_VERSION in @file{gettext.m4}
8505 @amindex AM_GNU_GETTEXT_VERSION
8506 The @code{AM_GNU_GETTEXT_VERSION} macro declares the version number of
8507 the GNU gettext infrastructure that is used by the package.
8509 The use of this macro is optional; only the @code{autopoint} program makes
8510 use of it (@pxref{CVS Issues}).
8512 @node AM_GNU_GETTEXT_NEED, AM_GNU_GETTEXT_INTL_SUBDIR, AM_GNU_GETTEXT_VERSION, autoconf macros
8513 @subsection AM_GNU_GETTEXT_NEED in @file{gettext.m4}
8515 @amindex AM_GNU_GETTEXT_NEED
8516 The @code{AM_GNU_GETTEXT_NEED} macro declares a constraint regarding the
8517 GNU gettext implementation. The syntax is

@example
AM_GNU_GETTEXT_NEED([@var{needsymbol}])
@end example

8523 If @var{needsymbol} is @samp{need-ngettext}, then GNU gettext implementations
8524 (in libc or libintl) without the @code{ngettext()} function will be ignored.
8525 If @var{needsymbol} is @samp{need-formatstring-macros}, then GNU gettext
8526 implementations that don't support the ISO C 99 @file{<inttypes.h>}
8527 formatstring macros will be ignored.

The optional second argument of @code{AM_GNU_GETTEXT} is also taken into
account.

8532 The @code{AM_GNU_GETTEXT_NEED} invocations can occur before or after
8533 the @code{AM_GNU_GETTEXT} invocation; the order doesn't matter.
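
To illustrate what @samp{need-formatstring-macros} is for, here is a small
sketch of a translatable string that uses an @file{<inttypes.h>} macro.
The @code{format_size} helper is a hypothetical name, and the sketch
assumes that @file{<libintl.h>} is available on the build system.

@verbatim
#include <assert.h>
#include <inttypes.h>
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: formats a file size into BUF and returns the
   number of characters that the full result needs.  The format string
   is built from the PRId64 macro, so its literal text differs between
   platforms; this is the situation need-formatstring-macros covers.  */
static int
format_size (char *buf, size_t bufsize, int64_t size)
{
  assert (buf != NULL && bufsize > 0);
  return snprintf (buf, bufsize, gettext ("%" PRId64 " bytes"), size);
}
@end verbatim

A package containing such a string would declare the requirement with
@code{AM_GNU_GETTEXT_NEED([need-formatstring-macros])} in its
@file{configure.ac}.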
8535 @node AM_GNU_GETTEXT_INTL_SUBDIR, AM_PO_SUBDIRS, AM_GNU_GETTEXT_NEED, autoconf macros
8536 @subsection AM_GNU_GETTEXT_INTL_SUBDIR in @file{intldir.m4}
8538 @amindex AM_GNU_GETTEXT_INTL_SUBDIR
The @code{AM_GNU_GETTEXT_INTL_SUBDIR} macro specifies that the
@code{AM_GNU_GETTEXT} macro, although invoked with the first argument
@samp{external}, should also prepare for building the @file{intl/}
directory.

8544 The @code{AM_GNU_GETTEXT_INTL_SUBDIR} invocation can occur before or after
8545 the @code{AM_GNU_GETTEXT} invocation; the order doesn't matter.
8547 The use of this macro requires GNU automake 1.10 or newer and
8548 GNU autoconf 2.61 or newer.
8550 @node AM_PO_SUBDIRS, AM_XGETTEXT_OPTION, AM_GNU_GETTEXT_INTL_SUBDIR, autoconf macros
8551 @subsection AM_PO_SUBDIRS in @file{po.m4}
8553 @amindex AM_PO_SUBDIRS
The @code{AM_PO_SUBDIRS} macro prepares the @file{po/} directories of the
package for building. This macro should be used in internationalized
programs written in programming languages other than C, C++, and
Objective C, for example @code{sh}, @code{Python}, @code{Lisp}. See
@ref{Programming Languages} for a list of programming languages that
support localization through PO files.

8561 The @code{AM_PO_SUBDIRS} macro determines whether internationalization
8562 should be used. If so, it sets the @code{USE_NLS} variable to @samp{yes},
8563 otherwise to @samp{no}. It also determines the right values for Makefile
8564 variables in each @file{po/} directory.
8566 @node AM_XGETTEXT_OPTION, AM_ICONV, AM_PO_SUBDIRS, autoconf macros
8567 @subsection AM_XGETTEXT_OPTION in @file{po.m4}
8569 @amindex AM_XGETTEXT_OPTION
The @code{AM_XGETTEXT_OPTION} macro registers a command-line option to be
used in the invocations of @code{xgettext} in the @file{po/} directories
of the package.

For example, if you have a source file that defines a function
@samp{error_at_line} whose fifth argument is a format string, you can use

@example
AM_XGETTEXT_OPTION([--flag=error_at_line:5:c-format])
@end example

@noindent
to instruct @code{xgettext} to mark all translatable strings in @samp{gettext}
invocations that occur as fifth argument to this function as @samp{c-format}.

See @ref{xgettext Invocation} for the list of options that @code{xgettext}
accepts.

8586 The use of this macro is an alternative to the use of the
8587 @samp{XGETTEXT_OPTIONS} variable in @file{po/Makevars}.
8589 @node AM_ICONV, , AM_XGETTEXT_OPTION, autoconf macros
8590 @subsection AM_ICONV in @file{iconv.m4}
8593 The @code{AM_ICONV} macro tests for the presence of the POSIX/XSI
8594 @code{iconv} function family in either the C library or a separate
8595 @code{libiconv} library. If found, it sets the @code{am_cv_func_iconv}
8596 variable to @samp{yes}; it defines @code{HAVE_ICONV} to 1 in the autoconf
8597 generated configuration file (usually called @file{config.h}); it defines
8598 @code{ICONV_CONST} to @samp{const} or to empty, depending on whether the
8599 second argument of @code{iconv()} is of type @samp{const char **} or
8600 @samp{char **}; it sets the variables @code{LIBICONV} and
8601 @code{LTLIBICONV} to the linker options for use in a Makefile
8602 (@code{LIBICONV} for use without libtool, @code{LTLIBICONV} for use with
8603 libtool); it adds an @samp{-I} option to @code{CPPFLAGS} if
8604 necessary. If not found, it sets @code{LIBICONV} and @code{LTLIBICONV} to
8605 empty and doesn't change @code{CPPFLAGS}.
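
The following sketch shows one way @code{ICONV_CONST} can be used so that
the same call compiles against either prototype. The
@code{latin1_to_utf8} helper and the choice of encodings are assumptions
made for illustration; the fallback definition of @code{ICONV_CONST}
stands in for the one that @file{config.h} would normally provide.

@verbatim
#include <assert.h>
#include <iconv.h>
#include <string.h>

/* Normally defined in config.h by AM_ICONV.  This fallback exists only
   so that the sketch compiles on its own.  */
#ifndef ICONV_CONST
# define ICONV_CONST
#endif

/* Hypothetical helper: converts the ISO-8859-1 string IN to UTF-8 and
   stores the NUL-terminated result in OUT.  Returns 0 on success and
   -1 on failure.  */
static int
latin1_to_utf8 (const char *in, char *out, size_t outsize)
{
  assert (in != NULL && out != NULL && outsize > 0);

  iconv_t cd = iconv_open ("UTF-8", "ISO-8859-1");
  if (cd == (iconv_t) -1)
    return -1;

  const char *inptr = in;
  size_t inleft = strlen (in);
  char *outptr = out;
  size_t outleft = outsize - 1;   /* keep room for the terminating NUL */

  /* The cast adapts &inptr to whichever prototype iconv() has.  */
  size_t res = iconv (cd, (ICONV_CONST char **) &inptr, &inleft,
                      &outptr, &outleft);
  iconv_close (cd);
  if (res == (size_t) -1)
    return -1;

  *outptr = '\0';
  return 0;
}
@end verbatim

A program built this way would link with @code{$(LIBICONV)} or
@code{$(LTLIBICONV)}, and would typically guard the code with
@code{#if HAVE_ICONV}.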
8607 The complexities that @code{AM_ICONV} deals with are the following:

@itemize @bullet
@item
@cindex @code{libiconv} library
Some operating systems have @code{iconv} in the C library, for example
glibc. Some have it in a separate library @code{libiconv}, for example
OSF/1 or FreeBSD. Regardless of the operating system, GNU @code{libiconv}
might have been installed. In that case, it should be used instead of the
operating system's native @code{iconv}.

@item
GNU @code{libiconv}, if installed, is not necessarily already in the search
path (@code{CPPFLAGS} for the include file search path, @code{LDFLAGS} for
the library search path).

@item
GNU @code{libiconv} is binary incompatible with some operating system's
native @code{iconv}, for example on FreeBSD. Use of an @file{iconv.h}
and @file{libiconv.so} that don't fit together would produce program
crashes.

@item
GNU @code{libiconv}, if installed, is not necessarily already in the
run time library search path. To avoid the need for setting an environment
variable like @code{LD_LIBRARY_PATH}, the macro adds the appropriate
run time search path options to the @code{LIBICONV} variable. This works
on most systems, but not on some operating systems with limited shared
library support, like SCO.
@end itemize

8638 @file{iconv.m4} is distributed with the GNU gettext package because
8639 @file{gettext.m4} relies on it.
8641 @node CVS Issues, Release Management, autoconf macros, Maintainers
8642 @section Integrating with CVS
Many projects use CVS for distributed development, version control and
source backup. This section gives some advice on how to manage the use
of @code{cvs}, @code{gettextize}, @code{autopoint} and @code{autoconf}.

@menu
* Distributed CVS:: Avoiding version mismatch in distributed development
* Files under CVS:: Files to put under CVS version control
* autopoint Invocation:: Invoking the @code{autopoint} Program
@end menu

8654 @node Distributed CVS, Files under CVS, CVS Issues, CVS Issues
8655 @subsection Avoiding version mismatch in distributed development
In a project with multiple developers using CVS, there should be a
single developer who occasionally (when there is a desire to upgrade to
a new @code{gettext} version) runs @code{gettextize}, performs the
changes listed in @ref{Adjusting Files}, and then commits these changes
to the CVS.

It is highly recommended that all developers on a project use the same
version of GNU @code{gettext} in the package. In other words, a
developer who runs @code{gettextize} should go the whole way, make the
necessary remaining changes and commit them to the CVS.
Otherwise the following problems will likely occur:

@itemize @bullet
@item
Apparent version mismatch between developers. Since some @code{gettext}
specific portions in @file{configure.ac}, @file{configure.in} and
@file{Makefile.am}, @file{Makefile.in} files depend on the @code{gettext}
version, the use of infrastructure files belonging to different
@code{gettext} versions can easily lead to build errors.

@item
Hidden version mismatch. Such a version mismatch can also lead to
malfunctioning of the package that may go undiscovered by the developers.
The worst case of hidden version mismatch is that internationalization
of the package doesn't work at all.

@item
Release risks. All developers implicitly perform constant testing on
a package. This is important in the days and weeks before a release.
If the person who makes the release tar files uses a different version
of GNU @code{gettext} than the other developers, the distribution will
be less well tested than if all had been using the same @code{gettext}
version. For example, it is possible that a platform specific bug goes
undiscovered in this situation.
@end itemize

8693 @node Files under CVS, autopoint Invocation, Distributed CVS, CVS Issues
8694 @subsection Files to put under CVS version control
8696 There are basically three ways to deal with generated files in the
8697 context of a CVS repository, such as @file{configure} generated from
8698 @file{configure.ac}, @code{@var{parser}.c} generated from
8699 @code{@var{parser}.y}, or @code{po/Makefile.in.in} autoinstalled by
8700 @code{gettextize} or @code{autopoint}.

@enumerate
@item
All generated files are always committed into the repository.

@item
All generated files are committed into the repository occasionally,
for example each time a release is made.

@item
Generated files are never committed into the repository.
@end enumerate

8714 Each of these three approaches has different advantages and drawbacks.

@enumerate
@item
The advantage is that anyone can check out the CVS at any moment and
get a working build. The drawbacks are: 1a. It requires frequent
@samp{cvs commit} actions by the maintainers. 1b. The repository grows
in size quite fast.

@item
The advantage is that anyone can check out the CVS, and the usual
@samp{./configure; make} will work. The drawbacks are: 2a. Whoever
checks out the repository needs tools like GNU @code{automake},
GNU @code{autoconf}, GNU @code{m4} installed in their @code{PATH};
sometimes even particular versions of them are needed. 2b. When a release
is made and a commit is made on the generated files, the other developers
get conflicts on the generated files after doing @samp{cvs update}.
Although these conflicts are easy to resolve, they are annoying.

@item
The advantage is less work for the maintainers. The drawback is that
anyone who checks out the CVS not only needs tools like GNU @code{automake},
GNU @code{autoconf}, GNU @code{m4} installed in their @code{PATH}, but also
needs to perform a package specific pre-build step before being able
to run @samp{./configure; make}.
@end enumerate

8741 For the first and second approach, all files modified or brought in
8742 by the occasional @code{gettextize} invocation and update should be
8743 committed into the CVS.

For the third approach, the maintainer can omit from the CVS repository
all the files that @code{gettextize} mentions as "copy". Instead, the
maintainer adds to the @file{configure.ac} or @file{configure.in} a line
of the form

@example
AM_GNU_GETTEXT_VERSION(@value{VERSION})
@end example

@noindent
and adds to the package's pre-build script an invocation of
@samp{autopoint}. For everyone who checks out the CVS, this
@code{autopoint} invocation will copy into the right place the
@code{gettext} infrastructure files that have been omitted from the CVS.

8760 The version number used as argument to @code{AM_GNU_GETTEXT_VERSION} is
8761 the version of the @code{gettext} infrastructure that the package wants
8762 to use. It is also the minimum version number of the @samp{autopoint}
8763 program. So, if you write @code{AM_GNU_GETTEXT_VERSION(0.11.5)} then the
8764 developers can have any version >= 0.11.5 installed; the package will work
with the 0.11.5 infrastructure in all developers' builds. When the
maintainer then runs @code{gettextize} from, say, version 0.12.1 on the
package, the occurrence of @code{AM_GNU_GETTEXT_VERSION(0.11.5)} will be
changed into @code{AM_GNU_GETTEXT_VERSION(0.12.1)}, and all other
developers that use the CVS will henceforth need to have GNU
@code{gettext} 0.12.1 or newer installed.

8773 @subsection Invoking the @code{autopoint} Program
8775 @include autopoint.texi
8777 @node Release Management, , CVS Issues, Maintainers
8778 @section Creating a Distribution Tarball
8781 @cindex distribution tarball
8782 In projects that use GNU @code{automake}, the usual commands for creating
8783 a distribution tarball, @samp{make dist} or @samp{make distcheck},
8784 automatically update the PO files as needed.
8786 If GNU @code{automake} is not used, the maintainer needs to perform this
8787 update before making a release:

@example
$ (cd po; make update-po)
@end example

8795 @node Installers, Programming Languages, Maintainers, Top
8796 @chapter The Installer's and Distributor's View
8797 @cindex package installer's view of @code{gettext}
8798 @cindex package distributor's view of @code{gettext}
8799 @cindex package build and installation options
8800 @cindex setting up @code{gettext} at build time
By default, packages that fully use GNU @code{gettext} internally
are installed in such a way that they allow translation of
messages. At @emph{configuration} time, those packages should
8805 automatically detect whether the underlying host system already provides
8806 the GNU @code{gettext} functions. If not,
8807 the GNU @code{gettext} library should be automatically prepared
8808 and used. Installers may use special options at configuration
8809 time for changing this behavior. The command @samp{./configure
8810 --with-included-gettext} bypasses system @code{gettext} to
8811 use the included GNU @code{gettext} instead,
8812 while @samp{./configure --disable-nls}
8813 produces programs totally unable to translate messages.
8815 @vindex LINGUAS@r{, environment variable}
Internationalized packages usually have many @file{@var{ll}.po}
files. Unless
translations are disabled, all those available are installed together
with the package. However, the environment variable @code{LINGUAS}
may be set, prior to configuration, to limit the installed set.
@code{LINGUAS} should then contain a space-separated list of two-letter
codes, stating which languages are allowed.

8824 @node Programming Languages, Conclusion, Installers, Top
8825 @chapter Other Programming Languages
While the presentation of @code{gettext} focuses mostly on C and
implicitly applies to C++ as well, its scope is far broader than that:
many programming languages, scripting languages and other textual data
like GUI resources or package descriptions can make use of the gettext
approach.

@menu
* Language Implementors:: The Language Implementor's View
* Programmers for other Languages:: The Programmer's View
* Translators for other Languages:: The Translator's View
* Maintainers for other Languages:: The Maintainer's View
* List of Programming Languages:: Individual Programming Languages
* List of Data Formats:: Internationalizable Data
@end menu

8842 @node Language Implementors, Programmers for other Languages, Programming Languages, Programming Languages
8843 @section The Language Implementor's View
8844 @cindex programming languages
8845 @cindex scripting languages
All programming and scripting languages that have the notion of strings
can support @code{gettext}. Supporting @code{gettext}
means the following:

@itemize @bullet
@item
You should add to the language a syntax for translatable strings. In
principle, a function call of @code{gettext} would do, but a shorthand
syntax helps keep internationalized programs legible. For
example, in C we use the syntax @code{_("string")}, and in GNU awk we use
the shorthand @code{_"string"}.

@item
You should arrange that evaluation of such a translatable string at
runtime calls the @code{gettext} function, or performs equivalent
processing.

@item
Similarly, you should make the functions @code{ngettext},
@code{dcgettext}, @code{dcngettext} available from within the language.
These functions are less often used, but are nevertheless necessary for
particular purposes: @code{ngettext} for correct plural handling, and
@code{dcgettext} and @code{dcngettext} for obeying other locale-related
environment variables than @code{LC_MESSAGES}, such as @code{LC_TIME} or
@code{LC_MONETARY}. For these latter functions, you need to make the
@code{LC_*} constants, available in the C header @code{<locale.h>},
referenceable from within the language, usually either as enumeration
values or as strings.

@item
You should allow the programmer to designate a message domain, either by
making the @code{textdomain} function available from within the
language, or by introducing a magic variable called @code{TEXTDOMAIN}.
Similarly, you should allow the programmer to designate where to search
for message catalogs, by providing access to the @code{bindtextdomain}
function.

@item
You should either perform a @code{setlocale (LC_ALL, "")} call during
the startup of your language runtime, or allow the programmer to do so.
Remember that gettext will act as a no-op if the @code{LC_MESSAGES} and
@code{LC_CTYPE} locale categories are not both set.

@item
A programmer should have a way to extract translatable strings from a
program into a PO file. The GNU @code{xgettext} program is being
extended to support very different programming languages. Please
contact the GNU @code{gettext} maintainers to help them do this. If
the string extractor is best integrated into your language's parser, GNU
@code{xgettext} can function as a front end to your string extractor.

@item
The language's library should have a string formatting facility where
the arguments of a format string are denoted by a positional number or a
name. This is needed because for some languages and some messages with
more than one substitutable argument, the translation will need to
output the substituted arguments in a different order. @xref{c-format Flag}.

@item
If the language has more than one implementation, and not all of the
implementations use @code{gettext}, but the programs should be portable
across implementations, you should provide a no-i18n emulation that
makes the other implementations accept programs written for yours,
without actually translating the strings.

@item
To help the programmer in the task of marking translatable strings,
which is sometimes performed using the Emacs PO mode (@pxref{Marking}),
please
contact the GNU @code{gettext} maintainers, so they can add support for
your language to @file{po-mode.el}.
@end itemize

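
The runtime-side requirements above can be sketched in C as follows.
The function names @code{runtime_init_i18n} and
@code{runtime_translate_plural} are hypothetical; an actual language
runtime would expose equivalents through its own binding mechanism, and
the sketch assumes that @file{<libintl.h>} is available.

@verbatim
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical startup hook of a language runtime.  DOMAIN and
   LOCALEDIR are supplied by the program being run.  */
static void
runtime_init_i18n (const char *domain, const char *localedir)
{
  assert (domain != NULL && localedir != NULL);
  setlocale (LC_ALL, "");              /* honor the user's locale settings */
  bindtextdomain (domain, localedir);  /* where to search for catalogs */
  textdomain (domain);                 /* default domain for lookups */
}

/* Hypothetical binding for a plural-aware lookup in a given locale
   category, such as LC_MESSAGES or LC_TIME.  A NULL domain means the
   domain selected by textdomain().  */
static const char *
runtime_translate_plural (const char *singular, const char *plural,
                          unsigned long n, int category)
{
  return dcngettext (NULL, singular, plural, n, category);
}
@end verbatim

When no catalog is found for the domain, the lookup falls back to the
untranslated strings, so programs keep working without any installed
translations.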
8920 On the implementation side, three approaches are possible, with
8921 different effects on portability and copyright:

@enumerate
@item
You may integrate the GNU @code{gettext}'s @file{intl/} directory in
your package, as described in @ref{Maintainers}. This allows you to
have internationalization on all kinds of platforms. Note that when you
then distribute your package, it legally falls under the GNU General
Public License, and the GNU project will be glad about your contribution
to the Free Software pool.

@item
You may link against GNU @code{gettext} functions if they are found in
the C library. For example, an autoconf test for @code{gettext()} and
@code{ngettext()} will detect this situation. For the moment, this test
will succeed on GNU systems and not on other platforms. No severe
copyright restrictions apply.

@item
You may emulate or reimplement the GNU @code{gettext} functionality.
This has the advantage of full portability and no copyright
restrictions, but also the drawback that you have to reimplement the GNU
@code{gettext} features (such as the @code{LANGUAGE} environment
variable, the locale aliases database, the automatic charset conversion,
and plural handling).
@end enumerate

8948 @node Programmers for other Languages, Translators for other Languages, Language Implementors, Programming Languages
8949 @section The Programmer's View
8951 For the programmer, the general procedure is the same as for the C
8952 language. The Emacs PO mode marking supports other languages, and the GNU
8953 @code{xgettext} string extractor recognizes other languages based on the
8954 file extension or a command-line option. In some languages,
8955 @code{setlocale} is not needed because it is already performed by the
8956 underlying language runtime.
8958 @node Translators for other Languages, Maintainers for other Languages, Programmers for other Languages, Programming Languages
8959 @section The Translator's View
The translator works exactly as in the C language case. The only
difference is that when translating format strings, the translator has
to be aware of the language's particular syntax for positional arguments
in format strings.

@menu
* c-format:: C Format Strings
* objc-format:: Objective C Format Strings
* sh-format:: Shell Format Strings
* python-format:: Python Format Strings
* lisp-format:: Lisp Format Strings
* elisp-format:: Emacs Lisp Format Strings
* librep-format:: librep Format Strings
* scheme-format:: Scheme Format Strings
* smalltalk-format:: Smalltalk Format Strings
* java-format:: Java Format Strings
* csharp-format:: C# Format Strings
* awk-format:: awk Format Strings
* object-pascal-format:: Object Pascal Format Strings
* ycp-format:: YCP Format Strings
* tcl-format:: Tcl Format Strings
* perl-format:: Perl Format Strings
* php-format:: PHP Format Strings
* gcc-internal-format:: GCC internal Format Strings
* gfc-internal-format:: GFC internal Format Strings
* qt-format:: Qt Format Strings
* qt-plural-format:: Qt Plural Format Strings
* kde-format:: KDE Format Strings
* boost-format:: Boost Format Strings
* lua-format:: Lua Format Strings
* javascript-format:: JavaScript Format Strings
@end menu

8994 @node c-format, objc-format, Translators for other Languages, Translators for other Languages
8995 @subsection C Format Strings
8997 C format strings are described in POSIX (IEEE P1003.1 2001), section
8999 @uref{http://www.opengroup.org/onlinepubs/007904975/functions/fprintf.html}.
9000 See also the fprintf() manual page,
9001 @uref{http://www.linuxvalley.it/encyclopedia/ldp/manpage/man3/printf.3.php},
9002 @uref{http://informatik.fh-wuerzburg.de/student/i510/man/printf.html}.
9004 Although format strings with positions that reorder arguments, such as

@example
"Only %2$d bytes free on '%1$s'."
@end example

@noindent
which is semantically equivalent to

@example
"'%s' has only %d bytes free."
@end example

@noindent
are a POSIX/XSI feature and not specified by ISO C 99, translators can rely
9019 on this reordering ability: On the few platforms where @code{printf()},
9020 @code{fprintf()} etc. don't support this feature natively, @file{libintl.a}
9021 or @file{libintl.so} provides replacement functions, and GNU @code{<libintl.h>}
9022 activates these replacement functions automatically.
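
As a small illustration, the sketch below passes the same arguments to
both of the format strings shown above. The @code{format_disk_report}
helper is a hypothetical name; in a real program the format string would
come from a @code{gettext} call, and the sketch assumes a @code{printf}
implementation that supports the POSIX numbered-argument syntax.

@verbatim
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: FMT is the (possibly translated) format string.
   The caller always passes the arguments in the order (disk, bytes);
   a translation may reorder them with %1$s and %2$d.  */
static int
format_disk_report (char *buf, size_t bufsize, const char *fmt,
                    const char *disk, int bytes)
{
  assert (buf != NULL && fmt != NULL && disk != NULL);
  return snprintf (buf, bufsize, fmt, disk, bytes);
}
@end verbatim

Because the argument order in the call never changes, only the format
string decides where each value appears in the output.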
9025 @cindex Arabic digits
9026 As a special feature for Farsi (Persian) and maybe Arabic, translators can
9027 insert an @samp{I} flag into numeric format directives. For example, the
9028 translation of @code{"%d"} can be @code{"%Id"}. The effect of this flag,
9029 on systems with GNU @code{libc}, is that in the output, the ASCII digits are
9030 replaced with the @samp{outdigits} defined in the @code{LC_CTYPE} locale
9031 category. On other systems, the @code{gettext} function removes this flag,
9032 so that it has no effect.
9034 Note that the programmer should @emph{not} put this flag into the
9035 untranslated string. (Putting the @samp{I} format directive flag into an
9036 @var{msgid} string would lead to undefined behaviour on platforms without
9037 glibc when NLS is disabled.)
9039 @node objc-format, sh-format, c-format, Translators for other Languages
9040 @subsection Objective C Format Strings
Objective C format strings are like C format strings. They support an
additional format directive, @samp{%@@}, which when executed consumes an
argument of type @code{Object *}.
9046 @node sh-format, python-format, objc-format, Translators for other Languages
9047 @subsection Shell Format Strings
9049 Shell format strings, as supported by GNU gettext and the @samp{envsubst}
9050 program, are strings with references to shell variables in the form
9051 @code{$@var{variable}} or @code{$@{@var{variable}@}}. References of the form
9052 @code{$@{@var{variable}-@var{default}@}},
9053 @code{$@{@var{variable}:-@var{default}@}},
9054 @code{$@{@var{variable}=@var{default}@}},
9055 @code{$@{@var{variable}:=@var{default}@}},
9056 @code{$@{@var{variable}+@var{replacement}@}},
9057 @code{$@{@var{variable}:+@var{replacement}@}},
9058 @code{$@{@var{variable}?@var{ignored}@}},
9059 @code{$@{@var{variable}:?@var{ignored}@}},
9060 that would be valid inside shell scripts, are not supported. The
9061 @var{variable} names must consist solely of alphanumeric or underscore
9062 ASCII characters, not start with a digit and be nonempty; otherwise such
9063 a variable reference is ignored.
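
The naming rule can be sketched as a small C predicate. This is only an
illustration of the rule as stated above, with a hypothetical function
name; it is not the code that @code{envsubst} itself uses.

@verbatim
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if the LEN characters at NAME form a
   variable name that a shell format string may reference, that is,
   a nonempty run of ASCII letters, digits and underscores that does
   not start with a digit.  */
static bool
is_valid_sh_variable_name (const char *name, size_t len)
{
  assert (name != NULL);
  if (len == 0 || (name[0] >= '0' && name[0] <= '9'))
    return false;
  for (size_t i = 0; i < len; i++)
    {
      char c = name[i];
      bool ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || c == '_';
      if (!ok)
        return false;
    }
  return true;
}
@end verbatim

Explicit ASCII range checks are used instead of @code{isalnum} so that
the result does not depend on the current locale.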
9065 @node python-format, lisp-format, sh-format, Translators for other Languages
9066 @subsection Python Format Strings
There are two kinds of format strings in Python: those acceptable to
the Python built-in format operator @code{%}, labelled as
@samp{python-format}, and those acceptable to the @code{format} method
of the @samp{str} object, labelled as @samp{python-brace-format}.

9073 Python @code{%} format strings are described in
9074 @w{Python Library reference} /
9075 @w{2. Built-in Types, Exceptions and Functions} /
9076 @w{2.2. Built-in Types} /
9077 @w{2.2.6. Sequence Types} /
9078 @w{2.2.6.2. String Formatting Operations}.
9079 @uref{http://www.python.org/doc/2.2.1/lib/typesseq-strings.html}.
9081 Python brace format strings are described in @w{PEP 3101 -- Advanced
9082 String Formatting}, @uref{http://www.python.org/dev/peps/pep-3101/}.
9084 @node lisp-format, elisp-format, python-format, Translators for other Languages
9085 @subsection Lisp Format Strings
9087 Lisp format strings are described in the Common Lisp HyperSpec,
9088 chapter 22.3 @w{Formatted Output},
9089 @uref{http://www.lisp.org/HyperSpec/Body/sec_22-3.html}.
9091 @node elisp-format, librep-format, lisp-format, Translators for other Languages
9092 @subsection Emacs Lisp Format Strings
9094 Emacs Lisp format strings are documented in the Emacs Lisp reference,
9095 section @w{Formatting Strings},
9096 @uref{http://www.gnu.org/manual/elisp-manual-21-2.8/html_chapter/elisp_4.html#SEC75}.
9097 Note that as of version 21, XEmacs supports numbered argument specifications
9098 in format strings while FSF Emacs doesn't.
9100 @node librep-format, scheme-format, elisp-format, Translators for other Languages
9101 @subsection librep Format Strings
9103 librep format strings are documented in the librep manual, section
9104 @w{Formatted Output},
9105 @url{http://librep.sourceforge.net/librep-manual.html#Formatted%20Output},
9106 @url{http://www.gwinnup.org/research/docs/librep.html#SEC122}.
9108 @node scheme-format, smalltalk-format, librep-format, Translators for other Languages
9109 @subsection Scheme Format Strings
9111 Scheme format strings are documented in the SLIB manual, section
9112 @w{Format Specification}.
9114 @node smalltalk-format, java-format, scheme-format, Translators for other Languages
9115 @subsection Smalltalk Format Strings
9117 Smalltalk format strings are described in the GNU Smalltalk documentation,
9118 class @code{CharArray}, methods @samp{bindWith:} and
@samp{bindWithArguments:},
@uref{http://www.gnu.org/software/smalltalk/gst-manual/gst_68.html#SEC238}.
9121 In summary, a directive starts with @samp{%} and is followed by @samp{%}
9122 or a nonzero digit (@samp{1} to @samp{9}).
9124 @node java-format, csharp-format, smalltalk-format, Translators for other Languages
9125 @subsection Java Format Strings
9127 Java format strings are described in the JDK documentation for class
9128 @code{java.text.MessageFormat},
9129 @uref{http://java.sun.com/j2se/1.4/docs/api/java/text/MessageFormat.html}.
9130 See also the ICU documentation
9131 @uref{http://oss.software.ibm.com/icu/apiref/classMessageFormat.html}.
9133 @node csharp-format, awk-format, java-format, Translators for other Languages
9134 @subsection C# Format Strings
9136 C# format strings are described in the .NET documentation for class
9137 @code{System.String} and in
9138 @uref{http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpConFormattingOverview.asp}.
9140 @node awk-format, object-pascal-format, csharp-format, Translators for other Languages
9141 @subsection awk Format Strings
9143 awk format strings are described in the gawk documentation, section
9145 @uref{http://www.gnu.org/manual/gawk/html_node/Printf.html#Printf}.
9147 @node object-pascal-format, ycp-format, awk-format, Translators for other Languages
9148 @subsection Object Pascal Format Strings
9150 Object Pascal format strings are described in the documentation of the
9151 Free Pascal runtime library, section Format,
9152 @uref{http://www.freepascal.org/docs-html/rtl/sysutils/format.html}.
9154 @node ycp-format, tcl-format, object-pascal-format, Translators for other Languages
9155 @subsection YCP Format Strings
9157 YCP sformat strings are described in the libycp documentation
9158 @uref{file:/usr/share/doc/packages/libycp/YCP-builtins.html}.
9159 In summary, a directive starts with @samp{%} and is followed by @samp{%}
9160 or a nonzero digit (@samp{1} to @samp{9}).
9162 @node tcl-format, perl-format, ycp-format, Translators for other Languages
9163 @subsection Tcl Format Strings
9165 Tcl format strings are described in the @file{format.n} manual page,
9166 @uref{http://www.scriptics.com/man/tcl8.3/TclCmd/format.htm}.
9168 @node perl-format, php-format, tcl-format, Translators for other Languages
9169 @subsection Perl Format Strings
There are two kinds of format strings in Perl: those acceptable to the
9172 Perl built-in function @code{printf}, labelled as @samp{perl-format},
9173 and those acceptable to the @code{libintl-perl} function @code{__x},
9174 labelled as @samp{perl-brace-format}.
9176 Perl @code{printf} format strings are described in the @code{sprintf}
9177 section of @samp{man perlfunc}.
9179 Perl brace format strings are described in the
9180 @file{Locale::TextDomain(3pm)} manual page of the CPAN package
libintl-perl. In brief, a Perl brace format string uses placeholders
enclosed in braces (@samp{@{} and @samp{@}}). Each placeholder must have
the syntax of a simple identifier.
9185 @node php-format, gcc-internal-format, perl-format, Translators for other Languages
9186 @subsection PHP Format Strings
9188 PHP format strings are described in the documentation of the PHP function
9189 @code{sprintf}, in @file{phpdoc/manual/function.sprintf.html} or
9190 @uref{http://www.php.net/manual/en/function.sprintf.php}.
9192 @node gcc-internal-format, gfc-internal-format, php-format, Translators for other Languages
9193 @subsection GCC internal Format Strings
9195 These format strings are used inside the GCC sources. In such a format
9196 string, a directive starts with @samp{%}, is optionally followed by a
9197 size specifier @samp{l}, an optional flag @samp{+}, another optional flag
9198 @samp{#}, and is finished by a specifier: @samp{%} denotes a literal
9199 percent sign, @samp{c} denotes a character, @samp{s} denotes a string,
9200 @samp{i} and @samp{d} denote an integer, @samp{o}, @samp{u}, @samp{x}
9201 denote an unsigned integer, @samp{.*s} denotes a string preceded by a
9202 width specification, @samp{H} denotes a @samp{location_t *} pointer,
9203 @samp{D} denotes a general declaration, @samp{F} denotes a function
9204 declaration, @samp{T} denotes a type, @samp{A} denotes a function argument,
9205 @samp{C} denotes a tree code, @samp{E} denotes an expression, @samp{L}
9206 denotes a programming language, @samp{O} denotes a binary operator,
9207 @samp{P} denotes a function parameter, @samp{Q} denotes an assignment
9208 operator, @samp{V} denotes a const/volatile qualifier.
9210 @node gfc-internal-format, qt-format, gcc-internal-format, Translators for other Languages
9211 @subsection GFC internal Format Strings
9213 These format strings are used inside the GNU Fortran Compiler sources,
9214 that is, the Fortran frontend in the GCC sources. In such a format
9215 string, a directive starts with @samp{%} and is finished by a
9216 specifier: @samp{%} denotes a literal percent sign, @samp{C} denotes the
9217 current source location, @samp{L} denotes a source location, @samp{c}
9218 denotes a character, @samp{s} denotes a string, @samp{i} and @samp{d}
9219 denote an integer, @samp{u} denotes an unsigned integer. @samp{i},
9220 @samp{d}, and @samp{u} may be preceded by a size specifier @samp{l}.
9222 @node qt-format, qt-plural-format, gfc-internal-format, Translators for other Languages
9223 @subsection Qt Format Strings
9225 Qt format strings are described in the documentation of the QString class
9226 @uref{file:/usr/lib/qt-4.3.0/doc/html/qstring.html}.
9227 In summary, a directive consists of a @samp{%} followed by a digit. The same
9228 directive cannot occur more than once in a format string.
9230 @node qt-plural-format, kde-format, qt-format, Translators for other Languages
@subsection Qt Plural Format Strings
Qt plural format strings are described in the documentation of the QObject::tr method
9234 @uref{file:/usr/lib/qt-4.3.0/doc/html/qobject.html}.
9235 In summary, the only allowed directive is @samp{%n}.
9237 @node kde-format, boost-format, qt-plural-format, Translators for other Languages
9238 @subsection KDE Format Strings
9240 KDE 4 format strings are defined as follows:
9241 A directive consists of a @samp{%} followed by a non-zero decimal number.
If a @samp{%n} occurs in a format string, all of @samp{%1}, ..., @samp{%(n-1)}
9243 must occur as well, except possibly one of them.
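The rule above can be sketched as a small Python check. This is an
illustration under assumptions, not the validation code used by the gettext
tools: the function names are hypothetical, and the sketch assumes that the
digits following @samp{%} are read greedily, so that @samp{%12} means
argument 12.

```python
import re


def kde_argument_numbers(fmt):
    """Return the set of argument numbers used by %N directives in fmt."""
    return {int(digits) for digits in re.findall(r"%([1-9][0-9]*)", fmt)}


def follows_kde_rule(fmt):
    """Check that at most one of %1 .. %(n-1) is absent for the highest %n."""
    numbers = kde_argument_numbers(fmt)
    if not numbers:
        return True
    # Checking the highest number is enough: the arguments missing below
    # any smaller number are a subset of those missing below the highest.
    missing = [i for i in range(1, max(numbers)) if i not in numbers]
    return len(missing) <= 1


print(follows_kde_rule("%1 of %2"))
print(follows_kde_rule("only %4"))
```

A string such as @samp{%1 and %3} passes the check because only @samp{%2} is
absent, whereas a string that uses only @samp{%4} fails it because three
lower-numbered arguments are absent.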
9245 @node boost-format, lua-format, kde-format, Translators for other Languages
9246 @subsection Boost Format Strings
9248 Boost format strings are described in the documentation of the
9249 @code{boost::format} class, at
9250 @uref{http://www.boost.org/libs/format/doc/format.html}.
9251 In summary, a directive has either the same syntax as in a C format string,
9252 such as @samp{%1$+5d}, or may be surrounded by vertical bars, such as
9253 @samp{%|1$+5d|} or @samp{%|1$+5|}, or consists of just an argument number
9254 between percent signs, such as @samp{%1%}.
9256 @node lua-format, javascript-format, boost-format, Translators for other Languages
9257 @subsection Lua Format Strings
9259 Lua format strings are described in the Lua reference manual, section @w{String Manipulation},
9260 @uref{http://www.lua.org/manual/5.1/manual.html#pdf-string.format}.
9262 @node javascript-format, , lua-format, Translators for other Languages
9263 @subsection JavaScript Format Strings
Although the JavaScript specification itself does not define any format
strings, many JavaScript implementations provide printf-like
functions. @code{xgettext} understands a set of common format strings
used in popular JavaScript implementations including Gjs, Seed, and
Node.js. In such a format string, a directive starts with @samp{%}
and is finished by a specifier: @samp{%} denotes a literal percent
sign, @samp{c} denotes a character, @samp{s} denotes a string,
@samp{b}, @samp{d}, @samp{o}, @samp{x}, @samp{X} denote an integer,
@samp{f} denotes a floating-point number, @samp{j} denotes a JSON
9277 @node Maintainers for other Languages, List of Programming Languages, Translators for other Languages, Programming Languages
9278 @section The Maintainer's View
9280 For the maintainer, the general procedure differs from the C language
9285 For those languages that don't use GNU gettext, the @file{intl/} directory
is not needed and can be omitted. This means that the maintainer calls the
@code{gettextize} program without the @samp{--intl} option, and invokes
the @code{AM_GNU_GETTEXT} autoconf macro via
9289 @samp{AM_GNU_GETTEXT([external])}.
9292 If only a single programming language is used, the @code{XGETTEXT_OPTIONS}
9293 variable in @file{po/Makevars} (@pxref{po/Makevars}) should be adjusted to
9294 match the @code{xgettext} options for that particular programming language.
9295 If the package uses more than one programming language with @code{gettext}
9296 support, it becomes necessary to change the POT file construction rule
9297 in @file{po/Makefile.in.in}. It is recommended to make one @code{xgettext}
9298 invocation per programming language, each with the options appropriate for
9299 that language, and to combine the resulting files using @code{msgcat}.
9302 @node List of Programming Languages, List of Data Formats, Maintainers for other Languages, Programming Languages
9303 @section Individual Programming Languages
9305 @c Here is a list of programming languages, as used for Free Software projects
9306 @c on SourceForge/Freshmeat, as of February 2002. Those supported by gettext
9307 @c are marked with a star.
9343 @c Object Pascal 5 *
9355 @c Other Scripting Engines 49
9359 * C:: C, C++, Objective C
9360 * sh:: sh - Shell Script
9361 * bash:: bash - Bourne-Again Shell Script
9363 * Common Lisp:: GNU clisp - Common Lisp
9364 * clisp C:: GNU clisp C sources
9365 * Emacs Lisp:: Emacs Lisp
9367 * Scheme:: GNU guile - Scheme
9368 * Smalltalk:: GNU Smalltalk
9372 * Pascal:: Pascal - Free Pascal Compiler
9373 * wxWidgets:: wxWidgets library
9374 * YCP:: YCP - YaST2 scripting language
9375 * Tcl:: Tcl - Tk's scripting language
9377 * PHP:: PHP Hypertext Preprocessor
9379 * GCC-source:: GNU Compiler Collection sources
9381 * JavaScript:: JavaScript
9384 @node C, sh, List of Programming Languages, List of Programming Languages
9385 @subsection C, C++, Objective C
9386 @cindex C and C-like languages
9390 gcc, gpp, gobjc, glibc, gettext
9392 @item File extension
9393 For C: @code{c}, @code{h}.
9394 @*For C++: @code{C}, @code{c++}, @code{cc}, @code{cxx}, @code{cpp}, @code{hpp}.
9395 @*For Objective C: @code{m}.
9400 @item gettext shorthand
9403 @item gettext/ngettext functions
9404 @code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext},
9405 @code{dngettext}, @code{dcngettext}
9408 @code{textdomain} function
9410 @item bindtextdomain
9411 @code{bindtextdomain} function
9414 Programmer must call @code{setlocale (LC_ALL, "")}
9417 @code{#include <libintl.h>}
9418 @*@code{#include <locale.h>}
9419 @*@code{#define _(string) gettext (string)}
9421 @item Use or emulate GNU gettext
9427 @item Formatting with positions
9428 @code{fprintf "%2$d %1$d"}
9429 @*In C++: @code{autosprintf "%2$d %1$d"}
9430 (@pxref{Top, , Introduction, autosprintf, GNU autosprintf})
9433 autoconf (gettext.m4) and #if ENABLE_NLS
9435 @item po-mode marking
9439 The following examples are available in the @file{examples} directory:
9440 @code{hello-c}, @code{hello-c-gnome}, @code{hello-c++}, @code{hello-c++-qt},
9441 @code{hello-c++-kde}, @code{hello-c++-gnome}, @code{hello-c++-wxwidgets},
9442 @code{hello-objc}, @code{hello-objc-gnustep}, @code{hello-objc-gnome}.
9444 @node sh, bash, C, List of Programming Languages
9445 @subsection sh - Shell Script
9446 @cindex shell scripts
9452 @item File extension
9456 @code{"abc"}, @code{'abc'}, @code{abc}
9458 @item gettext shorthand
9459 @code{"`gettext \"abc\"`"}
9461 @item gettext/ngettext functions
9464 @code{gettext}, @code{ngettext} programs
9465 @*@code{eval_gettext}, @code{eval_ngettext} shell functions
9468 @vindex TEXTDOMAIN@r{, environment variable}
9469 environment variable @code{TEXTDOMAIN}
9471 @item bindtextdomain
9472 @vindex TEXTDOMAINDIR@r{, environment variable}
9473 environment variable @code{TEXTDOMAINDIR}
9481 @item Use or emulate GNU gettext
9487 @item Formatting with positions
9493 @item po-mode marking
9497 An example is available in the @file{examples} directory: @code{hello-sh}.
9500 * Preparing Shell Scripts:: Preparing Shell Scripts for Internationalization
9501 * gettext.sh:: Contents of @code{gettext.sh}
9502 * gettext Invocation:: Invoking the @code{gettext} program
9503 * ngettext Invocation:: Invoking the @code{ngettext} program
9504 * envsubst Invocation:: Invoking the @code{envsubst} program
9505 * eval_gettext Invocation:: Invoking the @code{eval_gettext} function
9506 * eval_ngettext Invocation:: Invoking the @code{eval_ngettext} function
9509 @node Preparing Shell Scripts, gettext.sh, sh, sh
9510 @subsubsection Preparing Shell Scripts for Internationalization
9511 @cindex preparing shell scripts for translation
9513 Preparing a shell script for internationalization is conceptually similar
9514 to the steps described in @ref{Sources}. The concrete steps for shell
9515 scripts are as follows.
9525 near the top of the script. @code{gettext.sh} is a shell function library
9526 that provides the functions
9527 @code{eval_gettext} (see @ref{eval_gettext Invocation}) and
9528 @code{eval_ngettext} (see @ref{eval_ngettext Invocation}).
9529 You have to ensure that @code{gettext.sh} can be found in the @code{PATH}.
9532 Set and export the @code{TEXTDOMAIN} and @code{TEXTDOMAINDIR} environment
9533 variables. Usually @code{TEXTDOMAIN} is the package or program name, and
9534 @code{TEXTDOMAINDIR} is the absolute pathname corresponding to
9535 @code{$prefix/share/locale}, where @code{$prefix} is the installation location.
9538 TEXTDOMAIN=@@PACKAGE@@
9540 TEXTDOMAINDIR=@@LOCALEDIR@@
9541 export TEXTDOMAINDIR
9545 Prepare the strings for translation, as described in @ref{Preparing Strings}.
9548 Simplify translatable strings so that they don't contain command substitution
9549 (@code{"`...`"} or @code{"$(...)"}), variable access with defaulting (like
9550 @code{$@{@var{variable}-@var{default}@}}), access to positional arguments
9551 (like @code{$0}, @code{$1}, ...) or highly volatile shell variables (like
9552 @code{$?}). This can always be done through simple local code restructuring.
9556 echo "Usage: $0 [OPTION] FILE..."
9563 echo "Usage: $program_name [OPTION] FILE..."
9569 echo "Remaining files: `ls | wc -l`"
9575 filecount="`ls | wc -l`"
9576 echo "Remaining files: $filecount"
9580 For each translatable string, change the output command @samp{echo} or
9581 @samp{$echo} to @samp{gettext} (if the string contains no references to
9582 shell variables) or to @samp{eval_gettext} (if it refers to shell variables),
9583 followed by a no-argument @samp{echo} command (to account for the terminating
9584 newline). Similarly, for cases with plural handling, replace a conditional
9585 @samp{echo} command with an invocation of @samp{ngettext} or
9586 @samp{eval_ngettext}, followed by a no-argument @samp{echo} command.
9588 When doing this, you also need to add an extra backslash before the dollar
9589 sign in references to shell variables, so that the @samp{eval_gettext}
9590 function receives the translatable string before the variable values are
9591 substituted into it. For example,
9594 echo "Remaining files: $filecount"
9600 eval_gettext "Remaining files: \$filecount"; echo
9603 If the output command is not @samp{echo}, you can make it use @samp{echo}
9604 nevertheless, through the use of backquotes. However, note that inside
9605 backquotes, backslashes must be doubled to be effective (because the
9606 backquoting eats one level of backslashes). For example, assuming that
9607 @samp{error} is a shell function that signals an error,
9610 error "file not found: $filename"
9613 is first transformed into
9616 error "`echo \"file not found: \$filename\"`"
9622 error "`eval_gettext \"file not found: \\\$filename\"`"
9626 @node gettext.sh, gettext Invocation, Preparing Shell Scripts, sh
9627 @subsubsection Contents of @code{gettext.sh}
9629 @code{gettext.sh}, contained in the run-time package of GNU gettext, provides
9634 The variable @code{echo} is set to a command that outputs its first argument
9635 and a newline, without interpreting backslashes in the argument string.
9638 See @ref{eval_gettext Invocation}.
9641 See @ref{eval_ngettext Invocation}.
9644 @node gettext Invocation, ngettext Invocation, gettext.sh, sh
9645 @subsubsection Invoking the @code{gettext} program
9647 @include rt-gettext.texi
9649 Note: @code{xgettext} supports only the one-argument form of the
9650 @code{gettext} invocation, where no options are present and the
9651 @var{textdomain} is implicit, from the environment.
9653 @node ngettext Invocation, envsubst Invocation, gettext Invocation, sh
9654 @subsubsection Invoking the @code{ngettext} program
9656 @include rt-ngettext.texi
Note: @code{xgettext} supports only the three-argument form of the
9659 @code{ngettext} invocation, where no options are present and the
9660 @var{textdomain} is implicit, from the environment.
9662 @node envsubst Invocation, eval_gettext Invocation, ngettext Invocation, sh
9663 @subsubsection Invoking the @code{envsubst} program
9665 @include rt-envsubst.texi
9667 @node eval_gettext Invocation, eval_ngettext Invocation, envsubst Invocation, sh
9668 @subsubsection Invoking the @code{eval_gettext} function
9670 @cindex @code{eval_gettext} function, usage
9672 eval_gettext @var{msgid}
9675 @cindex lookup message translation
9676 This function outputs the native language translation of a textual message,
9677 performing dollar-substitution on the result. Note that only shell variables
9678 mentioned in @var{msgid} will be dollar-substituted in the result.
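The restriction to variables mentioned in @var{msgid} can be illustrated
with a short Python sketch. It is not the implementation found in
@code{gettext.sh}: the function name @code{eval_gettext_sketch} is
hypothetical, a plain dictionary stands in for the message catalog, and the
pattern recognizes only the simple @code{$name} and brace-enclosed forms of
a variable reference.

```python
import re

# Matches $name or ${name}, where name is an ASCII identifier.
_VAR = re.compile(r"\$(?:([A-Za-z_][A-Za-z0-9_]*)|\{([A-Za-z_][A-Za-z0-9_]*)\})")


def eval_gettext_sketch(msgid, catalog, env):
    """Translate msgid, then substitute only variables named in msgid."""
    allowed = {plain or braced for plain, braced in _VAR.findall(msgid)}
    translated = catalog.get(msgid, msgid)  # fall back to msgid itself

    def substitute(match):
        name = match.group(1) or match.group(2)
        if name in allowed:
            return env.get(name, "")
        # A variable introduced only by the translation stays untouched.
        return match.group(0)

    return _VAR.sub(substitute, translated)


# Hypothetical catalog whose translation mentions an extra variable.
catalog = {"Remaining files: $filecount": "Fichiers restants : $filecount ($HOME)"}
env = {"filecount": "7", "HOME": "/root"}
print(eval_gettext_sketch("Remaining files: $filecount", catalog, env))
```

In this sketch the reference to @code{HOME} survives unexpanded because it
does not occur in @var{msgid}, so a translation cannot pull additional
variable values into the output.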
9680 @node eval_ngettext Invocation, , eval_gettext Invocation, sh
9681 @subsubsection Invoking the @code{eval_ngettext} function
9683 @cindex @code{eval_ngettext} function, usage
9685 eval_ngettext @var{msgid} @var{msgid-plural} @var{count}
9688 @cindex lookup plural message translation
9689 This function outputs the native language translation of a textual message
9690 whose grammatical form depends on a number, performing dollar-substitution
9691 on the result. Note that only shell variables mentioned in @var{msgid} or
9692 @var{msgid-plural} will be dollar-substituted in the result.
9694 @node bash, Python, sh, List of Programming Languages
9695 @subsection bash - Bourne-Again Shell Script
9698 GNU @code{bash} 2.0 or newer has a special shorthand for translating a
9699 string and substituting variable values in it: @code{$"msgid"}. But
9700 the use of this construct is @strong{discouraged}, due to the security
9701 holes it opens and due to its portability problems.
9703 The security holes of @code{$"..."} come from the fact that after looking up
9704 the translation of the string, @code{bash} processes it like it processes
9705 any double-quoted string: dollar and backquote processing, like @samp{eval}
9710 In a locale whose encoding is one of BIG5, BIG5-HKSCS, GBK, GB18030, SHIFT_JIS,
9711 JOHAB, some double-byte characters have a second byte whose value is
9712 @code{0x60}. For example, the byte sequence @code{\xe0\x60} is a single
9713 character in these locales. Many versions of @code{bash} (all versions
9714 up to bash-2.05, and newer versions on platforms without @code{mbsrtowcs()}
9715 function) don't know about character boundaries and see a backquote character
9716 where there is only a particular Chinese character. Thus it can start
9717 executing part of the translation as a command list. This situation can occur
9718 even without the translator being aware of it: if the translator provides
9719 translations in the UTF-8 encoding, it is the @code{gettext()} function which
9720 will, during its conversion from the translator's encoding to the user's
9721 locale's encoding, produce the dangerous @code{\x60} bytes.
A translator could, voluntarily or inadvertently, use backquotes
9725 @code{"`...`"} or dollar-parentheses @code{"$(...)"} in her translations.
9726 The enclosed strings would be executed as command lists by the shell.
9729 The portability problem is that @code{bash} must be built with
9730 internationalization support; this is normally not the case on systems
9731 that don't have the @code{gettext()} function in libc.
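The encoding hazard described above can be observed outside of
@code{bash}. The following Python sketch takes the byte sequence
@code{\xe0\x60} mentioned earlier and assumes that Python's @code{gbk} codec
is a fair stand-in for a GBK locale; it only inspects bytes and does not
execute anything.

```python
# The byte pair 0xE0 0x60 from the text above, decoded as GBK.
raw = b"\xe0\x60"
char = raw.decode("gbk")

# It is a single character, yet its encoded form contains the byte 0x60,
# which is the ASCII code of the backquote.
is_single_char = len(char) == 1
has_backquote_byte_in_gbk = b"`" in char.encode("gbk")

# The UTF-8 form of the same character consists only of bytes >= 0x80,
# so it cannot contain a backquote byte.
has_backquote_byte_in_utf8 = b"`" in char.encode("utf-8")

print(is_single_char, has_backquote_byte_in_gbk, has_backquote_byte_in_utf8)
```

This is why a translation that looks harmless in UTF-8 can still yield a
dangerous byte once it has been converted to the user's locale encoding.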
9733 @node Python, Common Lisp, bash, List of Programming Languages
9741 @item File extension
9745 @code{'abc'}, @code{u'abc'}, @code{r'abc'}, @code{ur'abc'},
9746 @*@code{"abc"}, @code{u"abc"}, @code{r"abc"}, @code{ur"abc"},
9747 @*@code{'''abc'''}, @code{u'''abc'''}, @code{r'''abc'''}, @code{ur'''abc'''},
9748 @*@code{"""abc"""}, @code{u"""abc"""}, @code{r"""abc"""}, @code{ur"""abc"""}
9750 @item gettext shorthand
9751 @code{_('abc')} etc.
9753 @item gettext/ngettext functions
9754 @code{gettext.gettext}, @code{gettext.dgettext},
9755 @code{gettext.ngettext}, @code{gettext.dngettext},
9756 also @code{ugettext}, @code{ungettext}
9759 @code{gettext.textdomain} function, or
9760 @code{gettext.install(@var{domain})} function
9762 @item bindtextdomain
9763 @code{gettext.bindtextdomain} function, or
9764 @code{gettext.install(@var{domain},@var{localedir})} function
9767 not used by the gettext emulation
9770 @code{import gettext}
9772 @item Use or emulate GNU gettext
9778 @item Formatting with positions
9779 @code{'...%(ident)d...' % @{ 'ident': value @}}
9784 @item po-mode marking
9788 An example is available in the @file{examples} directory: @code{hello-python}.
9790 A note about format strings: Python supports format strings with unnamed
9791 arguments, such as @code{'...%d...'}, and format strings with named arguments,
9792 such as @code{'...%(ident)d...'}. The latter are preferable for
9793 internationalized programs, for two reasons:
9797 When a format string takes more than one argument, the translator can provide
9798 a translation that uses the arguments in a different order, if the format
9799 string uses named arguments. For example, the translator can reformulate
9801 "'%(volume)s' has only %(freespace)d bytes free."
9806 "Only %(freespace)d bytes free on '%(volume)s'."
9809 Additionally, the identifiers also provide some context to the translator.
9812 In the context of plural forms, the format string used for the singular form
9813 does not use the numeric argument in many languages. Even in English, one
9814 prefers to write @code{"one hour"} instead of @code{"1 hour"}. Omitting
9815 individual arguments from format strings like this is only possible with
9816 the named argument syntax. (With unnamed arguments, Python -- unlike C --
9817 verifies that the format string uses all supplied arguments.)
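Both reasons can be tried out with a few lines of Python. The values bound
to @code{volume}, @code{freespace} and @code{hours} are invented for this
sketch.

```python
# Named arguments: a translation may reorder them freely.
args = {"volume": "C:", "freespace": 512}
original = "'%(volume)s' has only %(freespace)d bytes free."
reordered = "Only %(freespace)d bytes free on '%(volume)s'."
original_out = original % args
reordered_out = reordered % args

# Named arguments: the singular form may leave the number out.
singular_out = "one hour" % {"hours": 1}

# Unnamed arguments: an unused argument makes Python raise TypeError.
try:
    "one hour" % (1,)
    unnamed_error = None
except TypeError as exc:
    unnamed_error = exc

print(original_out)
print(reordered_out)
print(singular_out)
print(type(unnamed_error).__name__)
```

With a mapping on the right-hand side, Python does not complain about keys
that the format string leaves unused, which is what makes the singular form
without a number possible.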
9820 @node Common Lisp, clisp C, Python, List of Programming Languages
9821 @subsection GNU clisp - Common Lisp
9830 @item File extension
9836 @item gettext shorthand
9837 @code{(_ "abc")}, @code{(ENGLISH "abc")}
9839 @item gettext/ngettext functions
9840 @code{i18n:gettext}, @code{i18n:ngettext}
9843 @code{i18n:textdomain}
9845 @item bindtextdomain
9846 @code{i18n:textdomaindir}
9854 @item Use or emulate GNU gettext
9858 @code{xgettext -k_ -kENGLISH}
9860 @item Formatting with positions
9861 @code{format "~1@@*~D ~0@@*~D"}
9864 On platforms without gettext, no translation.
9866 @item po-mode marking
9870 An example is available in the @file{examples} directory: @code{hello-clisp}.
9872 @node clisp C, Emacs Lisp, Common Lisp, List of Programming Languages
9873 @subsection GNU clisp C sources
9874 @cindex clisp C sources
9880 @item File extension
9886 @item gettext shorthand
9887 @code{ENGLISH ? "abc" : ""}
9888 @*@code{GETTEXT("abc")}
9889 @*@code{GETTEXTL("abc")}
9891 @item gettext/ngettext functions
9892 @code{clgettext}, @code{clgettextl}
9897 @item bindtextdomain
9904 @code{#include "lispbibl.c"}
9906 @item Use or emulate GNU gettext
9910 @code{clisp-xgettext}
9912 @item Formatting with positions
9913 @code{fprintf "%2$d %1$d"}
9916 On platforms without gettext, no translation.
9918 @item po-mode marking
9922 @node Emacs Lisp, librep, clisp C, List of Programming Languages
9923 @subsection Emacs Lisp
9930 @item File extension
9936 @item gettext shorthand
9939 @item gettext/ngettext functions
9940 @code{gettext}, @code{dgettext} (xemacs only)
9943 @code{domain} special form (xemacs only)
9945 @item bindtextdomain
9946 @code{bind-text-domain} function (xemacs only)
9954 @item Use or emulate GNU gettext
9960 @item Formatting with positions
9961 @code{format "%2$d %1$d"}
9964 Only XEmacs. Without @code{I18N3} defined at build time, no translation.
9966 @item po-mode marking
9970 @node librep, Scheme, Emacs Lisp, List of Programming Languages
9972 @cindex @code{librep} Lisp
9976 librep 0.15.3 or newer
9978 @item File extension
9984 @item gettext shorthand
9987 @item gettext/ngettext functions
9991 @code{textdomain} function
9993 @item bindtextdomain
9994 @code{bindtextdomain} function
10000 @code{(require 'rep.i18n.gettext)}
10002 @item Use or emulate GNU gettext
10008 @item Formatting with positions
10009 @code{format "%2$d %1$d"}
10012 On platforms without gettext, no translation.
10014 @item po-mode marking
10018 An example is available in the @file{examples} directory: @code{hello-librep}.
10020 @node Scheme, Smalltalk, librep, List of Programming Languages
10021 @subsection GNU guile - Scheme
10029 @item File extension
10032 @item String syntax
10035 @item gettext shorthand
10038 @item gettext/ngettext functions
10039 @code{gettext}, @code{ngettext}
10044 @item bindtextdomain
10045 @code{bindtextdomain}
10048 @code{(catch #t (lambda () (setlocale LC_ALL "")) (lambda args #f))}
10051 @code{(use-modules (ice-9 format))}
10053 @item Use or emulate GNU gettext
10057 @code{xgettext -k_}
10059 @item Formatting with positions
10060 @c @code{format "~1@@*~D ~0@@*~D~2@@*"}, requires @code{(use-modules (ice-9 format))}
10061 @c not yet supported
10065 On platforms without gettext, no translation.
10067 @item po-mode marking
10071 An example is available in the @file{examples} directory: @code{hello-guile}.
10073 @node Smalltalk, Java, Scheme, List of Programming Languages
10074 @subsection GNU Smalltalk
10081 @item File extension
10084 @item String syntax
10087 @item gettext shorthand
10090 @item gettext/ngettext functions
10091 @code{LcMessagesDomain>>#at:}, @code{LcMessagesDomain>>#at:plural:with:}
10094 @code{LcMessages>>#domain:localeDirectory:} (returns a @code{LcMessagesDomain}
Example: @code{I18N Locale default messages domain: 'gettext' localeDirectory: '/usr/local/share/locale'}
10098 @item bindtextdomain
10099 @code{LcMessages>>#domain:localeDirectory:}, see above.
10102 Automatic if you use @code{I18N Locale default}.
10105 @code{PackageLoader fileInPackage: 'I18N'!}
10107 @item Use or emulate GNU gettext
10113 @item Formatting with positions
10114 @code{'%1 %2' bindWith: 'Hello' with: 'world'}
10119 @item po-mode marking
10123 An example is available in the @file{examples} directory:
10124 @code{hello-smalltalk}.
10126 @node Java, C#, Smalltalk, List of Programming Languages
10134 @item File extension
10137 @item String syntax
10140 @item gettext shorthand
10143 @item gettext/ngettext functions
10144 @code{GettextResource.gettext}, @code{GettextResource.ngettext},
10145 @code{GettextResource.pgettext}, @code{GettextResource.npgettext}
10148 ---, use @code{ResourceBundle.getResource} instead
10150 @item bindtextdomain
10151 ---, use CLASSPATH instead
10159 @item Use or emulate GNU gettext
10160 ---, uses a Java specific message catalog format
10163 @code{xgettext -k_}
10165 @item Formatting with positions
10166 @code{MessageFormat.format "@{1,number@} @{0,number@}"}
10171 @item po-mode marking
10175 Before marking strings as internationalizable, uses of the string
10176 concatenation operator need to be converted to @code{MessageFormat}
10177 applications. For example, @code{"file "+filename+" not found"} becomes
10178 @code{MessageFormat.format("file @{0@} not found", new Object[] @{ filename @})}.
Only after this is done can the strings be marked and extracted.
10181 GNU gettext uses the native Java internationalization mechanism, namely
10182 @code{ResourceBundle}s. There are two formats of @code{ResourceBundle}s:
10183 @code{.properties} files and @code{.class} files. The @code{.properties}
10184 format is a text file which the translators can directly edit, like PO
files, but which doesn't support plural forms. The @code{.class}
format, by contrast, is compiled from @code{.java} source code and can
support plural forms (provided it is accessed through an appropriate API,
see below).
10189 To convert a PO file to a @code{.properties} file, the @code{msgcat}
10190 program can be used with the option @code{--properties-output}. To convert
10191 a @code{.properties} file back to a PO file, the @code{msgcat} program
10192 can be used with the option @code{--properties-input}. All the tools
10193 that manipulate PO files can work with @code{.properties} files as well,
10194 if given the @code{--properties-input} and/or @code{--properties-output}
10197 To convert a PO file to a ResourceBundle class, the @code{msgfmt} program
10198 can be used with the option @code{--java} or @code{--java2}. To convert a
10199 ResourceBundle back to a PO file, the @code{msgunfmt} program can be used
10200 with the option @code{--java}.
10202 Two different programmatic APIs can be used to access ResourceBundles.
10203 Note that both APIs work with all kinds of ResourceBundles, whether
10204 GNU gettext generated classes, or other @code{.class} or @code{.properties}
10209 The @code{java.util.ResourceBundle} API.
10211 In particular, its @code{getString} function returns a string translation.
10212 Note that a missing translation yields a @code{MissingResourceException}.
This has the advantage of being the standard API, and it does not require
any additional libraries, only the @code{msgcat} generated @code{.properties}
10216 files or the @code{msgfmt} generated @code{.class} files. But it cannot do
10217 plural handling, even if the resource was generated by @code{msgfmt} from
10218 a PO file with plural handling.
10221 The @code{gnu.gettext.GettextResource} API.
10223 Reference documentation in Javadoc 1.1 style format is in the
10224 @uref{javadoc2/index.html,javadoc2 directory}.
10226 Its @code{gettext} function returns a string translation. Note that when
10227 a translation is missing, the @var{msgid} argument is returned unchanged.
This has the advantage of having the @code{ngettext} function for plural
handling and the @code{pgettext} and @code{npgettext} functions for strings
constrained to a particular context.
10233 @cindex @code{libintl} for Java
10234 To use this API, one needs the @code{libintl.jar} file which is part of
10235 the GNU gettext package and distributed under the LGPL.
10238 Four examples, using the second API, are available in the @file{examples}
10239 directory: @code{hello-java}, @code{hello-java-awt}, @code{hello-java-swing},
10240 @code{hello-java-qtjambi}.
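The different behaviour of the two APIs for a missing translation can be
sketched with the standard library alone. This is a minimal illustration, not
code from the GNU gettext distribution. The classes @code{DemoMessages} and
@code{MissingKeyDemo} and the helpers @code{tr} and @code{throwsOnMissing} are
hypothetical names. The in-memory @code{ListResourceBundle} merely stands in
for a class generated by @code{msgfmt}, and @code{tr} only emulates the
convention of returning the @var{msgid} unchanged; it does not call
@code{libintl.jar}.

```java
import java.util.ListResourceBundle;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

// Hypothetical in-memory catalog; a real one would be generated by msgfmt.
class DemoMessages extends ListResourceBundle {
    @Override
    protected Object[][] getContents() {
        return new Object[][] {
            { "Operation completed.", "Vorgang abgeschlossen." },
        };
    }
}

class MissingKeyDemo {
    static final ResourceBundle BUNDLE = new DemoMessages();

    // Emulates the gettext convention (an assumption of this sketch):
    // fall back to the msgid itself when no translation exists.
    static String tr(String msgid) {
        try {
            return BUNDLE.getString(msgid);
        } catch (MissingResourceException e) {
            return msgid;
        }
    }

    // Shows the plain ResourceBundle behaviour: a missing key throws.
    static boolean throwsOnMissing(String key) {
        try {
            BUNDLE.getString(key);
            return false;
        } catch (MissingResourceException e) {
            return true;
        }
    }
}
```

The helper is called @code{tr} in this sketch because a lone underscore is a
reserved keyword from Java 9 on.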
10242 Now, to make use of the API and define a shorthand for @samp{getString},
10243 there are three idioms that you can choose from:
10247 (This one assumes Java 1.5 or newer.)
10248 In a unique class of your project, say @samp{Util}, define a static variable
10249 holding the @code{ResourceBundle} instance and the shorthand:
10252 private static ResourceBundle myResources =
10253 ResourceBundle.getBundle("domain-name");
10254 public static String _(String s) @{
10255 return myResources.getString(s);
10259 All classes containing internationalized strings then contain
10262 import static Util._;
10266 and the shorthand is used like this:
10269 System.out.println(_("Operation completed."));
10273 In a unique class of your project, say @samp{Util}, define a static variable
10274 holding the @code{ResourceBundle} instance:
10277 public static ResourceBundle myResources =
10278 ResourceBundle.getBundle("domain-name");
10281 All classes containing internationalized strings then contain
10284 private static ResourceBundle res = Util.myResources;
10285 private static String _(String s) @{ return res.getString(s); @}
10289 and the shorthand is used like this:
10292 System.out.println(_("Operation completed."));
10296 You add a class with a very short name, say @samp{S}, containing just the
10297 definition of the resource bundle and of the shorthand:
10301 public static ResourceBundle myResources =
10302 ResourceBundle.getBundle("domain-name");
10303 public static String _(String s) @{
10304 return myResources.getString(s);
10310 and the shorthand is used like this:
10313 System.out.println(S._("Operation completed."));
Which of the three idioms you choose depends on whether your project
requires portability to Java versions prior to Java 1.5 and, if so, whether
copying two lines of code into every class is more acceptable in your project
than a class with a single-letter name.
10322 @node C#, gawk, Java, List of Programming Languages
10328 pnet, pnetlib 0.6.2 or newer, or mono 0.29 or newer
10330 @item File extension
10333 @item String syntax
10334 @code{"abc"}, @code{@@"abc"}
10336 @item gettext shorthand
10339 @item gettext/ngettext functions
@code{GettextResourceManager.GetString},
@code{GettextResourceManager.GetPluralString},
@code{GettextResourceManager.GetParticularString},
@code{GettextResourceManager.GetParticularPluralString}
10346 @code{new GettextResourceManager(domain)}
10348 @item bindtextdomain
10349 ---, compiled message catalogs are located in subdirectories of the directory
10350 containing the executable
10358 @item Use or emulate GNU gettext
10359 ---, uses a C# specific message catalog format
10362 @code{xgettext -k_}
10364 @item Formatting with positions
10365 @code{String.Format "@{1@} @{0@}"}
10370 @item po-mode marking
10374 Before marking strings as internationalizable, uses of the string
10375 concatenation operator need to be converted to @code{String.Format}
10376 invocations. For example, @code{"file "+filename+" not found"} becomes
10377 @code{String.Format("file @{0@} not found", filename)}.
Only after this is done can the strings be marked and extracted.
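The reason for this conversion is that the whole sentence must reach the
translator as one unit, with the variable parts as numbered placeholders that
a translation may move or reorder. The following sketch shows the same idea in
Java, whose @code{java.text.MessageFormat} uses the same @code{@{0@}} placeholder
style. It is a Java analogue, not C# code; the class @code{FormatDemo} and
its methods are hypothetical names, and the German pattern in the usage note is
only an illustration.

```java
import java.text.MessageFormat;

class FormatDemo {
    // The complete sentence is a single pattern; the file name is an argument.
    // 'pattern' would normally be the translated form of the msgid.
    static String fileNotFound(String pattern, String filename) {
        return MessageFormat.format(pattern, filename);
    }

    // Positional placeholders allow the arguments to appear in any order.
    static String reorder(String first, String second) {
        return MessageFormat.format("{1} {0}", first, second);
    }
}
```

With the pattern @code{"file @{0@} not found"} a translator is free to produce,
for instance, @code{"@{0@}: Datei nicht gefunden"}, which would not be possible if
the sentence were assembled from concatenated fragments.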
10380 GNU gettext uses the native C#/.NET internationalization mechanism, namely
10381 the classes @code{ResourceManager} and @code{ResourceSet}. Applications
10382 use the @code{ResourceManager} methods to retrieve the native language
10383 translation of strings. An instance of @code{ResourceSet} is the in-memory
10384 representation of a message catalog file. The @code{ResourceManager} loads
10385 and accesses @code{ResourceSet} instances as needed to look up the
10388 There are two formats of @code{ResourceSet}s that can be directly loaded by
10389 the C# runtime: @code{.resources} files and @code{.dll} files.
10393 The @code{.resources} format is a binary file usually generated through the
10394 @code{resgen} or @code{monoresgen} utility, but which doesn't support plural
10395 forms. @code{.resources} files can also be embedded in .NET @code{.exe} files.
10396 This only affects whether a file system access is performed to load the message
10397 catalog; it doesn't affect the contents of the message catalog.
10400 On the other hand, the @code{.dll} format is a binary file that is compiled
10401 from @code{.cs} source code and can support plural forms (provided it is
10402 accessed through the GNU gettext API, see below).
10405 Note that these .NET @code{.dll} and @code{.exe} files are not tied to a
10406 particular platform; their file format and GNU gettext for C# can be used
10409 To convert a PO file to a @code{.resources} file, the @code{msgfmt} program
10410 can be used with the option @samp{--csharp-resources}. To convert a
10411 @code{.resources} file back to a PO file, the @code{msgunfmt} program can be
10412 used with the option @samp{--csharp-resources}. You can also, in some cases,
10413 use the @code{resgen} program (from the @code{pnet} package) or the
10414 @code{monoresgen} program (from the @code{mono}/@code{mcs} package). These
10415 programs can also convert a @code{.resources} file back to a PO file. But
10416 beware: as of this writing (January 2004), the @code{monoresgen} converter is
10417 quite buggy and the @code{resgen} converter ignores the encoding of the PO
10420 To convert a PO file to a @code{.dll} file, the @code{msgfmt} program can be
10421 used with the option @code{--csharp}. The result will be a @code{.dll} file
10422 containing a subclass of @code{GettextResourceSet}, which itself is a subclass
10423 of @code{ResourceSet}. To convert a @code{.dll} file containing a
10424 @code{GettextResourceSet} subclass back to a PO file, the @code{msgunfmt}
10425 program can be used with the option @code{--csharp}.
10427 The advantages of the @code{.dll} format over the @code{.resources} format
10432 Freedom to localize: Users can add their own translations to an application
10433 after it has been built and distributed. Whereas when the programmer uses
10434 a @code{ResourceManager} constructor provided by the system, the set of
10435 @code{.resources} files for an application must be specified when the
10436 application is built and cannot be extended afterwards.
10437 @c If this were the only issue with the @code{.resources} format, one could
10438 @c use the @code{ResourceManager.CreateFileBasedResourceManager} function.
10441 Plural handling: A message catalog in @code{.dll} format supports the plural
10442 handling function @code{GetPluralString}. Whereas @code{.resources} files can
10443 only contain data and only support lookups that depend on a single string.
10446 Context handling: A message catalog in @code{.dll} format supports the
10447 query-with-context functions @code{GetParticularString} and
10448 @code{GetParticularPluralString}. Whereas @code{.resources} files can
10449 only contain data and only support lookups that depend on a single string.
10452 The @code{GettextResourceManager} that loads the message catalogs in
10453 @code{.dll} format also provides for inheritance on a per-message basis.
10454 For example, in Austrian (@code{de_AT}) locale, translations from the German
10455 (@code{de}) message catalog will be used for messages not found in the
10456 Austrian message catalog. This has the consequence that the Austrian
10457 translators need only translate those few messages for which the translation
10458 into Austrian differs from the German one. Whereas when working with
10459 @code{.resources} files, each message catalog must provide the translations
10460 of all messages by itself.
10463 The @code{GettextResourceManager} that loads the message catalogs in
10464 @code{.dll} format also provides for a fallback: The English @var{msgid} is
10465 returned when no translation can be found. Whereas when working with
10466 @code{.resources} files, a language-neutral @code{.resources} file must
10467 explicitly be provided as a fallback.
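The per-message inheritance and the @var{msgid} fallback described in the last
two items can be sketched as a lookup through a chain of catalogs, most
specific locale first. This is only an illustration of the lookup order, written
in Java with plain maps. It is not how @code{GettextResourceManager} is
implemented, and the class @code{FallbackChainDemo}, its methods and the sample
messages are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FallbackChainDemo {
    // Builds a sample chain: Austrian catalog first, then plain German.
    static List<Map<String, String>> demoChain() {
        Map<String, String> deAT = new HashMap<>();
        deAT.put("tomato", "Paradeiser");   // only the differing message

        Map<String, String> de = new HashMap<>();
        de.put("tomato", "Tomate");
        de.put("cucumber", "Gurke");

        return Arrays.asList(deAT, de);
    }

    // Returns the first translation found along the chain,
    // or the msgid itself when no catalog contains it.
    static String lookup(List<Map<String, String>> chain, String msgid) {
        for (Map<String, String> catalog : chain) {
            String translation = catalog.get(msgid);
            if (translation != null) {
                return translation;
            }
        }
        return msgid;
    }
}
```

In this sketch the Austrian catalog holds a single entry, yet every message
that exists in the German catalog is still translated for a @code{de_AT} user.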
On the side of the programmatic APIs, the programmer can use either the
standard @code{ResourceManager} API or the GNU @code{GettextResourceManager}
API. The latter is an extension of the former, because
@code{GettextResourceManager} is a subclass of @code{ResourceManager}.
10477 The @code{System.Resources.ResourceManager} API.
10479 This API works with resources in @code{.resources} format.
10481 The creation of the @code{ResourceManager} is done through
10483 new ResourceManager(domainname, Assembly.GetExecutingAssembly())
10487 The @code{GetString} function returns a string's translation. Note that this
10488 function returns null when a translation is missing (i.e.@: not even found in
10489 the fallback resource file).
10492 The @code{GNU.Gettext.GettextResourceManager} API.
10494 This API works with resources in @code{.dll} format.
10496 Reference documentation is in the
10497 @uref{csharpdoc/index.html,csharpdoc directory}.
10499 The creation of the @code{ResourceManager} is done through
10501 new GettextResourceManager(domainname)
10504 The @code{GetString} function returns a string's translation. Note that when
10505 a translation is missing, the @var{msgid} argument is returned unchanged.
10507 The @code{GetPluralString} function returns a string translation with plural
10508 handling, like the @code{ngettext} function in C.
10510 The @code{GetParticularString} function returns a string's translation,
10511 specific to a particular context, like the @code{pgettext} function in C.
10512 Note that when a translation is missing, the @var{msgid} argument is returned
10515 The @code{GetParticularPluralString} function returns a string translation,
10516 specific to a particular context, with plural handling, like the
10517 @code{npgettext} function in C.
10519 @cindex @code{libintl} for C#
10520 To use this API, one needs the @code{GNU.Gettext.dll} file which is part of
10521 the GNU gettext package and distributed under the LGPL.
10524 You can also mix both approaches: use the
10525 @code{GNU.Gettext.GettextResourceManager} constructor, but otherwise use
10526 only the @code{ResourceManager} type and only the @code{GetString} method.
This is appropriate when you want to profit from the tools for PO files,
but don't want to change existing source code that uses
@code{ResourceManager} and don't (yet) need the @code{GetPluralString} method.
10531 Two examples, using the second API, are available in the @file{examples}
10532 directory: @code{hello-csharp}, @code{hello-csharp-forms}.
10534 Now, to make use of the API and define a shorthand for @samp{GetString},
10535 there are two idioms that you can choose from:
10539 In a unique class of your project, say @samp{Util}, define a static variable
10540 holding the @code{ResourceManager} instance:
10543 public static GettextResourceManager MyResourceManager =
10544 new GettextResourceManager("domain-name");
10547 All classes containing internationalized strings then contain
10550 private static GettextResourceManager Res = Util.MyResourceManager;
10551 private static String _(String s) @{ return Res.GetString(s); @}
10555 and the shorthand is used like this:
10558 Console.WriteLine(_("Operation completed."));
10562 You add a class with a very short name, say @samp{S}, containing just the
10563 definition of the resource manager and of the shorthand:
10567 public static GettextResourceManager MyResourceManager =
10568 new GettextResourceManager("domain-name");
10569 public static String _(String s) @{
10570 return MyResourceManager.GetString(s);
10576 and the shorthand is used like this:
10579 Console.WriteLine(S._("Operation completed."));
Which of the two idioms you choose depends on whether copying two lines
of code into every class is more acceptable in your project than a class
with a single-letter name.
10587 @node gawk, Pascal, C#, List of Programming Languages
10588 @subsection GNU awk
10596 @item File extension
10599 @item String syntax
10602 @item gettext shorthand
10605 @item gettext/ngettext functions
10606 @code{dcgettext}, missing @code{dcngettext} in gawk-3.1.0
10609 @code{TEXTDOMAIN} variable
10611 @item bindtextdomain
10612 @code{bindtextdomain} function
10615 automatic, but missing @code{setlocale (LC_MESSAGES, "")} in gawk-3.1.0
10620 @item Use or emulate GNU gettext
10626 @item Formatting with positions
10627 @code{printf "%2$d %1$d"} (GNU awk only)
10630 On platforms without gettext, no translation. On non-GNU awks, you must
10631 define @code{dcgettext}, @code{dcngettext} and @code{bindtextdomain}
10634 @item po-mode marking
10638 An example is available in the @file{examples} directory: @code{hello-gawk}.
10640 @node Pascal, wxWidgets, gawk, List of Programming Languages
10641 @subsection Pascal - Free Pascal Compiler
10643 @cindex Free Pascal
10644 @cindex Object Pascal
10650 @item File extension
10651 @code{pp}, @code{pas}
10653 @item String syntax
10656 @item gettext shorthand
10659 @item gettext/ngettext functions
10660 ---, use @code{ResourceString} data type instead
10663 ---, use @code{TranslateResourceStrings} function instead
10665 @item bindtextdomain
10666 ---, use @code{TranslateResourceStrings} function instead
10669 automatic, but uses only LANG, not LC_MESSAGES or LC_ALL
10672 @code{@{$mode delphi@}} or @code{@{$mode objfpc@}}@*@code{uses gettext;}
10674 @item Use or emulate GNU gettext
10678 @code{ppc386} followed by @code{xgettext} or @code{rstconv}
10680 @item Formatting with positions
10681 @code{uses sysutils;}@*@code{format "%1:d %0:d"}
10686 @item po-mode marking
10690 The Pascal compiler has special support for the @code{ResourceString} data
10691 type. It generates a @code{.rst} file. This is then converted to a
10692 @code{.pot} file by use of @code{xgettext} or @code{rstconv}. At runtime,
10693 a @code{.mo} file corresponding to translations of this @code{.pot} file
10694 can be loaded using the @code{TranslateResourceStrings} function in the
10695 @code{gettext} unit.
10697 An example is available in the @file{examples} directory: @code{hello-pascal}.
10699 @node wxWidgets, YCP, Pascal, List of Programming Languages
10700 @subsection wxWidgets library
10701 @cindex @code{wxWidgets} library
10707 @item File extension
10710 @item String syntax
10713 @item gettext shorthand
10716 @item gettext/ngettext functions
10717 @code{wxLocale::GetString}, @code{wxGetTranslation}
10720 @code{wxLocale::AddCatalog}
10722 @item bindtextdomain
10723 @code{wxLocale::AddCatalogLookupPathPrefix}
10726 @code{wxLocale::Init}, @code{wxSetLocale}
10729 @code{#include <wx/intl.h>}
10731 @item Use or emulate GNU gettext
10732 emulate, see @code{include/wx/intl.h} and @code{src/common/intl.cpp}
10737 @item Formatting with positions
@code{wxString::Format} supports positions if and only if the system has
the @code{wprintf()} and @code{vswprintf()} functions and they support
positions according to POSIX.
10745 @item po-mode marking
10749 @node YCP, Tcl, wxWidgets, List of Programming Languages
10750 @subsection YCP - YaST2 scripting language
10752 @cindex YaST2 scripting language
10756 libycp, libycp-devel, yast2-core, yast2-core-devel
10758 @item File extension
10761 @item String syntax
10764 @item gettext shorthand
10767 @item gettext/ngettext functions
10768 @code{_()} with 1 or 3 arguments
10771 @code{textdomain} statement
10773 @item bindtextdomain
10782 @item Use or emulate GNU gettext
10788 @item Formatting with positions
10789 @code{sformat "%2 %1"}
10794 @item po-mode marking
10798 An example is available in the @file{examples} directory: @code{hello-ycp}.
10800 @node Tcl, Perl, YCP, List of Programming Languages
10801 @subsection Tcl - Tk's scripting language
10803 @cindex Tk's scripting language
10809 @item File extension
10812 @item String syntax
10815 @item gettext shorthand
10818 @item gettext/ngettext functions
10819 @code{::msgcat::mc}
10824 @item bindtextdomain
10825 ---, use @code{::msgcat::mcload} instead
10828 automatic, uses LANG, but ignores LC_MESSAGES and LC_ALL
10831 @code{package require msgcat}
10832 @*@code{proc _ @{s@} @{return [::msgcat::mc $s]@}}
10834 @item Use or emulate GNU gettext
10835 ---, uses a Tcl specific message catalog format
10838 @code{xgettext -k_}
10840 @item Formatting with positions
10841 @code{format "%2\$d %1\$d"}
10846 @item po-mode marking
10850 Two examples are available in the @file{examples} directory:
10851 @code{hello-tcl}, @code{hello-tcl-tk}.
10853 Before marking strings as internationalizable, substitutions of variables
10854 into the string need to be converted to @code{format} applications. For
10855 example, @code{"file $filename not found"} becomes
10856 @code{[format "file %s not found" $filename]}.
Only after this is done can the strings be marked and extracted.
10858 After marking, this example becomes
10859 @code{[format [_ "file %s not found"] $filename]} or
10860 @code{[msgcat::mc "file %s not found" $filename]}. Note that the
10861 @code{msgcat::mc} function implicitly calls @code{format} when more than one
10864 @node Perl, PHP, Tcl, List of Programming Languages
10872 @item File extension
10873 @code{pl}, @code{PL}, @code{pm}, @code{perl}, @code{cgi}
10875 @item String syntax
10882 @item @code{qq (abc)}
10884 @item @code{q (abc)}
10886 @item @code{qr /abc/}
10888 @item @code{qx (/bin/date)}
10890 @item @code{/pattern match/}
10892 @item @code{?pattern match?}
10894 @item @code{s/substitution/operators/}
10896 @item @code{$tied_hash@{"message"@}}
10898 @item @code{$tied_hash_reference->@{"message"@}}
10900 @item etc., issue the command @samp{man perlsyn} for details
10904 @item gettext shorthand
10905 @code{__} (double underscore)
10907 @item gettext/ngettext functions
10908 @code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext},
10909 @code{dngettext}, @code{dcngettext}
10912 @code{textdomain} function
10914 @item bindtextdomain
10915 @code{bindtextdomain} function
10917 @item bind_textdomain_codeset
10918 @code{bind_textdomain_codeset} function
10921 Use @code{setlocale (LC_ALL, "");}
10925 @*@code{use Locale::TextDomain;} (included in the package libintl-perl
10926 which is available on the Comprehensive Perl Archive Network CPAN,
10927 http://www.cpan.org/).
10929 @item Use or emulate GNU gettext
10930 platform dependent: gettext_pp emulates, gettext_xs uses GNU gettext
10933 @code{xgettext -k__ -k\$__ -k%__ -k__x -k__n:1,2 -k__nx:1,2 -k__xn:1,2 -kN__ -k}
10935 @item Formatting with positions
10936 Both kinds of format strings support formatting with positions.
10937 @*@code{printf "%2\$d %1\$d", ...} (requires Perl 5.8.0 or newer)
10938 @*@code{__expand("[new] replaces [old]", old => $oldvalue, new => $newvalue)}
10941 The @code{libintl-perl} package is platform independent but is not
10942 part of the Perl core. The programmer is responsible for
10943 providing a dummy implementation of the required functions if the
10944 package is not installed on the target system.
10946 @item po-mode marking
10949 @item Documentation
10950 Included in @code{libintl-perl}, available on CPAN
10951 (http://www.cpan.org/).
10955 An example is available in the @file{examples} directory: @code{hello-perl}.
10957 @cindex marking Perl sources
10959 The @code{xgettext} parser backend for Perl differs significantly from
10960 the parser backends for other programming languages, just as Perl
10961 itself differs significantly from other programming languages. The
Perl parser backend offers many more string marking facilities than
the other backends, but it also has some Perl-specific limitations, the
worst probably being that it is imperfect.
10967 * General Problems:: General Problems Parsing Perl Code
10968 * Default Keywords:: Which Keywords Will xgettext Look For?
10969 * Special Keywords:: How to Extract Hash Keys
10970 * Quote-like Expressions:: What are Strings And Quote-like Expressions?
10971 * Interpolation I:: Invalid String Interpolation
10972 * Interpolation II:: Valid String Interpolation
10973 * Parentheses:: When To Use Parentheses
10974 * Long Lines:: How To Grok with Long Lines
10975 * Perl Pitfalls:: Bugs, Pitfalls, and Things That Do Not Work
10978 @node General Problems, Default Keywords, , Perl
10979 @subsubsection General Problems Parsing Perl Code
10981 It is often heard that only Perl can parse Perl. This is not true.
10982 Perl cannot be @emph{parsed} at all, it can only be @emph{executed}.
10983 Perl has various built-in ambiguities that can only be resolved at runtime.
10985 The following example may illustrate one common problem:
10988 print gettext "Hello World!";
10991 Although this example looks like a bullet-proof case of a function
10992 invocation, it is not:
10995 open gettext, ">testfile" or die;
10996 print gettext "Hello world!"
10999 In this context, the string @code{gettext} looks more like a
11000 file handle. But not necessarily:
11003 use Locale::Messages qw (:libintl_h);
11004 open gettext ">testfile" or die;
11005 print gettext "Hello world!";
11008 Now, the file is probably syntactically incorrect, provided that the module
11009 @code{Locale::Messages} found first in the Perl include path exports a
11010 function @code{gettext}. But what if the module
11011 @code{Locale::Messages} really looks like this?
11014 use vars qw (*gettext);
11019 In this case, the string @code{gettext} will be interpreted as a file
11020 handle again, and the above example will create a file @file{testfile}
11021 and write the string ``Hello world!'' into it. Even advanced
11022 control flow analysis will not really help:
11030 print gettext "Hello world!";
11033 If the module @code{Sane} exports a function @code{gettext} that does
11034 what we expect, and the module @code{InSane} opens a file for writing
11035 and associates the @emph{handle} @code{gettext} with this output
11036 stream, we are clueless again about what will happen at runtime. It is
11037 completely unpredictable. The truth is that Perl has so many ways to
11038 fill its symbol table at runtime that it is impossible to interpret a
11039 particular piece of code without executing it.
11041 Of course, @code{xgettext} will not execute your Perl sources while
11042 scanning for translatable strings, but rather use heuristics in order
11043 to guess what you meant.
11045 Another problem is the ambiguity of the slash and the question mark.
11046 Their interpretation depends on the context:
11050 print "OK\n" if /foobar/;
11055 # Another pattern match.
11056 print "OK\n" if ?foobar?;
11059 print $x ? "foo" : "bar";
11062 The slash may either act as the division operator or introduce a
11063 pattern match, whereas the question mark may act as the ternary
11064 conditional operator or as a pattern match, too. Other programming
11065 languages like @code{awk} present similar problems, but the consequences of a
11066 misinterpretation are particularly nasty with Perl sources. In @code{awk}
11067 for instance, a statement can never exceed one line and the parser
11068 can recover from a parsing error at the next newline and interpret
11069 the rest of the input stream correctly. Perl is different, as a
11070 pattern match is terminated by the next appearance of the delimiter
11071 (the slash or the question mark) in the input stream, regardless of
the semantic context. If a slash is really a division sign but is
misinterpreted as a pattern match, the rest of the input file is most
probably parsed incorrectly.
There are certain cases where the ambiguity cannot be resolved at all:
11079 $x = wantarray ? 1 : 0;
11082 The Perl built-in function @code{wantarray} does not accept any arguments.
11083 The Perl parser therefore knows that the question mark does not start
11084 a regular expression but is the ternary conditional operator.
11087 sub wantarrays @{@}
11088 $x = wantarrays ? 1 : 0;
11091 Now the situation is different. The function @code{wantarrays} takes
11092 a variable number of arguments (like any non-prototyped Perl function).
11093 The question mark is now the delimiter of a pattern match, and hence
11094 the piece of code does not compile.
11097 sub wantarrays() @{@}
11098 $x = wantarrays ? 1 : 0;
Now that the function is prototyped, Perl knows that it does not accept any
arguments, and the question mark is therefore interpreted as the
ternary operator again. But that unfortunately outsmarts @code{xgettext}.
11105 The Perl parser in @code{xgettext} cannot know whether a function has
11106 a prototype and what that prototype would look like. It therefore makes
11107 an educated guess. If a function is known to be a Perl built-in and
11108 this function does not accept any arguments, a following question mark
11109 or slash is treated as an operator, otherwise as the delimiter of a
11110 following regular expression. The Perl built-ins that do not accept
11111 arguments are @code{wantarray}, @code{fork}, @code{time}, @code{times},
11112 @code{getlogin}, @code{getppid}, @code{getpwent}, @code{getgrent},
11113 @code{gethostent}, @code{getnetent}, @code{getprotoent}, @code{getservent},
11114 @code{setpwent}, @code{setgrent}, @code{endpwent}, @code{endgrent},
11115 @code{endhostent}, @code{endnetent}, @code{endprotoent}, and
11118 If you find that @code{xgettext} fails to extract strings from
11119 portions of your sources, you should therefore look out for slashes
11120 and/or question marks preceding these sections. You may have come
11121 across a bug in @code{xgettext}'s Perl parser (and of course you
should report that bug). In the meantime you should consider
reformulating your code in a manner less challenging to @code{xgettext}.
11125 In particular, if the parser is too dumb to see that a function
11126 does not accept arguments, use parentheses:
11129 $x = somefunc() ? 1 : 0;
11130 $y = (somefunc) ? 1 : 0;
11133 In fact the Perl parser itself has similar problems and warns you
11134 about such constructs.
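The educated guess described above, for the case where a slash or question
mark follows a bare word, can be sketched as a simple set lookup. The sketch
below is written in Java and is not taken from the @code{xgettext} sources.
The class @code{SlashHeuristicDemo}, the enum @code{Meaning} and the method
@code{classify} are hypothetical names, the set contains only the built-ins
named in the text of this section, and other contexts (for example a slash
after a variable or a closing parenthesis) are deliberately left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SlashHeuristicDemo {
    // Perl built-ins without arguments, as named in the surrounding text.
    static final Set<String> NO_ARG_BUILTINS = new HashSet<>(Arrays.asList(
        "wantarray", "fork", "time", "times", "getlogin", "getppid",
        "getpwent", "getgrent", "gethostent", "getnetent", "getprotoent",
        "getservent", "setpwent", "setgrent", "endpwent", "endgrent",
        "endhostent", "endnetent", "endprotoent"));

    enum Meaning { OPERATOR, PATTERN_DELIMITER }

    // Guess what a '/' or '?' means when it follows the given bare word.
    static Meaning classify(String precedingWord) {
        return NO_ARG_BUILTINS.contains(precedingWord)
            ? Meaning.OPERATOR            // e.g. wantarray ? 1 : 0
            : Meaning.PATTERN_DELIMITER;  // e.g. somefunc ?pattern?
    }
}
```

A user-defined function such as @code{wantarrays} is not in the set, so the
guess is ``pattern delimiter'' even when the function has an empty prototype;
this is why the explicit parentheses recommended above help.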
11136 @node Default Keywords, Special Keywords, General Problems, Perl
11137 @subsubsection Which keywords will xgettext look for?
11138 @cindex Perl default keywords
11140 Unless you instruct @code{xgettext} otherwise by invoking it with one
11141 of the options @code{--keyword} or @code{-k}, it will recognize the
11142 following keywords in your Perl sources:
11146 @item @code{gettext}
11148 @item @code{dgettext}
11150 @item @code{dcgettext}
11152 @item @code{ngettext:1,2}
11154 The first (singular) and the second (plural) argument will be
11157 @item @code{dngettext:1,2}
11159 The first (singular) and the second (plural) argument will be
11162 @item @code{dcngettext:1,2}
11164 The first (singular) and the second (plural) argument will be
11167 @item @code{gettext_noop}
11169 @item @code{%gettext}
11171 The keys of lookups into the hash @code{%gettext} will be extracted.
11173 @item @code{$gettext}
11175 The keys of lookups into the hash reference @code{$gettext} will be extracted.
11179 @node Special Keywords, Quote-like Expressions, Default Keywords, Perl
11180 @subsubsection How to Extract Hash Keys
11181 @cindex Perl special keywords for hash-lookups
11183 Translating messages at runtime is normally performed by looking up the
11184 original string in the translation database and returning the
11185 translated version. The ``natural'' Perl implementation is a hash
11186 lookup, and, of course, @code{xgettext} supports such practice.
11189 print __"Hello world!";
11190 print $__@{"Hello world!"@};
11191 print $__->@{"Hello world!"@};
11192 print $$__@{"Hello world!"@};
11195 The above four lines all do the same thing. The Perl module
11196 @code{Locale::TextDomain} exports by default a hash @code{%__} that
11197 is tied to the function @code{__()}. It also exports a reference
11198 @code{$__} to @code{%__}.
If an argument to the @code{xgettext} option @code{--keyword} or
@code{-k} starts with a percent sign, the rest of the keyword is
11202 interpreted as the name of a hash. If it starts with a dollar
11203 sign, the rest of the keyword is interpreted as a reference to a
11206 Note that you can omit the quotation marks (single or double) around
11207 the hash key (almost) whenever Perl itself allows it:
11210 print $gettext@{Error@};
The exact rule is: You can omit the surrounding quotes when the hash
11214 key is a valid C (!) identifier, i.e.@: when it starts with an
11215 underscore or an ASCII letter and is followed by an arbitrary number
11216 of underscores, ASCII letters or digits. Other Unicode characters
11217 are @emph{not} allowed, regardless of the @code{use utf8} pragma.
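The rule amounts to a match against the usual C identifier pattern, restricted
to ASCII. A minimal sketch of such a check, written in Java, might look as
follows; the class @code{BareHashKeyDemo} and the method @code{mayOmitQuotes}
are hypothetical names and are not part of @code{xgettext}.

```java
import java.util.regex.Pattern;

class BareHashKeyDemo {
    // ASCII letter or underscore, followed by ASCII letters, digits, underscores.
    private static final Pattern C_IDENTIFIER =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // True when the hash key could be written without surrounding quotes.
    static boolean mayOmitQuotes(String key) {
        return C_IDENTIFIER.matcher(key).matches();
    }
}
```

Under this check @code{Error} qualifies, whereas @code{Hello world!} and any key
containing a non-ASCII letter do not.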
11219 @node Quote-like Expressions, Interpolation I, Special Keywords, Perl
11220 @subsubsection What are Strings And Quote-like Expressions?
11221 @cindex Perl quote-like expressions
11223 Perl offers a plethora of different string constructs. Those that can
11224 be used either as arguments to functions or inside braces for hash
11225 lookups are generally supported by @code{xgettext}.
11228 @item @strong{double-quoted strings}
11231 print gettext "Hello World!";
11234 @item @strong{single-quoted strings}
11237 print gettext 'Hello World!';
11240 @item @strong{the operator qq}
11243 print gettext qq |Hello World!|;
11244 print gettext qq <E-mail: <guido\@@imperia.net>>;
11247 The operator @code{qq} is fully supported. You can use arbitrary
11248 delimiters, including the four bracketing delimiters (round, angle,
11249 square, curly) that nest.
11251 @item @strong{the operator q}
11254 print gettext q |Hello World!|;
11255 print gettext q <E-mail: <guido@@imperia.net>>;
11258 The operator @code{q} is fully supported. You can use arbitrary
11259 delimiters, including the four bracketing delimiters (round, angle,
11260 square, curly) that nest.
11262 @item @strong{the operator qx}
11265 print gettext qx ;LANGUAGE=C /bin/date;
11266 print gettext qx [/usr/bin/ls | grep '^[A-Z]*'];
11269 The operator @code{qx} is fully supported. You can use arbitrary
11270 delimiters, including the four bracketing delimiters (round, angle,
11271 square, curly) that nest.
11273 The example is actually a useless use of @code{gettext}. It will
11274 invoke the @code{gettext} function on the output of the command
11275 specified with the @code{qx} operator. The feature was included
11276 in order to make the interface consistent (the parser will extract
11277 all strings and quote-like expressions).
11279 @item @strong{here documents}
11283 print gettext <<'EOF';
11284 program not found in $PATH
11287 print ngettext <<EOF, <<"EOF";
11290 several files deleted
11295 Here-documents are recognized. If the delimiter is enclosed in single
11296 quotes, the string is not interpolated. If it is enclosed in double
11297 quotes or has no quotes at all, the string is interpolated.
11299 Delimiters that start with a digit are not supported!
11303 @node Interpolation I, Interpolation II, Quote-like Expressions, Perl
11304 @subsubsection Invalid Uses Of String Interpolation
11305 @cindex Perl invalid string interpolation
11307 Perl is capable of interpolating variables into strings. This offers
11308 some nice features in localized programs but can also lead to
11311 A common error is a construct like the following:
11314 print gettext "This is the program $0!\n";
11317 Perl will interpolate at runtime the value of the variable @code{$0}
11318 into the argument of the @code{gettext()} function. Hence, this
11319 argument is not a string constant but a variable argument (@code{$0}
11320 is a global variable that holds the name of the Perl script being
11321 executed). The interpolation is performed by Perl before the string
11322 argument is passed to @code{gettext()} and will therefore depend on
11323 the name of the script which can only be determined at runtime.
Consequently, a translation can almost never be looked up at runtime
(unless, by accident, the interpolated string is found in the message
catalog).
11328 The @code{xgettext} program will therefore terminate parsing with a fatal
11329 error if it encounters a variable inside of an extracted string. In
11330 general, this will happen for all kinds of string interpolations that
11331 cannot be safely performed at compile time. If you absolutely know
11332 what you are doing, you can always circumvent this behavior:
11335 my $know_what_i_am_doing = "This is program $0!\n";
11336 print gettext $know_what_i_am_doing;
11339 Since the parser only recognizes strings and quote-like expressions,
11340 but not variables or other terms, the above construct will be
11341 accepted. You will have to find another way, however, to let your
11342 original string make it into your message catalog.
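
One way to keep the message constant is to move the variable part out of the string and fill it in after the lookup. The following is only a minimal sketch: the @code{gettext} function defined here is a hypothetical stand-in that returns its argument unchanged, so that the snippet runs without a message catalog, and the variable name @code{$message} is chosen purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a real gettext() binding: it returns the
# msgid unchanged, so the sketch runs without any message catalog.
sub gettext { return $_[0] }

# The msgid is a string constant (no Perl variable inside it), so it can
# be extracted as is.  The script name is inserted only after the lookup.
my $message = sprintf (gettext ("This is the program %s!\n"), $0);
print $message;
```

With a real binding, the translator sees the constant string with its @code{%s} directive, and the script name never becomes part of the lookup key.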
If invoked with the option @code{--extract-all} (short form @code{-a}),
11345 variable interpolation will be accepted. Rationale: You will
11346 generally use this option in order to prepare your sources for
11347 internationalization.
11349 Please see the manual page @samp{man perlop} for details of strings and
11350 quote-like expressions that are subject to interpolation and those
11351 that are not. Safe interpolations (that will not lead to a fatal
11356 @item the escape sequences @code{\t} (tab, HT, TAB), @code{\n}
11357 (newline, NL), @code{\r} (return, CR), @code{\f} (form feed, FF),
11358 @code{\b} (backspace, BS), @code{\a} (alarm, bell, BEL), and @code{\e}
11361 @item octal chars, like @code{\033}
11363 Note that octal escapes in the range of 400-777 are translated into a
11364 UTF-8 representation, regardless of the presence of the @code{use utf8} pragma.
11366 @item hex chars, like @code{\x1b}
11368 @item wide hex chars, like @code{\x@{263a@}}
11370 Note that this escape is translated into a UTF-8 representation,
11371 regardless of the presence of the @code{use utf8} pragma.
11373 @item control chars, like @code{\c[} (CTRL-[)
11375 @item named Unicode chars, like @code{\N@{LATIN CAPITAL LETTER C WITH CEDILLA@}}
11377 Note that this escape is translated into a UTF-8 representation,
11378 regardless of the presence of the @code{use utf8} pragma.
11381 The following escapes are considered partially safe:
11385 @item @code{\l} lowercase next char
11387 @item @code{\u} uppercase next char
11389 @item @code{\L} lowercase till \E
11391 @item @code{\U} uppercase till \E
11393 @item @code{\E} end case modification
11395 @item @code{\Q} quote non-word characters till \E
11399 These escapes are only considered safe if the string consists of
11400 ASCII characters only. Translation of characters outside the range
11401 defined by ASCII is locale-dependent and can actually only be performed
11402 at runtime; @code{xgettext} doesn't do these locale-dependent translations
11403 at extraction time.
11405 Except for the modifier @code{\Q}, these translations, albeit valid,
11406 are generally useless and only obfuscate your sources. If a
11407 translation can be safely performed at compile time you can just as
11408 well write what you mean.
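
As a small illustration of the last point, the following sketch compares a string written with case-modifying escapes to the plain string it evaluates to. It assumes ASCII-only text; the variable names are hypothetical.

```perl
use strict;
use warnings;

# Case-modifying escapes applied to ASCII-only text ...
my $obfuscated = "\uhello \LWORLD\E!";

# ... yield exactly what could have been written directly.
my $plain = "Hello world!";

print $obfuscated eq $plain ? "identical\n" : "different\n";
```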
11410 @node Interpolation II, Parentheses, Interpolation I, Perl
11411 @subsubsection Valid Uses Of String Interpolation
11412 @cindex Perl valid string interpolation
11414 Perl is often used to generate sources for other programming languages
or arbitrary file formats. Web applications that output HTML code
are a prominent example of such usage.
11418 You will often come across situations where you want to intersperse
11419 code written in the target (programming) language with translatable
11420 messages, like in the following HTML example:
11423 print gettext <<EOF;
11424 <h1>My Homepage</h1>
11425 <script language="JavaScript"><!--
11426 for (i = 0; i < 100; ++i) @{
11427 alert ("Thank you so much for visiting my homepage!");
11433 The parser will extract the entire here document, and it will appear
11434 entirely in the resulting PO file, including the JavaScript snippet
embedded in the HTML code. If you overuse constructs like
the above, you run the risk that the translators of your package
will look for a less challenging project. You should consider an
alternative expression here:
11442 <h1>$gettext@{"My Homepage"@}</h1>
11443 <script language="JavaScript"><!--
11444 for (i = 0; i < 100; ++i) @{
11445 alert ("$gettext@{'Thank you so much for visiting my homepage!'@}");
11451 Only the translatable portions of the code will be extracted here, and
the resulting PO file will improve noticeably in terms of readability.
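
To see why a hash lookup can trigger a translation at runtime, consider the following sketch of a tied hash. It is an illustration under stated assumptions, not the implementation used by any particular binding: the class name @code{TranslatingHash}, the in-memory @code{%catalog} and its German entry are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical minimal tied-hash class: every read access is forwarded
# to a lookup callback, with the hash key serving as the msgid.
package TranslatingHash;

sub TIEHASH {
    my ($class, $lookup) = @_;
    return bless { lookup => $lookup }, $class;
}

sub FETCH {
    my ($self, $msgid) = @_;
    return $self->{lookup}->($msgid);
}

package main;

# Stand-in for a message catalog; unknown msgids are returned unchanged.
my %catalog = ("My Homepage" => "Meine Homepage");
tie my %gettext, 'TranslatingHash',
    sub { exists $catalog{$_[0]} ? $catalog{$_[0]} : $_[0] };
my $gettext = \%gettext;

my $direct  = "<h1>$gettext{'My Homepage'}</h1>\n";    # hash lookup
my $via_ref = "<h1>$gettext->{'My Homepage'}</h1>\n";  # hash reference
my $missing = "<p>$gettext{'No such entry'}</p>\n";    # falls back to msgid
print $direct, $via_ref, $missing;
```

Because the key is an ordinary string constant, the lookup happens when Perl interpolates the surrounding string, while the msgid itself stays fixed in the source.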
11454 You can interpolate hash lookups in all strings or quote-like
11455 expressions that are subject to interpolation (see the manual page
11456 @samp{man perlop} for details). Double interpolation is invalid, however:
11459 # TRANSLATORS: Replace "the earth" with the name of your planet.
11460 print gettext qq@{Welcome to $gettext->@{"the earth"@}@};
The @code{qq}-quoted string is recognized as an argument to @code{gettext} in
11464 the first place, and checked for invalid variable interpolation. The
11465 dollar sign of hash-dereferencing will therefore terminate the parser
11466 with an ``invalid interpolation'' error.
11468 It is valid to interpolate hash lookups in regular expressions:
11471 if ($var =~ /$gettext@{"the earth"@}/) @{
11472 print gettext "Match!\n";
11474 s/$gettext@{"U. S. A."@}/$gettext@{"U. S. A."@} $gettext@{"(dial +0)"@}/g;
11477 @node Parentheses, Long Lines, Interpolation II, Perl
11478 @subsubsection When To Use Parentheses
11479 @cindex Perl parentheses
11481 In Perl, parentheses around function arguments are mostly optional.
11482 @code{xgettext} will always assume that all
11483 recognized keywords (except for hashes and hash references) are names
11484 of properly prototyped functions, and will (hopefully) only require
11485 parentheses where Perl itself requires them. All constructs in the
11486 following example are therefore ok to use:
11490 print gettext ("Hello World!\n");
11491 print gettext "Hello World!\n";
11492 print dgettext ($package => "Hello World!\n");
11493 print dgettext $package, "Hello World!\n";
11495 # The "fat comma" => turns the left-hand side argument into a
11496 # single-quoted string!
11497 print dgettext smellovision => "Hello World!\n";
11499 # The following assignment only works with prototyped functions.
11500 # Otherwise, the functions will act as "greedy" list operators and
11501 # eat up all following arguments.
11502 my $anonymous_hash = @{
11503 planet => gettext "earth",
11504 cakes => ngettext "one cake", "several cakes", $n,
11507 # The same without fat comma:
11508 my $other_hash = @{
11509 'planet', gettext "earth",
11510 'cakes', ngettext "one cake", "several cakes", $n,
11514 # Parentheses are only significant for the first argument.
11515 print dngettext 'package', ("one cake", "several cakes", $n), $discarded;
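
The effect of a prototype on argument parsing can be observed with plain Perl. In the following sketch, @code{unary_gettext} and @code{greedy_gettext} are hypothetical stand-ins that simply return their first argument; only the first one carries a @code{($)} prototype.

```perl
use strict;
use warnings;

# Hypothetical stand-ins; both return their first argument unchanged.
sub unary_gettext ($) { return $_[0] }   # prototyped: takes one argument
sub greedy_gettext    { return $_[0] }   # no prototype: list operator

# With the prototype, only "earth" is consumed by the call.
my @prototyped   = ('planet', unary_gettext "earth", 'moons', 1);

# Without it, the call swallows "earth", 'moons' and 1.
my @unprototyped = ('planet', greedy_gettext "earth", 'moons', 1);

printf "%d elements with prototype, %d without\n",
       scalar @prototyped, scalar @unprototyped;
```

The first list keeps all four elements, whereas the second collapses to two, which is why the hash assignments above depend on prototyped functions.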
11519 @node Long Lines, Perl Pitfalls, Parentheses, Perl
@subsubsection How To Cope with Long Lines
11521 @cindex Perl long lines
11523 The necessity of long messages can often lead to a cumbersome or
11524 unreadable coding style. Perl has several options that may prevent
11525 you from writing unreadable code, and
11526 @code{xgettext} does its best to do likewise. This is where the dot
11527 operator (the string concatenation operator) may come in handy:
11531 print gettext ("This is a very long"
11532 . " message that is still"
11533 . " readable, because"
11534 . " it is split into"
11535 . " multiple lines.\n");
11539 Perl is smart enough to concatenate these constant string fragments
11540 into one long string at compile time, and so is
11541 @code{xgettext}. You will only find one long message in the resulting
Note that Perl 6 (since renamed Raku) uses the tilde
(@samp{~}) as the string concatenation operator, and the dot
(@samp{.}) for method calls. This new syntax is not yet supported by
11549 If embedded newline characters are not an issue, or even desired, you
11550 may also insert newline characters inside quoted strings wherever you
11555 print gettext ("<em>In HTML output
11556 embedded newlines are generally no
11557 problem, since adjacent whitespace
11558 is always rendered into a single
11559 space character.</em>");
You may also consider using here documents:
11567 print gettext <<EOF;
11569 embedded newlines are generally no
11570 problem, since adjacent whitespace
11571 is always rendered into a single
11572 space character.</em>
11577 Please do not forget that the line breaks are real, i.e.@: they
11578 translate into newline characters that will consequently show up in
11579 the resulting POT file.
11581 @node Perl Pitfalls, , Long Lines, Perl
11582 @subsubsection Bugs, Pitfalls, And Things That Do Not Work
11583 @cindex Perl pitfalls
11585 The foregoing sections should have proven that
11586 @code{xgettext} is quite smart in extracting translatable strings from
11587 Perl sources. Yet, some more or less exotic constructs that could be
11588 expected to work, actually do not work.
11590 One of the more relevant limitations can be found in the
11591 implementation of variable interpolation inside quoted strings. Only
11592 simple hash lookups can be used there:
11596 $gettext@{"The dot operator"
11599 Likewise, you cannot @@@{[ gettext ("interpolate function calls") ]@}
11600 inside quoted strings or quote-like expressions.
11604 This is valid Perl code and will actually trigger invocations of the
11605 @code{gettext} function at runtime. Yet, the Perl parser in
11606 @code{xgettext} will fail to recognize the strings. A less obvious
11607 example can be found in the interpolation of regular expressions:
11610 s/<!--START_OF_WEEK-->/gettext ("Sunday")/e;
11613 The modifier @code{e} will cause the substitution to be interpreted as
11614 an evaluable statement. Consequently, at runtime the function
11615 @code{gettext()} is called, but again, the parser fails to extract the
11616 string ``Sunday''. Use a temporary variable as a simple workaround if
11617 you really happen to need this feature:
11620 my $sunday = gettext "Sunday";
11621 s/<!--START_OF_WEEK-->/$sunday/;
11624 Hash slices would also be handy but are not recognized:
11627 my @@weekdays = @@gettext@{'Sunday', 'Monday', 'Tuesday', 'Wednesday',
11628 'Thursday', 'Friday', 'Saturday'@};
11630 @@weekdays = @@gettext@{qw (Sunday Monday Tuesday Wednesday Thursday
11631 Friday Saturday) @};
11634 This is perfectly valid usage of the tied hash @code{%gettext} but the
11635 strings are not recognized and therefore will not be extracted.
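
A more verbose but straightforward alternative is to call the function once per string, so that every msgid appears as a literal argument of a keyword. The sketch below assumes a hypothetical stand-in for @code{gettext} that returns its argument unchanged.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a real gettext() binding.
sub gettext { return $_[0] }

# One call per string: each msgid is a string literal passed directly
# to the keyword, instead of being hidden inside a hash slice.
my @weekdays = (
    gettext ("Sunday"),    gettext ("Monday"),   gettext ("Tuesday"),
    gettext ("Wednesday"), gettext ("Thursday"), gettext ("Friday"),
    gettext ("Saturday"),
);

print scalar (@weekdays), " weekday names\n";
```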
11637 Another caveat of the current version is its rudimentary support for
11638 non-ASCII characters in identifiers. You may encounter serious
11639 problems if you use identifiers with characters outside the range of
11640 'A'-'Z', 'a'-'z', '0'-'9' and the underscore '_'.
11642 Maybe some of these missing features will be implemented in future
11643 versions, but since you can always make do without them at minimal effort,
11644 these todos have very low priority.
Brace format strings that already contain braces as part of the
normal text are a nasty problem, for example the usage strings typically
encountered in programs:
11651 die "usage: $0 @{OPTIONS@} FILENAME...\n";
11654 If you want to internationalize this code with Perl brace format strings,
11655 you will run into a problem:
11658 die __x ("usage: @{program@} @{OPTIONS@} FILENAME...\n", program => $0);
11661 Whereas @samp{@{program@}} is a placeholder, @samp{@{OPTIONS@}}
11662 is not and should probably be translated. Yet, there is no way to teach
11663 the Perl parser in @code{xgettext} to recognize the first one, and leave
11664 the other one alone.
11666 There are two possible work-arounds for this problem. If you are
11667 sure that your program will run under Perl 5.8.0 or newer (these
11668 Perl versions handle positional parameters in @code{printf()}) or
11669 if you are sure that the translator will not have to reorder the arguments
11670 in her translation -- for example if you have only one brace placeholder
11671 in your string, or if it describes a syntax, like in this one --, you can
11672 mark the string as @code{no-perl-brace-format} and use @code{printf()}:
11675 # xgettext: no-perl-brace-format
11676 die sprintf ("usage: %s @{OPTIONS@} FILENAME...\n", $0);
If you want to use the more portable Perl brace format, you will have to
put placeholders in place of the literal braces:
11683 die __x ("usage: @{program@} @{[@}OPTIONS@{]@} FILENAME...\n",
11684 program => $0, '[' => '@{', ']' => '@}');
Perl brace format strings have no escaping mechanism. Whatever such a
mechanism looked like, it would either give the programmer a
11689 hard time, make translating Perl brace format strings heavy-going, or
11690 result in a performance penalty at runtime, when the format directives
11691 get executed. Most of the time you will happily get along with
11692 @code{printf()} for this special case.
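
To make the placeholder trick for literal braces more concrete, here is a simplified sketch of a brace expander. The function @code{expand_braces} and the program name @code{myprog} are hypothetical, and the real expansion code of a brace format implementation may well differ; the sketch only shows why @samp{@{[@}} and @samp{@{]@}} can stand in for literal braces.

```perl
use strict;
use warnings;

# Hypothetical, simplified brace expander: each {name} is replaced by
# the value supplied for "name"; unknown placeholders are left alone.
sub expand_braces {
    my ($format, %args) = @_;
    $format =~ s/\{([^{}]+)\}/exists $args{$1} ? $args{$1} : '{' . $1 . '}'/ge;
    return $format;
}

# The placeholders named "[" and "]" expand to the literal braces.
my $usage = expand_braces ("usage: {program} {[}OPTIONS{]} FILENAME...\n",
                           program => 'myprog', '[' => '{', ']' => '}');
print $usage;
```

The substitution runs in a single pass, so the braces produced for @samp{@{[@}} and @samp{@{]@}} are not scanned again as placeholders.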
11694 @node PHP, Pike, Perl, List of Programming Languages
11695 @subsection PHP Hypertext Preprocessor
11700 mod_php4, mod_php4-core, phpdoc
11702 @item File extension
11703 @code{php}, @code{php3}, @code{php4}
11705 @item String syntax
11706 @code{"abc"}, @code{'abc'}
11708 @item gettext shorthand
11711 @item gettext/ngettext functions
11712 @code{gettext}, @code{dgettext}, @code{dcgettext}; starting with PHP 4.2.0
11713 also @code{ngettext}, @code{dngettext}, @code{dcngettext}
11716 @code{textdomain} function
11718 @item bindtextdomain
11719 @code{bindtextdomain} function
11722 Programmer must call @code{setlocale (LC_ALL, "")}
11727 @item Use or emulate GNU gettext
11733 @item Formatting with positions
11734 @code{printf "%2\$d %1\$d"}
11737 On platforms without gettext, the functions are not available.
11739 @item po-mode marking
11743 An example is available in the @file{examples} directory: @code{hello-php}.
11745 @node Pike, GCC-source, PHP, List of Programming Languages
11753 @item File extension
11756 @item String syntax
11759 @item gettext shorthand
11762 @item gettext/ngettext functions
11763 @code{gettext}, @code{dgettext}, @code{dcgettext}
11766 @code{textdomain} function
11768 @item bindtextdomain
11769 @code{bindtextdomain} function
11772 @code{setlocale} function
11775 @code{import Locale.Gettext;}
11777 @item Use or emulate GNU gettext
11783 @item Formatting with positions
11787 On platforms without gettext, the functions are not available.
11789 @item po-mode marking
11793 @node GCC-source, Lua, Pike, List of Programming Languages
11794 @subsection GNU Compiler Collection sources
11801 @item File extension
11802 @code{c}, @code{h}.
11804 @item String syntax
11807 @item gettext shorthand
11810 @item gettext/ngettext functions
11811 @code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext},
11812 @code{dngettext}, @code{dcngettext}
11815 @code{textdomain} function
11817 @item bindtextdomain
11818 @code{bindtextdomain} function
11821 Programmer must call @code{setlocale (LC_ALL, "")}
11824 @code{#include "intl.h"}
11826 @item Use or emulate GNU gettext
11830 @code{xgettext -k_}
11832 @item Formatting with positions
11836 Uses autoconf macros
11838 @item po-mode marking
11842 @node Lua, JavaScript, GCC-source, List of Programming Languages
11849 @item File extension
11852 @item String syntax
11859 @item @code{[[abc]]}
11861 @item @code{[=[abc]=]}
11863 @item @code{[==[abc]==]}
11869 @item gettext shorthand
11872 @item gettext/ngettext functions
11873 @code{gettext.gettext}, @code{gettext.dgettext}, @code{gettext.dcgettext},
11874 @code{gettext.ngettext}, @code{gettext.dngettext}, @code{gettext.dcngettext}
11877 @code{textdomain} function
11879 @item bindtextdomain
11880 @code{bindtextdomain} function
11886 @code{require 'gettext'} or running lua interpreter with @code{-l gettext} option
11888 @item Use or emulate GNU gettext
11894 @item Formatting with positions
11898 On platforms without gettext, the functions are not available.
11900 @item po-mode marking
11905 @subsection JavaScript
11911 @item File extension
11914 @item String syntax
11923 @item gettext shorthand
11926 @item gettext/ngettext functions
11927 @code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext},
11931 @code{textdomain} function
11933 @item bindtextdomain
11934 @code{bindtextdomain} function
11942 @item Use or emulate GNU gettext
11948 @item Formatting with positions
11952 On platforms without gettext, the functions are not available.
11954 @item po-mode marking
11958 @c This is the template for new languages.
11967 @item File extension
11969 @item String syntax
11971 @item gettext shorthand
11973 @item gettext/ngettext functions
11977 @item bindtextdomain
11983 @item Use or emulate GNU gettext
11987 @item Formatting with positions
11991 @item po-mode marking
11996 @node List of Data Formats, , List of Programming Languages, Programming Languages
11997 @section Internationalizable Data
11999 Here is a list of other data formats which can be internationalized
12003 * POT:: POT - Portable Object Template
12004 * RST:: Resource String Table
12005 * Glade:: Glade - GNOME user interface description
12008 @node POT, RST, List of Data Formats, List of Data Formats
12009 @subsection POT - Portable Object Template
12015 @item File extension
12016 @code{pot}, @code{po}
12022 @node RST, Glade, POT, List of Data Formats
12023 @subsection Resource String Table
12030 @item File extension
12034 @code{xgettext}, @code{rstconv}
12037 @node Glade, , RST, List of Data Formats
12038 @subsection Glade - GNOME user interface description
12042 glade, libglade, glade2, libglade2, intltool
12044 @item File extension
12045 @code{glade}, @code{glade2}, @code{ui}
12048 @code{xgettext}, @code{libglade-xgettext}, @code{xml-i18n-extract}, @code{intltool-extract}
12051 @c This is the template for new data formats.
12060 @item File extension
12067 @node Conclusion, Language Codes, Programming Languages, Top
12068 @chapter Concluding Remarks
12070 We would like to conclude this GNU @code{gettext} manual by presenting
a history of the Translation Project so far. We finally give
12072 a few pointers for those who want to do further research or readings
12073 about Native Language Support matters.
12076 * History:: History of GNU @code{gettext}
12077 * References:: Related Readings
12080 @node History, References, Conclusion, Conclusion
12081 @section History of GNU @code{gettext}
12082 @cindex history of GNU @code{gettext}
12084 Internationalization concerns and algorithms have been informally
12085 and casually discussed for years in GNU, sometimes around GNU
12086 @code{libc}, maybe around the incoming @code{Hurd}, or otherwise
12087 (nobody clearly remembers). And even then, when the work started for
12088 real, this was somewhat independently of these previous discussions.
12090 This all began in July 1994, when Patrick D'Cruze had the idea and
12091 initiative of internationalizing version 3.9.2 of GNU @code{fileutils}.
12092 He then asked Jim Meyering, the maintainer, how to get those changes
12093 folded into an official release. That first draft was full of
12094 @code{#ifdef}s and somewhat disconcerting, and Jim wanted to find
12095 nicer ways. Patrick and Jim shared some tries and experimentations
12096 in this area. Then, feeling that this might eventually have a deeper
12097 impact on GNU, Jim wanted to know what standards were, and contacted
12098 Richard Stallman, who very quickly and verbally described an overall
12099 design for what was meant to become @code{glocale}, at that time.
12101 Jim implemented @code{glocale} and got a lot of exhausting feedback
12102 from Patrick and Richard, of course, but also from Mitchum DSouza
12103 (who wrote a @code{catgets}-like package), Roland McGrath, maybe David
12104 MacKenzie, Fran@,{c}ois Pinard, and Paul Eggert, all pushing and
12105 pulling in various directions, not always compatible, to the extent
12106 that after a couple of test releases, @code{glocale} was torn apart.
12107 In particular, Paul Eggert -- always keeping an eye on developments
12108 in Solaris -- advocated the use of the @code{gettext} API over
12109 @code{glocale}'s @code{catgets}-based API.
While Jim took some distance and time and became a dad for a second
12112 time, Roland wanted to get GNU @code{libc} internationalized, and
12113 got Ulrich Drepper involved in that project. Instead of starting
12114 from @code{glocale}, Ulrich rewrote something from scratch, but
more conforming to the set of guidelines that emerged out of the
12116 @code{glocale} effort. Then, Ulrich got people from the previous
12117 forum to involve themselves into this new project, and the switch
12118 from @code{glocale} to what was first named @code{msgutils}, renamed
12119 @code{nlsutils}, and later @code{gettext}, became officially accepted
12120 by Richard in May 1995 or so.
12122 Let's summarize by saying that Ulrich Drepper wrote GNU @code{gettext}
12123 in April 1995. The first official release of the package, including
12124 PO mode, occurred in July 1995, and was numbered 0.7. Other people
12125 contributed to the effort by providing a discussion forum around
12126 Ulrich, writing little pieces of code, or testing. These are quoted
12127 in the @code{THANKS} file which comes with the GNU @code{gettext}
While this was being done, Fran@,{c}ois adapted half a dozen
12131 GNU packages to @code{glocale} first, then later to @code{gettext},
12132 putting them in pretest, so providing along the way an effective
12133 user environment for fine tuning the evolving tools. He also took
12134 the responsibility of organizing and coordinating the Translation
12135 Project. After nearly a year of informal exchanges between people from
12136 many countries, translator teams started to exist in May 1995, through
12137 the creation and support by Patrick D'Cruze of twenty unmoderated
12138 mailing lists for that many native languages, and two moderated
12139 lists: one for reaching all teams at once, the other for reaching
12140 all willing maintainers of internationalized free software packages.
12142 Fran@,{c}ois also wrote PO mode in June 1995 with the collaboration
12143 of Greg McGary, as a kind of contribution to Ulrich's package.
12144 He also gave a hand with the GNU @code{gettext} Texinfo manual.
12146 In 1997, Ulrich Drepper released the GNU libc 2.0, which included the
12147 @code{gettext}, @code{textdomain} and @code{bindtextdomain} functions.
12149 In 2000, Ulrich Drepper added plural form handling (the @code{ngettext}
12150 function) to GNU libc. Later, in 2001, he released GNU libc 2.2.x,
12151 which is the first free C library with full internationalization support.
12153 Ulrich being quite busy in his role of General Maintainer of GNU libc,
12154 he handed over the GNU @code{gettext} maintenance to Bruno Haible in
12155 2000. Bruno added the plural form handling to the tools as well, added
12156 support for UTF-8 and CJK locales, and wrote a few new tools for
12157 manipulating PO files.
12159 @node References, , History, Conclusion
12160 @section Related Readings
12161 @cindex related reading
12162 @cindex bibliography
12164 @strong{ NOTE: } This documentation section is outdated and needs to be
12167 Eugene H. Dorr (@file{dorre@@well.com}) maintains an interesting
12168 bibliography on internationalization matters, called
12169 @cite{Internationalization Reference List}, which is available as:
12171 ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/i18n-books.txt
12174 Michael Gschwind (@file{mike@@vlsivie.tuwien.ac.at}) maintains a
12175 Frequently Asked Questions (FAQ) list, entitled @cite{Programming for
12176 Internationalisation}. This FAQ discusses writing programs which
12177 can handle different language conventions, character sets, etc.;
12178 and is applicable to all character set encodings, with particular
12179 emphasis on @w{ISO 8859-1}. It is regularly published in Usenet
12180 groups @file{comp.unix.questions}, @file{comp.std.internat},
12181 @file{comp.software.international}, @file{comp.lang.c},
12182 @file{comp.windows.x}, @file{comp.std.c}, @file{comp.answers}
12183 and @file{news.answers}. The home location of this document is:
12185 ftp://ftp.vlsivie.tuwien.ac.at/pub/8bit/ISO-programming
12188 Patrick D'Cruze (@file{pdcruze@@li.org}) wrote a tutorial about NLS
12189 matters, and Jochen Hein (@file{Hein@@student.tu-clausthal.de}) took
12190 over the responsibility of maintaining it. It may be found as:
12192 ftp://sunsite.unc.edu/pub/Linux/utils/nls/catalogs/Incoming/...
12193 ...locale-tutorial-0.8.txt.gz
12196 This site is mirrored in:
12198 ftp://ftp.ibp.fr/pub/linux/sunsite/
12201 A French version of the same tutorial should be findable at:
12203 ftp://ftp.ibp.fr/pub/linux/french/docs/
12206 together with French translations of many Linux-related documents.
12208 @node Language Codes, Country Codes, Conclusion, Top
12209 @appendix Language Codes
12210 @cindex language codes
12213 The @w{ISO 639} standard defines two-letter codes for many languages, and
12214 three-letter codes for more rarely used languages.
12215 All abbreviations for languages used in the Translation Project should
12216 come from this standard.
12219 * Usual Language Codes:: Two-letter ISO 639 language codes
12220 * Rare Language Codes:: Three-letter ISO 639 language codes
12223 @node Usual Language Codes, Rare Language Codes, Language Codes, Language Codes
12224 @appendixsec Usual Language Codes
12226 For the commonly used languages, the @w{ISO 639-1} standard defines two-letter
12230 @include iso-639.texi
12233 @node Rare Language Codes, , Usual Language Codes, Language Codes
12234 @appendixsec Rare Language Codes
12236 For rarely used languages, the @w{ISO 639-2} standard defines three-letter
12237 codes. Here is the current list, reduced to only living languages with at least
12238 one million of speakers.
12241 @include iso-639-2.texi
12244 @node Country Codes, Licenses, Language Codes, Top
12245 @appendix Country Codes
12246 @cindex country codes
The @w{ISO 3166} standard defines two-character codes for many countries
12250 and territories. All abbreviations for countries used in the Translation
12251 Project should come from this standard.
12254 @include iso-3166.texi
12257 @node Licenses, Program Index, Country Codes, Top
12261 The files of this package are covered by the licenses indicated in each
12262 particular file or directory. Here is a summary:
12266 The @code{libintl} and @code{libasprintf} libraries are covered by the
12267 GNU Lesser General Public License (LGPL).
12268 A copy of the license is included in @ref{GNU LGPL}.
12271 The executable programs of this package and the @code{libgettextpo} library
12272 are covered by the GNU General Public License (GPL).
12273 A copy of the license is included in @ref{GNU GPL}.
12276 This manual is free documentation. It is dually licensed under the
12277 GNU FDL and the GNU GPL. This means that you can redistribute this
12278 manual under either of these two licenses, at your choice.
12280 This manual is covered by the GNU FDL. Permission is granted to copy,
12281 distribute and/or modify this document under the terms of the
12282 GNU Free Documentation License (FDL), either version 1.2 of the
12283 License, or (at your option) any later version published by the
12284 Free Software Foundation (FSF); with no Invariant Sections, with no
12285 Front-Cover Text, and with no Back-Cover Texts.
12286 A copy of the license is included in @ref{GNU FDL}.
12288 This manual is covered by the GNU GPL. You can redistribute it and/or
12289 modify it under the terms of the GNU General Public License (GPL), either
12290 version 2 of the License, or (at your option) any later version published
12291 by the Free Software Foundation (FSF).
12292 A copy of the license is included in @ref{GNU GPL}.
12296 * GNU GPL:: GNU General Public License
12297 * GNU LGPL:: GNU Lesser General Public License
12298 * GNU FDL:: GNU Free Documentation License
12308 @node Program Index, Option Index, Licenses, Top
12309 @unnumbered Program Index
12313 @node Option Index, Variable Index, Program Index, Top
12314 @unnumbered Option Index
12318 @node Variable Index, PO Mode Index, Option Index, Top
12319 @unnumbered Variable Index
12323 @node PO Mode Index, Autoconf Macro Index, Variable Index, Top
12324 @unnumbered PO Mode Index
12328 @node Autoconf Macro Index, Index, PO Mode Index, Top
12329 @unnumbered Autoconf Macro Index
12333 @node Index, , Autoconf Macro Index, Top
12334 @unnumbered General Index
12340 @c Local variables:
12341 @c texinfo-column-for-description: 32