gettext-tools/doc/gettext.texi

   1 \input texinfo          @c -*-texinfo-*-
   2 @c %**start of header
   3 @setfilename gettext.info
   4 @c The @ifset makeinfo ... @end ifset conditional evaluates to true in makeinfo
   5 @c for info and html output, but to false in texi2html.
   6 @ifnottex
   7 @ifclear texi2html
   8 @set makeinfo
   9 @end ifclear
  10 @end ifnottex
  11 @c The @documentencoding is needed for makeinfo; texi2html 1.52
  12 @c doesn't recognize it.
  13 @ifset makeinfo
  14 @documentencoding UTF-8
  15 @end ifset
  16 @settitle GNU @code{gettext} utilities
  17 @finalout
  18 @c Indices:
  19 @c   am = autoconf macro  @amindex
  20 @c   cp = concept         @cindex
  21 @c   ef = emacs function  @efindex
  22 @c   em = emacs mode      @emindex
  23 @c   ev = emacs variable  @evindex
  24 @c   fn = function        @findex
  25 @c   kw = keyword         @kwindex
  26 @c   op = option          @opindex
  27 @c   pg = program         @pindex
  28 @c   vr = variable        @vindex
  29 @c Unused predefined indices:
  30 @c   tp = type            @tindex
  31 @c   ky = keystroke       @kindex
  32 @defcodeindex am
  33 @defcodeindex ef
  34 @defindex em
  35 @defcodeindex ev
  36 @defcodeindex kw
  37 @defcodeindex op
  38 @syncodeindex ef em
  39 @syncodeindex ev em
  40 @syncodeindex fn cp
  41 @syncodeindex kw cp
  42 @ifclear texi2html
  43 @firstparagraphindent insert
  44 @end ifclear
  45 @c %**end of header
  46
  47 @include version.texi
  48
  49 @ifinfo
  50 @dircategory GNU Gettext Utilities
  51 @direntry
  52 * gettext: (gettext).                          GNU gettext utilities.
  53 * autopoint: (gettext)autopoint Invocation.    Copy gettext infrastructure.
  54 * envsubst: (gettext)envsubst Invocation.      Expand environment variables.
  55 * gettextize: (gettext)gettextize Invocation.  Prepare a package for gettext.
  56 * msgattrib: (gettext)msgattrib Invocation.    Select part of a PO file.
  57 * msgcat: (gettext)msgcat Invocation.          Combine several PO files.
  58 * msgcmp: (gettext)msgcmp Invocation.          Compare a PO file and template.
  59 * msgcomm: (gettext)msgcomm Invocation.        Match two PO files.
  60 * msgconv: (gettext)msgconv Invocation.        Convert PO file to encoding.
  61 * msgen: (gettext)msgen Invocation.            Create an English PO file.
  62 * msgexec: (gettext)msgexec Invocation.        Process a PO file.
  63 * msgfilter: (gettext)msgfilter Invocation.    Pipe a PO file through a filter.
  64 * msgfmt: (gettext)msgfmt Invocation.          Make MO files out of PO files.
  65 * msggrep: (gettext)msggrep Invocation.        Select part of a PO file.
  66 * msginit: (gettext)msginit Invocation.        Create a fresh PO file.
  67 * msgmerge: (gettext)msgmerge Invocation.      Update a PO file from template.
  68 * msgunfmt: (gettext)msgunfmt Invocation.      Uncompile MO file into PO file.
  69 * msguniq: (gettext)msguniq Invocation.        Unify duplicates for PO file.
  70 * ngettext: (gettext)ngettext Invocation.      Translate a message with plural.
  71 * xgettext: (gettext)xgettext Invocation.      Extract strings into a PO file.
  72 * ISO639: (gettext)Language Codes.             ISO 639 language codes.
  73 * ISO3166: (gettext)Country Codes.             ISO 3166 country codes.
  74 @end direntry
  75 @end ifinfo
  76
  77 @ifinfo
  78 This file provides documentation for GNU @code{gettext} utilities.
  79 It also serves as a reference for the free Translation Project.
  80
  81 @copying
  82 Copyright (C) 1995-1998, 2001-2012 Free Software Foundation, Inc.
  83
  84 This manual is free documentation.  It is dually licensed under the
  85 GNU FDL and the GNU GPL.  This means that you can redistribute this
  86 manual under either of these two licenses, at your choice.
  87
  88 This manual is covered by the GNU FDL.  Permission is granted to copy,
  89 distribute and/or modify this document under the terms of the
  90 GNU Free Documentation License (FDL), either version 1.2 of the
  91 License, or (at your option) any later version published by the
  92 Free Software Foundation (FSF); with no Invariant Sections, with no
  93 Front-Cover Text, and with no Back-Cover Texts.
  94 A copy of the license is included in @ref{GNU FDL}.
  95
  96 This manual is covered by the GNU GPL.  You can redistribute it and/or
  97 modify it under the terms of the GNU General Public License (GPL), either
  98 version 2 of the License, or (at your option) any later version published
  99 by the Free Software Foundation (FSF).
 100 A copy of the license is included in @ref{GNU GPL}.
 101 @end copying
 102 @end ifinfo
 103
 104 @titlepage
 105 @title GNU gettext tools, version @value{VERSION}
 106 @subtitle Native Language Support Library and Tools
 107 @subtitle Edition @value{EDITION}, @value{UPDATED}
 108 @author Ulrich Drepper
 109 @author Jim Meyering
 110 @author Fran@,{c}ois Pinard
 111 @author Bruno Haible
 112
 113 @ifnothtml
 114 @page
 115 @vskip 0pt plus 1filll
 116 @c @insertcopying
 117 Copyright (C) 1995-1998, 2001-2012 Free Software Foundation, Inc.
 118
 119 This manual is free documentation.  It is dually licensed under the
 120 GNU FDL and the GNU GPL.  This means that you can redistribute this
 121 manual under either of these two licenses, at your choice.
 122
 123 This manual is covered by the GNU FDL.  Permission is granted to copy,
 124 distribute and/or modify this document under the terms of the
 125 GNU Free Documentation License (FDL), either version 1.2 of the
 126 License, or (at your option) any later version published by the
 127 Free Software Foundation (FSF); with no Invariant Sections, with no
 128 Front-Cover Text, and with no Back-Cover Texts.
 129 A copy of the license is included in @ref{GNU FDL}.
 130
 131 This manual is covered by the GNU GPL.  You can redistribute it and/or
 132 modify it under the terms of the GNU General Public License (GPL), either
 133 version 2 of the License, or (at your option) any later version published
 134 by the Free Software Foundation (FSF).
 135 A copy of the license is included in @ref{GNU GPL}.
 136 @end ifnothtml
 137 @end titlepage
 138
 139 @c Table of Contents
 140 @contents
 141
 142 @ifset makeinfo
 143 @node Top, Introduction, (dir), (dir)
 144 @top GNU @code{gettext} utilities
 145
 146 This manual documents the GNU gettext tools and the GNU libintl library,
 147 version @value{VERSION}.
 148
 149 @menu
 150 * Introduction::                Introduction
 151 * Users::                       The User's View
 152 * PO Files::                    The Format of PO Files
 153 * Sources::                     Preparing Program Sources
 154 * Template::                    Making the PO Template File
 155 * Creating::                    Creating a New PO File
 156 * Updating::                    Updating Existing PO Files
 157 * Editing::                     Editing PO Files
 158 * Manipulating::                Manipulating PO Files
 159 * Binaries::                    Producing Binary MO Files
 160 * Programmers::                 The Programmer's View
 161 * Translators::                 The Translator's View
 162 * Maintainers::                 The Maintainer's View
 163 * Installers::                  The Installer's and Distributor's View
 164 * Programming Languages::       Other Programming Languages
 165 * Conclusion::                  Concluding Remarks
 166
 167 * Language Codes::              ISO 639 language codes
 168 * Country Codes::               ISO 3166 country codes
 169 * Licenses::                    Licenses
 170
 171 * Program Index::               Index of Programs
 172 * Option Index::                Index of Command-Line Options
 173 * Variable Index::              Index of Environment Variables
 174 * PO Mode Index::               Index of Emacs PO Mode Commands
 175 * Autoconf Macro Index::        Index of Autoconf Macros
 176 * Index::                       General Index
 177
 178 @detailmenu
 179  --- The Detailed Node Listing ---
 180
 181 Introduction
 182
 183 * Why::                         The Purpose of GNU @code{gettext}
 184 * Concepts::                    I18n, L10n, and Such
 185 * Aspects::                     Aspects in Native Language Support
 186 * Files::                       Files Conveying Translations
 187 * Overview::                    Overview of GNU @code{gettext}
 188
 189 The User's View
 190
 191 * System Installation::         Questions During Operating System Installation
 192 * Setting the GUI Locale::      How to Specify the Locale Used by GUI Programs
 193 * Setting the POSIX Locale::    How to Specify the Locale According to POSIX
 194 * Installing Localizations::    How to Install Additional Translations
 195
 196 Setting the POSIX Locale
 197
 198 * Locale Names::                What a Locale Specification Looks Like
 199 * Locale Environment Variables:: Which Environment Variable Specifies What
 200 * The LANGUAGE variable::       How to Specify a Priority List of Languages
 201
 202 Preparing Program Sources
 203
 204 * Importing::                   Importing the @code{gettext} declaration
 205 * Triggering::                  Triggering @code{gettext} Operations
 206 * Preparing Strings::           Preparing Translatable Strings
 207 * Mark Keywords::               How Marks Appear in Sources
 208 * Marking::                     Marking Translatable Strings
 209 * c-format Flag::               Telling something about the following string
 210 * Special cases::               Special Cases of Translatable Strings
 211 * Bug Report Address::          Letting Users Report Translation Bugs
 212 * Names::                       Marking Proper Names for Translation
 213 * Libraries::                   Preparing Library Sources
 214
 215 Making the PO Template File
 216
 217 * xgettext Invocation::         Invoking the @code{xgettext} Program
 218
 219 Creating a New PO File
 220
 221 * msginit Invocation::          Invoking the @code{msginit} Program
 222 * Header Entry::                Filling in the Header Entry
 223
 224 Updating Existing PO Files
 225
 226 * msgmerge Invocation::         Invoking the @code{msgmerge} Program
 227
 228 Editing PO Files
 229
 230 * KBabel::                      KDE's PO File Editor
 231 * Gtranslator::                 GNOME's PO File Editor
 232 * PO Mode::                     Emacs's PO File Editor
 233 * Compendium::                  Using Translation Compendia
 234
 235 Emacs's PO File Editor
 236
 237 * Installation::                Completing GNU @code{gettext} Installation
 238 * Main PO Commands::            Main Commands
 239 * Entry Positioning::           Entry Positioning
 240 * Normalizing::                 Normalizing Strings in Entries
 241 * Translated Entries::          Translated Entries
 242 * Fuzzy Entries::               Fuzzy Entries
 243 * Untranslated Entries::        Untranslated Entries
 244 * Obsolete Entries::            Obsolete Entries
 245 * Modifying Translations::      Modifying Translations
 246 * Modifying Comments::          Modifying Comments
 247 * Subedit::                     Mode for Editing Translations
 248 * C Sources Context::           C Sources Context
 249 * Auxiliary::                   Consulting Auxiliary PO Files
 250
 251 Using Translation Compendia
 252
 253 * Creating Compendia::          Merging translations for later use
 254 * Using Compendia::             Using older translations if they fit
 255
 256 Manipulating PO Files
 257
 258 * msgcat Invocation::           Invoking the @code{msgcat} Program
 259 * msgconv Invocation::          Invoking the @code{msgconv} Program
 260 * msggrep Invocation::          Invoking the @code{msggrep} Program
 261 * msgfilter Invocation::        Invoking the @code{msgfilter} Program
 262 * msguniq Invocation::          Invoking the @code{msguniq} Program
 263 * msgcomm Invocation::          Invoking the @code{msgcomm} Program
 264 * msgcmp Invocation::           Invoking the @code{msgcmp} Program
 265 * msgattrib Invocation::        Invoking the @code{msgattrib} Program
 266 * msgen Invocation::            Invoking the @code{msgen} Program
 267 * msgexec Invocation::          Invoking the @code{msgexec} Program
 268 * Colorizing::                  Highlighting parts of PO files
 269 * libgettextpo::                Writing your own programs that process PO files
 270
 271 Highlighting parts of PO files
 272
 273 * The --color option::          Triggering colorized output
 274 * The TERM variable::           The environment variable @code{TERM}
 275 * The --style option::          The @code{--style} option
 276 * Style rules::                 Style rules for PO files
 277 * Customizing less::            Customizing @code{less} for viewing PO files
 278
 279 Producing Binary MO Files
 280
 281 * msgfmt Invocation::           Invoking the @code{msgfmt} Program
 282 * msgunfmt Invocation::         Invoking the @code{msgunfmt} Program
 283 * MO Files::                    The Format of GNU MO Files
 284
 285 The Programmer's View
 286
 287 * catgets::                     About @code{catgets}
 288 * gettext::                     About @code{gettext}
 289 * Comparison::                  Comparing the two interfaces
 290 * Using libintl.a::             Using libintl.a in your own programs
 291 * gettext grok::                Being a @code{gettext} grok
 292 * Temp Programmers::            Temporary Notes for the Programmers Chapter
 293
 294 About @code{catgets}
 295
 296 * Interface to catgets::        The interface
 297 * Problems with catgets::       Problems with the @code{catgets} interface?!
 298
 299 About @code{gettext}
 300
 301 * Interface to gettext::        The interface
 302 * Ambiguities::                 Solving ambiguities
 303 * Locating Catalogs::           Locating message catalog files
 304 * Charset conversion::          How to request conversion to Unicode
 305 * Contexts::                    Solving ambiguities in GUI programs
 306 * Plural forms::                Additional functions for handling plurals
 307 * Optimized gettext::           Optimization of the *gettext functions
 308
 309 Temporary Notes for the Programmers Chapter
 310
 311 * Temp Implementations::        Temporary - Two Possible Implementations
 312 * Temp catgets::                Temporary - About @code{catgets}
 313 * Temp WSI::                    Temporary - Why a single implementation
 314 * Temp Notes::                  Temporary - Notes
 315
 316 The Translator's View
 317
 318 * Trans Intro 0::               Introduction 0
 319 * Trans Intro 1::               Introduction 1
 320 * Discussions::                 Discussions
 321 * Organization::                Organization
 322 * Information Flow::            Information Flow
 323 * Translating plural forms::    How to fill in @code{msgstr[0]}, @code{msgstr[1]}
 324 * Prioritizing messages::       How to find which messages to translate first
 325
 326 Organization
 327
 328 * Central Coordination::        Central Coordination
 329 * National Teams::              National Teams
 330 * Mailing Lists::               Mailing Lists
 331
 332 National Teams
 333
 334 * Sub-Cultures::                Sub-Cultures
 335 * Organizational Ideas::        Organizational Ideas
 336
 337 The Maintainer's View
 338
 339 * Flat and Non-Flat::           Flat or Non-Flat Directory Structures
 340 * Prerequisites::               Prerequisite Works
 341 * gettextize Invocation::       Invoking the @code{gettextize} Program
 342 * Adjusting Files::             Files You Must Create or Alter
 343 * autoconf macros::             Autoconf macros for use in @file{configure.ac}
 344 * CVS Issues::                  Integrating with CVS
 345 * Release Management::          Creating a Distribution Tarball
 346
 347 Files You Must Create or Alter
 348
 349 * po/POTFILES.in::              @file{POTFILES.in} in @file{po/}
 350 * po/LINGUAS::                  @file{LINGUAS} in @file{po/}
 351 * po/Makevars::                 @file{Makevars} in @file{po/}
 352 * po/Rules-*::                  Extending @file{Makefile} in @file{po/}
 353 * configure.ac::                @file{configure.ac} at top level
 354 * config.guess::                @file{config.guess}, @file{config.sub} at top level
 355 * mkinstalldirs::               @file{mkinstalldirs} at top level
 356 * aclocal::                     @file{aclocal.m4} at top level
 357 * acconfig::                    @file{acconfig.h} at top level
 358 * config.h.in::                 @file{config.h.in} at top level
 359 * Makefile::                    @file{Makefile.in} at top level
 360 * src/Makefile::                @file{Makefile.in} in @file{src/}
 361 * lib/gettext.h::               @file{gettext.h} in @file{lib/}
 362
 363 Autoconf macros for use in @file{configure.ac}
 364
 365 * AM_GNU_GETTEXT::              AM_GNU_GETTEXT in @file{gettext.m4}
 366 * AM_GNU_GETTEXT_VERSION::      AM_GNU_GETTEXT_VERSION in @file{gettext.m4}
 367 * AM_GNU_GETTEXT_NEED::         AM_GNU_GETTEXT_NEED in @file{gettext.m4}
 368 * AM_GNU_GETTEXT_INTL_SUBDIR::  AM_GNU_GETTEXT_INTL_SUBDIR in @file{intldir.m4}
 369 * AM_PO_SUBDIRS::               AM_PO_SUBDIRS in @file{po.m4}
 370 * AM_ICONV::                    AM_ICONV in @file{iconv.m4}
 371
 372 Integrating with CVS
 373
 374 * Distributed CVS::             Avoiding version mismatch in distributed development
 375 * Files under CVS::             Files to put under CVS version control
 376 * autopoint Invocation::        Invoking the @code{autopoint} Program
 377
 378 Other Programming Languages
 379
 380 * Language Implementors::       The Language Implementor's View
 381 * Programmers for other Languages::  The Programmer's View
 382 * Translators for other Languages::  The Translator's View
 383 * Maintainers for other Languages::  The Maintainer's View
 384 * List of Programming Languages::  Individual Programming Languages
 385 * List of Data Formats::        Internationalizable Data
 386
 387 The Translator's View
 388
 389 * c-format::                    C Format Strings
 390 * objc-format::                 Objective C Format Strings
 391 * sh-format::                   Shell Format Strings
 392 * python-format::               Python Format Strings
 393 * lisp-format::                 Lisp Format Strings
 394 * elisp-format::                Emacs Lisp Format Strings
 395 * librep-format::               librep Format Strings
 396 * scheme-format::               Scheme Format Strings
 397 * smalltalk-format::            Smalltalk Format Strings
 398 * java-format::                 Java Format Strings
 399 * csharp-format::               C# Format Strings
 400 * awk-format::                  awk Format Strings
 401 * object-pascal-format::        Object Pascal Format Strings
 402 * ycp-format::                  YCP Format Strings
 403 * tcl-format::                  Tcl Format Strings
 404 * perl-format::                 Perl Format Strings
 405 * php-format::                  PHP Format Strings
 406 * gcc-internal-format::         GCC internal Format Strings
 407 * gfc-internal-format::         GFC internal Format Strings
 408 * qt-format::                   Qt Format Strings
 409 * qt-plural-format::            Qt Plural Format Strings
 410 * kde-format::                  KDE Format Strings
 411 * boost-format::                Boost Format Strings
 412 * lua-format::                  Lua Format Strings
 413 * javascript-format::           JavaScript Format Strings
 414
 415 Individual Programming Languages
 416
 417 * C::                           C, C++, Objective C
 418 * sh::                          sh - Shell Script
 419 * bash::                        bash - Bourne-Again Shell Script
 420 * Python::                      Python
 421 * Common Lisp::                 GNU clisp - Common Lisp
 422 * clisp C::                     GNU clisp C sources
 423 * Emacs Lisp::                  Emacs Lisp
 424 * librep::                      librep
 425 * Scheme::                      GNU guile - Scheme
 426 * Smalltalk::                   GNU Smalltalk
 427 * Java::                        Java
 428 * C#::                          C#
 429 * gawk::                        GNU awk
 430 * Pascal::                      Pascal - Free Pascal Compiler
 431 * wxWidgets::                   wxWidgets library
 432 * YCP::                         YCP - YaST2 scripting language
 433 * Tcl::                         Tcl - Tk's scripting language
 434 * Perl::                        Perl
 435 * PHP::                         PHP Hypertext Preprocessor
 436 * Pike::                        Pike
 437 * GCC-source::                  GNU Compiler Collection sources
 438 * Lua::                         Lua
 439 * JavaScript::                  JavaScript
 440
 441 sh - Shell Script
 442
 443 * Preparing Shell Scripts::     Preparing Shell Scripts for Internationalization
 444 * gettext.sh::                  Contents of @code{gettext.sh}
 445 * gettext Invocation::          Invoking the @code{gettext} program
 446 * ngettext Invocation::         Invoking the @code{ngettext} program
 447 * envsubst Invocation::         Invoking the @code{envsubst} program
 448 * eval_gettext Invocation::     Invoking the @code{eval_gettext} function
 449 * eval_ngettext Invocation::    Invoking the @code{eval_ngettext} function
 450
 451 Perl
 452
 453 * General Problems::            General Problems Parsing Perl Code
 454 * Default Keywords::            Which Keywords Will xgettext Look For?
 455 * Special Keywords::            How to Extract Hash Keys
 456 * Quote-like Expressions::      What are Strings And Quote-like Expressions?
 457 * Interpolation I::             Invalid String Interpolation
 458 * Interpolation II::            Valid String Interpolation
 459 * Parentheses::                 When To Use Parentheses
 460 * Long Lines::                  How To Grok with Long Lines
 461 * Perl Pitfalls::               Bugs, Pitfalls, and Things That Do Not Work
 462
 463 Internationalizable Data
 464
 465 * POT::                         POT - Portable Object Template
 466 * RST::                         Resource String Table
 467 * Glade::                       Glade - GNOME user interface description
 468
 469 Concluding Remarks
 470
 471 * History::                     History of GNU @code{gettext}
 472 * References::                  Related Readings
 473
 474 Language Codes
 475
 476 * Usual Language Codes::        Two-letter ISO 639 language codes
 477 * Rare Language Codes::         Three-letter ISO 639 language codes
 478
 479 Licenses
 480
 481 * GNU GPL::                     GNU General Public License
 482 * GNU LGPL::                    GNU Lesser General Public License
 483 * GNU FDL::                     GNU Free Documentation License
 484
 485 @end detailmenu
 486 @end menu
 487
 488 @end ifset
 489
 490 @node Introduction, Users, Top, Top
 491 @chapter Introduction
 492
 493 This chapter explains the goals sought in the creation
 494 of GNU @code{gettext} and the free Translation Project.
 495 Then, it explains a few broad concepts around
 496 Native Language Support, and positions message translation with regard
 497 to other aspects of national and cultural variance, as they apply
 498 to programs.  It also surveys the files used to convey the
 499 translations, explains how the various tools interact when these
 500 files are first generated, and describes how the maintenance
 501 cycle usually operates afterwards.
 502
 503 @cindex sex
 504 @cindex he, she, and they
 505 @cindex she, he, and they
 506 In this manual, we use @emph{he} when speaking of the programmer or
 507 maintainer, @emph{she} when speaking of the translator, and @emph{they}
 508 when speaking of the installers or end users of the translated program.
 509 This is only a convenience for clarifying the documentation.  It is
 510 @emph{absolutely} not meant to imply that some roles are more appropriate
 511 to males or females.  Besides, as you might guess, GNU @code{gettext}
 512 is meant to be useful for people using computers, whatever their sex,
 513 race, religion or nationality!
 514
 515 @cindex bug report address
 516 Please send suggestions and corrections to:
 517
 518 @example
 519 @group
 520 @r{Internet address:}
 521     bug-gnu-gettext@@gnu.org
 522 @end group
 523 @end example
 524
 525 @noindent
 526 Please include the manual's edition number and update date in your messages.
 527
 528 @menu
 529 * Why::                         The Purpose of GNU @code{gettext}
 530 * Concepts::                    I18n, L10n, and Such
 531 * Aspects::                     Aspects in Native Language Support
 532 * Files::                       Files Conveying Translations
 533 * Overview::                    Overview of GNU @code{gettext}
 534 @end menu
 535
 536 @node Why, Concepts, Introduction, Introduction
 537 @section The Purpose of GNU @code{gettext}
 538
 539 Usually, programs are written and documented in English, and use
 540 English at execution time to interact with users.  This is true
 541 not only of GNU software, but also of a great deal of proprietary
 542 and free software.  Using a common language is quite handy for
 543 communication between developers, maintainers and users from all
 544 countries.  On the other hand, most people are less comfortable with
 545 English than with their own native language, and would prefer to
 546 use their mother tongue for day-to-day work, as far as possible.
 547 Many would simply @emph{love} to see their computer screen showing
 548 a lot less English, and far more of their own language.
 549
 550 @cindex Translation Project
 551 However, to many people, this dream might appear so far-fetched that
 552 they may believe it is not even worth spending time thinking about
 553 it.  They have no confidence at all that the dream might ever
 554 come true.  Yet some have not lost hope, and have organized themselves.
 555 The Translation Project is a formalization of this hope into a
 556 workable structure, which has a good chance of bringing all of us nearer
 557 to a truly multi-lingual set of programs.
 558
 559 GNU @code{gettext} is an important step for the Translation Project,
 560 as it is an asset on which we may build many other steps.  This package
 561 offers programmers, translators, and even users a well-integrated
 562 set of tools and documentation.  Specifically, the GNU @code{gettext}
 563 utilities are a set of tools that provides a framework within which
 564 other free packages may produce multi-lingual messages.  These tools
 565 include:
 566
 567 @itemize @bullet
 568 @item
 569 A set of conventions about how programs should be written to support
 570 message catalogs.
 571
 572 @item
 573 A directory and file naming organization for the message catalogs
 574 themselves.
 575
 576 @item
 577 A runtime library supporting the retrieval of translated messages.
 578
 579 @item
 580 A few stand-alone programs to massage in various ways the sets of
 581 translatable strings, or already translated strings.
 582
 583 @item
 584 A library supporting the parsing and creation of files containing
 585 translated messages.
 586
 587 @item
 588 A special mode for Emacs@footnote{In this manual, all mentions of Emacs
 589 refer to either GNU Emacs or XEmacs, which people sometimes call FSF
 590 Emacs and Lucid Emacs, respectively.} which helps in preparing these sets
 591 and bringing them up to date.
 592 @end itemize
 593
 594 GNU @code{gettext} is designed to minimize the impact of
 595 internationalization on program sources, keeping this impact as small
 596 and unobtrusive as possible.  Internationalization has a better
 597 chance of succeeding if it is very lightweight, or at least
 598 appears to be so when one looks at program sources.
 599
 600 The Translation Project also uses the GNU @code{gettext} distribution
 601 as a vehicle for documenting its structure and methods.  This goes
 602 beyond the strict technicalities of documenting GNU @code{gettext}
 603 proper.  This way, translators will find in a single place, as
 604 far as possible, all they need to know to do their translating
 605 work properly.  This supplemental documentation might also
 606 help programmers, and even curious users, understand how GNU
 607 @code{gettext} relates to the rest of the Translation
 608 Project and, consequently, glimpse the @emph{big picture}.
 609
 610 @node Concepts, Aspects, Why, Introduction
 611 @section I18n, L10n, and Such
 612
 613 @cindex i18n
 614 @cindex l10n
 615 Two long words appear all the time when we discuss support of native
 616 language in programs, and these words have a precise meaning, worth
 617 explaining here once and for all.  The words are
 618 @emph{internationalization} and @emph{localization}.  Many people,
 619 tired of writing these long words over and over again, got into the
 620 habit of writing @dfn{i18n} and @dfn{l10n} instead, keeping the first
 621 and last letter of each word, and replacing the run of intermediate
 622 letters by a number telling how many such letters there are.
 623 But in this manual, for the sake of clarity, we will patiently write
 624 the names in full, each time@dots{}
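
As a small illustration of the abbreviation rule just described, the
following Python sketch derives such a shorthand mechanically.  The
helper name @code{numeronym} is hypothetical and not part of GNU
@code{gettext}; leaving very short words unchanged is an assumption
of this sketch, not part of the convention described above.

@example
def numeronym(word):
    # Keep the first and last letter and replace the run of letters
    # in between by its length, as in "i18n" and "l10n".
    # Assumption: words of three letters or fewer are returned as-is,
    # since abbreviating them would not shorten anything.
    if len(word) <= 3:
        return word
    return word[0] + str(len(word) - 2) + word[-1]

print(numeronym("internationalization"))  # prints i18n
print(numeronym("localization"))          # prints l10n
@end example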
 625
 626 @cindex internationalization
 627 By @dfn{internationalization}, one refers to the operation by which a
 628 program, or a set of programs turned into a package, is made aware of and
 629 able to support multiple languages.  This is a generalization process,
 630 by which the programs are untied from calling only English strings or
 631 other English-specific habits, and connected to generic ways of doing
 632 the same instead.  Program developers may use various techniques to
 633 internationalize their programs.  Some of these have been standardized.
 634 GNU @code{gettext} offers one of these standards.  @xref{Programmers}.
 635
 636 @cindex localization
 637 By @dfn{localization}, one means the operation by which, in a set
 638 of programs already internationalized, one gives the program all
 639 needed information so that it can adapt itself to handle its input
 640 and output in a fashion which is correct for some native language and
 641 cultural habits.  This is a particularization process, by which generic
 642 methods already implemented in an internationalized program are used
 643 in specific ways.  The programming environment puts at the programmers'
 644 disposal several functions which allow this runtime configuration.
 645 The formal description of a specific set of cultural habits for some
 646 country, together with all associated translations targeted to the
 647 same native language, is called the @dfn{locale} for this language
 648 or country.  Users achieve localization of programs by setting
 649 special environment variables to proper values before executing those
 650 programs; these values identify which locale should be used.
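
The following Python sketch hints at what this looks like from inside
a program.  It uses Python's own @code{gettext} module, which is a
separate reimplementation with lookup rules of its own rather than
the GNU @code{libintl} library.  The text domain @code{demo} and the
catalog directory @file{locale} are hypothetical and assumed not to
exist, so every message comes back untranslated.

@example
import gettext

# "demo" and "locale" are hypothetical: no catalog is installed for
# them, so fallback=True makes translation() return a NullTranslations
# object that returns every message unchanged.
t = gettext.translation("demo", localedir="locale", fallback=True)
_ = t.gettext

# When no languages are passed explicitly, Python's gettext module
# picks the catalog from the user's environment at run time, trying
# LANGUAGE, LC_ALL, LC_MESSAGES and LANG in that order.  The program
# source itself only ever mentions the English string.
print(_("Hello, world!"))
@end example

@noindent
Had a French catalog been installed for @code{demo}, running the same
program with, say, @samp{LANGUAGE=fr} in its environment would make
Python's module pick that catalog instead, without any change to the
source.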
 651
 652 In fact, locale message support is only one component of the cultural
 653 data that makes up a particular locale.  There is a whole host of
 654 routines and functions provided to aid programmers in developing
 655 internationalized software, which allow them to access the data
 656 stored in a particular locale.  When someone refers to a
 657 particular locale, they are referring to all the data stored
 658 within that locale.  Similarly, if a programmer is referring
 659 to ``accessing the locale routines'', they are referring to the
 660 complete suite of routines that access all of the locale's information.
 661
 662 @cindex NLS
 663 @cindex Native Language Support
 664 @cindex Natural Language Support
 665 One uses the expression @dfn{Native Language Support}, or merely NLS,
 666 to speak of the overall activity or feature encompassing both
 667 internationalization and localization, allowing for multi-lingual
 668 interactions in a program.  In a nutshell, one could say that
 669 internationalization is the operation by which further localizations
 670 are made possible.
 671
 672 Also, very roughly said, when it comes to multi-lingual messages,
 673 internationalization is usually taken care of by programmers, and
 674 localization is usually taken care of by translators.
 675
 676 @node Aspects, Files, Concepts, Introduction
 677 @section Aspects in Native Language Support
 678
 679 @cindex translation aspects
 680 For a totally multi-lingual distribution, there are many things to
 681 translate beyond output messages.
 682
 683 @itemize @bullet
 684 @item
 685 As of today, GNU @code{gettext} offers a complete toolset for
 686 translating messages output by C programs.  Perl scripts and shell
 687 scripts will also need to be translated.  Even if there are today some hooks
 688 by which this can be done, these hooks are not integrated as well as they
 689 should be.
 690
 691 @item
 692 Some programs, like @code{autoconf} or @code{bison}, are able
 693 to produce other programs (or scripts).  Even if the generating
 694 programs themselves are internationalized, the generated programs they
 695 produce may need internationalization on their own, and this indirect
 696 internationalization could be automated right from the generating
 697 program.  In fact, quite usually, generating and generated programs
 698 could be internationalized independently, as the effort needed is
 699 fairly orthogonal.
 700
 701 @item
 702 A few programs include textual tables which might need translation
 703 themselves, independently of the strings contained in the program
 704 itself.  For example, @w{RFC 1345} gives an English description for each
 705 character which the @code{recode} program is able to reconstruct at execution.
 706 Since these descriptions are extracted from the RFC by mechanical means,
 707 translating them properly would require a prior translation of the RFC
 708 itself.
 709
 710 @item
 711 Almost all programs accept options, which are often worded so as to
 712 be descriptive for English readers; one might want to consider
 713 offering translated versions of program options as well.
 714
 715 @item
 716 Many programs read, interpret, compile, or are somewhat driven by
 717 input files which are texts containing keywords, identifiers, or
 718 replies which are inherently translatable.  For example, one may want
 719 @code{gcc} to allow diacriticized characters in identifiers or use
 720 translated keywords; @samp{rm -i} might accept something other than
 721 @samp{y} or @samp{n} for replies, etc.  Even if the program will
 722 eventually make most of its output in the foreign languages, one has
 723 to decide whether the input syntax, option values, etc., are to be
 724 localized or not.
 725
 726 @item
 727 The manual accompanying a package, as well as all documentation files
 728 in the distribution, could surely be translated, too.  Translating a
 729 manual, with the intent of later keeping up with updates, is a major
 730 undertaking in itself, generally.
 731
 732 @end itemize
 733
 734 As we already stressed, translation is only one aspect of locales.
 735 Other internationalization aspects are system services and are handled
 736 in GNU @code{libc}.  There
 737 are many attributes that are needed to define a country's cultural
 738 conventions.  These attributes include, besides the country's native
 739 language, the formatting of the date and time, the representation of
 740 numbers, the symbols for currency, etc.  These local @dfn{rules} are
 741 termed the country's locale.  The locale represents the knowledge
 742 needed to support the country's native attributes.
 743
 744 @cindex locale categories
 745 There are a few major areas which may vary between countries and
 746 hence define what a locale must describe.  The following list helps
 747 put multi-lingual messages into the proper context of other tasks
 748 related to locales.  See the GNU @code{libc} manual for details.
 749
 750 @table @emph
 751
 752 @item Characters and Codesets
 753 @cindex codeset
 754 @cindex encoding
 755 @cindex character encoding
 756 @cindex locale category, LC_CTYPE
 757
 758 The codeset most commonly used throughout the USA and most English
 759 speaking parts of the world is the ASCII codeset.  However, there are
 760 many characters needed by various locales that are not found within
 761 this codeset.  The 8-bit @w{ISO 8859-1} code set has most of the special
 762 characters needed to handle the major European languages.  However, in
 763 many cases, choosing @w{ISO 8859-1} is nevertheless not adequate: it
 764 doesn't even handle the major European currency.  Hence each locale
 765 will need to specify which codeset it uses and will need
 766 to have the appropriate character handling routines to cope with
 767 the codeset.
 768
 769 @item Currency
 770 @cindex currency symbols
 771 @cindex locale category, LC_MONETARY
 772
 773 The symbols used vary from country to country as does the position
 774 used by the symbol.  Software needs to be able to transparently
 775 display currency figures in the native mode for each locale.
 776
 777 @item Dates
 778 @cindex date format
 779 @cindex locale category, LC_TIME
 780
 781 The date format varies between locales.  For example, Christmas day
 782 in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia.
 783 Other countries might use @w{ISO 8601} dates, etc.
 784
 785 Time of the day may be noted as @var{hh}:@var{mm}, @var{hh}.@var{mm},
 786 or otherwise.  Some locales require time to be specified in 24-hour
 787 mode rather than as AM or PM.  Further, the nature and yearly extent
 788 of the Daylight Saving correction vary widely between countries.
 789
 790 @item Numbers
 791 @cindex number format
 792 @cindex locale category, LC_NUMERIC
 793
 794 Numbers can be represented differently in different locales.
 795 For example, the following numbers are all written correctly for
 796 their respective locales:
 797
 798 @example
 799 12,345.67       English
 800 12.345,67       German
 801  12345,67       French
 802 1,2345.67       Asia
 803 @end example
 804
 805 Some programs could go further and use different unit systems, like
 806 English units or Metric units, or even take into account variants
 807 about how numbers are spelled in full.
 808
 809 @item Messages
 810 @cindex messages
 811 @cindex locale category, LC_MESSAGES
 812
 813 The most obvious area is the language support within a locale.  This is
 814 where GNU @code{gettext} provides the means for developers and users to
 815 easily change the language that the software uses to communicate to
 816 the user.
 817
 818 @end table
 819
 820 @cindex locale categories
 821 These areas of cultural conventions are called @emph{locale categories}.
 822 It is an unfortunate term; @emph{locale aspects} or @emph{locale feature
 823 categories} would be a better term, because each ``locale category''
 824 describes an area or task that requires localization.  The concrete data
 825 that describes the cultural conventions for such an area and for a particular
 826 culture is also called a @emph{locale category}.  In this sense, a locale
 827 is composed of several locale categories: the locale category describing
 828 the codeset, the locale category describing the formatting of numbers,
 829 the locale category containing the translated messages, and so on.
 830
 831 @cindex Linux
 832 Components of locale outside of message handling are standardized in
 833 the ISO C standard and the POSIX:2001 standard (also known as the SUSv3
 834 specification).  GNU @code{libc}
 835 fully implements these standards, and most other modern systems provide
 836 more or less reasonable support for at least some of these components.
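For illustration, the ISO C interfaces mentioned above let a program observe two of these categories directly: @code{strftime} with the @samp{%x} conversion formats a date according to @code{LC_TIME}, while @code{printf} and @code{localeconv} follow @code{LC_NUMERIC}.  The following is a minimal sketch, not part of GNU @code{gettext}; the helper names @code{format_christmas_1994}, @code{format_amount} and @code{decimal_point} are hypothetical.  In the @code{C} locale it produces the US-style date @samp{12/25/94} and the number @samp{12345.67}; after @code{setlocale (LC_ALL, "")} selects, for instance, an installed German locale, the date layout and the decimal separator change accordingly.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Writes Christmas day 1994 into BUF using the date representation
   (%x) of the current LC_TIME category.  Returns the number of bytes
   written, or 0 if BUF is too small.  */
size_t format_christmas_1994 (char *buf, size_t size)
{
  struct tm t;
  memset (&t, 0, sizeof t);
  t.tm_year = 1994 - 1900;  /* years since 1900 */
  t.tm_mon = 11;            /* December; months count from 0 */
  t.tm_mday = 25;
  return strftime (buf, size, "%x", &t);
}

/* Writes VALUE with two decimals into BUF; the decimal separator
   follows the current LC_NUMERIC category.  */
int format_amount (char *buf, size_t size, double value)
{
  return snprintf (buf, size, "%.2f", value);
}

/* Returns the decimal point string of the current LC_NUMERIC category.  */
const char *decimal_point (void)
{
  return localeconv ()->decimal_point;
}
```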
 837
 838 @node Files, Overview, Aspects, Introduction
 839 @section Files Conveying Translations
 840
 841 @cindex files, @file{.po} and @file{.mo}
 842 The letters PO in @file{.po} files mean Portable Object, to
 843 distinguish it from @file{.mo} files, where MO stands for Machine
 844 Object.  This paradigm, as well as the PO file format, is inspired
 845 by the NLS standard developed by Uniforum, and first implemented by
 846 Sun in their Solaris system.
 847
 848 PO files are meant to be read and edited by humans, and associate each
 849 original, translatable string of a given package with its translation
 850 in a particular target language.  A single PO file is dedicated to
 851 a single target language.  If a package supports many languages,
 852 there is one such PO file per language supported, and each package
 853 has its own set of PO files.  These PO files are best created by
 854 the @code{xgettext} program, and later updated or refreshed through
 855 the @code{msgmerge} program.  Program @code{xgettext} extracts all
 856 marked messages from a set of C files and initializes a PO file with
 857 empty translations.  Program @code{msgmerge} takes care of adjusting
 858 PO files between releases of the corresponding sources, commenting out
 859 obsolete entries, initializing new ones, and updating all source
 860 line references.  Files ending with @file{.pot} are kind of base
 861 translation files found in distributions, in PO file format.
 862
 863 MO files are meant to be read by programs, and are binary in nature.
 864 A few systems already offer tools for creating and handling MO files
 865 as part of the Native Language Support coming with the system, but the
 866 format of these MO files is often different from system to system,
 867 and non-portable.  The tools already provided with these systems don't
 868 support all the features of GNU @code{gettext}.  Therefore GNU
 869 @code{gettext} uses its own format for MO files.  Files ending with
 870 @file{.gmo} are really MO files, when it is known that these files use
 871 the GNU format.
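Because MO files are binary, a program can recognize the GNU format from the first four bytes of a file: a GNU MO file starts with the magic number @code{0x950412de}, stored in the byte order of the machine that wrote it (@pxref{MO Files}).  The sketch below, whose function names are hypothetical, accepts both byte orders; it only checks this header and does not validate the rest of the file.

```c
#include <stdbool.h>
#include <stdint.h>

/* Magic number at the start of every GNU MO file.  */
#define GNU_MO_MAGIC UINT32_C (0x950412de)

/* Reads four bytes at P as a 32-bit number in the given byte order.  */
static uint32_t read_u32 (const unsigned char *p, bool big_endian)
{
  if (big_endian)
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
           | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
  return ((uint32_t) p[3] << 24) | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[1] << 8) | (uint32_t) p[0];
}

/* Returns true if the first four bytes of a file, HEADER, carry the
   GNU MO magic number in either byte order.  */
bool looks_like_gnu_mo (const unsigned char header[4])
{
  return read_u32 (header, false) == GNU_MO_MAGIC
         || read_u32 (header, true) == GNU_MO_MAGIC;
}
```

A PO file, being plain text, fails this check: its first bytes are ordinary characters such as @samp{msgi} or @samp{# }.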
 872
 873 @node Overview,  , Files, Introduction
 874 @section Overview of GNU @code{gettext}
 875
 876 @cindex overview of @code{gettext}
 877 @cindex big picture
 878 @cindex tutorial of @code{gettext} usage
 879 The following diagram summarizes the relation between the files
 880 handled by GNU @code{gettext} and the tools acting on these files.
 881 It is followed by somewhat detailed explanations, which you should
 882 read while keeping an eye on the diagram.  Having a clear understanding
 883 of these interrelations will surely help programmers, translators
 884 and maintainers.
 885
 886 @ifhtml
 887 @example
 888 @group
 889 Original C Sources ───> Preparation ───> Marked C Sources ───╮
 890                                                              │
 891               ╭─────────<─── GNU gettext Library             │
 892 ╭─── make <───┤                                              │
 893 │             ╰─────────<────────────────────┬───────────────╯
 894 │                                            │
 895 │   ╭─────<─── PACKAGE.pot <─── xgettext <───╯   ╭───<─── PO Compendium
 896 │   │                                            │              ↑
 897 │   │                                            ╰───╮          │
 898 │   ╰───╮                                            ├───> PO editor ───╮
 899 │       ├────> msgmerge ──────> LANG.po ────>────────╯                  │
 900 │   ╭───╯                                                               │
 901 │   │                                                                   │
 902 │   ╰─────────────<───────────────╮                                     │
 903 │                                 ├─── New LANG.po <────────────────────╯
 904 │   ╭─── LANG.gmo <─── msgfmt <───╯
 905 │   │
 906 │   ╰───> install ───> /.../LANG/PACKAGE.mo ───╮
 907 │                                              ├───> "Hello world!"
 908 ╰───────> install ───> /.../bin/PROGRAM ───────╯
 909 @end group
 910 @end example
 911 @end ifhtml
 912 @ifnothtml
 913 @example
 914 @group
 915 Original C Sources ---> Preparation ---> Marked C Sources ---.
 916                                                              |
 917               .---------<--- GNU gettext Library             |
 918 .--- make <---+                                              |
 919 |             `---------<--------------------+---------------'
 920 |                                            |
 921 |   .-----<--- PACKAGE.pot <--- xgettext <---'   .---<--- PO Compendium
 922 |   |                                            |              ^
 923 |   |                                            `---.          |
 924 |   `---.                                            +---> PO editor ---.
 925 |       +----> msgmerge ------> LANG.po ---->--------'                  |
 926 |   .---'                                                               |
 927 |   |                                                                   |
 928 |   `-------------<---------------.                                     |
 929 |                                 +--- New LANG.po <--------------------'
 930 |   .--- LANG.gmo <--- msgfmt <---'
 931 |   |
 932 |   `---> install ---> /.../LANG/PACKAGE.mo ---.
 933 |                                              +---> "Hello world!"
 934 `-------> install ---> /.../bin/PROGRAM -------'
 935 @end group
 936 @end example
 937 @end ifnothtml
 938
 939 @cindex marking translatable strings
 940 As a programmer, the first step to bringing GNU @code{gettext}
 941 into your package is identifying, right in the C sources, those strings
 942 which are meant to be translatable, and those which are untranslatable.
 943 This tedious job can be done a little more comfortably using emacs PO
 944 mode, but you can use any means familiar to you for modifying your
 945 C sources.  Besides this, some other simple, standard changes are needed to
 946 properly initialize the translation library.  @xref{Sources}, for
 947 more information about all this.
 948
 949 For newly written software the strings of course can and should be
 950 marked while writing it.  The @code{gettext} approach makes this
 951 very easy.  Simply put the following lines at the beginning of each file
 952 or in a central header file:
 953
 954 @example
 955 @group
 956 #define _(String) (String)
 957 #define N_(String) String
 958 #define textdomain(Domain)
 959 #define bindtextdomain(Package, Directory)
 960 @end group
 961 @end example
 962
 963 @noindent
 964 Doing this allows you to prepare the sources for internationalization.
 965 Later when you feel ready for the step to use the @code{gettext} library
 966 simply replace these definitions by the following:
 967
 968 @cindex include file @file{libintl.h}
 969 @example
 970 @group
 971 #include <libintl.h>
 972 #define _(String) gettext (String)
 973 #define gettext_noop(String) String
 974 #define N_(String) gettext_noop (String)
 975 @end group
 976 @end example
 977
 978 @cindex link with @file{libintl}
 979 @cindex Linux
 980 @noindent
 981 and link against @file{libintl.a} or @file{libintl.so}.  Note that on
 982 GNU systems, you don't need to link with @code{libintl} because the
 983 @code{gettext} library functions are already contained in GNU libc.
 984 That is all you have to change.
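Putting these pieces together, the initialization of a small program might look like the following sketch.  The values of @code{PACKAGE} and @code{LOCALEDIR} are hypothetical placeholders (packages usually obtain them from their build configuration), and @code{init_i18n}, @code{greeting} and @code{farewell_message} are illustrative names.  @code{setlocale (LC_ALL, "")} selects the locale from the user's environment, @code{bindtextdomain} tells the library in which directory the MO files of the domain are installed, and @code{textdomain} makes that domain the default one for @code{gettext}.  A string marked with @code{N_} is only recorded for @code{xgettext}; it is translated later, when it is passed to @code{gettext}.  When no catalog for the current language is installed, @code{gettext} returns the original string.

```c
#include <libintl.h>
#include <locale.h>

#define _(String) gettext (String)
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)

/* Hypothetical values; real packages usually take them from the
   build configuration.  */
#define PACKAGE "hello"
#define LOCALEDIR "/usr/local/share/locale"

/* Marked with N_: extracted by xgettext, translated only when it is
   later passed to gettext.  */
static const char *const farewell = N_("Goodbye!");

/* Selects the locale from LC_ALL, LC_xxx and LANG, and tells the
   library where the catalogs of PACKAGE are installed.  */
void init_i18n (void)
{
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
}

const char *greeting (void)
{
  return _("Hello world!");
}

const char *farewell_message (void)
{
  return gettext (farewell);
}
```

On GNU systems this links without extra libraries; elsewhere, add @file{libintl} as described above.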
 985
 986 @cindex template PO file
 987 @cindex files, @file{.pot}
 988 Once the C sources have been modified, the @code{xgettext} program
 989 is used to find and extract all translatable strings, and create a
 990 PO template file out of all these.  This @file{@var{package}.pot} file
 991 contains all original program strings.  It has sets of pointers to
 992 exactly where in C sources each string is used.  All translations
 993 are set to empty.  The letter @code{t} in @file{.pot} marks this as
 994 a Template PO file, not yet oriented towards any particular language.
 995 @xref{xgettext Invocation}, for more details about how one calls the
 996 @code{xgettext} program.  If you are @emph{really} lazy, you might
 997 be interested in working a lot more right away, and preparing the
 998 whole distribution setup (@pxref{Maintainers}).  By doing so, you
 999 spare yourself typing the @code{xgettext} command, as @code{make}
1000 should now generate the proper things automatically for you!
1001
1002 The first time through, there is no @file{@var{lang}.po} yet, so the
1003 @code{msgmerge} step may be skipped and replaced by a mere copy of
1004 @file{@var{package}.pot} to @file{@var{lang}.po}, where @var{lang}
1005 represents the target language.  See @ref{Creating} for details.
1006
1007 Then comes the initial translation of messages.  Translation in
1008 itself is a whole matter, still exclusively meant for humans,
1009 and whose complexity far overwhelms the level of this manual.
1010 Nevertheless, a few hints are given in some other chapter of this
1011 manual (@pxref{Translators}).  You will also find there indications
1012 about how to contact translating teams, or becoming part of them,
1013 for sharing your translating concerns with others who target the same
1014 native language.
1015
1016 While adding the translated messages into the @file{@var{lang}.po}
1017 PO file, if you are not using one of the dedicated PO file editors
1018 (@pxref{Editing}), you are on your own
1019 for ensuring that your efforts fully respect the PO file format, and quoting
1020 conventions (@pxref{PO Files}).  This is surely not an impossible task,
1021 as this is how many people handled PO files around 1995.
1022 On the other hand, by using a PO file editor, most details
1023 of the PO file format are taken care of for you, but you have to acquire
1024 some familiarity with the PO file editor itself.
1025
1026 If some common translations have already been saved into a compendium
1027 PO file, translators may use PO mode for initializing untranslated
1028 entries from the compendium, and also save selected translations into
1029 the compendium, updating it (@pxref{Compendium}).  Compendium files
1030 are meant to be exchanged between members of a given translation team.
1031
1032 Programs, or packages of programs, are dynamic in nature: users write
1033 bug reports and suggestions for improvements, maintainers react by
1034 modifying programs in various ways.  The fact that a package has
1035 already been internationalized should not make maintainers shy
1036 of adding new strings, or modifying strings already translated.
1037 They just do their job the best they can.  For the Translation
1038 Project to work smoothly, it is important that maintainers do not
1039 carry translation concerns on their already loaded shoulders, and that
1040 translators be kept as free as possible of programming concerns.
1041
1042 The only concern maintainers should have is carefully marking new
1043 strings as translatable, when they should be, and not otherwise
1044 worrying about them being translated, as this will come in proper time.
1045 Consequently, when programs and their strings are adjusted in various
1046 ways by maintainers, and for matters usually unrelated to translation,
1047 @code{xgettext} would construct @file{@var{package}.pot} files which are
1048 evolving over time, so the translations carried by @file{@var{lang}.po}
1049 are slowly fading out of date.
1050
1051 @cindex evolution of packages
1052 It is important for translators (and even maintainers) to understand
1053 that package translation is a continuous process in the lifetime of a
1054 package, and not something which is done once and for all at the start.
1055 After an initial burst of translation activity for a given package,
1056 interventions are needed once in a while, because here and there,
1057 translated entries become obsolete, and new untranslated entries
1058 appear, needing translation.
1059
1060 The @code{msgmerge} program has the purpose of refreshing an already
1061 existing @file{@var{lang}.po} file, by comparing it with a newer
1062 @file{@var{package}.pot} template file, extracted by @code{xgettext}
1063 out of recent C sources.  The refreshing operation adjusts all
1064 references to C source locations for strings, since these strings
1065 move as programs are modified.  Also, @code{msgmerge} comments out as
1066 obsolete, in @file{@var{lang}.po}, those already translated entries
1067 which are no longer used in the program sources (@pxref{Obsolete
1068 Entries}).  It finally discovers new strings and inserts them in
1069 the resulting PO file as untranslated entries (@pxref{Untranslated
1070 Entries}).  @xref{msgmerge Invocation}, for more information about what
1071 @code{msgmerge} really does.
1072
1073 Whatever route or means taken, the goal is to obtain an updated
1074 @file{@var{lang}.po} file offering translations for all strings.
1075
1076 The temporal mobility, or fluidity of PO files, is an integral part of
1077 the translation game, and should be well understood, and accepted.
1078 People resisting it will have a hard time participating in the
1079 Translation Project, or will give a hard time to other participants!  In
1080 particular, maintainers should relax and include all available official
1081 PO files in their distributions, even if these have not recently been
1082 updated, without exerting pressure on the translator teams to get the
1083 job done.  The pressure should rather come
1084 from the community of users speaking a particular language, and
1085 maintainers should consider themselves fairly relieved of any concern
1086 about the adequacy of translation files.  On the other hand, translators
1087 should reasonably try updating the PO files they are responsible for,
1088 while the package is undergoing pretest, prior to an official
1089 distribution.
1090
1091 Once the PO file is complete and dependable, the @code{msgfmt} program
1092 is used for turning the PO file into a machine-oriented format, which
1093 may yield efficient retrieval of translations by the programs of the
1094 package, whenever needed at runtime (@pxref{MO Files}).  @xref{msgfmt
1095 Invocation}, for more information about all modes of execution
1096 for the @code{msgfmt} program.
1097
1098 Finally, the modified and marked C sources are compiled and linked
1099 with the GNU @code{gettext} library, usually through the operation of
1100 @code{make}, given a suitable @file{Makefile} exists for the project,
1101 and the resulting executable is installed somewhere users will find it.
1102 The MO files themselves should also be properly installed.  Given the
1103 appropriate environment variables are set (@pxref{Setting the POSIX Locale}),
1104 the program should localize itself automatically, whenever it executes.
1105
1106 The remainder of this manual has the purpose of explaining in depth the various
1107 steps outlined above.
1108
1109 @node Users, PO Files, Introduction, Top
1110 @chapter The User's View
1111
1112 Nowadays, when users log into a computer, they usually find that all
1113 their programs show messages in their native language -- at least for
1114 users of languages with an active free software community, like French or
1115 German; to a lesser extent for languages with a smaller participation in
1116 free software and the GNU project, like Hindi and Filipino.
1117
1118 How does this work?  How can the user influence the language that is used
1119 by the programs?  This chapter answers these questions.
1120
1121 @menu
1122 * System Installation::         Questions During Operating System Installation
1123 * Setting the GUI Locale::      How to Specify the Locale Used by GUI Programs
1124 * Setting the POSIX Locale::    How to Specify the Locale According to POSIX
1125 * Installing Localizations::    How to Install Additional Translations
1126 @end menu
1127
1128 @node System Installation, Setting the GUI Locale, Users, Users
1129 @section Operating System Installation
1130
1131 The default language is often already specified during operating system
1132 installation.  When the operating system is installed, the installer
1133 typically asks for the language used for the installation process and,
1134 separately, for the language to use in the installed system.  Some OS
1135 installers only ask for the language once.
1136
1137 This determines the system-wide default language for all users.  But the
1138 installers often give the possibility to install extra localizations for
1139 additional languages.  For example, the localizations of KDE (the K
1140 Desktop Environment) and OpenOffice.org are often bundled separately,
1141 as one installable package per language.
1142
1143 At this point it is good to consider the intended use of the machine: If
1144 it is a machine designated for personal use, additional localizations are
1145 probably not necessary.  If, however, the machine is in use in an
1146 organization or company that has international relationships, one can
1147 consider the needs of guest users.  If you have a guest from abroad, for
1148 a week, what could be his preferred locales?  It may be worth installing
1149 these additional localizations ahead of time, since they cost only a bit
1150 of disk space at this point.
1151
1152 The system-wide default language is the locale configuration that is used
1153 when a new user account is created.  But the user can have his own locale
1154 configuration that is different from the one of the other users of the
1155 same machine.  He can specify it, typically after the first login, as
1156 described in the next section.
1157
1158 @node Setting the GUI Locale, Setting the POSIX Locale, System Installation, Users
1159 @section Setting the Locale Used by GUI Programs
1160
1161 The immediately available programs in a user's desktop come from a group
1162 of programs called a ``desktop environment''; it usually includes the window
1163 manager, a web browser, a text editor, and more.  The most common free
1164 desktop environments are KDE, GNOME, and Xfce.
1165
1166 The locale used by GUI programs of the desktop environment can be specified
1167 in a configuration screen called ``control center'', ``language settings''
1168 or ``country settings''.
1169
1170 Individual GUI programs that are not part of the desktop environment can
1171 have their locale specified either in a settings panel, or through environment
1172 variables.
1173
1174 For some programs, it is possible to specify the locale through environment
1175 variables, possibly even to a different locale than the desktop's locale.
1176 This means, instead of starting a program through a menu or from the file
1177 system, you can start it from the command-line, after having set some
1178 environment variables.  The environment variables can be those specified
1179 in the next section (@ref{Setting the POSIX Locale}); for some versions of
1180 KDE, however, the locale is specified through a variable @code{KDE_LANG},
1181 rather than @code{LANG} or @code{LC_ALL}.
1182
1183 @node Setting the POSIX Locale, Installing Localizations, Setting the GUI Locale, Users
1184 @section Setting the Locale through Environment Variables
1185
1186 As a user, if your language has been installed for this package, in the
1187 simplest case, you only have to set the @code{LANG} environment variable
1188 to the appropriate @samp{@var{ll}_@var{CC}} combination.  For example,
1189 let's suppose that you speak German and live in Germany.  At the shell
1190 prompt, merely execute
1191 @w{@samp{setenv LANG de_DE}} (in @code{csh}),
1192 @w{@samp{export LANG; LANG=de_DE}} (in @code{sh}) or
1193 @w{@samp{export LANG=de_DE}} (in @code{bash}).  This can be done from your
1194 @file{.login} or @file{.profile} file, once and for all.
1195
1196 @menu
1197 * Locale Names::                How a Locale Specification Looks Like
1198 * Locale Environment Variables:: Which Environment Variable Specifies What
1199 * The LANGUAGE variable::       How to Specify a Priority List of Languages
1200 @end menu
1201
1202 @node Locale Names, Locale Environment Variables, Setting the POSIX Locale, Setting the POSIX Locale
1203 @subsection Locale Names
1204
1205 A locale name usually has the form @samp{@var{ll}_@var{CC}}.  Here
1206 @samp{@var{ll}} is an @w{ISO 639} two-letter language code, and
1207 @samp{@var{CC}} is an @w{ISO 3166} two-letter country code.  For example,
1208 for German in Germany, @var{ll} is @code{de}, and @var{CC} is @code{DE}.
1209 You find a list of the language codes in appendix @ref{Language Codes} and
1210 a list of the country codes in appendix @ref{Country Codes}.
1211
1212 You might think that the country code specification is redundant.  But in
1213 fact, some languages have dialects in different countries.  For example,
1214 @samp{de_AT} is used for Austria, and @samp{pt_BR} for Brazil.  The country
1215 code serves to distinguish the dialects.
1216
1217 Many locale names have an extended syntax
1218 @samp{@var{ll}_@var{CC}.@var{encoding}} that also specifies the character
1219 encoding.  These are in use because, between 2000 and 2005, most users
1220 switched to locales in UTF-8 encoding.  For example, the German locale on
1221 glibc systems is nowadays @samp{de_DE.UTF-8}.  The older name @samp{de_DE}
1222 still refers to the German locale as of 2000 that stores characters in
1223 ISO-8859-1 encoding -- a text encoding that cannot even accommodate the Euro
1224 currency sign.
1225
1226 Some locale names use @samp{@var{ll}_@var{CC}@@@var{variant}} instead of
1227 @samp{@var{ll}_@var{CC}}.  The @samp{@@@var{variant}} can denote any kind of
1228 characteristic that is not already implied by the language @var{ll} and
1229 the country @var{CC}.  It can denote a particular monetary unit.  For example,
1230 on glibc systems, @samp{de_DE@@euro} denotes the locale that uses the Euro
1231 currency, in contrast to the older locale @samp{de_DE} which implies the use
1232 of the currency before 2002.  It can also denote a dialect of the language,
1233 or the script used to write text (for example, @samp{sr_RS@@latin} uses the
1234 Latin script, whereas @samp{sr_RS} uses the Cyrillic script to write Serbian),
1235 or the orthography rules, or similar.
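To make the structure of these names concrete, here is a simplified sketch that splits a name of the form @samp{@var{ll}_@var{CC}.@var{encoding}@@@var{variant}} into its parts, leaving absent parts empty.  The type @code{struct locale_name} and the function @code{parse_locale_name} are hypothetical; the C library does its own, more thorough parsing when a program calls @code{setlocale}, and this sketch does not check the codes against @w{ISO 639} or @w{ISO 3166}.

```c
#include <string.h>

/* Components of a locale name of the form ll_CC.encoding@variant.
   Parts that are absent are left as empty strings.  */
struct locale_name
{
  char language[16];
  char territory[16];
  char encoding[32];
  char variant[32];
};

/* Copies the LEN bytes at SRC into DST (of size SIZE), truncating if
   needed and always NUL-terminating.  */
static void copy_part (char *dst, size_t size, const char *src, size_t len)
{
  if (len >= size)
    len = size - 1;
  memcpy (dst, src, len);
  dst[len] = '\0';
}

/* Splits NAME at the separators '_', '.' and '@', which must appear
   in this order.  */
void parse_locale_name (const char *name, struct locale_name *out)
{
  size_t lang_len = strcspn (name, "_.@");
  copy_part (out->language, sizeof out->language, name, lang_len);
  const char *p = name + lang_len;

  out->territory[0] = out->encoding[0] = out->variant[0] = '\0';
  if (*p == '_')
    {
      size_t n = strcspn (p + 1, ".@");
      copy_part (out->territory, sizeof out->territory, p + 1, n);
      p += 1 + n;
    }
  if (*p == '.')
    {
      size_t n = strcspn (p + 1, "@");
      copy_part (out->encoding, sizeof out->encoding, p + 1, n);
      p += 1 + n;
    }
  if (*p == '@')
    copy_part (out->variant, sizeof out->variant, p + 1, strlen (p + 1));
}
```

For example, @samp{sr_RS@@latin} yields the language @samp{sr}, the country @samp{RS}, no encoding, and the variant @samp{latin}.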
1236
1237 On other systems, some variations of this scheme are used, such as
1238 @samp{@var{ll}}.  You can get the list of locales supported by your system
1239 for your language by running the command @samp{locale -a | grep '^@var{ll}'}.
1240
1241 There is also a special locale, called @samp{C}.
1242 @c Don't mention that this locale also has the name "POSIX". When we talk about
1243 @c the "POSIX locale", we mean the "locale as specified in the POSIX way", and
1244 @c mentioning a locale called "POSIX" would bring total confusion.
1245 When it is used, it disables all localization: in this locale, all programs
1246 standardized by POSIX use English messages and an unspecified character
1247 encoding (often US-ASCII, but sometimes also ISO-8859-1 or UTF-8, depending on
1248 the operating system).
1249
1250 @node Locale Environment Variables, The LANGUAGE variable, Locale Names, Setting the POSIX Locale
1251 @subsection Locale Environment Variables
1252 @cindex setting up @code{gettext} at run time
1253 @cindex selecting message language
1254 @cindex language selection
1255
1256 A locale is composed of several @emph{locale categories}, see @ref{Aspects}.
1257 When a program looks up locale dependent values, it does this according to
1258 the following environment variables, in priority order:
1259
1260 @enumerate
1261 @vindex LANGUAGE@r{, environment variable}
1262 @item @code{LANGUAGE}
1263 @vindex LC_ALL@r{, environment variable}
1264 @item @code{LC_ALL}
1265 @vindex LC_CTYPE@r{, environment variable}
1266 @vindex LC_NUMERIC@r{, environment variable}
1267 @vindex LC_TIME@r{, environment variable}
1268 @vindex LC_COLLATE@r{, environment variable}
1269 @vindex LC_MONETARY@r{, environment variable}
1270 @vindex LC_MESSAGES@r{, environment variable}
1271 @item @code{LC_xxx}, according to selected locale category:
1272 @code{LC_CTYPE}, @code{LC_NUMERIC}, @code{LC_TIME}, @code{LC_COLLATE},
1273 @code{LC_MONETARY}, @code{LC_MESSAGES}, ...
1274 @vindex LANG@r{, environment variable}
1275 @item @code{LANG}
1276 @end enumerate
1277
1278 Variables whose value is set but is empty are ignored in this lookup.
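
As an illustration of this order, here is a minimal C sketch of how the
locale name for one category could be determined from the environment.
The function @code{effective_locale_name} is hypothetical and not part of
any library.  It leaves out @code{LANGUAGE}, which only GNU @code{gettext}
consults, and it assumes @samp{C} as the fallback when nothing is set.

@example
#include <stdlib.h>

/* Hypothetical helper: return the locale name selected for the
   category whose environment variable is CATEGORY_VAR, such as
   "LC_MESSAGES".  LANGUAGE is deliberately not handled here.  */
const char *
effective_locale_name (const char *category_var)
@{
  /* Priority order: LC_ALL, then LC_xxx, then LANG.  */
  const char *candidates[3] = @{ "LC_ALL", category_var, "LANG" @};

  for (int i = 0; i < 3; i++)
    @{
      const char *value = getenv (candidates[i]);
      /* Variables that are set but empty are ignored.  */
      if (value != NULL && value[0] != '\0')
        return value;
    @}
  /* Nothing usable is set: this sketch assumes the "C" locale.  */
  return "C";
@}
@end example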
1279
1280 @code{LANG} is the normal environment variable for specifying a locale.
1281 As a user, you normally set this variable (unless some of the other variables
1282 have already been set by the system, in @file{/etc/profile} or similar
1283 initialization files).
1284
@code{LC_CTYPE}, @code{LC_NUMERIC}, @code{LC_TIME}, @code{LC_COLLATE},
@code{LC_MONETARY}, @code{LC_MESSAGES}, and so on, are the environment
variables that override @code{LANG} for a single locale category only.
For example, assume you are a Swedish user in Spain, and you want your
programs to handle numbers and dates according to Spanish conventions,
while only the messages should be in Swedish.  Then you could create a
locale named @samp{sv_ES} or @samp{sv_ES.UTF-8} by use of the
@code{localedef} program.  But it is simpler, and achieves the same effect,
to set the @code{LANG} variable to @code{es_ES.UTF-8} and the
@code{LC_MESSAGES} variable to @code{sv_SE.UTF-8}; these two locales are
typically preinstalled with the operating system.
1296
1297 @code{LC_ALL} is an environment variable that overrides all of these.
1298 It is typically used in scripts that run particular programs.  For example,
1299 @code{configure} scripts generated by GNU autoconf use @code{LC_ALL} to make
1300 sure that the configuration tests don't operate in locale dependent ways.
1301
1302 Some systems, unfortunately, set @code{LC_ALL} in @file{/etc/profile} or in
1303 similar initialization files.  As a user, you therefore have to unset this
1304 variable if you want to set @code{LANG} and optionally some of the other
1305 @code{LC_xxx} variables.
1306
1307 The @code{LANGUAGE} variable is described in the next subsection.
1308
1309 @node The LANGUAGE variable,  , Locale Environment Variables, Setting the POSIX Locale
1310 @subsection Specifying a Priority List of Languages
1311
1312 Not all programs have translations for all languages.  By default, an
1313 English message is shown in place of a nonexistent translation.  If you
1314 understand other languages, you can set up a priority list of languages.
1315 This is done through a different environment variable, called
1316 @code{LANGUAGE}.  GNU @code{gettext} gives preference to @code{LANGUAGE}
1317 over @code{LC_ALL} and @code{LANG} for the purpose of message handling,
1318 but you still need to have @code{LANG} (or @code{LC_ALL}) set to the primary
1319 language; this is required by other parts of the system libraries.
For example, Swedish users who would rather read German than English
translations when no Swedish translation is available can set
@code{LANGUAGE} to @samp{sv:de} while leaving @code{LANG} set to
@samp{sv_SE}.
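
The priority list is a colon-separated sequence of language names, tried
from left to right.  A program that wanted to inspect it could walk it as
in the following C sketch.  The helper @code{language_list_entry} is
hypothetical; GNU @code{gettext} does its own parsing internally, and
skipping empty entries is an assumption of this sketch.

@example
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the INDEX-th entry of a LANGUAGE-style
   priority list such as "sv:de" into BUF, of size BUFSIZE.  Empty
   entries are skipped.  Return 1 on success, or 0 if there is no
   such entry or it does not fit into BUF.  */
int
language_list_entry (const char *list, size_t index,
                     char *buf, size_t bufsize)
@{
  const char *p = list;

  while (*p != '\0')
    @{
      /* Length of the entry up to the next ':' or the end.  */
      size_t len = strcspn (p, ":");
      if (len > 0)
        @{
          if (index == 0)
            @{
              if (len >= bufsize)
                return 0;
              memcpy (buf, p, len);
              buf[len] = '\0';
              return 1;
            @}
          index--;
        @}
      p += len;
      if (*p == ':')
        p++;
    @}
  return 0;
@}
@end example

For example, with the list @samp{nb:no}, index 0 yields @samp{nb} and
index 1 yields @samp{no}.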
1323
1324 Special advice for Norwegian users: The language code for Norwegian
bokm@ringaccent{a}l changed from @samp{no} to @samp{nb} in 2003.
1326 During the transition period, while some message catalogs for this language
1327 are installed under @samp{nb} and some older ones under @samp{no}, it is
1328 recommended for Norwegian users to set @code{LANGUAGE} to @samp{nb:no} so that
1329 both newer and older translations are used.
1330
1331 In the @code{LANGUAGE} environment variable, but not in the other
1332 environment variables, @samp{@var{ll}_@var{CC}} combinations can be
1333 abbreviated as @samp{@var{ll}} to denote the language's main dialect.
1334 For example, @samp{de} is equivalent to @samp{de_DE} (German as spoken in
1335 Germany), and @samp{pt} to @samp{pt_PT} (Portuguese as spoken in Portugal)
1336 in this context.
1337
1338 Note: The variable @code{LANGUAGE} is ignored if the locale is set to
1339 @samp{C}.  In other words, you have to first enable localization, by setting
1340 @code{LANG} (or @code{LC_ALL}) to a value other than @samp{C}, before you can
1341 use a language priority list through the @code{LANGUAGE} variable.
1342
1343 @node Installing Localizations,  , Setting the POSIX Locale, Users
1344 @section Installing Translations for Particular Programs
1345 @cindex Translation Matrix
1346 @cindex available translations
1347
1348 Languages are not equally well supported in all packages using GNU
1349 @code{gettext}, and more translations are added over time.  Usually, you
1350 use the translations that are shipped with the operating system
1351 or with particular packages that you install afterwards.  But you can also
install newer localizations directly.  To do this, you will need an
understanding of where each localization file is stored on the file system.
1354
1355 @cindex @file{ABOUT-NLS} file
1356 For programs that participate in the Translation Project, you can start
1357 looking for translations here:
1358 @url{http://translationproject.org/team/index.html}.
1359 A snapshot of this information is also found in the @file{ABOUT-NLS} file
1360 that is shipped with GNU gettext.
1361
1362 For programs that are part of the KDE project, the starting point is:
1363 @url{http://i18n.kde.org/}.
1364
1365 For programs that are part of the GNOME project, the starting point is:
1366 @url{http://www.gnome.org/i18n/}.
1367
1368 For other programs, you may check whether the program's source code package
1369 contains some @file{@var{ll}.po} files; often they are kept together in a
1370 directory called @file{po/}.  Each @file{@var{ll}.po} file contains the
message translations for the language whose abbreviation is @var{ll}.
1372
1373 @node PO Files, Sources, Users, Top
1374 @chapter The Format of PO Files
1375 @cindex PO files' format
1376 @cindex file format, @file{.po}
1377
The GNU @code{gettext} toolset helps programmers and translators
in producing, updating and using translation files, mainly
PO files, which are textual, editable files.  This chapter explains
the format of PO files.
1382
1383 A PO file is made up of many entries, each entry holding the relation
1384 between an original untranslated string and its corresponding
1385 translation.  All entries in a given PO file usually pertain
1386 to a single project, and all translations are expressed in a single
1387 target language.  One PO file @dfn{entry} has the following schematic
1388 structure:
1389
1390 @example
1391 @var{white-space}
1392 #  @var{translator-comments}
1393 #. @var{extracted-comments}
1394 #: @var{reference}@dots{}
1395 #, @var{flag}@dots{}
1396 #| msgid @var{previous-untranslated-string}
1397 msgid @var{untranslated-string}
1398 msgstr @var{translated-string}
1399 @end example
1400
1401 The general structure of a PO file should be well understood by
1402 the translator.  When using PO mode, very little has to be known
1403 about the format details, as PO mode takes care of them for her.
1404
1405 A simple entry can look like this:
1406
1407 @example
1408 #: lib/error.c:116
1409 msgid "Unknown system error"
1410 msgstr "Error desconegut del sistema"
1411 @end example
1412
1413 @cindex comments, translator
1414 @cindex comments, automatic
1415 @cindex comments, extracted
Entries begin with some optional white space.  Usually, when generated
through GNU @code{gettext} tools, there is exactly one blank line
between entries.  Then comments follow, on lines all starting with the
character @code{#}.  There are two kinds of comments: the @var{translator
comments}, which have some white space immediately following the
@code{#} and are created and maintained exclusively by the translator,
and the @var{automatic comments}, which have some non-white character
just after the @code{#} and are created and maintained automatically by
GNU @code{gettext} tools.  Comment lines starting with @code{#.} contain
comments given by the programmer, directed at the translator; these
comments are called @var{extracted comments} because the @code{xgettext}
program extracts them from the program's source code.  Comment lines
starting with @code{#:} contain references to the program's source code.
Comment lines starting with @code{#,} contain flags; more about these
below.  Comment lines starting with @code{#|} contain the previous
untranslated string for which the translator gave a translation.
1433
1434 All comments, of either kind, are optional.
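
A tool that reads PO files tells these comment kinds apart by the
character that follows the @code{#}.  The following C sketch shows the
idea; the enumeration and the function @code{classify_po_comment} are
hypothetical names, and treating a bare @samp{#} line as a translator
comment is an assumption of this sketch.

@example
/* Kinds of comment lines in a PO file entry.  */
enum po_comment_kind
@{
  PO_NOT_A_COMMENT,
  PO_TRANSLATOR_COMMENT,   /* "# ..."  */
  PO_EXTRACTED_COMMENT,    /* "#. ..." */
  PO_REFERENCE,            /* "#: ..." */
  PO_FLAGS,                /* "#, ..." */
  PO_PREVIOUS,             /* "#| ..." */
  PO_OTHER_AUTOMATIC       /* '#' followed by another non-white char */
@};

/* Hypothetical helper: classify LINE by the character after '#'.  */
enum po_comment_kind
classify_po_comment (const char *line)
@{
  if (line[0] != '#')
    return PO_NOT_A_COMMENT;
  switch (line[1])
    @{
    case '.': return PO_EXTRACTED_COMMENT;
    case ':': return PO_REFERENCE;
    case ',': return PO_FLAGS;
    case '|': return PO_PREVIOUS;
    case ' ': case '\t': case '\n': case '\0':
      /* White space (or nothing) after '#': a translator comment.  */
      return PO_TRANSLATOR_COMMENT;
    default:
      return PO_OTHER_AUTOMATIC;
    @}
@}
@end example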
1435
1436 @kwindex msgid
1437 @kwindex msgstr
1438 After white space and comments, entries show two strings, namely
1439 first the untranslated string as it appears in the original program
1440 sources, and then, the translation of this string.  The original
1441 string is introduced by the keyword @code{msgid}, and the translation,
1442 by @code{msgstr}.  The two strings, untranslated and translated,
1443 are quoted in various ways in the PO file, using @code{"}
1444 delimiters and @code{\} escapes, but the translator does not really
1445 have to pay attention to the precise quoting format, as PO mode fully
1446 takes care of quoting for her.
1447
1448 The @code{msgid} strings, as well as automatic comments, are produced
1449 and managed by other GNU @code{gettext} tools, and PO mode does not
provide means for the translator to alter these.  The most she can
do is delete them, and only by deleting the whole entry.
1452 On the other hand, the @code{msgstr} string, as well as translator
1453 comments, are really meant for the translator, and PO mode gives her
1454 the full control she needs.
1455
1456 The comment lines beginning with @code{#,} are special because they are
1457 not completely ignored by the programs as comments generally are.  The
comma-separated list of @var{flag}s is used by the @code{msgfmt}
program to give the user better diagnostic messages.  Currently
there are two forms of flags defined:
1461
1462 @table @code
1463 @item fuzzy
1464 @kwindex fuzzy@r{ flag}
1465 This flag can be generated by the @code{msgmerge} program or it can be
1466 inserted by the translator herself.  It shows that the @code{msgstr}
1467 string might not be a correct translation (anymore).  Only the translator
1468 can judge if the translation requires further modification, or is
1469 acceptable as is.  Once satisfied with the translation, she then removes
this @code{fuzzy} attribute.  The @code{msgmerge} program inserts this
flag when it has combined the @code{msgid} and @code{msgstr} entries
after a fuzzy search only.  @xref{Fuzzy Entries}.
1473
1474 @item c-format
1475 @kwindex c-format@r{ flag}
1476 @itemx no-c-format
1477 @kwindex no-c-format@r{ flag}
1478 These flags should not be added by a human.  Instead only the
1479 @code{xgettext} program adds them.  In an automated PO file processing
1480 system as proposed here, the user's changes would be thrown away again as
1481 soon as the @code{xgettext} program generates a new template file.
1482
1483 The @code{c-format} flag indicates that the untranslated string and the
1484 translation are supposed to be C format strings.  The @code{no-c-format}
1485 flag indicates that they are not C format strings, even though the untranslated
1486 string happens to look like a C format string (with @samp{%} directives).
1487
1488 When the @code{c-format} flag is given for a string the @code{msgfmt}
1489 program does some more tests to check the validity of the translation.
1490 @xref{msgfmt Invocation}, @ref{c-format Flag} and @ref{c-format}.
1491
1492 @item objc-format
1493 @kwindex objc-format@r{ flag}
1494 @itemx no-objc-format
1495 @kwindex no-objc-format@r{ flag}
1496 Likewise for Objective C, see @ref{objc-format}.
1497
1498 @item sh-format
1499 @kwindex sh-format@r{ flag}
1500 @itemx no-sh-format
1501 @kwindex no-sh-format@r{ flag}
1502 Likewise for Shell, see @ref{sh-format}.
1503
1504 @item python-format
1505 @kwindex python-format@r{ flag}
1506 @itemx no-python-format
1507 @kwindex no-python-format@r{ flag}
1508 Likewise for Python, see @ref{python-format}.
1509
1510 @item python-brace-format
1511 @kwindex python-brace-format@r{ flag}
1512 @itemx no-python-brace-format
1513 @kwindex no-python-brace-format@r{ flag}
1514 Likewise for Python brace, see @ref{python-format}.
1515
1516 @item lisp-format
1517 @kwindex lisp-format@r{ flag}
1518 @itemx no-lisp-format
1519 @kwindex no-lisp-format@r{ flag}
1520 Likewise for Lisp, see @ref{lisp-format}.
1521
1522 @item elisp-format
1523 @kwindex elisp-format@r{ flag}
1524 @itemx no-elisp-format
1525 @kwindex no-elisp-format@r{ flag}
1526 Likewise for Emacs Lisp, see @ref{elisp-format}.
1527
1528 @item librep-format
1529 @kwindex librep-format@r{ flag}
1530 @itemx no-librep-format
1531 @kwindex no-librep-format@r{ flag}
1532 Likewise for librep, see @ref{librep-format}.
1533
1534 @item scheme-format
1535 @kwindex scheme-format@r{ flag}
1536 @itemx no-scheme-format
1537 @kwindex no-scheme-format@r{ flag}
1538 Likewise for Scheme, see @ref{scheme-format}.
1539
1540 @item smalltalk-format
1541 @kwindex smalltalk-format@r{ flag}
1542 @itemx no-smalltalk-format
1543 @kwindex no-smalltalk-format@r{ flag}
1544 Likewise for Smalltalk, see @ref{smalltalk-format}.
1545
1546 @item java-format
1547 @kwindex java-format@r{ flag}
1548 @itemx no-java-format
1549 @kwindex no-java-format@r{ flag}
1550 Likewise for Java, see @ref{java-format}.
1551
1552 @item csharp-format
1553 @kwindex csharp-format@r{ flag}
1554 @itemx no-csharp-format
1555 @kwindex no-csharp-format@r{ flag}
1556 Likewise for C#, see @ref{csharp-format}.
1557
1558 @item awk-format
1559 @kwindex awk-format@r{ flag}
1560 @itemx no-awk-format
1561 @kwindex no-awk-format@r{ flag}
1562 Likewise for awk, see @ref{awk-format}.
1563
1564 @item object-pascal-format
1565 @kwindex object-pascal-format@r{ flag}
1566 @itemx no-object-pascal-format
1567 @kwindex no-object-pascal-format@r{ flag}
1568 Likewise for Object Pascal, see @ref{object-pascal-format}.
1569
1570 @item ycp-format
1571 @kwindex ycp-format@r{ flag}
1572 @itemx no-ycp-format
1573 @kwindex no-ycp-format@r{ flag}
1574 Likewise for YCP, see @ref{ycp-format}.
1575
1576 @item tcl-format
1577 @kwindex tcl-format@r{ flag}
1578 @itemx no-tcl-format
1579 @kwindex no-tcl-format@r{ flag}
1580 Likewise for Tcl, see @ref{tcl-format}.
1581
1582 @item perl-format
1583 @kwindex perl-format@r{ flag}
1584 @itemx no-perl-format
1585 @kwindex no-perl-format@r{ flag}
1586 Likewise for Perl, see @ref{perl-format}.
1587
1588 @item perl-brace-format
1589 @kwindex perl-brace-format@r{ flag}
1590 @itemx no-perl-brace-format
1591 @kwindex no-perl-brace-format@r{ flag}
1592 Likewise for Perl brace, see @ref{perl-format}.
1593
1594 @item php-format
1595 @kwindex php-format@r{ flag}
1596 @itemx no-php-format
1597 @kwindex no-php-format@r{ flag}
1598 Likewise for PHP, see @ref{php-format}.
1599
1600 @item gcc-internal-format
1601 @kwindex gcc-internal-format@r{ flag}
1602 @itemx no-gcc-internal-format
1603 @kwindex no-gcc-internal-format@r{ flag}
1604 Likewise for the GCC sources, see @ref{gcc-internal-format}.
1605
1606 @item gfc-internal-format
1607 @kwindex gfc-internal-format@r{ flag}
1608 @itemx no-gfc-internal-format
1609 @kwindex no-gfc-internal-format@r{ flag}
1610 Likewise for the GNU Fortran Compiler sources, see @ref{gfc-internal-format}.
1611
1612 @item qt-format
1613 @kwindex qt-format@r{ flag}
1614 @itemx no-qt-format
1615 @kwindex no-qt-format@r{ flag}
1616 Likewise for Qt, see @ref{qt-format}.
1617
1618 @item qt-plural-format
1619 @kwindex qt-plural-format@r{ flag}
1620 @itemx no-qt-plural-format
1621 @kwindex no-qt-plural-format@r{ flag}
1622 Likewise for Qt plural forms, see @ref{qt-plural-format}.
1623
1624 @item kde-format
1625 @kwindex kde-format@r{ flag}
1626 @itemx no-kde-format
1627 @kwindex no-kde-format@r{ flag}
1628 Likewise for KDE, see @ref{kde-format}.
1629
1630 @item boost-format
1631 @kwindex boost-format@r{ flag}
1632 @itemx no-boost-format
1633 @kwindex no-boost-format@r{ flag}
1634 Likewise for Boost, see @ref{boost-format}.
1635
1636 @item lua-format
1637 @kwindex lua-format@r{ flag}
1638 @itemx no-lua-format
1639 @kwindex no-lua-format@r{ flag}
1640 Likewise for Lua, see @ref{lua-format}.
1641
1642 @item javascript-format
1643 @kwindex javascript-format@r{ flag}
1644 @itemx no-javascript-format
1645 @kwindex no-javascript-format@r{ flag}
1646 Likewise for JavaScript, see @ref{javascript-format}.
1647
1648 @end table
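
Because several flags share one comment line, a program looking for a
particular flag has to split the @samp{#,} line into its items.  The C
sketch below shows one way to do this; @code{po_has_flag} is a
hypothetical helper, and treating white space as an additional separator
is an assumption about how the line may be spaced.

@example
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if the "#," comment line LINE
   lists FLAG (e.g. "fuzzy") in its comma-separated flag list.  */
bool
po_has_flag (const char *line, const char *flag)
@{
  if (strncmp (line, "#,", 2) != 0)
    return false;

  size_t flag_len = strlen (flag);
  const char *p = line + 2;
  while (*p != '\0')
    @{
      /* Skip commas and white space before the next flag.  */
      p += strspn (p, ", \t\n");
      size_t len = strcspn (p, ", \t\n");
      /* Require an exact match, so "c-format" does not match
         "no-c-format".  */
      if (len > 0 && len == flag_len && strncmp (p, flag, len) == 0)
        return true;
      p += len;
    @}
  return false;
@}
@end example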
1649
1650 @kwindex msgctxt
1651 @cindex context, in PO files
1652 It is also possible to have entries with a context specifier. They look like
1653 this:
1654
1655 @example
1656 @var{white-space}
1657 #  @var{translator-comments}
1658 #. @var{extracted-comments}
1659 #: @var{reference}@dots{}
1660 #, @var{flag}@dots{}
1661 #| msgctxt @var{previous-context}
1662 #| msgid @var{previous-untranslated-string}
1663 msgctxt @var{context}
1664 msgid @var{untranslated-string}
1665 msgstr @var{translated-string}
1666 @end example
1667
1668 The context serves to disambiguate messages with the same
1669 @var{untranslated-string}.  It is possible to have several entries with
1670 the same @var{untranslated-string} in a PO file, provided that they each
1671 have a different @var{context}.  Note that an empty @var{context} string
1672 and an absent @code{msgctxt} line do not mean the same thing.
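
At run time, GNU @code{gettext} identifies a message with a context by a
single key made of the @var{context}, an EOT character (@samp{\004}), and
the @var{untranslated-string}; the @code{pgettext} macro from
@file{gettext.h} builds such a key.  The C sketch below, using the
hypothetical helper @code{context_key}, shows why an empty @var{context}
differs from an absent one: the empty context still contributes the
separator character.

@example
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build the lookup key CONTEXT + '\004' + MSGID.
   The caller must free the result.  Returns NULL if out of memory.  */
char *
context_key (const char *context, const char *msgid)
@{
  size_t context_len = strlen (context);
  size_t msgid_len = strlen (msgid);
  char *key = malloc (context_len + 1 + msgid_len + 1);

  if (key == NULL)
    return NULL;
  memcpy (key, context, context_len);
  key[context_len] = '\004';
  /* Copy MSGID together with its terminating NUL.  */
  memcpy (key + context_len + 1, msgid, msgid_len + 1);
  return key;
@}
@end example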
1673
1674 @kwindex msgid_plural
1675 @cindex plural forms, in PO files
A different kind of entry is used for translations which involve
plural forms.
1678
1679 @example
1680 @var{white-space}
1681 #  @var{translator-comments}
1682 #. @var{extracted-comments}
1683 #: @var{reference}@dots{}
1684 #, @var{flag}@dots{}
1685 #| msgid @var{previous-untranslated-string-singular}
1686 #| msgid_plural @var{previous-untranslated-string-plural}
1687 msgid @var{untranslated-string-singular}
1688 msgid_plural @var{untranslated-string-plural}
1689 msgstr[0] @var{translated-string-case-0}
1690 ...
1691 msgstr[N] @var{translated-string-case-n}
1692 @end example
1693
1694 Such an entry can look like this:
1695
1696 @example
1697 #: src/msgcmp.c:338 src/po-lex.c:699
1698 #, c-format
1699 msgid "found %d fatal error"
1700 msgid_plural "found %d fatal errors"
1701 msgstr[0] "s'ha trobat %d error fatal"
1702 msgstr[1] "s'han trobat %d errors fatals"
1703 @end example
1704
1705 Here also, a @code{msgctxt} context can be specified before @code{msgid},
1706 like above.
1707
1708 Here, additional kinds of flags can be used:
1709
1710 @table @code
1711 @item range:
1712 @kwindex range:@r{ flag}
1713 This flag is followed by a range of non-negative numbers, using the syntax
1714 @code{range: @var{minimum-value}..@var{maximum-value}}.  It designates the
1715 possible values that the numeric parameter of the message can take.  In some
1716 languages, translators may produce slightly better translations if they know
1717 that the value can only take on values between 0 and 10, for example.
1718 @end table
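
On the C side, plural entries like the one above typically come from
calls to @code{ngettext}, which receives both English forms and the
count.  In the following sketch, the function @code{fatal_error_format}
is hypothetical and reuses the strings of the example entry.  When no
translation is found, @code{ngettext} returns the singular form if the
count is 1 and the plural form otherwise.

@example
#include <libintl.h>

/* Hypothetical helper: return the format string for reporting N fatal
   errors, in the plural form that the message catalog selects for N.  */
const char *
fatal_error_format (unsigned long n)
@{
  return ngettext ("found %d fatal error", "found %d fatal errors", n);
@}
@end example

A caller would then use it as @code{printf (fatal_error_format (count),
count)}, with a non-negative @code{int} variable @code{count}.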
1719
The @var{previous-untranslated-string} is optionally inserted by the
@code{msgmerge} program, at the same time as it marks a message fuzzy.
It helps the translator to see which changes were made by the developers
to the @var{untranslated-string}.
1724
1725 It happens that some lines, usually whitespace or comments, follow the
1726 very last entry of a PO file.  Such lines are not part of any entry,
1727 and will be dropped when the PO file is processed by the tools, or may
1728 disturb some PO file editors.
1729
1730 The remainder of this section may be safely skipped by those using
1731 a PO file editor, yet it may be interesting for everybody to have a better
1732 idea of the precise format of a PO file.  On the other hand, those
1733 wishing to modify PO files by hand should carefully continue reading on.
1734
1735 An empty @var{untranslated-string} is reserved to contain the header
1736 entry with the meta information (@pxref{Header Entry}).  This header
1737 entry should be the first entry of the file.  The empty
1738 @var{untranslated-string} is reserved for this purpose and must
1739 not be used anywhere else.
1740
1741 Each of @var{untranslated-string} and @var{translated-string} respects
1742 the C syntax for a character string, including the surrounding quotes
1743 and embedded backslashed escape sequences.  When the time comes
1744 to write multi-line strings, one should not use escaped newlines.
1745 Instead, a closing quote should follow the last character on the
1746 line to be continued, and an opening quote should resume the string
1747 at the beginning of the following PO file line.  For example:
1748
1749 @example
1750 msgid ""
1751 "Here is an example of how one might continue a very long string\n"
1752 "for the common case the string represents multi-line output.\n"
1753 @end example
1754
1755 @noindent
In this example, the empty string is used on the first line, to
allow better alignment of the @code{H} from the word @samp{Here}
over the @code{f} from the word @samp{for}.  The @code{msgid} keyword
is thus followed by three strings, which are meant to be concatenated.
Concatenating the empty string does not change the resulting overall
string, but it is a way for us to satisfy the requirement that
@code{msgid} be followed by a string on the same line, while keeping
the multi-line presentation left-justified, as we find this to be a
cleaner disposition.  The empty string could have been omitted, but
only if the string starting with @samp{Here} was moved to the first
line, right after @code{msgid}.@footnote{This
limitation is not imposed by GNU @code{gettext}, but is for compatibility
with the @code{msgfmt} implementation on Solaris.}  Nor was it necessary
to switch between the last two quoted strings immediately after
the newline @samp{\n}; the switch could have occurred after @emph{any}
other character.  We just did it this way because it is neater.
1772
1773 @cindex newlines in PO files
One should carefully distinguish between ends of lines marked as
@samp{\n} @emph{inside} quotes, which are part of the represented
string, and ends of lines in the PO file itself, outside string quotes,
which have no effect on the represented string.
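
Since these strings follow the C syntax, C's own concatenation of
adjacent string literals behaves the same way, which gives a quick way
to check both points above.  In the following sketch, the array
@code{msgid_text} and the function @code{count_newlines} are illustrative
names only.  The represented string starts with @samp{H}, because the
empty string contributes nothing, and it contains exactly two newline
characters, because only the @samp{\n} escapes count.

@example
#include <stddef.h>

/* The msgid from the PO example, written as adjacent C string
   literals.  The pieces are concatenated, and the leading empty
   string contributes nothing.  */
static const char msgid_text[] =
  ""
  "Here is an example of how one might continue a very long string\n"
  "for the common case the string represents multi-line output.\n";

/* Count the newline characters in the represented string.  The line
   breaks between the quoted pieces in the source are not counted,
   because they are not part of the string.  */
size_t
count_newlines (void)
@{
  size_t count = 0;
  for (const char *p = msgid_text; *p != '\0'; p++)
    if (*p == '\n')
      count++;
  return count;
@}
@end example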
1778
1779 @cindex comments in PO files
Outside strings, blank lines and comments may be used freely.
1781 Comments start at the beginning of a line with @samp{#} and extend
1782 until the end of the PO file line.  Comments written by translators
1783 should have the initial @samp{#} immediately followed by some white
1784 space.  If the @samp{#} is not immediately followed by white space,
1785 this comment is most likely generated and managed by specialized GNU
1786 tools, and might disappear or be replaced unexpectedly when the PO
1787 file is given to @code{msgmerge}.
1788
1789 @node Sources, Template, PO Files, Top
1790 @chapter Preparing Program Sources
1791 @cindex preparing programs for translation
1792
1793 @c FIXME: Rewrite (the whole chapter).
1794
1795 For the programmer, changes to the C source code fall into three
1796 categories.  First, you have to make the localization functions
1797 known to all modules needing message translation.  Second, you should
1798 properly trigger the operation of GNU @code{gettext} when the program
1799 initializes, usually from the @code{main} function.  Last, you should
1800 identify, adjust and mark all constant strings in your program
1801 needing translation.
1802
1803 @menu
1804 * Importing::                   Importing the @code{gettext} declaration
1805 * Triggering::                  Triggering @code{gettext} Operations
1806 * Preparing Strings::           Preparing Translatable Strings
1807 * Mark Keywords::               How Marks Appear in Sources
1808 * Marking::                     Marking Translatable Strings
1809 * c-format Flag::               Telling something about the following string
1810 * Special cases::               Special Cases of Translatable Strings
1811 * Bug Report Address::          Letting Users Report Translation Bugs
1812 * Names::                       Marking Proper Names for Translation
1813 * Libraries::                   Preparing Library Sources
1814 @end menu
1815
1816 @node Importing, Triggering, Sources, Sources
1817 @section Importing the @code{gettext} declaration
1818
1819 Presuming that your set of programs, or package, has been adjusted
1820 so all needed GNU @code{gettext} files are available, and your
1821 @file{Makefile} files are adjusted (@pxref{Maintainers}), each C module
1822 having translated C strings should contain the line:
1823
1824 @cindex include file @file{libintl.h}
1825 @example
1826 #include <libintl.h>
1827 @end example
1828
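Similarly, each C module containing @code{printf()}/@code{fprintf()}/...
calls with a format string that could be a translated C string (even if
the C string comes from a different C module) should contain the line
below, because on platforms whose @code{printf} family does not support
positional arguments such as @samp{%1$s}, @code{<libintl.h>} may replace
these functions with variants that do: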
1832
1833 @example
1834 #include <libintl.h>
1835 @end example
1836
1837 @node Triggering, Preparing Strings, Importing, Sources
1838 @section Triggering @code{gettext} Operations
1839
1840 @cindex initialization
1841 The initialization of locale data should be done with more or less
1842 the same code in every program, as demonstrated below:
1843
1844 @example
1845 @group
1846 int
1847 main (int argc, char *argv[])
1848 @{
1849   @dots{}
1850   setlocale (LC_ALL, "");
1851   bindtextdomain (PACKAGE, LOCALEDIR);
1852   textdomain (PACKAGE);
1853   @dots{}
1854 @}
1855 @end group
1856 @end example
1857
1858 @var{PACKAGE} and @var{LOCALEDIR} should be provided either by
1859 @file{config.h} or by the Makefile.  For now consult the @code{gettext}
1860 or @code{hello} sources for more information.
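
For experimenting outside a real package, the following C sketch wraps
the initialization above in a hypothetical helper @code{initialize_i18n},
meant to be called at the start of @code{main}.  The values of
@code{PACKAGE} and @code{LOCALEDIR} are placeholders chosen for
illustration; in a real package they come from @file{config.h} or the
Makefile.

@example
#include <locale.h>
#include <libintl.h>

/* Placeholder values for illustration only.  */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/local/share/locale"
#endif

/* Hypothetical helper: set up the locale and the message catalog.  */
void
initialize_i18n (void)
@{
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
@}
@end example

After this call, @code{textdomain (NULL)} returns the current domain
name, @samp{hello} here, and @code{bindtextdomain (PACKAGE, NULL)}
returns the directory bound to it.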
1861
1862 @cindex locale category, LC_ALL
1863 @cindex locale category, LC_CTYPE
The use of @code{LC_ALL} might not be appropriate for you.
@code{LC_ALL} includes all locale categories and especially
@code{LC_CTYPE}.  The latter category determines the character classes
used by @code{isalnum} and similar functions from @file{ctype.h}, and
these classes can be wrong for programs that process some kind of input
language.  For example, a source file using the @,{c} (c-cedilla)
character might then be accepted in France but not in the U.S.
1872
Some systems also have problems parsing numbers with the
@code{scanf} functions if a locale other than @code{"C"} is used for
all categories through @code{LC_ALL}.  The standards say that formats
in addition to the one known in the @code{"C"} locale might be
recognized.  But some systems seem to reject numbers in the @code{"C"}
locale format.  In some situations, the notation itself can also be a
problem, making it impossible to recognize whether a number is in the
@code{"C"} locale format or in the local format.  This can happen if
thousands separator characters are used: some locales, following the
national conventions, define this character as @code{'.'}, which is the
same character used in the @code{"C"} locale to denote the decimal point.
1884
So it is sometimes necessary to replace the @code{LC_ALL} line in the
code above by a sequence of @code{setlocale} lines:
1887
1888 @example
1889 @group
1890 @{
1891   @dots{}
1892   setlocale (LC_CTYPE, "");
1893   setlocale (LC_MESSAGES, "");
1894   @dots{}
1895 @}
1896 @end group
1897 @end example
1898
1899 @cindex locale category, LC_CTYPE
1900 @cindex locale category, LC_COLLATE
1901 @cindex locale category, LC_MONETARY
1902 @cindex locale category, LC_NUMERIC
1903 @cindex locale category, LC_TIME
1904 @cindex locale category, LC_MESSAGES
1905 @cindex locale category, LC_RESPONSES
1906 @noindent
1907 On all POSIX conformant systems the locale categories @code{LC_CTYPE},
1908 @code{LC_MESSAGES}, @code{LC_COLLATE}, @code{LC_MONETARY},
1909 @code{LC_NUMERIC}, and @code{LC_TIME} are available.  On some systems
1910 which are only ISO C compliant, @code{LC_MESSAGES} is missing, but
1911 a substitute for it is defined in GNU gettext's @code{<libintl.h>} and
1912 in GNU gnulib's @code{<locale.h>}.
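
The effect of such a selective initialization can be seen in the C
sketch below, which assumes a POSIX system where @code{LC_MESSAGES} is
defined in @code{<locale.h>}; the function names are hypothetical.
Since a program starts in the @code{"C"} locale and @code{LC_NUMERIC} is
never changed here, @code{setlocale (LC_NUMERIC, NULL)} keeps returning
@samp{C} and @code{strtod} keeps accepting @samp{.} as the decimal point,
whatever @code{LANG} says.

@example
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: localize character classification and
   messages only; LC_NUMERIC and the other categories stay "C".  */
void
initialize_ctype_and_messages (void)
@{
  setlocale (LC_CTYPE, "");
  setlocale (LC_MESSAGES, "");
@}

/* Parse a number written in the "C" locale notation.  Because
   LC_NUMERIC was never changed, strtod expects '.' as decimal point.  */
double
parse_c_number (const char *text)
@{
  return strtod (text, NULL);
@}
@end example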
1913
1914 Note that changing the @code{LC_CTYPE} also affects the functions
1915 declared in the @code{<ctype.h>} standard header and some functions
1916 declared in the @code{<string.h>} and @code{<stdlib.h>} standard headers.
1917 If this is not
1918 desirable in your application (for example in a compiler's parser),
1919 you can use a set of substitute functions which hardwire the C locale,
1920 such as found in the modules @samp{c-ctype}, @samp{c-strcase},
1921 @samp{c-strcasestr}, @samp{c-strtod}, @samp{c-strtold} in the GNU gnulib
1922 source distribution.
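
The idea behind such substitute functions fits in a few lines of C.  The
function @code{ascii_isalpha} below is a hypothetical illustration, not
the gnulib code.  It assumes an ASCII-compatible execution character set,
in which the letters @samp{a} to @samp{z} and @samp{A} to @samp{Z} are
contiguous, and it gives the same answer whatever @code{LC_CTYPE} is set
to; for instance, it rejects the byte 0xE7, which is @,{c} in ISO-8859-1.

@example
#include <stdbool.h>

/* Hypothetical locale-independent replacement for isalpha: accept
   only the ASCII letters, regardless of the LC_CTYPE setting.  */
bool
ascii_isalpha (int c)
@{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
@}
@end example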
1923
It is also possible to switch the locale back and forth between the
1925 environment dependent locale and the C locale, but this approach is
1926 normally avoided because a @code{setlocale} call is expensive,
1927 because it is tedious to determine the places where a locale switch
1928 is needed in a large program's source, and because switching a locale
1929 is not multithread-safe.
1930
1931 @node Preparing Strings, Mark Keywords, Triggering, Sources
1932 @section Preparing Translatable Strings
1933
1934 @cindex marking strings, preparations
Before strings can be marked for translation, they sometimes need to
1936 be adjusted.  Usually preparing a string for translation is done right
1937 before marking it, during the marking phase which is described in the
1938 next sections.  What you have to keep in mind while doing that is the
1939 following.
1940
1941 @itemize @bullet
1942 @item
1943 Decent English style.
1944
1945 @item
1946 Entire sentences.
1947
1948 @item
1949 Split at paragraphs.
1950
1951 @item
1952 Use format strings instead of string concatenation.
1953
1954 @item
1955 Avoid unusual markup and unusual control characters.
1956 @end itemize
1957
1958 @noindent
1959 Let's look at some examples of these guidelines.
1960
1961 @cindex style
1962 Translatable strings should be in good English style.  If slang language
1963 with abbreviations and shortcuts is used, often translators will not
1964 understand the message and will produce very inappropriate translations.
1965
1966 @example
1967 "%s: is parameter\n"
1968 @end example
1969
1970 @noindent
1971 This is nearly untranslatable: Is the displayed item @emph{a} parameter or
1972 @emph{the} parameter?
1973
1974 @example
1975 "No match"
1976 @end example
1977
1978 @noindent
1979 The ambiguity in this message makes it unintelligible: Is the program
1980 attempting to set something on fire? Does it mean "The given object does
1981 not match the template"? Does it mean "The template does not fit for any
1982 of the objects"?
1983
1984 @cindex ambiguities
1985 In both cases, adding more words to the message will help both the
1986 translator and the English speaking user.
1987
1988 @cindex sentences
1989 Translatable strings should be entire sentences.  It is often not possible
1990 to translate single verbs or adjectives in a substitutable way.
1991
1992 @example
1993 printf ("File %s is %s protected", filename, rw ? "write" : "read");
1994 @end example
1995
1996 @noindent
1997 Most translators will not look at the source and will thus only see the
1998 string @code{"File %s is %s protected"}, which is unintelligible.  Change
1999 this to
2000
2001 @example
2002 printf (rw ? "File %s is write protected" : "File %s is read protected",
2003         filename);
2004 @end example
2005
2006 @noindent
This way the translator will not only understand the message, she will
also be able to find the appropriate grammatical construction.  A French
translator, for example, translates "write protected" as "protected
against writing".
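As a minimal sketch of the same idea with the marking included, assuming glibc (where @code{gettext} is part of the C library) and a hypothetical helper named @code{protection_message}, both alternatives can be passed to @code{gettext} as complete sentences, so that @code{xgettext} extracts two whole messages:

```c
#include <libintl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: each alternative passed to gettext is a complete
   sentence, so xgettext extracts two whole messages and the translator
   can choose a natural construction for each of them.  */
int
protection_message (char *buf, size_t size, const char *filename, int rw)
{
  return snprintf (buf, size,
                   rw ? gettext ("File %s is write protected")
                      : gettext ("File %s is read protected"),
                   filename);
}
```

Without a message catalog (for example in the @samp{C} locale), @code{gettext} returns its argument unchanged, so the English sentence is produced.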
2011
Entire sentences are also important because in many languages, the
declension of some word in a sentence depends on the gender or the
number (singular/plural) of another part of the sentence.  There are
usually more interdependencies between words than in English.  The
consequence is that asking a translator to translate two half-sentences
and then combining these two half-sentences through dumb string concatenation
will not work for many languages, even though it would work for English.
That's why translators need to handle entire sentences.
2020
Often sentences don't fit into a single line.  If a sentence is output
using two consecutive @code{printf} statements, like this
2023
2024 @example
2025 printf ("Locale charset \"%s\" is different from\n", lcharset);
2026 printf ("input file charset \"%s\".\n", fcharset);
2027 @end example
2028
2029 @noindent
the translator would have to translate two half-sentences, but nothing
in the POT file would tell her that the two half-sentences belong together.
2032 It is necessary to merge the two @code{printf} statements so that the
2033 translator can handle the entire sentence at once and decide at which
2034 place to insert a line break in the translation (if at all):
2035
2036 @example
2037 printf ("Locale charset \"%s\" is different from\n\
2038 input file charset \"%s\".\n", lcharset, fcharset);
2039 @end example
2040
2041 You may now ask: how about two or more adjacent sentences? Like in this case:
2042
2043 @example
2044 puts ("Apollo 13 scenario: Stack overflow handling failed.");
2045 puts ("On the next stack overflow we will crash!!!");
2046 @end example
2047
2048 @noindent
Should these two statements be merged into a single one?  I would recommend
merging them if the two sentences are related to each other, because that
makes it easier for the translator to understand and translate both.  On
the other hand, if one of the two messages is a stereotypic one, occurring
in other places as well, you will do the translator a favour by not
merging the two.  (Identical messages occurring in several places are
combined by @code{xgettext}, so the translator has to handle them only once.)
2056
2057 @cindex paragraphs
2058 Translatable strings should be limited to one paragraph; don't let a
2059 single message be longer than ten lines.  The reason is that when the
2060 translatable string changes, the translator is faced with the task of
2061 updating the entire translated string.  Maybe only a single word will
2062 have changed in the English string, but the translator doesn't see that
2063 (with the current translation tools), therefore she has to proofread
2064 the entire message.
2065
2066 @cindex help option
2067 Many GNU programs have a @samp{--help} output that extends over several
2068 screen pages.  It is a courtesy towards the translators to split such a
2069 message into several ones of five to ten lines each.  While doing that,
2070 you can also attempt to split the documented options into groups,
2071 such as the input options, the output options, and the informative
2072 output options.  This will help every user to find the option he is
2073 looking for.
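A sketch of such a split, assuming glibc and a hypothetical program whose options are invented for illustration: each group of options becomes its own short message, so a change in one group leaves the translations of the other groups untouched.

```c
#include <libintl.h>
#include <stdio.h>
#include <string.h>

#define _(String) gettext (String)

/* Hypothetical usage function: the --help text is split into several
   short messages, one per group of options (the options themselves are
   invented for illustration).  Returns 0 on success, -1 on error.  */
int
print_usage (FILE *out, const char *program_name)
{
  if (fprintf (out, _("Usage: %s [OPTION]... [FILE]...\n"), program_name) < 0)
    return -1;
  if (fputs (_("Input options:\n"
               "  -i, --input=FILE       read input from FILE\n"), out) == EOF)
    return -1;
  if (fputs (_("Output options:\n"
               "  -o, --output=FILE      write output to FILE\n"), out) == EOF)
    return -1;
  if (fputs (_("Informative output:\n"
               "  -h, --help             display this help and exit\n"
               "  -V, --version          output version information and exit\n"),
             out) == EOF)
    return -1;
  return 0;
}
```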
2074
2075 @cindex string concatenation
2076 @cindex concatenation of strings
2077 Hardcoded string concatenation is sometimes used to construct English
2078 strings:
2079
2080 @example
2081 strcpy (s, "Replace ");
2082 strcat (s, object1);
2083 strcat (s, " with ");
2084 strcat (s, object2);
2085 strcat (s, "?");
2086 @end example
2087
2088 @noindent
2089 In order to present to the translator only entire sentences, and also
2090 because in some languages the translator might want to swap the order
2091 of @code{object1} and @code{object2}, it is necessary to change this
2092 to use a format string:
2093
2094 @example
2095 sprintf (s, "Replace %s with %s?", object1, object2);
2096 @end example
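With the marking added, the format-string version could look like the following sketch, which assumes glibc and a hypothetical helper @code{replace_prompt}, and uses @code{snprintf} instead of @code{sprintf} to bound the output.  A translator who needs the objects in the other order can write @samp{%2$s} and @samp{%1$s} in the translation (a POSIX extension to @code{printf}, supported by glibc).

```c
#include <libintl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: the whole question is one translatable format
   string; snprintf bounds the output, unlike the strcpy/strcat chain.  */
int
replace_prompt (char *buf, size_t size,
                const char *object1, const char *object2)
{
  return snprintf (buf, size, gettext ("Replace %s with %s?"),
                   object1, object2);
}
```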
2097
2098 @cindex @code{inttypes.h}
A similar case is compile-time concatenation of strings.  The ISO C 99
2100 include file @code{<inttypes.h>} contains a macro @code{PRId64} that
2101 can be used as a formatting directive for outputting an @samp{int64_t}
2102 integer through @code{printf}.  It expands to a constant string, usually
2103 "d" or "ld" or "lld" or something like this, depending on the platform.
2104 Assume you have code like
2105
2106 @example
2107 printf ("The amount is %0" PRId64 "\n", number);
2108 @end example
2109
2110 @noindent
2111 The @code{gettext} tools and library have special support for these
2112 @code{<inttypes.h>} macros.  You can therefore simply write
2113
2114 @example
2115 printf (gettext ("The amount is %0" PRId64 "\n"), number);
2116 @end example
2117
2118 @noindent
2119 The PO file will contain the string "The amount is %0<PRId64>\n".
2120 The translators will provide a translation containing "%0<PRId64>"
2121 as well, and at runtime the @code{gettext} function's result will
2122 contain the appropriate constant string, "d" or "ld" or "lld".
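A sketch of this in a complete function, assuming glibc and a hypothetical helper @code{format_amount}.  The @samp{0} flag from the example above is omitted here, since without a field width it has no effect.

```c
#include <inttypes.h>
#include <libintl.h>
#include <stdio.h>
#include <string.h>

/* PRId64 is concatenated with the neighbouring literals at compile time;
   per the text above, xgettext records it as <PRId64> in the PO file.  */
int
format_amount (char *buf, size_t size, int64_t number)
{
  return snprintf (buf, size, gettext ("The amount is %" PRId64 "\n"),
                   number);
}
```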
2123
2124 This works only for the predefined @code{<inttypes.h>} macros.  If
2125 you have defined your own similar macros, let's say @samp{MYPRId64},
2126 that are not known to @code{xgettext}, the solution for this problem
2127 is to change the code like this:
2128
2129 @example
2130 char buf1[100];
2131 sprintf (buf1, "%0" MYPRId64, number);
2132 printf (gettext ("The amount is %s\n"), buf1);
2133 @end example
2134
That is, you put the platform-dependent code in one statement, and the
internationalization code in a different statement.  Note that a buffer length
of 100 is safe, because all available hardware integer types are limited to
128 bits, and to print a 128 bit integer one needs at most 54 characters,
regardless of whether it is printed in decimal, octal or hexadecimal.
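The same two-statement approach as a self-contained sketch.  Here @code{MYPRId64} is simply defined as @code{PRId64} to stand in for a project macro that @code{xgettext} does not know, the function name is hypothetical, and @code{snprintf} replaces @code{sprintf}:

```c
#include <inttypes.h>
#include <libintl.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a hypothetical project-specific macro unknown to xgettext.  */
#ifndef MYPRId64
# define MYPRId64 PRId64
#endif

int
format_amount_via_buffer (char *buf, size_t size, int64_t number)
{
  /* Platform-dependent formatting, kept out of the translatable string.  */
  char digits[100];
  snprintf (digits, sizeof digits, "%" MYPRId64, number);
  /* Internationalized part: only a plain %s directive.  */
  return snprintf (buf, size, gettext ("The amount is %s\n"), digits);
}
```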
2140
2141 @cindex Java, string concatenation
2142 @cindex C#, string concatenation
All this applies to other programming languages as well.  For example, in
Java and C#, string concatenation is very frequently used, because it is a
compiler built-in operator.  As in C, in Java you would change
2146
2147 @example
2148 System.out.println("Replace "+object1+" with "+object2+"?");
2149 @end example
2150
2151 @noindent
2152 into a statement involving a format string:
2153
2154 @example
2155 System.out.println(
2156     MessageFormat.format("Replace @{0@} with @{1@}?",
2157                          new Object[] @{ object1, object2 @}));
2158 @end example
2159
2160 @noindent
2161 Similarly, in C#, you would change
2162
2163 @example
2164 Console.WriteLine("Replace "+object1+" with "+object2+"?");
2165 @end example
2166
2167 @noindent
2168 into a statement involving a format string:
2169
2170 @example
2171 Console.WriteLine(
2172     String.Format("Replace @{0@} with @{1@}?", object1, object2));
2173 @end example
2174
2175 @cindex markup
2176 @cindex control characters
2177 Unusual markup or control characters should not be used in translatable
2178 strings.  Translators will likely not understand the particular meaning
2179 of the markup or control characters.
2180
2181 For example, if you have a convention that @samp{|} delimits the
2182 left-hand and right-hand part of some GUI elements, translators will
2183 often not understand it without specific comments.  It might be
2184 better to have the translator translate the left-hand and right-hand
2185 part separately.
2186
2187 Another example is the @samp{argp} convention to use a single @samp{\v}
2188 (vertical tab) control character to delimit two sections inside a
2189 string.  This is flawed.  Some translators may convert it to a simple
2190 newline, some to blank lines.  With some PO file editors it may not be
easy to even enter a vertical tab control character.  So you cannot
be sure that the translation will contain a @samp{\v} character at the
corresponding position.  The solution is, again, to let the translator
translate two separate strings and to combine the two translated strings
at run time with the @samp{\v} required by the convention.
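A minimal sketch of that solution, assuming the @samp{argp} convention described above; the two message texts and the function name @code{make_argp_doc} are hypothetical.  Each part is translated on its own, and the program inserts the @samp{\v} itself:

```c
#include <libintl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: builds "<before>\v<after>" from two separately translated
   parts.  Returns a malloc'd string (caller frees), or NULL on failure.  */
char *
make_argp_doc (void)
{
  const char *before = gettext ("Print a friendly greeting.");
  const char *after = gettext ("Report bugs to the package maintainers.");
  size_t len = strlen (before) + 1 + strlen (after) + 1;
  char *doc = malloc (len);
  if (doc != NULL)
    snprintf (doc, len, "%s\v%s", before, after);
  return doc;
}
```

The caller is responsible for freeing the returned string.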
2196
HTML markup, however, is common enough that it's probably OK to use in
2198 translatable strings.  But please bear in mind that the GNU gettext tools
2199 don't verify that the translations are well-formed HTML.
2200
2201 @node Mark Keywords, Marking, Preparing Strings, Sources
2202 @section How Marks Appear in Sources
2203 @cindex marking strings that require translation
2204
2205 All strings requiring translation should be marked in the C sources.  Marking
2206 is done in such a way that each translatable string appears to be
2207 the sole argument of some function or preprocessor macro.  There are
2208 only a few such possible functions or macros meant for translation,
2209 and their names are said to be marking keywords.  The marking is
2210 attached to strings themselves, rather than to what we do with them.
This approach is more flexible.  A clear example is an error message
produced by formatting.  The format string needs translation, as do
any strings inserted through a @samp{%s} specification in the format,
while the result from @code{sprintf} may have so many different
instances that it is impractical to list them all in some
@samp{error_string_out()} routine, say.
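To illustrate, here is a sketch (all names hypothetical, glibc assumed) in which both the format string and the phrases inserted through @samp{%s} are marked, while the formatted result is not:

```c
#include <libintl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical failure reasons, for illustration only.  */
enum open_failure { OPEN_NOT_FOUND, OPEN_ACCESS_DENIED };

/* The format string and the inserted phrase are both marked, because
   each reaches the user; the formatted result itself is never marked.  */
int
format_open_error (char *buf, size_t size, const char *filename,
                   enum open_failure why)
{
  const char *reason = why == OPEN_NOT_FOUND
                       ? gettext ("no such file")
                       : gettext ("permission denied");
  return snprintf (buf, size, gettext ("cannot open %s: %s"),
                   filename, reason);
}
```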
2217
This marking operation has two goals.  The first goal of marking
is to trigger the retrieval of the translation at run time.
2220 The keyword is possibly resolved into a routine able to dynamically
2221 return the proper translation, as far as possible or wanted, for the
2222 argument string.  Most localizable strings are found in executable
2223 positions, that is, attached to variables or given as parameters to
2224 functions.  But this is not universal usage, and some translatable
2225 strings appear in structured initializations.  @xref{Special cases}.
2226
The second goal of the marking operation is to help @code{xgettext}
properly extract all translatable strings when it scans a set
of program sources and produces PO file templates.
2230
The canonical keyword for marking translatable strings is
@samp{gettext}; it gave its name to the whole GNU @code{gettext}
package.  For packages making only light use of the @samp{gettext}
keyword, macro or function, it is easily used @emph{as is}.  However,
for packages using the @code{gettext} interface more heavily, it
is usually more convenient to give the main keyword a shorter, less
obtrusive name.  Indeed, the keyword might appear on a lot of strings
all over the package, and programmers usually neither want nor need
their program sources to remind them forcefully, all the time, that they
are internationalized.  Further, a long keyword has the disadvantage
of using more horizontal space, forcing more indentation work on
sources for those trying to keep them within 79 or 80 columns.
2243
2244 @cindex @code{_}, a macro to mark strings for translation
Many packages use @samp{_} (a simple underline) as a keyword,
and write @samp{_("Translatable string")} instead of @samp{gettext
("Translatable string")}.  Further, the rule from the GNU coding standards
requiring a space between the keyword and the opening parenthesis is
relaxed, in practice, for this particular usage.
So, the textual overhead per translatable string is reduced to
only three characters: the underline and the two parentheses.
However, although GNU @code{gettext} uses this convention internally,
it does not offer it officially.  The real, genuine keyword is truly
@samp{gettext}.  It is fairly easy for those wanting to use
@samp{_} instead of @samp{gettext} to declare:
2256
2257 @example
2258 #include <libintl.h>
2259 #define _(String) gettext (String)
2260 @end example
2261
2262 @noindent
2263 instead of merely using @samp{#include <libintl.h>}.
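A slightly larger sketch of the usual start-up sequence around this macro, assuming glibc; the text domain @code{example-package} and the directory @file{/usr/local/share/locale} are placeholders for your package's own values:

```c
#include <libintl.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

#define _(String) gettext (String)

/* Placeholder domain name and catalog directory (hypothetical).  */
#define EXAMPLE_DOMAIN "example-package"
#define EXAMPLE_LOCALEDIR "/usr/local/share/locale"

/* Select the user's locale, tell libintl where this domain's catalogs
   live, and make it the default domain for gettext.  */
void
init_i18n (void)
{
  setlocale (LC_ALL, "");
  bindtextdomain (EXAMPLE_DOMAIN, EXAMPLE_LOCALEDIR);
  textdomain (EXAMPLE_DOMAIN);
}

const char *
greeting (void)
{
  return _("Hello, world!");
}
```

If no catalog for the current locale is found, @code{_} returns the English string unchanged.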
2264
2265 The marking keywords @samp{gettext} and @samp{_} take the translatable
2266 string as sole argument.  It is also possible to define marking functions
2267 that take it at another argument position.  It is even possible to make
2268 the marked argument position depend on the total number of arguments of
2269 the function call; this is useful in C++.  All this is achieved using
2270 @code{xgettext}'s @samp{--keyword} option.  How to pass such an option
2271 to @code{xgettext}, assuming that @code{gettextize} is used, is described
2272 in @ref{po/Makevars} and @ref{AM_XGETTEXT_OPTION}.
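As a sketch, a hypothetical function @code{report_status} whose translatable argument is the second one would be declared to @code{xgettext} with an option such as @samp{--keyword=report_status:2}.  The function itself calls @code{gettext} on that argument, so call sites pass plain literals:

```c
#include <libintl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical marking function: the msgid is its 2nd argument,
   matching an xgettext option like --keyword=report_status:2.  */
int
report_status (FILE *out, const char *msgid)
{
  return fputs (gettext (msgid), out);
}

/* Call site: the literal is extracted because of the keyword option.  */
int
announce_completion (FILE *out)
{
  return report_status (out, "Operation completed.\n");
}
```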
2273
2274 Note also that long strings can be split across lines, into multiple
2275 adjacent string tokens.  Automatic string concatenation is performed
2276 at compile time according to ISO C and ISO C++; @code{xgettext} also
2277 supports this syntax.
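For example, the merged sentence from earlier in this chapter can be written as two adjacent literals instead of a backslash-newline continuation.  The compiler sees a single string, and @code{xgettext} extracts a single message.  The function name is hypothetical:

```c
#include <libintl.h>
#include <stdio.h>
#include <string.h>

/* Adjacent literals are joined at compile time into one msgid.  */
const char *
charset_warning_format (void)
{
  return gettext ("Locale charset \"%s\" is different from\n"
                  "input file charset \"%s\".\n");
}
```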
2278
Later on, the maintenance is relatively easy.  If, as a programmer,
you add or modify a string, you will have to ask yourself whether the
new or altered string requires translation, and include it within
@samp{_()} if you think it should be translated.  For example, @samp{"%s"}
is a string @emph{not} requiring translation.  But
@samp{"%s: %d"} @emph{does} require translation, because in French, unlike
in English, it's customary to put a space before a colon.
2286
2287 @node Marking, c-format Flag, Mark Keywords, Sources
2288 @section Marking Translatable Strings
2289 @emindex marking strings for translation
2290
2291 In PO mode, one set of features is meant more for the programmer than
2292 for the translator, and allows him to interactively mark which strings,
2293 in a set of program sources, are translatable, and which are not.
2294 Even if it is a fairly easy job for a programmer to find and mark
2295 such strings by other means, using any editor of his choice, PO mode
2296 makes this work more comfortable.  Further, this gives translators
2297 who feel a little like programmers, or programmers who feel a little
2298 like translators, a tool letting them work at marking translatable
strings in the program sources, while simultaneously producing a set of
translations in some language for the package being internationalized.
2301
2302 @emindex @code{etags}, using for marking strings
The set of program sources targeted by the PO mode commands described
here should have an Emacs tags table constructed for your project
before you use these PO file commands.  This is easy to do.  In any
2306 shell window, change the directory to the root of your project, then
2307 execute a command resembling:
2308
2309 @example
2310 etags src/*.[hc] lib/*.[hc]
2311 @end example
2312
2313 @noindent
2314 presuming here you want to process all @file{.h} and @file{.c} files
2315 from the @file{src/} and @file{lib/} directories.  This command will
2316 explore all said files and create a @file{TAGS} file in your root
2317 directory, somewhat summarizing the contents using a special file
2318 format Emacs can understand.
2319
2320 @emindex @file{TAGS}, and marking translatable strings
2321 For packages following the GNU coding standards, there is
2322 a make goal @code{tags} or @code{TAGS} which constructs the tag files in
2323 all directories and for all files containing source code.
2324
2325 Once your @file{TAGS} file is ready, the following commands assist
2326 the programmer at marking translatable strings in his set of sources.
2327 But these commands are necessarily driven from within a PO file
2328 window, and it is likely that you do not even have such a PO file yet.
2329 This is not a problem at all, as you may safely open a new, empty PO
2330 file, mainly for using these commands.  This empty PO file will slowly
2331 fill in while you mark strings as translatable in your program sources.
2332
2333 @table @kbd
2334 @item ,
2335 @efindex ,@r{, PO Mode command}
2336 Search through program sources for a string which looks like a
2337 candidate for translation (@code{po-tags-search}).
2338
2339 @item M-,
2340 @efindex M-,@r{, PO Mode command}
2341 Mark the last string found with @samp{_()} (@code{po-mark-translatable}).
2342
2343 @item M-.
2344 @efindex M-.@r{, PO Mode command}
2345 Mark the last string found with a keyword taken from a set of possible
2346 keywords.  This command with a prefix allows some management of these
2347 keywords (@code{po-select-mark-and-mark}).
2348
2349 @end table
2350
2351 @efindex po-tags-search@r{, PO Mode command}
2352 The @kbd{,} (@code{po-tags-search}) command searches for the next
2353 occurrence of a string which looks like a possible candidate for
2354 translation, and displays the program source in another Emacs window,
2355 positioned in such a way that the string is near the top of this other
2356 window.  If the string is too big to fit whole in this window, it is
2357 positioned so only its end is shown.  In any case, the cursor
2358 is left in the PO file window.  If the shown string would be better
2359 presented differently in different native languages, you may mark it
2360 using @kbd{M-,} or @kbd{M-.}.  Otherwise, you might rather ignore it
2361 and skip to the next string by merely repeating the @kbd{,} command.
2362
2363 A string is a good candidate for translation if it contains a sequence
2364 of three or more letters.  A string containing at most two letters in
2365 a row will be considered as a candidate if it has more letters than
2366 non-letters.  The command disregards strings containing no letters,
2367 or isolated letters only.  It also disregards strings within comments,
2368 or strings already marked with some keyword PO mode knows (see below).
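The rule above can be sketched in C as follows.  This is not PO mode's actual implementation (which is written in Emacs Lisp); it is a hypothetical restatement, named @code{looks_translatable}, that classifies letters with @code{isalpha} in the current C locale:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch of the candidate heuristic described above.  */
bool
looks_translatable (const char *s)
{
  size_t letters = 0, others = 0, run = 0, longest_run = 0;

  for (const unsigned char *p = (const unsigned char *) s; *p != '\0'; p++)
    {
      if (isalpha (*p))
        {
          letters++;
          run++;
          if (run > longest_run)
            longest_run = run;
        }
      else
        {
          others++;
          run = 0;
        }
    }
  if (longest_run >= 3)
    return true;              /* contains three or more letters in a row */
  if (longest_run == 2)
    return letters > others;  /* at most two in a row: need mostly letters */
  return false;               /* no letters, or isolated letters only */
}
```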
2369
2370 If you have never told Emacs about some @file{TAGS} file to use, the
2371 command will request that you specify one from the minibuffer, the
2372 first time you use the command.  You may later change your @file{TAGS}
2373 file by using the regular Emacs command @w{@kbd{M-x visit-tags-table}},
2374 which will ask you to name the precise @file{TAGS} file you want
2375 to use.  @xref{Tags, , Tag Tables, emacs, The Emacs Editor}.
2376
2377 Each time you use the @kbd{,} command, the search resumes from where it was
2378 left by the previous search, and goes through all program sources,
2379 obeying the @file{TAGS} file, until all sources have been processed.
2380 However, by giving a prefix argument to the command @w{(@kbd{C-u
2381 ,})}, you may request that the search be restarted all over again
2382 from the first program source; but in this case, strings that you
2383 recently marked as translatable will be automatically skipped.
2384
Using this @kbd{,} command does not prevent the use of other regular
Emacs tags commands.  For example, regular @code{tags-search} or
@code{tags-query-replace} commands may be used without disrupting the
independent @kbd{,} search sequence.  However, as implemented, the
@emph{initial} @kbd{,} command (or the @kbd{,} command when used with a
prefix) might also reinitialize the regular Emacs tags searching to the
first tags file; this reinitialization might be considered spurious.
2392
2393 @efindex po-mark-translatable@r{, PO Mode command}
2394 @efindex po-select-mark-and-mark@r{, PO Mode command}
2395 The @kbd{M-,} (@code{po-mark-translatable}) command will mark the
2396 recently found string with the @samp{_} keyword.  The @kbd{M-.}
2397 (@code{po-select-mark-and-mark}) command will request that you type
2398 one keyword from the minibuffer and use that keyword for marking
2399 the string.  Both commands will automatically create a new PO file
2400 untranslated entry for the string being marked, and make it the
2401 current entry (making it easy for you to immediately proceed to its
2402 translation, if you feel like doing it right away).  It is possible
2403 that the modifications made to the program source by @kbd{M-,} or
2404 @kbd{M-.} render some source line longer than 80 columns, forcing you
2405 to break and re-indent this line differently.  You may use the @kbd{O}
2406 command from PO mode, or any other window changing command from
2407 Emacs, to break out into the program source window, and do any
2408 needed adjustments.  You will have to use some regular Emacs command
2409 to return the cursor to the PO file window, if you want command
2410 @kbd{,} for the next string, say.
2411
2412 The @kbd{M-.} command has a few built-in speedups, so you do not
2413 have to explicitly type all keywords all the time.  The first such
2414 speedup is that you are presented with a @emph{preferred} keyword,
2415 which you may accept by merely typing @kbd{@key{RET}} at the prompt.
2416 The second speedup is that you may type any non-ambiguous prefix of the
2417 keyword you really mean, and the command will complete it automatically
2418 for you.  This also means that PO mode has to @emph{know} all
2419 your possible keywords, and that it will not accept mistyped keywords.
2420
2421 If you reply @kbd{?} to the keyword request, the command gives a
2422 list of all known keywords, from which you may choose.  When the
2423 command is prefixed by an argument @w{(@kbd{C-u M-.})}, it inhibits
2424 updating any program source or PO file buffer, and does some simple
2425 keyword management instead.  In this case, the command asks for a
2426 keyword, written in full, which becomes a new allowed keyword for
2427 later @kbd{M-.} commands.  Moreover, this new keyword automatically
2428 becomes the @emph{preferred} keyword for later commands.  By typing
2429 an already known keyword in response to @w{@kbd{C-u M-.}}, one merely
2430 changes the @emph{preferred} keyword and does nothing more.
2431
2432 All keywords known for @kbd{M-.} are recognized by the @kbd{,} command
2433 when scanning for strings, and strings already marked by any of those
2434 known keywords are automatically skipped.  If many PO files are opened
2435 simultaneously, each one has its own independent set of known keywords.
There is currently no provision in PO mode for deleting a known
keyword; you have to quit the file (maybe using @kbd{q}) and reopen
it afresh.  When a PO file is newly brought up in an Emacs window, only
@samp{gettext} and @samp{_} are known as keywords, and @samp{gettext}
is preferred for the @kbd{M-.} command.  In fact, it would not be useful to
prefer @samp{_}, as it is already built into the @kbd{M-,} command.
2442
2443 @node c-format Flag, Special cases, Marking, Sources
2444 @section Special Comments preceding Keywords
2445
2446 @c FIXME document c-format and no-c-format.
2447
2448 @cindex format strings
2449 In C programs strings are often used within calls of functions from the
2450 @code{printf} family.  The special thing about these format strings is
2451 that they can contain format specifiers introduced with @kbd{%}.  Assume
2452 we have the code
2453
2454 @example
printf (gettext ("String `%s' has %d characters\n"), s, (int) strlen (s));
2456 @end example
2457
2458 @noindent
2459 A possible German translation for the above string might be:
2460
2461 @example
2462 "%d Zeichen lang ist die Zeichenkette `%s'"
2463 @end example
2464
A C programmer, even if he cannot speak German, will recognize that
there is something wrong here.  The order of the two format specifiers
has changed, but of course the order of the arguments in the @code{printf}
call has not.  This will most probably lead to problems because now the
string is interpreted as an integer and its length as the address of
a string.
2470
To prevent errors at runtime caused by translations, the @code{msgfmt}
tool can check statically whether the arguments in the original and the
translation string match in type and number.  If this is not the case
and the @samp{-c} option has been passed to @code{msgfmt}, @code{msgfmt}
will give an error and refuse to produce a MO file.  Thus consistent
use of @samp{msgfmt -c} will catch the error, so that it cannot cause
problems at runtime.
2478
2479 @noindent
To keep the word order of the above German translation, one would
have to write
2482
2483 @example
2484 "%2$d Zeichen lang ist die Zeichenkette `%1$s'"
2485 @end example
2486
2487 @noindent
2488 The routines in @code{msgfmt} know about this special notation.
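A sketch showing the effect of the positional directives, assuming a C library that implements the POSIX @samp{%n$} notation (glibc does).  @code{gettext} is left out so that the reordering is visible directly, and the function name is hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* %2$d consumes the second argument and %1$s the first, so the argument
   list keeps the order of the original English call.  */
int
format_length_de (char *buf, size_t size, const char *s)
{
  return snprintf (buf, size,
                   "%2$d Zeichen lang ist die Zeichenkette `%1$s'",
                   s, (int) strlen (s));
}
```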
2489
Because not all strings in a program are format strings, it is not
useful for @code{msgfmt} to test all the strings in the @file{.po} file.
This might cause problems because a string might contain what looks
like a format specifier although the string is not used in @code{printf}.
2494
Therefore @code{xgettext} adds a special tag to those messages it
2496 thinks might be a format string.  There is no absolute rule for this,
2497 only a heuristic.  In the @file{.po} file the entry is marked using the
2498 @code{c-format} flag in the @code{#,} comment line (@pxref{PO Files}).
2499
2500 @kwindex c-format@r{, and @code{xgettext}}
2501 @kwindex no-c-format@r{, and @code{xgettext}}
The careful reader might now say that this again can cause problems:
the heuristic might guess wrong.  This is true, and therefore
@code{xgettext} knows about a special kind of comment which lets
the programmer take over the decision.  If the @code{xgettext} program
finds a comment containing the words @code{xgettext:c-format} on the
same line as the @code{gettext} keyword or on the line immediately
preceding it, it will mark the string with the @code{c-format} flag
in any case.  This kind of comment should be used when
@code{xgettext} does not recognize the string as a format string but
it really is one and it should be tested.  Please note that when the
comment is on the same line as the @code{gettext} keyword, it must
precede the string to be translated.
2514
2515 This situation happens quite often.  The @code{printf} function is often
2516 called with strings which do not contain a format specifier.  Of course
2517 one would normally use @code{fputs} but it does happen.  In this case
2518 @code{xgettext} does not recognize this as a format string but what
2519 happens if the translation introduces a valid format specifier?  The
2520 @code{printf} function will try to access one of the parameters but none
2521 exists because the original code does not pass any parameters.
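A sketch of that situation with the comment applied (the function name is hypothetical).  The string has no directive, so the heuristic would not flag it, but the comment makes @code{xgettext} add the @code{c-format} flag, and @samp{msgfmt -c} can then reject a translation that introduces one:

```c
#include <libintl.h>
#include <stdio.h>

int
print_done (FILE *out)
{
  /* xgettext:c-format */
  return fprintf (out, gettext ("Done.\n"));
}
```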
2522
Of course, @code{xgettext} could also make a wrong decision the other way
round, i.e.@: a string marked as a format string actually is not a format
string.  In this case @code{msgfmt} might report spurious errors and
refuse to compile the @file{.po} file.  The method to prevent
this wrong decision is similar to the one used above, only the comment
to use must contain the string @code{xgettext:no-c-format}.
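Conversely, here is a sketch where the comment suppresses the flag (the function name is hypothetical).  The text @samp{100% complete} contains @samp{% c}, which @code{printf} would read as a @samp{%c} directive with a space flag, so the heuristic could plausibly flag it, although the string is only ever passed to @code{fputs}:

```c
#include <libintl.h>
#include <stdio.h>
#include <string.h>

int
print_progress_complete (FILE *out)
{
  /* xgettext:no-c-format */
  return fputs (gettext ("Progress: 100% complete\n"), out);
}
```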
2529
If a string is marked with @code{c-format} and this is not correct, the
user can find out who is responsible for the decision.  See
@ref{xgettext Invocation} for how the @code{--debug} option can be
used to solve this problem.
2534
2535 @node Special cases, Bug Report Address, c-format Flag, Sources
2536 @section Special Cases of Translatable Strings
2537
2538 @cindex marking string initializers
2539 The attentive reader might now point out that it is not always possible
to mark translatable strings with @code{gettext} or something like this.
2541 Consider the following case:
2542
2543 @example
2544 @group
2545 @{
2546   static const char *messages[] = @{
2547     "some very meaningful message",
2548     "and another one"
2549   @};
2550   const char *string;
2551   @dots{}
2552   string
2553     = index > 1 ? "a default message" : messages[index];
2554
  fputs (string, stdout);
2556   @dots{}
2557 @}
2558 @end group
2559 @end example
2560
2561 While it is no problem to mark the string @code{"a default message"} it
2562 is not possible to mark the string initializers for @code{messages}.
2563 What is to be done?  We have to fulfill two tasks.  First we have to mark the
2564 strings so that the @code{xgettext} program (@pxref{xgettext Invocation})
can find them, and second we have to translate the strings at runtime
2566 before printing them.
2567
2568 The first task can be fulfilled by creating a new keyword, which names a
2569 no-op.  For the second we have to mark all access points to a string
2570 from the array.  So one solution can look like this:
2571
2572 @example
2573 @group
2574 #define gettext_noop(String) String
2575
2576 @{
2577   static const char *messages[] = @{
2578     gettext_noop ("some very meaningful message"),
2579     gettext_noop ("and another one")
2580   @};
2581   const char *string;
2582   @dots{}
2583   string
2584     = index > 1 ? gettext ("a default message") : gettext (messages[index]);
2585
  fputs (string, stdout);
2587   @dots{}
2588 @}
2589 @end group
2590 @end example
2591
Please convince yourself that the string which is written by
@code{fputs} is translated in any case.  How to let @code{xgettext} know
about the additional keyword @code{gettext_noop} is explained in
@ref{xgettext Invocation}.
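A self-contained variant of the first solution, as a sketch assuming glibc.  The function name @code{message_for} is hypothetical, and unlike the fragment above it also treats negative indices as out of range:

```c
#include <libintl.h>
#include <stdio.h>
#include <string.h>

/* Marks a string for xgettext without translating it at this point.  */
#define gettext_noop(String) String

/* The array holds untranslated msgids; every access point calls gettext.  */
const char *
message_for (int idx)
{
  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };
  return idx < 0 || idx > 1
         ? gettext ("a default message")
         : gettext (messages[idx]);
}
```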
2596
The above is of course not the only solution.  You could also use
the following one:
2599
2600 @example
2601 @group
2602 #define gettext_noop(String) String
2603
2604 @{
2605   static const char *messages[] = @{
    gettext_noop ("some very meaningful message"),
2607     gettext_noop ("and another one")
2608   @};
2609   const char *string;
2610   @dots{}
2611   string
2612     = index > 1 ? gettext_noop ("a default message") : messages[index];
2613
  fputs (gettext (string), stdout);
2615   @dots{}
2616 @}
2617 @end group
2618 @end example
2619
But this has a drawback.  The programmer has to take care that
he uses @code{gettext_noop} for the string @code{"a default message"}.
Using @code{gettext} there could, in rare cases, have unpredictable results,
because the string would then be passed through @code{gettext} twice.

One advantage is that you need not perform control flow analysis to make
sure the output is really translated in every case.  But this analysis is
generally not very difficult.  If it is difficult in some situation, you
can use this second method there.
2628
2629 @node Bug Report Address, Names, Special cases, Sources
2630 @section Letting Users Report Translation Bugs
2631
2632 Code sometimes has bugs, but translations sometimes have bugs too.  The
2633 users need to be able to report them.  Reporting translation bugs to the
2634 programmer or maintainer of a package is not very useful, since the
2635 maintainer must never change a translation, except on behalf of the
2636 translator.  Hence the translation bugs must be reported to the
2637 translators.
2638
2639 Here is a way to organize this so that the maintainer does not need to
2640 forward translation bug reports, nor even keep a list of the addresses of
2641 the translators or their translation teams.
2642
Every program has a place where it shows the bug report address.  For
GNU programs, it is the code which handles the @samp{--help} option,
typically in a function called @code{usage}.  In this place, instruct the
2646 translator to add her own bug reporting address.  For example, if that
2647 code has a statement
2648
2649 @example
2650 @group
2651 printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT);
2652 @end group
2653 @end example
2654
2655 you can add some translator instructions like this:
2656
2657 @example
2658 @group
2659 /* TRANSLATORS: The placeholder indicates the bug-reporting address
2660    for this package.  Please add _another line_ saying
2661    "Report translation bugs to <...>\n" with the address for translation
2662    bugs (typically your translation team's web or email address).  */
2663 printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT);
2664 @end group
2665 @end example
2666
When @code{xgettext} is invoked with the option
@option{--add-comments=TRANSLATORS:}, such comments are extracted,
leading to a POT file that contains this:
2669
2670 @example
2671 @group
2672 #. TRANSLATORS: The placeholder indicates the bug-reporting address
2673 #. for this package.  Please add _another line_ saying
2674 #. "Report translation bugs to <...>\n" with the address for translation
2675 #. bugs (typically your translation team's web or email address).
2676 #: src/hello.c:178
2677 #, c-format
2678 msgid "Report bugs to <%s>.\n"
2679 msgstr ""
2680 @end group
2681 @end example
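
A translator who follows these instructions keeps a translation of the
original sentence and adds a second line of her own.  A German translation
might, for example, read as follows.  The extracted comments are omitted
here, and the team address is a made-up placeholder, not a real one:

@example
@group
#: src/hello.c:178
#, c-format
msgid "Report bugs to <%s>.\n"
msgstr ""
"Melden Sie Fehler an <%s>.\n"
"Melden Sie Übersetzungsfehler an <https://example.org/de-team/>.\n"
@end group
@end example

The translation still contains exactly one @samp{%s}, so it remains
consistent with the @samp{c-format} flag of the entry.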
2682
2683 @node Names, Libraries, Bug Report Address, Sources
2684 @section Marking Proper Names for Translation
2685
2686 Should names of persons, cities, locations etc. be marked for translation
2687 or not?  People who only know languages that can be written with Latin
2688 letters (English, Spanish, French, German, etc.) are tempted to say ``no'',
2689 because names usually do not change when transported between these languages.
2690 However, in general when translating from one script to another, names
2691 are translated too, usually phonetically or by transliteration.  For
2692 example, Russian or Greek names are converted to the Latin alphabet when
2693 being translated to English, and English or French names are converted
2694 to the Katakana script when being translated to Japanese.  This is
2695 necessary because the speakers of the target language in general cannot
2696 read the script the name is originally written in.
2697
2698 As a programmer, you should therefore make sure that names are marked
2699 for translation, with a special comment telling the translators that it
2700 is a proper name and how to pronounce it.  In its simple form, it looks
2701 like this:
2702
2703 @example
2704 @group
2705 printf (_("Written by %s.\n"),
2706         /* TRANSLATORS: This is a proper name.  See the gettext
2707            manual, section Names.  Note this is actually a non-ASCII
2708            name: The first name is (with Unicode escapes)
2709            "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
2710            Pronunciation is like "fraa-swa pee-nar".  */
2711         _("Francois Pinard"));
2712 @end group
2713 @end example
2714
2715 @noindent
Gnulib, the GNU portability library, offers a module @samp{propername}
(@url{http://www.gnu.org/software/gnulib/MODULES.html#module=propername})
which automatically appends the original name, in parentheses,
to the translated name.  For names that cannot be written in ASCII, it
also frees the translator from the task of entering the appropriate non-ASCII
characters if no script change is needed.  In this more comfortable form,
it looks like this:
2723
2724 @example
2725 @group
2726 printf (_("Written by %s and %s.\n"),
2727         proper_name ("Ulrich Drepper"),
2728         /* TRANSLATORS: This is a proper name.  See the gettext
2729            manual, section Names.  Note this is actually a non-ASCII
2730            name: The first name is (with Unicode escapes)
2731            "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
2732            Pronunciation is like "fraa-swa pee-nar".  */
2733         proper_name_utf8 ("Francois Pinard", "Fran\303\247ois Pinard"));
2734 @end group
2735 @end example
2736
2737 @noindent
2738 You can also write the original name directly in Unicode (rather than with
2739 Unicode escapes or HTML entities) and denote the pronunciation using the
2740 International Phonetic Alphabet (see
2741 @url{http://www.wikipedia.org/wiki/International_Phonetic_Alphabet}).
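
The following sketch illustrates the idea behind @code{proper_name}.  It is
not gnulib's implementation: the name @code{my_proper_name} is hypothetical,
the real module does more work (for example, @code{proper_name_utf8}
converts the UTF-8 spelling to the locale's encoding), and this simplified
version never frees the string it allocates.

@example
@group
#include <libintl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for gnulib's proper_name():
   translate NAME and, if the translation does not already contain
   NAME, append NAME in parentheses.  */
const char *
my_proper_name (const char *name)
@{
  const char *translation = gettext (name);
  size_t size;
  char *result;

  /* Untranslated, or the translator kept the original spelling.  */
  if (strstr (translation, name) != NULL)
    return translation;

  /* Room for "TRANSLATION (NAME)" and the terminating NUL.  */
  size = strlen (translation) + strlen (name) + 4;
  result = malloc (size);
  if (result == NULL)
    return translation;
  snprintf (result, size, "%s (%s)", translation, name);
  return result;   /* Never freed in this sketch.  */
@}
@end group
@end example

When no translation is installed, @code{gettext} returns its argument, so
@code{my_proper_name ("Ulrich Drepper")} simply yields @samp{Ulrich Drepper}.
A translation written in another script, by contrast, comes back followed
by the original spelling in parentheses.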
2742
2743 As a translator, you should use some care when translating names, because
2744 it is frustrating if people see their names mutilated or distorted.
2745
2746 If your language uses the Latin script, all you need to do is to reproduce
2747 the name as perfectly as you can within the usual character set of your
2748 language.  In this particular case, this means to provide a translation
2749 containing the c-cedilla character.  If your language uses a different
2750 script and the people speaking it don't usually read Latin words, it means
2751 transliteration.  If the programmer used the simple case, you should still
2752 give, in parentheses, the original writing of the name -- for the sake of
2753 the people that do read the Latin script.  If the programmer used the
2754 @samp{propername} module mentioned above, you don't need to give the original
2755 writing of the name in parentheses, because the program will already do so.
2756 Here is an example, using Greek as the target script:
2757
2758 @example
2759 @group
2760 #. This is a proper name.  See the gettext
2761 #. manual, section Names.  Note this is actually a non-ASCII
2762 #. name: The first name is (with Unicode escapes)
2763 #. "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
2764 #. Pronunciation is like "fraa-swa pee-nar".
2765 msgid "Francois Pinard"
2766 msgstr "\phi\rho\alpha\sigma\omicron\alpha \pi\iota\nu\alpha\rho"
2767        " (Francois Pinard)"
2768 @end group
2769 @end example
2770
2771 Because translation of names is such a sensitive domain, it is a good
2772 idea to test your translation before submitting it.
2773
2774 @node Libraries,  , Names, Sources
2775 @section Preparing Library Sources
2776
2777 When you are preparing a library, not a program, for the use of
2778 @code{gettext}, only a few details are different.  Here we assume that
2779 the library has a translation domain and a POT file of its own.  (If
2780 it uses the translation domain and POT file of the main program, then
2781 the previous sections apply without changes.)
2782
2783 @enumerate
2784 @item
2785 The library code doesn't call @code{setlocale (LC_ALL, "")}.  It's the
2786 responsibility of the main program to set the locale.  The library's
2787 documentation should mention this fact, so that developers of programs
2788 using the library are aware of it.
2789
2790 @item
2791 The library code doesn't call @code{textdomain (PACKAGE)}, because it
2792 would interfere with the text domain set by the main program.
2793
2794 @item
2795 The initialization code for a program was
2796
2797 @smallexample
2798   setlocale (LC_ALL, "");
2799   bindtextdomain (PACKAGE, LOCALEDIR);
2800   textdomain (PACKAGE);
2801 @end smallexample
2802
2803 @noindent
2804 For a library it is reduced to
2805
2806 @smallexample
2807   bindtextdomain (PACKAGE, LOCALEDIR);
2808 @end smallexample
2809
2810 @noindent
2811 If your library's API doesn't already have an initialization function,
2812 you need to create one, containing at least the @code{bindtextdomain}
2813 invocation.  However, you usually don't need to export and document this
2814 initialization function: It is sufficient that all entry points of the
2815 library call the initialization function if it hasn't been called before.
2816 The typical idiom used to achieve this is a static boolean variable that
indicates whether the initialization function has been called, like this:
2818
2819 @example
2820 @group
#include <libintl.h>
#include <stdbool.h>

static bool libfoo_initialized;
2822
2823 static void
2824 libfoo_initialize (void)
2825 @{
2826   bindtextdomain (PACKAGE, LOCALEDIR);
2827   libfoo_initialized = true;
2828 @}
2829
2830 /* This function is part of the exported API.  */
2831 struct foo *
2832 create_foo (...)
2833 @{
2834   /* Must ensure the initialization is performed.  */
2835   if (!libfoo_initialized)
2836     libfoo_initialize ();
2837   ...
2838 @}
2839
2840 /* This function is part of the exported API.  The argument must be
2841    non-NULL and have been created through create_foo().  */
2842 int
2843 foo_refcount (struct foo *argument)
2844 @{
2845   /* No need to invoke the initialization function here, because
2846      create_foo() must already have been called before.  */
2847   ...
2848 @}
2849 @end group
2850 @end example
2851
2852 @item
2853 The usual declaration of the @samp{_} macro in each source file was
2854
2855 @smallexample
2856 #include <libintl.h>
2857 #define _(String) gettext (String)
2858 @end smallexample
2859
2860 @noindent
2861 for a program.  For a library, which has its own translation domain,
2862 it reads like this:
2863
2864 @smallexample
2865 #include <libintl.h>
2866 #define _(String) dgettext (PACKAGE, String)
2867 @end smallexample
2868
2869 In other words, @code{dgettext} is used instead of @code{gettext}.
2870 Similarly, the @code{dngettext} function should be used in place of the
2871 @code{ngettext} function.
2872 @end enumerate
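
Putting these points together, a library source file could look like the
following sketch.  The library name @samp{libfoo}, the fallback values of
@code{PACKAGE} and @code{LOCALEDIR}, and the functions
@code{foo_format_count} and @code{foo_status_message} are hypothetical; in a
real package, @code{PACKAGE} and @code{LOCALEDIR} would come from the build
system.  The sketch reuses the static boolean idiom shown above, which by
itself does not guard against several threads entering the library for the
first time concurrently.

@example
#include <libintl.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed values; a real package gets them from its build system.  */
#ifndef PACKAGE
# define PACKAGE "libfoo"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/local/share/locale"
#endif

/* Look messages up in the library's own domain, not in the domain
   that the main program selected with textdomain().  */
#define _(String) dgettext (PACKAGE, String)

static bool libfoo_initialized;

static void
libfoo_initialize (void)
@{
  /* No setlocale() and no textdomain() here: both belong to the
     main program.  */
  bindtextdomain (PACKAGE, LOCALEDIR);
  libfoo_initialized = true;
@}

/* Hypothetical exported function: format "N file(s)" into BUF.  */
int
foo_format_count (char *buf, size_t size, unsigned long n)
@{
  if (!libfoo_initialized)
    libfoo_initialize ();
  /* dngettext picks the singular or plural form, according to N,
     from the library's domain.  */
  return snprintf (buf, size,
                   dngettext (PACKAGE, "%lu file", "%lu files", n), n);
@}

/* Hypothetical exported function.  */
const char *
foo_status_message (void)
@{
  if (!libfoo_initialized)
    libfoo_initialize ();
  return _("No errors found.");
@}
@end example

If no message catalog for @samp{libfoo} is installed, or if the main program
never called @code{setlocale}, the strings come back untranslated:
@code{foo_format_count (buf, sizeof buf, 3)}, for example, stores
@samp{3 files} in @code{buf}.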
2873
2874 @node Template, Creating, Sources, Top
2875 @chapter Making the PO Template File
2876 @cindex PO template file
2877
2878 After preparing the sources, the programmer creates a PO template file.
2879 This section explains how to use @code{xgettext} for this purpose.
2880
2881 @code{xgettext} creates a file named @file{@var{domainname}.po}.  You
2882 should then rename it to @file{@var{domainname}.pot}.  (Why doesn't
2883 @code{xgettext} create it under the name @file{@var{domainname}.pot}
2884 right away?  The answer is: for historical reasons.  When @code{xgettext}
was specified, the distinction between a PO file and a PO file template
2886 was fuzzy, and the suffix @samp{.pot} wasn't in use at that time.)
2887
2888 @c FIXME: Rewrite.
2889
2890 @menu
2891 * xgettext Invocation::         Invoking the @code{xgettext} Program
2892 @end menu
2893
2894 @node xgettext Invocation,  , Template, Template
2895 @section Invoking the @code{xgettext} Program
2896
2897 @include xgettext.texi
2898
2899 @node Creating, Updating, Template, Top
2900 @chapter Creating a New PO File
2901 @cindex creating a new PO file
2902
2903 When starting a new translation, the translator creates a file called
2904 @file{@var{LANG}.po}, as a copy of the @file{@var{package}.pot} template
2905 file with modifications in the initial comments (at the beginning of the file)
2906 and in the header entry (the first entry, near the beginning of the file).
2907
The easiest way to do so is to use the @code{msginit} program.
2909 For example:
2910
2911 @example
2912 $ cd @var{PACKAGE}-@var{VERSION}
2913 $ cd po
2914 $ msginit
2915 @end example
2916
The alternative is to make the copy and the modifications by hand.
2918 To do so, the translator copies @file{@var{package}.pot} to
2919 @file{@var{LANG}.po}.  Then she modifies the initial comments and
2920 the header entry of this file.
2921
2922 @menu
2923 * msginit Invocation::          Invoking the @code{msginit} Program
2924 * Header Entry::                Filling in the Header Entry
2925 @end menu
2926
2927 @node msginit Invocation, Header Entry, Creating, Creating
2928 @section Invoking the @code{msginit} Program
2929
2930 @include msginit.texi
2931
2932 @node Header Entry,  , msginit Invocation, Creating
2933 @section Filling in the Header Entry
2934 @cindex header entry of a PO file
2935
2936 The initial comments "SOME DESCRIPTIVE TITLE", "YEAR" and
2937 "FIRST AUTHOR <EMAIL@@ADDRESS>, YEAR" ought to be replaced by sensible
2938 information.  This can be done in any text editor; if Emacs is used
and has switched to PO mode automatically (because it has recognized
2940 the file's suffix), you can disable it by typing @kbd{M-x fundamental-mode}.
2941
2942 Modifying the header entry can already be done using PO mode: in Emacs,
2943 type @kbd{M-x po-mode RET} and then @kbd{RET} again to start editing the
2944 entry.  You should fill in the following fields.
2945
2946 @table @asis
2947 @item Project-Id-Version
2948 This is the name and version of the package.  Fill it in if it has not
2949 already been filled in by @code{xgettext}.
2950
2951 @item Report-Msgid-Bugs-To
2952 This has already been filled in by @code{xgettext}.  It contains an email
2953 address or URL where you can report bugs in the untranslated strings:
2954
2955 @itemize -
2956 @item Strings which are not entire sentences, see the maintainer guidelines
2957 in @ref{Preparing Strings}.
2958 @item Strings which use unclear terms or require additional context to be
2959 understood.
2960 @item Strings which make invalid assumptions about notation of date, time or
2961 money.
2962 @item Pluralisation problems.
2963 @item Incorrect English spelling.
2964 @item Incorrect formatting.
2965 @end itemize
2966
2967 @item POT-Creation-Date
2968 This has already been filled in by @code{xgettext}.
2969
2970 @item PO-Revision-Date
2971 You don't need to fill this in.  It will be filled by the PO file editor
2972 when you save the file.
2973
2974 @item Last-Translator
2975 Fill in your name and email address (without double quotes).
2976
2977 @item Language-Team
2978 Fill in the English name of the language, and the email address or
2979 homepage URL of the language team you are part of.
2980
2981 Before starting a translation, it is a good idea to get in touch with
2982 your translation team, not only to make sure you don't do duplicated work,
2983 but also to coordinate difficult linguistic issues.
2984
2985 @cindex list of translation teams, where to find
2986 In the Free Translation Project, each translation team has its own mailing
2987 list.  The up-to-date list of teams can be found at the Free Translation
2988 Project's homepage, @uref{http://translationproject.org/}, in the "Teams"
2989 area.
2990
2991 @item Language
2992 @c The purpose of this field is to make it possible to automatically
2993 @c - convert PO files to translation memory,
2994 @c - initialize a spell checker based on the PO file,
2995 @c - perform language specific checks.
2996 Fill in the language code of the language.  This can be in one of three
2997 forms:
2998
2999 @itemize -
3000 @item
3001 @samp{@var{ll}}, an @w{ISO 639} two-letter language code (lowercase).
3002 See @ref{Language Codes} for the list of codes.
3003
3004 @item
3005 @samp{@var{ll}_@var{CC}}, where @samp{@var{ll}} is an @w{ISO 639} two-letter
3006 language code (lowercase) and @samp{@var{CC}} is an @w{ISO 3166} two-letter
3007 country code (uppercase).  The country code specification is not redundant:
3008 Some languages have dialects in different countries.  For example,
3009 @samp{de_AT} is used for Austria, and @samp{pt_BR} for Brazil.  The country
3010 code serves to distinguish the dialects. See @ref{Language Codes} and
3011 @ref{Country Codes} for the lists of codes.
3012
3013 @item
3014 @samp{@var{ll}_@var{CC}@@@var{variant}}, where @samp{@var{ll}} is an
3015 @w{ISO 639} two-letter language code (lowercase), @samp{@var{CC}} is an
3016 @w{ISO 3166} two-letter country code (uppercase), and @samp{@var{variant}} is
3017 a variant designator. The variant designator (lowercase) can be a script
3018 designator, such as @samp{latin} or @samp{cyrillic}.
3019 @end itemize
3020
3021 The naming convention @samp{@var{ll}_@var{CC}} is also the way locales are
3022 named on systems based on GNU libc.  But there are three important differences:
3023
3024 @itemize @bullet
3025 @item
3026 In this PO file field, but not in locale names, @samp{@var{ll}_@var{CC}}
3027 combinations denoting a language's main dialect are abbreviated as
3028 @samp{@var{ll}}.  For example, @samp{de} is equivalent to @samp{de_DE}
3029 (German as spoken in Germany), and @samp{pt} to @samp{pt_PT} (Portuguese as
3030 spoken in Portugal) in this context.
3031
3032 @item
3033 In this PO file field, suffixes like @samp{.@var{encoding}} are not used.
3034
3035 @item
3036 In this PO file field, variant designators that are not relevant to message
3037 translation, such as @samp{@@euro}, are not used.
3038 @end itemize
3039
3040 So, if your locale name is @samp{de_DE.UTF-8}, the language specification in
3041 PO files is just @samp{de}.
3042
3043 @item Content-Type
3044 @cindex encoding of PO files
3045 @cindex charset of PO files
3046 Replace @samp{CHARSET} with the character encoding used for your language,
3047 in your locale, or UTF-8.  This field is needed for correct operation of the
3048 @code{msgmerge} and @code{msgfmt} programs, as well as for users whose
3049 locale's character encoding differs from yours (see @ref{Charset conversion}).
3050
3051 @cindex @code{locale} program
3052 You get the character encoding of your locale by running the shell command
3053 @samp{locale charmap}.  If the result is @samp{C} or @samp{ANSI_X3.4-1968},
3054 which is equivalent to @samp{ASCII} (= @samp{US-ASCII}), it means that your
3055 locale is not correctly configured.  In this case, ask your translation
3056 team which charset to use.  @samp{ASCII} is not usable for any language
3057 except Latin.
3058
3059 @cindex encoding list
3060 Because the PO files must be portable to operating systems with less advanced
3061 internationalization facilities, the character encodings that can be used
3062 are limited to those supported by both GNU @code{libc} and GNU
3063 @code{libiconv}.  These are:
3064 @code{ASCII}, @code{ISO-8859-1}, @code{ISO-8859-2}, @code{ISO-8859-3},
3065 @code{ISO-8859-4}, @code{ISO-8859-5}, @code{ISO-8859-6}, @code{ISO-8859-7},
3066 @code{ISO-8859-8}, @code{ISO-8859-9}, @code{ISO-8859-13}, @code{ISO-8859-14},
3067 @code{ISO-8859-15},
3068 @code{KOI8-R}, @code{KOI8-U}, @code{KOI8-T},
3069 @code{CP850}, @code{CP866}, @code{CP874},
3070 @code{CP932}, @code{CP949}, @code{CP950}, @code{CP1250}, @code{CP1251},
3071 @code{CP1252}, @code{CP1253}, @code{CP1254}, @code{CP1255}, @code{CP1256},
3072 @code{CP1257}, @code{GB2312}, @code{EUC-JP}, @code{EUC-KR}, @code{EUC-TW},
3073 @code{BIG5}, @code{BIG5-HKSCS}, @code{GBK}, @code{GB18030}, @code{SHIFT_JIS},
3074 @code{JOHAB}, @code{TIS-620}, @code{VISCII}, @code{GEORGIAN-PS}, @code{UTF-8}.
3075
3076 @c This data is taken from glibc/localedata/SUPPORTED.
3077 @cindex Linux
3078 In the GNU system, the following encodings are frequently used for the
3079 corresponding languages.
3080
3081 @cindex encoding for your language
3082 @itemize
3083 @item @code{ISO-8859-1} for
3084 Afrikaans, Albanian, Basque, Breton, Catalan, Cornish, Danish, Dutch,
3085 English, Estonian, Faroese, Finnish, French, Galician, German,
3086 Greenlandic, Icelandic, Indonesian, Irish, Italian, Malay, Manx,
3087 Norwegian, Occitan, Portuguese, Spanish, Swedish, Tagalog, Uzbek,
3088 Walloon,
3089 @item @code{ISO-8859-2} for
3090 Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak,
3091 Slovenian,
3092 @item @code{ISO-8859-3} for Maltese,
3093 @item @code{ISO-8859-5} for Macedonian, Serbian,
3094 @item @code{ISO-8859-6} for Arabic,
3095 @item @code{ISO-8859-7} for Greek,
3096 @item @code{ISO-8859-8} for Hebrew,
3097 @item @code{ISO-8859-9} for Turkish,
3098 @item @code{ISO-8859-13} for Latvian, Lithuanian, Maori,
3099 @item @code{ISO-8859-14} for Welsh,
3100 @item @code{ISO-8859-15} for
3101 Basque, Catalan, Dutch, English, Finnish, French, Galician, German, Irish,
3102 Italian, Portuguese, Spanish, Swedish, Walloon,
3103 @item @code{KOI8-R} for Russian,
3104 @item @code{KOI8-U} for Ukrainian,
3105 @item @code{KOI8-T} for Tajik,
3106 @item @code{CP1251} for Bulgarian, Belarusian,
3107 @item @code{GB2312}, @code{GBK}, @code{GB18030}
3108 for simplified writing of Chinese,
3109 @item @code{BIG5}, @code{BIG5-HKSCS}
3110 for traditional writing of Chinese,
3111 @item @code{EUC-JP} for Japanese,
3112 @item @code{EUC-KR} for Korean,
3113 @item @code{TIS-620} for Thai,
3114 @item @code{GEORGIAN-PS} for Georgian,
3115 @item @code{UTF-8} for any language, including those listed above.
3116 @end itemize
3117
3118 @cindex quote characters, use in PO files
3119 @cindex quotation marks
3120 When single quote characters or double quote characters are used in
3121 translations for your language, and your locale's encoding is one of the
3122 ISO-8859-* charsets, it is best if you create your PO files in UTF-8
3123 encoding, instead of your locale's encoding.  This is because in UTF-8
3124 the real quote characters can be represented (single quote characters:
U+2018, U+2019, double quote characters: U+201C, U+201D), whereas none of
the ISO-8859-* charsets has them all.  Users in UTF-8 locales will see the
3127 real quote characters, whereas users in ISO-8859-* locales will see the
3128 vertical apostrophe and the vertical double quote instead (because that's
3129 what the character set conversion will transliterate them to).
3130
3131 @cindex @code{xmodmap} program, and typing quotation marks
3132 To enter such quote characters under X11, you can change your keyboard
3133 mapping using the @code{xmodmap} program.  The X11 names of the quote
3134 characters are "leftsinglequotemark", "rightsinglequotemark",
3135 "leftdoublequotemark", "rightdoublequotemark", "singlelowquotemark",
3136 "doublelowquotemark".
3137
3138 Note that only recent versions of GNU Emacs support the UTF-8 encoding:
3139 Emacs 20 with Mule-UCS, and Emacs 21.  As of January 2001, XEmacs doesn't
3140 support the UTF-8 encoding.
3141
3142 The character encoding name can be written in either upper or lower case.
3143 Usually upper case is preferred.
3144
3145 @item Content-Transfer-Encoding
3146 Set this to @code{8bit}.
3147
3148 @item Plural-Forms
3149 This field is optional.  It is only needed if the PO file has plural forms.
3150 You can find them by searching for the @samp{msgid_plural} keyword.  The
3151 format of the plural forms field is described in @ref{Plural forms} and
3152 @ref{Translating plural forms}.
3153 @end table
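
Putting these fields together, a completed header entry could look like the
following.  Every value shown here (package, version, dates, names and
addresses) is a made-up placeholder for a hypothetical German translation;
the @samp{MIME-Version} line comes from the template and is left unchanged.

@example
@group
# German translation of the hello package.
# This file is distributed under the same license as the hello package.
# Jane Doe <jane@@example.org>, 2024.
#
msgid ""
msgstr ""
"Project-Id-Version: hello 2.10\n"
"Report-Msgid-Bugs-To: bug-hello@@example.org\n"
"POT-Creation-Date: 2024-01-15 10:00+0100\n"
"PO-Revision-Date: 2024-02-01 12:00+0100\n"
"Last-Translator: Jane Doe <jane@@example.org>\n"
"Language-Team: German <de-team@@example.org>\n"
"Language: de\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
@end group
@end example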
3154
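The rules given for the @samp{Language} field can be expressed as a small C
function.  This is only a sketch: the function name
@code{po_language_from_locale} is hypothetical, its table of main dialects is
deliberately tiny (a real implementation would need a complete one), it
assumes locale names of the form
@samp{@var{ll}_@var{CC}.@var{encoding}@@@var{variant}}, and it treats
@samp{@@euro} as the only variant to drop.

@example
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical and deliberately incomplete table mapping a
   language's main dialect to its abbreviated form.  */
static const char *const main_dialects[][2] =
@{
  @{ "de_DE", "de" @},
  @{ "fr_FR", "fr" @},
  @{ "pt_PT", "pt" @},
@};

/* Hypothetical helper: derive the value of the PO "Language" field
   from a locale name of the form ll[_CC][.encoding][@@variant].  */
void
po_language_from_locale (const char *locale, char *out, size_t size)
@{
  /* Length of the "ll" or "ll_CC" part.  */
  size_t base_len = strcspn (locale, ".@@");
  const char *at = strchr (locale, '@@');
  /* Keep script variants such as @@latin; drop @@euro.  */
  const char *variant =
    (at != NULL && strcmp (at + 1, "euro") != 0) ? at + 1 : NULL;
  size_t i;

  /* Abbreviate a main dialect, e.g. "de_DE" to "de".  */
  if (variant == NULL)
    for (i = 0; i < sizeof main_dialects / sizeof main_dialects[0]; i++)
      if (strlen (main_dialects[i][0]) == base_len
          && strncmp (locale, main_dialects[i][0], base_len) == 0)
        @{
          snprintf (out, size, "%s", main_dialects[i][1]);
          return;
        @}

  /* Otherwise keep "ll" or "ll_CC" without the encoding suffix,
     followed by the variant if one was kept.  */
  if (variant != NULL)
    snprintf (out, size, "%.*s@@%s", (int) base_len, locale, variant);
  else
    snprintf (out, size, "%.*s", (int) base_len, locale);
@}
@end example

For example, this sketch maps @samp{de_DE.UTF-8} to @samp{de},
@samp{pt_BR.UTF-8} to @samp{pt_BR}, and @samp{sr_RS.UTF-8@@latin} to
@samp{sr_RS@@latin}.
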
3155 @node Updating, Editing, Creating, Top
3156 @chapter Updating Existing PO Files
3157
3158 @menu
3159 * msgmerge Invocation::         Invoking the @code{msgmerge} Program
3160 @end menu
3161
3162 @node msgmerge Invocation,  , Updating, Updating
3163 @section Invoking the @code{msgmerge} Program
3164
3165 @include msgmerge.texi
3166
3167 @node Editing, Manipulating, Updating, Top
3168 @chapter Editing PO Files
3169 @cindex Editing PO Files
3170
3171 @menu
3172 * KBabel::                      KDE's PO File Editor
3173 * Gtranslator::                 GNOME's PO File Editor
3174 * PO Mode::                     Emacs's PO File Editor
3175 * Compendium::                  Using Translation Compendia
3176 @end menu
3177
3178 @node KBabel, Gtranslator, Editing, Editing
3179 @section KDE's PO File Editor
3180 @cindex KDE PO file editor
3181
3182 @node Gtranslator, PO Mode, KBabel, Editing
3183 @section GNOME's PO File Editor
3184 @cindex GNOME PO file editor
3185
3186 @node PO Mode, Compendium, Gtranslator, Editing
3187 @section Emacs's PO File Editor
3188 @cindex Emacs PO Mode
3189
3190 @c FIXME: Rewrite.
3191
For those of you who are lucky enough to be Emacs users, PO mode has
been created specifically to provide a cozy environment for editing or
modifying PO files.
3195 While editing a PO file, PO mode allows for the easy browsing of
3196 auxiliary and compendium PO files, as well as for following references into
3197 the set of C program sources from which PO files have been derived.
3198 It has a few special features, among which are the interactive marking
3199 of program strings as translatable, and the validation of PO files
3200 with easy repositioning to PO file lines showing errors.
3201
To begin with, besides the main PO mode commands
3203 (@pxref{Main PO Commands}), you should know how to move between entries
3204 (@pxref{Entry Positioning}), and how to handle untranslated entries
3205 (@pxref{Untranslated Entries}).
3206
3207 @menu
3208 * Installation::                Completing GNU @code{gettext} Installation
3209 * Main PO Commands::            Main Commands
3210 * Entry Positioning::           Entry Positioning
3211 * Normalizing::                 Normalizing Strings in Entries
3212 * Translated Entries::          Translated Entries
3213 * Fuzzy Entries::               Fuzzy Entries
3214 * Untranslated Entries::        Untranslated Entries
3215 * Obsolete Entries::            Obsolete Entries
3216 * Modifying Translations::      Modifying Translations
3217 * Modifying Comments::          Modifying Comments
3218 * Subedit::                     Mode for Editing Translations
3219 * C Sources Context::           C Sources Context
3220 * Auxiliary::                   Consulting Auxiliary PO Files
3221 @end menu
3222
3223 @node Installation, Main PO Commands, PO Mode, PO Mode
3224 @subsection Completing GNU @code{gettext} Installation
3225
3226 @cindex installing @code{gettext}
3227 @cindex @code{gettext} installation
3228 Once you have received, unpacked, configured and compiled the GNU
3229 @code{gettext} distribution, the @samp{make install} command puts in
3230 place the programs @code{xgettext}, @code{msgfmt}, @code{gettext}, and
3231 @code{msgmerge}, as well as their available message catalogs.  To
top off a comfortable installation, you might also want to make
PO mode available to your Emacs users.
3234
3235 @emindex @file{.emacs} customizations
3236 @emindex installing PO mode
During the installation of PO mode, you might want to modify your
@file{.emacs} file, once and for all, so that it contains a few lines
like these:
3240
3241 @example
3242 (setq auto-mode-alist
3243       (cons '("\\.po\\'\\|\\.po\\." . po-mode) auto-mode-alist))
3244 (autoload 'po-mode "po-mode" "Major mode for translators to edit PO files" t)
3245 @end example
3246
3247 Later, whenever you edit some @file{.po}
3248 file, or any file having the string @samp{.po.} within its name,
3249 Emacs loads @file{po-mode.elc} (or @file{po-mode.el}) as needed, and
3250 automatically activates PO mode commands for the associated buffer.
3251 The string @emph{PO} appears in the mode line for any buffer for
3252 which PO mode is active.  Many PO files may be active at once in a
3253 single Emacs session.
3254
3255 If you are using Emacs version 20 or newer, and have already installed
3256 the appropriate international fonts on your system, you may also tell
3257 Emacs how to determine automatically the coding system of every PO file.
3258 This will often (but not always) cause the necessary fonts to be loaded
3259 and used for displaying the translations on your Emacs screen.  For this
3260 to happen, add the lines:
3261
3262 @example
3263 (modify-coding-system-alist 'file "\\.po\\'\\|\\.po\\."
3264                             'po-find-file-coding-system)
3265 (autoload 'po-find-file-coding-system "po-mode")
3266 @end example
3267
3268 @noindent
3269 to your @file{.emacs} file.  If, with this, you still see boxes instead
3270 of international characters, try a different font set (via Shift Mouse
3271 button 1).
3272
3273 @node Main PO Commands, Entry Positioning, Installation, PO Mode
3274 @subsection Main PO mode Commands
3275
3276 @cindex PO mode (Emacs) commands
3277 @emindex commands
3278 After setting up Emacs with something similar to the lines in
3279 @ref{Installation}, PO mode is activated for a window when Emacs finds a
PO file in that window.  This makes the buffer read-only and installs the
keymap @code{po-mode-map}.  PO mode is a genuine Emacs major mode, which is
not derived from text mode in any way.  Functions found on
@code{po-mode-hook}, if any, are then executed.
3284
3285 When PO mode is active in a window, the letters @samp{PO} appear
3286 in the mode line for that window.  The mode line also displays how
3287 many entries of each kind are held in the PO file.  For example,
3288 the string @samp{132t+3f+10u+2o} would tell the translator that the
PO file contains 132 translated entries (@pxref{Translated Entries}),
3 fuzzy entries (@pxref{Fuzzy Entries}), 10 untranslated entries
(@pxref{Untranslated Entries}) and 2 obsolete entries (@pxref{Obsolete
Entries}).  Counters that are zero are not shown.  So, in this example, if
3293 the fuzzy entries were unfuzzied, the untranslated entries were translated
3294 and the obsolete entries were deleted, the mode line would merely display
3295 @samp{145t} for the counters.
3296
3297 The main PO commands are those which do not fit into the other categories of
3298 subsequent sections.  These allow for quitting PO mode or for managing windows
3299 in special ways.
3300
3301 @table @kbd
3302 @item _
3303 @efindex _@r{, PO Mode command}
3304 Undo last modification to the PO file (@code{po-undo}).
3305
3306 @item Q
3307 @efindex Q@r{, PO Mode command}
3308 Quit processing and save the PO file (@code{po-quit}).
3309
3310 @item q
3311 @efindex q@r{, PO Mode command}
3312 Quit processing, possibly after confirmation (@code{po-confirm-and-quit}).
3313
3314 @item 0
3315 @efindex 0@r{, PO Mode command}
Temporarily leave the PO file window (@code{po-other-window}).
3317
3318 @item ?
3319 @itemx h
3320 @efindex ?@r{, PO Mode command}
3321 @efindex h@r{, PO Mode command}
3322 Show help about PO mode (@code{po-help}).
3323
3324 @item =
3325 @efindex =@r{, PO Mode command}
3326 Give some PO file statistics (@code{po-statistics}).
3327
3328 @item V
3329 @efindex V@r{, PO Mode command}
3330 Batch validate the format of the whole PO file (@code{po-validate}).
3331
3332 @end table
3333
3334 @efindex _@r{, PO Mode command}
3335 @efindex po-undo@r{, PO Mode command}
3336 The command @kbd{_} (@code{po-undo}) interfaces to the Emacs
3337 @emph{undo} facility.  @xref{Undo, , Undoing Changes, emacs, The Emacs
3338 Editor}.  Each time @kbd{_} is typed, modifications which the translator
3339 did to the PO file are undone a little more.  For the purpose of
3340 undoing, each PO mode command is atomic.  This is especially true for
the @kbd{@key{RET}} command: the whole edit made with a single use of
this command is undone at once, even if the edit itself involved
several actions.  However, while in the editing window, one can undo
the editing work in much smaller steps.
3345
3346 @efindex Q@r{, PO Mode command}
3347 @efindex q@r{, PO Mode command}
3348 @efindex po-quit@r{, PO Mode command}
3349 @efindex po-confirm-and-quit@r{, PO Mode command}
3350 The commands @kbd{Q} (@code{po-quit}) and @kbd{q}
3351 (@code{po-confirm-and-quit}) are used when the translator is done with the
3352 PO file.  The former is a bit less verbose than the latter.  If the file
3353 has been modified, it is saved to disk first.  In both cases, and prior to
3354 all this, the commands check if any untranslated messages remain in the
3355 PO file and, if so, the translator is asked if she really wants to leave
3356 off working with this PO file.  This is the preferred way of getting rid
3357 of an Emacs PO file buffer.  Merely killing it through the usual command
3358 @w{@kbd{C-x k}} (@code{kill-buffer}) is not the tidiest way to proceed.
3359
3360 @efindex 0@r{, PO Mode command}
3361 @efindex po-other-window@r{, PO Mode command}
The command @kbd{0} (@code{po-other-window}) is another, softer way
to leave PO mode temporarily.  It just moves the cursor to some other
Emacs window, and pops one up if necessary.  For example, if the translator
just got PO mode to show some source context in another window, she might
discover some apparent bug in the program source that needs correction.
This command allows the translator to switch roles, become a programmer,
and have the cursor right in the window containing the program she
wants to modify.  Once the cursor is moved back into the PO file window,
or Emacs is asked to edit this file once again, PO mode is recovered.
3372
3373 @efindex ?@r{, PO Mode command}
3374 @efindex h@r{, PO Mode command}
3375 @efindex po-help@r{, PO Mode command}
3376 The command @kbd{h} (@code{po-help}) displays a summary of all available PO
3377 mode commands.  The translator should then type any character to resume
3378 normal PO mode operations.  The command @kbd{?} has the same effect
3379 as @kbd{h}.
3380
3381 @efindex =@r{, PO Mode command}
3382 @efindex po-statistics@r{, PO Mode command}
3383 The command @kbd{=} (@code{po-statistics}) computes the total number of
3384 entries in the PO file, the ordinal of the current entry (counted from
3385 1), the number of untranslated entries, the number of obsolete entries,
3386 and displays all these numbers.
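
As an illustration only, the numbers reported by @kbd{=} can be sketched
in Python, assuming that each entry has already been classified as
translated, fuzzy, untranslated or obsolete.  The function
@code{po_statistics} is hypothetical, and counting obsolete entries in the
total is an assumption of this sketch rather than something stated above.

@example
from collections import Counter


# Illustrative sketch; po_statistics is a hypothetical helper.
def po_statistics(kinds, current):
    """Return (total, ordinal, untranslated, obsolete) for a PO file.

    KINDS lists the kind of every entry, in file order; CURRENT is the
    0-based index of the current entry.  Obsolete entries are counted in
    the total here, which is an assumption of this sketch.
    """
    counts = Counter(kinds)
    ordinal = current + 1  # entries are counted from 1
    return len(kinds), ordinal, counts["untranslated"], counts["obsolete"]


kinds = ["translated", "fuzzy", "untranslated", "obsolete", "untranslated"]
print(po_statistics(kinds, 2))  # prints (5, 3, 2, 1)
@end example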
3387
3388 @efindex V@r{, PO Mode command}
3389 @efindex po-validate@r{, PO Mode command}
The command @kbd{V} (@code{po-validate}) launches @code{msgfmt} in
checking and verbose mode over the current PO file.  This command first
offers to save the current PO file on disk.  The @code{msgfmt} tool, from
GNU @code{gettext}, is primarily meant for creating a MO file out of a PO
file, and PO mode uses the features of this program for checking the
overall format of a PO file, as well as all individual entries.
3397
3398 @efindex next-error@r{, stepping through PO file validation results}
3399 The program @code{msgfmt} runs asynchronously with Emacs, so the
3400 translator regains control immediately while her PO file is being studied.
3401 Error output is collected in the Emacs @samp{*compilation*} buffer,
displayed in another window.  The regular Emacs command @kbd{C-x `}
(@code{next-error}), as well as other usual compile commands, allow the
translator to reposition quickly to the offending parts of the PO file.
Once the cursor is on the line in error, the translator may decide on
any PO mode action which would help correct the error.
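
A comparable check can also be run outside Emacs.  The following Python
sketch assumes a POSIX system with GNU @code{msgfmt} on the search path;
the options @option{--check}, @option{--verbose} and @option{-o /dev/null}
are an approximation chosen for this example, not necessarily the exact
command line built by @code{po-validate}, and both helper names are
hypothetical.

@example
import subprocess


# Illustrative sketch; both helpers are hypothetical, not PO mode code.
def validate_command(po_file):
    """Build a msgfmt command line that checks PO_FILE verbosely and
    throws the compiled MO output away (assumed options)."""
    return ["msgfmt", "--check", "--verbose", "-o", "/dev/null", po_file]


def validate(po_file):
    """Run msgfmt on PO_FILE and return (exit status, diagnostics).

    msgfmt writes its error messages to standard error, which is what
    Emacs would collect in the compilation buffer.
    """
    result = subprocess.run(validate_command(po_file),
                            capture_output=True, text=True)
    return result.returncode, result.stderr
@end example

Discarding the compiled output keeps the check free of side effects, while
a nonzero exit status normally signals that @code{msgfmt} found errors.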
3407
3408 @node Entry Positioning, Normalizing, Main PO Commands, PO Mode
3409 @subsection Entry Positioning
3410
3411 @emindex current entry of a PO file
The cursor in a PO file window is almost always part of
an entry.  The only exceptions are when the cursor
is after the last entry in the file, or when the PO file is
empty.  The entry containing the cursor is called the
current entry.  Many PO mode commands operate on the current entry,
so moving the cursor does more than let the translator browse
the PO file: it also selects the entry on which commands operate.
3419
3420 @emindex moving through a PO file
Some PO mode commands alter the position of the cursor in a specialized
way.  A few of these special-purpose positioning commands are described
here, and the others in the following sections (for a complete list, try
@kbd{C-h m}):
3425
3426 @table @kbd
3427
3428 @item .
3429 @efindex .@r{, PO Mode command}
3430 Redisplay the current entry (@code{po-current-entry}).
3431
3432 @item n
3433 @efindex n@r{, PO Mode command}
3434 Select the entry after the current one (@code{po-next-entry}).
3435
3436 @item p
3437 @efindex p@r{, PO Mode command}
3438 Select the entry before the current one (@code{po-previous-entry}).
3439
3440 @item <
3441 @efindex <@r{, PO Mode command}
3442 Select the first entry in the PO file (@code{po-first-entry}).
3443
3444 @item >
3445 @efindex >@r{, PO Mode command}
3446 Select the last entry in the PO file (@code{po-last-entry}).
3447
3448 @item m
3449 @efindex m@r{, PO Mode command}
3450 Record the location of the current entry for later use
3451 (@code{po-push-location}).
3452
3453 @item r
3454 @efindex r@r{, PO Mode command}
3455 Return to a previously saved entry location (@code{po-pop-location}).
3456
3457 @item x
3458 @efindex x@r{, PO Mode command}
3459 Exchange the current entry location with the previously saved one
3460 (@code{po-exchange-location}).
3461
3462 @end table
3463
3464 @efindex .@r{, PO Mode command}
3465 @efindex po-current-entry@r{, PO Mode command}
3466 Any Emacs command able to reposition the cursor may be used
3467 to select the current entry in PO mode, including commands which
3468 move by characters, lines, paragraphs, screens or pages, and search
3469 commands.  However, there is a kind of standard way to display the
3470 current entry in PO mode, which usual Emacs commands moving
3471 the cursor do not especially try to enforce.  The command @kbd{.}
3472 (@code{po-current-entry}) has the sole purpose of redisplaying the
3473 current entry properly, after the current entry has been changed by
3474 means external to PO mode, or the Emacs screen otherwise altered.
3475
It is yet to be decided if PO mode helps the translator, or rather
irritates her, by forcing a rigid window layout while she
is doing her work.  We originally had quite precise ideas about
how windows should behave, but on the other hand, anyone used to
Emacs is often happy to keep full control.  A fixed window
layout could be offered as a PO mode option that the translator
might activate or deactivate at will, so it could be tried on an
experimental basis.  If nobody feels a real need for using it, or
a compulsion for writing it, we should drop this whole idea.
The incentive for doing it should come from translators rather than
programmers, as opinions from an experienced translator are surely
worth more to me than opinions from programmers @emph{thinking} about
how @emph{others} should do translation.
3489
3490 @efindex n@r{, PO Mode command}
3491 @efindex po-next-entry@r{, PO Mode command}
3492 @efindex p@r{, PO Mode command}
3493 @efindex po-previous-entry@r{, PO Mode command}
3494 The commands @kbd{n} (@code{po-next-entry}) and @kbd{p}
(@code{po-previous-entry}) move the cursor to the entry following,
3496 or preceding, the current one.  If @kbd{n} is given while the
3497 cursor is on the last entry of the PO file, or if @kbd{p}
3498 is given while the cursor is on the first entry, no move is done.
3499
3500 @efindex <@r{, PO Mode command}
3501 @efindex po-first-entry@r{, PO Mode command}
3502 @efindex >@r{, PO Mode command}
3503 @efindex po-last-entry@r{, PO Mode command}
3504 The commands @kbd{<} (@code{po-first-entry}) and @kbd{>}
3505 (@code{po-last-entry}) move the cursor to the first entry, or last
3506 entry, of the PO file.  When the cursor is located past the last
3507 entry in a PO file, most PO mode commands will return an error saying
@samp{After last entry}.  However, the commands @kbd{<} and @kbd{>}
have the special property of working even when the cursor
is not inside any PO file entry, so they can be used to
correct this situation.  But even these commands will fail on a
truly empty PO file.  There are plans for PO mode to interactively
fill an empty PO file from sources.  @xref{Marking}.
3514
3515 The translator may decide, before working at the translation of
3516 a particular entry, that she needs to browse the remainder of the
3517 PO file, maybe for finding the terminology or phraseology used
in related entries.  She can of course use the standard Emacs idioms
for saving the current cursor location in some register and using that
register for getting back, or else use the mark ring.
3521
3522 @efindex m@r{, PO Mode command}
3523 @efindex po-push-location@r{, PO Mode command}
3524 @efindex r@r{, PO Mode command}
3525 @efindex po-pop-location@r{, PO Mode command}
3526 PO mode offers another approach, by which cursor locations may be saved
3527 onto a special stack.  The command @kbd{m} (@code{po-push-location})
merely adds the location of the current entry to the stack, pushing
3529 the already saved locations under the new one.  The command
3530 @kbd{r} (@code{po-pop-location}) consumes the top stack element and
3531 repositions the cursor to the entry associated with that top element.
3532 This position is then lost, for the next @kbd{r} will move the cursor
3533 to the previously saved location, and so on until no locations remain
3534 on the stack.
3535
3536 If the translator wants the position to be kept on the location stack,
maybe to take a look at the entry associated with the top
element, then go elsewhere with the intent of getting back later, she
3539 ought to use @kbd{m} immediately after @kbd{r}.
3540
3541 @efindex x@r{, PO Mode command}
3542 @efindex po-exchange-location@r{, PO Mode command}
3543 The command @kbd{x} (@code{po-exchange-location}) simultaneously
3544 repositions the cursor to the entry associated with the top element of
3545 the stack of saved locations, and replaces that top element with the
3546 location of the current entry before the move.  Consequently, repeating
the @kbd{x} command alternates between two entries.
To achieve this, the translator positions the cursor on the
first entry, uses @kbd{m}, then moves to the second entry, and
merely uses @kbd{x} to make the switch.
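
The behavior of @kbd{m}, @kbd{r} and @kbd{x} can be modelled as a plain
stack.  The following Python sketch is only an illustration: the class
@code{LocationStack} is hypothetical, locations are simply entry numbers,
and what PO mode does when the stack is empty is not described here, so
the sketch just lets Python raise @code{IndexError}.

@example
# Illustrative sketch; LocationStack is hypothetical, not PO mode code.
class LocationStack:
    """Model of the PO mode location stack; locations are entry numbers."""

    def __init__(self):
        self.saved = []  # the last element is the top of the stack

    def push(self, current):
        """m: save the location of the current entry."""
        self.saved.append(current)

    def pop(self):
        """r: return to the most recently saved location, forgetting it."""
        return self.saved.pop()

    def exchange(self, current):
        """x: go to the top location, leaving CURRENT in its place."""
        target = self.saved[-1]
        self.saved[-1] = current
        return target


stack = LocationStack()
stack.push(3)                # m while on entry 3, then move to entry 17
here = stack.exchange(17)    # x: back to entry 3
here = stack.exchange(here)  # x again: back to entry 17
@end example

Following @kbd{r} with @kbd{m} amounts to @code{stack.push(stack.pop())}
in this model, which leaves the top location in place.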
3551
3552 @node Normalizing, Translated Entries, Entry Positioning, PO Mode
3553 @subsection Normalizing Strings in Entries
3554 @cindex string normalization in entries
3555
3556 There are many different ways for encoding a particular string into a
3557 PO file entry, because there are so many different ways to split and
quote multi-line strings, and even to represent special characters
by backslash escape sequences.  Some features of PO mode rely on
the ability for PO mode to scan an already existing PO file for a
particular string encoded into the @code{msgid} field of some entry.
Although PO mode has all the built-in machinery needed for
implementing this recognition, doing it fast is technically
difficult.  To facilitate a solution to this efficiency problem,
we decided on a canonical representation for strings.
3566
3567 A conventional representation of strings in a PO file is currently
3568 under discussion, and PO mode experiments with a canonical representation.
3569 Having both @code{xgettext} and PO mode converging towards a uniform
3570 way of representing equivalent strings would be useful, as the internal
3571 normalization needed by PO mode could be automatically satisfied
3572 when using @code{xgettext} from GNU @code{gettext}.  An explicit
PO mode normalization should then only be necessary for PO files
3574 imported from elsewhere, or for when the convention itself evolves.
3575
So, to bring at least those strings of a given PO file which need it
into a canonical representation, the following PO mode command is
available:
3579
3580 @emindex string normalization in entries
3581 @table @kbd
3582 @item M-x po-normalize
3583 @efindex po-normalize@r{, PO Mode command}
3584 Tidy the whole PO file by making entries more uniform.
3585
3586 @end table
3587
3588 The special command @kbd{M-x po-normalize}, which has no associated
3589 keys, revises all entries, ensuring that strings of both original
3590 and translated entries use uniform internal quoting in the PO file.
It also removes any leftover text after the last entry.  This command may be
3592 useful for PO files freshly imported from elsewhere, or if we ever
3593 improve on the canonical quoting format we use.  This canonical format
3594 is not only meant for getting cleaner PO files, but also for greatly
3595 speeding up @code{msgid} string lookup for some other PO mode commands.
3596
3597 @kbd{M-x po-normalize} presently makes three passes over the entries.
3598 The first implements heuristics for converting PO files for GNU
3599 @code{gettext} 0.6 and earlier, in which @code{msgid} and @code{msgstr}
fields used K&R style C string syntax for multi-line strings.
These heuristics may fail for comments not related to obsolete
entries and ending with a backslash; they also depend on subsequent
passes for finalizing the proper commenting of continued lines for
obsolete entries.  This first pass might disappear once all old PO
files have been adjusted.  The second and third passes normalize
3606 all @code{msgid} and @code{msgstr} strings respectively.  They also
3607 clean out those trailing backslashes used by XView's @code{msgfmt}
3608 for continued lines.
3609
3610 @cindex importing PO files
3611 Having such an explicit normalizing command allows for importing PO
3612 files from other sources, but also eases the evolution of the current
3613 convention, evolution driven mostly by aesthetic concerns, as of now.
3614 It is easy to make suggested adjustments at a later time, as the
3615 normalizing command and eventually, other GNU @code{gettext} tools
3616 should greatly automate conformance.  A description of the canonical
3617 string format is given below, for the particular benefit of those not
3618 having Emacs handy, and who would nevertheless want to handcraft
3619 their PO files in nice ways.
3620
3621 @cindex multi-line strings
3622 Right now, in PO mode, strings are single line or multi-line.  A string
3623 goes multi-line if and only if it has @emph{embedded} newlines, that
3624 is, if it matches @samp{[^\n]\n+[^\n]}.  So, we would have:
3625
3626 @example
3627 msgstr "\n\nHello, world!\n\n\n"
3628 @end example
3629
3630 but, replacing the space by a newline, this becomes:
3631
3632 @example
3633 msgstr ""
3634 "\n"
3635 "\n"
3636 "Hello,\n"
3637 "world!\n"
3638 "\n"
3639 "\n"
3640 @end example
3641
We are deliberately using a caricatural example here, to make the
point clearer.  Usually, multi-line strings do not look that bad.
It is probable that we will implement the following suggestion.
We might lump together all initial newlines into the empty string,
and also all newlines introducing empty lines (that is, for a run of
@w{@var{n} > 1} newlines, the last @var{n}-1 newlines would go together
on a separate string), so making the previous example appear:
3649
3650 @example
3651 msgstr "\n\n"
3652 "Hello,\n"
3653 "world!\n"
3654 "\n\n"
3655 @end example
3656
A few small points about string normalization are still undecided;
they will be documented in this manual once these questions settle.
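
The two layouts discussed in this section can be sketched in Python.  This
is not PO mode's own code: the function names are hypothetical,
@code{quote} handles only the backslash, double quote, newline and tab
escapes, and @code{layout_proposed} implements the suggestion above, which
PO mode does not necessarily follow.

@example
import re

# Illustrative sketch; all names below are hypothetical.
# The rule quoted above: a string goes multi-line if and only if a run of
# newlines has a non-newline character on each side.
EMBEDDED_NEWLINE = re.compile(r"[^\n]\n+[^\n]")


def quote(s):
    """Quote S as a PO string literal (only a few escapes handled)."""
    s = s.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + s.replace("\n", "\\n").replace("\t", "\\t") + '"'


def layout_current(keyword, s):
    """Current rule: one quoted piece per newline-terminated line."""
    if not EMBEDDED_NEWLINE.search(s):
        return [keyword + " " + quote(s)]
    parts = s.split("\n")
    pieces = [p + "\n" for p in parts[:-1]]
    if parts[-1]:
        pieces.append(parts[-1])
    return [keyword + ' ""'] + [quote(p) for p in pieces]


def layout_proposed(keyword, s):
    """Proposed rule: leading newlines go in the first string, and a run
    of n > 1 newlines keeps one newline and puts the last n-1 apart."""
    if not EMBEDDED_NEWLINE.search(s):
        return [keyword + " " + quote(s)]
    body = s.lstrip("\n")
    lines = [keyword + " " + quote(s[:len(s) - len(body)])]
    for text, run in re.findall(r"([^\n]+)(\n*)", body):
        lines.append(quote(text + run[:1]))
        if len(run) > 1:
            lines.append(quote(run[1:]))
    return lines


print("\n".join(layout_proposed("msgstr", "\n\nHello,\nworld!\n\n\n")))
@end example

Printing the proposed layout of the multi-line example reproduces the four
lines shown above, starting with @code{msgstr "\n\n"}.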
3659
3660 @node Translated Entries, Fuzzy Entries, Normalizing, PO Mode
3661 @subsection Translated Entries
3662 @cindex translated entries
3663
3664 Each PO file entry for which the @code{msgstr} field has been filled with
3665 a translation, and which is not marked as fuzzy (@pxref{Fuzzy Entries}),
3666 is said to be a @dfn{translated} entry.  Only translated entries will
3667 later be compiled by GNU @code{msgfmt} and become usable in programs.
3668 Other entry types will be excluded; translation will not occur for them.
3669
3670 @emindex moving by translated entries
3671 Some commands are more specifically related to translated entry processing.
3672
3673 @table @kbd
3674 @item t
3675 @efindex t@r{, PO Mode command}
3676 Find the next translated entry (@code{po-next-translated-entry}).
3677
3678 @item T
3679 @efindex T@r{, PO Mode command}
3680 Find the previous translated entry (@code{po-previous-translated-entry}).
3681
3682 @end table
3683
3684 @efindex t@r{, PO Mode command}
3685 @efindex po-next-translated-entry@r{, PO Mode command}
3686 @efindex T@r{, PO Mode command}
3687 @efindex po-previous-translated-entry@r{, PO Mode command}
3688 The commands @kbd{t} (@code{po-next-translated-entry}) and @kbd{T}
(@code{po-previous-translated-entry}) move forwards or backwards, searching
for a translated entry.  If none is found, the search is extended and
wraps around in the PO file buffer.
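
The wrap-around search shared by @kbd{t}, @kbd{T} and the similar commands
in the following sections can be sketched in Python.  The helper
@code{find_next} and the representation of entries as a list of kinds are
assumptions of this sketch; in particular, it tries the current entry
itself last, which is one plausible reading of ``wraps around''.

@example
# Illustrative sketch; find_next is a hypothetical helper, not PO mode code.
def find_next(entries, current, wanted, forward=True):
    """Return the index of the nearest entry after index CURRENT (before
    it, if FORWARD is false) for which WANTED is true, wrapping around
    the ends of the list.  The current entry itself is tried last.
    Return None if no entry qualifies."""
    step = 1 if forward else -1
    for offset in range(1, len(entries) + 1):
        i = (current + step * offset) % len(entries)
        if wanted(entries[i]):
            return i
    return None


def is_translated(kind):
    return kind == "translated"


kinds = ["translated", "fuzzy", "untranslated", "translated"]
find_next(kinds, 3, is_translated)         # 0: wraps past the end
find_next(kinds, 0, is_translated, False)  # 3: wraps past the beginning
@end example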
3692
3693 @evindex po-auto-fuzzy-on-edit@r{, PO Mode variable}
Translated entries usually result from the translator having edited in
a translation for them (@pxref{Modifying Translations}).  However, if the
variable @code{po-auto-fuzzy-on-edit} is not @code{nil}, an entry that
receives a new translation first becomes a fuzzy entry, which ought to
be unfuzzied later before becoming an official, genuine translated entry.
3699 @xref{Fuzzy Entries}.
3700
3701 @node Fuzzy Entries, Untranslated Entries, Translated Entries, PO Mode
3702 @subsection Fuzzy Entries
3703 @cindex fuzzy entries
3704
3705 @cindex attributes of a PO file entry
3706 @cindex attribute, fuzzy
3707 Each PO file entry may have a set of @dfn{attributes}, which are
named qualities explicitly associated with the translation by means of
a special system comment.  One of these attributes
3710 has the name @code{fuzzy}, and entries having this attribute are said
3711 to have a fuzzy translation.  They are called fuzzy entries, for short.
3712
Although fuzzy entries count as translated entries for most other
purposes, they usually call for revision by the translator.
They may be produced by applying the program @code{msgmerge} to
update an older translated PO file according to a new PO template
file, when this tool hypothesises that some new @code{msgid} has
been modified only slightly from an older one, and chooses to pair
the new, modified entry with what it thinks is the old translation.
3720 The slight alteration in the original string (the @code{msgid} string)
3721 should often be reflected in the translated string, and this requires
3722 the intervention of the translator.  For this reason, @code{msgmerge}
3723 might mark some entries as being fuzzy.
3724
3725 @emindex moving by fuzzy entries
3726 Also, the translator may decide herself to mark an entry as fuzzy
3727 for her own convenience, when she wants to remember that the entry
has to be revisited later.  So, some commands are more specifically
3729 related to fuzzy entry processing.
3730
3731 @table @kbd
3732 @item f
3733 @efindex f@r{, PO Mode command}
3734 @c better append "-entry" all the time. -ke-
3735 Find the next fuzzy entry (@code{po-next-fuzzy-entry}).
3736
3737 @item F
3738 @efindex F@r{, PO Mode command}
3739 Find the previous fuzzy entry (@code{po-previous-fuzzy-entry}).
3740
3741 @item @key{TAB}
3742 @efindex TAB@r{, PO Mode command}
3743 Remove the fuzzy attribute of the current entry (@code{po-unfuzzy}).
3744
3745 @end table
3746
3747 @efindex f@r{, PO Mode command}
3748 @efindex po-next-fuzzy-entry@r{, PO Mode command}
3749 @efindex F@r{, PO Mode command}
3750 @efindex po-previous-fuzzy-entry@r{, PO Mode command}
3751 The commands @kbd{f} (@code{po-next-fuzzy-entry}) and @kbd{F}
(@code{po-previous-fuzzy-entry}) move forwards or backwards, searching for
3753 a fuzzy entry.  If none is found, the search is extended and wraps
3754 around in the PO file buffer.
3755
3756 @efindex TAB@r{, PO Mode command}
3757 @efindex po-unfuzzy@r{, PO Mode command}
3758 @evindex po-auto-select-on-unfuzzy@r{, PO Mode variable}
3759 The command @kbd{@key{TAB}} (@code{po-unfuzzy}) removes the fuzzy
3760 attribute associated with an entry, usually leaving it translated.
Further, if the variable @code{po-auto-select-on-unfuzzy} is not
@code{nil}, the @kbd{@key{TAB}} command will automatically search
for another interesting entry to work on.  The initial value of
3764 @code{po-auto-select-on-unfuzzy} is @code{nil}.
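
In the PO file itself, the fuzzy attribute is the flag @code{fuzzy} on the
entry's @samp{#,} comment line, which may carry other flags such as
@code{c-format}.  The following Python sketch shows only the textual effect
of removing it; the function name @code{unfuzzy} is hypothetical, and this
is not the Emacs Lisp code of @code{po-unfuzzy}.

@example
# Illustrative sketch; unfuzzy is a hypothetical helper, not PO mode code.
def unfuzzy(comment_lines):
    """Drop the fuzzy flag from an entry's "#," comment lines.

    Other flags are kept; a "#," line left without flags is removed.
    """
    result = []
    for line in comment_lines:
        if not line.startswith("#,"):
            result.append(line)
            continue
        flags = [f.strip() for f in line[2:].split(",")]
        flags = [f for f in flags if f and f != "fuzzy"]
        if flags:
            result.append("#, " + ", ".join(flags))
    return result


unfuzzy(["# Translator note", "#, fuzzy, c-format"])
# returns ["# Translator note", "#, c-format"]
@end example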
3765
3766 The initial value of @code{po-auto-fuzzy-on-edit} is @code{nil}.  However,
3767 if the variable @code{po-auto-fuzzy-on-edit} is set to @code{t}, any entry
3768 edited through the @kbd{@key{RET}} command is marked fuzzy, as a way to
ensure some kind of double check later.  In this case, the usual paradigm
is that an entry becomes fuzzy (if not already) whenever the translator
modifies it.  If she is satisfied with the translation, she then uses
@kbd{@key{TAB}} to pick another entry to work on, clearing the fuzzy attribute
at the same time.  If she is not satisfied yet, she merely uses @kbd{@key{SPC}}
to move on to another entry, leaving the entry fuzzy.
3775
3776 @efindex DEL@r{, PO Mode command}
3777 @efindex po-fade-out-entry@r{, PO Mode command}
3778 The translator may also use the @kbd{@key{DEL}} command
3779 (@code{po-fade-out-entry}) over any translated entry to mark it as being
fuzzy, when she wants to leave a simple reminder that she intends to
return to this entry later.
3782
3783 Also, when time comes to quit working on a PO file buffer with the @kbd{q}
command, the translator is asked for confirmation if any fuzzy entries
remain.
3786
3787 @node Untranslated Entries, Obsolete Entries, Fuzzy Entries, PO Mode
3788 @subsection Untranslated Entries
3789 @cindex untranslated entries
3790
3791 When @code{xgettext} originally creates a PO file, unless told
3792 otherwise, it initializes the @code{msgid} field with the untranslated
string, and leaves the @code{msgstr} string empty.  Such entries,
having an empty translation, are said to be @dfn{untranslated} entries.
Later, when the programmer slightly modifies some string right in
the program, this change is reflected in the PO file
by the appearance of a new untranslated entry for the modified string.
3798
3799 The usual commands moving from entry to entry consider untranslated
entries on the same level as other entries.  Untranslated entries
are easily recognizable by the fact that they end with @w{@samp{msgstr ""}}.
3802
3803 @emindex moving by untranslated entries
3804 The work of the translator might be (quite naively) seen as the process
of seeking an untranslated entry, editing a translation for
3806 it, and repeating these actions until no untranslated entries remain.
3807 Some commands are more specifically related to untranslated entry
3808 processing.
3809
3810 @table @kbd
3811 @item u
3812 @efindex u@r{, PO Mode command}
3813 Find the next untranslated entry (@code{po-next-untranslated-entry}).
3814
3815 @item U
3816 @efindex U@r{, PO Mode command}
Find the previous untranslated entry (@code{po-previous-untranslated-entry}).
3818
3819 @item k
3820 @efindex k@r{, PO Mode command}
3821 Turn the current entry into an untranslated one (@code{po-kill-msgstr}).
3822
3823 @end table
3824
3825 @efindex u@r{, PO Mode command}
3826 @efindex po-next-untranslated-entry@r{, PO Mode command}
3827 @efindex U@r{, PO Mode command}
@efindex po-previous-untranslated-entry@r{, PO Mode command}
3829 The commands @kbd{u} (@code{po-next-untranslated-entry}) and @kbd{U}
(@code{po-previous-untranslated-entry}) move forwards or backwards,
searching for an untranslated entry.  If none is found, the search is
3832 extended and wraps around in the PO file buffer.
3833
3834 @efindex k@r{, PO Mode command}
3835 @efindex po-kill-msgstr@r{, PO Mode command}
3836 An entry can be turned back into an untranslated entry by
3837 merely emptying its translation, using the command @kbd{k}
3838 (@code{po-kill-msgstr}).  @xref{Modifying Translations}.
3839
3840 Also, when time comes to quit working on a PO file buffer
with the @kbd{q} command, the translator is asked for confirmation
if any untranslated entries remain.
3843
3844 @node Obsolete Entries, Modifying Translations, Untranslated Entries, PO Mode
3845 @subsection Obsolete Entries
3846 @cindex obsolete entries
3847
3848 By @dfn{obsolete} PO file entries, we mean those entries which are
commented out, usually by @code{msgmerge} when it finds that the
translation is no longer needed by the package being localized.
3851
3852 The usual commands moving from entry to entry consider obsolete
3853 entries on the same level as active entries.  Obsolete entries are
3854 easily recognizable by the fact that all their lines start with
3855 @code{#}, even those lines containing @code{msgid} or @code{msgstr}.
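
The four kinds of entries described so far can be told apart from an
entry's text alone.  The following Python sketch is a simplification, not
PO mode's code: the function @code{classify} is hypothetical, it receives
one entry as a non-empty list of lines, it recognizes obsolete entries only
by every line starting with @code{#}, and it ignores plural forms such as
@code{msgstr[0]}.

@example
# Illustrative sketch; classify is a hypothetical helper, not PO mode code.
def classify(lines):
    """Return "obsolete", "fuzzy", "untranslated" or "translated" for one
    PO entry given as a non-empty list of lines (no plural forms)."""
    if all(line.startswith("#") for line in lines):
        return "obsolete"
    for line in lines:
        if line.startswith("#,"):
            if "fuzzy" in [f.strip() for f in line[2:].split(",")]:
                return "fuzzy"
    msgstr, in_msgstr = "", False
    for line in lines:
        if line.startswith("msgstr "):
            in_msgstr = True
            line = line[len("msgstr "):]
        elif not line.startswith('"'):
            in_msgstr = False
        if in_msgstr:
            msgstr += line.strip()[1:-1]  # contents between the quotes
    return "translated" if msgstr else "untranslated"


classify(["#, fuzzy", 'msgid "Hello"', 'msgstr "Bonjour"'])  # "fuzzy"
@end example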
3856
3857 Commands exist for emptying the translation or reinitializing it
3858 to the original untranslated string.  Commands interfacing with the
3859 kill ring may force some previously saved text into the translation.
3860 The user may interactively edit the translation.  All these commands
3861 may apply to obsolete entries, carefully leaving the entry obsolete
3862 after the fact.
3863
3864 @emindex moving by obsolete entries
3865 Moreover, some commands are more specifically related to obsolete
3866 entry processing.
3867
3868 @table @kbd
3869 @item o
3870 @efindex o@r{, PO Mode command}
3871 Find the next obsolete entry (@code{po-next-obsolete-entry}).
3872
3873 @item O
3874 @efindex O@r{, PO Mode command}
3875 Find the previous obsolete entry (@code{po-previous-obsolete-entry}).
3876
3877 @item @key{DEL}
3878 @efindex DEL@r{, PO Mode command}
3879 Make an active entry obsolete, or zap out an obsolete entry
3880 (@code{po-fade-out-entry}).
3881
3882 @end table
3883
3884 @efindex o@r{, PO Mode command}
3885 @efindex po-next-obsolete-entry@r{, PO Mode command}
3886 @efindex O@r{, PO Mode command}
3887 @efindex po-previous-obsolete-entry@r{, PO Mode command}
3888 The commands @kbd{o} (@code{po-next-obsolete-entry}) and @kbd{O}
3889 (@code{po-previous-obsolete-entry}) move forwards or backwards,
searching for an obsolete entry.  If none is found, the search is
3891 extended and wraps around in the PO file buffer.
3892
PO mode does not provide any way to un-comment an obsolete entry
and make it active, because this would reintroduce an original
untranslated string which does not correspond to any marked string
in the program sources.  This is in keeping with the philosophy of never
introducing useless @code{msgid} values.
3898
3899 @efindex DEL@r{, PO Mode command}
3900 @efindex po-fade-out-entry@r{, PO Mode command}
3901 @emindex obsolete active entry
3902 @emindex comment out PO file entry
3903 However, it is possible to comment out an active entry, so making
3904 it obsolete.  GNU @code{gettext} utilities will later react to the
3905 disappearance of a translation by using the untranslated string.
3906 The command @kbd{@key{DEL}} (@code{po-fade-out-entry}) pushes the current entry
3907 a little further towards annihilation.  If the entry is active (it is a
3908 translated entry), then it is first made fuzzy.  If it is already fuzzy,
3909 then the entry is merely commented out, with confirmation.  If the entry
3910 is already obsolete, then it is completely deleted from the PO file.
3911 It is easy to recycle the translation so deleted into some other PO file
3912 entry, usually one which is untranslated.  @xref{Modifying Translations}.
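
The progression performed by successive @kbd{@key{DEL}} commands is a small
state machine.  In this Python sketch the function @code{fade_out} is
hypothetical, entry kinds are plain strings, and the confirmation is passed
in as a function; it illustrates the rules just described and is not PO
mode's implementation.

@example
# Illustrative sketch; fade_out is a hypothetical helper, not PO mode code.
def fade_out(kind, confirm):
    """Return the kind of an entry after one DEL, or None once the entry
    has been deleted from the file.  CONFIRM is a function asked before
    a fuzzy entry is commented out."""
    if kind == "translated":
        return "fuzzy"
    if kind == "fuzzy":
        return "obsolete" if confirm() else "fuzzy"
    if kind == "obsolete":
        return None
    # The text does not say what DEL does to an untranslated entry;
    # this sketch leaves it unchanged (an assumption).
    return kind


kind = "translated"
while kind is not None:  # press DEL repeatedly, always confirming
    print(kind)          # prints translated, then fuzzy, then obsolete
    kind = fade_out(kind, lambda: True)
@end example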
3913
3914 Here is a quite interesting problem to solve for later development of
3915 PO mode, for those nights you are not sleepy.  The idea would be that
3916 PO mode might become bright enough, one of these days, to make good
3917 guesses at retrieving the most probable candidate, among all obsolete
3918 entries, for initializing the translation of a newly appeared string.
3919 I think it might be a quite hard problem to do this algorithmically, as
3920 we have to develop good and efficient measures of string similarity.
Right now, PO mode leaves the decision entirely to the translator
when the time comes to find the appropriate obsolete translation; it
merely tries to provide handy tools to help her do so.
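
As a starting point for such experiments, and only as an illustration,
Python's standard @code{difflib} module offers a ready-made similarity
ratio.  The sketch below ranks obsolete entries by the
@code{difflib.SequenceMatcher} ratio between their @code{msgid} and the new
one; the function name, the pair representation and the 0.6 cutoff are
arbitrary choices of this example, and this measure is not claimed to be
the good and efficient one asked for above.

@example
import difflib


# Illustrative sketch; best_obsolete_candidate is a hypothetical helper.
def best_obsolete_candidate(new_msgid, obsolete, cutoff=0.6):
    """Return the (msgid, msgstr) pair from OBSOLETE whose msgid is most
    similar to NEW_MSGID, or None if no similarity reaches CUTOFF."""
    best, best_ratio = None, cutoff
    for old_msgid, old_msgstr in obsolete:
        ratio = difflib.SequenceMatcher(None, new_msgid, old_msgid).ratio()
        if ratio >= best_ratio:
            best, best_ratio = (old_msgid, old_msgstr), ratio
    return best


obsolete = [("Cannot open file", "Impossible d'ouvrir le fichier"),
            ("Save as...", "Enregistrer sous...")]
best_obsolete_candidate("Cannot open file %s", obsolete)
@end example

The result is only a suggestion: as stated above, the translator still
decides whether the old translation fits.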
3924
3925 @node Modifying Translations, Modifying Comments, Obsolete Entries, PO Mode
3926 @subsection Modifying Translations
3927 @cindex editing translations
3928 @emindex editing translations
3929
PO mode prevents direct modification of the PO file through the usual
means Emacs gives for altering a buffer's contents.  By doing so,
it aims to help the translator avoid small clerical errors
in the overall file format, or in the proper quoting of strings,
as such errors are easy to make.  Other kinds of errors are
still possible, but some may be caught and diagnosed by the batch
validation process, which the translator may always trigger with the
@kbd{V} command.  For all other errors, the translator has to rely on
her own judgment, and also on the linguistic reports submitted to her
by users of the translated package who share her mother tongue.
3940
When the time comes to create a translation, or to correct an error
diagnosed mechanically or reported by a user, the translator uses
the following commands for modifying translations.
3944
3945 @table @kbd
3946 @item @key{RET}
3947 @efindex RET@r{, PO Mode command}
3948 Interactively edit the translation (@code{po-edit-msgstr}).
3949
3950 @item @key{LFD}
3951 @itemx C-j
3952 @efindex LFD@r{, PO Mode command}
3953 @efindex C-j@r{, PO Mode command}
3954 Reinitialize the translation with the original, untranslated string
3955 (@code{po-msgid-to-msgstr}).
3956
3957 @item k
3958 @efindex k@r{, PO Mode command}
3959 Save the translation on the kill ring, and delete it (@code{po-kill-msgstr}).
3960
3961 @item w
3962 @efindex w@r{, PO Mode command}
3963 Save the translation on the kill ring, without deleting it
3964 (@code{po-kill-ring-save-msgstr}).
3965
3966 @item y
3967 @efindex y@r{, PO Mode command}
Replace the translation, taking the new one from the kill ring
3969 (@code{po-yank-msgstr}).
3970
3971 @end table
3972
3973 @efindex RET@r{, PO Mode command}
3974 @efindex po-edit-msgstr@r{, PO Mode command}
3975 The command @kbd{@key{RET}} (@code{po-edit-msgstr}) opens a new Emacs
window for editing a new translation, or modifying an already existing
translation.  The new window contains a copy of the translation taken from
the current PO file entry, ready for editing, stripped of all quoting
3979 marks, fully modifiable and with the complete extent of Emacs modifying
3980 commands.  When the translator is done with her modifications, she may use
3981 @w{@kbd{C-c C-c}} to close the subedit window with the automatically requoted
3982 results, or @w{@kbd{C-c C-k}} to abort her modifications.  @xref{Subedit},
3983 for more information.
3984
3985 @efindex LFD@r{, PO Mode command}
3986 @efindex C-j@r{, PO Mode command}
3987 @efindex po-msgid-to-msgstr@r{, PO Mode command}
The command @kbd{@key{LFD}} (@code{po-msgid-to-msgstr}) initializes or
reinitializes the translation with the original string.  This command is
3990 normally used when the translator wants to redo a fresh translation of
3991 the original string, disregarding any previous work.
3992
3993 @evindex po-auto-edit-with-msgid@r{, PO Mode variable}
It is possible to have the @kbd{@key{LFD}} command executed automatically
whenever an untranslated entry is edited.  If you set
@code{po-auto-edit-with-msgid} to @code{t}, the translation gets
initialized with the original string, in case none exists already.
3998 The default value for @code{po-auto-edit-with-msgid} is @code{nil}.
3999
4000 @emindex starting a string translation
4001 In fact, whether it is best to start a translation with an empty
4002 string, or rather with a copy of the original string, is a matter of
4003 taste or habit.  Sometimes, the source language and the
target language are so different that it is simply best to start writing
on an empty page.  At other times, the source and target languages
are so close that it would be a waste to retype a number of words
already written in the original string.  A translator may also
4008 like having the original string right under her eyes, as she will
4009 progressively overwrite the original text with the translation, even
4010 if this requires some extra editing work to get rid of the original.
4011
4012 @emindex cut and paste for translated strings
4013 @efindex k@r{, PO Mode command}
4014 @efindex po-kill-msgstr@r{, PO Mode command}
4015 @efindex w@r{, PO Mode command}
4016 @efindex po-kill-ring-save-msgstr@r{, PO Mode command}
4017 The command @kbd{k} (@code{po-kill-msgstr}) merely empties the
4018 translation string, so turning the entry into an untranslated
one.  But while doing so, its previous contents are set aside in
a special place, known as the kill ring.  The command @kbd{w}
(@code{po-kill-ring-save-msgstr}) also has the effect of taking a
copy of the translation onto the kill ring, but it otherwise leaves
the entry alone, and does @emph{not} remove the translation from the
entry.  Both commands use exactly the Emacs kill ring, which is shared
between buffers, and which is already well known to Emacs users.
4026
4027 The translator may use @kbd{k} or @kbd{w} many times in the course
4028 of her work, as the kill ring may hold several saved translations.
4029 From the kill ring, strings may later be reinserted in various
4030 Emacs buffers.  In particular, the kill ring may be used for moving
4031 translation strings between different entries of a single PO file
4032 buffer, or if the translator is handling many such buffers at once,
4033 even between PO files.
4034
4035 To facilitate exchanges with buffers which are not in PO mode, the
4036 translation string put on the kill ring by the @kbd{k} command is fully
4037 unquoted before being saved: external quotes are removed, multi-line
strings are concatenated, and backslash escape sequences are turned
into their corresponding characters.  In the special case of obsolete
4040 entries, the translation is also uncommented prior to saving.
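
The unquoting just described can be sketched in Python as follows.
This is only an illustration under stated assumptions, not PO mode's
own Emacs Lisp code: @code{unquote_po_string} and @code{ESCAPES} are
hypothetical names, the input is taken to be the quoted lines of a
@code{msgstr} without the @code{msgstr} keyword itself, and octal or
hexadecimal escapes are deliberately left out.

@example
# Escape sequences handled by this sketch, mapped to the characters
# they denote.  Octal and hexadecimal escapes are not handled.
ESCAPES = dict([("n", "\n"), ("t", "\t"), ("r", "\r"), ("a", "\a"),
                ("b", "\b"), ("f", "\f"), ("v", "\v"),
                ('"', '"'), ("\\", "\\")])


def unquote_po_string(lines):
    """Turn the quoted lines of a PO string into the plain string.

    Each line looks like "text", optionally prefixed by '#~' when the
    entry is obsolete.
    """
    pieces = []
    for line in lines:
        line = line.strip()
        if line.startswith("#~"):
            line = line[2:].strip()          # uncomment an obsolete entry
        if len(line) < 2 or line[0] != '"' or line[-1] != '"':
            raise ValueError("not a quoted PO string: " + repr(line))
        pieces.append(line[1:-1])            # drop the external quotes
    quoted = "".join(pieces)                 # join multi-line strings
    result = []
    i = 0
    while i < len(quoted):
        if quoted[i] == "\\" and i + 1 < len(quoted):
            # Replace a known escape; keep unknown ones verbatim.
            result.append(ESCAPES.get(quoted[i + 1], quoted[i:i + 2]))
            i += 2
        else:
            result.append(quoted[i])
            i += 1
    return "".join(result)
@end example

@noindent
With this sketch, the two lines @code{""} and
@code{"Comunicar \"bugs\" a <%s>.\n"} unquote to
@samp{Comunicar "bugs" a <%s>.} followed by a newline character.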
4041
4042 @efindex y@r{, PO Mode command}
4043 @efindex po-yank-msgstr@r{, PO Mode command}
4044 The command @kbd{y} (@code{po-yank-msgstr}) completely replaces the
4045 translation of the current entry by a string taken from the kill ring.
4046 Following Emacs terminology, we then say that the replacement
4047 string is @dfn{yanked} into the PO file buffer.
4048 @xref{Yanking, , , emacs, The Emacs Editor}.
4049 The first time @kbd{y} is used, the translation receives the value of
4050 the most recent addition to the kill ring.  If @kbd{y} is typed once
4051 again, immediately, without intervening keystrokes, the translation
4052 just inserted is taken away and replaced by the second most recent
4053 addition to the kill ring.  By repeating @kbd{y} many times in a row,
4054 the translator may travel along the kill ring for saved strings,
4055 until she finds the string she really wanted.
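
The behaviour of repeated @kbd{y} can be modelled with a small ring
structure.  The Python class below is a hypothetical toy
(@code{KillRing}, @code{kill}, @code{yank} and @code{yank_again} are
names invented for this sketch).  It only mirrors the behaviour
described above and makes no attempt to reproduce every detail of the
real Emacs kill ring.

@example
class KillRing:
    """Toy kill ring: newest string first, oldest dropped past capacity.

    It only models the behaviour described for PO mode's y command;
    it does not reproduce every detail of the Emacs kill ring.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []
        self.offset = 0              # which entry the last yank inserted

    def kill(self, text):
        """Save text at the front, as k, w or DEL would do."""
        self.entries.insert(0, text)
        del self.entries[self.capacity:]
        self.offset = 0

    def yank(self):
        """First y: return the most recent addition."""
        if not self.entries:
            raise LookupError("kill ring is empty")
        self.offset = 0
        return self.entries[0]

    def yank_again(self):
        """y repeated at once: return the next older string, wrapping."""
        if not self.entries:
            raise LookupError("kill ring is empty")
        self.offset = (self.offset + 1) % len(self.entries)
        return self.entries[self.offset]
@end example

@noindent
For example, after killing @samp{a}, @samp{b}, @samp{c} and @samp{d}
into a ring of capacity 3, @code{yank} returns @samp{d}, and successive
calls to @code{yank_again} return @samp{c}, @samp{b}, then @samp{d}
again.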
4056
When a string is yanked into a PO file entry, it is fully and
automatically requoted to comply with the format PO files should
have.  Further, if the entry is obsolete, PO mode appropriately
pushes the inserted string inside comments.  Once again, translators
need not burden themselves with quoting considerations, apart, of
course, from whatever the program using the translated string
requires of the string itself.
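
Requoting is the inverse operation.  The Python sketch below, built on
the hypothetical helpers @code{escape_po} and @code{requote_po_string},
shows one plausible way of turning a plain string back into quoted PO
lines: the string is broken after each newline, a leading @code{""}
line is added when several lines result, and every line is prefixed
with @samp{#~ } for an obsolete entry.  It is an illustration under
these assumptions rather than PO mode's actual code; in particular, it
does not wrap long lines the way the @code{gettext} tools do.

@example
def escape_po(chunk):
    """Backslash-escape characters that cannot appear raw in a PO string."""
    return (chunk.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\t", "\\t").replace("\n", "\\n"))


def requote_po_string(text, obsolete=False):
    """Return the list of quoted PO lines representing text."""
    chunks = []
    start = 0
    for i, ch in enumerate(text):
        if ch == "\n":                       # break after each newline
            chunks.append(text[start:i + 1])
            start = i + 1
    if start < len(text) or not chunks:
        chunks.append(text[start:])          # trailing part, or ""
    lines = ['"' + escape_po(chunk) + '"' for chunk in chunks]
    if len(lines) > 1:
        lines.insert(0, '""')                # multi-line strings start empty
    if obsolete:
        lines = ["#~ " + line for line in lines]
    return lines
@end example

@noindent
For instance, @code{requote_po_string("a\nb")} yields the three lines
@code{""}, @code{"a\n"} and @code{"b"}.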
4064
Note that @kbd{k} and @kbd{w} are not the only commands pushing strings
4066 on the kill ring, as almost any PO mode command replacing translation
4067 strings (or the translator comments) automatically saves the old string
4068 on the kill ring.  The main exceptions to this general rule are the
4069 yanking commands themselves.
4070
4071 @emindex using obsolete translations to make new entries
4072 To better illustrate the operation of killing and yanking, let's
4073 use an actual example, taken from a common situation.  When the
4074 programmer slightly modifies some string right in the program, his
4075 change is later reflected in the PO file by the appearance
4076 of a new untranslated entry for the modified string, and the fact
4077 that the entry translating the original or unmodified string becomes
4078 obsolete.  In many cases, the translator might spare herself some work
4079 by retrieving the unmodified translation from the obsolete entry,
then initializing the untranslated entry's @code{msgstr} field with
this retrieved translation.  Once this is done, the obsolete entry is
no longer wanted, and may be safely deleted.
4083
4084 When the translator finds an untranslated entry and suspects that a
4085 slight variant of the translation exists, she immediately uses @kbd{m}
4086 to mark the current entry location, then starts chasing obsolete
4087 entries with @kbd{o}, hoping to find some translation corresponding
4088 to the unmodified string.  Once found, she uses the @kbd{@key{DEL}} command
4089 for deleting the obsolete entry, knowing that @kbd{@key{DEL}} also @emph{kills}
4090 the translation, that is, pushes the translation on the kill ring.
4091 Then, @kbd{r} returns to the initial untranslated entry, and @kbd{y}
4092 then @emph{yanks} the saved translation right into the @code{msgstr}
4093 field.  The translator is then free to use @kbd{@key{RET}} for fine
4094 tuning the translation contents, and maybe to later use @kbd{u},
4095 then @kbd{m} again, for going on with the next untranslated string.
4096
4097 When some sequence of keys has to be typed over and over again, the
4098 translator may find it useful to become better acquainted with the Emacs
capability of learning these sequences and playing them back on request.
4100 @xref{Keyboard Macros, , , emacs, The Emacs Editor}.
4101
4102 @node Modifying Comments, Subedit, Modifying Translations, PO Mode
4103 @subsection Modifying Comments
4104 @cindex editing comments in PO files
4105 @emindex editing comments
4106
4107 Any translation work done seriously will raise many linguistic
4108 difficulties, for which decisions have to be made, and the choices
further documented.  This documentation may be kept within the
PO file in the form of translator comments, which the translator
is free to create, delete, or modify at will.  These comments may
prove useful to the translator herself when she returns to this PO
file after a while.
4113
Comments not having whitespace after the initial @samp{#}, for example,
those beginning with @samp{#.} or @samp{#:}, are @emph{not} translator
comments; they are created exclusively by other @code{gettext} tools.
The commands below never alter such system-added comments, which are
not meant to be modified by the translator.  @xref{PO Files}.
4119
4120 The following commands are somewhat similar to those modifying translations,
4121 so the general indications given for those apply here.  @xref{Modifying
4122 Translations}.
4123
4124 @table @kbd
4125
4126 @item #
4127 @efindex #@r{, PO Mode command}
4128 Interactively edit the translator comments (@code{po-edit-comment}).
4129
4130 @item K
4131 @efindex K@r{, PO Mode command}
Save the translator comments on the kill ring, and delete them
4133 (@code{po-kill-comment}).
4134
4135 @item W
4136 @efindex W@r{, PO Mode command}
Save the translator comments on the kill ring, without deleting them
4138 (@code{po-kill-ring-save-comment}).
4139
4140 @item Y
4141 @efindex Y@r{, PO Mode command}
Replace the translator comments, taking the new comments from the kill ring
4143 (@code{po-yank-comment}).
4144
4145 @end table
4146
4147 These commands parallel PO mode commands for modifying the translation
strings, and behave much the same way as they do, except that they handle
the part of PO file comments meant for the translator's use, rather
4150 than the translation strings.  So, if the descriptions given below are
4151 slightly succinct, it is because the full details have already been given.
4152 @xref{Modifying Translations}.
4153
4154 @efindex #@r{, PO Mode command}
4155 @efindex po-edit-comment@r{, PO Mode command}
4156 The command @kbd{#} (@code{po-edit-comment}) opens a new Emacs window
4157 containing a copy of the translator comments on the current PO file entry.
4158 If there are no such comments, PO mode understands that the translator wants
4159 to add a comment to the entry, and she is presented with an empty screen.
4160 Comment marks (@code{#}) and the space following them are automatically
removed before editing, and reinstated afterwards.  For translator comments
pertaining to obsolete entries, the uncommenting and recommenting operations
are done twice.  Once in the editing window, the translator types
@w{@kbd{C-c C-c}} to signal that she has finished editing the comment.
4165 @xref{Subedit}, for further details.
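
The removal and reinstatement of comment marks can be sketched in
Python as below.  The helpers @code{is_translator_comment},
@code{comment_to_text} and @code{text_to_comment} are hypothetical, and
the sketch ignores the extra commenting level used for obsolete
entries.  It relies on the rule stated earlier that only comments with
whitespace (or nothing) after the initial @samp{#} are translator
comments; writing an empty line back as a bare @samp{#} is a choice
made for this sketch.

@example
def is_translator_comment(line):
    """True for '#' alone or '#' followed by whitespace."""
    return line == "#" or (line.startswith("#") and line[1:2].isspace())


def comment_to_text(comment_lines):
    """Strip '#' and one following space from each translator comment."""
    text_lines = []
    for line in comment_lines:
        if not is_translator_comment(line):
            raise ValueError("not a translator comment: " + repr(line))
        body = line[1:]
        if body.startswith(" "):
            body = body[1:]
        text_lines.append(body)
    return "\n".join(text_lines)


def text_to_comment(text):
    """Reinstate the comment marks; empty lines become a bare '#'."""
    return ["# " + line if line else "#" for line in text.split("\n")]
@end example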
4166
4167 @evindex po-subedit-mode-hook@r{, PO Mode variable}
4168 Functions found on @code{po-subedit-mode-hook}, if any, are executed after
4169 the string has been inserted in the edit buffer.
4170
4171 @efindex K@r{, PO Mode command}
4172 @efindex po-kill-comment@r{, PO Mode command}
4173 @efindex W@r{, PO Mode command}
4174 @efindex po-kill-ring-save-comment@r{, PO Mode command}
4175 @efindex Y@r{, PO Mode command}
4176 @efindex po-yank-comment@r{, PO Mode command}
4177 The command @kbd{K} (@code{po-kill-comment}) gets rid of all
4178 translator comments, while saving those comments on the kill ring.
4179 The command @kbd{W} (@code{po-kill-ring-save-comment}) takes
a copy of the translator comments onto the kill ring, but leaves
them undisturbed in the current entry.  The command @kbd{Y}
(@code{po-yank-comment}) completely replaces the translator comments
with a string taken from the front of the kill ring.  When this command
4184 is immediately repeated, the comments just inserted are withdrawn,
4185 and replaced by other strings taken along the kill ring.
4186
4187 On the kill ring, all strings have the same nature.  There is no
4188 distinction between @emph{translation} strings and @emph{translator
comments} strings.  So, for example, let's presume the translator
has just finished editing a translation, and wants to create a new
translator comment documenting why the previous translation was
not good, just to remember what the problem was.  While writing this
comment, the translator may want to quote the previous translation
in it.  For this purpose, she
may initialize the translator comments with the previous translation,
4196 still at the head of the kill ring.  Because editing already pushed the
previous translation onto the kill ring, she merely has to type @kbd{Y}
prior to @kbd{#}, and the previous translation will be right there,
ready to be introduced by some explanatory text.
4200
4201 On the other hand, presume there are some translator comments already
4202 and that the translator wants to add to those comments, instead
4203 of wholly replacing them.  Then, she should edit the comment right
4204 away with @kbd{#}.  Once inside the editing window, she can use the
4205 regular Emacs commands @kbd{C-y} (@code{yank}) and @kbd{M-y}
4206 (@code{yank-pop}) to get the previous translation where she likes.
4207
4208 @node Subedit, C Sources Context, Modifying Comments, PO Mode
4209 @subsection Details of Sub Edition
4210 @emindex subedit minor mode
4211
4212 The PO subedit minor mode has a few peculiarities worth being described
4213 in fuller detail.  It installs a few commands over the usual editing set
4214 of Emacs, which are described below.
4215
4216 @table @kbd
4217 @item C-c C-c
4218 @efindex C-c C-c@r{, PO Mode command}
Finish editing (@code{po-subedit-exit}).
4220
4221 @item C-c C-k
4222 @efindex C-c C-k@r{, PO Mode command}
Abort editing (@code{po-subedit-abort}).
4224
4225 @item C-c C-a
4226 @efindex C-c C-a@r{, PO Mode command}
4227 Consult auxiliary PO files (@code{po-subedit-cycle-auxiliary}).
4228
4229 @end table
4230
4231 @emindex exiting PO subedit
4232 @efindex C-c C-c@r{, PO Mode command}
4233 @efindex po-subedit-exit@r{, PO Mode command}
The window's contents represent a translation for a given message,
4235 or a translator comment.  The translator may modify this window to
4236 her heart's content.  Once this is done, the command @w{@kbd{C-c C-c}}
4237 (@code{po-subedit-exit}) may be used to return the edited translation into
4238 the PO file, replacing the original translation, even if it moved out of
4239 sight or if buffers were switched.
4240
4241 @efindex C-c C-k@r{, PO Mode command}
4242 @efindex po-subedit-abort@r{, PO Mode command}
If the translator becomes dissatisfied with her translation or comment,
to the point that she prefers to keep what existed before the
@kbd{@key{RET}} or @kbd{#} command, she may use the command @w{@kbd{C-c C-k}}
(@code{po-subedit-abort}) to simply discard the edit, while preserving
the original translation or comment.  Alternatively, she may exit
normally with @w{@kbd{C-c C-c}}, then type @code{U} once to undo the
whole effect of the last edit.
4250
4251 @efindex C-c C-a@r{, PO Mode command}
4252 @efindex po-subedit-cycle-auxiliary@r{, PO Mode command}
4253 The command @w{@kbd{C-c C-a}} (@code{po-subedit-cycle-auxiliary})
4254 allows for glancing through translations
4255 already achieved in other languages, directly while editing the current
4256 translation.  This may be quite convenient when the translator is fluent
in many languages, but of course, only makes sense when such completed
4258 auxiliary PO files are already available to her (@pxref{Auxiliary}).
4259
4260 Functions found on @code{po-subedit-mode-hook}, if any, are executed after
4261 the string has been inserted in the edit buffer.
4262
While editing her translation, the translator should take care not to
insert unwanted @kbd{@key{RET}} (newline) characters at the end of
the translated string, and not to remove such characters when they are
required.  Since these characters are not
visible in the editing buffer, they are easily introduced by mistake.
To help her, @kbd{@key{RET}} automatically puts the character @code{<}
at the end of the string being edited, but this @code{<} is not really
part of the string.  On exiting the editing window with @w{@kbd{C-c C-c}},
PO mode automatically removes such @code{<} and all whitespace added after
it.  If the translator adds characters after the terminating @code{<}, it
loses its delimiting property and becomes an ordinary part of the string.
If she removes the delimiting @code{<}, then the edited string is taken
@emph{as is}, with all trailing newlines, even if invisible.  Also, if
the translated string itself ought to end with a genuine @code{<}, then
the delimiting @code{<} must not be removed; so the string should appear,
in the editing window, as ending with two @code{<} in a row.
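
The rule for the delimiting @code{<} can be summarized by a small
Python function.  @code{finish_subedit} is a hypothetical name, and the
function models only the behaviour described above, not PO mode's
implementation.

@example
def finish_subedit(buffer_text):
    """Return the string kept when a subedit buffer is closed.

    A final '<' followed only by whitespace is the delimiter: it is
    removed together with that whitespace.  Without such a delimiter,
    the text is taken as is, trailing newlines included.
    """
    stripped = buffer_text.rstrip()
    if stripped.endswith("<"):
        return stripped[:-1]
    return buffer_text
@end example

@noindent
Under this model, @samp{Hello<x} is kept unchanged, a buffer ending in
a newline followed by @code{<} keeps its trailing newline, and a buffer
ending in @code{<<} yields a string ending in a single @code{<}.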
4279
4280 @emindex editing multiple entries
4281 When a translation (or a comment) is being edited, the translator may move
4282 the cursor back into the PO file buffer and freely move to other entries,
browsing at will.  If, with an edit pending, the translator wanders in the
4284 PO file buffer, she may decide to start modifying another entry.  Each entry
4285 being edited has its own subedit buffer.  It is possible to simultaneously
4286 edit the translation @emph{and} the comment of a single entry, or to
4287 edit entries in different PO files, all at once.  Typing @kbd{@key{RET}}
4288 on a field already being edited merely resumes that particular edit.  Yet,
the translator had better be comfortable handling many Emacs windows!
4290
4291 @emindex pending subedits
4292 Pending subedits may be completed or aborted in any order, regardless
4293 of how or when they were started.  When many subedits are pending and the
translator asks to quit the PO file (with the @kbd{q} command), subedits
4295 are automatically resumed one at a time, so she may decide for each of them.
4296
4297 @node C Sources Context, Auxiliary, Subedit, PO Mode
4298 @subsection C Sources Context
4299 @emindex consulting program sources
4300 @emindex looking at the source to aid translation
4301 @emindex use the source, Luke
4302
4303 PO mode is particularly powerful when used with PO files
4304 created through GNU @code{gettext} utilities, as those utilities
4305 insert special comments in the PO files they generate.
4306 Some of these special comments relate the PO file entry to
4307 exactly where the untranslated string appears in the program sources.
4308
4309 When the translator gets to an untranslated entry, she is fairly
4310 often faced with an original string which is not as informative as
4311 it normally should be, being succinct, cryptic, or otherwise ambiguous.
4312 Before choosing how to translate the string, she needs to understand
4313 better what the string really means and how tight the translation has
4314 to be.  Most of the time, when problems arise, the only way left to make
4315 her judgment is looking at the true program sources from where this
4316 string originated, searching for surrounding comments the programmer
4317 might have put in there, and looking around for helping clues of
4318 @emph{any} kind.
4319
4320 Surely, when looking at program sources, the translator will receive
4321 more help if she is a fluent programmer.  However, even if she is
4322 not versed in programming and feels a little lost in C code, the
4323 translator should not be shy at taking a look, once in a while.
4324 It is most probable that she will still be able to find some of the
4325 hints she needs.  She will learn quickly to not feel uncomfortable
4326 in program code, paying more attention to programmer's comments,
4327 variable and function names (if he dared choosing them well), and
4328 overall organization, than to the program code itself.
4329
4330 @emindex find source fragment for a PO file entry
4331 The following commands are meant to help the translator at getting
4332 program source context for a PO file entry.
4333
4334 @table @kbd
4335 @item s
4336 @efindex s@r{, PO Mode command}
4337 Resume the display of a program source context, or cycle through them
4338 (@code{po-cycle-source-reference}).
4339
4340 @item M-s
4341 @efindex M-s@r{, PO Mode command}
4342 Display of a program source context selected by menu
4343 (@code{po-select-source-reference}).
4344
4345 @item S
4346 @efindex S@r{, PO Mode command}
4347 Add a directory to the search path for source files
4348 (@code{po-consider-source-path}).
4349
4350 @item M-S
4351 @efindex M-S@r{, PO Mode command}
4352 Delete a directory from the search path for source files
4353 (@code{po-ignore-source-path}).
4354
4355 @end table
4356
4357 @efindex s@r{, PO Mode command}
4358 @efindex po-cycle-source-reference@r{, PO Mode command}
4359 @efindex M-s@r{, PO Mode command}
4360 @efindex po-select-source-reference@r{, PO Mode command}
4361 The commands @kbd{s} (@code{po-cycle-source-reference}) and @kbd{M-s}
4362 (@code{po-select-source-reference}) both open another window displaying
4363 some source program file, and already positioned in such a way that
4364 it shows an actual use of the string to be translated.  By doing
4365 so, the command gives source program context for the string.  But if
4366 the entry has no source context references, or if all references
4367 are unresolved along the search path for program sources, then the
4368 command diagnoses this as an error.
4369
4370 Even if @kbd{s} (or @kbd{M-s}) opens a new window, the cursor stays
4371 in the PO file window.  If the translator really wants to
4372 get into the program source window, she ought to do it explicitly,
4373 maybe by using command @kbd{O}.
4374
4375 When @kbd{s} is typed for the first time, or for a PO file entry which
is different from the last one used for getting source context, then the
4377 command reacts by giving the first context available for this entry,
4378 if any.  If some context has already been recently displayed for the
4379 current PO file entry, and the translator wandered off to do other
4380 things, typing @kbd{s} again will merely resume, in another window,
4381 the context last displayed.  In particular, if the translator moved
4382 the cursor away from the context in the source file, the command will
4383 bring the cursor back to the context.  By using @kbd{s} many times
4384 in a row, with no other commands intervening, PO mode will cycle to
4385 the next available contexts for this particular entry, getting back
4386 to the first context once the last has been shown.
4387
4388 The command @kbd{M-s} behaves differently.  Instead of cycling through
4389 references, it lets the translator choose a particular reference among
many, and displays that reference.  It is best used with completion:
if the translator types @kbd{@key{TAB}} immediately after @kbd{M-s}, in
response to the prompt, she will be offered a menu of all possible
references, as a reminder of which answers are acceptable.
4394 This command is useful only where there are really many contexts
4395 available for a single string to translate.
4396
4397 @efindex S@r{, PO Mode command}
4398 @efindex po-consider-source-path@r{, PO Mode command}
4399 @efindex M-S@r{, PO Mode command}
4400 @efindex po-ignore-source-path@r{, PO Mode command}
4401 Program source files are usually found relative to where the PO
file stands.  As a special provision, when this fails, the file is
also looked for relative to the directory immediately above it.
These two cases take proper care of most PO files.  However, a PO
file may have been moved, or may be edited in a place other than
its normal location.  When this happens, the translator
should tell PO mode in which directory the genuine PO file normally
sits.  Many such directories may be specified, and together, they
constitute what is called the @dfn{search path} for program sources.
4410 The command @kbd{S} (@code{po-consider-source-path}) is used to interactively
4411 enter a new directory at the front of the search path, and the command
4412 @kbd{M-S} (@code{po-ignore-source-path}) is used to select, with completion,
4413 one of the directories she does not want anymore on the search path.
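
The lookup order can be sketched in Python as follows.  The function
@code{resolve_reference} and its @code{search_path} parameter are
hypothetical.  The sketch also assumes, purely for illustration, that
the directories on the search path are tried after the two default
locations, and that a reference has the @samp{file:line} form found in
@samp{#:} comments.

@example
import os


def resolve_reference(reference, po_file, search_path=()):
    """Map a '#:' reference such as 'src/hello.c:200' to (path, line).

    Directories are tried in order: the PO file's own directory, the
    directory above it, then each entry of search_path.  Returns None
    when the reference cannot be resolved.
    """
    file_name, _, line = reference.rpartition(":")
    if not file_name or not line.isdigit():
        raise ValueError("malformed reference: " + reference)
    po_dir = os.path.dirname(os.path.abspath(po_file))
    candidates = [po_dir, os.path.dirname(po_dir)] + list(search_path)
    for directory in candidates:
        path = os.path.join(directory, file_name)
        if os.path.isfile(path):
            return path, int(line)
    return None
@end example

@noindent
With a PO file in @file{po/} and sources in @file{src/}, the reference
@samp{src/hello.c:200} is found through the second default location,
the directory above the PO file.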
4414
4415 @node Auxiliary,  , C Sources Context, PO Mode
4416 @subsection Consulting Auxiliary PO Files
4417 @emindex consulting translations to other languages
4418
PO mode can help the knowledgeable translator who is fluent in
several languages to take advantage of translations already achieved
in other languages she happens to know.  It provides these other
4422 language translations as additional context for her own work.  Moreover,
4423 it has features to ease the production of translations for many languages
4424 at once, for translators preferring to work in this way.
4425
4426 @cindex auxiliary PO file
4427 @emindex auxiliary PO file
4428 An @dfn{auxiliary} PO file is an existing PO file meant for the same
package the translator is working on, but targeting a different
language.  Commands exist for declaring and handling auxiliary
4431 PO files, and also for showing contexts for the entry under work.
4432
4433 Here are the auxiliary file commands available in PO mode.
4434
4435 @table @kbd
4436 @item a
4437 @efindex a@r{, PO Mode command}
4438 Seek auxiliary files for another translation for the same entry
4439 (@code{po-cycle-auxiliary}).
4440
4441 @item C-c C-a
4442 @efindex C-c C-a@r{, PO Mode command}
4443 Switch to a particular auxiliary file (@code{po-select-auxiliary}).
4444
4445 @item A
4446 @efindex A@r{, PO Mode command}
4447 Declare this PO file as an auxiliary file (@code{po-consider-as-auxiliary}).
4448
4449 @item M-A
4450 @efindex M-A@r{, PO Mode command}
4451 Remove this PO file from the list of auxiliary files
4452 (@code{po-ignore-as-auxiliary}).
4453
4454 @end table
4455
4456 @efindex A@r{, PO Mode command}
4457 @efindex po-consider-as-auxiliary@r{, PO Mode command}
4458 @efindex M-A@r{, PO Mode command}
4459 @efindex po-ignore-as-auxiliary@r{, PO Mode command}
4460 Command @kbd{A} (@code{po-consider-as-auxiliary}) adds the current
4461 PO file to the list of auxiliary files, while command @kbd{M-A}
(@code{po-ignore-as-auxiliary}) just removes it.
4463
4464 @efindex a@r{, PO Mode command}
4465 @efindex po-cycle-auxiliary@r{, PO Mode command}
4466 The command @kbd{a} (@code{po-cycle-auxiliary}) seeks all auxiliary PO
4467 files, round-robin, searching for a translated entry in some other language
having an @code{msgid} field identical to the one for the current entry.
The PO file found, if any, takes the place of the current PO file in
the display (its window comes to the top).  Before doing so, the current PO
4471 file is also made into an auxiliary file, if not already.  So, @kbd{a}
4472 in this newly displayed PO file will seek another PO file, and so on,
4473 so repeating @kbd{a} will eventually yield back the original PO file.
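
The round-robin search performed by @kbd{a} can be illustrated with a
short Python function.  Here each PO file is represented,
hypothetically, as a pair of a file name and a Python dictionary
mapping @code{msgid}s to @code{msgstr}s, with an empty @code{msgstr}
meaning untranslated.  The current file is assumed to be part of the
list, since it is made auxiliary before switching.
@code{cycle_auxiliary} is an invented name, not a PO mode function.

@example
def cycle_auxiliary(auxiliaries, current, msgid):
    """Return the index of the next file translating msgid, or None.

    auxiliaries is a list of (file_name, translations) pairs, where
    translations maps each msgid to its msgstr ('' when untranslated).
    The search starts after position current and wraps around, so the
    current file itself is the last candidate tried.
    """
    count = len(auxiliaries)
    for step in range(1, count + 1):
        index = (current + step) % count
        _, translations = auxiliaries[index]
        if translations.get(msgid):
            return index
    return None
@end example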
4474
4475 @efindex C-c C-a@r{, PO Mode command}
4476 @efindex po-select-auxiliary@r{, PO Mode command}
4477 The command @kbd{C-c C-a} (@code{po-select-auxiliary}) asks the translator
4478 for her choice of a particular auxiliary file, with completion, and
4479 then switches to that selected PO file.  The command also checks if
the selected file has an @code{msgid} field identical to the one for
the current entry, and if so, that entry becomes current.  Otherwise,
4482 the cursor of the selected file is left undisturbed.
4483
For all this to work fully, auxiliary PO files have to be normalized,
in the sense that @code{msgid} fields should be written @emph{exactly}
the same way.  Since the same string can be written as an @code{msgid}
field in various ways, differing representations would break the
proper behaviour of the auxiliary file commands of PO mode.  This is not
expected to be much of a problem in practice, as most existing PO files have
their @code{msgid} entries written by the same GNU @code{gettext} tools.
4491
4492 @efindex normalize@r{, PO Mode command}
4493 However, PO files initially created by PO mode itself, while marking
4494 strings in source files, are normalised differently.  So are PO
files resulting from the @samp{M-x normalize} command.  Until these
4496 discrepancies between PO mode and other GNU @code{gettext} tools get
4497 fully resolved, the translator should stay aware of normalisation issues.
4498
4499 @node Compendium,  , PO Mode, Editing
4500 @section Using Translation Compendia
4501 @emindex using translation compendia
4502
4503 @cindex compendium
4504 A @dfn{compendium} is a special PO file containing a set of
4505 translations recurring in many different packages.  The translator can
4506 use gettext tools to build a new compendium, to add entries to her
4507 compendium, and to initialize untranslated entries, or to update
4508 already translated entries, from translations kept in the compendium.
4509
4510 @menu
4511 * Creating Compendia::          Merging translations for later use
4512 * Using Compendia::             Using older translations if they fit
4513 @end menu
4514
4515 @node Creating Compendia, Using Compendia, Compendium, Compendium
4516 @subsection Creating Compendia
4517 @cindex creating compendia
4518 @cindex compendium, creating
4519
Basically, every PO file consisting only of translated entries can be
declared a valid compendium.  Often the translator wants to have
4522 special compendia; let's consider two cases: @cite{concatenating PO
4523 files} and @cite{extracting a message subset from a PO file}.
4524
4525 @subsubsection Concatenate PO Files
4526
4527 @cindex concatenating PO files into a compendium
4528 @cindex accumulating translations
4529 To concatenate several valid PO files into one compendium file you can
4530 use @samp{msgcomm} or @samp{msgcat} (the latter preferred):
4531
4532 @example
4533 msgcat -o compendium.po file1.po file2.po
4534 @end example
4535
4536 By default, @code{msgcat} will accumulate divergent translations
4537 for the same string.  Those occurrences will be marked as @code{fuzzy}
and decorated so as to be highly visible; for example, calling @code{msgcat} on
4539 @file{file1.po}:
4540
4541 @example
4542 #: src/hello.c:200
4543 #, c-format
4544 msgid "Report bugs to <%s>.\n"
4545 msgstr "Comunicar `bugs' a <%s>.\n"
4546 @end example
4547
4548 @noindent
4549 and @file{file2.po}:
4550
4551 @example
4552 #: src/bye.c:100
4553 #, c-format
4554 msgid "Report bugs to <%s>.\n"
4555 msgstr "Comunicar \"bugs\" a <%s>.\n"
4556 @end example
4557
4558 @noindent
4559 will result in:
4560
4561 @example
4562 #: src/hello.c:200 src/bye.c:100
4563 #, fuzzy, c-format
4564 msgid "Report bugs to <%s>.\n"
4565 msgstr ""
4566 "#-#-#-#-#  file1.po  #-#-#-#-#\n"
4567 "Comunicar `bugs' a <%s>.\n"
4568 "#-#-#-#-#  file2.po  #-#-#-#-#\n"
4569 "Comunicar \"bugs\" a <%s>.\n"
4570 @end example
4571
4572 @noindent
4573 The translator will have to resolve this ``conflict'' manually; she
4574 has to decide whether the first or the second version is appropriate
4575 (or provide a new translation), to delete the ``marker lines'', and
4576 finally to remove the @code{fuzzy} mark.
4577
If the translator knows in advance that the first translation found for a
message is always the best one, she can make use of the
@samp{--use-first} switch:
4581
4582 @example
4583 msgcat --use-first -o compendium.po file1.po file2.po
4584 @end example
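
The way divergent translations are accumulated can be modelled in
Python as below.  @code{merge_translations} is a hypothetical function
that reproduces only the output shown above for a single
@code{msgid}; it is not how @code{msgcat} is implemented, and the
marker lines written by actual @code{msgcat} versions may contain
additional information.

@example
def merge_translations(candidates, use_first=False):
    """Combine the translations one msgid received in several files.

    candidates is a list of (file_name, msgstr) pairs in input order.
    Returns (msgstr, fuzzy): a single translation when all inputs agree
    or use_first is set, otherwise every translation preceded by a
    marker line naming its file, with fuzzy set to True.
    """
    if use_first or len(set(m for _, m in candidates)) == 1:
        return candidates[0][1], False
    pieces = []
    for file_name, msgstr in candidates:
        pieces.append("#-#-#-#-#  " + file_name + "  #-#-#-#-#\n")
        pieces.append(msgstr)
    return "".join(pieces), True
@end example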
4585
4586 A good compendium file must not contain @code{fuzzy} or untranslated
4587 entries.  If input files are ``dirty'' you must preprocess the input
4588 files or postprocess the result using @samp{msgattrib --translated --no-fuzzy}.
4589
4590 @subsubsection Extract a Message Subset from a PO File
4591 @cindex extracting parts of a PO file into a compendium
4592
4593 Nobody wants to translate the same messages again and again; thus you
4594 may wish to have a compendium file containing @file{getopt.c} messages.
4595
4596 To extract a message subset (e.g., all @file{getopt.c} messages) from an
4597 existing PO file into one compendium file you can use @samp{msggrep}:
4598
4599 @example
4600 msggrep --location src/getopt.c -o compendium.po file.po
4601 @end example
4602
4603 @node Using Compendia,  , Creating Compendia, Compendium
4604 @subsection Using Compendia
4605
4606 You can use a compendium file to initialize a translation from scratch
4607 or to update an already existing translation.
4608
4609 @subsubsection Initialize a New Translation File
4610 @cindex initialize translations from a compendium
4611
Since no PO file with translations exists yet, the translator can
simply use @file{/dev/null} as a stand-in for the ``old'' translation file.
4614
4615 @example
4616 msgmerge --compendium compendium.po -o file.po /dev/null file.pot
4617 @end example
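
Conceptually, this initialization pairs each @code{msgid} of the POT
file with the compendium's translation, when there is one.  The Python
sketch below shows only that exact-match lookup, using the
hypothetical function @code{init_from_compendium}; it does not model
the fuzzy matching that @code{msgmerge} can also perform, nor
comments, plural forms or the PO header.

@example
def init_from_compendium(template_msgids, compendium):
    """Pair each msgid of the POT file with a translation, if known.

    template_msgids lists the msgids of file.pot in order; compendium
    maps msgids to translated msgstrs.  Messages missing from the
    compendium stay untranslated (empty msgstr).
    """
    return [(msgid, compendium.get(msgid, "")) for msgid in template_msgids]
@end example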
4618
4619 @subsubsection Update an Existing Translation File
4620 @cindex update translations from a compendium
4621
Concatenate the compendium file(s) and the existing PO file, merge the
4623 result with the POT file and remove the obsolete entries (optional,
4624 here done using @samp{msgattrib}):
4625
4626 @example
4627 msgcat --use-first -o update.po compendium1.po compendium2.po file.po
4628 msgmerge update.po file.pot | msgattrib --no-obsolete > file.po
4629 @end example
4630
4631 @node Manipulating, Binaries, Editing, Top
4632 @chapter Manipulating PO Files
4633 @cindex manipulating PO files
4634
4635 Sometimes it is necessary to manipulate PO files in a way that is better
4636 performed automatically than by hand.  GNU @code{gettext} includes a
4637 complete set of tools for this purpose.
4638
4639 @cindex merging two PO files
4640 When merging two packages into a single package, the resulting POT file
4641 will be the concatenation of the two packages' POT files.  Thus the
4642 maintainer must concatenate the two existing package translations into
4643 a single translation catalog, for each language.  This is best performed
4644 using @samp{msgcat}.  It is then the translators' duty to deal with any
4645 possible conflicts that arose during the merge.
4646
4647 @cindex encoding conversion
4648 When a translator takes over the translation job from another translator,
4649 but she uses a different character encoding in her locale, she will
4650 convert the catalog to her character encoding.  This is best done through
4651 the @samp{msgconv} program.
4652
4653 When a maintainer takes a source file with tagged messages from another
4654 package, he should also take the existing translations for this source
4655 file (and not let the translators do the same job twice).  One way to do
4656 this is through @samp{msggrep}, another is to create a POT file for
4657 that source file and use @samp{msgmerge}.
4658
4659 @cindex dialect
4660 @cindex orthography
4661 When a translator wants to adjust some translation catalog for a special
4662 dialect or orthography --- for example, German as written in Switzerland
4663 versus German as written in Germany --- she needs to apply some text
4664 processing to every message in the catalog.  The tool for doing this is
4665 @samp{msgfilter}.
4666
Another use of @code{msgfilter} is to approximately reconstruct the POT file
from which a given PO file was made.  This can be done through a filter command
like @samp{msgfilter sed -e d | sed -e '/^# /d'}.  Note that the original
POT file may have had different comments and different plural message counts,
so it is better to use the original POT file if it is available.
4672
4673 @cindex checking of translations
4674 When a translator wants to check her translations, for example according
4675 to orthography rules or using a non-interactive spell checker, she can do
4676 so using the @samp{msgexec} program.
4677
4678 @cindex duplicate elimination
4679 When third party tools create PO or POT files, sometimes duplicates cannot
4680 be avoided.  But the GNU @code{gettext} tools give an error when they
4681 encounter duplicate msgids in the same file and in the same domain.
4682 To merge duplicates, the @samp{msguniq} program can be used.
4683
@samp{msgcomm} is a more general tool for keeping or throwing away
duplicates that occur in different files.
4686
4687 @samp{msgcmp} can be used to check whether a translation catalog is
4688 completely translated.
4689
4690 @cindex attributes, manipulating
4691 @samp{msgattrib} can be used to select and extract only the fuzzy
4692 or untranslated messages of a translation catalog.
4693
4694 @samp{msgen} is useful as a first step for preparing English translation
4695 catalogs.  It copies each message's msgid to its msgstr.
4696
4697 Finally, for those applications where all these various programs are not
4698 sufficient, a library @samp{libgettextpo} is provided that can be used to
4699 write other specialized programs that process PO files.
4700
4701 @menu
4702 * msgcat Invocation::           Invoking the @code{msgcat} Program
4703 * msgconv Invocation::          Invoking the @code{msgconv} Program
4704 * msggrep Invocation::          Invoking the @code{msggrep} Program
4705 * msgfilter Invocation::        Invoking the @code{msgfilter} Program
4706 * msguniq Invocation::          Invoking the @code{msguniq} Program
4707 * msgcomm Invocation::          Invoking the @code{msgcomm} Program
4708 * msgcmp Invocation::           Invoking the @code{msgcmp} Program
4709 * msgattrib Invocation::        Invoking the @code{msgattrib} Program
4710 * msgen Invocation::            Invoking the @code{msgen} Program
4711 * msgexec Invocation::          Invoking the @code{msgexec} Program
4712 * Colorizing::                  Highlighting parts of PO files
4713 * libgettextpo::                Writing your own programs that process PO files
4714 @end menu
4715
4716 @node msgcat Invocation, msgconv Invocation, Manipulating, Manipulating
4717 @section Invoking the @code{msgcat} Program
4718
4719 @include msgcat.texi
4720
4721 @node msgconv Invocation, msggrep Invocation, msgcat Invocation, Manipulating
4722 @section Invoking the @code{msgconv} Program
4723
4724 @include msgconv.texi
4725
4726 @node msggrep Invocation, msgfilter Invocation, msgconv Invocation, Manipulating
4727 @section Invoking the @code{msggrep} Program
4728
4729 @include msggrep.texi
4730
4731 @node msgfilter Invocation, msguniq Invocation, msggrep Invocation, Manipulating
4732 @section Invoking the @code{msgfilter} Program
4733
4734 @include msgfilter.texi
4735
4736 @node msguniq Invocation, msgcomm Invocation, msgfilter Invocation, Manipulating
4737 @section Invoking the @code{msguniq} Program
4738
4739 @include msguniq.texi
4740
4741 @node msgcomm Invocation, msgcmp Invocation, msguniq Invocation, Manipulating
4742 @section Invoking the @code{msgcomm} Program
4743
4744 @include msgcomm.texi
4745
4746 @node msgcmp Invocation, msgattrib Invocation, msgcomm Invocation, Manipulating
4747 @section Invoking the @code{msgcmp} Program
4748
4749 @include msgcmp.texi
4750
4751 @node msgattrib Invocation, msgen Invocation, msgcmp Invocation, Manipulating
4752 @section Invoking the @code{msgattrib} Program
4753
4754 @include msgattrib.texi
4755
4756 @node msgen Invocation, msgexec Invocation, msgattrib Invocation, Manipulating
4757 @section Invoking the @code{msgen} Program
4758
4759 @include msgen.texi
4760
4761 @node msgexec Invocation, Colorizing, msgen Invocation, Manipulating
4762 @section Invoking the @code{msgexec} Program
4763
4764 @include msgexec.texi
4765
4766 @node Colorizing, libgettextpo, msgexec Invocation, Manipulating
4767 @section Highlighting parts of PO files
4768
4769 Translators are usually only interested in seeing the untranslated and
4770 fuzzy messages of a PO file.  Also, when a message is set fuzzy because
4771 the msgid changed, they want to see the differences between the previous
msgid and the current one (especially if the msgid is long and only a few
4773 words in it have changed).  Finally, it's always welcome to highlight the
4774 different sections of a message in a PO file (comments, msgid, msgstr, etc.).
4775
4776 Such highlighting is possible through the @code{msgcat} options
4777 @samp{--color} and @samp{--style}.
4778
4779 @menu
4780 * The --color option::          Triggering colorized output
4781 * The TERM variable::           The environment variable @code{TERM}
4782 * The --style option::          The @code{--style} option
4783 * Style rules::                 Style rules for PO files
4784 * Customizing less::            Customizing @code{less} for viewing PO files
4785 @end menu
4786
4787 @node The --color option, The TERM variable,  , Colorizing
4788 @subsection The @code{--color} option
4789
4790 @opindex --color@r{, @code{msgcat} option}
4791 The @samp{--color=@var{when}} option specifies under which conditions
4792 colorized output should be generated.  The @var{when} part can be one of
4793 the following:
4794
4795 @table @code
4796 @item always
4797 @itemx yes
4798 The output will be colorized.
4799
4800 @item never
4801 @itemx no
4802 The output will not be colorized.
4803
4804 @item auto
4805 @itemx tty
4806 The output will be colorized if the output device is a tty, i.e.@: when the
4807 output goes directly to a text screen or terminal emulator window.
4808
4809 @item html
4810 The output will be colorized and be in HTML format.
4811 @end table
4812
4813 @noindent
4814 @samp{--color} is equivalent to @samp{--color=yes}.  The default is
4815 @samp{--color=auto}.
4816
Thus, a command like @samp{msgcat vi.po} will produce colorized output
when called by itself in a command window, whereas in a pipe, such as
@samp{msgcat vi.po | less -R}, it will not produce colorized output.  To
get colorized output in this situation nevertheless, use the command
@samp{msgcat --color vi.po | less -R}.
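The rules above can be restated as a small C sketch.  It is not taken from the @code{msgcat} sources: the enumeration @code{color_when} and the functions @code{parse_color_when} and @code{should_colorize} are hypothetical names, and the special value @samp{test} is not modeled.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical enumeration of the documented --color=WHEN values.  */
enum color_when { COLOR_ALWAYS, COLOR_NEVER, COLOR_AUTO, COLOR_HTML };

/* Maps WHEN to the enumeration.  ARG == NULL models a bare "--color",
   which is equivalent to "--color=yes".  Returns false for any other
   value (including "test", which this sketch does not model).  */
static bool
parse_color_when (const char *arg, enum color_when *result)
{
  if (arg == NULL || strcmp (arg, "always") == 0 || strcmp (arg, "yes") == 0)
    *result = COLOR_ALWAYS;
  else if (strcmp (arg, "never") == 0 || strcmp (arg, "no") == 0)
    *result = COLOR_NEVER;
  else if (strcmp (arg, "auto") == 0 || strcmp (arg, "tty") == 0)
    *result = COLOR_AUTO;
  else if (strcmp (arg, "html") == 0)
    *result = COLOR_HTML;
  else
    return false;
  return true;
}

/* Decides whether to colorize.  OUTPUT_IS_TTY tells whether standard
   output is a terminal.  COLOR_AUTO is the default mode.  */
static bool
should_colorize (enum color_when when, bool output_is_tty)
{
  switch (when)
    {
    case COLOR_ALWAYS:
    case COLOR_HTML:            /* HTML output is always colorized.  */
      return true;
    case COLOR_NEVER:
      return false;
    case COLOR_AUTO:
    default:
      return output_is_tty;     /* colorize only on a text screen */
    }
}
```

In a real program, the second argument of @code{should_colorize} would typically be @code{isatty (STDOUT_FILENO)}.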
4822
The @samp{--color=html} option will produce output that can be viewed in
a browser.  This can be useful, for example, for Indic languages,
because the rendering of Indic scripts in browsers is usually better than
in terminal emulators.
4827
4828 Note that the output produced with the @code{--color} option is @emph{not}
4829 a valid PO file in itself.  It contains additional terminal-specific escape
4830 sequences or HTML tags.  A PO file reader will give a syntax error when
4831 confronted with such content.  Except for the @samp{--color=html} case,
4832 you therefore normally don't need to save output produced with the
4833 @code{--color} option in a file.
4834
4835 @node The TERM variable, The --style option, The --color option, Colorizing
4836 @subsection The environment variable @code{TERM}
4837
4838 @vindex TERM@r{, environment variable}
The environment variable @code{TERM} contains an identifier for the text
window's capabilities.  You can get a detailed list of these capabilities
by using the @samp{infocmp} command, using @samp{man 5 terminfo} as a
reference.
4843
4844 When producing text with embedded color directives, @code{msgcat} looks
4845 at the @code{TERM} variable.  Text windows today typically support at least
4846 8 colors.  Often, however, the text window supports 16 or more colors,
even though the @code{TERM} variable is set to an identifier denoting only
4848 8 supported colors.  It can be worth setting the @code{TERM} variable to
4849 a different value in these cases:
4850
4851 @table @code
4852 @item xterm
4853 @code{xterm} is in most cases built with support for 16 colors.  It can also
4854 be built with support for 88 or 256 colors (but not both).  You can try to
4855 set @code{TERM} to either @code{xterm-16color}, @code{xterm-88color}, or
4856 @code{xterm-256color}.
4857
4858 @item rxvt
4859 @code{rxvt} is often built with support for 16 colors.  You can try to set
4860 @code{TERM} to @code{rxvt-16color}.
4861
4862 @item konsole
4863 @code{konsole} too is often built with support for 16 colors.  You can try to
4864 set @code{TERM} to @code{konsole-16color} or @code{xterm-16color}.
4865 @end table
4866
4867 After setting @code{TERM}, you can verify it by invoking
4868 @samp{msgcat --color=test} and seeing whether the output looks like a
4869 reasonable color map.
4870
4871 @node The --style option, Style rules, The TERM variable, Colorizing
4872 @subsection The @code{--style} option
4873
4874 @opindex --style@r{, @code{msgcat} option}
4875 The @samp{--style=@var{style_file}} option specifies the style file to use
4876 when colorizing.  It has an effect only when the @code{--color} option is
4877 effective.
4878
4879 @vindex PO_STYLE@r{, environment variable}
4880 If the @code{--style} option is not specified, the environment variable
4881 @code{PO_STYLE} is considered.  It is meant to point to the user's
4882 preferred style for PO files.
4883
4884 The default style file is @file{$prefix/share/gettext/styles/po-default.css},
4885 where @code{$prefix} is the installation location.
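The precedence described above can be summarized in C as follows.  The function @code{choose_style_file} is hypothetical and takes all three candidates as arguments so that it has no side effects; treating an empty @code{PO_STYLE} value like an unset one is an assumption of this sketch, not documented behavior.

```c
#include <stddef.h>

/* Hypothetical helper returning the style file to use, following the
   documented precedence: --style, then $PO_STYLE, then the default.
   DEFAULT_STYLE stands for $prefix/share/gettext/styles/po-default.css.
   Treating an empty PO_STYLE as unset is an assumption.  */
static const char *
choose_style_file (const char *style_option, const char *po_style_env,
                   const char *default_style)
{
  if (style_option != NULL)
    return style_option;
  if (po_style_env != NULL && po_style_env[0] != '\0')
    return po_style_env;
  return default_style;
}
```

A caller would pass @code{getenv ("PO_STYLE")} as the second argument and the installed path of @file{po-default.css} as the third.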
4886
4887 A few style files are predefined:
4888 @table @file
4889 @item po-vim.css
4890 This style imitates the look used by vim 7.
4891
4892 @item po-emacs-x.css
4893 This style imitates the look used by GNU Emacs 21 and 22 in an X11 window.
4894
4895 @item po-emacs-xterm.css
4896 @itemx po-emacs-xterm16.css
4897 @itemx po-emacs-xterm256.css
4898 This style imitates the look used by GNU Emacs 22 in a terminal of type
4899 @samp{xterm} (8 colors) or @samp{xterm-16color} (16 colors) or
4900 @samp{xterm-256color} (256 colors), respectively.
4901 @end table
4902
4903 @noindent
4904 You can use these styles without specifying a directory.  They are actually
4905 located in @file{$prefix/share/gettext/styles/}, where @code{$prefix} is the
4906 installation location.
4907
4908 You can also design your own styles.  This is described in the next section.
4909
4910
4911 @node Style rules, Customizing less, The --style option, Colorizing
4912 @subsection Style rules for PO files
4913
The same style file can be used to style a PO file both for terminal
output and for HTML output.  It is written in CSS (Cascading Style Sheet)
4916 syntax.  See @url{http://www.w3.org/TR/css2/cover.html} for a formal
4917 definition of CSS.  Many HTML authoring tutorials also contain explanations
4918 of CSS.
4919
4920 In the case of HTML output, the style file is embedded in the HTML output.
4921 In the case of text output, the style file is interpreted by the
4922 @code{msgcat} program.  This means, in particular, that when
4923 @code{@@import} is used with relative file names, the file names are
4924
4925 @itemize @minus
4926 @item
4927 relative to the resulting HTML file, in the case of HTML output,
4928
4929 @item
4930 relative to the style sheet containing the @code{@@import}, in the case of
4931 text output.  (Actually, @code{@@import}s are not yet supported in this case,
4932 due to a limitation in @code{libcroco}.)
4933 @end itemize
4934
CSS rules are built up from selectors and declarations.  The declarations
specify graphical properties; the selectors specify when they apply.
4937
In PO files, the following simple selectors (based on ``CSS classes'', see
the CSS2 spec, section 5.8.3) are supported.
4940
4941 @itemize @bullet
4942 @item
4943 Selectors that apply to entire messages:
4944
4945 @table @code
4946 @item .header
4947 This matches the header entry of a PO file.
4948
4949 @item .translated
4950 This matches a translated message.
4951
4952 @item .untranslated
4953 This matches an untranslated message (i.e.@: a message with empty translation).
4954
4955 @item .fuzzy
4956 This matches a fuzzy message (i.e.@: a message which has a translation that
4957 needs review by the translator).
4958
4959 @item .obsolete
4960 This matches an obsolete message (i.e.@: a message that was translated but is
4961 not needed by the current POT file any more).
4962 @end table
4963
4964 @item
4965 Selectors that apply to parts of a message in PO syntax.  Recall the general
4966 structure of a message in PO syntax:
4967
4968 @example
4969 @var{white-space}
4970 #  @var{translator-comments}
4971 #. @var{extracted-comments}
4972 #: @var{reference}@dots{}
4973 #, @var{flag}@dots{}
4974 #| msgid @var{previous-untranslated-string}
4975 msgid @var{untranslated-string}
4976 msgstr @var{translated-string}
4977 @end example
4978
4979 @table @code
4980 @item .comment
4981 This matches all comments (translator comments, extracted comments,
4982 source file reference comments, flag comments, previous message comments,
4983 as well as the entire obsolete messages).
4984
4985 @item .translator-comment
4986 This matches the translator comments.
4987
4988 @item .extracted-comment
4989 This matches the extracted comments, i.e.@: the comments placed by the
4990 programmer at the attention of the translator.
4991
4992 @item .reference-comment
4993 This matches the source file reference comments (entire lines).
4994
4995 @item .reference
4996 This matches the individual source file references inside the source file
4997 reference comment lines.
4998
4999 @item .flag-comment
5000 This matches the flag comment lines (entire lines).
5001
5002 @item .flag
5003 This matches the individual flags inside flag comment lines.
5004
5005 @item .fuzzy-flag
5006 This matches the `fuzzy' flag inside flag comment lines.
5007
5008 @item .previous-comment
5009 This matches the comments containing the previous untranslated string (entire
5010 lines).
5011
5012 @item .previous
5013 This matches the previous untranslated string including the string delimiters,
5014 the associated keywords (@code{msgid} etc.) and the spaces between them.
5015
5016 @item .msgid
5017 This matches the untranslated string including the string delimiters,
5018 the associated keywords (@code{msgid} etc.) and the spaces between them.
5019
5020 @item .msgstr
5021 This matches the translated string including the string delimiters,
5022 the associated keywords (@code{msgstr} etc.) and the spaces between them.
5023
5024 @item .keyword
5025 This matches the keywords (@code{msgid}, @code{msgstr}, etc.).
5026
5027 @item .string
5028 This matches strings, including the string delimiters (double quotes).
5029 @end table
5030
5031 @item
5032 Selectors that apply to parts of strings:
5033
5034 @table @code
5035 @item .text
5036 This matches the entire contents of a string (excluding the string delimiters,
5037 i.e.@: the double quotes).
5038
5039 @item .escape-sequence
5040 This matches an escape sequence (starting with a backslash).
5041
5042 @item .format-directive
5043 This matches a format string directive (starting with a @samp{%} sign in the
5044 case of most programming languages, with a @samp{@{} in the case of
5045 @code{java-format} and @code{csharp-format}, with a @samp{~} in the case of
5046 @code{lisp-format} and @code{scheme-format}, or with @samp{$} in the case of
5047 @code{sh-format}).
5048
5049 @item .invalid-format-directive
5050 This matches an invalid format string directive.
5051
5052 @item .added
5053 In an untranslated string, this matches a part of the string that was not
5054 present in the previous untranslated string.  (Not yet implemented in this
5055 release.)
5056
5057 @item .changed
5058 In an untranslated string or in a previous untranslated string, this matches
5059 a part of the string that is changed or replaced.  (Not yet implemented in
5060 this release.)
5061
5062 @item .removed
5063 In a previous untranslated string, this matches a part of the string that
5064 is not present in the current untranslated string.  (Not yet implemented in
5065 this release.)
5066 @end table
5067 @end itemize
5068
These selectors can be combined into hierarchical selectors.  For example,
5070
5071 @smallexample
5072 .msgstr .invalid-format-directive @{ color: red; @}
5073 @end smallexample
5074
5075 @noindent
5076 will highlight the invalid format directives in the translated strings.
5077
5078 In text mode, pseudo-classes (CSS2 spec, section 5.11) and pseudo-elements
5079 (CSS2 spec, section 5.12) are not supported.
5080
5081 The declarations in HTML mode are not limited; any graphical attribute
5082 supported by the browsers can be used.
5083
5084 The declarations in text mode are limited to the following properties.  Other
5085 properties will be silently ignored.
5086
5087 @table @asis
5088 @item @code{color} (CSS2 spec, section 14.1)
5089 @itemx @code{background-color} (CSS2 spec, section 14.2.1)
These properties are supported.  Colors will be adjusted to match the terminal's
5091 capabilities.  Note that many terminals support only 8 colors.
5092
5093 @item @code{font-weight} (CSS2 spec, section 15.2.3)
5094 This property is supported, but most terminals can only render two different
5095 weights: @code{normal} and @code{bold}.  Values >= 600 are rendered as
5096 @code{bold}.
5097
5098 @item @code{font-style} (CSS2 spec, section 15.2.3)
5099 This property is supported.  The values @code{italic} and @code{oblique} are
5100 rendered the same way.
5101
5102 @item @code{text-decoration} (CSS2 spec, section 16.3.1)
5103 This property is supported, limited to the values @code{none} and
5104 @code{underline}.
5105 @end table
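As an illustration, the following C sketch reduces the @code{font-weight}, @code{font-style} and @code{text-decoration} values described above to the few attributes a terminal can render.  The helper names are hypothetical and this is not the code @code{msgcat} uses; it relies on the CSS2 keyword values @code{normal} (400) and @code{bold} (700) and ignores relative weights such as @code{bolder}.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Text mode renders only two weights.  CSS2 defines "normal" as 400 and
   "bold" as 700; numeric values of 600 or more render as bold.  Relative
   values such as "bolder" are not handled in this sketch.  */
static bool
font_weight_is_bold (const char *value)
{
  if (strcmp (value, "bold") == 0)
    return true;
  if (strcmp (value, "normal") == 0)
    return false;
  char *end;
  long weight = strtol (value, &end, 10);
  if (end == value || *end != '\0')
    return false;               /* not a number: treat as normal */
  return weight >= 600;
}

/* "italic" and "oblique" are rendered the same way.  */
static bool
font_style_is_italic (const char *value)
{
  return strcmp (value, "italic") == 0 || strcmp (value, "oblique") == 0;
}

/* Only "none" and "underline" are supported; any other value is ignored,
   which this sketch models by returning false.  */
static bool
text_decoration_is_underline (const char *value)
{
  return strcmp (value, "underline") == 0;
}
```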
5106
5107 @node Customizing less,  , Style rules, Colorizing
5108 @subsection Customizing @code{less} for viewing PO files
5109
5110 The @samp{less} program is a popular text file browser for use in a text
5111 screen or terminal emulator.  It also supports text with embedded escape
5112 sequences for colors and text decorations.
5113
You can use @code{less} to view a PO file like this (assuming a UTF-8
5115 environment):
5116
5117 @smallexample
5118 msgcat --to-code=UTF-8 --color xyz.po | less -R
5119 @end smallexample
5120
You can simplify this to the following command:
5122
5123 @smallexample
5124 less xyz.po
5125 @end smallexample
5126
5127 @noindent
5128 after these three preparations:
5129
5130 @enumerate
5131 @item
5132 Add the options @samp{-R} and @samp{-f} to the @code{LESS} environment
5133 variable.  In sh shells:
5134 @smallexample
5135 $ LESS="$LESS -R -f"
5136 $ export LESS
5137 @end smallexample
5138
5139 @item
5140 If your system does not already have the @file{lessopen.sh} and
5141 @file{lessclose.sh} scripts, create them and set the @code{LESSOPEN} and
5142 @code{LESSCLOSE} environment variables, as indicated in the manual page
5143 (@samp{man less}).
5144
5145 @item
5146 Add to @file{lessopen.sh} a piece of script that recognizes PO files
5147 through their file extension and invokes @code{msgcat} on them, producing
5148 a temporary file.  Like this:
5149
5150 @smallexample
5151 case "$1" in
5152   *.po)
5153     tmpfile=`mktemp "$@{TMPDIR-/tmp@}/less.XXXXXX"`
5154     msgcat --to-code=UTF-8 --color "$1" > "$tmpfile"
5155     echo "$tmpfile"
5156     exit 0
5157     ;;
5158 esac
5159 @end smallexample
5160 @end enumerate
5161
5162 @node libgettextpo,  , Colorizing, Manipulating
5163 @section Writing your own programs that process PO files
5164
5165 For the tasks for which a combination of @samp{msgattrib}, @samp{msgcat} etc.
5166 is not sufficient, a set of C functions is provided in a library, to make it
5167 possible to process PO files in your own programs.  When you use this library,
5168 you don't need to write routines to parse the PO file; instead, you retrieve
a pointer in memory to each of the messages contained in the PO file.  Functions
5170 for writing PO files are not provided at this time.
5171
5172 The functions are declared in the header file @samp{<gettext-po.h>}, and are
5173 defined in a library called @samp{libgettextpo}.
5174
5175 @deftp {Data Type} po_file_t
5176 This is a pointer type that refers to the contents of a PO file, after it has
5177 been read into memory.
5178 @end deftp
5179
5180 @deftp {Data Type} po_message_iterator_t
5181 This is a pointer type that refers to an iterator that produces a sequence of
5182 messages.
5183 @end deftp
5184
5185 @deftp {Data Type} po_message_t
5186 This is a pointer type that refers to a message of a PO file, including its
5187 translation.
5188 @end deftp
5189
5190 @deftypefun po_file_t po_file_read (const char *@var{filename})
5191 The @code{po_file_read} function reads a PO file into memory.  The file name
5192 is given as argument.  The return value is a handle to the PO file's contents,
5193 valid until @code{po_file_free} is called on it.  In case of error, the return
5194 value is @code{NULL}, and @code{errno} is set.
5195 @end deftypefun
5196
5197 @deftypefun void po_file_free (po_file_t @var{file})
5198 The @code{po_file_free} function frees a PO file's contents from memory,
5199 including all messages that are only implicitly accessible through iterators.
5200 @end deftypefun
5201
5202 @deftypefun {const char * const *} po_file_domains (po_file_t @var{file})
5203 The @code{po_file_domains} function returns the domains for which the given
5204 PO file has messages.  The return value is a @code{NULL} terminated array
5205 which is valid as long as the @var{file} handle is valid.  For PO files which
5206 contain no @samp{domain} directive, the return value contains only one domain,
5207 namely the default domain @code{"messages"}.
5208 @end deftypefun
5209
5210 @deftypefun po_message_iterator_t po_message_iterator (po_file_t @var{file}, const char *@var{domain})
The @code{po_message_iterator} function returns an iterator that will produce the
5212 messages of @var{file} that belong to the given @var{domain}.  If @var{domain}
5213 is @code{NULL}, the default domain is used instead.  To list the messages,
5214 use the function @code{po_next_message} repeatedly.
5215 @end deftypefun
5216
5217 @deftypefun void po_message_iterator_free (po_message_iterator_t @var{iterator})
5218 The @code{po_message_iterator_free} function frees an iterator previously
5219 allocated through the @code{po_message_iterator} function.
5220 @end deftypefun
5221
5222 @deftypefun po_message_t po_next_message (po_message_iterator_t @var{iterator})
5223 The @code{po_next_message} function returns the next message from
5224 @var{iterator} and advances the iterator.  It returns @code{NULL} when the
5225 iterator has reached the end of its message list.
5226 @end deftypefun
5227
The following functions return details of a @code{po_message_t}.  Recall
5229 that the results are valid as long as the @var{file} handle is valid.
5230
5231 @deftypefun {const char *} po_message_msgid (po_message_t @var{message})
5232 The @code{po_message_msgid} function returns the @code{msgid} (untranslated
5233 English string) of a message.  This is guaranteed to be non-@code{NULL}.
5234 @end deftypefun
5235
5236 @deftypefun {const char *} po_message_msgid_plural (po_message_t @var{message})
5237 The @code{po_message_msgid_plural} function returns the @code{msgid_plural}
5238 (untranslated English plural string) of a message with plurals, or @code{NULL}
5239 for a message without plural.
5240 @end deftypefun
5241
5242 @deftypefun {const char *} po_message_msgstr (po_message_t @var{message})
5243 The @code{po_message_msgstr} function returns the @code{msgstr} (translation)
5244 of a message.  For an untranslated message, the return value is an empty
5245 string.
5246 @end deftypefun
5247
5248 @deftypefun {const char *} po_message_msgstr_plural (po_message_t @var{message}, int @var{index})
5249 The @code{po_message_msgstr_plural} function returns the
5250 @code{msgstr[@var{index}]} of a message with plurals, or @code{NULL} when
5251 the @var{index} is out of range or for a message without plural.
5252 @end deftypefun
5253
Here is an example of how these functions can be used.
5255
5256 @example
5257 const char *filename = @dots{};
5258 po_file_t file = po_file_read (filename);
5259
5260 if (file == NULL)
5261   error (EXIT_FAILURE, errno, "couldn't open the PO file %s", filename);
5262 @{
5263   const char * const *domains = po_file_domains (file);
5264   const char * const *domainp;
5265
5266   for (domainp = domains; *domainp; domainp++)
5267     @{
5268       const char *domain = *domainp;
5269       po_message_iterator_t iterator = po_message_iterator (file, domain);
5270
5271       for (;;)
5272         @{
          po_message_t message = po_next_message (iterator);
5274
5275           if (message == NULL)
5276             break;
5277           @{
5278             const char *msgid = po_message_msgid (message);
5279             const char *msgstr = po_message_msgstr (message);
5280
5281             @dots{}
5282           @}
5283         @}
5284       po_message_iterator_free (iterator);
5285     @}
5286 @}
5287 po_file_free (file);
5288 @end example
5289
5290 @node Binaries, Programmers, Manipulating, Top
5291 @chapter Producing Binary MO Files
5292
5293 @c FIXME: Rewrite.
5294
5295 @menu
5296 * msgfmt Invocation::           Invoking the @code{msgfmt} Program
5297 * msgunfmt Invocation::         Invoking the @code{msgunfmt} Program
5298 * MO Files::                    The Format of GNU MO Files
5299 @end menu
5300
5301 @node msgfmt Invocation, msgunfmt Invocation, Binaries, Binaries
5302 @section Invoking the @code{msgfmt} Program
5303
5304 @include msgfmt.texi
5305
5306 @node msgunfmt Invocation, MO Files, msgfmt Invocation, Binaries
5307 @section Invoking the @code{msgunfmt} Program
5308
5309 @include msgunfmt.texi
5310
5311 @node MO Files,  , msgunfmt Invocation, Binaries
5312 @section The Format of GNU MO Files
5313 @cindex MO file's format
5314 @cindex file format, @file{.mo}
5315
5316 The format of the generated MO files is best described by a picture,
5317 which appears below.
5318
5319 @cindex magic signature of MO files
The first two words serve to identify the file.  The magic number
always signals a GNU MO file.  It is stored in the byte order of the
generating machine, so a reader sees it as one of two numbers:
@code{0x950412de} if the file was generated with the reader's own byte
order, or @code{0xde120495} if it was generated with the opposite one.
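A portable reader need not know its own byte order: it can decode the first word both as little-endian and as big-endian and check which interpretation yields @code{0x950412de}.  The following C sketch does exactly that; the names @code{mo_byte_order} and @code{mo_detect_byte_order} are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MO_MAGIC 0x950412deU

enum mo_byte_order { MO_NOT_MO, MO_LITTLE_ENDIAN, MO_BIG_ENDIAN };

/* Decode four bytes as a little-endian 32-bit word.  */
static uint32_t
get_le32 (const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Decode four bytes as a big-endian 32-bit word.  */
static uint32_t
get_be32 (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Returns the byte order in which the MO file in BUF (SIZE bytes) was
   written, or MO_NOT_MO if the first word is not the magic number.  */
static enum mo_byte_order
mo_detect_byte_order (const unsigned char *buf, size_t size)
{
  if (size < 4)
    return MO_NOT_MO;
  if (get_le32 (buf) == MO_MAGIC)
    return MO_LITTLE_ENDIAN;
  if (get_be32 (buf) == MO_MAGIC)
    return MO_BIG_ENDIAN;
  return MO_NOT_MO;
}
```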
5324
5325 The second word describes the current revision of the file format,
5326 composed of a major and a minor revision number.  The revision numbers
5327 ensure that the readers of MO files can distinguish new formats from
5328 old ones and handle their contents, as far as possible.  For now the
5329 major revision is 0 or 1, and the minor revision is also 0 or 1.  More
5330 revisions might be added in the future.  A program seeing an unexpected
5331 major revision number should stop reading the MO file entirely; whereas
5332 an unexpected minor revision number means that the file can be read but
5333 will not reveal its full contents, when parsed by a program that
5334 supports only smaller minor revision numbers.
5335
5336 The version is kept
5337 separate from the magic number, instead of using different magic
5338 numbers for different formats, mainly because @file{/etc/magic} is
5339 not updated often.
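The following C sketch shows how a reader might apply the revision rules above.  It assumes that the major revision occupies the upper 16 bits of the revision word and the minor revision the lower 16 bits; the text above does not spell out this layout, so treat it as an assumption.  All names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout: major revision in the upper 16 bits,
   minor revision in the lower 16 bits.  */
#define MO_MAJOR_REVISION(rev) ((rev) >> 16)
#define MO_MINOR_REVISION(rev) ((rev) & 0xffff)

enum mo_revision_status { MO_REV_OK, MO_REV_PARTIAL, MO_REV_UNSUPPORTED };

/* Classifies a revision word for a reader that knows major revisions
   0 and 1 and minor revisions 0 and 1.  */
static enum mo_revision_status
mo_check_revision (uint32_t revision)
{
  if (MO_MAJOR_REVISION (revision) > 1)
    return MO_REV_UNSUPPORTED;  /* stop reading the file entirely */
  if (MO_MINOR_REVISION (revision) > 1)
    return MO_REV_PARTIAL;      /* readable, but not all contents visible */
  return MO_REV_OK;
}
```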
5340
The header continues with a number of pointers to later tables in the
file, which allows the prefix part of MO files to be extended without
recompiling the programs that read them.  This might become useful for
later inserting a few flag bits, an indication of the charset used, new
tables, or other things.
5346
5347 Then, at offset @var{O} and offset @var{T} in the picture, two tables
5348 of string descriptors can be found.  In both tables, each string
descriptor uses two 32-bit integers, one for the string length,
5350 another for the offset of the string in the MO file, counting in bytes
5351 from the start of the file.  The first table contains descriptors
5352 for the original strings, and is sorted so the original strings
5353 are in increasing lexicographical order.  The second table contains
5354 descriptors for the translated strings, and is parallel to the first
5355 table: to find the corresponding translation one has to access the
5356 array slot in the second array with the same index.
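Fetching a string through a descriptor table can be sketched in C as follows.  The sketch assumes the file's byte order is already known from the magic number, and its function names are hypothetical.  Passing @var{O} as @code{table_offset} yields an original string; passing @var{T} with the same index yields the corresponding translation.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes a 32-bit word stored in the file's byte order.  */
static uint32_t
mo_word (const unsigned char *p, int big_endian_file)
{
  if (big_endian_file)
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
           | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Returns string IDX of the descriptor table starting at TABLE_OFFSET
   and stores its length (which excludes the terminating NUL) in *LENGTH.
   Returns NULL if the descriptor or the string lies outside BUF.  */
static const char *
mo_table_string (const unsigned char *buf, size_t size, int big_endian_file,
                 uint32_t table_offset, uint32_t idx, uint32_t *length)
{
  /* Each descriptor is 8 bytes: a 32-bit length, then a 32-bit offset.  */
  uint64_t desc = (uint64_t) table_offset + (uint64_t) idx * 8;
  if (desc + 8 > size)
    return NULL;
  uint32_t len = mo_word (buf + desc, big_endian_file);
  uint32_t off = mo_word (buf + desc + 4, big_endian_file);
  /* The string and its terminating NUL must both lie inside BUF.  */
  if ((uint64_t) off + len + 1 > size)
    return NULL;
  *length = len;
  return (const char *) buf + off;
}
```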
5357
Having the original strings sorted enables the use of simple binary
search when the MO file does not contain a hashing table, or when it
is not practical to use the hashing table provided in the MO file.
This also has another advantage: in a GNU @code{gettext} PO file, the
empty string is usually @emph{translated} into some system information
attached to that particular MO file, and the empty string necessarily
becomes the first in both the original and translated tables, making
the system information very easy to find.
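Without a hash table, a lookup is an ordinary binary search over the original-string descriptors.  The following C sketch assumes the descriptors have already been decoded into host byte order; the structure @code{mo_descriptor} and the function @code{mo_bsearch} are hypothetical.  Because the empty msgid sorts first, the system information is found at index 0.

```c
#include <stdint.h>
#include <string.h>

/* A string descriptor, already converted to host byte order.  */
struct mo_descriptor
{
  uint32_t length;   /* string length, excluding the terminating NUL */
  uint32_t offset;   /* offset of the string from the start of the file */
};

/* Binary search for MSGID among the N sorted original strings, where
   BASE points to the start of the file contents.  Returns the index to
   use in the parallel translation table, or -1 if MSGID is absent.
   strcmp stops at the first NUL, so for an entry with plural forms only
   the singular takes part in the comparison.  */
static long
mo_bsearch (const char *base, const struct mo_descriptor *originals,
            uint32_t n, const char *msgid)
{
  uint32_t lo = 0, hi = n;

  while (lo < hi)
    {
      uint32_t mid = lo + (hi - lo) / 2;
      int cmp = strcmp (msgid, base + originals[mid].offset);

      if (cmp == 0)
        return (long) mid;
      if (cmp < 0)
        hi = mid;
      else
        lo = mid + 1;
    }
  return -1;
}
```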
5366
5367 @cindex hash table, inside MO files
5368 The size @var{S} of the hash table can be zero.  In this case, the
5369 hash table itself is not contained in the MO file.  Some people might
5370 prefer this because a precomputed hashing table takes disk space, and
does not gain @emph{that} much speed.  The hash table contains indices
5372 to the sorted array of strings in the MO file.  Conflict resolution is
5373 done by double hashing.  The precise hashing algorithm used is fairly
5374 dependent on GNU @code{gettext} code, and is not documented here.
5375
As for the strings themselves, they follow the hash table.  Each string
is terminated with a @key{NUL}, and this @key{NUL} is not counted in
the length which appears in the string descriptor.  The @code{msgfmt}
program has an option selecting the alignment for MO file strings.
With this option, each string is separately aligned so it starts at
an offset which is a multiple of the alignment value.  On some RISC
machines, a correct alignment will speed things up.
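Assuming the strings are laid out one after another, the start of each string can be computed by skipping the previous string and its @key{NUL}, then rounding up to the alignment value.  The C sketch below illustrates this arithmetic with hypothetical names; it ignores overflow.

```c
#include <stdint.h>

/* Rounds OFFSET up to the next multiple of ALIGNMENT (which must be
   positive).  Overflow is ignored in this sketch.  */
static uint32_t
mo_align (uint32_t offset, uint32_t alignment)
{
  return (offset + alignment - 1) / alignment * alignment;
}

/* Offset at which the string following one of LENGTH bytes at OFFSET
   starts: skip that string and its terminating NUL (not counted in
   LENGTH), then align.  */
static uint32_t
mo_next_string_offset (uint32_t offset, uint32_t length, uint32_t alignment)
{
  return mo_align (offset + length + 1, alignment);
}
```

For example, with an alignment of 4, a 2-byte string at offset 16 occupies bytes 16 to 18 (including its @key{NUL}), so the next string starts at offset 20.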
5383
5384 @cindex context, in MO files
A message with a context is stored by using, in place of the original
string, the concatenation of the context, an @key{EOT} byte, and the
original string.
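For example, the msgid @samp{Open} in the context @samp{menu} is stored as the bytes of @samp{menu}, the byte 0x04 (@key{EOT}), and the bytes of @samp{Open}.  The following C sketch builds such a key; @code{mo_context_key} is a hypothetical helper, not part of a gettext API.

```c
#include <stddef.h>
#include <string.h>

/* Builds the lookup key for MSGID in context MSGCTXT: the context, an
   EOT byte (0x04), then the msgid.  Writes a NUL-terminated key to DST
   and returns its length, or -1 if DSTSIZE bytes are not enough.  */
static long
mo_context_key (char *dst, size_t dstsize, const char *msgctxt,
                const char *msgid)
{
  size_t ctxlen = strlen (msgctxt);
  size_t idlen = strlen (msgid);
  size_t total = ctxlen + 1 + idlen;            /* +1 for the EOT byte */

  if (total + 1 > dstsize)                      /* +1 for the final NUL */
    return -1;
  memcpy (dst, msgctxt, ctxlen);
  dst[ctxlen] = '\004';
  memcpy (dst + ctxlen + 1, msgid, idlen + 1);  /* copies the NUL too */
  return (long) total;
}
```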
5387
5388 @cindex plural forms, in MO files
Plural forms are stored by letting the plural of the original string
follow the singular of the original string, separated by a
@key{NUL} byte.  The length which appears in the string descriptor
includes both.  However, only the singular of the original string
takes part in the hash table lookup.  The plural variants of the
translation are all stored consecutively, separated by
@key{NUL} bytes.  Here also, the length in the string descriptor
includes all of them.
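Since the descriptor length covers all variants, a reader can step through them by searching for the separating @key{NUL} bytes.  The following C sketch does this for an original string (variant 0 is the singular, variant 1 the plural) or for a translation; @code{mo_plural_variant} is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Returns variant IDX of an entry of LENGTH bytes whose variants are
   separated by NUL bytes, or NULL if there are fewer variants.  The last
   variant is terminated by the string's own NUL, which lies just past
   LENGTH, so every returned pointer is a valid C string.  */
static const char *
mo_plural_variant (const char *entry, size_t length, unsigned long idx)
{
  const char *p = entry;
  const char *end = entry + length;

  while (idx > 0)
    {
      /* Find the NUL that ends the current variant within LENGTH.  */
      const char *nul = memchr (p, '\0', (size_t) (end - p));
      if (nul == NULL)
        return NULL;            /* fewer variants than requested */
      p = nul + 1;
      idx--;
    }
  return p;
}
```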
5397
Nothing prevents an MO file from having embedded @key{NUL}s in strings.
However, the program interface currently used already presumes
that strings are @key{NUL} terminated, so embedded @key{NUL}s are
somewhat useless.  But the MO file format is general enough that other
interfaces would later be possible if, for example, we ever want to
5403 implement wide characters right in MO files, where @key{NUL} bytes may
5404 accidentally appear.  (No, we don't want to have wide characters in MO
5405 files.  They would make the file unnecessarily large, and the
5406 @samp{wchar_t} type being platform dependent, MO files would be
5407 platform dependent as well.)
5408
5409 This particular issue has been strongly debated in the GNU
@code{gettext} development forum, and it is to be expected that the MO file
format will evolve or change over time.  It is even possible that many
5412 formats may later be supported concurrently.  But surely, we have to
5413 start somewhere, and the MO file format described here is a good start.
5414 Nothing is cast in concrete, and the format may later evolve fairly
5415 easily, so we should feel comfortable with the current approach.
5416
5417 @example
5418 @group
5419         byte
5420              +------------------------------------------+
5421           0  | magic number = 0x950412de                |
5422              |                                          |
5423           4  | file format revision = 0                 |
5424              |                                          |
5425           8  | number of strings                        |  == N
5426              |                                          |
5427          12  | offset of table with original strings    |  == O
5428              |                                          |
5429          16  | offset of table with translation strings |  == T
5430              |                                          |
5431          20  | size of hashing table                    |  == S
5432              |                                          |
5433          24  | offset of hashing table                  |  == H
5434              |                                          |
5435              .                                          .
5436              .    (possibly more entries later)         .
5437              .                                          .
5438              |                                          |
5439           O  | length & offset 0th string  ----------------.
5440       O + 8  | length & offset 1st string  ------------------.
5441               ...                                    ...   | |
5442 O + ((N-1)*8)| length & offset (N-1)th string           |  | |
5443              |                                          |  | |
5444           T  | length & offset 0th translation  ---------------.
5445       T + 8  | length & offset 1st translation  -----------------.
5446               ...                                    ...   | | | |
5447 T + ((N-1)*8)| length & offset (N-1)th translation      |  | | | |
5448              |                                          |  | | | |
5449           H  | start hash table                         |  | | | |
5450               ...                                    ...   | | | |
5451   H + S * 4  | end hash table                           |  | | | |
5452              |                                          |  | | | |
5453              | NUL terminated 0th string  <----------------' | | |
5454              |                                          |    | | |
5455              | NUL terminated 1st string  <------------------' | |
5456              |                                          |      | |
5457               ...                                    ...       | |
5458              |                                          |      | |
5459              | NUL terminated 0th translation  <---------------' |
5460              |                                          |        |
5461              | NUL terminated 1st translation  <-----------------'
5462              |                                          |
5463               ...                                    ...
5464              |                                          |
5465              +------------------------------------------+
5466 @end group
5467 @end example
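
The following C sketch shows how a program could read the header fields
in the diagram above from an MO file that has already been loaded into
memory, and then look up a string.  It is a simplified illustration, not
the GNU @code{gettext} code: it ignores the hash table and instead does a
binary search over the table of original strings, assuming that table is
sorted in @code{strcmp} order.  A magic number that reads as
@code{0xde120495} indicates a file written with the opposite byte order,
so the words are swapped in that case.  The names @code{mo_word},
@code{mo_string_ok} and @code{mo_lookup} are hypothetical.

@example
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MO_MAGIC          0x950412deU
#define MO_MAGIC_SWAPPED  0xde120495U

/* Read the 32-bit word at OFFSET in BUF, byte-swapping if SWAP.
   memcpy avoids unaligned accesses.  */
static uint32_t
mo_word (const unsigned char *buf, size_t offset, int swap)
@{
  uint32_t w;

  memcpy (&w, buf + offset, sizeof w);
  if (swap)
    w = (w >> 24) | ((w >> 8) & 0xff00U)
        | ((w & 0xff00U) << 8) | (w << 24);
  return w;
@}

/* Check that the string described by (LEN, OFF) lies inside the
   SIZE bytes of BUF and is followed by its terminating NUL.  */
static int
mo_string_ok (const unsigned char *buf, size_t size,
              uint32_t len, uint32_t off)
@{
  return off < size && len < size - off && buf[off + len] == '\0';
@}

/* Return the translation of MSGID from the MO image BUF of SIZE
   bytes, or NULL if it is absent or the image is malformed.  */
static const char *
mo_lookup (const unsigned char *buf, size_t size, const char *msgid)
@{
  uint32_t magic, n, orig_tab, trans_tab, lo, hi;
  int swap;

  if (size < 28)
    return NULL;
  magic = mo_word (buf, 0, 0);
  if (magic == MO_MAGIC)
    swap = 0;
  else if (magic == MO_MAGIC_SWAPPED)
    swap = 1;
  else
    return NULL;

  n = mo_word (buf, 8, swap);           /* N */
  orig_tab = mo_word (buf, 12, swap);   /* O */
  trans_tab = mo_word (buf, 16, swap);  /* T */
  if (orig_tab > size || trans_tab > size
      || n > (size - orig_tab) / 8 || n > (size - trans_tab) / 8)
    return NULL;

  lo = 0;
  hi = n;
  while (lo < hi)
    @{
      uint32_t mid = lo + (hi - lo) / 2;
      size_t entry = (size_t) orig_tab + (size_t) mid * 8;
      uint32_t len = mo_word (buf, entry, swap);
      uint32_t off = mo_word (buf, entry + 4, swap);
      int cmp;

      if (!mo_string_ok (buf, size, len, off))
        return NULL;
      /* Only the part up to the first NUL (the singular) is compared.  */
      cmp = strcmp (msgid, (const char *) buf + off);
      if (cmp < 0)
        hi = mid;
      else if (cmp > 0)
        lo = mid + 1;
      else
        @{
          entry = (size_t) trans_tab + (size_t) mid * 8;
          len = mo_word (buf, entry, swap);
          off = mo_word (buf, entry + 4, swap);
          return mo_string_ok (buf, size, len, off)
                 ? (const char *) buf + off : NULL;
        @}
    @}
  return NULL;
@}
@end example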
5468
5469 @node Programmers, Translators, Binaries, Top
5470 @chapter The Programmer's View
5471
5472 @c FIXME: Reorganize whole chapter.
5473
One aim of the current message catalog implementation provided by
GNU @code{gettext} was to use the system's message catalog handling, if the
installer wishes to do so.  So perhaps we should first take a look at
the solutions we know about.  The people in the POSIX committee did not
manage to agree on either of the semi-official standards which we'll
describe below.  In fact they couldn't agree on anything, so they decided
only to include an example of an interface.  The major Unix vendors
are split between the two most important specifications: X/Open's
@code{catgets} vs.@: Uniforum's @code{gettext} interface.  We'll describe
them both and later explain our solution to this dilemma.
5484
5485 @menu
5486 * catgets::                     About @code{catgets}
5487 * gettext::                     About @code{gettext}
5488 * Comparison::                  Comparing the two interfaces
5489 * Using libintl.a::             Using libintl.a in own programs
5490 * gettext grok::                Being a @code{gettext} grok
5491 * Temp Programmers::            Temporary Notes for the Programmers Chapter
5492 @end menu
5493
5494 @node catgets, gettext, Programmers, Programmers
5495 @section About @code{catgets}
5496 @cindex @code{catgets}, X/Open specification
5497
5498 The @code{catgets} implementation is defined in the X/Open Portability
5499 Guide, Volume 3, XSI Supplementary Definitions, Chapter 5.  But the
5500 process of creating this standard seemed to be too slow for some of
the Unix vendors, so they based their implementations on preliminary
versions of the standard.  Of course this again leads to problems when
5503 writing platform independent programs: even the usage of @code{catgets}
5504 does not guarantee a unique interface.
5505
Another, personal comment on this: only a bunch of committee members
could have made this interface.  They never really tried to program
using this interface.  It is a fast, memory-saving implementation, and a
user can happily live with it.  But programmers hate it (at least I and
some others do@dots{})
5511
But we must not forget one point: after all the trouble with transferring
the rights to Unix(tm), they at last came to X/Open, the very body that
published this specification.  This leads me to predict
that this interface will be in future Unix standards (e.g.@: Spec1170) and
therefore part of all Unix implementations (implementations which are
@emph{allowed} to bear this name).
5518
5519 @menu
5520 * Interface to catgets::        The interface
5521 * Problems with catgets::       Problems with the @code{catgets} interface?!
5522 @end menu
5523
5524 @node Interface to catgets, Problems with catgets, catgets, catgets
5525 @subsection The Interface
5526 @cindex interface to @code{catgets}
5527
5528 The interface to the @code{catgets} implementation consists of three
5529 functions which correspond to those used in file access: @code{catopen}
to open the catalog for use, @code{catgets} for accessing the message
5531 tables, and @code{catclose} for closing after work is done.  Prototypes
5532 for the functions and the needed definitions are in the
5533 @code{<nl_types.h>} header file.
5534
5535 @cindex @code{catopen}, a @code{catgets} function
@code{catopen} is used like this:
5537
5538 @example
5539 nl_catd catd = catopen ("catalog_name", 0);
5540 @end example
5541
The first argument is the name of the catalog.  This usually
refers to the name of the program or the package.  The second parameter
is not further specified in the standard.  I don't even know whether it
is implemented consistently among various systems.  So the common advice
is to use @code{0} as the value.  The return value is a handle to the
message catalog, equivalent to the file handles returned by @code{open}.
5548
5549 @cindex @code{catgets}, a @code{catgets} function
5550 This handle is of course used in the @code{catgets} function which can
5551 be used like this:
5552
5553 @example
5554 char *translation = catgets (catd, set_no, msg_id, "original string");
5555 @end example
5556
5557 The first parameter is this catalog descriptor.  The second parameter
5558 specifies the set of messages in this catalog, in which the message
5559 described by @code{msg_id} is obtained.  @code{catgets} therefore uses a
5560 three-stage addressing:
5561
5562 @display
5563 catalog name @result{} set number @result{} message ID @result{} translation
5564 @end display
5565
5566 @c Anybody else loving Haskell??? :-) -- Uli
5567
The fourth argument is not used to address the translation.  It is given
as a default value in case one of the addressing stages fails.  One
important thing to remember is that although the return type of
@code{catgets} is @code{char *}, the resulting string @emph{must not} be
changed.  It should rather be @code{const char *}, but the standard was
published in 1988, one year before ANSI C.
5574
5575 @noindent
5576 @cindex @code{catclose}, a @code{catgets} function
5577 The last of these functions is used and behaves as expected:
5578
5579 @example
5580 catclose (catd);
5581 @end example
5582
5583 After this no @code{catgets} call using the descriptor is legal anymore.
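
Putting the three functions together, a minimal sketch of the
open-use-close cycle might look as follows.  The catalog name
@code{hypothetical_app}, the set and message numbers, and the function
@code{copy_greeting} are made up for illustration.  On failure
@code{catopen} returns @code{(nl_catd) -1}; the sketch then falls back to
the default string.  Because the string returned by @code{catgets} may no
longer be valid after @code{catclose}, it is copied first.

@example
#include <nl_types.h>
#include <stddef.h>
#include <string.h>

/* Made-up set and message numbers for a hypothetical catalog.  */
enum @{ SET_MAIN = 1, MSG_GREETING = 1 @};

/* Copy the greeting from the catalog "hypothetical_app" into BUF
   (SIZE > 0 bytes), falling back to DEFAULT_TEXT.  */
static void
copy_greeting (char *buf, size_t size, const char *default_text)
@{
  nl_catd catd = catopen ("hypothetical_app", 0);
  const char *msg = default_text;
  size_t len;

  if (catd != (nl_catd) -1)
    msg = catgets (catd, SET_MAIN, MSG_GREETING, default_text);

  /* Copy (truncating if needed) before catclose may invalidate MSG.  */
  len = strlen (msg);
  if (len >= size)
    len = size - 1;
  memcpy (buf, msg, len);
  buf[len] = '\0';

  if (catd != (nl_catd) -1)
    catclose (catd);
@}
@end example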
5584
5585 @node Problems with catgets,  , Interface to catgets, catgets
5586 @subsection Problems with the @code{catgets} Interface?!
5587 @cindex problems with @code{catgets} interface
5588
Now that this description seemed to be really easy, where are the
problems we speak of?  In fact the interface could be used in a
reasonable way, but constructing the message catalogs is a pain.  The
reason for this lies in the third argument of @code{catgets}: the unique
message ID.  This has to be a numeric value for all messages in a single
set.  Perhaps you can imagine the problems of keeping such a list up to
date while changing the source code.  Add a new message here, remove one
there.  Of course a lot of tools have been developed to help organize
this chaos, but each of them fails in one aspect or another.  We don't
want to say that the other approach has no problems, but they are far
easier to manage.
5600
5601 @node gettext, Comparison, catgets, Programmers
5602 @section About @code{gettext}
5603 @cindex @code{gettext}, a programmer's view
5604
5605 The definition of the @code{gettext} interface comes from a Uniforum
5606 proposal.  It was submitted there by Sun, who had implemented the
5607 @code{gettext} function in SunOS 4, around 1990.  Nowadays, the
5608 @code{gettext} interface is specified by the OpenI18N standard.
5609
5610 The main point about this solution is that it does not follow the
5611 method of normal file handling (open-use-close) and that it does not
5612 burden the programmer with so many tasks, especially the unique key handling.
5613 Of course here also a unique key is needed, but this key is the message
itself (however long or short it is).  @xref{Comparison}, for a more
detailed comparison of the two methods.
5616
5617 The following section contains a rather detailed description of the
5618 interface.  We make it that detailed because this is the interface
5619 we chose for the GNU @code{gettext} Library.  Programmers interested
5620 in using this library will be interested in this description.
5621
5622 @menu
5623 * Interface to gettext::        The interface
5624 * Ambiguities::                 Solving ambiguities
5625 * Locating Catalogs::           Locating message catalog files
5626 * Charset conversion::          How to request conversion to Unicode
5627 * Contexts::                    Solving ambiguities in GUI programs
5628 * Plural forms::                Additional functions for handling plurals
5629 * Optimized gettext::           Optimization of the *gettext functions
5630 @end menu
5631
5632 @node Interface to gettext, Ambiguities, gettext, gettext
5633 @subsection The Interface
5634 @cindex @code{gettext} interface
5635
5636 The minimal functionality an interface must have is a) to select a
5637 domain the strings are coming from (a single domain for all programs is
not reasonable because its construction and maintenance are difficult,
5639 perhaps impossible) and b) to access a string in a selected domain.
5640
This is essentially the description of the @code{gettext} interface.  It
has a global domain to which unqualified calls refer.  Of course this
domain is selectable by the user.
5644
5645 @example
5646 char *textdomain (const char *domain_name);
5647 @end example
5648
This provides the possibility to change or query the current global
domain of the @code{LC_MESSAGES} category.  The
argument is a null-terminated string, whose characters must be legal for
use in file names.  If the @var{domain_name} argument is @code{NULL},
the function returns the current value.  If no value has been set
before, the name of the default domain is returned: @emph{messages}.
Please note that although the return value of @code{textdomain} is of
type @code{char *}, the string must not be changed.  It is also important
to know that no check is made whether the domain is available.  If it is
not available, you will notice this only by the fact that no translations
are provided.
5659
5660 @noindent
5661 To use a domain set by @code{textdomain} the function
5662
5663 @example
5664 char *gettext (const char *msgid);
5665 @end example
5666
5667 @noindent
5668 is to be used.  This is the simplest reasonable form one can imagine.
5669 The translation of the string @var{msgid} is returned if it is available
5670 in the current domain.  If it is not available, the argument itself is
5671 returned.  If the argument is @code{NULL} the result is undefined.
5672
Note that a @code{gettext} call does not name the domain explicitly; the
current value of the domain is used.  If this value changes between two
executions of the same @code{gettext} call in the program, the two calls
reference different message catalogs.
5678
5679 For the easiest case, which is normally used in internationalized
5680 packages, once at the beginning of execution a call to @code{textdomain}
5681 is issued, setting the domain to a unique name, normally the package
5682 name.  In the following code all strings which have to be translated are
filtered through the @code{gettext} function.  That's all: the package speaks
your language.
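
The easiest case described above can be sketched in C as follows.  The
package name @code{hypothetical-pkg} and the catalog directory are
placeholders.  The @code{setlocale} call is needed because otherwise the
program stays in the @code{"C"} locale, in which no translations are
used; @code{bindtextdomain} is described in the next section.  On systems
where @code{libintl} is not part of the C library, the program has to be
linked with @samp{-lintl}.

@example
#include <libintl.h>
#include <locale.h>
#include <stdio.h>

/* Placeholder package name and catalog directory.  */
#define PACKAGE   "hypothetical-pkg"
#define LOCALEDIR "/usr/local/share/locale"

/* Call once, at the beginning of main.  */
static void
init_i18n (void)
@{
  setlocale (LC_ALL, "");     /* take the locale from the environment */
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);       /* later gettext calls use this domain */
@}

static void
print_greeting (void)
@{
  /* Prints the translation, or the msgid itself if there is none.  */
  puts (gettext ("Hello, world!"));
@}
@end example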
5685
5686 @node Ambiguities, Locating Catalogs, Interface to gettext, gettext
5687 @subsection Solving Ambiguities
5688 @cindex several domains
5689 @cindex domain ambiguities
5690 @cindex large package
5691
While this single name domain works well for most applications there
might be the need to get translations from more than one domain.  Of
course one could switch between different domains with calls to
@code{textdomain}, but this is neither convenient nor fast.  One
possible situation was a case under discussion at the time of this
writing: all error messages of functions in the set of commonly used
functions should go into a separate domain @code{error}.  This way we
would only need to translate them once.
Another case is messages from a library, as these @emph{have} to be
independent of the current domain set by the application.
5703
5704 @noindent
For these reasons there are two more functions to retrieve strings:
5706
5707 @example
5708 char *dgettext (const char *domain_name, const char *msgid);
5709 char *dcgettext (const char *domain_name, const char *msgid,
5710                  int category);
5711 @end example
5712
Both take an additional argument in first position, which corresponds
to the argument of @code{textdomain}.  The third argument of
@code{dcgettext} allows you to use a locale category other than
@code{LC_MESSAGES}.  But I really don't know where this can be useful.
If the @var{domain_name} is @code{NULL} or @var{category} has a value
other than the known ones, the result is undefined.  It should also be
noted that this function is not part of the second known implementation
of this function family, the one found in Solaris.
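
For the library case mentioned above, a common pattern is to route every
message of the library through @code{dgettext} with the library's own
domain, typically hidden behind a short macro.  The following is a
sketch under that assumption; the domain name @code{hypothetical-lib},
the macro name @code{_} (a widespread convention, not part of the
interface) and the functions @code{lib_message} and
@code{lib_message_explicit} are illustrative only.

@example
#include <libintl.h>
#include <locale.h>

/* Placeholder domain of a library; independent of the domain that
   the application selected with textdomain.  */
#define LIB_DOMAIN "hypothetical-lib"

/* Conventional shorthand used inside the library's sources.  */
#define _(msgid) dgettext (LIB_DOMAIN, msgid)

/* Translate MSGID in the library's domain.  */
static const char *
lib_message (const char *msgid)
@{
  return _(msgid);
@}

/* The same lookup, naming the locale category explicitly.  */
static const char *
lib_message_explicit (const char *msgid)
@{
  return dcgettext (LIB_DOMAIN, msgid, LC_MESSAGES);
@}
@end example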
5721
A second ambiguity can arise from the fact that perhaps more than one
5723 domain has the same name.  This can be solved by specifying where the
5724 needed message catalog files can be found.
5725
5726 @example
5727 char *bindtextdomain (const char *domain_name,
5728                       const char *dir_name);
5729 @end example
5730
Calling this function binds the given domain to a file in the specified
directory (how this file is determined follows below).  In particular, a
file in the system's default place is no longer favored over the
specified file (as it would be by solely using @code{textdomain}).  A
@code{NULL} pointer for the @var{dir_name} parameter returns the binding
associated with @var{domain_name}.  If @var{domain_name} itself is
@code{NULL}, nothing happens and a @code{NULL} pointer is returned.  Here
again, as for all the other functions, the returned value must not be
changed!
5740
It is important to remember that relative path names for the
@var{dir_name} parameter can cause trouble.  Since the path is always
computed relative to the current directory, different results will be
achieved when the program calls @code{chdir}.  Relative
paths should therefore be avoided, to avoid such dependencies and
unreliability.
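
One way to follow this advice is to refuse relative directory names when
binding, as in the following sketch.  The helper
@code{bind_domain_absolute}, the domain name and the directory used with
it are hypothetical, and the test for an absolute name only covers
POSIX-style names beginning with @samp{/}.

@example
#include <libintl.h>
#include <stddef.h>

/* Bind DOMAIN to DIR only if DIR is an absolute POSIX path name, so
   that a later chdir cannot change where catalogs are searched.
   Return the binding now in effect, or NULL if DIR was refused.  */
static const char *
bind_domain_absolute (const char *domain, const char *dir)
@{
  if (domain == NULL || dir == NULL || dir[0] != '/')
    return NULL;
  return bindtextdomain (domain, dir);
@}
@end example

Afterwards, @code{bindtextdomain (@var{domain}, NULL)} can be used to
query the binding that is in effect.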
5747
5748 @node Locating Catalogs, Charset conversion, Ambiguities, gettext
5749 @subsection Locating Message Catalog Files
5750 @cindex message catalog files location
5751
Because many different languages for many different packages have to be
stored, we need some way to add this information to the message catalog
files.  The way usually used in Unix environments is to have this
encoding in the file name.  This is also done here.  The directory name
given in @code{bindtextdomain}'s second argument (or the default
directory), followed by the name of the locale, the locale category, and
the domain name are concatenated:
5759
5760 @example
5761 @var{dir_name}/@var{locale}/LC_@var{category}/@var{domain_name}.mo
5762 @end example
5763
5764 The default value for @var{dir_name} is system specific.  For the GNU
5765 library, and for packages adhering to its conventions, it's:
5766 @example
5767 /usr/local/share/locale
5768 @end example
5769
5770 @noindent
@var{locale} is the name of the locale currently set for the category
designated by @code{LC_@var{category}}.  For @code{gettext} and
@code{dgettext} this @code{LC_@var{category}} is always
@code{LC_MESSAGES}.@footnote{Some systems, e.g.@: mingw, don't have
@code{LC_MESSAGES}.  Here we use a more or
less arbitrary value for it, namely 1729, the smallest positive integer
which can be represented in two different ways as the sum of two cubes.}
The name of the locale is determined through
@code{setlocale (LC_@var{category}, NULL)}.
5779 @footnote{When the system does not support @code{setlocale} its behavior
5780 in setting the locale values is simulated by looking at the environment
5781 variables.}
5782 When using the function @code{dcgettext}, you can specify the locale category
5783 through the third argument.
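
The concatenation can be sketched in C as follows.  This is a
simplification: the helper names @code{catalog_path} and
@code{locale_language_length} are hypothetical, the locale name is taken
as given, and GNU @code{gettext} also tries less specific variants of the
locale name (such as @samp{de} for @samp{de_DE.UTF-8}), which is only
hinted at by the second function.

@example
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write DIR_NAME/LOCALE/CATEGORY/DOMAIN_NAME.mo into BUF, where
   CATEGORY is a name such as "LC_MESSAGES".  Return 0 on success,
   -1 if BUF (SIZE bytes) is too small.  */
static int
catalog_path (char *buf, size_t size, const char *dir_name,
              const char *locale, const char *category,
              const char *domain_name)
@{
  int n = snprintf (buf, size, "%s/%s/%s/%s.mo",
                    dir_name, locale, category, domain_name);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
@}

/* Length of the language part of LOCALE, e.g. 2 for "de_DE.UTF-8";
   a candidate for a less specific fallback lookup.  */
static size_t
locale_language_length (const char *locale)
@{
  return strcspn (locale, "_.@@");
@}
@end example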
5784
5785 @node Charset conversion, Contexts, Locating Catalogs, gettext
5786 @subsection How to specify the output character set @code{gettext} uses
5787 @cindex charset conversion at runtime
5788 @cindex encoding conversion at runtime
5789
5790 @code{gettext} not only looks up a translation in a message catalog.  It
5791 also converts the translation on the fly to the desired output character
5792 set.  This is useful if the user is working in a different character set
5793 than the translator who created the message catalog, because it avoids
5794 distributing variants of message catalogs which differ only in the
5795 character set.
5796
5797 The output character set is, by default, the value of @code{nl_langinfo
5798 (CODESET)}, which depends on the @code{LC_CTYPE} part of the current
5799 locale.  But programs which store strings in a locale independent way
5800 (e.g.@: UTF-8) can request that @code{gettext} and related functions
5801 return the translations in that encoding, by use of the
5802 @code{bind_textdomain_codeset} function.
5803
5804 Note that the @var{msgid} argument to @code{gettext} is not subject to
5805 character set conversion.  Also, when @code{gettext} does not find a
5806 translation for @var{msgid}, it returns @var{msgid} unchanged --
5807 independently of the current output character set.  It is therefore
5808 recommended that all @var{msgid}s be US-ASCII strings.
5809
5810 @deftypefun {char *} bind_textdomain_codeset (const char *@var{domainname}, const char *@var{codeset})
5811 The @code{bind_textdomain_codeset} function can be used to specify the
5812 output character set for message catalogs for domain @var{domainname}.
5813 The @var{codeset} argument must be a valid codeset name which can be used
5814 for the @code{iconv_open} function, or a null pointer.
5815
5816 If the @var{codeset} parameter is the null pointer,
5817 @code{bind_textdomain_codeset} returns the currently selected codeset
5818 for the domain with the name @var{domainname}.  It returns @code{NULL} if
5819 no codeset has yet been selected.
5820
5821 The @code{bind_textdomain_codeset} function can be used several times.
5822 If used multiple times with the same @var{domainname} argument, the
5823 later call overrides the settings made by the earlier one.
5824
5825 The @code{bind_textdomain_codeset} function returns a pointer to a
5826 string containing the name of the selected codeset.  The string is
5827 allocated internally in the function and must not be changed by the
user.  If the system ran out of memory during the execution of
5829 @code{bind_textdomain_codeset}, the return value is @code{NULL} and the
5830 global variable @var{errno} is set accordingly.
5831 @end deftypefun
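
A program that keeps its strings in UTF-8 might request UTF-8
translations once at start-up, as in this sketch.  The helper
@code{request_utf8} and the domain name used with it are placeholders;
a @code{NULL} return from @code{bind_textdomain_codeset} with a
non-null @var{codeset} is treated as failure, with @var{errno} set.

@example
#include <errno.h>
#include <libintl.h>
#include <stdio.h>
#include <string.h>

/* Ask for UTF-8 output for DOMAINNAME, regardless of LC_CTYPE.
   Return 0 on success, -1 (after printing a message) on failure.  */
static int
request_utf8 (const char *domainname)
@{
  const char *codeset = bind_textdomain_codeset (domainname, "UTF-8");

  if (codeset == NULL)
    @{
      fprintf (stderr, "bind_textdomain_codeset: %s\n",
               strerror (errno));
      return -1;
    @}
  return 0;
@}
@end example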
5832
5833 @node Contexts, Plural forms, Charset conversion, gettext
5834 @subsection Using contexts for solving ambiguities
5835 @cindex context
5836 @cindex GUI programs
5837 @cindex translating menu entries
5838 @cindex menu entries
5839
5840 One place where the @code{gettext} functions, if used normally, have big
5841 problems is within programs with graphical user interfaces (GUIs).  The
5842 problem is that many of the strings which have to be translated are very
short.  They have to appear in pull-down menus, which restricts their
length.  But strings which do not contain entire sentences or at
least large fragments of a sentence may appear in more than one
situation in the program but might need different translations.  This is
5847 especially true for the one-word strings which are frequently used in
5848 GUI programs.
5849
As a consequence many people say that the @code{gettext} approach is
wrong and that @code{catgets}, which indeed does not have this problem,
should be used instead.  But there is a very simple and powerful method to
handle this kind of problem with the @code{gettext} functions.
5854
Contexts can be added to strings to be translated.  A context dependent
translation lookup searches for the translation of a given string only
within a given context.  The translation for the same string
in a different context can be different.  The different translations of
the same string in different contexts can be stored in the same
MO file, and can be edited by the translator in the same PO file.
5861
5862 The @file{gettext.h} include file contains the lookup macros for strings
5863 with contexts.  They are implemented as thin macros and inline functions
5864 over the functions from @code{<libintl.h>}.
5865
5866 @findex pgettext
5867 @example
5868 const char *pgettext (const char *msgctxt, const char *msgid);
5869 @end example
5870
5871 In a call of this macro, @var{msgctxt} and @var{msgid} must be string
5872 literals.  The macro returns the translation of @var{msgid}, restricted
5873 to the context given by @var{msgctxt}.
5874
5875 The @var{msgctxt} string is visible in the PO file to the translator.
You should try to make it somehow canonical and never changing, because
every time you change a @var{msgctxt}, the translator will have to review
the translation of @var{msgid}.
5879
5880 Finding a canonical @var{msgctxt} string that doesn't change over time can
be hard.  But you shouldn't use the file name or class name containing the
@code{pgettext} call, because it is a common development task to rename
a file or a class, and this shouldn't cause translator work.  Also you
shouldn't use a comment in the form of a complete English sentence as
@var{msgctxt}, because orthography or grammar changes are often applied
to such sentences, and again, they shouldn't force the translator to do a
review.
5887
5888 The @samp{p} in @samp{pgettext} stands for ``particular'': @code{pgettext}
5889 fetches a particular translation of the @var{msgid}.
5890
5891 @findex dpgettext
5892 @findex dcpgettext
5893 @example
5894 const char *dpgettext (const char *domain_name,
5895                        const char *msgctxt, const char *msgid);
5896 const char *dcpgettext (const char *domain_name,
5897                         const char *msgctxt, const char *msgid,
5898                         int category);
5899 @end example
5900
5901 These are generalizations of @code{pgettext}.  They behave similarly to
5902 @code{dgettext} and @code{dcgettext}, respectively.  The @var{domain_name}
5903 argument defines the translation domain.  The @var{category} argument
allows you to use a locale category other than @code{LC_MESSAGES}.
5905
As an example, consider the following fictional situation.  A GUI program
5907 has a menu bar with the following entries:
5908
5909 @smallexample
5910 +------------+------------+--------------------------------------+
5911 | File       | Printer    |                                      |
5912 +------------+------------+--------------------------------------+
5913 | Open     | | Select   |
5914 | New      | | Open     |
5915 +----------+ | Connect  |
5916              +----------+
5917 @end smallexample
5918
To have the strings @code{File}, @code{Printer}, @code{Open},
@code{New}, @code{Select}, and @code{Connect} translated, there has to be
a call to a function of the @code{gettext} family at some point in the
code.  But in two places the string passed into the function would be
5923 @code{Open}.  The translations might not be the same and therefore we
5924 are in the dilemma described above.
5925
5926 What distinguishes the two places is the menu path from the menu root to
5927 the particular menu entries:
5928
5929 @smallexample
5930 Menu|File
5931 Menu|Printer
5932 Menu|File|Open
5933 Menu|File|New
5934 Menu|Printer|Select
5935 Menu|Printer|Open
5936 Menu|Printer|Connect
5937 @end smallexample
5938
5939 The context is thus the menu path without its last part.  So, the calls
5940 look like this:
5941
5942 @smallexample
5943 pgettext ("Menu|", "File")
5944 pgettext ("Menu|", "Printer")
5945 pgettext ("Menu|File|", "Open")
5946 pgettext ("Menu|File|", "New")
5947 pgettext ("Menu|Printer|", "Select")
5948 pgettext ("Menu|Printer|", "Open")
5949 pgettext ("Menu|Printer|", "Connect")
5950 @end smallexample
5951
5952 Whether or not to use the @samp{|} character at the end of the context is a
5953 matter of style.
5954
5955 For more complex cases, where the @var{msgctxt} or @var{msgid} are not
5956 string literals, more general macros are available:
5957
5958 @findex pgettext_expr
5959 @findex dpgettext_expr
5960 @findex dcpgettext_expr
5961 @example
5962 const char *pgettext_expr (const char *msgctxt, const char *msgid);
5963 const char *dpgettext_expr (const char *domain_name,
5964                             const char *msgctxt, const char *msgid);
5965 const char *dcpgettext_expr (const char *domain_name,
5966                              const char *msgctxt, const char *msgid,
5967                              int category);
5968 @end example
5969
5970 Here @var{msgctxt} and @var{msgid} can be arbitrary string-valued expressions.
5971 These macros are more general.  But in the case that both argument expressions
5972 are string literals, the macros without the @samp{_expr} suffix are more
5973 efficient.
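
To show how such lookups relate to the MO file representation of
contexts (the context, an @key{EOT} byte, then the original string), here
is a sketch of a run-time context lookup in the spirit of
@code{dpgettext_expr}.  It is an illustration, not the code in
@file{gettext.h}; the name @code{lookup_in_context} and the domain used
with it are hypothetical, and the sketch relies on @code{dcgettext}
returning its @var{msgid} argument unchanged when no translation is found.

@example
#include <libintl.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Byte separating context and msgid in MO keys (EOT).  */
#define CONTEXT_GLUE '\004'

/* Look up MSGID in context MSGCTXT of domain DOMAIN (not NULL).
   Return the translation, or MSGID itself if there is none.  */
static const char *
lookup_in_context (const char *domain, const char *msgctxt,
                   const char *msgid)
@{
  size_t ctxt_len = strlen (msgctxt);
  size_t id_size = strlen (msgid) + 1;   /* including the NUL */
  char *key = malloc (ctxt_len + 1 + id_size);
  const char *translation;

  if (key == NULL)
    return msgid;                        /* out of memory: untranslated */
  memcpy (key, msgctxt, ctxt_len);
  key[ctxt_len] = CONTEXT_GLUE;
  memcpy (key + ctxt_len + 1, msgid, id_size);

  translation = dcgettext (domain, key, LC_MESSAGES);
  /* Without a translation, dcgettext returns KEY itself; the caller
     must then get the plain MSGID, not the glued key.  */
  if (translation == key)
    translation = msgid;
  free (key);
  return translation;
@}
@end example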
5974
5975 @node Plural forms, Optimized gettext, Contexts, gettext
5976 @subsection Additional functions for plural forms
5977 @cindex plural forms
5978
5979 The functions of the @code{gettext} family described so far (and all the
5980 @code{catgets} functions as well) have one problem in the real world
which has been neglected completely in all existing approaches.  What
5982 is meant here is the handling of plural forms.
5983
5984 Looking through Unix source code before the time anybody thought about
5985 internationalization (and, sadly, even afterwards) one can often find
5986 code similar to the following:
5987
5988 @smallexample
5989    printf ("%d file%s deleted", n, n == 1 ? "" : "s");
5990 @end smallexample
5991
5992 @noindent
After the first complaints from people internationalizing the code, people
either completely avoided formulations like this or used strings like
@code{"file(s)"}.  Both look unnatural and should be avoided.  First
attempts to solve the problem correctly looked like this:
5997
5998 @smallexample
5999    if (n == 1)
6000      printf ("%d file deleted", n);
6001    else
6002      printf ("%d files deleted", n);
6003 @end smallexample
6004
6005 But this does not solve the problem.  It helps languages where the
6006 plural form of a noun is not simply constructed by adding an
6007 @ifhtml
6008 ‘s’
6009 @end ifhtml
6010 @ifnothtml
6011 `s'
6012 @end ifnothtml
but that is all it does.  Once again people fell into the trap of believing
that the rules of their own language are universal.  But the handling of plural
6015 forms differs widely between the language families.  For example,
6016 Rafal Maszkowski @code{<rzm@@mat.uni.torun.pl>} reports:
6017
6018 @quotation
6019 In Polish we use e.g.@: plik (file) this way:
6020 @example
6021 1 plik
6022 2,3,4 pliki
6023 5-21 pliko'w
6024 22-24 pliki
6025 25-31 pliko'w
6026 @end example
6027 and so on (o' means 8859-2 oacute which should be rather okreska,
6028 similar to aogonek).
6029 @end quotation
6030
There are two things which can differ between languages (and even inside
language families):
6033
6034 @itemize @bullet
6035 @item
The way plural forms are built differs.  This is a problem with
languages which have many irregularities.  German, for instance, is a
drastic case.  Though English and German are part of the same language
family (Germanic), the almost regular formation of plural noun forms
6040 (appending an
6041 @ifhtml
6042 ‘s’)
6043 @end ifhtml
6044 @ifnothtml
6045 `s')
6046 @end ifnothtml
6047 is hardly found in German.
6048
6049 @item
The number of plural forms differs.  This is somewhat surprising for
those who only have experience with Romance and Germanic languages,
since there the number is the same (there are two).

But other language families have only one form or many forms.  More
information on this can be found in a separate section.
6056 @end itemize
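
To make the Polish example concrete, the pattern quoted above can be
written as a small C function that maps a count to one of three forms,
which shows why a simple singular/plural choice is not enough.  The
condition is the rule commonly used for Polish message catalogs; the
function name @code{polish_plural_index} is hypothetical.

@example
/* Map N to the index of the Polish plural form:
   0 for 1 (plik), 1 for 2-4, 22-24, 32-34, ... (pliki),
   2 for everything else, e.g. 0, 5-21, 25-31 (plików).  */
static unsigned int
polish_plural_index (unsigned long int n)
@{
  if (n == 1)
    return 0;
  if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20))
    return 1;
  return 2;
@}
@end example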
6057
6058 The consequence of this is that application writers should not try to
6059 solve the problem in their code.  This would be localization since it is
6060 only usable for certain, hardcoded language environments.  Instead the
6061 extended @code{gettext} interface should be used.
6062
Instead of a single key string, these extra functions take two strings
and a numerical argument.  The idea is that, using the numerical
argument and the first string as a key, the implementation can select
the right plural form according to rules specified by the translator.
The two string arguments are used to provide a return value in case no
message catalog is found (similar to the normal @code{gettext}
behavior).  In this case the rules for Germanic languages are used: the
first string argument is assumed to be the singular form and the second
the plural form.
6072
This has the consequence that programs without language catalogs can
display the correct strings only if the program itself is written using
a Germanic language.  This is a limitation, but since the GNU C library
(as well as the GNU @code{gettext} package) is written as part of the
GNU project, and the coding standards for the GNU project require
programs to be written in English, this solution nevertheless fulfills
its purpose.
6080
6081 @deftypefun {char *} ngettext (const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
The @code{ngettext} function is similar to the @code{gettext} function
in that it finds the message catalogs in the same way.  But it takes two
extra arguments.  The @var{msgid1} parameter must contain the singular
form of the string to be converted.  It is also used as the key for the
search in the catalog.  The @var{msgid2} parameter is the plural form.
The parameter @var{n} is used to determine the plural form.  If no
message catalog is found, @var{msgid1} is returned if @code{n == 1},
otherwise @var{msgid2}.
6090
6091 An example for the use of this function is:
6092
6093 @smallexample
6094 printf (ngettext ("%d file removed", "%d files removed", n), n);
6095 @end smallexample
6096
6097 Please note that the numeric value @var{n} has to be passed to the
6098 @code{printf} function as well.  It is not sufficient to pass it only to
6099 @code{ngettext}.
6100
In the English singular case, the number (always 1) can be replaced with
``one'':
6103
6104 @smallexample
6105 printf (ngettext ("One file removed", "%d files removed", n), n);
6106 @end smallexample
6107
6108 @noindent
6109 This works because the @samp{printf} function discards excess arguments that
6110 are not consumed by the format string.
6111
6112 If this function is meant to yield a format string that takes two or more
arguments, you cannot use it like this:
6114
6115 @smallexample
6116 printf (ngettext ("%d file removed from directory %s",
6117                   "%d files removed from directory %s",
6118                   n),
6119         n, dir);
6120 @end smallexample
6121
6122 @noindent
6123 because in many languages the translators want to replace the @samp{%d}
6124 with an explicit word in the singular case, just like ``one'' in English,
6125 and C format strings cannot consume the second argument but skip the first
6126 argument.  Instead, you have to reorder the arguments so that @samp{n}
6127 comes last:
6128
6129 @smallexample
printf (ngettext ("%2$d file removed from directory %1$s",
                  "%2$d files removed from directory %1$s",
6132                   n),
6133         dir, n);
6134 @end smallexample
6135
6136 @noindent
6137 See @ref{c-format} for details about this argument reordering syntax.
6138
6139 When you know that the value of @code{n} is within a given range, you can
6140 specify it as a comment directed to the @code{xgettext} tool.  This
6141 information may help translators to use more adequate translations.  Like
6142 this:
6143
6144 @smallexample
6145 if (days > 7 && days < 14)
6146   /* xgettext: range: 1..6 */
6147   printf (ngettext ("one week and one day", "one week and %d days",
6148                     days - 7),
6149           days - 7);
6150 @end smallexample
6151
6152 It is also possible to use this function when the strings don't contain a
6153 cardinal number:
6154
6155 @smallexample
6156 puts (ngettext ("Delete the selected file?",
6157                 "Delete the selected files?",
6158                 n));
6159 @end smallexample
6160
6161 In this case the number @var{n} is only used to choose the plural form.
6162 @end deftypefun
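
As a rough illustration of the fallback rule and of the argument order
discussed above, the following sketch models @code{ngettext} without
any message catalog.  It is an assumption-laden stand-in, not the
library's code: @code{fallback_ngettext} and @code{format_removed} are
hypothetical names, and the numbered conversions (@samp{%1$s},
@samp{%2$lu}) are a POSIX extension of ISO C @code{printf}.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for ngettext when no message catalog is found:
   msgid1 for n == 1, msgid2 for every other n (including 0).  */
static const char *
fallback_ngettext (const char *msgid1, const char *msgid2, unsigned long int n)
{
  assert (msgid1 != NULL && msgid2 != NULL);
  return n == 1 ? msgid1 : msgid2;
}

/* Hypothetical caller following the advice above: the count is passed
   last, so the singular format may spell out "One" and leave it unused.  */
static int
format_removed (char *buf, size_t size, const char *dir, unsigned long int n)
{
  const char *fmt =
    fallback_ngettext ("One file removed from directory %1$s",
                       "%2$lu files removed from directory %1$s", n);
  /* Numbered conversions (%1$s, %2$lu) are POSIX, not ISO C.  */
  return snprintf (buf, size, fmt, dir, n);
}
```

With @code{n == 0} the sketch yields @samp{0 files removed from
directory /tmp}, i.e.@: zero takes the plural form, as in English.  The
singular format may ignore the count only because it is the last
argument.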
6163
6164 @deftypefun {char *} dngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
The @code{dngettext} function is similar to the @code{dgettext} function
in the way the message catalog is selected.  The difference is that it
takes two extra parameters to provide the correct plural form.  These two
parameters are handled in the same way @code{ngettext} handles them.
6169 @end deftypefun
6170
6171 @deftypefun {char *} dcngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}, int @var{category})
The @code{dcngettext} function is similar to the @code{dcgettext}
function in the way the message catalog is selected.  The difference is
that it takes two extra parameters to provide the correct plural form.
These two parameters are handled in the same way @code{ngettext} handles
them.
6176 @end deftypefun
6177
6178 Now, how do these functions solve the problem of the plural forms?
Without the input of linguists (which was not available) it was not
possible to determine whether there are only a few different ways in
which plural forms are built or whether the number can increase with
every new supported language.

Therefore the solution implemented is to allow the translator to specify
the rules for selecting the plural form.  Since the formula varies
with every language, this is the only viable solution except for
hardcoding the information in the code (which would still require some
means of extension so as not to prevent the use of new languages).
6189
6190 @cindex specifying plural form in a PO file
6191 @kwindex nplurals@r{, in a PO file header}
6192 @kwindex plural@r{, in a PO file header}
6193 The information about the plural form selection has to be stored in the
6194 header entry of the PO file (the one with the empty @code{msgid} string).
6195 The plural form information looks like this:
6196
6197 @smallexample
6198 Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
6199 @end smallexample
6200
6201 The @code{nplurals} value must be a decimal number which specifies how
6202 many different plural forms exist for this language.  The string
following @code{plural} is an expression which uses C language
syntax.  Exceptions are that no negative numbers are allowed, numbers
must be decimal, and the only variable allowed is @code{n}.  Spaces are
allowed in the expression, but backslash-newlines are not; in the
examples below the backslash-newlines are present for formatting purposes
only.  This expression will be evaluated whenever one of the functions
@code{ngettext}, @code{dngettext}, or @code{dcngettext} is called.  The
numeric value passed to these functions is then substituted for all uses
of the variable @code{n} in the expression.  The resulting value then
must be greater than or equal to zero and smaller than the value given
as the value of @code{nplurals}.
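
To make this rule concrete, here is a sketch in which the header's
expression @samp{n == 1 ? 0 : 1} is written directly as a C function.
It is only an illustration: the library parses the expression from the
PO file header at run time, whereas here it is compiled in.  The names
@code{plural_en}, @code{plural_in_range} and @code{select_form} are
hypothetical, and falling back to form 0 on an out-of-range index is a
choice made by this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* The expression "n == 1 ? 0 : 1" from the header entry, as C code.  */
static unsigned long int
plural_en (unsigned long int n)
{
  return n == 1 ? 0 : 1;
}

/* Hypothetical check of the rule: for n = 0 .. limit the result must be
   smaller than nplurals.  The result is unsigned, so ">= 0" always holds.  */
static int
plural_in_range (unsigned long int (*plural) (unsigned long int),
                 unsigned long int nplurals, unsigned long int limit)
{
  assert (nplurals > 0);
  for (unsigned long int n = 0; n <= limit; n++)
    if (plural (n) >= nplurals)
      return 0;
  return 1;
}

/* Hypothetical selection step: the index picks one of the translated
   forms; an out-of-range index falls back to form 0 in this sketch.  */
static const char *
select_form (const char *const forms[], unsigned long int nplurals,
             unsigned long int (*plural) (unsigned long int),
             unsigned long int n)
{
  unsigned long int index = plural (n);
  return forms[index < nplurals ? index : 0];
}
```

With @code{nplurals=2} the English expression stays in range, whereas
declaring @code{nplurals=1} for the same expression would violate the
rule for every @var{n} other than 1.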
6214
6215 @noindent
6216 @cindex plural form formulas
The following rules are known at this point.  The languages are listed
together with their families.  But this does not necessarily mean the
information can be generalized for the whole family (as can easily be
seen in the table
6220 below).@footnote{Additions are welcome.  Send appropriate information to
6221 @email{bug-gnu-gettext@@gnu.org} and @email{bug-glibc-manual@@gnu.org}.}
6222
6223 @table @asis
6224 @item Only one form:
6225 Some languages only require one single form.  There is no distinction
6226 between the singular and plural form.  An appropriate header entry
6227 would look like this:
6228
6229 @smallexample
6230 Plural-Forms: nplurals=1; plural=0;
6231 @end smallexample
6232
6233 @noindent
6234 Languages with this property include:
6235
6236 @table @asis
6237 @item Asian family
6238 Japanese, @c   122.1 million speakers
6239 Vietnamese, @c  68.6 million speakers
6240 Korean @c       66.3 million speakers
6241 @end table
6242
6243 @item Two forms, singular used for one only
This is the form used in most existing programs since it is what English
uses.  A header entry would look like this:
6246
6247 @smallexample
6248 Plural-Forms: nplurals=2; plural=n != 1;
6249 @end smallexample
6250
(Note: this uses the feature of C expressions that boolean expressions
have the value zero or one.)
6253
6254 @noindent
6255 Languages with this property include:
6256
6257 @table @asis
6258 @item Germanic family
6259 English, @c    328.0 million speakers
6260 German, @c      96.9 million speakers
6261 Dutch, @c       21.7 million speakers
6262 Swedish, @c      8.3 million speakers
6263 Danish, @c       5.6 million speakers
6264 Norwegian, @c    4.6 million speakers
6265 Faroese @c       0.05 million speakers
6266 @item Romanic family
6267 Spanish, @c    328.5 million speakers
6268 Portuguese, @c 178.0 million speakers - 163 million Brazilian Portuguese
Italian @c      61.7 million speakers
@item Slavic family
Bulgarian @c     9.1 million speakers
6271 @item Latin/Greek family
6272 Greek @c        13.1 million speakers
6273 @item Finno-Ugric family
6274 Finnish, @c      5.0 million speakers
6275 Estonian @c      1.0 million speakers
6276 @item Semitic family
6277 Hebrew @c        5.3 million speakers
6278 @item Artificial
6279 Esperanto @c       2 million speakers
6280 @end table
6281
6282 @noindent
6283 Other languages using the same header entry are:
6284
6285 @table @asis
6286 @item Finno-Ugric family
6287 Hungarian @c   12.5 million speakers
6288 @item Turkic/Altaic family
6289 Turkish @c     50.8 million speakers
6290 @end table
6291
6292 Hungarian does not appear to have a plural if you look at sentences involving
6293 cardinal numbers.  For example, ``1 apple'' is ``1 alma'', and ``123 apples'' is
6294 ``123 alma''.  But when the number is not explicit, the distinction between
6295 singular and plural exists: ``the apple'' is ``az alma'', and ``the apples'' is
6296 ``az alm@'{a}k''.  Since @code{ngettext} has to support both types of sentences,
6297 it is classified here, under ``two forms''.
6298
6299 The same holds for Turkish: ``1 apple'' is ``1 elma'', and ``123 apples'' is
6300 ``123 elma''.  But when the number is omitted, the distinction between singular
6301 and plural exists: ``the apple'' is ``elma'', and ``the apples'' is
6302 ``elmalar''.
6303
6304 @item Two forms, singular used for zero and one
6305 Exceptional case in the language family.  The header entry would be:
6306
6307 @smallexample
6308 Plural-Forms: nplurals=2; plural=n>1;
6309 @end smallexample
6310
6311 @noindent
6312 Languages with this property include:
6313
6314 @table @asis
6315 @item Romanic family
6316 Brazilian Portuguese, @c 163 million speakers
6317 French @c                 67.8 million speakers
6318 @end table
6319
6320 @item Three forms, special case for zero
6321 The header entry would be:
6322
6323 @smallexample
6324 Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;
6325 @end smallexample
6326
6327 @noindent
6328 Languages with this property include:
6329
6330 @table @asis
6331 @item Baltic family
6332 Latvian @c     1.5 million speakers
6333 @end table
6334
6335 @item Three forms, special cases for one and two
6336 The header entry would be:
6337
6338 @smallexample
6339 Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
6340 @end smallexample
6341
6342 @noindent
6343 Languages with this property include:
6344
6345 @table @asis
6346 @item Celtic
6347 Gaeilge (Irish) @c 0.4 million speakers
6348 @end table
6349
6350 @item Three forms, special case for numbers ending in 00 or [2-9][0-9]
6351 The header entry would be:
6352
6353 @smallexample
6354 Plural-Forms: nplurals=3; \
6355     plural=n==1 ? 0 : (n==0 || (n%100 > 0 && n%100 < 20)) ? 1 : 2;
6356 @end smallexample
6357
6358 @noindent
6359 Languages with this property include:
6360
6361 @table @asis
6362 @item Romanic family
6363 Romanian @c    23.4 million speakers
6364 @end table
6365
6366 @item Three forms, special case for numbers ending in 1[2-9]
6367 The header entry would look like this:
6368
6369 @smallexample
6370 Plural-Forms: nplurals=3; \
6371     plural=n%10==1 && n%100!=11 ? 0 : \
6372            n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;
6373 @end smallexample
6374
6375 @noindent
6376 Languages with this property include:
6377
6378 @table @asis
6379 @item Baltic family
6380 Lithuanian @c  3.2 million speakers
6381 @end table
6382
6383 @item Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]
6384 The header entry would look like this:
6385
6386 @smallexample
6387 Plural-Forms: nplurals=3; \
6388     plural=n%10==1 && n%100!=11 ? 0 : \
6389            n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
6390 @end smallexample
6391
6392 @noindent
6393 Languages with this property include:
6394
6395 @table @asis
6396 @item Slavic family
6397 Russian, @c    143.6 million speakers
6398 Ukrainian, @c   37.0 million speakers
6399 Belarusian, @c   8.6 million speakers
6400 Serbian, @c      7.0 million speakers
6401 Croatian @c      5.5 million speakers
6402 @end table
6403
6404 @item Three forms, special cases for 1 and 2, 3, 4
6405 The header entry would look like this:
6406
6407 @smallexample
6408 Plural-Forms: nplurals=3; \
6409     plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2;
6410 @end smallexample
6411
6412 @noindent
6413 Languages with this property include:
6414
6415 @table @asis
6416 @item Slavic family
6417 Czech, @c      9.5 million speakers
6418 Slovak @c      5.0 million speakers
6419 @end table
6420
6421 @item Three forms, special case for one and some numbers ending in 2, 3, or 4
6422 The header entry would look like this:
6423
6424 @smallexample
6425 Plural-Forms: nplurals=3; \
6426     plural=n==1 ? 0 : \
6427            n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
6428 @end smallexample
6429
6430 @noindent
6431 Languages with this property include:
6432
6433 @table @asis
6434 @item Slavic family
6435 Polish @c      40.0 million speakers
6436 @end table
6437
6438 @item Four forms, special case for one and all numbers ending in 02, 03, or 04
6439 The header entry would look like this:
6440
6441 @smallexample
6442 Plural-Forms: nplurals=4; \
6443     plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
6444 @end smallexample
6445
6446 @noindent
6447 Languages with this property include:
6448
6449 @table @asis
6450 @item Slavic family
6451 Slovenian @c   1.9 million speakers
6452 @end table
6453 @end table
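
Several of these formulas can be checked against the Polish example
quoted above (1 plik, 2--4 pliki, 5--21 plików, 22--24 pliki, and so
on).  The following sketch transcribes the Polish, Slovenian and Latvian
expressions from the table into C functions of @var{n}; the function
names are hypothetical, and @code{polish_file} simply uses the result to
index the three forms of ``plik''.

```c
#include <assert.h>

/* Polish: one; numbers ending in 2-4 except 12-14; everything else.  */
static unsigned long int
plural_pl (unsigned long int n)
{
  return n == 1 ? 0
         : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1
         : 2;
}

/* Slovenian: chosen by the last two digits only.  */
static unsigned long int
plural_sl (unsigned long int n)
{
  return n % 100 == 1 ? 0 : n % 100 == 2 ? 1
         : n % 100 == 3 || n % 100 == 4 ? 2 : 3;
}

/* Latvian: special case for zero.  */
static unsigned long int
plural_lv (unsigned long int n)
{
  return n % 10 == 1 && n % 100 != 11 ? 0 : n != 0 ? 1 : 2;
}

/* Hypothetical helper: the Polish word for "file", selected by plural_pl.  */
static const char *
polish_file (unsigned long int n)
{
  static const char *const forms[3] = { "plik", "pliki", "plików" };
  unsigned long int i = plural_pl (n);
  assert (i < 3);
  return forms[i];
}
```

For example, @code{plural_pl (22)} and @code{plural_pl (2)} both yield 1
(pliki), while @code{plural_pl (12)} yields 2 (plików), matching the
quotation.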
6454
You might now ask: @code{ngettext} handles only numbers @var{n} of type
@samp{unsigned long}, so what about larger integer types?  What about
negative numbers?  What about floating-point numbers?
6458
6459 About larger integer types, such as @samp{uintmax_t} or
6460 @samp{unsigned long long}: they can be handled by reducing the value to a
6461 range that fits in an @samp{unsigned long}.  Simply casting the value to
6462 @samp{unsigned long} would not do the right thing, since it would treat
6463 @code{ULONG_MAX + 1} like zero, @code{ULONG_MAX + 2} like singular, and
6464 the like.  Here you can exploit the fact that all mentioned plural form
6465 formulas eventually become periodic, with a period that is a divisor of 100
6466 (or 1000 or 1000000).  So, when you reduce a large value to another one in
6467 the range [1000000, 1999999] that ends in the same 6 decimal digits, you
6468 can assume that it will lead to the same plural form selection.  This code
6469 does this:
6470
6471 @smallexample
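#include <inttypes.h>   /* uintmax_t, PRIuMAX */
#include <libintl.h>    /* ngettext */
#include <limits.h>     /* ULONG_MAX */
#include <stdio.h>      /* printf */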
6473 uintmax_t nbytes = ...;
6474 printf (ngettext ("The file has %"PRIuMAX" byte.",
6475                   "The file has %"PRIuMAX" bytes.",
6476                   (nbytes > ULONG_MAX
6477                    ? (nbytes % 1000000) + 1000000
6478                    : nbytes)),
6479         nbytes);
6480 @end smallexample
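
The claim that the formulas eventually become periodic can be checked
with a small sketch.  It uses the Romanian formula from the table, which
has special cases for 0 and 1 and is therefore only eventually periodic;
the helper names are hypothetical, and @code{ngettext_count} merely
reproduces the argument expression of the example above.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Romanian formula from the table, evaluated on uintmax_t so that it can
   also be applied to values beyond ULONG_MAX.  */
static unsigned int
plural_ro (uintmax_t n)
{
  return n == 1 ? 0 : (n == 0 || (n % 100 > 0 && n % 100 < 20)) ? 1 : 2;
}

/* The value in [1000000, 1999999] with the same last six decimal digits.  */
static uintmax_t
reduce_to_million_range (uintmax_t n)
{
  return n % 1000000 + 1000000;
}

/* The argument to pass to ngettext: unchanged when it fits in an
   unsigned long, reduced otherwise.  */
static unsigned long int
ngettext_count (uintmax_t n)
{
  uintmax_t r = n > ULONG_MAX ? reduce_to_million_range (n) : n;
  assert (r <= ULONG_MAX);  /* so the conversion below loses nothing */
  return (unsigned long int) r;
}
```

Note that @code{plural_ro (1)} is 0 but
@code{plural_ro (reduce_to_million_range (1))} is 1: the reduction is
only safe for large values, which is why the example applies it only
above @code{ULONG_MAX}, where @var{n} can be neither 0 nor 1.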
6481
6482 Negative and floating-point values usually represent physical entities for
6483 which singular and plural don't clearly apply.  In such cases, there is no
6484 need to use @code{ngettext}; a simple @code{gettext} call with a form suitable
6485 for all values will do.  For example:
6486
6487 @smallexample
6488 printf (gettext ("Time elapsed: %.3f seconds"),
6489         num_milliseconds * 0.001);
6490 @end smallexample
6491
6492 @noindent
6493 Even if @var{num_milliseconds} happens to be a multiple of 1000, the output
6494 @smallexample
6495 Time elapsed: 1.000 seconds
6496 @end smallexample
6497 @noindent
6498 is acceptable in English, and similarly for other languages.
6499
6500 The translators' perspective regarding plural forms is explained in
6501 @ref{Translating plural forms}.
6502
6503 @node Optimized gettext,  , Plural forms, gettext
6504 @subsection Optimization of the *gettext functions
6505 @cindex optimization of @code{gettext} functions
6506
6507 At this point of the discussion we should talk about an advantage of the
6508 GNU @code{gettext} implementation.  Some readers might have pointed out
that an internationalized program might have poor performance if some
string has to be translated in an inner loop.  While this is unavoidable
when the string varies from one iteration of the loop to the next, it is
simply a waste of time when the string is always the same.  Take the
6513 following example:
6514
6515 @example
6516 @group
6517 @{
6518   while (@dots{})
6519     @{
6520       puts (gettext ("Hello world"));
6521     @}
6522 @}
6523 @end group
6524 @end example
6525
6526 @noindent
When the locale selection does not change between two iterations, the
resulting string is always the same.  One way to use this is:
6529
6530 @example
6531 @group
6532 @{
6533   str = gettext ("Hello world");
6534   while (@dots{})
6535     @{
6536       puts (str);
6537     @}
6538 @}
6539 @end group
6540 @end example
6541
6542 @noindent
But this solution is not usable in all situations (e.g.@: when the locale
6544 selection changes) nor does it lead to legible code.
6545
6546 For this reason, GNU @code{gettext} caches previous translation results.
6547 When the same translation is requested twice, with no new message
6548 catalogs being loaded in between, @code{gettext} will, the second time,
6549 find the result through a single cache lookup.
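
A minimal sketch of such a cache follows.  It is not the GNU
@code{gettext} code: all names are hypothetical, the catalog search is a
stub that returns @var{msgid} unchanged, keys are compared by content but
stored by pointer (assuming they are string literals), and a generation
counter models the condition ``no new message catalogs loaded in
between''.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counter, bumped whenever a new message catalog is loaded.  */
static int catalog_generation;

/* Counts how often the expensive path runs, so the effect is observable.  */
static int slow_lookups;

/* Hypothetical stand-in for the real catalog search.  */
static const char *
slow_lookup (const char *msgid)
{
  ++slow_lookups;
  return msgid;
}

#define CACHE_SLOTS 64

struct cache_entry
{
  const char *msgid;   /* key; assumed to outlive the cache */
  const char *result;  /* cached translation */
  int generation;      /* catalog_generation when the entry was filled */
};

static struct cache_entry cache[CACHE_SLOTS];

static size_t
hash_string (const char *s)
{
  size_t h = 5381;
  while (*s != '\0')
    h = h * 33 + (unsigned char) *s++;
  return h;
}

static const char *
cached_gettext (const char *msgid)
{
  assert (msgid != NULL);
  struct cache_entry *e = &cache[hash_string (msgid) % CACHE_SLOTS];
  if (e->msgid != NULL && e->generation == catalog_generation
      && strcmp (e->msgid, msgid) == 0)
    return e->result;           /* hit: no catalog search */
  e->msgid = msgid;             /* miss or stale entry: search and remember */
  e->result = slow_lookup (msgid);
  e->generation = catalog_generation;
  return e->result;
}
```

Calling @code{cached_gettext ("Hello world")} twice performs one catalog
search; incrementing @code{catalog_generation} makes the next call search
again.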
6550
6551 @node Comparison, Using libintl.a, gettext, Programmers
6552 @section Comparing the Two Interfaces
6553 @cindex @code{gettext} vs @code{catgets}
6554 @cindex comparison of interfaces
6555
6556 @c FIXME: arguments to catgets vs. gettext
6557 @c Partly done 950718 -- drepper
6558
6559 The following discussion is perhaps a little bit colored.  As said
6560 above we implemented GNU @code{gettext} following the Uniforum
6561 proposal and this surely has its reasons.  But it should show how we
6562 came to this decision.
6563
First we take a look at the development process.  When we write an
application using NLS provided by @code{gettext}, we proceed as always.
Only when we come to a string which might be seen by the users, and thus
has to be translated, do we use @code{gettext("@dots{}")} instead of
@code{"@dots{}"}.  At the beginning of each source file (or in a central
header file) we define
6570
6571 @example
6572 #define gettext(String) (String)
6573 @end example
6574
6575 Even this definition can be avoided when the system supports the
6576 @code{gettext} function in its C library.  When we compile this code the
result is the same as if no NLS code were used.  When you take a look at
6578 the GNU @code{gettext} code you will see that we use @code{_("@dots{}")}
6579 instead of @code{gettext("@dots{}")}.  This reduces the number of
6580 additional characters per translatable string to @emph{3} (in words:
6581 three).
6582
When a production version of the program is needed, we simply replace
6584 the definition
6585
6586 @example
6587 #define _(String) (String)
6588 @end example
6589
6590 @noindent
6591 by
6592
6593 @cindex include file @file{libintl.h}
6594 @example
6595 #include <libintl.h>
6596 #define _(String) gettext (String)
6597 @end example
6598
6599 @noindent
Additionally we run the program @file{xgettext} on all source code files
which contain translatable strings and that's it: we have a running
program which does not depend on translations being available, but which
can use any that become available.
6604
6605 @cindex @code{N_}, a convenience macro
6606 The same procedure can be done for the @code{gettext_noop} invocations
6607 (@pxref{Special cases}).  One usually defines @code{gettext_noop} as a
6608 no-op macro.  So you should consider the following code for your project:
6609
6610 @example
6611 #define gettext_noop(String) String
6612 #define N_(String) gettext_noop (String)
6613 @end example
6614
6615 @code{N_} is a short form similar to @code{_}.  The @file{Makefile} in
6616 the @file{po/} directory of GNU @code{gettext} knows by default both of the
6617 mentioned short forms so you are invited to follow this proposal for
6618 your own ease.
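
The following sketch shows why the no-op macro is useful: @code{N_} can
mark strings in a static initializer, where calling @code{gettext} is not
allowed, and the translation is then looked up at the point of use.  It
uses the no-NLS definitions from above; @code{color_names} and
@code{color_name} are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Build without NLS: _ expands to the string itself.  In an NLS build,
   include <libintl.h> and define _(String) as gettext (String) instead,
   while N_ stays a no-op.  */
#define _(String) (String)
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)

/* A function call is not allowed in a static initializer, but N_ expands
   to a plain string literal.  The literals remain in the source, so
   xgettext can extract them when N_ is one of its keywords.  */
static const char *const color_names[] = { N_("red"), N_("green"), N_("blue") };

/* Translation of the marked strings is deferred to the point of use.  */
static const char *
color_name (size_t i)
{
  assert (i < sizeof color_names / sizeof color_names[0]);
  return _(color_names[i]);
}
```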
6619
6620 Now to @code{catgets}.  The main problem is the work for the
6621 programmer.  Every time he comes to a translatable string he has to
define a number (or a symbolic constant) which also has to be defined in
the message catalog file.  He also has to take care of duplicate
6624 entries, duplicate message IDs etc.  If he wants to have the same
6625 quality in the message catalog as the GNU @code{gettext} program
6626 provides he also has to put the descriptive comments for the strings and
6627 the location in all source code files in the message catalog.  This is
6628 nearly a Mission: Impossible.
6629
6630 But there are also some points people might call advantages speaking for
6631 @code{catgets}.  If you have a single word in a string and this string
6632 is used in different contexts it is likely that in one or the other
6633 language the word has different translations.  Example:
6634
6635 @example
printf ("%s: %d", gettext ("number"), number_of_errors);

printf ("you should see %d %s", number_count,
        number_count == 1 ? gettext ("number") : gettext ("numbers"));
6640 @end example
6641
Here the string @code{"number"} has to be translated twice.  Even if
you do not speak a language besides English, it might be possible to
recognize that the two words have a different meaning.  In German the
6645 first appearance has to be translated to @code{"Anzahl"} and the second
6646 to @code{"Zahl"}.
6647
6648 Now you can say that this example is really esoteric.  And you are
right!  This is exactly how we felt about this problem and decided that
it does not weigh that much.  The solution for the above problem could
6651 be very easy:
6652
6653 @example
printf ("%s %d", gettext ("number:"), number_of_errors);

printf (number_count == 1 ? gettext ("you should see %d number")
                          : gettext ("you should see %d numbers"),
        number_count);
6659 @end example
6660
We believe that we can solve all conflicts with this method.  If it is
difficult, one can also consider changing one of the conflicting strings
a little bit.  But it is not impossible to overcome.
6664
@code{catgets} allows the same original entry to have different translations,
6666 but @code{gettext} has another, scalable approach for solving ambiguities
6667 of this kind: @xref{Ambiguities}.
6668
6669 @node Using libintl.a, gettext grok, Comparison, Programmers
6670 @section Using libintl.a in own programs
6671
Starting with version 0.9.4 the library @code{libintl} (with its header
@file{libintl.h}) should be self-contained.  I.e., you can use it in your
own programs without
6674 providing additional functions.  The @file{Makefile} will put the header
6675 and the library in directories selected using the @code{$(prefix)}.
6676
6677 @node gettext grok, Temp Programmers, Using libintl.a, Programmers
6678 @section Being a @code{gettext} grok
6679
6680 @strong{ NOTE: } This documentation section is outdated and needs to be
6681 revised.
6682
6683 To fully exploit the functionality of the GNU @code{gettext} library it
6684 is surely helpful to read the source code.  But for those who don't want
6685 to spend that much time in reading the (sometimes complicated) code here
is a list of comments:
6687
6688 @itemize @bullet
6689 @item Changing the language at runtime
6690 @cindex language selection at runtime
6691
6692 For interactive programs it might be useful to offer a selection of the
used language at runtime.  To understand how to do this one needs to know
6694 how the used language is determined while executing the @code{gettext}
6695 function.  The method which is presented here only works correctly
6696 with the GNU implementation of the @code{gettext} functions.
6697
6698 In the function @code{dcgettext} at every call the current setting of
6699 the highest priority environment variable is determined and used.
6700 Highest priority means here the following list with decreasing
6701 priority:
6702
6703 @enumerate
6704 @vindex LANGUAGE@r{, environment variable}
6705 @item @code{LANGUAGE}
6706 @vindex LC_ALL@r{, environment variable}
6707 @item @code{LC_ALL}
6708 @vindex LC_CTYPE@r{, environment variable}
6709 @vindex LC_NUMERIC@r{, environment variable}
6710 @vindex LC_TIME@r{, environment variable}
6711 @vindex LC_COLLATE@r{, environment variable}
6712 @vindex LC_MONETARY@r{, environment variable}
6713 @vindex LC_MESSAGES@r{, environment variable}
6714 @item @code{LC_xxx}, according to selected locale category
6715 @vindex LANG@r{, environment variable}
6716 @item @code{LANG}
6717 @end enumerate
6718
6719 Afterwards the path is constructed using the found value and the
6720 translation file is loaded if available.
6721
6722 What happens now when the value for, say, @code{LANGUAGE} changes?  According
6723 to the process explained above the new value of this variable is found
6724 as soon as the @code{dcgettext} function is called.  But this also means
6725 the (perhaps) different message catalog file is loaded.  In other
6726 words: the used language is changed.
6727
But there is one little catch.  The code for gcc-2.7.0 and up provides
some optimization.  This optimization normally prevents the calling of
the @code{dcgettext} function as long as no new catalog is loaded.  But
if @code{dcgettext} is not called, the program also cannot notice that
the @code{LANGUAGE} variable has changed (@pxref{Optimized gettext}).  A
solution for this is very easy.  Include the following code in the
language switching function.
6735
6736 @example
6737   /* Change language.  */
6738   setenv ("LANGUAGE", "fr", 1);
6739
6740   /* Make change known.  */
6741   @{
6742     extern int  _nl_msg_cat_cntr;
6743     ++_nl_msg_cat_cntr;
6744   @}
6745 @end example
6746
6747 @cindex @code{_nl_msg_cat_cntr}
6748 The variable @code{_nl_msg_cat_cntr} is defined in @file{loadmsgcat.c}.
6749 You don't need to know what this is for.  But it can be used to detect
whether a @code{gettext} implementation is GNU gettext rather than a
non-GNU system's native gettext implementation.
6752
6753 @end itemize
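
The priority list for @code{dcgettext} given above can be modelled as
follows.  This is a simplified sketch, not the GNU implementation:
@code{message_locale_value} is a hypothetical name; treating empty values
as unset and returning @code{"C"} when nothing is set are assumptions of
the sketch; and refinements of the real code (for instance, that
@code{LANGUAGE} may contain a colon-separated list of languages) are
ignored.

```c
#define _POSIX_C_SOURCE 200809L  /* setenv () / unsetenv () for the usage below */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Return the first non-empty value among LANGUAGE, LC_ALL, the given
   category variable (e.g. "LC_MESSAGES") and LANG, or "C" if none is set.
   The environment is consulted on every call, so changes are seen at once.  */
static const char *
message_locale_value (const char *category_var)
{
  assert (category_var != NULL);
  const char *const names[] = { "LANGUAGE", "LC_ALL", category_var, "LANG" };
  for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
    {
      const char *value = getenv (names[i]);
      if (value != NULL && value[0] != '\0')
        return value;
    }
  return "C";
}
```

With @code{LANG=de_DE.UTF-8} and @code{LC_MESSAGES=fr_FR.UTF-8}, the
sketch returns the @code{LC_MESSAGES} value; setting @code{LC_ALL}, and
then a non-empty @code{LANGUAGE}, overrides it in turn.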
6754
6755 @node Temp Programmers,  , gettext grok, Programmers
6756 @section Temporary Notes for the Programmers Chapter
6757
6758 @strong{ NOTE: } This documentation section is outdated and needs to be
6759 revised.
6760
6761 @menu
6762 * Temp Implementations::        Temporary - Two Possible Implementations
6763 * Temp catgets::                Temporary - About @code{catgets}
6764 * Temp WSI::                    Temporary - Why a single implementation
6765 * Temp Notes::                  Temporary - Notes
6766 @end menu
6767
6768 @node Temp Implementations, Temp catgets, Temp Programmers, Temp Programmers
6769 @subsection Temporary - Two Possible Implementations
6770
6771 There are two competing methods for language independent messages:
6772 the X/Open @code{catgets} method, and the Uniforum @code{gettext}
6773 method.  The @code{catgets} method indexes messages by integers; the
6774 @code{gettext} method indexes them by their English translations.
6775 The @code{catgets} method has been around longer and is supported
6776 by more vendors.  The @code{gettext} method is supported by Sun,
6777 and it has been heard that the COSE multi-vendor initiative is
6778 supporting it.  Neither method is a POSIX standard; the POSIX.1
6779 committee had a lot of disagreement in this area.
6780
6781 Neither one is in the POSIX standard.  There was much disagreement
6782 in the POSIX.1 committee about using the @code{gettext} routines
6783 vs. @code{catgets} (XPG).  In the end the committee couldn't
6784 agree on anything, so no messaging system was included as part
6785 of the standard.  I believe the informative annex of the standard
6786 includes the XPG3 messaging interfaces, ``@dots{}as an example of
6787 a messaging system that has been implemented@dots{}''
6788
6789 They were very careful not to say anywhere that you should use one
6790 set of interfaces over the other.  For more on this topic please
6791 see the Programming for Internationalization FAQ.
6792
6793 @node Temp catgets, Temp WSI, Temp Implementations, Temp Programmers
6794 @subsection Temporary - About @code{catgets}
6795
6796 There have been a few discussions of late on the use of
6797 @code{catgets} as a base.  I think it important to present both
6798 sides of the argument and hence am opting to play devil's advocate
6799 for a little bit.
6800
6801 I'll not deny the fact that @code{catgets} could have been designed
6802 a lot better.  It currently has quite a number of limitations and
6803 these have already been pointed out.
6804
6805 However there is a great deal to be said for consistency and
6806 standardization.  A common recurring problem when writing Unix
6807 software is the myriad portability problems across Unix platforms.
6808 It seems as if every Unix vendor had a look at the operating system
and found parts they could improve upon.  Undoubtedly, these
modifications are innovative and solve real problems.
6811 However, software developers have a hard time keeping up with all
6812 these changes across so many platforms.
6813
6814 And this has prompted the Unix vendors to begin to standardize their
6815 systems.  Hence the impetus for Spec1170.  Every major Unix vendor
6816 has committed to supporting this standard and every Unix software
developer awaits with glee the day they can write software to this
6818 standard and simply recompile (without having to use autoconf)
6819 across different platforms.
6820
6821 As I understand it, Spec1170 is roughly based upon version 4 of the
X/Open Portability Guide (XPG4).  Because @code{catgets} and
6823 friends are defined in XPG4, I'm led to believe that @code{catgets}
6824 is a part of Spec1170 and hence will become a standardized component
6825 of all Unix systems.
6826
6827 @node Temp WSI, Temp Notes, Temp catgets, Temp Programmers
6828 @subsection Temporary - Why a single implementation
6829
6830 Now it seems kind of wasteful to me to have two different systems
6831 installed for accessing message catalogs.  If we do want to remedy
@code{catgets} deficiencies, why don't we try to expand @code{catgets}
(in a compatible manner) rather than implement an entirely new system?
6834 Otherwise, we'll end up with two message catalog access systems installed
6835 with an operating system - one set of routines for packages using GNU
6836 @code{gettext} for their internationalization, and another set of routines
6837 (catgets) for all other software.  Bloated?
6838
Suppose another catalog access system is implemented.  Which do
we recommend?  At least for Linux, we need to attract as many
software developers as possible.  Hence we need to make it as easy
as possible for them to port their software.  That means supporting
@code{catgets}.  We will be implementing the @code{libintl} code
within our @code{libc}, but does this mean we also have to incorporate
another message catalog access scheme within our @code{libc}?
And what about people who are going to be using the @code{libintl}
+ non-@code{catgets} routines?  When they port their software to
other platforms, they're now going to have to include the front-end
(@code{libintl}) code plus the back-end code (the non-@code{catgets}
access routines) with their software instead of just including the
@code{libintl} code with their software.
6852
6853 Message catalog support is however only the tip of the iceberg.
6854 What about the data for the other locale categories?  They also have
6855 a number of deficiencies.  Are we going to abandon them as well and
6856 develop another duplicate set of routines (should @code{libintl}
6857 expand beyond message catalog support)?
6858
As with many parts of Unix that can be improved upon, we're stuck with balancing
compatibility with the past against useful improvements and innovations for
the future.
6862
6863 @node Temp Notes,  , Temp WSI, Temp Programmers
6864 @subsection Temporary - Notes
6865
X/Open agreed very late on the standard form, so many
implementations differ from the final form.  Both of my systems (old
Linux @code{catgets} and Ultrix-4) have strange variations.
6869
OK.  After incorporating the last changes I have to spend some time on
making the GNU/Linux @code{libc} @code{gettext} functions.  So in the future
Solaris will not be the only system having @code{gettext}.
6873
6874 @node Translators, Maintainers, Programmers, Top
6875 @chapter The Translator's View
6876
6877 @c FIXME: Reorganize whole chapter.
6878
6879 @menu
6880 * Trans Intro 0::               Introduction 0
6881 * Trans Intro 1::               Introduction 1
6882 * Discussions::                 Discussions
6883 * Organization::                Organization
6884 * Information Flow::            Information Flow
6885 * Translating plural forms::    How to fill in @code{msgstr[0]}, @code{msgstr[1]}
6886 * Prioritizing messages::       How to find which messages to translate first
6887 @end menu
6888
6889 @node Trans Intro 0, Trans Intro 1, Translators, Translators
6890 @section Introduction 0
6891
6892 @strong{ NOTE: } This documentation section is outdated and needs to be
6893 revised.
6894
6895 Free software is going international!  The Translation Project is a way
6896 to get maintainers, translators and users all together, so free software
6897 will gradually become able to speak many native languages.
6898
6899 The GNU @code{gettext} tool set contains @emph{everything} maintainers
6900 need for internationalizing their packages for messages.  It also
contains quite useful tools for helping translators localize
messages into their native language, once a package has already been
6903 internationalized.
6904
To make the Translation Project succeed, we need many interested
6906 people who like their own language and write it well, and who are also
6907 able to synergize with other translators speaking the same language.
6908 If you'd like to volunteer to @emph{work} at translating messages,
6909 please send mail to your translating team.
6910
6911 Each team has its own mailing list, courtesy of Linux
6912 International.  You may reach your translating team at the address
6913 @file{@var{ll}@@li.org}, replacing @var{ll} by the two-letter @w{ISO 639}
6914 code for your language.  Language codes are @emph{not} the same as
6915 country codes given in @w{ISO 3166}.  The following translating teams
6916 exist:
6917
6918 @quotation
Chinese @code{zh}, Czech @code{cs}, Danish @code{da}, Dutch @code{nl},
Esperanto @code{eo}, Finnish @code{fi}, French @code{fr}, German
@code{de}, Greek @code{el}, Indonesian @code{in}, Irish @code{ga},
Italian @code{it}, Japanese @code{ja}, Norwegian @code{no}, Polish
@code{pl}, Portuguese @code{pt}, Russian @code{ru}, Spanish @code{es},
Swedish @code{sv} and Turkish @code{tr}.
6925 @end quotation
6926
6927 @noindent
6928 For example, you may reach the Chinese translating team by writing to
6929 @file{zh@@li.org}.  When you become a member of the translating team
6930 for your own language, you may subscribe to its list.  For example,
6931 Swedish people can send a message to @w{@file{sv-request@@li.org}},
6932 having this message body:
6933
6934 @example
6935 subscribe
6936 @end example
6937
6938 Keep in mind that team members should be interested in @emph{working}
6939 at translations, or at solving translational difficulties, rather than
6940 merely lurking around.  If your team does not exist yet and you want to
6941 start one, please write to @w{@file{coordinator@@translationproject.org}};
6942 you will then reach the coordinator for all translator teams.
6943
6944 A handful of GNU packages have already been adapted and provided
6945 with message translations for several languages.  Translation
6946 teams have begun to organize, using these packages as a starting
6947 point.  But there are many more packages and many languages for
6948 which we have no volunteer translators.  If you would like to
6949 volunteer to work at translating messages, please send mail to
6950 @file{coordinator@@translationproject.org} indicating what language(s)
6951 you can work on.
6952
6953 @node Trans Intro 1, Discussions, Trans Intro 0, Translators
6954 @section Introduction 1
6955
6956 @strong{ NOTE: } This documentation section is outdated and needs to be
6957 revised.
6958
This is now official: GNU is going international!  Here is the
6960 announcement submitted for the January 1995 GNU Bulletin:
6961
6962 @quotation
6963 A handful of GNU packages have already been adapted and provided
6964 with message translations for several languages.  Translation
6965 teams have begun to organize, using these packages as a starting
6966 point.  But there are many more packages and many languages
6967 for which we have no volunteer translators.  If you'd like to
6968 volunteer to work at translating messages, please send mail to
6969 @samp{coordinator@@translationproject.org} indicating what language(s)
6970 you can work on.
6971 @end quotation
6972
This document should answer many questions for those who are curious about
the process or would like to contribute.  Please at least skim over it;
we hope this will cut down a little the high volume of e-mail generated by
this collective effort towards internationalization of free software.
6977
6978 Most free programming which is widely shared is done in English, and
6979 currently, English is used as the main communicating language between
national communities collaborating on free software.  This very document
6981 is written in English.  This will not change in the foreseeable future.
6982
However, there is a strong appetite from national communities for
having more software able to communicate using national language and habits,
and there is an ongoing effort to modify free software in such a way
that it becomes able to do so.  The experiments conducted so far have drawn
an enthusiastic response from pretesters, so we believe that
internationalization of free software is bound to succeed.
6989
To suggest clarifications, additions or corrections to this
document, please send e-mail to @file{coordinator@@translationproject.org}.
6992
6993 @node Discussions, Organization, Trans Intro 1, Translators
6994 @section Discussions
6995
6996 @strong{ NOTE: } This documentation section is outdated and needs to be
6997 revised.
6998
6999 Facing this internationalization effort, a few users expressed their
concerns.  Some of these doubts are presented and discussed here.
7001
7002 @itemize @bullet
7003 @item Smaller groups
7004
Some languages are not spoken by a very large number of people, so people
speaking them sometimes consider that there may not be all that much
demand for such versions of free software packages.  Moreover, in some
countries, many people who are @emph{into computers} generally seem to
prefer English versions of their software.
7010
On the other hand, people might enjoy their own language a lot, and be
very motivated to give themselves the pleasure of having their
beloved free software speaking their mother tongue.  They do themselves
a personal favor, and do not pay that much attention to the number of
people benefiting from their work.
7016
7017 @item Misinterpretation
7018
Other users are reluctant to push forward their own language, seeing in this
some kind of misplaced propaganda.  Someone thought there must be some
users of the language on the networks pestering other people with it.
7022
7023 But any spoken language is worth localization, because there are
7024 people behind the language for whom the language is important and
7025 dear to their hearts.
7026
7027 @item Odd translations
7028
7029 The biggest problem is to find the right translations so that
7030 everybody can understand the messages.  Translations are usually a
7031 little odd.  Some people get used to English, to the extent they may
7032 find translations into their own language ``rather pushy, obnoxious
and sometimes even hilarious.''  As a French-speaking man, I have
experienced those instruction manuals for goods, so poorly
translated into French in Korea or Taiwan@dots{}
7036
7037 The fact is that we sometimes have to create a kind of national
7038 computer culture, and this is not easy without the collaboration of
7039 many people liking their mother tongue.  This is why translations are
7040 better achieved by people knowing and loving their own language, and
7041 ready to work together at improving the results they obtain.
7042
@item Dependencies on the GPL or LGPL
7044
7045 Some people wonder if using GNU @code{gettext} necessarily brings their
7046 package under the protective wing of the GNU General Public License or
7047 the GNU Lesser General Public License, when they do not want to make
7048 their program free, or want other kinds of freedom.  The simplest
7049 answer is ``normally not''.
7050
7051 The @code{gettext-runtime} part of GNU @code{gettext}, i.e.@: the
7052 contents of @code{libintl}, is covered by the GNU Lesser General Public
7053 License.  The @code{gettext-tools} part of GNU @code{gettext}, i.e.@: the
7054 rest of the GNU @code{gettext} package, is covered by the GNU General
7055 Public License.
7056
7057 The mere marking of localizable strings in a package, or conditional
7058 inclusion of a few lines for initialization, is not really including
7059 GPL'ed or LGPL'ed code.  However, since the localization routines in
7060 @code{libintl} are under the LGPL, the LGPL needs to be considered.
7061 It gives the right to distribute the complete unmodified source of
7062 @code{libintl} even with non-free programs.  It also gives the right
7063 to use @code{libintl} as a shared library, even for non-free programs.
7064 But it gives the right to use @code{libintl} as a static library or
7065 to incorporate @code{libintl} into another library only to free
7066 software.
7067
7068 @end itemize
7069
7070 @node Organization, Information Flow, Discussions, Translators
7071 @section Organization
7072
7073 @strong{ NOTE: } This documentation section is outdated and needs to be
7074 revised.
7075
On a larger scale, the true solution would be to organize some kind of
fairly precise setup in which volunteers could participate.  I gave
some thought to this idea lately, and realize there will be some
touchy points.  I thought of writing to Richard Stallman to launch
such a project, but feel it might be good to shake out the ideas
between ourselves first.  Most probably Linux International has
some experience in the field already, or maybe would like to orchestrate
the volunteer work.  Food for thought, in any case!
7084
I guess we have to set up something early, somehow, that will help
7086 many possible contributors of the same language to interlock and avoid
7087 work duplication, and further be put in contact for solving together
7088 problems particular to their tongue (in most languages, there are many
7089 difficulties peculiar to translating technical English).  My Swedish
7090 contributor acknowledged these difficulties, and I'm well aware of
7091 them for French.
7092
This is surely not a technical issue, but we should arrange things so that
the effort of locale contributors is maximally useful, despite the national
team layer interface between contributors and maintainers.
7096
7097 The Translation Project needs some setup for coordinating language
7098 coordinators.  Localizing evolving programs will surely
7099 become a permanent and continuous activity in the free software community,
7100 once well started.
7101 The setup should be minimally completed and tested before GNU
7102 @code{gettext} becomes an official reality.  The e-mail address
7103 @file{coordinator@@translationproject.org} has been set up for receiving
7104 offers from volunteers and general e-mail on these topics.  This address
7105 reaches the Translation Project coordinator.
7106
7107 @menu
7108 * Central Coordination::        Central Coordination
7109 * National Teams::              National Teams
7110 * Mailing Lists::               Mailing Lists
7111 @end menu
7112
7113 @node Central Coordination, National Teams, Organization, Organization
7114 @subsection Central Coordination
7115
I also think GNU will need, sooner than it thinks, someone to set up
a way to organize and coordinate these groups.  Some kind of group
of groups.  My opinion is that it would be good for GNU to delegate
this task to a small group of collaborating volunteers, shortly.
Perhaps a list of these national committees can be published in
@file{gnu.announce}.
7122
My role as coordinator would simply be to refer to Ulrich any German-speaking
volunteer interested in localization of free software packages, and
maybe to help national groups to initially organize, while maintaining
national registries until national groups are ready to take over.
In fact, the coordinator should help volunteers get in contact with
one another for creating national teams, which should then select
one coordinator per language, or country (regionalized language).
If well done, the coordination should be useful without being an
overwhelming task, for the time it takes to put delegations in place.
7132
7133 @node National Teams, Mailing Lists, Central Coordination, Organization
7134 @subsection National Teams
7135
7136 I suggest we look for volunteer coordinators/editors for individual
7137 languages.  These people will scan contributions of translation files
7138 for various programs, for their own languages, and will ensure high
7139 and uniform standards of diction.
7140
From my recent experience with other people, those who
7142 provide localizations are very enthusiastic about the process, and are
7143 more interested in the localization process than in the program they
7144 localize, and want to do many programs, not just one.  This seems
7145 to confirm that having a coordinator/editor for each language is a
7146 good idea.
7147
7148 We need to choose someone who is good at writing clear and concise
prose in the language in question.  That is hard, since we can't check
it ourselves.  So we need to ask a few people to judge each other's
writing and select the one who is best.
7152
I announced my prerelease to a few dozen people, and you would not
believe all the discussions it has generated already.  I shudder to think
what will happen when this is launched for real, officially,
worldwide.  Who am I to arbitrate between two Czechoslovak users
contradicting each other, for example?
7158
I assume that your German is not much better than my French, so
I would not be able to judge these formulations.  What I would
suggest is that for each language there is a group of people who
maintain the PO files and judge changes.  I suspect there will
be cultural differences between how such groups of people will behave.
Some will have relaxed ways, reach consensus easily, and have anyone
of the group relate to the maintainers, while others will fight to the
death, organize heavy administrations up to national standards, and
use strict channels.
7168
7169 The German team is putting out a good example.  Right now, they are
7170 maybe half a dozen people revising translations of each other and
7171 discussing the linguistic issues.  I do not even have all the names.
7172 Ulrich Drepper is taking care of coordinating the German team.
7173 He subscribed to all my pretest lists, so I do not even have to warn
7174 him specifically of incoming releases.
7175
I'm sure that it is a good idea to get teams for each language working
7177 on translations.  That will make the translations better and more
7178 consistent.
7179
7180 @menu
7181 * Sub-Cultures::                Sub-Cultures
7182 * Organizational Ideas::        Organizational Ideas
7183 @end menu
7184
7185 @node Sub-Cultures, Organizational Ideas, National Teams, National Teams
7186 @subsubsection Sub-Cultures
7187
7188 Taking French for example, there are a few sub-cultures around computers
7189 which developed diverging vocabularies.  Picking volunteers here and
7190 there without addressing this problem in an organized way, soon in the
7191 project, might produce a distasteful mix of internationalized programs,
7192 and possibly trigger endless quarrels among those who really care.
7193
7194 Keeping some kind of unity in the way French localization of
7195 internationalized programs is achieved is a difficult (and delicate) job.
Knowing the Latin character of French people (:-), if we take this
the wrong way, we could end up nowhere, or waste a lot of energy.
Maybe we should begin to address this problem seriously @emph{before}
GNU @code{gettext} becomes officially published.  And I suspect that this
means soon!
7201
7202 @node Organizational Ideas,  , Sub-Cultures, National Teams
7203 @subsubsection Organizational Ideas
7204
7205 I expect the next big changes after the official release.  Please note
7206 that I use the German translation of the short GPL message.  We need
to set a few good examples before the localization goes out for real
7208 in the free software community.  Here are a few points to discuss:
7209
7210 @itemize @bullet
7211 @item
7212 Each group should have one FTP server (at least one master).
7213
7214 @item
7215 The files on the server should reflect the latest version (of
7216 course!) and it should also contain a RCS directory with the
7217 corresponding archives (I don't have this now).
7218
7219 @item
7220 There should also be a ChangeLog file (this is more useful than the
7221 RCS archive but can be generated automatically from the later by
7222 Emacs).
7223
7224 @item
7225 A @dfn{core group} should judge about questionable changes (for now
7226 this group consists solely by me but I ask some others occasionally;
7227 this also seems to work).
7228
7229 @end itemize
7230
7231 @node Mailing Lists,  , National Teams, Organization
7232 @subsection Mailing Lists
7233
7234 If we get any inquiries about GNU @code{gettext}, send them on to:
7235
7236 @example
7237 @file{coordinator@@translationproject.org}
7238 @end example
7239
The @file{*-pretest} lists are quite useful to me; maybe the idea could
be generalized to many GNU and non-GNU packages.  But to each maintainer
his/her way!
7243
7244 Fran@,{c}ois, we have a mechanism in place here at
7245 @file{gnu.ai.mit.edu} to track teams, support mailing lists for
7246 them and log members.  We have a slight preference that you use it.
7247 If this is OK with you, I can get you clued in.
7248
7249 Things are changing!  A few years ago, when Daniel Fekete and I
7250 asked for a mailing list for GNU localization, nested at the FSF, we
were politely invited to organize it anywhere else, and so we did.
For communicating with my pretesters, I later made a handful of
mailing lists located at iro.umontreal.ca and administered by
7254 @code{majordomo}.  These lists have been @emph{very} dependable
7255 so far@dots{}
7256
I suspect that the German team will organize a mailing list of its own
located in Germany, and so forth for other countries.  But before they
organize for real, it could surely be useful to offer mailing lists
located at the FSF to each national team.  So yes, please explain to me
how I should proceed to create and handle them.
7262
7263 We should create temporary mailing lists, one per country, to help
7264 people organize.  Temporary, because once regrouped and structured, it
would be fair for the volunteers from each country to take @emph{their} list
back home and manage it as they want.  My feeling is that, in the long
7267 run, each team should run its own list, from within their country.
7268 There also should be some central list to which all teams could
7269 subscribe as they see fit, as long as each team is represented in it.
7270
7271 @node Information Flow, Translating plural forms, Organization, Translators
7272 @section Information Flow
7273
7274 @strong{ NOTE: } This documentation section is outdated and needs to be
7275 revised.
7276
There will surely be some discussion about these messages after the
packages are finally released.  If people now send you some proposals
for better messages, how do you proceed?  Jim, please note that
right now, as I put forward nearly a dozen localizable programs, I
7281 receive both the translations and the coordination concerns about them.
7282
7283 If I put one of my things to pretest, Ulrich receives the announcement
7284 and passes it on to the German team, who make last minute revisions.
7285 Then he submits the translation files to me @emph{as the maintainer}.
7286 For free packages I do not maintain, I would not even hear about it.
7287 This scheme could be made to work for the whole Translation Project,
7288 I think.  For security reasons, maybe Ulrich (national coordinators,
in fact) should update the central registry kept at the Translation Project
7290 (Jim, me, or Len's recruits) once in a while.
7291
7292 In December/January, I was aggressively ready to internationalize
7293 all of GNU, giving myself the duty of one small GNU package per week
7294 or so, taking many weeks or months for bigger packages.  But it does
7295 not work this way.  I first did all the things I'm responsible for.
7296 I've nothing against some missionary work on other maintainers, but
7297 I'm also losing a lot of energy over it---same debates over again.
7298
7299 And when the first localized packages are released we'll get a lot of
7300 responses about ugly translations :-).  Surely, and we need to have
7301 beforehand a fairly good idea about how to handle the information
7302 flow between the national teams and the package maintainers.
7303
7304 Please start saving somewhere a quick history of each PO file.  I know
7305 for sure that the file format will change, allowing for comments.
7306 It would be nice that each file has a kind of log, and references for
7307 those who want to submit comments or gripes, or otherwise contribute.
I sent a proposal for a fast and flexible format, but it has not yet
been accepted by the GNU deciders.  I'll tell you when I
have more information about this.
7311
7312 @node Translating plural forms, Prioritizing messages, Information Flow, Translators
7313 @section Translating plural forms
7314
7315 @cindex plural forms, translating
7316 Suppose you are translating a PO file, and it contains an entry like this:
7317
7318 @smallexample
7319 #, c-format
7320 msgid "One file removed"
7321 msgid_plural "%d files removed"
7322 msgstr[0] ""
7323 msgstr[1] ""
7324 @end smallexample
7325
7326 @noindent
7327 What does this mean? How do you fill it in?
7328
7329 Such an entry denotes a message with plural forms, that is, a message where
7330 the text depends on a cardinal number.  The general form of the message,
7331 in English, is the @code{msgid_plural} line.  The @code{msgid} line is the
7332 English singular form, that is, the form for when the number is equal to 1.
7333 More details about plural forms are explained in @ref{Plural forms}.
7334
7335 The first thing you need to look at is the @code{Plural-Forms} line in the
7336 header entry of the PO file.  It contains the number of plural forms and a
7337 formula.  If the PO file does not yet have such a line, you have to add it.
It depends only on the language into which you are translating.  You can
get this info by using the @code{msginit} command (@pxref{Creating}), which
contains a database of known plural formulas, or by asking other
members of your translation team.
7342
7343 Suppose the line looks as follows:
7344
7345 @smallexample
7346 "Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n"
7347 "%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
7348 @end smallexample
7349
7350 It's logically one line; recall that the PO file formatting is allowed to
7351 break long lines so that each physical line fits in 80 monospaced columns.
7352
7353 The value of @code{nplurals} here tells you that there are three plural
7354 forms.  The first thing you need to do is to ensure that the entry contains
7355 an @code{msgstr} line for each of the forms:
7356
7357 @smallexample
7358 #, c-format
7359 msgid "One file removed"
7360 msgid_plural "%d files removed"
7361 msgstr[0] ""
7362 msgstr[1] ""
7363 msgstr[2] ""
7364 @end smallexample
7365
7366 Then translate the @code{msgid_plural} line and fill it in into each
7367 @code{msgstr} line:
7368
7369 @smallexample
7370 #, c-format
7371 msgid "One file removed"
7372 msgid_plural "%d files removed"
7373 msgstr[0] "%d slika uklonjenih"
7374 msgstr[1] "%d slika uklonjenih"
7375 msgstr[2] "%d slika uklonjenih"
7376 @end smallexample
7377
7378 Now you can refine the translation so that it matches the plural form.
7379 According to the formula above, @code{msgstr[0]} is used when the number
7380 ends in 1 but does not end in 11; @code{msgstr[1]} is used when the number
7381 ends in 2, 3, 4, but not in 12, 13, 14; and @code{msgstr[2]} is used in
7382 all other cases.  With this knowledge, you can refine the translations:
7383
7384 @smallexample
7385 #, c-format
7386 msgid "One file removed"
7387 msgid_plural "%d files removed"
7388 msgstr[0] "%d slika je uklonjena"
7389 msgstr[1] "%d datoteke uklonjenih"
7390 msgstr[2] "%d slika uklonjenih"
7391 @end smallexample
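If you want to double-check which @code{msgstr} index the formula selects
for a particular number, you can let the shell evaluate it.  The
@code{plural} expression uses C operators (@code{%}, @code{==}, @code{&&},
@code{||}, @code{?:}) that POSIX shell arithmetic expansion also supports,
with the same precedence.  The following is a minimal sketch, assuming a
POSIX-compatible @command{sh}; the helper name @code{plural_index} is
hypothetical, and the expression is copied by hand from the
@code{Plural-Forms} line above, so it must be replaced if your language
uses a different formula.

```shell
#!/bin/sh
# Sketch: evaluate the Plural-Forms formula shown above for a given n.
# plural_index is a hypothetical helper name.  The arithmetic is the
# 'plural=' expression copied from the header; POSIX $(( )) evaluates it
# with C operator precedence.  The body runs in a subshell, so n stays local.
plural_index () (
  n=$1
  echo $(( n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2 ))
)

# Print the msgstr index chosen for a few sample numbers.
for n in 1 2 4 5 11 12 14 21 22 25 101 111; do
  echo "n=$n -> msgstr[$(plural_index "$n")]"
done
```

For these sample numbers it reports index 0 for 1, 21 and 101, index 1 for
2, 4 and 22, and index 2 for 5, 11, 12, 14, 25 and 111, which matches the
description of the formula given above.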
7392
You may have noticed that in the English singular form (@code{msgid}) the number
7394 placeholder could be omitted and replaced by the numeral word ``one''.
7395 Can you do this in your translation as well?
7396
7397 @smallexample
7398 msgstr[0] "jednom datotekom je uklonjen"
7399 @end smallexample
7400
7401 @noindent
7402 Well, it depends on whether @code{msgstr[0]} applies only to the number 1,
7403 or to other numbers as well.  If, according to the plural formula,
7404 @code{msgstr[0]} applies only to @code{n == 1}, then you can use the
7405 specialized translation without the number placeholder.  In our case,
7406 however, @code{msgstr[0]} also applies to the numbers 21, 31, 41, etc.,
7407 and therefore you cannot omit the placeholder.
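This check can also be done mechanically, by listing the numbers that
select @code{msgstr[0]}.  The following minimal sketch assumes a
POSIX-compatible @command{sh} and the @code{Plural-Forms} formula shown
above; the range 0 to 199 is an arbitrary choice, and the variable name
@code{zero_forms} is only illustrative.

```shell
#!/bin/sh
# Sketch: collect every n in 0..199 that the Plural-Forms formula above
# maps to msgstr[0].  The range is arbitrary; widen it if needed.
zero_forms=
n=0
while [ "$n" -le 199 ]; do
  index=$(( n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2 ))
  if [ "$index" -eq 0 ]; then
    zero_forms="$zero_forms $n"
  fi
  n=$(( n + 1 ))
done
echo "msgstr[0] is used for n =$zero_forms"
```

With this formula it prints 1, 21, 31, @dots{}, 91, 101, 121, @dots{}, 191,
that is, every number ending in 1 except those ending in 11.  Only if the
single number printed were 1 would the specialized translation without the
placeholder be safe.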
7408
7409 @node Prioritizing messages,  , Translating plural forms, Translators
7410 @section Prioritizing messages: How to determine which messages to translate first
7411
7412 A translator sometimes has only a limited amount of time per week to
7413 spend on a package, and some packages have quite large message catalogs
(over 1000 messages).  Therefore she wishes to translate first the messages
that are the most visible to the user, or that occur most frequently.
This section describes how to determine these ``most urgent'' messages.
It also applies to determining the ``next most urgent'' messages after the
message catalog has already been partially translated.
7419
In the first step, she uses the programs as a user would.  While she
does this, the GNU @code{gettext} library logs to a file the not yet
translated messages for which a translation was requested from the program.
7423
7424 In a second step, she uses the PO mode to translate precisely this set
7425 of messages.
7426
7427 @vindex GETTEXT_LOG_UNTRANSLATED@r{, environment variable}
Here are more details.  The GNU @code{libintl} library (but not the
corresponding functions in GNU @code{libc}) supports an environment variable
@code{GETTEXT_LOG_UNTRANSLATED}, whose value is the name of a log file.  The
GNU @code{libintl} library will log into this file the messages for which
@code{gettext()} and related functions couldn't find the translation.  If the
file doesn't exist, it will be created as needed.  On systems with GNU
@code{libc} a shared library @samp{preloadable_libintl.so} is provided that
can be used with the ELF @samp{LD_PRELOAD} mechanism.
7436
7437 So, in the first step, the translator uses these commands on systems with
7438 GNU @code{libc}:
7439
7440 @smallexample
7441 $ LD_PRELOAD=/usr/local/lib/preloadable_libintl.so
7442 $ export LD_PRELOAD
7443 $ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused
7444 $ export GETTEXT_LOG_UNTRANSLATED
7445 @end smallexample
7446
7447 @noindent
7448 and these commands on other systems:
7449
7450 @smallexample
7451 $ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused
7452 $ export GETTEXT_LOG_UNTRANSLATED
7453 @end smallexample
7454
7455 Then she uses and peruses the programs.  (It is a good and recommended
7456 practice to use the programs for which you provide translations: it
7457 gives you the needed context.)  When done, she removes the environment
7458 variables:
7459
7460 @smallexample
7461 $ unset LD_PRELOAD
7462 $ unset GETTEXT_LOG_UNTRANSLATED
7463 @end smallexample
7464
7465 The second step starts with removing duplicates:
7466
7467 @smallexample
7468 $ msguniq $HOME/gettextlogused > missing.po
7469 @end smallexample
7470
7471 The result is a PO file, but needs some preprocessing before a PO file editor
7472 can be used with it.  First, it is a multi-domain PO file, containing
7473 messages from many translation domains.  Second, it lacks all translator
7474 comments and source references.  Here is how to get a list of the affected
7475 translation domains:
7476
7477 @smallexample
7478 $ sed -n -e 's,^domain "\(.*\)"$,\1,p' < missing.po | sort | uniq
7479 @end smallexample
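To see what this pipeline extracts, you can run it on a small hand-made
file.  In the following sketch the file name @file{sample-missing.po} and
its messages are hypothetical; only the @code{domain} lines matter, because
they are the only lines the @code{sed} expression matches.  @code{sort -u}
is used here as a shorter equivalent of @code{sort | uniq}.

```shell
#!/bin/sh
# Hypothetical multi-domain sample: each group of messages is preceded by a
# 'domain "NAME"' line, which is the only thing the sed expression looks at.
cat > sample-missing.po <<'EOF'
domain "coreutils"
msgid "missing operand"
msgstr ""

domain "tar"
msgid "Cannot open"
msgstr ""

domain "coreutils"
msgid "invalid option"
msgstr ""
EOF

# Print each domain name once (sort -u behaves like sort | uniq here).
domains=$(sed -n -e 's,^domain "\(.*\)"$,\1,p' < sample-missing.po | sort -u)
echo "$domains"
```

It prints @code{coreutils} and @code{tar}, each on its own line and each
only once, even though @code{coreutils} appears twice in the sample.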
7480
7481 Then the translator can handle the domains one by one.  For simplicity,
let's use shell variables to denote the language, domain and source
7483 package.
7484
7485 @smallexample
7486 $ lang=nl             # your language
7487 $ domain=coreutils    # the name of the domain to be handled
7488 $ package=/usr/src/gnu/coreutils-4.5.4   # the package where it comes from
7489 @end smallexample
7490
She takes the latest copy of @file{$lang.po} from the Translation Project,
or from the package (in most cases, @file{$package/po/$lang.po}), or
creates a fresh one if she's the first translator (@pxref{Creating}), and
saves it as @file{$domain.$lang.po}.  She then uses the following commands
to mark the non-urgent messages as ``obsolete''.  (This doesn't mean that
these messages, translated and untranslated ones alike, will go away.  It
simply means that the PO file editor will ignore them in the following
editing session.)
7498
7499 @smallexample
7500 $ msggrep --domain=$domain missing.po | grep -v '^domain' \
7501   > $domain-missing.po
7502 $ msgattrib --set-obsolete --ignore-file $domain-missing.po $domain.$lang.po \
7503   > $domain.$lang-urgent.po
7504 @end smallexample
7505
Then she translates @file{$domain.$lang-urgent.po} by use of a PO file editor
(@pxref{Editing}).
(FIXME: I don't know whether @code{KBabel} and @code{gtranslator} also
preserve obsolete messages, as they should.)
Finally she restores the non-urgent messages (with their earlier
translations, for those which were already translated) through this command:
7512
7513 @smallexample
7514 $ msgmerge --no-fuzzy-matching $domain.$lang-urgent.po $package/po/$domain.pot \
7515   > $domain.$lang.po
7516 @end smallexample
7517
7518 Then she can submit @file{$domain.$lang.po} and proceed to the next domain.
7519
7520 @node Maintainers, Installers, Translators, Top
7521 @chapter The Maintainer's View
7522 @cindex package maintainer's view of @code{gettext}
7523
7524 The maintainer of a package has many responsibilities.  One of them
7525 is ensuring that the package will install easily on many platforms,
7526 and that the magic we described earlier (@pxref{Users}) will work
7527 for installers and end users.
7528
7529 Of course, there are many possible ways by which GNU @code{gettext}
7530 might be integrated in a distribution, and this chapter does not cover
7531 them in all generality.  Instead, it details one possible approach which
7532 is especially adequate for many free software distributions following GNU
7533 standards, or even better, Gnits standards, because GNU @code{gettext}
is purposely meant to help the internationalization of the whole GNU
project, and of as many other good free packages as possible.  So, the
7536 maintainer's view presented here presumes that the package already has
7537 a @file{configure.ac} file and uses GNU Autoconf.
7538
7539 Nevertheless, GNU @code{gettext} may surely be useful for free packages
7540 not following GNU standards and conventions, but the maintainers of such
7541 packages might have to show imagination and initiative in organizing
their distributions so that @code{gettext} works for them in all situations.
There are surely many such packages out there.
7544
7545 Even if @code{gettext} methods are now stabilizing, slight adjustments
7546 might be needed between successive @code{gettext} versions, so you
7547 should ideally revise this chapter in subsequent releases, looking
7548 for changes.
7549
7550 @menu
7551 * Flat and Non-Flat::           Flat or Non-Flat Directory Structures
7552 * Prerequisites::               Prerequisite Works
7553 * gettextize Invocation::       Invoking the @code{gettextize} Program
7554 * Adjusting Files::             Files You Must Create or Alter
7555 * autoconf macros::             Autoconf macros for use in @file{configure.ac}
7556 * CVS Issues::                  Integrating with CVS
7557 * Release Management::          Creating a Distribution Tarball
7558 @end menu
7559
7560 @node Flat and Non-Flat, Prerequisites, Maintainers, Maintainers
7561 @section Flat or Non-Flat Directory Structures
7562
7563 Some free software packages are distributed as @code{tar} files which unpack
7564 in a single directory, these are said to be @dfn{flat} distributions.
7565 Other free software packages have a one level hierarchy of subdirectories, using
7566 for example a subdirectory named @file{doc/} for the Texinfo manual and
7567 man pages, another called @file{lib/} for holding functions meant to
7568 replace or complement C libraries, and a subdirectory @file{src/} for
7569 holding the proper sources for the package.  These other distributions
7570 are said to be @dfn{non-flat}.
7571
7572 We cannot say much about flat distributions.  A flat
7573 directory structure has the disadvantage of increasing the difficulty
7574 of updating to a new version of GNU @code{gettext}.  Also, if you have
7575 many PO files, this could somewhat pollute your single directory.
Moreover, GNU @code{gettext}'s libintl sources consist of C sources, shell
scripts, @code{sed} scripts and complicated Makefile rules, which don't
fit well into an existing flat structure.  For these reasons, we
recommend using the non-flat approach in this case as well.
7580
7581 Maybe because GNU @code{gettext} itself has a non-flat structure,
7582 we have more experience with this approach, and this is what will be
described in the remainder of this chapter.  Some maintainers might
7584 use this as an opportunity to unflatten their package structure.
7585
7586 @node Prerequisites, gettextize Invocation, Flat and Non-Flat, Maintainers
7587 @section Prerequisite Works
7588 @cindex converting a package to use @code{gettext}
7589 @cindex migration from earlier versions of @code{gettext}
7590 @cindex upgrading to new versions of @code{gettext}
7591
Some preparatory work is required before GNU @code{gettext} can be used
in one of your packages.  This work is general enough that it escapes
the point-by-point descriptions used in the remainder of this chapter,
so we describe it here.
7596
7597 @itemize @bullet
7598 @item
7599 Before attempting to use @code{gettextize} you should install some
7600 other packages first.
7601 Ensure that recent versions of GNU @code{m4}, GNU Autoconf and GNU
7602 @code{gettext} are already installed at your site, and if not, proceed
to do this first.  If you need to install these things, beware that
7604 GNU @code{m4} must be fully installed before GNU Autoconf is even
7605 @emph{configured}.
7606
To further ease the task of a package maintainer, the @code{automake}
package was designed and implemented.  GNU @code{gettext} now uses this
tool, and the @file{Makefile}s in the @file{intl/} and @file{po/}
directories therefore know about all the goals necessary for using
@code{automake} and @file{libintl} in one project.
7612
7613 Those four packages are only needed by you, as a maintainer; the
7614 installers of your own package and end users do not really need any of
7615 GNU @code{m4}, GNU Autoconf, GNU @code{gettext}, or GNU @code{automake}
7616 for successfully installing and running your package, with messages
7617 properly translated.  But this is not completely true if you provide
internationalized shell scripts within your own package: GNU
@code{gettext} must then be installed at the user site if the end users
want to see the translation of shell script messages.
7621
7622 @item
7623 Your package should use Autoconf and have a @file{configure.ac} or
7624 @file{configure.in} file.
If it does not, you have to learn how.  The Autoconf documentation
is quite well written; it is a good idea to print it and get
familiar with it.
7628
7629 @item
7630 Your C sources should have already been modified according to
7631 instructions given earlier in this manual.  @xref{Sources}.
7632
7633 @item
7634 Your @file{po/} directory should receive all PO files submitted to you
7635 by the translator teams, each having @file{@var{ll}.po} as a name.
It is not usually easy to get translation
work done before your package gets internationalized and available!
7638 Since the cycle has to start somewhere, the easiest for the maintainer
7639 is to start with absolutely no PO files, and wait until various
7640 translator teams get interested in your package, and submit PO files.
7641
7642 @end itemize
7643
It is worth adding here a few words about how the maintainer should
ideally behave with PO file submissions.  As a maintainer, your role is
to authenticate the submission as coming from the appropriate
translation team of the Translation Project (forward
the submission to @email{coordinator@@translationproject.org} in case of doubt),
to ensure that the PO file format is not severely broken and does not
prevent successful installation, and for the rest, to merely put these
PO files in @file{po/} for distribution.
7652
7653 As a maintainer, you do not have to take on your shoulders the
7654 responsibility of checking if the translations are adequate or
7655 complete, and should avoid diving into linguistic matters.  Translation
teams drive themselves and are fully responsible for their linguistic
choices for the Translation Project.  Keep in mind that translator teams are @emph{not}
driven by maintainers.  You can help by carefully redirecting all
communications and reports from users about linguistic matters to the
appropriate translation team, or by explaining to users how to reach or join
7661 their team.  The simplest might be to send them the @file{ABOUT-NLS} file.
7662
7663 Maintainers should @emph{never ever} apply PO file bug reports
themselves, short-cutting translation teams.  If some translator has
difficulty getting some of her points through her team, it should not be
7666 an option for her to directly negotiate translations with maintainers.
7667 Teams ought to settle their problems themselves, if any.  If you, as
7668 a maintainer, ever think there is a real problem with a team, please
7669 never try to @emph{solve} a team's problem on your own.
7670
7671 @node gettextize Invocation, Adjusting Files, Prerequisites, Maintainers
7672 @section Invoking the @code{gettextize} Program
7673
7674 @include gettextize.texi
7675
7676 @node Adjusting Files, autoconf macros, gettextize Invocation, Maintainers
7677 @section Files You Must Create or Alter
7678 @cindex @code{gettext} files
7679
7680 Besides files which are automatically added through @code{gettextize},
7681 there are many files needing revision for properly interacting with
7682 GNU @code{gettext}.  If you are closely following GNU standards for
7683 Makefile engineering and auto-configuration, the adaptations should
be easier to achieve.  What follows is a point-by-point list of these
files, each one followed by a description of all alterations it needs.
Many examples are taken from the GNU
@code{gettext} @value{VERSION} distribution itself, or from the GNU
@code{hello} distribution (@uref{http://www.franken.de/users/gnu/ke/hello}
or @uref{http://www.gnu.franken.de/ke/hello/}).  You may indeed
refer to the source code of the GNU @code{gettext} and GNU @code{hello}
packages, as they are intended to be good examples for using GNU
@code{gettext} functionality.
7695
7696 @menu
7697 * po/POTFILES.in::              @file{POTFILES.in} in @file{po/}
7698 * po/LINGUAS::                  @file{LINGUAS} in @file{po/}
7699 * po/Makevars::                 @file{Makevars} in @file{po/}
7700 * po/Rules-*::                  Extending @file{Makefile} in @file{po/}
7701 * configure.ac::                @file{configure.ac} at top level
7702 * config.guess::                @file{config.guess}, @file{config.sub} at top level
7703 * mkinstalldirs::               @file{mkinstalldirs} at top level
7704 * aclocal::                     @file{aclocal.m4} at top level
7705 * acconfig::                    @file{acconfig.h} at top level
7706 * config.h.in::                 @file{config.h.in} at top level
7707 * Makefile::                    @file{Makefile.in} at top level
7708 * src/Makefile::                @file{Makefile.in} in @file{src/}
7709 * lib/gettext.h::               @file{gettext.h} in @file{lib/}
7710 @end menu
7711
7712 @node po/POTFILES.in, po/LINGUAS, Adjusting Files, Adjusting Files
7713 @subsection @file{POTFILES.in} in @file{po/}
7714 @cindex @file{POTFILES.in} file
7715
7716 The @file{po/} directory should receive a file named
7717 @file{POTFILES.in}.  This file tells which files, among all program
7718 sources, have marked strings needing translation.  Here is an example
7719 of such a file:
7720
7721 @example
7722 @group
7723 # List of source files containing translatable strings.
7724 # Copyright (C) 1995 Free Software Foundation, Inc.
7725
7726 # Common library files
7727 lib/error.c
7728 lib/getopt.c
7729 lib/xmalloc.c
7730
7731 # Package source files
7732 src/gettext.c
7733 src/msgfmt.c
7734 src/xgettext.c
7735 @end group
7736 @end example
7737
7738 @noindent
Hash-marked comments and blank lines are ignored.  All other lines
7740 list those source files containing strings marked for translation
7741 (@pxref{Mark Keywords}), in a notation relative to the top level
7742 of your whole distribution, rather than the location of the
7743 @file{POTFILES.in} file itself.
7744
7745 When a C file is automatically generated by a tool, like @code{flex} or
7746 @code{bison}, that doesn't introduce translatable strings by itself,
7747 it is recommended to list in @file{po/POTFILES.in} the real source file
7748 (ending in @file{.l} in the case of @code{flex}, or in @file{.y} in the
7749 case of @code{bison}), not the generated C file.
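
Because a stale entry in @file{po/POTFILES.in} (for example, a source
file that has since been renamed or removed) makes @code{xgettext} report
an error when the POT file is updated, it can be worth checking the list
before a release.  The following is a sketch assuming a POSIX shell; the
function name @code{check_potfiles} is hypothetical, and it treats only
lines beginning with @samp{#} as comments.

@smallexample
# Hypothetical check: list entries of po/POTFILES.in that do not exist.
# Lines starting with '#' and blank lines are skipped; the remaining
# lines are paths relative to the top-level directory TOPDIR.
# Prints one "missing: FILE" line per absent file and exits with status 1
# if there are any, 0 otherwise (2 if POTFILES.in cannot be read).
# Usage: check_potfiles TOPDIR
check_potfiles () (
  topdir=$1
  [ -r "$topdir/po/POTFILES.in" ] || exit 2
  missing=$(
    sed -e '/^#/d' -e '/^[[:space:]]*$/d' -e 's/[[:space:]]*$//' \
        "$topdir/po/POTFILES.in" |
    while IFS= read -r f || [ -n "$f" ]; do
      # Paths are relative to the top level, not to po/.
      [ -f "$topdir/$f" ] || printf 'missing: %s\n' "$f"
    done
  )
  if [ -n "$missing" ]; then
    printf '%s\n' "$missing"
    exit 1
  fi
)
@end smallexample

@noindent
For example, @code{check_potfiles .} run from the top-level directory
prints nothing and succeeds when every listed source file exists.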
7750
7751 @node po/LINGUAS, po/Makevars, po/POTFILES.in, Adjusting Files
7752 @subsection @file{LINGUAS} in @file{po/}
7753 @cindex @file{LINGUAS} file
7754
7755 The @file{po/} directory should also receive a file named
7756 @file{LINGUAS}.  This file contains the list of available translations.
It is a whitespace-separated list.  Hash-marked comments and blank lines
are ignored.  Here is an example file:
7759
7760 @example
7761 @group
7762 # Set of available languages.
7763 de fr
7764 @end group
7765 @end example
7766
7767 @noindent
7768 This example means that German and French PO files are available, so
7769 that these languages are currently supported by your package.  If you
7770 want to further restrict, at installation time, the set of installed
7771 languages, this should not be done by modifying the @file{LINGUAS} file,
7772 but rather by using the @code{LINGUAS} environment variable
7773 (@pxref{Installers}).
7774
7775 It is recommended that you add the "languages" @samp{en@@quot} and
7776 @samp{en@@boldquot} to the @code{LINGUAS} file.  @code{en@@quot} is a
7777 variant of English message catalogs (@code{en}) which uses real quotation
7778 marks instead of the ugly looking asymmetric ASCII substitutes @samp{`}
7779 and @samp{'}.  @code{en@@boldquot} is a variant of @code{en@@quot} that
7780 additionally outputs quoted pieces of text in a bold font, when used in
7781 a terminal emulator which supports the VT100 escape sequences (such as
7782 @code{xterm} or the Linux console, but not Emacs in @kbd{M-x shell} mode).
7783
7784 These extra message catalogs @samp{en@@quot} and @samp{en@@boldquot}
7785 are constructed automatically, not by translators; to support them, you
7786 need the files @file{Rules-quot}, @file{quot.sed}, @file{boldquot.sed},
7787 @file{en@@quot.header}, @file{en@@boldquot.header}, @file{insert-header.sin}
7788 in the @file{po/} directory.  You can copy them from GNU gettext's @file{po/}
7789 directory; they are also installed by running @code{gettextize}.
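
A similar consistency check can be applied to @file{po/LINGUAS}: every
listed language should normally have a @file{@var{ll}.po} file, except
for catalogs such as @samp{en@@quot} and @samp{en@@boldquot} that are
constructed automatically.  The sketch below assumes a POSIX shell and
treats everything from a @samp{#} to the end of a line as a comment; the
function name @code{check_linguas} is hypothetical, and the languages to
skip are passed as extra arguments.

@smallexample
# Hypothetical check: print the languages listed in PODIR/LINGUAS, one per
# line, and warn on stderr about each one that has no PODIR/LL.po file.
# Languages given as extra arguments are assumed to have generated
# catalogs and are not checked.
# Exits with status 1 if any PO file is missing, 2 if LINGUAS is unreadable.
# Usage: check_linguas PODIR [GENERATED_LANG...]
check_linguas () (
  podir=$1
  [ -r "$podir/LINGUAS" ] || exit 2
  shift
  generated=" $* "
  status=0
  set -f  # split the language list on whitespace, but never glob it
  for lang in $(sed -e 's/#.*//' "$podir/LINGUAS"); do
    echo "$lang"
    case $generated in
      *" $lang "*) continue ;;
    esac
    if [ ! -f "$podir/$lang.po" ]; then
      echo "no PO file for $lang" >&2
      status=1
    fi
  done
  exit "$status"
)
@end smallexample

@noindent
For example, with the @file{LINGUAS} file shown earlier,
@code{check_linguas po en@@quot en@@boldquot} prints @samp{de} and
@samp{fr}, and exits with status 1 if @file{po/de.po} or @file{po/fr.po}
is missing.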
7790
7791 @node po/Makevars, po/Rules-*, po/LINGUAS, Adjusting Files
7792 @subsection @file{Makevars} in @file{po/}
7793 @cindex @file{Makevars} file
7794
7795 The @file{po/} directory also has a file named @file{Makevars}.  It
7796 contains variables that are specific to your project.  @file{po/Makevars}
7797 gets inserted into the @file{po/Makefile} when the latter is created.
7798 The variables thus take effect when the POT file is created or updated,
7799 and when the message catalogs get installed.
7800
7801 The first three variables can be left unmodified if your package has a
7802 single message domain and, accordingly, a single @file{po/} directory.
7803 Only packages which have multiple @file{po/} directories at different
locations need to adjust the first three variables defined in
7805 @file{Makevars}.
7806
As an alternative to the @code{XGETTEXT_OPTIONS} variable, it is also
7808 possible to specify @code{xgettext} options through the
7809 @code{AM_XGETTEXT_OPTION} autoconf macro.  See @ref{AM_XGETTEXT_OPTION}.
7810
7811 @node po/Rules-*, configure.ac, po/Makevars, Adjusting Files
7812 @subsection Extending @file{Makefile} in @file{po/}
7813 @cindex @file{Makefile.in.in} extensions
7814
7815 All files called @file{Rules-*} in the @file{po/} directory get appended to
7816 the @file{po/Makefile} when it is created.  They present an opportunity to
7817 add rules for special PO files to the Makefile, without needing to mess
7818 with @file{po/Makefile.in.in}.
7819
7820 @cindex quotation marks
7821 @vindex LANGUAGE@r{, environment variable}
7822 GNU gettext comes with a @file{Rules-quot} file, containing rules for
7823 building catalogs @file{en@@quot.po} and @file{en@@boldquot.po}.  The
7824 effect of @file{en@@quot.po} is that people who set their @code{LANGUAGE}
7825 environment variable to @samp{en@@quot} will get messages with proper
7826 looking symmetric Unicode quotation marks instead of abusing the ASCII
7827 grave accent and the ASCII apostrophe for indicating quotations.  To
7828 enable this catalog, simply add @code{en@@quot} to the @file{po/LINGUAS}
7829 file.  The effect of @file{en@@boldquot.po} is that people who set
7830 @code{LANGUAGE} to @samp{en@@boldquot} will get not only proper quotation
7831 marks, but also the quoted text will be shown in a bold font on terminals
7832 and consoles.  This catalog is useful only for command-line programs, not
7833 GUI programs.  To enable it, similarly add @code{en@@boldquot} to the
7834 @file{po/LINGUAS} file.
7835
7836 Similarly, you can create rules for building message catalogs for the
7837 @file{sr@@latin} locale -- Serbian written with the Latin alphabet --
7838 from those for the @file{sr} locale -- Serbian written with Cyrillic
7839 letters.  See @ref{msgfilter Invocation}.
7840
7841 @node configure.ac, config.guess, po/Rules-*, Adjusting Files
7842 @subsection @file{configure.ac} at top level
7843
7844 @file{configure.ac} or @file{configure.in} - this is the source from which
7845 @code{autoconf} generates the @file{configure} script.
7846
7847 @enumerate
7848 @item Declare the package and version.
7849 @cindex package and version declaration in @file{configure.ac}
7850
7851 This is done by a set of lines like these:
7852
7853 @example
7854 PACKAGE=gettext
7855 VERSION=@value{VERSION}
7856 AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE")
7857 AC_DEFINE_UNQUOTED(VERSION, "$VERSION")
7858 AC_SUBST(PACKAGE)
7859 AC_SUBST(VERSION)
7860 @end example
7861
7862 @noindent
7863 or, if you are using GNU @code{automake}, by a line like this:
7864
7865 @example
7866 AM_INIT_AUTOMAKE(gettext, @value{VERSION})
7867 @end example
7868
7869 @noindent
7870 Of course, you replace @samp{gettext} with the name of your package,
and @samp{@value{VERSION}} with its version number, exactly as they
7872 should appear in the packaged @code{tar} file name of your distribution
7873 (@file{gettext-@value{VERSION}.tar.gz}, here).
7874
7875 @item Check for internationalization support.
7876
7877 Here is the main @code{m4} macro for triggering internationalization
7878 support.  Just add this line to @file{configure.ac}:
7879
7880 @example
7881 AM_GNU_GETTEXT
7882 @end example
7883
7884 @noindent
7885 This call is purposely simple, even if it generates a lot of configure
7886 time checking and actions.
7887
7888 If you have suppressed the @file{intl/} subdirectory by calling
@code{gettextize} without the @samp{--intl} option, this call should read
7890
7891 @example
7892 AM_GNU_GETTEXT([external])
7893 @end example
7894
7895 @item Have output files created.
7896
7897 The @code{AC_OUTPUT} directive, at the end of your @file{configure.ac}
7898 file, needs to be modified in two ways:
7899
7900 @example
7901 AC_OUTPUT([@var{existing configuration files} intl/Makefile po/Makefile.in],
7902 [@var{existing additional actions}])
7903 @end example
7904
7905 The modification to the first argument to @code{AC_OUTPUT} asks
7906 for substitution in the @file{intl/} and @file{po/} directories.
7907 Note the @samp{.in} suffix used for @file{po/} only.  This is because
7908 the distributed file is really @file{po/Makefile.in.in}.
7909
7910 If you have suppressed the @file{intl/} subdirectory by calling
7911 @code{gettextize} without @samp{--intl} option, then you don't need to
7912 add @code{intl/Makefile} to the @code{AC_OUTPUT} line.
7913
7914 @end enumerate
7915
7916 If, after doing the recommended modifications, a command like
7917 @samp{aclocal -I m4} or @samp{autoconf} or @samp{autoreconf} fails with
7918 a trace similar to this:
7919
7920 @smallexample
7921 configure.ac:44: warning: AC_COMPILE_IFELSE was called before AC_GNU_SOURCE
7922 ../../lib/autoconf/specific.m4:335: AC_GNU_SOURCE is expanded from...
7923 m4/lock.m4:224: gl_LOCK is expanded from...
7924 m4/gettext.m4:571: gt_INTL_SUBDIR_CORE is expanded from...
7925 m4/gettext.m4:472: AM_INTL_SUBDIR is expanded from...
7926 m4/gettext.m4:347: AM_GNU_GETTEXT is expanded from...
7927 configure.ac:44: the top level
7928 configure.ac:44: warning: AC_RUN_IFELSE was called before AC_GNU_SOURCE
7929 @end smallexample
7930
7931 @noindent
7932 you need to add an explicit invocation of @samp{AC_GNU_SOURCE} in the
@file{configure.ac} file, after @samp{AC_PROG_CC} but before
7934 @samp{AM_GNU_GETTEXT}, most likely very close to the @samp{AC_PROG_CC}
7935 invocation.  This is necessary because of ordering restrictions imposed
7936 by GNU autoconf.
7937
7938 @node config.guess, mkinstalldirs, configure.ac, Adjusting Files
7939 @subsection @file{config.guess}, @file{config.sub} at top level
7940
7941 If you haven't suppressed the @file{intl/} subdirectory,
7942 you need to add the GNU @file{config.guess} and @file{config.sub} files
7943 to your distribution.  They are needed because the @file{intl/} directory
7944 has platform dependent support for determining the locale's character
7945 encoding and therefore needs to identify the platform.
7946
7947 You can obtain the newest version of @file{config.guess} and
7948 @file{config.sub} from the @samp{config} project at
@uref{http://savannah.gnu.org/}.  The commands to fetch them are:
7950 @smallexample
7951 $ wget -O config.guess 'http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD'
7952 $ wget -O config.sub 'http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub;hb=HEAD'
7953 @end smallexample
7954 @noindent
7955 Less recent versions are also contained in the GNU @code{automake} and
7956 GNU @code{libtool} packages.
7957
7958 Normally, @file{config.guess} and @file{config.sub} are put at the
7959 top level of a distribution.  But it is also possible to put them in a
subdirectory, together with other configuration support files like
7961 @file{install-sh}, @file{ltconfig}, @file{ltmain.sh} or @file{missing}.
7962 All you need to do, other than moving the files, is to add the following line
7963 to your @file{configure.ac}.
7964
7965 @example
7966 AC_CONFIG_AUX_DIR([@var{subdir}])
7967 @end example
7968
7969 @node mkinstalldirs, aclocal, config.guess, Adjusting Files
7970 @subsection @file{mkinstalldirs} at top level
7971 @cindex @file{mkinstalldirs} file
7972
7973 With earlier versions of GNU gettext, you needed to add the GNU
7974 @file{mkinstalldirs} script to your distribution.  This is not needed any
more.  You can remove it if you are not also using an automake version older than
7976 automake 1.9.
7977
7978 @node aclocal, acconfig, mkinstalldirs, Adjusting Files
7979 @subsection @file{aclocal.m4} at top level
7980 @cindex @file{aclocal.m4} file
7981
7982 If you do not have an @file{aclocal.m4} file in your distribution,
7983 the simplest is to concatenate the files @file{codeset.m4}, @file{fcntl-o.m4},
7984 @file{gettext.m4}, @file{glibc2.m4}, @file{glibc21.m4}, @file{iconv.m4},
7985 @file{intdiv0.m4}, @file{intl.m4}, @file{intldir.m4}, @file{intlmacosx.m4},
7986 @file{intmax.m4}, @file{inttypes_h.m4}, @file{inttypes-pri.m4},
7987 @file{lcmessage.m4}, @file{lib-ld.m4}, @file{lib-link.m4},
7988 @file{lib-prefix.m4}, @file{lock.m4}, @file{longlong.m4}, @file{nls.m4},
7989 @file{po.m4}, @file{printf-posix.m4}, @file{progtest.m4}, @file{size_max.m4},
7990 @file{stdint_h.m4}, @file{threadlib.m4}, @file{uintmax_t.m4},
7991 @file{visibility.m4}, @file{wchar_t.m4}, @file{wint_t.m4}, @file{xsize.m4}
7992 from GNU @code{gettext}'s
7993 @file{m4/} directory into a single file.  If you have suppressed the
7994 @file{intl/} directory, only @file{gettext.m4}, @file{iconv.m4},
7995 @file{lib-ld.m4}, @file{lib-link.m4}, @file{lib-prefix.m4},
7996 @file{nls.m4}, @file{po.m4}, @file{progtest.m4} need to be concatenated.
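
For the case without an @file{intl/} directory, the concatenation can be
scripted as below.  This is a sketch assuming a POSIX shell and a local
copy of GNU @code{gettext}'s @file{m4/} directory; the function name
@code{make_aclocal} is hypothetical, and the file list is the one given
in the previous sentence.

@smallexample
# Hypothetical helper: write to standard output the concatenation of the
# macro files needed when the intl/ directory has been suppressed.
# M4DIR is assumed to be a copy of GNU gettext's m4/ directory.
# Usage: make_aclocal M4DIR > aclocal.m4.tmp && mv aclocal.m4.tmp aclocal.m4
make_aclocal () (
  m4dir=$1
  for f in gettext.m4 iconv.m4 lib-ld.m4 lib-link.m4 lib-prefix.m4 \
           nls.m4 po.m4 progtest.m4; do
    # Stop at the first unreadable file; the caller discards the
    # partial output.
    if [ ! -r "$m4dir/$f" ]; then
      echo "make_aclocal: cannot read $m4dir/$f" >&2
      exit 1
    fi
    cat "$m4dir/$f" || exit 1
  done
)
@end smallexample

@noindent
Writing to a temporary file first, as in
@code{make_aclocal @var{m4dir} > aclocal.m4.tmp && mv aclocal.m4.tmp aclocal.m4},
avoids leaving a truncated @file{aclocal.m4} behind if one of the files
is missing.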
7997
7998 If you are not using GNU @code{automake} 1.8 or newer, you will need to
7999 add a file @file{mkdirp.m4} from a newer automake distribution to the
8000 list of files above.
8001
8002 If you already have an @file{aclocal.m4} file, then you will have
8003 to merge the said macro files into your @file{aclocal.m4}.  Note that if
8004 you are upgrading from a previous release of GNU @code{gettext}, you
8005 should most probably @emph{replace} the macros (@code{AM_GNU_GETTEXT},
8006 etc.), as they usually
8007 change a little from one release of GNU @code{gettext} to the next.
8008 Their contents may vary as we get more experience with strange systems
8009 out there.
8010
8011 If you are using GNU @code{automake} 1.5 or newer, it is enough to put
8012 these macro files into a subdirectory named @file{m4/} and add the line
8013
8014 @example
8015 ACLOCAL_AMFLAGS = -I m4
8016 @end example
8017
8018 @noindent
8019 to your top level @file{Makefile.am}.
8020
8021 If you are using GNU @code{automake} 1.10 or newer, it is even easier:
8022 Add the line
8023
8024 @example
8025 ACLOCAL_AMFLAGS = --install -I m4
8026 @end example
8027
8028 @noindent
8029 to your top level @file{Makefile.am}, and run @samp{aclocal --install -I m4}.
8030 This will copy the needed files to the @file{m4/} subdirectory automatically,
8031 before updating @file{aclocal.m4}.
8032
8033 These macros check for the internationalization support functions
and related information.  Hopefully, once stabilized, these macros
8035 might be integrated in the standard Autoconf set, because this
8036 piece of @code{m4} code will be the same for all projects using GNU
8037 @code{gettext}.
8038
8039 @node acconfig, config.h.in, aclocal, Adjusting Files
8040 @subsection @file{acconfig.h} at top level
8041 @cindex @file{acconfig.h} file
8042
Earlier GNU @code{gettext} releases required putting definitions for
@code{ENABLE_NLS}, @code{HAVE_GETTEXT}, @code{HAVE_LC_MESSAGES},
@code{HAVE_STPCPY}, @code{PACKAGE} and @code{VERSION} into an
8046 @file{acconfig.h} file.  This is not needed any more; you can remove
8047 them from your @file{acconfig.h} file unless your package uses them
8048 independently from the @file{intl/} directory.
8049
8050 @node config.h.in, Makefile, acconfig, Adjusting Files
8051 @subsection @file{config.h.in} at top level
8052 @cindex @file{config.h.in} file
8053
8054 The include file template that holds the C macros to be defined by
8055 @code{configure} is usually called @file{config.h.in} and may be
8056 maintained either manually or automatically.
8057
8058 If @code{gettextize} has created an @file{intl/} directory, this file
8059 must be called @file{config.h.in} and must be at the top level.  If,
8060 however, you have suppressed the @file{intl/} directory by calling
@code{gettextize} without the @samp{--intl} option, then you can choose the
8062 name of this file and its location freely.
8063
8064 If it is maintained automatically, by use of the @samp{autoheader}
8065 program, you need to do nothing about it.  This is the case in particular
8066 if you are using GNU @code{automake}.
8067
8068 If it is maintained manually, and if @code{gettextize} has created an
8069 @file{intl/} directory, you should switch to using @samp{autoheader}.
8070 The list of C macros to be added for the sake of the @file{intl/}
8071 directory is just too long to be maintained manually; it also changes
8072 between different versions of GNU @code{gettext}.
8073
8074 If it is maintained manually, and if on the other hand you have
8075 suppressed the @file{intl/} directory by calling @code{gettextize}
without the @samp{--intl} option, then you can get away with adding the
8077 following lines to @file{config.h.in}:
8078
8079 @example
8080 /* Define to 1 if translation of program messages to the user's
8081    native language is requested. */
8082 #undef ENABLE_NLS
8083 @end example
8084
8085 @node Makefile, src/Makefile, config.h.in, Adjusting Files
8086 @subsection @file{Makefile.in} at top level
8087
8088 Here are a few modifications you need to make to your main, top-level
8089 @file{Makefile.in} file.
8090
8091 @enumerate
8092 @item
8093 Add the following lines near the beginning of your @file{Makefile.in},
8094 so the @samp{dist:} goal will work properly (as explained further down):
8095
8096 @example
8097 PACKAGE = @@PACKAGE@@
8098 VERSION = @@VERSION@@
8099 @end example
8100
8101 @item
8102 Add file @file{ABOUT-NLS} to the @code{DISTFILES} definition, so the file gets
8103 distributed.
8104
8105 @item
8106 Wherever you process subdirectories in your @file{Makefile.in}, be sure
8107 you also process the subdirectories @samp{intl} and @samp{po}.  Special
rules in the @file{Makefiles} take care of the case where no
8109 internationalization is wanted.
8110
8111 If you are using Makefiles, either generated by automake, or hand-written
so they carefully follow the GNU coding standards, the affected goals for
8113 which the new subdirectories must be handled include @samp{installdirs},
8114 @samp{install}, @samp{uninstall}, @samp{clean}, @samp{distclean}.
8115
8116 Here is an example of a canonical order of processing.  In this
8117 example, we also define @code{SUBDIRS} in @code{Makefile.in} for it
8118 to be further used in the @samp{dist:} goal.
8119
8120 @example
8121 SUBDIRS = doc intl lib src po
8122 @end example
8123
8124 Note that you must arrange for @samp{make} to descend into the
8125 @code{intl} directory before descending into other directories containing
code which makes use of the @code{libintl.h} header file.  For this
8127 reason, here we mention @code{intl} before @code{lib} and @code{src}.
8128
8129 @item
8130 A delicate point is the @samp{dist:} goal, as both
8131 @file{intl/Makefile} and @file{po/Makefile} will later assume that the
8132 proper directory has been set up from the main @file{Makefile}.  Here is
an example of what the @samp{dist:} goal might look like:
8134
8135 @example
8136 distdir = $(PACKAGE)-$(VERSION)
8137 dist: Makefile
8138         rm -fr $(distdir)
8139         mkdir $(distdir)
8140         chmod 777 $(distdir)
8141         for file in $(DISTFILES); do \
8142           ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir); \
8143         done
8144         for subdir in $(SUBDIRS); do \
8145           mkdir $(distdir)/$$subdir || exit 1; \
8146           chmod 777 $(distdir)/$$subdir; \
8147           (cd $$subdir && $(MAKE) $@@) || exit 1; \
8148         done
8149         tar chozf $(distdir).tar.gz $(distdir)
8150         rm -fr $(distdir)
8151 @end example
8152
8153 @end enumerate
8154
8155 Note that if you are using GNU @code{automake}, @file{Makefile.in} is
8156 automatically generated from @file{Makefile.am}, and all needed changes
8157 to @file{Makefile.am} are already made by running @samp{gettextize}.
8158
8159 @node src/Makefile, lib/gettext.h, Makefile, Adjusting Files
8160 @subsection @file{Makefile.in} in @file{src/}
8161
8162 Some of the modifications made in the main @file{Makefile.in} will
8163 also be needed in the @file{Makefile.in} from your package sources,
8164 which we assume here to be in the @file{src/} subdirectory.  Here are
8165 all the modifications needed in @file{src/Makefile.in}:
8166
8167 @enumerate
8168 @item
8169 In view of the @samp{dist:} goal, you should have these lines near the
8170 beginning of @file{src/Makefile.in}:
8171
8172 @example
8173 PACKAGE = @@PACKAGE@@
8174 VERSION = @@VERSION@@
8175 @end example
8176
8177 @item
8178 If not done already, you should guarantee that @code{top_srcdir}
8179 gets defined.  This will serve for @code{cpp} include files.  Just add
8180 the line:
8181
8182 @example
8183 top_srcdir = @@top_srcdir@@
8184 @end example
8185
8186 @item
8187 You might also want to define @code{subdir} as @samp{src}, later
8188 allowing for almost uniform @samp{dist:} goals in all your
@file{Makefile.in}.  At least, the @samp{dist:} goal below assumes that
8190 you used:
8191
8192 @example
8193 subdir = src
8194 @end example
8195
8196 @item
8197 The @code{main} function of your program will normally call
@code{bindtextdomain} and @code{textdomain} (@pxref{Triggering}), like this:
8199
8200 @example
8201 bindtextdomain (@var{PACKAGE}, LOCALEDIR);
8202 textdomain (@var{PACKAGE});
8203 @end example
8204
To make @code{LOCALEDIR} known to the program, add the following lines to
8206 @file{Makefile.in} if you are using Autoconf version 2.60 or newer:
8207
8208 @example
8209 datadir = @@datadir@@
datarootdir = @@datarootdir@@
8211 localedir = @@localedir@@
8212 DEFS = -DLOCALEDIR=\"$(localedir)\" @@DEFS@@
8213 @end example
8214
8215 or these lines if your version of Autoconf is older than 2.60:
8216
8217 @example
8218 datadir = @@datadir@@
8219 localedir = $(datadir)/locale
8220 DEFS = -DLOCALEDIR=\"$(localedir)\" @@DEFS@@
8221 @end example
8222
8223 Note that @code{@@datadir@@} defaults to @samp{$(prefix)/share}, thus
8224 @code{$(localedir)} defaults to @samp{$(prefix)/share/locale}.
8225
8226 @item
8227 You should ensure that the final linking will use @code{@@LIBINTL@@} or
8228 @code{@@LTLIBINTL@@} as a library.  @code{@@LIBINTL@@} is for use without
8229 @code{libtool}, @code{@@LTLIBINTL@@} is for use with @code{libtool}.  An
easy way to achieve this is to add it to @code{LIBS}, like this:
8232
8233 @example
8234 LIBS = @@LIBINTL@@ @@LIBS@@
8235 @end example
8236
8237 In most packages internationalized with GNU @code{gettext}, one will
8238 find a directory @file{lib/} in which a library containing some helper
functions will be built.  (You need at least the few functions which the
GNU @code{gettext} Library itself needs.)  However, some of the functions
in @file{lib/} also give messages to the user, which of course should be
8242 translated, too.  Taking care of this, the support library (say
8243 @file{libsupport.a}) should be placed before @code{@@LIBINTL@@} and
8244 @code{@@LIBS@@} in the above example.  So one has to write this:
8245
8246 @example
8247 LIBS = ../lib/libsupport.a @@LIBINTL@@ @@LIBS@@
8248 @end example
8249
8250 @item
8251 You should also ensure that directory @file{intl/} will be searched for
8252 C preprocessor include files in all circumstances.  So, you have to
ensure that both @samp{-I../intl} and @samp{-I$(top_srcdir)/intl} are
given to the C compiler.
8255
8256 @item
Your @samp{dist:} goal has to conform with the others.  Here is a
8258 reasonable definition for it:
8259
8260 @example
8261 distdir = ../$(PACKAGE)-$(VERSION)/$(subdir)
8262 dist: Makefile $(DISTFILES)
8263         for file in $(DISTFILES); do \
8264           ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir) || exit 1; \
8265         done
8266 @end example
8267
8268 @end enumerate
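
Putting the items above together (apart from the @samp{dist:} goal), a
minimal hand-written @file{src/Makefile.in} might look like the following
sketch.  It assumes Autoconf 2.60 or newer, a hypothetical program
@code{hello} built from @file{hello.c}, and the hypothetical support
library @file{../lib/libsupport.a}; the compilation rules are only
illustrative and should be adapted to the conventions of your package.

@example
PACKAGE = @@PACKAGE@@
VERSION = @@VERSION@@
subdir = src
top_srcdir = @@top_srcdir@@
srcdir = @@srcdir@@
VPATH = @@srcdir@@

prefix = @@prefix@@
datarootdir = @@datarootdir@@
datadir = @@datadir@@
localedir = @@localedir@@

CC = @@CC@@
CFLAGS = @@CFLAGS@@
CPPFLAGS = @@CPPFLAGS@@
LDFLAGS = @@LDFLAGS@@
# LOCALEDIR for bindtextdomain, plus intl/ in the build and source trees.
DEFS = -DLOCALEDIR=\"$(localedir)\" @@DEFS@@
INCLUDES = -I. -I$(srcdir) -I../intl -I$(top_srcdir)/intl
# The support library must come before @@LIBINTL@@.
LIBS = ../lib/libsupport.a @@LIBINTL@@ @@LIBS@@

hello: hello.o
        $(CC) $(CFLAGS) $(LDFLAGS) -o $@@ hello.o $(LIBS)

.c.o:
        $(CC) -c $(DEFS) $(INCLUDES) $(CPPFLAGS) $(CFLAGS) $<
@end example

Remember that @code{make} requires recipe lines to begin with a tab
character.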
8269
8270 Note that if you are using GNU @code{automake}, @file{Makefile.in} is
8271 automatically generated from @file{Makefile.am}, and the first three
8272 changes and the last change are not necessary.  The remaining needed
8273 @file{Makefile.am} modifications are the following:
8274
8275 @enumerate
8276 @item
To make @code{LOCALEDIR} known to the program, add the following to
8278 @file{Makefile.am}:
8279
8280 @example
8281 <module>_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\"
8282 @end example
8283
8284 @noindent
8285 for each specific module or compilation unit, or
8286
8287 @example
8288 AM_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\"
8289 @end example
8290
8291 for all modules and compilation units together.  Furthermore, if you are
using an Autoconf version older than 2.60, add this line to define
8293 @samp{localedir}:
8294
8295 @example
8296 localedir = $(datadir)/locale
8297 @end example
8298
8299 @item
8300 To ensure that the final linking will use @code{@@LIBINTL@@} or
8301 @code{@@LTLIBINTL@@} as a library, add the following to
8302 @file{Makefile.am}:
8303
8304 @example
8305 <program>_LDADD = @@LIBINTL@@
8306 @end example
8307
8308 @noindent
8309 for each specific program, or
8310
8311 @example
8312 LDADD = @@LIBINTL@@
8313 @end example
8314
8315 for all programs together.  Remember that when you use @code{libtool}
to link a program, you need to use @code{@@LTLIBINTL@@} instead of @code{@@LIBINTL@@}
8317 for that program.
8318
8319 @item
If you have an @file{intl/} directory, whose contents are created by
8321 @code{gettextize}, then to ensure that it will be searched for
8322 C preprocessor include files in all circumstances, add something like
8323 this to @file{Makefile.am}:
8324
8325 @example
8326 AM_CPPFLAGS = -I../intl -I$(top_srcdir)/intl
8327 @end example
8328
8329 @end enumerate
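
Taken together, and assuming a hypothetical program @code{hello} built
from @file{hello.c}, the items above might amount to the following
@file{src/Makefile.am} sketch.  The two @code{AM_CPPFLAGS} settings from
the first and third items are merged into one line, because a second
assignment to the same variable would replace the first rather than add
to it.  The @samp{-I} options are needed only if the package has an
@file{intl/} directory.

@example
bin_PROGRAMS = hello
hello_SOURCES = hello.c
# One AM_CPPFLAGS carrying both LOCALEDIR and the intl/ directories.
AM_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\" -I../intl -I$(top_srcdir)/intl
# Use @@LTLIBINTL@@ instead if hello is linked with libtool.
hello_LDADD = @@LIBINTL@@
@end example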
8330
8331 @node lib/gettext.h,  , src/Makefile, Adjusting Files
8332 @subsection @file{gettext.h} in @file{lib/}
8333 @cindex @file{gettext.h} file
8334 @cindex turning off NLS support
8335 @cindex disabling NLS
8336
8337 Internationalization of packages, as provided by GNU @code{gettext}, is
8338 optional.  It can be turned off in two situations:
8339
8340 @itemize @bullet
8341 @item
8342 When the installer has specified @samp{./configure --disable-nls}.  This
8343 can be useful when small binaries are more important than features, for
8344 example when building utilities for boot diskettes.  It can also be useful
8345 in order to get some specific C compiler warnings about code quality with
8346 some older versions of GCC (older than 3.0).
8347
8348 @item
8349 When the package does not include the @code{intl/} subdirectory, and the
@file{libintl.h} header (with its associated libintl library, if any) is not
already installed on the system, it is preferable that the package builds
without internationalization support, rather than fail with a compilation
error.
8354 @end itemize
8355
8356 A C preprocessor macro can be used to detect these two cases.  Usually,
8357 when @code{libintl.h} was found and not explicitly disabled, the
8358 @code{ENABLE_NLS} macro will be defined to 1 in the autoconf generated
8359 configuration file (usually called @file{config.h}).  In the two negative
8360 situations, however, this macro will not be defined, thus it will evaluate
8361 to 0 in C preprocessor expressions.
8362
8363 @cindex include file @file{libintl.h}
8364 @file{gettext.h} is a convenience header file for conditional use of
8365 @file{<libintl.h>}, depending on the @code{ENABLE_NLS} macro.  If
8366 @code{ENABLE_NLS} is set, it includes @file{<libintl.h>}; otherwise it
defines no-op substitutes for the @file{libintl.h} functions.  We recommend
8368 the use of @code{"gettext.h"} over direct use of @file{<libintl.h>},
8369 so that portability to older systems is guaranteed and installers can
8370 turn off internationalization if they want to.  In the C code, you will
8371 then write
8372
8373 @example
8374 #include "gettext.h"
8375 @end example
8376
8377 @noindent
8378 instead of
8379
8380 @example
8381 #include <libintl.h>
8382 @end example
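
For illustration, here is a much simplified sketch of the kind of
conditional that @file{gettext.h} performs.  It is not the actual
contents of that file, which also provides substitutes for other
functions such as @code{ngettext} and @code{dcgettext}.  The exact
macro definitions shown here are assumptions made for this sketch.  In
a real source file, @file{config.h} would be included first, so that
@code{ENABLE_NLS} is defined when appropriate.

@example
#if ENABLE_NLS
/* NLS enabled: use the real functions.  */
# include <libintl.h>
#else
/* NLS disabled: each call expands into one of its own arguments, so
   gettext returns the untranslated message and no libintl library
   is needed at link time.  */
# define gettext(Msgid) ((const char *) (Msgid))
# define textdomain(Domainname) ((const char *) (Domainname))
# define bindtextdomain(Domainname, Dirname) \
    ((void) (Domainname), (const char *) (Dirname))
#endif
@end example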
8383
@file{gettext.h} is usually placed in a directory containing
8385 auxiliary include files.  In many GNU packages, there is a directory
8386 @file{lib/} containing helper functions; @file{gettext.h} fits there.
8387 In other packages, it can go into the @file{src} directory.
8388
8389 Do not install the @code{gettext.h} file in public locations.  Every
8390 package that needs it should contain a copy of it on its own.
8391
8392 @node autoconf macros, CVS Issues, Adjusting Files, Maintainers
8393 @section Autoconf macros for use in @file{configure.ac}
8394 @cindex autoconf macros for @code{gettext}
8395
8396 GNU @code{gettext} installs macros for use in a package's
8397 @file{configure.ac} or @file{configure.in}.
8398 @xref{Top, , Introduction, autoconf, The Autoconf Manual}.
8399 The primary macro is, of course, @code{AM_GNU_GETTEXT}.
8400
8401 @menu
8402 * AM_GNU_GETTEXT::              AM_GNU_GETTEXT in @file{gettext.m4}
8403 * AM_GNU_GETTEXT_VERSION::      AM_GNU_GETTEXT_VERSION in @file{gettext.m4}
8404 * AM_GNU_GETTEXT_NEED::         AM_GNU_GETTEXT_NEED in @file{gettext.m4}
8405 * AM_GNU_GETTEXT_INTL_SUBDIR::  AM_GNU_GETTEXT_INTL_SUBDIR in @file{intldir.m4}
8406 * AM_PO_SUBDIRS::               AM_PO_SUBDIRS in @file{po.m4}
8407 * AM_XGETTEXT_OPTION::          AM_XGETTEXT_OPTION in @file{po.m4}
8408 * AM_ICONV::                    AM_ICONV in @file{iconv.m4}
8409 @end menu
8410
8411 @node AM_GNU_GETTEXT, AM_GNU_GETTEXT_VERSION, autoconf macros, autoconf macros
8412 @subsection AM_GNU_GETTEXT in @file{gettext.m4}
8413
8414 @amindex AM_GNU_GETTEXT
8415 The @code{AM_GNU_GETTEXT} macro tests for the presence of the GNU gettext
8416 function family in either the C library or a separate @code{libintl}
8417 library (shared or static libraries are both supported) or in the package's
8418 @file{intl/} directory.  It also invokes @code{AM_PO_SUBDIRS}, thus preparing
8419 the @file{po/} directories of the package for building.
8420
8421 @code{AM_GNU_GETTEXT} accepts up to three optional arguments.  The general
8422 syntax is
8423
8424 @example
8425 AM_GNU_GETTEXT([@var{intlsymbol}], [@var{needsymbol}], [@var{intldir}])
8426 @end example
8427
8428 @c We don't document @var{intlsymbol} = @samp{use-libtool} here, because
8429 @c it is of no use for packages other than GNU gettext itself.  (Such packages
8430 @c are not allowed to install the shared libintl.  But if they use libtool,
8431 @c then it is in order to install shared libraries that depend on libintl.)
8432 @var{intlsymbol} can be @samp{external} or @samp{no-libtool}.  The default
8433 (if it is not specified or empty) is @samp{no-libtool}.  @var{intlsymbol}
8434 should be @samp{external} for packages with no @file{intl/} directory.
8435 For packages with an @file{intl/} directory, you can either use an
8436 @var{intlsymbol} equal to @samp{no-libtool}, or you can use @samp{external}
8437 and override by using the macro @code{AM_GNU_GETTEXT_INTL_SUBDIR} elsewhere.
8438 The two ways to specify the existence of an @file{intl/} directory are
8439 equivalent.  At build time, a static library
8440 @code{$(top_builddir)/intl/libintl.a} will then be created.
8441
8442 If @var{needsymbol} is specified and is @samp{need-ngettext}, then GNU
8443 gettext implementations (in libc or libintl) without the @code{ngettext()}
8444 function will be ignored.  If @var{needsymbol} is specified and is
8445 @samp{need-formatstring-macros}, then GNU gettext implementations that don't
8446 support the ISO C 99 @file{<inttypes.h>} formatstring macros will be ignored.
8447 Only one @var{needsymbol} can be specified.  These requirements can also be
8448 specified by using the macro @code{AM_GNU_GETTEXT_NEED} elsewhere.  To specify
8449 more than one requirement, just specify the strongest one among them, or
8450 invoke the @code{AM_GNU_GETTEXT_NEED} macro several times.  The hierarchy
8451 among the various alternatives is as follows: @samp{need-formatstring-macros}
8452 implies @samp{need-ngettext}.
8453
8454 @var{intldir} is used to find the intl libraries.  If empty, the value
8455 @samp{$(top_builddir)/intl/} is used.
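
For instance, a package without an @file{intl/} directory that needs
@code{ngettext} might contain lines like the following in its
@file{configure.ac}.  This is only a sketch: the package name and
version in @code{AC_INIT} are placeholders, and the list of files in
@code{AC_CONFIG_FILES} depends on the package.

@example
AC_INIT([hello], [1.0])
AM_INIT_AUTOMAKE
AC_PROG_CC
AC_CONFIG_HEADERS([config.h])
dnl Only needed if the package uses autopoint.
AM_GNU_GETTEXT_VERSION([@value{VERSION}])
AM_GNU_GETTEXT([external], [need-ngettext])
AC_CONFIG_FILES([Makefile src/Makefile po/Makefile.in])
AC_OUTPUT
@end example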
8456
8457 The @code{AM_GNU_GETTEXT} macro determines whether GNU gettext is
8458 available and should be used.  If so, it sets the @code{USE_NLS} variable
8459 to @samp{yes}; it defines @code{ENABLE_NLS} to 1 in the autoconf
8460 generated configuration file (usually called @file{config.h}); it sets
8461 the variables @code{LIBINTL} and @code{LTLIBINTL} to the linker options
8462 for use in a Makefile (@code{LIBINTL} for use without libtool,
8463 @code{LTLIBINTL} for use with libtool); it adds an @samp{-I} option to
8464 @code{CPPFLAGS} if necessary.  In the negative case, it sets
8465 @code{USE_NLS} to @samp{no}; it sets @code{LIBINTL} and @code{LTLIBINTL}
8466 to empty and doesn't change @code{CPPFLAGS}.
8467
8468 The complexities that @code{AM_GNU_GETTEXT} deals with are the following:
8469
8470 @itemize @bullet
8471 @item
8472 @cindex @code{libintl} library
8473 Some operating systems have @code{gettext} in the C library, for example
8474 glibc.  Some have it in a separate library @code{libintl}.  GNU @code{libintl}
8475 might have been installed as part of the GNU @code{gettext} package.
8476
8477 @item
8478 GNU @code{libintl}, if installed, is not necessarily already in the search
8479 path (@code{CPPFLAGS} for the include file search path, @code{LDFLAGS} for
8480 the library search path).
8481
8482 @item
8483 Except for glibc, the operating system's native @code{gettext} cannot
8484 exploit the GNU mo files, doesn't have the necessary locale dependency
8485 features, and cannot convert messages from the catalog's text encoding
8486 to the user's locale encoding.
8487
8488 @item
8489 GNU @code{libintl}, if installed, is not necessarily already in the
8490 run time library search path.  To avoid the need for setting an environment
8491 variable like @code{LD_LIBRARY_PATH}, the macro adds the appropriate
8492 run time search path options to the @code{LIBINTL} and @code{LTLIBINTL}
8493 variables.  This works on most systems, but not on some operating systems
8494 with limited shared library support, like SCO.
8495
8496 @item
8497 GNU @code{libintl} relies on POSIX/XSI @code{iconv}.  The macro checks for
8498 linker options needed to use iconv and appends them to the @code{LIBINTL}
8499 and @code{LTLIBINTL} variables.
8500 @end itemize
8501
8502 @node AM_GNU_GETTEXT_VERSION, AM_GNU_GETTEXT_NEED, AM_GNU_GETTEXT, autoconf macros
8503 @subsection AM_GNU_GETTEXT_VERSION in @file{gettext.m4}
8504
8505 @amindex AM_GNU_GETTEXT_VERSION
8506 The @code{AM_GNU_GETTEXT_VERSION} macro declares the version number of
8507 the GNU gettext infrastructure that is used by the package.
8508
8509 The use of this macro is optional; only the @code{autopoint} program makes
8510 use of it (@pxref{CVS Issues}).
8511
8512 @node AM_GNU_GETTEXT_NEED, AM_GNU_GETTEXT_INTL_SUBDIR, AM_GNU_GETTEXT_VERSION, autoconf macros
8513 @subsection AM_GNU_GETTEXT_NEED in @file{gettext.m4}
8514
8515 @amindex AM_GNU_GETTEXT_NEED
8516 The @code{AM_GNU_GETTEXT_NEED} macro declares a constraint regarding the
8517 GNU gettext implementation.  The syntax is
8518
8519 @example
8520 AM_GNU_GETTEXT_NEED([@var{needsymbol}])
8521 @end example
8522
8523 If @var{needsymbol} is @samp{need-ngettext}, then GNU gettext implementations
8524 (in libc or libintl) without the @code{ngettext()} function will be ignored.
8525 If @var{needsymbol} is @samp{need-formatstring-macros}, then GNU gettext
8526 implementations that don't support the ISO C 99 @file{<inttypes.h>}
8527 formatstring macros will be ignored.
8528
8529 The optional second argument of @code{AM_GNU_GETTEXT} is also taken into
8530 account.
8531
8532 The @code{AM_GNU_GETTEXT_NEED} invocations can occur before or after
8533 the @code{AM_GNU_GETTEXT} invocation; the order doesn't matter.
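
For example, based on the description above, the following two lines
request an implementation that supports the ISO C 99 @file{<inttypes.h>}
format string macros (and therefore also @code{ngettext}).  They should
have the same effect as passing @samp{need-formatstring-macros} as the
second argument of @code{AM_GNU_GETTEXT}:

@example
AM_GNU_GETTEXT_NEED([need-formatstring-macros])
AM_GNU_GETTEXT([external])
@end example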
8534
8535 @node AM_GNU_GETTEXT_INTL_SUBDIR, AM_PO_SUBDIRS, AM_GNU_GETTEXT_NEED, autoconf macros
8536 @subsection AM_GNU_GETTEXT_INTL_SUBDIR in @file{intldir.m4}
8537
8538 @amindex AM_GNU_GETTEXT_INTL_SUBDIR
8539 The @code{AM_GNU_GETTEXT_INTL_SUBDIR} macro specifies that the
8540 @code{AM_GNU_GETTEXT} macro, although invoked with the first argument
8541 @samp{external}, should also prepare for building the @file{intl/}
8542 subdirectory.
8543
8544 The @code{AM_GNU_GETTEXT_INTL_SUBDIR} invocation can occur before or after
8545 the @code{AM_GNU_GETTEXT} invocation; the order doesn't matter.
8546
8547 The use of this macro requires GNU automake 1.10 or newer and
8548 GNU autoconf 2.61 or newer.
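
As described for @code{AM_GNU_GETTEXT}, the following two invocations,
in either order, are an equivalent way of writing
@code{AM_GNU_GETTEXT([no-libtool])}.  Both prepare for building the
static library @file{intl/libintl.a}:

@example
AM_GNU_GETTEXT([external])
AM_GNU_GETTEXT_INTL_SUBDIR
@end example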
8549
8550 @node AM_PO_SUBDIRS, AM_XGETTEXT_OPTION, AM_GNU_GETTEXT_INTL_SUBDIR, autoconf macros
8551 @subsection AM_PO_SUBDIRS in @file{po.m4}
8552
8553 @amindex AM_PO_SUBDIRS
8554 The @code{AM_PO_SUBDIRS} macro prepares the @file{po/} directories of the
8555 package for building.  This macro should be used in internationalized
programs written in programming languages other than C, C++ and Objective C,
8557 for example @code{sh}, @code{Python}, @code{Lisp}.  See @ref{Programming
8558 Languages} for a list of programming languages that support localization
8559 through PO files.
8560
8561 The @code{AM_PO_SUBDIRS} macro determines whether internationalization
8562 should be used.  If so, it sets the @code{USE_NLS} variable to @samp{yes},
8563 otherwise to @samp{no}.  It also determines the right values for Makefile
8564 variables in each @file{po/} directory.
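
As a sketch, a package whose programs are written only in a language
such as @code{sh} or Python might need no more than the following in its
@file{configure.ac}.  The package name and version are placeholders:

@example
AC_INIT([hello-sh], [1.0])
AM_INIT_AUTOMAKE
AM_PO_SUBDIRS
AC_CONFIG_FILES([Makefile po/Makefile.in])
AC_OUTPUT
@end example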
8565
8566 @node AM_XGETTEXT_OPTION, AM_ICONV, AM_PO_SUBDIRS, autoconf macros
8567 @subsection AM_XGETTEXT_OPTION in @file{po.m4}
8568
8569 @amindex AM_XGETTEXT_OPTION
8570 The @code{AM_XGETTEXT_OPTION} macro registers a command-line option to be
8571 used in the invocations of @code{xgettext} in the @file{po/} directories
8572 of the package.
8573
8574 For example, if you have a source file that defines a function
8575 @samp{error_at_line} whose fifth argument is a format string, you can use
8576 @example
8577 AM_XGETTEXT_OPTION([--flag=error_at_line:5:c-format])
8578 @end example
8579 @noindent
8580 to instruct @code{xgettext} to mark all translatable strings in @samp{gettext}
invocations that occur as the fifth argument to this function as @samp{c-format}.
8582
8583 See @ref{xgettext Invocation} for the list of options that @code{xgettext}
8584 accepts.
8585
8586 The use of this macro is an alternative to the use of the
8587 @samp{XGETTEXT_OPTIONS} variable in @file{po/Makevars}.
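
For comparison, the @code{error_at_line} example above could instead be
expressed by adding the option to the @samp{XGETTEXT_OPTIONS} line in
@file{po/Makevars}.  The @samp{--keyword} options in this sketch are the
ones commonly found in the @file{Makevars} template, so your file may
differ:

@example
XGETTEXT_OPTIONS = --keyword=_ --keyword=N_ --flag=error_at_line:5:c-format
@end example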
8588
8589 @node AM_ICONV,  , AM_XGETTEXT_OPTION, autoconf macros
8590 @subsection AM_ICONV in @file{iconv.m4}
8591
8592 @amindex AM_ICONV
8593 The @code{AM_ICONV} macro tests for the presence of the POSIX/XSI
8594 @code{iconv} function family in either the C library or a separate
8595 @code{libiconv} library.  If found, it sets the @code{am_cv_func_iconv}
8596 variable to @samp{yes}; it defines @code{HAVE_ICONV} to 1 in the autoconf
8597 generated configuration file (usually called @file{config.h}); it defines
8598 @code{ICONV_CONST} to @samp{const} or to empty, depending on whether the
8599 second argument of @code{iconv()} is of type @samp{const char **} or
8600 @samp{char **}; it sets the variables @code{LIBICONV} and
8601 @code{LTLIBICONV} to the linker options for use in a Makefile
8602 (@code{LIBICONV} for use without libtool, @code{LTLIBICONV} for use with
8603 libtool); it adds an @samp{-I} option to @code{CPPFLAGS} if
8604 necessary.  If not found, it sets @code{LIBICONV} and @code{LTLIBICONV} to
8605 empty and doesn't change @code{CPPFLAGS}.
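
The purpose of @code{ICONV_CONST} shows up in calls to @code{iconv()}.
Casting the input buffer pointer through it lets the same source compile
whether the system declares the second parameter as
@samp{const char **} or as @samp{char **}.  The following is a sketch.
The macro name @code{CONVERT} is hypothetical, and the fallback
definition of @code{ICONV_CONST} exists only so that the fragment stands
alone without @file{config.h}.  That fallback corresponds to systems such
as glibc, whose @code{iconv()} takes @samp{char **}.

@example
#include <iconv.h>

#ifndef ICONV_CONST
/* Normally defined in config.h by AM_ICONV.  Empty matches the
   glibc prototype, whose second parameter is char **.  */
# define ICONV_CONST
#endif

/* Hypothetical helper: call iconv with the cast that fits this system.  */
#define CONVERT(cd, inbuf, inleft, outbuf, outleft) \
  iconv ((cd), (ICONV_CONST char **) (inbuf), (inleft), (outbuf), (outleft))
@end example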
8606
8607 The complexities that @code{AM_ICONV} deals with are the following:
8608
8609 @itemize @bullet
8610 @item
8611 @cindex @code{libiconv} library
8612 Some operating systems have @code{iconv} in the C library, for example
8613 glibc.  Some have it in a separate library @code{libiconv}, for example
8614 OSF/1 or FreeBSD.  Regardless of the operating system, GNU @code{libiconv}
8615 might have been installed.  In that case, it should be used instead of the
8616 operating system's native @code{iconv}.
8617
8618 @item
8619 GNU @code{libiconv}, if installed, is not necessarily already in the search
8620 path (@code{CPPFLAGS} for the include file search path, @code{LDFLAGS} for
8621 the library search path).
8622
8623 @item
GNU @code{libiconv} is binary incompatible with some operating systems'
8625 native @code{iconv}, for example on FreeBSD.  Use of an @file{iconv.h}
8626 and @file{libiconv.so} that don't fit together would produce program
8627 crashes.
8628
8629 @item
8630 GNU @code{libiconv}, if installed, is not necessarily already in the
8631 run time library search path.  To avoid the need for setting an environment
8632 variable like @code{LD_LIBRARY_PATH}, the macro adds the appropriate
8633 run time search path options to the @code{LIBICONV} variable.  This works
8634 on most systems, but not on some operating systems with limited shared
8635 library support, like SCO.
8636 @end itemize
8637
8638 @file{iconv.m4} is distributed with the GNU gettext package because
8639 @file{gettext.m4} relies on it.
8640
8641 @node CVS Issues, Release Management, autoconf macros, Maintainers
8642 @section Integrating with CVS
8643
8644 Many projects use CVS for distributed development, version control and
source backup.  This section gives some advice on how to manage the uses
8646 of @code{cvs}, @code{gettextize}, @code{autopoint} and @code{autoconf}.
8647
8648 @menu
8649 * Distributed CVS::             Avoiding version mismatch in distributed development
8650 * Files under CVS::             Files to put under CVS version control
8651 * autopoint Invocation::        Invoking the @code{autopoint} Program
8652 @end menu
8653
8654 @node Distributed CVS, Files under CVS, CVS Issues, CVS Issues
8655 @subsection Avoiding version mismatch in distributed development
8656
In a project with multiple developers using CVS, there should be a single
developer who occasionally, when there is a desire to upgrade to a new
@code{gettext} version, runs @code{gettextize}, performs the changes
listed in @ref{Adjusting Files}, and then commits his changes to the CVS.
8662
8663 It is highly recommended that all developers on a project use the same
8664 version of GNU @code{gettext} in the package.  In other words, if a
8665 developer runs @code{gettextize}, he should go the whole way, make the
8666 necessary remaining changes and commit his changes to the CVS.
Otherwise, the following problems are likely to occur:
8668
8669 @itemize @bullet
8670 @item
8671 Apparent version mismatch between developers.  Since some @code{gettext}
8672 specific portions in @file{configure.ac}, @file{configure.in} and
8673 @code{Makefile.am}, @code{Makefile.in} files depend on the @code{gettext}
8674 version, the use of infrastructure files belonging to different
8675 @code{gettext} versions can easily lead to build errors.
8676
8677 @item
8678 Hidden version mismatch.  Such version mismatch can also lead to
8679 malfunctioning of the package, that may be undiscovered by the developers.
8680 The worst case of hidden version mismatch is that internationalization
8681 of the package doesn't work at all.
8682
8683 @item
8684 Release risks.  All developers implicitly perform constant testing on
8685 a package.  This is important in the days and weeks before a release.
If the person who makes the release tar files uses a different version
8687 of GNU @code{gettext} than the other developers, the distribution will
8688 be less well tested than if all had been using the same @code{gettext}
8689 version.  For example, it is possible that a platform specific bug goes
undiscovered because of this situation.
8691 @end itemize
8692
8693 @node Files under CVS, autopoint Invocation, Distributed CVS, CVS Issues
8694 @subsection Files to put under CVS version control
8695
8696 There are basically three ways to deal with generated files in the
8697 context of a CVS repository, such as @file{configure} generated from
8698 @file{configure.ac}, @code{@var{parser}.c} generated from
8699 @code{@var{parser}.y}, or @code{po/Makefile.in.in} autoinstalled by
8700 @code{gettextize} or @code{autopoint}.
8701
8702 @enumerate
8703 @item
8704 All generated files are always committed into the repository.
8705
8706 @item
8707 All generated files are committed into the repository occasionally,
8708 for example each time a release is made.
8709
8710 @item
8711 Generated files are never committed into the repository.
8712 @end enumerate
8713
8714 Each of these three approaches has different advantages and drawbacks.
8715
8716 @enumerate
8717 @item
8718 The advantage is that anyone can check out the CVS at any moment and
get a working build.  The drawbacks are:  1a. It requires some frequent
8720 "cvs commit" actions by the maintainers.  1b. The repository grows in size
8721 quite fast.
8722
8723 @item
8724 The advantage is that anyone can check out the CVS, and the usual
8725 "./configure; make" will work.  The drawbacks are:  2a. The one who
8726 checks out the repository needs tools like GNU @code{automake},
8727 GNU @code{autoconf}, GNU @code{m4} installed in his PATH; sometimes
8728 he even needs particular versions of them.  2b. When a release is made
8729 and a commit is made on the generated files, the other developers get
8730 conflicts on the generated files after doing "cvs update".  Although
8731 these conflicts are easy to resolve, they are annoying.
8732
8733 @item
8734 The advantage is less work for the maintainers.  The drawback is that
8735 anyone who checks out the CVS not only needs tools like GNU @code{automake},
8736 GNU @code{autoconf}, GNU @code{m4} installed in his PATH, but also that
8737 he needs to perform a package specific pre-build step before being able
8738 to "./configure; make".
8739 @end enumerate
8740
8741 For the first and second approach, all files modified or brought in
8742 by the occasional @code{gettextize} invocation and update should be
8743 committed into the CVS.
8744
8745 For the third approach, the maintainer can omit from the CVS repository
8746 all the files that @code{gettextize} mentions as "copy".  Instead, he
8747 adds to the @file{configure.ac} or @file{configure.in} a line of the
8748 form
8749
8750 @example
8751 AM_GNU_GETTEXT_VERSION(@value{VERSION})
8752 @end example
8753
8754 @noindent
8755 and adds to the package's pre-build script an invocation of
8756 @samp{autopoint}.  For everyone who checks out the CVS, this
8757 @code{autopoint} invocation will copy into the right place the
8758 @code{gettext} infrastructure files that have been omitted from the CVS.
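
A minimal pre-build script for this third approach might look like the
following sketch.  The script name @file{autogen.sh} is a common
convention rather than a requirement.  The script assumes an
Automake-based package that keeps its Autoconf macros in an @file{m4/}
subdirectory, so adapt the list of tools to your package.  Recent
versions of @code{autoreconf} can also run @code{autopoint} themselves
when @file{configure.ac} uses the gettext macros.

@example
#!/bin/sh
# autogen.sh: regenerate the files that are not kept in CVS.
set -e
# Copy the gettext infrastructure files for the version named in
# AM_GNU_GETTEXT_VERSION.
autopoint
aclocal -I m4
autoconf
# Only needed if configure.ac uses AC_CONFIG_HEADERS.
autoheader
automake --add-missing
@end example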
8759
8760 The version number used as argument to @code{AM_GNU_GETTEXT_VERSION} is
8761 the version of the @code{gettext} infrastructure that the package wants
8762 to use.  It is also the minimum version number of the @samp{autopoint}
8763 program.  So, if you write @code{AM_GNU_GETTEXT_VERSION(0.11.5)} then the
8764 developers can have any version >= 0.11.5 installed; the package will work
8765 with the 0.11.5 infrastructure in all developers' builds.  When the
8766 maintainer then runs gettextize from, say, version 0.12.1 on the package,
8767 the occurrence of @code{AM_GNU_GETTEXT_VERSION(0.11.5)} will be changed
8768 into @code{AM_GNU_GETTEXT_VERSION(0.12.1)}, and all other developers that
8769 use the CVS will henceforth need to have GNU @code{gettext} 0.12.1 or newer
8770 installed.
8771
8772 @node autopoint Invocation,  , Files under CVS, CVS Issues
8773 @subsection Invoking the @code{autopoint} Program
8774
8775 @include autopoint.texi
8776
8777 @node Release Management,  , CVS Issues, Maintainers
8778 @section Creating a Distribution Tarball
8779
8780 @cindex release
8781 @cindex distribution tarball
8782 In projects that use GNU @code{automake}, the usual commands for creating
8783 a distribution tarball, @samp{make dist} or @samp{make distcheck},
8784 automatically update the PO files as needed.
8785
8786 If GNU @code{automake} is not used, the maintainer needs to perform this
8787 update before making a release:
8788
8789 @example
8790 $ ./configure
8791 $ (cd po; make update-po)
8792 $ make distclean
8793 @end example
8794
8795 @node Installers, Programming Languages, Maintainers, Top
8796 @chapter The Installer's and Distributor's View
8797 @cindex package installer's view of @code{gettext}
8798 @cindex package distributor's view of @code{gettext}
8799 @cindex package build and installation options
8800 @cindex setting up @code{gettext} at build time
8801
8802 By default, packages fully using GNU @code{gettext}, internally,
are installed in such a way that they allow translation of
8804 messages.  At @emph{configuration} time, those packages should
8805 automatically detect whether the underlying host system already provides
8806 the GNU @code{gettext} functions.  If not,
8807 the GNU @code{gettext} library should be automatically prepared
8808 and used.  Installers may use special options at configuration
8809 time for changing this behavior.  The command @samp{./configure
8810 --with-included-gettext} bypasses system @code{gettext} to
8811 use the included GNU @code{gettext} instead,
8812 while @samp{./configure --disable-nls}
8813 produces programs totally unable to translate messages.
8814
8815 @vindex LINGUAS@r{, environment variable}
Internationalized packages usually have many @file{@var{ll}.po}
files.  Unless translations are disabled, all those available are
installed together
8819 with the package.  However, the environment variable @code{LINGUAS}
8820 may be set, prior to configuration, to limit the installed set.
8821 @code{LINGUAS} should then contain a space separated list of two-letter
8822 codes, stating which languages are allowed.
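
For example, an installer who wants only the German and French
translations could configure the package as follows.  The codes must
match @file{@var{ll}.po} files that the package actually ships:

@example
$ LINGUAS="de fr" ./configure
@end example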
8823
8824 @node Programming Languages, Conclusion, Installers, Top
8825 @chapter Other Programming Languages
8826
8827 While the presentation of @code{gettext} focuses mostly on C and
8828 implicitly applies to C++ as well, its scope is far broader than that:
8829 Many programming languages, scripting languages and other textual data
8830 like GUI resources or package descriptions can make use of the gettext
8831 approach.
8832
8833 @menu
8834 * Language Implementors::       The Language Implementor's View
8835 * Programmers for other Languages::  The Programmer's View
8836 * Translators for other Languages::  The Translator's View
8837 * Maintainers for other Languages::  The Maintainer's View
8838 * List of Programming Languages::  Individual Programming Languages
8839 * List of Data Formats::        Internationalizable Data
8840 @end menu
8841
8842 @node Language Implementors, Programmers for other Languages, Programming Languages, Programming Languages
8843 @section The Language Implementor's View
8844 @cindex programming languages
8845 @cindex scripting languages
8846
8847 All programming and scripting languages that have the notion of strings
are eligible for supporting @code{gettext}.  Supporting @code{gettext}
8849 means the following:
8850
8851 @enumerate
8852 @item
8853 You should add to the language a syntax for translatable strings.  In
8854 principle, a function call of @code{gettext} would do, but a shorthand
syntax helps keep internationalized programs legible.  For
8856 example, in C we use the syntax @code{_("string")}, and in GNU awk we use
8857 the shorthand @code{_"string"}.
8858
8859 @item
8860 You should arrange that evaluation of such a translatable string at
8861 runtime calls the @code{gettext} function, or performs equivalent
8862 processing.
8863
8864 @item
8865 Similarly, you should make the functions @code{ngettext},
8866 @code{dcgettext}, @code{dcngettext} available from within the language.
8867 These functions are less often used, but are nevertheless necessary for
8868 particular purposes: @code{ngettext} for correct plural handling, and
8869 @code{dcgettext} and @code{dcngettext} for obeying other locale-related
8870 environment variables than @code{LC_MESSAGES}, such as @code{LC_TIME} or
8871 @code{LC_MONETARY}.  For these latter functions, you need to make the
8872 @code{LC_*} constants, available in the C header @code{<locale.h>},
8873 referenceable from within the language, usually either as enumeration
8874 values or as strings.
8875
8876 @item
8877 You should allow the programmer to designate a message domain, either by
8878 making the @code{textdomain} function available from within the
8879 language, or by introducing a magic variable called @code{TEXTDOMAIN}.
8880 Similarly, you should allow the programmer to designate where to search
8881 for message catalogs, by providing access to the @code{bindtextdomain}
8882 function.
8883
8884 @item
8885 You should either perform a @code{setlocale (LC_ALL, "")} call during
8886 the startup of your language runtime, or allow the programmer to do so.
8887 Remember that gettext will act as a no-op if the @code{LC_MESSAGES} and
8888 @code{LC_CTYPE} locale categories are not both set.
8889
8890 @item
8891 A programmer should have a way to extract translatable strings from a
8892 program into a PO file.  The GNU @code{xgettext} program is being
8893 extended to support very different programming languages.  Please
8894 contact the GNU @code{gettext} maintainers to help them doing this.  If
8895 the string extractor is best integrated into your language's parser, GNU
8896 @code{xgettext} can function as a front end to your string extractor.
8897
8898 @item
8899 The language's library should have a string formatting facility where
8900 the arguments of a format string are denoted by a positional number or a
8901 name.  This is needed because for some languages and some messages with
8902 more than one substitutable argument, the translation will need to
output the substituted arguments in a different order.  @xref{c-format Flag}.
8904
8905 @item
8906 If the language has more than one implementation, and not all of the
8907 implementations use @code{gettext}, but the programs should be portable
8908 across implementations, you should provide a no-i18n emulation, that
8909 makes the other implementations accept programs written for yours,
8910 without actually translating the strings.
8911
8912 @item
8913 To help the programmer in the task of marking translatable strings,
8914 which is sometimes performed using the Emacs PO mode (@pxref{Marking}),
8915 you are welcome to
8916 contact the GNU @code{gettext} maintainers, so they can add support for
8917 your language to @file{po-mode.el}.
8918 @end enumerate
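As an illustration of making the @code{LC_*} constants referenceable as strings, here is a minimal sketch of how a language runtime written in C might expose @code{dcgettext} to scripts that name the category by string.  The table and the functions @code{lookup_category} and @code{script_dcgettext} are hypothetical names chosen for this example.  @code{LC_ALL} is deliberately left out, since @code{dcgettext} is meant to be called with a single category.

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table through which a language runtime could let
   scripts name the <locale.h> categories as strings.  LC_ALL is left
   out on purpose: dcgettext() takes a single category.  */
struct category_name
{
  const char *name;
  int value;
};

static const struct category_name categories[] =
{
  { "LC_CTYPE",    LC_CTYPE },
  { "LC_NUMERIC",  LC_NUMERIC },
  { "LC_TIME",     LC_TIME },
  { "LC_COLLATE",  LC_COLLATE },
  { "LC_MONETARY", LC_MONETARY },
  { "LC_MESSAGES", LC_MESSAGES }
};

/* Return the category constant named NAME, or -1 if there is none.  */
int
lookup_category (const char *name)
{
  size_t i;

  for (i = 0; i < sizeof categories / sizeof categories[0]; i++)
    if (strcmp (categories[i].name, name) == 0)
      return categories[i].value;
  return -1;
}

/* Script-level dcgettext whose category argument is a string.
   Unknown category names leave MSGID untranslated.  */
const char *
script_dcgettext (const char *domain, const char *msgid,
                  const char *category_name)
{
  int category = lookup_category (category_name);

  if (category < 0)
    return msgid;
  return dcgettext (domain, msgid, category);
}
```

Returning the untranslated string for an unknown category name is a design choice of this sketch; a runtime could just as well signal an error to the script.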
8919
8920 On the implementation side, three approaches are possible, with
8921 different effects on portability and copyright:
8922
8923 @itemize @bullet
8924 @item
You may integrate GNU @code{gettext}'s @file{intl/} directory into
8926 your package, as described in @ref{Maintainers}.  This allows you to
8927 have internationalization on all kinds of platforms.  Note that when you
8928 then distribute your package, it legally falls under the GNU General
8929 Public License, and the GNU project will be glad about your contribution
8930 to the Free Software pool.
8931
8932 @item
8933 You may link against GNU @code{gettext} functions if they are found in
8934 the C library.  For example, an autoconf test for @code{gettext()} and
8935 @code{ngettext()} will detect this situation.  For the moment, this test
8936 will succeed on GNU systems and not on other platforms.  No severe
8937 copyright restrictions apply.
8938
8939 @item
8940 You may emulate or reimplement the GNU @code{gettext} functionality.
8941 This has the advantage of full portability and no copyright
8942 restrictions, but also the drawback that you have to reimplement the GNU
8943 @code{gettext} features (such as the @code{LANGUAGE} environment
8944 variable, the locale aliases database, the automatic charset conversion,
8945 and plural handling).
8946 @end itemize
8947
8948 @node Programmers for other Languages, Translators for other Languages, Language Implementors, Programming Languages
8949 @section The Programmer's View
8950
8951 For the programmer, the general procedure is the same as for the C
8952 language.  The Emacs PO mode marking supports other languages, and the GNU
8953 @code{xgettext} string extractor recognizes other languages based on the
8954 file extension or a command-line option.  In some languages,
8955 @code{setlocale} is not needed because it is already performed by the
8956 underlying language runtime.
8957
8958 @node Translators for other Languages, Maintainers for other Languages, Programmers for other Languages, Programming Languages
8959 @section The Translator's View
8960
8961 The translator works exactly as in the C language case.  The only
8962 difference is that when translating format strings, she has to be aware
8963 of the language's particular syntax for positional arguments in format
8964 strings.
8965
8966 @menu
8967 * c-format::                    C Format Strings
8968 * objc-format::                 Objective C Format Strings
8969 * sh-format::                   Shell Format Strings
8970 * python-format::               Python Format Strings
8971 * lisp-format::                 Lisp Format Strings
8972 * elisp-format::                Emacs Lisp Format Strings
8973 * librep-format::               librep Format Strings
8974 * scheme-format::               Scheme Format Strings
8975 * smalltalk-format::            Smalltalk Format Strings
8976 * java-format::                 Java Format Strings
8977 * csharp-format::               C# Format Strings
8978 * awk-format::                  awk Format Strings
8979 * object-pascal-format::        Object Pascal Format Strings
8980 * ycp-format::                  YCP Format Strings
8981 * tcl-format::                  Tcl Format Strings
8982 * perl-format::                 Perl Format Strings
8983 * php-format::                  PHP Format Strings
8984 * gcc-internal-format::         GCC internal Format Strings
8985 * gfc-internal-format::         GFC internal Format Strings
8986 * qt-format::                   Qt Format Strings
8987 * qt-plural-format::            Qt Plural Format Strings
8988 * kde-format::                  KDE Format Strings
8989 * boost-format::                Boost Format Strings
8990 * lua-format::                  Lua Format Strings
8991 * javascript-format::           JavaScript Format Strings
8992 @end menu
8993
8994 @node c-format, objc-format, Translators for other Languages, Translators for other Languages
8995 @subsection C Format Strings
8996
8997 C format strings are described in POSIX (IEEE P1003.1 2001), section
8998 XSH 3 fprintf(),
8999 @uref{http://www.opengroup.org/onlinepubs/007904975/functions/fprintf.html}.
9000 See also the fprintf() manual page,
9001 @uref{http://www.linuxvalley.it/encyclopedia/ldp/manpage/man3/printf.3.php},
9002 @uref{http://informatik.fh-wuerzburg.de/student/i510/man/printf.html}.
9003
9004 Although format strings with positions that reorder arguments, such as
9005
9006 @example
9007 "Only %2$d bytes free on '%1$s'."
9008 @end example
9009
9010 @noindent
9011 which is semantically equivalent to
9012
9013 @example
9014 "'%s' has only %d bytes free."
9015 @end example
9016
9017 @noindent
9018 are a POSIX/XSI feature and not specified by ISO C 99, translators can rely
9019 on this reordering ability: On the few platforms where @code{printf()},
9020 @code{fprintf()} etc. don't support this feature natively, @file{libintl.a}
9021 or @file{libintl.so} provides replacement functions, and GNU @code{<libintl.h>}
9022 activates these replacement functions automatically.
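The following sketch shows why the reordering works: the program always passes its arguments in the same order, and only the format string returned by the catalog decides where each one appears.  @code{fake_gettext} is a hypothetical stand-in for @code{gettext} with a one-entry catalog, and the code assumes a C library, such as glibc, whose @code{snprintf} implements the POSIX positional directives.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for gettext() with a one-entry catalog whose
   translation swaps the order of the two arguments.  */
static const char *
fake_gettext (const char *msgid)
{
  if (strcmp (msgid, "'%s' has only %d bytes free.") == 0)
    return "Only %2$d bytes free on '%1$s'.";
  return msgid;
}

/* Format the message into BUF.  DEVICE and BYTES are always passed in
   the same order; the (possibly translated) format string decides
   where each one appears.  Positional directives such as %2$d assume a
   C library that supports them, such as glibc.  */
int
report_free_space (char *buf, size_t size, int translate,
                   const char *device, int bytes)
{
  const char *msgid = "'%s' has only %d bytes free.";
  const char *format = translate ? fake_gettext (msgid) : msgid;

  return snprintf (buf, size, format, device, bytes);
}
```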
9023
9024 @cindex outdigits
9025 @cindex Arabic digits
9026 As a special feature for Farsi (Persian) and maybe Arabic, translators can
9027 insert an @samp{I} flag into numeric format directives.  For example, the
9028 translation of @code{"%d"} can be @code{"%Id"}.  The effect of this flag,
9029 on systems with GNU @code{libc}, is that in the output, the ASCII digits are
9030 replaced with the @samp{outdigits} defined in the @code{LC_CTYPE} locale
9031 category.  On other systems, the @code{gettext} function removes this flag,
9032 so that it has no effect.
9033
9034 Note that the programmer should @emph{not} put this flag into the
9035 untranslated string.  (Putting the @samp{I} format directive flag into an
9036 @var{msgid} string would lead to undefined behaviour on platforms without
9037 glibc when NLS is disabled.)
9038
9039 @node objc-format, sh-format, c-format, Translators for other Languages
9040 @subsection Objective C Format Strings
9041
9042 Objective C format strings are like C format strings.  They support an
9043 additional format directive: "%@@", which when executed consumes an argument
9044 of type @code{Object *}.
9045
9046 @node sh-format, python-format, objc-format, Translators for other Languages
9047 @subsection Shell Format Strings
9048
9049 Shell format strings, as supported by GNU gettext and the @samp{envsubst}
9050 program, are strings with references to shell variables in the form
9051 @code{$@var{variable}} or @code{$@{@var{variable}@}}.  References of the form
9052 @code{$@{@var{variable}-@var{default}@}},
9053 @code{$@{@var{variable}:-@var{default}@}},
9054 @code{$@{@var{variable}=@var{default}@}},
9055 @code{$@{@var{variable}:=@var{default}@}},
9056 @code{$@{@var{variable}+@var{replacement}@}},
9057 @code{$@{@var{variable}:+@var{replacement}@}},
9058 @code{$@{@var{variable}?@var{ignored}@}},
9059 @code{$@{@var{variable}:?@var{ignored}@}},
9060 that would be valid inside shell scripts, are not supported.  The
9061 @var{variable} names must consist solely of alphanumeric or underscore
9062 ASCII characters, not start with a digit and be nonempty; otherwise such
9063 a variable reference is ignored.
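To make these rules concrete, here is a rough sketch, in C, of the kind of substitution @samp{envsubst} performs on such strings.  It is not the actual @samp{envsubst} source: the function name @code{substitute_vars}, the 255-character name limit, and the choice to expand unset variables to the empty string and to copy unsupported references unchanged are assumptions of this example.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Nonzero if C may appear in a variable name (ASCII letters, digits,
   underscore, assuming the "C" locale for isalnum).  */
static int
is_name_char (char c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* Copy TMPL into OUT (OUTSIZE bytes), replacing $NAME and ${NAME}
   with the value of environment variable NAME, or with nothing if it
   is unset.  Anything that does not match those two forms, such as
   $1, ${NAME:-default} or a lone $, is copied unchanged, as are names
   of 256 characters or more.  The result is truncated if OUT is too
   small.  */
void
substitute_vars (const char *tmpl, char *out, size_t outsize)
{
  const char *p = tmpl;
  size_t n = 0;

  if (outsize == 0)
    return;
  while (*p != '\0' && n + 1 < outsize)
    {
      if (*p == '$')
        {
          int braced = (p[1] == '{');
          const char *start = p + 1 + braced;
          const char *end = start;

          while (is_name_char (*end))
            end++;
          if (end > start && !isdigit ((unsigned char) *start)
              && (!braced || *end == '}')
              && (size_t) (end - start) < 256)
            {
              char name[256];
              size_t len = (size_t) (end - start);
              const char *value;

              memcpy (name, start, len);
              name[len] = '\0';
              value = getenv (name);
              if (value == NULL)
                value = "";
              while (*value != '\0' && n + 1 < outsize)
                out[n++] = *value++;
              p = end + braced;   /* skip the name, and '}' if braced */
              continue;
            }
        }
      out[n++] = *p++;
    }
  out[n] = '\0';
}
```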
9064
9065 @node python-format, lisp-format, sh-format, Translators for other Languages
9066 @subsection Python Format Strings
9067
9068 There are two kinds of format strings in Python: those acceptable to
9069 the Python built-in format operator @code{%}, labelled as
9070 @samp{python-format}, and those acceptable to the @code{format} method
9071 of the @samp{str} object.
9072
9073 Python @code{%} format strings are described in
9074 @w{Python Library reference} /
9075 @w{2. Built-in Types, Exceptions and Functions} /
9076 @w{2.2. Built-in Types} /
9077 @w{2.2.6. Sequence Types} /
9078 @w{2.2.6.2. String Formatting Operations}.
9079 @uref{http://www.python.org/doc/2.2.1/lib/typesseq-strings.html}.
9080
9081 Python brace format strings are described in @w{PEP 3101 -- Advanced
9082 String Formatting}, @uref{http://www.python.org/dev/peps/pep-3101/}.
9083
9084 @node lisp-format, elisp-format, python-format, Translators for other Languages
9085 @subsection Lisp Format Strings
9086
9087 Lisp format strings are described in the Common Lisp HyperSpec,
9088 chapter 22.3 @w{Formatted Output},
9089 @uref{http://www.lisp.org/HyperSpec/Body/sec_22-3.html}.
9090
9091 @node elisp-format, librep-format, lisp-format, Translators for other Languages
9092 @subsection Emacs Lisp Format Strings
9093
9094 Emacs Lisp format strings are documented in the Emacs Lisp reference,
9095 section @w{Formatting Strings},
9096 @uref{http://www.gnu.org/manual/elisp-manual-21-2.8/html_chapter/elisp_4.html#SEC75}.
9097 Note that as of version 21, XEmacs supports numbered argument specifications
9098 in format strings while FSF Emacs doesn't.
9099
9100 @node librep-format, scheme-format, elisp-format, Translators for other Languages
9101 @subsection librep Format Strings
9102
9103 librep format strings are documented in the librep manual, section
9104 @w{Formatted Output},
9105 @url{http://librep.sourceforge.net/librep-manual.html#Formatted%20Output},
9106 @url{http://www.gwinnup.org/research/docs/librep.html#SEC122}.
9107
9108 @node scheme-format, smalltalk-format, librep-format, Translators for other Languages
9109 @subsection Scheme Format Strings
9110
9111 Scheme format strings are documented in the SLIB manual, section
9112 @w{Format Specification}.
9113
9114 @node smalltalk-format, java-format, scheme-format, Translators for other Languages
9115 @subsection Smalltalk Format Strings
9116
9117 Smalltalk format strings are described in the GNU Smalltalk documentation,
9118 class @code{CharArray}, methods @samp{bindWith:} and
9119 @samp{bindWithArguments:}.
9120 @uref{http://www.gnu.org/software/smalltalk/gst-manual/gst_68.html#SEC238}.
9121 In summary, a directive starts with @samp{%} and is followed by @samp{%}
9122 or a nonzero digit (@samp{1} to @samp{9}).
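A checker for this syntax only needs to look at the character after each @samp{%}.  The sketch below, with the hypothetical name @code{smalltalk_max_arg}, returns the highest argument number a string references, so that a tool could compare a @var{msgid} with its translation.  Rejecting any other character after @samp{%} is a choice made here, not something the summary above specifies.  The same scan would apply to YCP format strings, which follow the same rule.

```c
/* Return the highest argument number (1..9) referenced by a
   Smalltalk-style format string, 0 if it references none, or -1 if it
   contains a '%' that is not followed by '%' or a digit 1..9.
   Treating other characters after '%' as an error is this sketch's
   choice.  */
int
smalltalk_max_arg (const char *fmt)
{
  int max = 0;
  const char *p;

  for (p = fmt; *p != '\0'; p++)
    if (*p == '%')
      {
        p++;
        if (*p == '%')
          continue;               /* "%%" is a literal percent sign */
        if (*p >= '1' && *p <= '9')
          {
            int n = *p - '0';
            if (n > max)
              max = n;
          }
        else
          return -1;              /* also covers a trailing lone '%' */
      }
  return max;
}
```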
9123
9124 @node java-format, csharp-format, smalltalk-format, Translators for other Languages
9125 @subsection Java Format Strings
9126
9127 Java format strings are described in the JDK documentation for class
9128 @code{java.text.MessageFormat},
9129 @uref{http://java.sun.com/j2se/1.4/docs/api/java/text/MessageFormat.html}.
9130 See also the ICU documentation
9131 @uref{http://oss.software.ibm.com/icu/apiref/classMessageFormat.html}.
9132
9133 @node csharp-format, awk-format, java-format, Translators for other Languages
9134 @subsection C# Format Strings
9135
9136 C# format strings are described in the .NET documentation for class
9137 @code{System.String} and in
9138 @uref{http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpConFormattingOverview.asp}.
9139
9140 @node awk-format, object-pascal-format, csharp-format, Translators for other Languages
9141 @subsection awk Format Strings
9142
9143 awk format strings are described in the gawk documentation, section
9144 @w{Printf},
9145 @uref{http://www.gnu.org/manual/gawk/html_node/Printf.html#Printf}.
9146
9147 @node object-pascal-format, ycp-format, awk-format, Translators for other Languages
9148 @subsection Object Pascal Format Strings
9149
9150 Object Pascal format strings are described in the documentation of the
9151 Free Pascal runtime library, section Format,
9152 @uref{http://www.freepascal.org/docs-html/rtl/sysutils/format.html}.
9153
9154 @node ycp-format, tcl-format, object-pascal-format, Translators for other Languages
9155 @subsection YCP Format Strings
9156
9157 YCP sformat strings are described in the libycp documentation
9158 @uref{file:/usr/share/doc/packages/libycp/YCP-builtins.html}.
9159 In summary, a directive starts with @samp{%} and is followed by @samp{%}
9160 or a nonzero digit (@samp{1} to @samp{9}).
9161
9162 @node tcl-format, perl-format, ycp-format, Translators for other Languages
9163 @subsection Tcl Format Strings
9164
9165 Tcl format strings are described in the @file{format.n} manual page,
9166 @uref{http://www.scriptics.com/man/tcl8.3/TclCmd/format.htm}.
9167
9168 @node perl-format, php-format, tcl-format, Translators for other Languages
9169 @subsection Perl Format Strings
9170
There are two kinds of format strings in Perl: those acceptable to the
9172 Perl built-in function @code{printf}, labelled as @samp{perl-format},
9173 and those acceptable to the @code{libintl-perl} function @code{__x},
9174 labelled as @samp{perl-brace-format}.
9175
9176 Perl @code{printf} format strings are described in the @code{sprintf}
9177 section of @samp{man perlfunc}.
9178
9179 Perl brace format strings are described in the
9180 @file{Locale::TextDomain(3pm)} manual page of the CPAN package
libintl-perl.  In brief, Perl brace format uses placeholders put between
braces (@samp{@{} and @samp{@}}).  The placeholder names must have the
syntax of simple identifiers.
9184
9185 @node php-format, gcc-internal-format, perl-format, Translators for other Languages
9186 @subsection PHP Format Strings
9187
9188 PHP format strings are described in the documentation of the PHP function
9189 @code{sprintf}, in @file{phpdoc/manual/function.sprintf.html} or
9190 @uref{http://www.php.net/manual/en/function.sprintf.php}.
9191
9192 @node gcc-internal-format, gfc-internal-format, php-format, Translators for other Languages
9193 @subsection GCC internal Format Strings
9194
9195 These format strings are used inside the GCC sources.  In such a format
9196 string, a directive starts with @samp{%}, is optionally followed by a
9197 size specifier @samp{l}, an optional flag @samp{+}, another optional flag
9198 @samp{#}, and is finished by a specifier: @samp{%} denotes a literal
9199 percent sign, @samp{c} denotes a character, @samp{s} denotes a string,
9200 @samp{i} and @samp{d} denote an integer, @samp{o}, @samp{u}, @samp{x}
9201 denote an unsigned integer, @samp{.*s} denotes a string preceded by a
9202 width specification, @samp{H} denotes a @samp{location_t *} pointer,
9203 @samp{D} denotes a general declaration, @samp{F} denotes a function
9204 declaration, @samp{T} denotes a type, @samp{A} denotes a function argument,
9205 @samp{C} denotes a tree code, @samp{E} denotes an expression, @samp{L}
9206 denotes a programming language, @samp{O} denotes a binary operator,
9207 @samp{P} denotes a function parameter, @samp{Q} denotes an assignment
9208 operator, @samp{V} denotes a const/volatile qualifier.
9209
9210 @node gfc-internal-format, qt-format, gcc-internal-format, Translators for other Languages
9211 @subsection GFC internal Format Strings
9212
9213 These format strings are used inside the GNU Fortran Compiler sources,
9214 that is, the Fortran frontend in the GCC sources.  In such a format
9215 string, a directive starts with @samp{%} and is finished by a
9216 specifier: @samp{%} denotes a literal percent sign, @samp{C} denotes the
9217 current source location, @samp{L} denotes a source location, @samp{c}
9218 denotes a character, @samp{s} denotes a string, @samp{i} and @samp{d}
9219 denote an integer, @samp{u} denotes an unsigned integer.  @samp{i},
9220 @samp{d}, and @samp{u} may be preceded by a size specifier @samp{l}.
9221
9222 @node qt-format, qt-plural-format, gfc-internal-format, Translators for other Languages
9223 @subsection Qt Format Strings
9224
9225 Qt format strings are described in the documentation of the QString class
9226 @uref{file:/usr/lib/qt-4.3.0/doc/html/qstring.html}.
9227 In summary, a directive consists of a @samp{%} followed by a digit. The same
9228 directive cannot occur more than once in a format string.
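A minimal sketch of the duplicate check, assuming only the single-digit directives described in this summary (the real @code{QString::arg} accepts further forms, such as two-digit numbers): @code{qt_directive_mask} is a hypothetical helper that records each directive in a bit mask and reports a repeated one.

```c
/* Return a bit mask whose bit N is set when directive %N (N = 0..9)
   occurs in a Qt-style format string, or -1 if some directive occurs
   more than once.  A '%' not followed by a digit is ordinary text.
   Only the single-digit form from the summary is handled.  */
int
qt_directive_mask (const char *fmt)
{
  int mask = 0;
  const char *p;

  for (p = fmt; *p != '\0'; p++)
    if (p[0] == '%' && p[1] >= '0' && p[1] <= '9')
      {
        int bit = 1 << (p[1] - '0');

        if (mask & bit)
          return -1;              /* same directive seen twice */
        mask |= bit;
        p++;                      /* skip the digit */
      }
  return mask;
}
```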
9229
9230 @node qt-plural-format, kde-format, qt-format, Translators for other Languages
@subsection Qt Plural Format Strings

Qt plural format strings are described in the documentation of the @code{QObject::tr} method
9234 @uref{file:/usr/lib/qt-4.3.0/doc/html/qobject.html}.
9235 In summary, the only allowed directive is @samp{%n}.
9236
9237 @node kde-format, boost-format, qt-plural-format, Translators for other Languages
9238 @subsection KDE Format Strings
9239
9240 KDE 4 format strings are defined as follows:
9241 A directive consists of a @samp{%} followed by a non-zero decimal number.
If a @samp{%n} occurs in a format string, all of @samp{%1}, @dots{}, @samp{%(n-1)}
9243 must occur as well, except possibly one of them.
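The rule can be checked by recording which directive numbers occur and then counting the gaps below the highest one.  In the sketch below, the function name @code{kde_format_valid} and the limit of 99 are assumptions of this example; a @samp{%} not followed by a nonzero digit is treated as plain text.

```c
/* Highest directive number this sketch accepts.  */
#define KDE_MAX_ARG 99

/* Return 1 if FMT satisfies the KDE 4 rule above, 0 otherwise.
   A directive is '%' followed by a decimal number that does not start
   with '0'; any other '%' is plain text.  Numbers above KDE_MAX_ARG
   are rejected, which is a limitation of this sketch.  */
int
kde_format_valid (const char *fmt)
{
  unsigned char seen[KDE_MAX_ARG + 1] = { 0 };
  int highest = 0;
  int missing = 0;
  int i;
  const char *p = fmt;

  while (*p != '\0')
    {
      if (p[0] == '%' && p[1] >= '1' && p[1] <= '9')
        {
          int n = 0;

          for (p++; *p >= '0' && *p <= '9'; p++)
            {
              n = n * 10 + (*p - '0');
              if (n > KDE_MAX_ARG)
                return 0;
            }
          seen[n] = 1;
          if (n > highest)
            highest = n;
        }
      else
        p++;
    }

  /* Every number below the highest must occur, except possibly one.  */
  for (i = 1; i < highest; i++)
    if (!seen[i])
      missing++;
  return missing <= 1;
}
```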
9244
9245 @node boost-format, lua-format, kde-format, Translators for other Languages
9246 @subsection Boost Format Strings
9247
9248 Boost format strings are described in the documentation of the
9249 @code{boost::format} class, at
9250 @uref{http://www.boost.org/libs/format/doc/format.html}.
9251 In summary, a directive has either the same syntax as in a C format string,
9252 such as @samp{%1$+5d}, or may be surrounded by vertical bars, such as
9253 @samp{%|1$+5d|} or @samp{%|1$+5|}, or consists of just an argument number
9254 between percent signs, such as @samp{%1%}.
9255
9256 @node lua-format, javascript-format, boost-format, Translators for other Languages
9257 @subsection Lua Format Strings
9258
9259 Lua format strings are described in the Lua reference manual, section @w{String Manipulation},
9260 @uref{http://www.lua.org/manual/5.1/manual.html#pdf-string.format}.
9261
9262 @node javascript-format,  , lua-format, Translators for other Languages
9263 @subsection JavaScript Format Strings
9264
Although the JavaScript specification itself does not define any format
strings, many JavaScript implementations provide printf-like
functions.  @code{xgettext} understands a set of common format strings
used in popular JavaScript implementations including Gjs, Seed, and
Node.js.  In such a format string, a directive starts with @samp{%}
and is finished by a specifier: @samp{%} denotes a literal percent
sign, @samp{c} denotes a character, @samp{s} denotes a string,
@samp{b}, @samp{d}, @samp{o}, @samp{x}, @samp{X} denote an integer,
@samp{f} denotes a floating-point number, @samp{j} denotes a JSON
object.
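As a sketch of how such strings can be checked, the hypothetical function @code{js_format_arg_count} below counts the arguments a string consumes, using only the specifiers listed above.  It does not model flags, widths or precisions, and it is not the code that @code{xgettext} actually uses.

```c
#include <string.h>

/* Count the arguments consumed by a JavaScript-style format string
   under the specifier list above, or return -1 if a '%' is followed by
   a character outside that list (including the end of the string).
   Flags, widths and precisions are not handled in this sketch.  */
int
js_format_arg_count (const char *fmt)
{
  int count = 0;
  const char *p;

  for (p = fmt; *p != '\0'; p++)
    if (*p == '%')
      {
        p++;
        if (*p == '%')
          continue;               /* literal percent sign */
        if (*p != '\0' && strchr ("cbsdoxXfj", *p) != NULL)
          count++;
        else
          return -1;
      }
  return count;
}
```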
9275
9276
9277 @node Maintainers for other Languages, List of Programming Languages, Translators for other Languages, Programming Languages
9278 @section The Maintainer's View
9279
9280 For the maintainer, the general procedure differs from the C language
9281 case in two ways.
9282
9283 @itemize @bullet
9284 @item
9285 For those languages that don't use GNU gettext, the @file{intl/} directory
9286 is not needed and can be omitted.  This means that the maintainer calls the
9287 @code{gettextize} program without the @samp{--intl} option, and that he
9288 invokes the @code{AM_GNU_GETTEXT} autoconf macro via
9289 @samp{AM_GNU_GETTEXT([external])}.
9290
9291 @item
9292 If only a single programming language is used, the @code{XGETTEXT_OPTIONS}
9293 variable in @file{po/Makevars} (@pxref{po/Makevars}) should be adjusted to
9294 match the @code{xgettext} options for that particular programming language.
9295 If the package uses more than one programming language with @code{gettext}
9296 support, it becomes necessary to change the POT file construction rule
9297 in @file{po/Makefile.in.in}.  It is recommended to make one @code{xgettext}
9298 invocation per programming language, each with the options appropriate for
9299 that language, and to combine the resulting files using @code{msgcat}.
9300 @end itemize
9301
9302 @node List of Programming Languages, List of Data Formats, Maintainers for other Languages, Programming Languages
9303 @section Individual Programming Languages
9304
9305 @c Here is a list of programming languages, as used for Free Software projects
9306 @c on SourceForge/Freshmeat, as of February 2002.  Those supported by gettext
9307 @c are marked with a star.
9308 @c   C                       3580     *
9309 @c   Perl                    1911     *
9310 @c   C++                     1379     *
9311 @c   Java                    1200     *
9312 @c   PHP                     1051     *
9313 @c   Python                   613     *
9314 @c   Unix Shell               357     *
9315 @c   Tcl                      266     *
9316 @c   SQL                      174
9317 @c   JavaScript               118
9318 @c   Assembly                 108
9319 @c   Scheme                    51
9320 @c   Ruby                      47
9321 @c   Lisp                      45     *
9322 @c   Objective C               39     *
9323 @c   PL/SQL                    29
9324 @c   Fortran                   25
9325 @c   Ada                       24
9326 @c   Delphi                    22
9327 @c   Awk                       19     *
9328 @c   Pascal                    19
9329 @c   ML                        19
9330 @c   Eiffel                    17
9331 @c   Emacs-Lisp                14     *
9332 @c   Zope                      14
9333 @c   ASP                       12
9334 @c   Forth                     12
9335 @c   Cold Fusion               10
9336 @c   Haskell                    9
9337 @c   Visual Basic               9
9338 @c   C#                         6     *
9339 @c   Smalltalk                  6     *
9340 @c   Basic                      5
9341 @c   Erlang                     5
9342 @c   Modula                     5
9343 @c   Object Pascal              5     *
9344 @c   Rexx                       5
9345 @c   Dylan                      4
9346 @c   Prolog                     4
9347 @c   APL                        3
9348 @c   PROGRESS                   2
9349 @c   Euler                      1
9350 @c   Euphoria                   1
9351 @c   Pliant                     1
9352 @c   Simula                     1
9353 @c   XBasic                     1
9354 @c   Logo                       0
9355 @c   Other Scripting Engines   49
9356 @c   Other                    116
9357
9358 @menu
9359 * C::                           C, C++, Objective C
9360 * sh::                          sh - Shell Script
9361 * bash::                        bash - Bourne-Again Shell Script
9362 * Python::                      Python
9363 * Common Lisp::                 GNU clisp - Common Lisp
9364 * clisp C::                     GNU clisp C sources
9365 * Emacs Lisp::                  Emacs Lisp
9366 * librep::                      librep
9367 * Scheme::                      GNU guile - Scheme
9368 * Smalltalk::                   GNU Smalltalk
9369 * Java::                        Java
9370 * C#::                          C#
9371 * gawk::                        GNU awk
9372 * Pascal::                      Pascal - Free Pascal Compiler
9373 * wxWidgets::                   wxWidgets library
9374 * YCP::                         YCP - YaST2 scripting language
9375 * Tcl::                         Tcl - Tk's scripting language
9376 * Perl::                        Perl
9377 * PHP::                         PHP Hypertext Preprocessor
9378 * Pike::                        Pike
9379 * GCC-source::                  GNU Compiler Collection sources
9380 * Lua::                         Lua
9381 * JavaScript::                  JavaScript
9382 @end menu
9383
9384 @node C, sh, List of Programming Languages, List of Programming Languages
9385 @subsection C, C++, Objective C
9386 @cindex C and C-like languages
9387
9388 @table @asis
9389 @item RPMs
9390 gcc, gpp, gobjc, glibc, gettext
9391
9392 @item File extension
9393 For C: @code{c}, @code{h}.
9394 @*For C++: @code{C}, @code{c++}, @code{cc}, @code{cxx}, @code{cpp}, @code{hpp}.
9395 @*For Objective C: @code{m}.
9396
9397 @item String syntax
9398 @code{"abc"}
9399
9400 @item gettext shorthand
9401 @code{_("abc")}
9402
9403 @item gettext/ngettext functions
9404 @code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext},
9405 @code{dngettext}, @code{dcngettext}
9406
9407 @item textdomain
9408 @code{textdomain} function
9409
9410 @item bindtextdomain
9411 @code{bindtextdomain} function
9412
9413 @item setlocale
9414 Programmer must call @code{setlocale (LC_ALL, "")}
9415
9416 @item Prerequisite
9417 @code{#include <libintl.h>}
9418 @*@code{#include <locale.h>}
9419 @*@code{#define _(string) gettext (string)}
9420
9421 @item Use or emulate GNU gettext
9422 Use
9423
9424 @item Extractor
9425 @code{xgettext -k_}
9426
9427 @item Formatting with positions
9428 @code{fprintf "%2$d %1$d"}
9429 @*In C++: @code{autosprintf "%2$d %1$d"}
9430 (@pxref{Top, , Introduction, autosprintf, GNU autosprintf})
9431
9432 @item Portability
9433 autoconf (gettext.m4) and #if ENABLE_NLS
9434
9435 @item po-mode marking
9436 yes
9437 @end table
9438
9439 The following examples are available in the @file{examples} directory:
9440 @code{hello-c}, @code{hello-c-gnome}, @code{hello-c++}, @code{hello-c++-qt},
9441 @code{hello-c++-kde}, @code{hello-c++-gnome}, @code{hello-c++-wxwidgets},
9442 @code{hello-objc}, @code{hello-objc-gnustep}, @code{hello-objc-gnome}.
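Putting the prerequisites from the table together, a minimal C sketch of the setup and lookup calls might look as follows.  @code{PACKAGE} and @code{LOCALEDIR} are hard-coded here only for illustration; a real package would normally take them from @file{config.h} and its build system, and would call the setup function (here given the hypothetical name @code{init_i18n}) at the start of @code{main}.

```c
#include <libintl.h>
#include <locale.h>
#include <stdio.h>

#define _(string) gettext (string)

/* Hypothetical values, hard-coded for this sketch only.  */
#define PACKAGE "hello-sketch"
#define LOCALEDIR "/usr/local/share/locale"

/* One-time setup, normally done at the start of main().  */
void
init_i18n (void)
{
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
}

/* The translation of the greeting, or the original string when no
   catalog for the current locale provides one.  */
const char *
greeting (void)
{
  return _("Hello, world!");
}

/* Format a count with ngettext, which selects the plural form for N.
   Returns the number of characters snprintf would write.  */
int
format_file_count (char *buf, size_t size, unsigned long n)
{
  return snprintf (buf, size, ngettext ("%lu file", "%lu files", n), n);
}
```

With no catalog installed, @code{ngettext} falls back to the first string for @code{n == 1} and to the second otherwise.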
9443
9444 @node sh, bash, C, List of Programming Languages
9445 @subsection sh - Shell Script
9446 @cindex shell scripts
9447
9448 @table @asis
9449 @item RPMs
9450 bash, gettext
9451
9452 @item File extension
9453 @code{sh}
9454
9455 @item String syntax
9456 @code{"abc"}, @code{'abc'}, @code{abc}
9457
9458 @item gettext shorthand
9459 @code{"`gettext \"abc\"`"}
9460
9461 @item gettext/ngettext functions
9462 @pindex gettext
9463 @pindex ngettext
9464 @code{gettext}, @code{ngettext} programs
9465 @*@code{eval_gettext}, @code{eval_ngettext} shell functions
9466
9467 @item textdomain
9468 @vindex TEXTDOMAIN@r{, environment variable}
9469 environment variable @code{TEXTDOMAIN}
9470
9471 @item bindtextdomain
9472 @vindex TEXTDOMAINDIR@r{, environment variable}
9473 environment variable @code{TEXTDOMAINDIR}
9474
9475 @item setlocale
9476 automatic
9477
9478 @item Prerequisite
9479 @code{. gettext.sh}
9480
9481 @item Use or emulate GNU gettext
9482 use
9483
9484 @item Extractor
9485 @code{xgettext}
9486
9487 @item Formatting with positions
9488 ---
9489
9490 @item Portability
9491 fully portable
9492
9493 @item po-mode marking
9494 ---
9495 @end table
9496
9497 An example is available in the @file{examples} directory: @code{hello-sh}.
9498
9499 @menu
9500 * Preparing Shell Scripts::     Preparing Shell Scripts for Internationalization
9501 * gettext.sh::                  Contents of @code{gettext.sh}
9502 * gettext Invocation::          Invoking the @code{gettext} program
9503 * ngettext Invocation::         Invoking the @code{ngettext} program
9504 * envsubst Invocation::         Invoking the @code{envsubst} program
9505 * eval_gettext Invocation::     Invoking the @code{eval_gettext} function
9506 * eval_ngettext Invocation::    Invoking the @code{eval_ngettext} function
9507 @end menu
9508
9509 @node Preparing Shell Scripts, gettext.sh, sh, sh
9510 @subsubsection Preparing Shell Scripts for Internationalization
9511 @cindex preparing shell scripts for translation
9512
9513 Preparing a shell script for internationalization is conceptually similar
9514 to the steps described in @ref{Sources}.  The concrete steps for shell
9515 scripts are as follows.
9516
9517 @enumerate
9518 @item
9519 Insert the line
9520
9521 @smallexample
9522 . gettext.sh
9523 @end smallexample
9524
9525 near the top of the script.  @code{gettext.sh} is a shell function library
9526 that provides the functions
@code{eval_gettext} (@pxref{eval_gettext Invocation}) and
@code{eval_ngettext} (@pxref{eval_ngettext Invocation}).
9529 You have to ensure that @code{gettext.sh} can be found in the @code{PATH}.
9530
9531 @item
9532 Set and export the @code{TEXTDOMAIN} and @code{TEXTDOMAINDIR} environment
9533 variables.  Usually @code{TEXTDOMAIN} is the package or program name, and
9534 @code{TEXTDOMAINDIR} is the absolute pathname corresponding to
9535 @code{$prefix/share/locale}, where @code{$prefix} is the installation location.
9536
9537 @smallexample
9538 TEXTDOMAIN=@@PACKAGE@@
9539 export TEXTDOMAIN
9540 TEXTDOMAINDIR=@@LOCALEDIR@@
9541 export TEXTDOMAINDIR
9542 @end smallexample
9543
9544 @item
9545 Prepare the strings for translation, as described in @ref{Preparing Strings}.
9546
9547 @item
9548 Simplify translatable strings so that they don't contain command substitution
9549 (@code{"`...`"} or @code{"$(...)"}), variable access with defaulting (like
9550 @code{$@{@var{variable}-@var{default}@}}), access to positional arguments
9551 (like @code{$0}, @code{$1}, ...) or highly volatile shell variables (like
9552 @code{$?}). This can always be done through simple local code restructuring.
9553 For example,
9554
9555 @smallexample
9556 echo "Usage: $0 [OPTION] FILE..."
9557 @end smallexample
9558
9559 becomes
9560
9561 @smallexample
9562 program_name=$0
9563 echo "Usage: $program_name [OPTION] FILE..."
9564 @end smallexample
9565
9566 Similarly,
9567
9568 @smallexample
9569 echo "Remaining files: `ls | wc -l`"
9570 @end smallexample
9571
9572 becomes
9573
9574 @smallexample
9575 filecount="`ls | wc -l`"
9576 echo "Remaining files: $filecount"
9577 @end smallexample
9578
9579 @item
9580 For each translatable string, change the output command @samp{echo} or
9581 @samp{$echo} to @samp{gettext} (if the string contains no references to
9582 shell variables) or to @samp{eval_gettext} (if it refers to shell variables),
9583 followed by a no-argument @samp{echo} command (to account for the terminating
9584 newline). Similarly, for cases with plural handling, replace a conditional
9585 @samp{echo} command with an invocation of @samp{ngettext} or
9586 @samp{eval_ngettext}, followed by a no-argument @samp{echo} command.
9587
9588 When doing this, you also need to add an extra backslash before the dollar
9589 sign in references to shell variables, so that the @samp{eval_gettext}
9590 function receives the translatable string before the variable values are
9591 substituted into it. For example,
9592
9593 @smallexample
9594 echo "Remaining files: $filecount"
9595 @end smallexample
9596
9597 becomes
9598
9599 @smallexample
9600 eval_gettext "Remaining files: \$filecount"; echo
9601 @end smallexample
9602
9603 If the output command is not @samp{echo}, you can make it use @samp{echo}
9604 nevertheless, through the use of backquotes. However, note that inside
9605 backquotes, backslashes must be doubled to be effective (because the
9606 backquoting eats one level of backslashes). For example, assuming that
9607 @samp{error} is a shell function that signals an error,
9608
9609 @smallexample
9610 error "file not found: $filename"
9611 @end smallexample
9612
9613 is first transformed into
9614
9615 @smallexample
9616 error "`echo \"file not found: \$filename\"`"
9617 @end smallexample
9618
9619 which then becomes
9620
9621 @smallexample
9622 error "`eval_gettext \"file not found: \\\$filename\"`"
9623 @end smallexample
9624 @end enumerate
9625
9626 @node gettext.sh, gettext Invocation, Preparing Shell Scripts, sh
9627 @subsubsection Contents of @code{gettext.sh}
9628
9629 @code{gettext.sh}, contained in the run-time package of GNU gettext, provides
9630 the following:
9631
9632 @itemize @bullet
9633 @item $echo
9634 The variable @code{echo} is set to a command that outputs its first argument
9635 and a newline, without interpreting backslashes in the argument string.
9636
9637 @item eval_gettext
9638 See @ref{eval_gettext Invocation}.
9639
9640 @item eval_ngettext
9641 See @ref{eval_ngettext Invocation}.
9642 @end itemize
9643
9644 @node gettext Invocation, ngettext Invocation, gettext.sh, sh
9645 @subsubsection Invoking the @code{gettext} program
9646
9647 @include rt-gettext.texi
9648
Note: @code{xgettext} supports only the one-argument form of the
@code{gettext} invocation, where no options are present and the
@var{textdomain} is taken implicitly from the environment.
9652
9653 @node ngettext Invocation, envsubst Invocation, gettext Invocation, sh
9654 @subsubsection Invoking the @code{ngettext} program
9655
9656 @include rt-ngettext.texi
9657
Note: @code{xgettext} supports only the three-argument form of the
@code{ngettext} invocation, where no options are present and the
@var{textdomain} is taken implicitly from the environment.
9661
9662 @node envsubst Invocation, eval_gettext Invocation, ngettext Invocation, sh
9663 @subsubsection Invoking the @code{envsubst} program
9664
9665 @include rt-envsubst.texi
9666
9667 @node eval_gettext Invocation, eval_ngettext Invocation, envsubst Invocation, sh
9668 @subsubsection Invoking the @code{eval_gettext} function
9669
9670 @cindex @code{eval_gettext} function, usage
9671 @example
9672 eval_gettext @var{msgid}
9673 @end example
9674
9675 @cindex lookup message translation
9676 This function outputs the native language translation of a textual message,
9677 performing dollar-substitution on the result.  Note that only shell variables
9678 mentioned in @var{msgid} will be dollar-substituted in the result.
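
The restriction to variables mentioned in @var{msgid} means that a translation cannot pull in the values of other variables.  The following Python sketch models that rule under simplifying assumptions: it is not the shell code in @code{gettext.sh}, it handles only the @code{$@var{name}} form of variable reference, and the names @code{eval_gettext_model} and @code{variables_in}, as well as the sample catalog, are hypothetical.

```python
import re

# A shell-style variable reference.  For brevity this sketch models only
# the $name form, not the braced form.
VAR_REF = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def variables_in(msgid):
    """Return the set of variable names referenced in msgid."""
    return set(VAR_REF.findall(msgid))


def eval_gettext_model(msgid, translations, env):
    """Hypothetical model of the eval_gettext substitution rule.

    translations maps msgids to translated strings (standing in for the
    message catalog); env maps variable names to their values.
    """
    # Untranslated messages fall back to the msgid itself.
    translated = translations.get(msgid, msgid)
    allowed = variables_in(msgid)

    def substitute(match):
        name = match.group(1)
        if name in allowed:
            # Variables named in the msgid are expanded (empty if unset).
            return env.get(name, "")
        # Variables that appear only in the translation stay literal.
        return match.group(0)

    return VAR_REF.sub(substitute, translated)


# A translation that, deliberately or by mistake, mentions $HOME.
catalog = dict([("Remaining files: $filecount",
                 "Verbleibende Dateien: $filecount ($HOME)")])
env = dict(filecount="3", HOME="/home/alice")
print(eval_gettext_model("Remaining files: $filecount", catalog, env))
# Verbleibende Dateien: 3 ($HOME)
```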
9679
9680 @node eval_ngettext Invocation,  , eval_gettext Invocation, sh
9681 @subsubsection Invoking the @code{eval_ngettext} function
9682
9683 @cindex @code{eval_ngettext} function, usage
9684 @example
9685 eval_ngettext @var{msgid} @var{msgid-plural} @var{count}
9686 @end example
9687
9688 @cindex lookup plural message translation
9689 This function outputs the native language translation of a textual message
9690 whose grammatical form depends on a number, performing dollar-substitution
9691 on the result.  Note that only shell variables mentioned in @var{msgid} or
9692 @var{msgid-plural} will be dollar-substituted in the result.
9693
9694 @node bash, Python, sh, List of Programming Languages
9695 @subsection bash - Bourne-Again Shell Script
9696 @cindex bash
9697
9698 GNU @code{bash} 2.0 or newer has a special shorthand for translating a
9699 string and substituting variable values in it: @code{$"msgid"}.  But
9700 the use of this construct is @strong{discouraged}, due to the security
9701 holes it opens and due to its portability problems.
9702
The security holes of @code{$"..."} come from the fact that after looking up
the translation of the string, @code{bash} processes the translation like any
double-quoted string, performing dollar and backquote expansion on it, as
@samp{eval} does.
9707
9708 @enumerate
9709 @item
In a locale whose encoding is one of BIG5, BIG5-HKSCS, GBK, GB18030, SHIFT_JIS,
or JOHAB, some double-byte characters have a second byte whose value is
@code{0x60}.  For example, the byte sequence @code{\xe0\x60} is a single
character in these locales.  Many versions of @code{bash} (all versions
up to bash-2.05, and newer versions on platforms without the @code{mbsrtowcs()}
function) don't know about character boundaries and see a backquote character
where there is only a particular Chinese, Japanese, or Korean character.  Thus
they can start executing part of the translation as a command list.  This
situation can occur
9718 even without the translator being aware of it: if the translator provides
9719 translations in the UTF-8 encoding, it is the @code{gettext()} function which
9720 will, during its conversion from the translator's encoding to the user's
9721 locale's encoding, produce the dangerous @code{\x60} bytes.
9722
9723 @item
A translator could, deliberately or inadvertently, use backquotes
@code{"`...`"} or dollar-parentheses @code{"$(...)"} in her translations.
The shell would execute the enclosed strings as command lists.
9727 @end enumerate
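
The encoding problem in the first item can be observed without @code{bash}.  This Python sketch decodes the byte sequence @code{\xe0\x60} as GBK (one of the encodings listed above, chosen here for illustration), then compares the character's UTF-8 form, as a translator would write it, with its GBK form, as @code{gettext()} would deliver it in a GBK locale.

```python
# The byte sequence from the text above, decoded in a GBK locale: it is
# one double-byte character whose second byte is 0x60, the backquote.
gbk_bytes = b"\xe0\x60"
char = gbk_bytes.decode("gbk")

# What a translator working in UTF-8 writes into the PO file.
utf8_bytes = char.encode("utf-8")

# What gettext() returns to bash in a GBK locale after charset conversion.
converted = char.encode("gbk")

print(len(char))           # 1
print(b"`" in utf8_bytes)  # False
print(b"`" in converted)   # True
```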
9728
9729 The portability problem is that @code{bash} must be built with
9730 internationalization support; this is normally not the case on systems
9731 that don't have the @code{gettext()} function in libc.
9732
9733 @node Python, Common Lisp, bash, List of Programming Languages
9734 @subsection Python
9735 @cindex Python
9736
9737 @table @asis
9738 @item RPMs
9739 python
9740
9741 @item File extension
9742 @code{py}
9743
9744 @item String syntax
9745 @code{'abc'}, @code{u'abc'}, @code{r'abc'}, @code{ur'abc'},
9746 @*@code{"abc"}, @code{u"abc"}, @code{r"abc"}, @code{ur"abc"},
9747 @*@code{'''abc'''}, @code{u'''abc'''}, @code{r'''abc'''}, @code{ur'''abc'''},
9748 @*@code{"""abc"""}, @code{u"""abc"""}, @code{r"""abc"""}, @code{ur"""abc"""}
9749
9750 @item gettext shorthand
9751 @code{_('abc')} etc.
9752
9753 @item gettext/ngettext functions
9754 @code{gettext.gettext}, @code{gettext.dgettext},
9755 @code{gettext.ngettext}, @code{gettext.dngettext},
9756 also @code{ugettext}, @code{ungettext}
9757
9758 @item textdomain
9759 @code{gettext.textdomain} function, or
9760 @code{gettext.install(@var{domain})} function
9761
9762 @item bindtextdomain
9763 @code{gettext.bindtextdomain} function, or
9764 @code{gettext.install(@var{domain},@var{localedir})} function
9765
9766 @item setlocale
9767 not used by the gettext emulation
9768
9769 @item Prerequisite
9770 @code{import gettext}
9771
9772 @item Use or emulate GNU gettext
9773 emulate
9774
9775 @item Extractor
9776 @code{xgettext}
9777
9778 @item Formatting with positions
9779 @code{'...%(ident)d...' % @{ 'ident': value @}}
9780
9781 @item Portability
9782 fully portable
9783
9784 @item po-mode marking
9785 ---
9786 @end table
9787
9788 An example is available in the @file{examples} directory: @code{hello-python}.
9789
9790 A note about format strings: Python supports format strings with unnamed
9791 arguments, such as @code{'...%d...'}, and format strings with named arguments,
9792 such as @code{'...%(ident)d...'}.  The latter are preferable for
9793 internationalized programs, for two reasons:
9794
9795 @itemize @bullet
9796 @item
9797 When a format string takes more than one argument, the translator can provide
9798 a translation that uses the arguments in a different order, if the format
9799 string uses named arguments.  For example, the translator can reformulate
9800 @smallexample
9801 "'%(volume)s' has only %(freespace)d bytes free."
9802 @end smallexample
9803 @noindent
9804 to
9805 @smallexample
9806 "Only %(freespace)d bytes free on '%(volume)s'."
9807 @end smallexample
9808 @noindent
9809 Additionally, the identifiers also provide some context to the translator.
9810
9811 @item
9812 In the context of plural forms, the format string used for the singular form
9813 does not use the numeric argument in many languages.  Even in English, one
9814 prefers to write @code{"one hour"} instead of @code{"1 hour"}.  Omitting
9815 individual arguments from format strings like this is only possible with
9816 the named argument syntax.  (With unnamed arguments, Python -- unlike C --
9817 verifies that the format string uses all supplied arguments.)
9818 @end itemize
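
Both points can be checked with plain Python string formatting.  The sketch below reuses the two format strings quoted above; the argument values, the hypothetical helper @code{format_hours}, and its English-only plural rule are illustrative assumptions (a real program would let @code{ngettext} choose the form).

```python
# The two format strings quoted above, filled from one mapping of
# named arguments (the values are illustrative).
args = dict(volume="backup", freespace=512)
original = "'%(volume)s' has only %(freespace)d bytes free."
reordered = "Only %(freespace)d bytes free on '%(volume)s'."
print(original % args)   # 'backup' has only 512 bytes free.
print(reordered % args)  # Only 512 bytes free on 'backup'.


def format_hours(n, singular, plural):
    """Choose a form with the English rule and fill in n by name.

    Real code would let ngettext choose the form; this is a simplification.
    """
    template = singular if n == 1 else plural
    # A mapping may supply arguments that the format string never uses.
    return template % dict(n=n)


print(format_hours(1, "one hour", "%(n)d hours"))  # one hour
print(format_hours(5, "one hour", "%(n)d hours"))  # 5 hours

# With unnamed arguments, a format string that skips one is rejected.
unnamed_error = None
try:
    "one hour" % (1,)
except TypeError as err:
    unnamed_error = err
print(repr(unnamed_error))
```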
9819
9820 @node Common Lisp, clisp C, Python, List of Programming Languages
9821 @subsection GNU clisp - Common Lisp
9822 @cindex Common Lisp
9823 @cindex Lisp
9824 @cindex clisp
9825
9826 @table @asis
9827 @item RPMs
9828 clisp 2.28 or newer
9829
9830 @item File extension
9831 @code{lisp}
9832
9833 @item String syntax
9834 @code{"abc"}
9835
9836 @item gettext shorthand
9837 @code{(_ "abc")}, @code{(ENGLISH "abc")}
9838
9839 @item gettext/ngettext functions
9840 @code{i18n:gettext}, @code{i18n:ngettext}
9841
9842 @item textdomain
9843 @code{i18n:textdomain}
9844
9845 @item bindtextdomain
9846 @code{i18n:textdomaindir}
9847
9848 @item setlocale
9849 automatic
9850
9851 @item Prerequisite
9852 ---
9853
9854 @item Use or emulate GNU gettext
9855 use
9856
9857 @item Extractor
9858 @code{xgettext -k_ -kENGLISH}
9859
9860 @item Formatting with positions
9861 @code{format "~1@@*~D ~0@@*~D"}
9862
9863 @item Portability
9864 On platforms without gettext, no translation.
9865
9866 @item po-mode marking
9867 ---
9868 @end table
9869
9870 An example is available in the @file{examples} directory: @code{hello-clisp}.
9871
9872 @node clisp C, Emacs Lisp, Common Lisp, List of Programming Languages
9873 @subsection GNU clisp C sources
9874 @cindex clisp C sources
9875
9876 @table @asis
9877 @item RPMs
9878 clisp
9879
9880 @item File extension
9881 @code{d}
9882
9883 @item String syntax
9884 @code{"abc"}
9885
9886 @item gettext shorthand
9887 @code{ENGLISH ? "abc" : ""}
9888 @*@code{GETTEXT("abc")}
9889 @*@code{GETTEXTL("abc")}
9890
9891 @item gettext/ngettext functions
9892 @code{clgettext}, @code{clgettextl}
9893
9894 @item textdomain
9895 ---
9896
9897 @item bindtextdomain
9898 ---
9899
9900 @item setlocale
9901 automatic
9902
9903 @item Prerequisite
9904 @code{#include "lispbibl.c"}
9905
9906 @item Use or emulate GNU gettext
9907 use
9908
9909 @item Extractor
9910 @code{clisp-xgettext}
9911
9912 @item Formatting with positions
9913 @code{fprintf "%2$d %1$d"}
9914
9915 @item Portability
9916 On platforms without gettext, no translation.
9917
9918 @item po-mode marking
9919 ---
9920 @end table
9921
9922 @node Emacs Lisp, librep, clisp C, List of Programming Languages
9923 @subsection Emacs Lisp
9924 @cindex Emacs Lisp
9925
9926 @table @asis
9927 @item RPMs
9928 emacs, xemacs
9929
9930 @item File extension
9931 @code{el}
9932
9933 @item String syntax
9934 @code{"abc"}
9935
9936 @item gettext shorthand
9937 @code{(_"abc")}
9938
9939 @item gettext/ngettext functions
9940 @code{gettext}, @code{dgettext} (xemacs only)
9941
9942 @item textdomain
9943 @code{domain} special form (xemacs only)
9944
9945 @item bindtextdomain
9946 @code{bind-text-domain} function (xemacs only)
9947
9948 @item setlocale
9949 automatic
9950
9951 @item Prerequisite
9952 ---
9953
9954 @item Use or emulate GNU gettext
9955 use
9956
9957 @item Extractor
9958 @code{xgettext}
9959
9960 @item Formatting with positions
9961 @code{format "%2$d %1$d"}
9962
9963 @item Portability
9964 Only XEmacs.  Without @code{I18N3} defined at build time, no translation.
9965
9966 @item po-mode marking
9967 ---
9968 @end table
9969
9970 @node librep, Scheme, Emacs Lisp, List of Programming Languages
9971 @subsection librep
9972 @cindex @code{librep} Lisp
9973
9974 @table @asis
9975 @item RPMs
9976 librep 0.15.3 or newer
9977
9978 @item File extension
9979 @code{jl}
9980
9981 @item String syntax
9982 @code{"abc"}
9983
9984 @item gettext shorthand
9985 @code{(_"abc")}
9986
9987 @item gettext/ngettext functions
9988 @code{gettext}
9989
9990 @item textdomain
9991 @code{textdomain} function
9992
9993 @item bindtextdomain
9994 @code{bindtextdomain} function
9995
9996 @item setlocale
9997 ---
9998
9999 @item Prerequisite
10000 @code{(require 'rep.i18n.gettext)}
10001
10002 @item Use or emulate GNU gettext
10003 use
10004
10005 @item Extractor
10006 @code{xgettext}
10007
10008 @item Formatting with positions
10009 @code{format "%2$d %1$d"}
10010
10011 @item Portability
10012 On platforms without gettext, no translation.
10013
10014 @item po-mode marking
10015 ---
10016 @end table
10017
10018 An example is available in the @file{examples} directory: @code{hello-librep}.
10019
10020 @node Scheme, Smalltalk, librep, List of Programming Languages
10021 @subsection GNU guile - Scheme
10022 @cindex Scheme
10023 @cindex guile
10024
10025 @table @asis
10026 @item RPMs
10027 guile
10028
10029 @item File extension
10030 @code{scm}
10031
10032 @item String syntax
10033 @code{"abc"}
10034
10035 @item gettext shorthand
10036 @code{(_ "abc")}
10037
10038 @item gettext/ngettext functions
10039 @code{gettext}, @code{ngettext}
10040
10041 @item textdomain
10042 @code{textdomain}
10043
10044 @item bindtextdomain
10045 @code{bindtextdomain}
10046
10047 @item setlocale
10048 @code{(catch #t (lambda () (setlocale LC_ALL "")) (lambda args #f))}
10049
10050 @item Prerequisite
10051 @code{(use-modules (ice-9 format))}
10052
10053 @item Use or emulate GNU gettext
10054 use
10055
10056 @item Extractor
10057 @code{xgettext -k_}
10058
10059 @item Formatting with positions
10060 @c @code{format "~1@@*~D ~0@@*~D~2@@*"}, requires @code{(use-modules (ice-9 format))}
10061 @c not yet supported
10062 ---
10063
10064 @item Portability
10065 On platforms without gettext, no translation.
10066
10067 @item po-mode marking
10068 ---
10069 @end table
10070
10071 An example is available in the @file{examples} directory: @code{hello-guile}.
10072
10073 @node Smalltalk, Java, Scheme, List of Programming Languages
10074 @subsection GNU Smalltalk
10075 @cindex Smalltalk
10076
10077 @table @asis
10078 @item RPMs
10079 smalltalk
10080
10081 @item File extension
10082 @code{st}
10083
10084 @item String syntax
10085 @code{'abc'}
10086
10087 @item gettext shorthand
10088 @code{NLS ? 'abc'}
10089
10090 @item gettext/ngettext functions
10091 @code{LcMessagesDomain>>#at:}, @code{LcMessagesDomain>>#at:plural:with:}
10092
10093 @item textdomain
@code{LcMessages>>#domain:localeDirectory:} (returns an @code{LcMessagesDomain}
object).@*
Example: @code{I18N Locale default messages domain: 'gettext' localeDirectory: '/usr/local/share/locale'}
10097
10098 @item bindtextdomain
10099 @code{LcMessages>>#domain:localeDirectory:}, see above.
10100
10101 @item setlocale
10102 Automatic if you use @code{I18N Locale default}.
10103
10104 @item Prerequisite
10105 @code{PackageLoader fileInPackage: 'I18N'!}
10106
10107 @item Use or emulate GNU gettext
10108 emulate
10109
10110 @item Extractor
10111 @code{xgettext}
10112
10113 @item Formatting with positions
10114 @code{'%1 %2' bindWith: 'Hello' with: 'world'}
10115
10116 @item Portability
10117 fully portable
10118
10119 @item po-mode marking
10120 ---
10121 @end table
10122
10123 An example is available in the @file{examples} directory:
10124 @code{hello-smalltalk}.
10125
10126 @node Java, C#, Smalltalk, List of Programming Languages
10127 @subsection Java
10128 @cindex Java
10129
10130 @table @asis
10131 @item RPMs
10132 java, java2
10133
10134 @item File extension
10135 @code{java}
10136
10137 @item String syntax
10138 "abc"
10139
10140 @item gettext shorthand
10141 _("abc")
10142
10143 @item gettext/ngettext functions
10144 @code{GettextResource.gettext}, @code{GettextResource.ngettext},
10145 @code{GettextResource.pgettext}, @code{GettextResource.npgettext}
10146
10147 @item textdomain
---, use @code{ResourceBundle.getBundle} instead
10149
10150 @item bindtextdomain
10151 ---, use CLASSPATH instead
10152
10153 @item setlocale
10154 automatic
10155
10156 @item Prerequisite
10157 ---
10158
10159 @item Use or emulate GNU gettext
10160 ---, uses a Java specific message catalog format
10161
10162 @item Extractor
10163 @code{xgettext -k_}
10164
10165 @item Formatting with positions
10166 @code{MessageFormat.format "@{1,number@} @{0,number@}"}
10167
10168 @item Portability
10169 fully portable
10170
10171 @item po-mode marking
10172 ---
10173 @end table
10174
10175 Before marking strings as internationalizable, uses of the string
10176 concatenation operator need to be converted to @code{MessageFormat}
10177 applications.  For example, @code{"file "+filename+" not found"} becomes
10178 @code{MessageFormat.format("file @{0@} not found", new Object[] @{ filename @})}.
Only after this is done can the strings be marked and extracted.
10180
10181 GNU gettext uses the native Java internationalization mechanism, namely
10182 @code{ResourceBundle}s.  There are two formats of @code{ResourceBundle}s:
@code{.properties} files and @code{.class} files.  The @code{.properties}
format is a text file which the translators can directly edit, like PO
files, but which doesn't support plural forms.  The @code{.class} format,
by contrast, is compiled from @code{.java} source code and can support plural
forms (provided it is accessed through an appropriate API, see below).
10188
10189 To convert a PO file to a @code{.properties} file, the @code{msgcat}
10190 program can be used with the option @code{--properties-output}.  To convert
10191 a @code{.properties} file back to a PO file, the @code{msgcat} program
10192 can be used with the option @code{--properties-input}.  All the tools
10193 that manipulate PO files can work with @code{.properties} files as well,
10194 if given the @code{--properties-input} and/or @code{--properties-output}
10195 option.
10196
10197 To convert a PO file to a ResourceBundle class, the @code{msgfmt} program
10198 can be used with the option @code{--java} or @code{--java2}.  To convert a
10199 ResourceBundle back to a PO file, the @code{msgunfmt} program can be used
10200 with the option @code{--java}.
10201
10202 Two different programmatic APIs can be used to access ResourceBundles.
10203 Note that both APIs work with all kinds of ResourceBundles, whether
10204 GNU gettext generated classes, or other @code{.class} or @code{.properties}
10205 files.
10206
10207 @enumerate
10208 @item
10209 The @code{java.util.ResourceBundle} API.
10210
10211 In particular, its @code{getString} function returns a string translation.
10212 Note that a missing translation yields a @code{MissingResourceException}.
10213
10214 This has the advantage of being the standard API.  And it does not require
10215 any additional libraries, only the @code{msgcat} generated @code{.properties}
10216 files or the @code{msgfmt} generated @code{.class} files.  But it cannot do
10217 plural handling, even if the resource was generated by @code{msgfmt} from
10218 a PO file with plural handling.
10219
10220 @item
10221 The @code{gnu.gettext.GettextResource} API.
10222
10223 Reference documentation in Javadoc 1.1 style format is in the
10224 @uref{javadoc2/index.html,javadoc2 directory}.
10225
10226 Its @code{gettext} function returns a string translation.  Note that when
10227 a translation is missing, the @var{msgid} argument is returned unchanged.
10228
This has the advantage of providing the @code{ngettext} function for plural
handling and the @code{pgettext} and @code{npgettext} functions for strings
constrained to a particular context.
10232
10233 @cindex @code{libintl} for Java
10234 To use this API, one needs the @code{libintl.jar} file which is part of
10235 the GNU gettext package and distributed under the LGPL.
10236 @end enumerate
10237
10238 Four examples, using the second API, are available in the @file{examples}
10239 directory: @code{hello-java}, @code{hello-java-awt}, @code{hello-java-swing},
10240 @code{hello-java-qtjambi}.
10241
10242 Now, to make use of the API and define a shorthand for @samp{getString},
10243 there are three idioms that you can choose from:
10244
10245 @itemize @bullet
10246 @item
10247 (This one assumes Java 1.5 or newer.)
10248 In a unique class of your project, say @samp{Util}, define a static variable
10249 holding the @code{ResourceBundle} instance and the shorthand:
10250
10251 @smallexample
10252 private static ResourceBundle myResources =
10253   ResourceBundle.getBundle("domain-name");
10254 public static String _(String s) @{
10255   return myResources.getString(s);
10256 @}
10257 @end smallexample
10258
10259 All classes containing internationalized strings then contain
10260
10261 @smallexample
10262 import static Util._;
10263 @end smallexample
10264
10265 @noindent
10266 and the shorthand is used like this:
10267
10268 @smallexample
10269 System.out.println(_("Operation completed."));
10270 @end smallexample
10271
10272 @item
10273 In a unique class of your project, say @samp{Util}, define a static variable
10274 holding the @code{ResourceBundle} instance:
10275
10276 @smallexample
10277 public static ResourceBundle myResources =
10278   ResourceBundle.getBundle("domain-name");
10279 @end smallexample
10280
10281 All classes containing internationalized strings then contain
10282
10283 @smallexample
10284 private static ResourceBundle res = Util.myResources;
10285 private static String _(String s) @{ return res.getString(s); @}
10286 @end smallexample
10287
10288 @noindent
10289 and the shorthand is used like this:
10290
10291 @smallexample
10292 System.out.println(_("Operation completed."));
10293 @end smallexample
10294
10295 @item
10296 You add a class with a very short name, say @samp{S}, containing just the
10297 definition of the resource bundle and of the shorthand:
10298
10299 @smallexample
10300 public class S @{
10301   public static ResourceBundle myResources =
10302     ResourceBundle.getBundle("domain-name");
10303   public static String _(String s) @{
10304     return myResources.getString(s);
10305   @}
10306 @}
10307 @end smallexample
10308
10309 @noindent
10310 and the shorthand is used like this:
10311
10312 @smallexample
10313 System.out.println(S._("Operation completed."));
10314 @end smallexample
10315 @end itemize
10316
Which of the three idioms you choose will depend on whether your project
requires portability to Java versions prior to Java 1.5 and, if so, whether
copying two lines of code into every class is more acceptable in your project
than a class with a single-letter name.
10321
10322 @node C#, gawk, Java, List of Programming Languages
10323 @subsection C#
10324 @cindex C#
10325
10326 @table @asis
10327 @item RPMs
10328 pnet, pnetlib 0.6.2 or newer, or mono 0.29 or newer
10329
10330 @item File extension
10331 @code{cs}
10332
10333 @item String syntax
10334 @code{"abc"}, @code{@@"abc"}
10335
10336 @item gettext shorthand
@code{_("abc")}

@item gettext/ngettext functions
@code{GettextResourceManager.GetString},
@code{GettextResourceManager.GetPluralString},
@code{GettextResourceManager.GetParticularString},
@code{GettextResourceManager.GetParticularPluralString}
10344
10345 @item textdomain
10346 @code{new GettextResourceManager(domain)}
10347
10348 @item bindtextdomain
10349 ---, compiled message catalogs are located in subdirectories of the directory
10350 containing the executable
10351
10352 @item setlocale
10353 automatic
10354
10355 @item Prerequisite
10356 ---
10357
10358 @item Use or emulate GNU gettext
10359 ---, uses a C# specific message catalog format
10360
10361 @item Extractor
10362 @code{xgettext -k_}
10363
10364 @item Formatting with positions
10365 @code{String.Format "@{1@} @{0@}"}
10366
10367 @item Portability
10368 fully portable
10369
10370 @item po-mode marking
10371 ---
10372 @end table
10373
10374 Before marking strings as internationalizable, uses of the string
10375 concatenation operator need to be converted to @code{String.Format}
10376 invocations.  For example, @code{"file "+filename+" not found"} becomes
10377 @code{String.Format("file @{0@} not found", filename)}.
Only after this is done can the strings be marked and extracted.
10379
10380 GNU gettext uses the native C#/.NET internationalization mechanism, namely
10381 the classes @code{ResourceManager} and @code{ResourceSet}.  Applications
10382 use the @code{ResourceManager} methods to retrieve the native language
10383 translation of strings.  An instance of @code{ResourceSet} is the in-memory
10384 representation of a message catalog file.  The @code{ResourceManager} loads
10385 and accesses @code{ResourceSet} instances as needed to look up the
10386 translations.
10387
10388 There are two formats of @code{ResourceSet}s that can be directly loaded by
10389 the C# runtime: @code{.resources} files and @code{.dll} files.
10390
10391 @itemize @bullet
10392 @item
The @code{.resources} format is a binary file usually generated through the
@code{resgen} or @code{monoresgen} utility; it doesn't support plural
forms.  @code{.resources} files can also be embedded in .NET @code{.exe} files.
Embedding only affects whether the file system is accessed to load the message
catalog; it doesn't affect the contents of the message catalog.
10398
10399 @item
10400 On the other hand, the @code{.dll} format is a binary file that is compiled
10401 from @code{.cs} source code and can support plural forms (provided it is
10402 accessed through the GNU gettext API, see below).
10403 @end itemize
10404
10405 Note that these .NET @code{.dll} and @code{.exe} files are not tied to a
10406 particular platform; their file format and GNU gettext for C# can be used
10407 on any platform.
10408
10409 To convert a PO file to a @code{.resources} file, the @code{msgfmt} program
10410 can be used with the option @samp{--csharp-resources}.  To convert a
10411 @code{.resources} file back to a PO file, the @code{msgunfmt} program can be
10412 used with the option @samp{--csharp-resources}.  You can also, in some cases,
10413 use the @code{resgen} program (from the @code{pnet} package) or the
10414 @code{monoresgen} program (from the @code{mono}/@code{mcs} package).  These
10415 programs can also convert a @code{.resources} file back to a PO file.  But
10416 beware: as of this writing (January 2004), the @code{monoresgen} converter is
10417 quite buggy and the @code{resgen} converter ignores the encoding of the PO
10418 files.
10419
10420 To convert a PO file to a @code{.dll} file, the @code{msgfmt} program can be
10421 used with the option @code{--csharp}.  The result will be a @code{.dll} file
10422 containing a subclass of @code{GettextResourceSet}, which itself is a subclass
10423 of @code{ResourceSet}.  To convert a @code{.dll} file containing a
10424 @code{GettextResourceSet} subclass back to a PO file, the @code{msgunfmt}
10425 program can be used with the option @code{--csharp}.
10426
10427 The advantages of the @code{.dll} format over the @code{.resources} format
10428 are:
10429
10430 @enumerate
10431 @item
Freedom to localize: users can add their own translations to an application
after it has been built and distributed.  By contrast, when the programmer uses
a @code{ResourceManager} constructor provided by the system, the set of
@code{.resources} files for an application must be specified when the
application is built and cannot be extended afterwards.
10437 @c If this were the only issue with the @code{.resources} format, one could
10438 @c use the @code{ResourceManager.CreateFileBasedResourceManager} function.
10439
10440 @item
Plural handling: a message catalog in @code{.dll} format supports the plural
handling function @code{GetPluralString}, whereas @code{.resources} files can
only contain data and only support lookups that depend on a single string.
10444
10445 @item
Context handling: a message catalog in @code{.dll} format supports the
query-with-context functions @code{GetParticularString} and
@code{GetParticularPluralString}, whereas @code{.resources} files can
only contain data and only support lookups that depend on a single string.
10450
10451 @item
10452 The @code{GettextResourceManager} that loads the message catalogs in
10453 @code{.dll} format also provides for inheritance on a per-message basis.
For example, in the Austrian (@code{de_AT}) locale, translations from the German
(@code{de}) message catalog will be used for messages not found in the
Austrian message catalog.  As a consequence, the Austrian translators need to
translate only those few messages whose Austrian translation differs from the
German one.  With @code{.resources} files, by contrast, each message catalog
must provide the translations of all messages by itself.
10461
10462 @item
10463 The @code{GettextResourceManager} that loads the message catalogs in
@code{.dll} format also provides for a fallback: the English @var{msgid} is
returned when no translation can be found.  With @code{.resources} files, by
contrast, a language-neutral @code{.resources} file must explicitly be
provided as a fallback.
10468 @end enumerate
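
The per-message inheritance and the @var{msgid} fallback described in the last two items amount to a lookup chain.  The following Python sketch models that order for the @code{de_AT} example; it is not the @code{GettextResourceManager} implementation, and the helpers @code{locale_chain} and @code{lookup}, as well as the sample catalogs, are hypothetical.

```python
def locale_chain(locale):
    """Return the locales to try, most specific first: de_AT, then de."""
    chain = [locale]
    if "_" in locale:
        chain.append(locale.split("_", 1)[0])
    return chain


def lookup(catalogs, locale, msgid):
    """Hypothetical model: try each catalog in the chain, then the msgid."""
    for name in locale_chain(locale):
        catalog = catalogs.get(name, dict())
        if msgid in catalog:
            return catalog[msgid]
    # Fallback: the English msgid itself.
    return msgid


catalogs = dict(
    de=dict([("January", "Januar"), ("Open", "Öffnen")]),
    de_AT=dict([("January", "Jänner")]),
)

print(lookup(catalogs, "de_AT", "January"))  # Jänner (Austrian catalog)
print(lookup(catalogs, "de_AT", "Open"))     # Öffnen (inherited from de)
print(lookup(catalogs, "de_AT", "Close"))    # Close (msgid fallback)
```

Only the Austrian deviation ("Jänner") has to be present in the @code{de_AT} catalog; everything else is inherited or falls back.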
10469
As for the programmatic APIs, the programmer can use either the
standard @code{ResourceManager} API or the GNU @code{GettextResourceManager}
API.  The latter is an extension of the former, because
10473 @code{GettextResourceManager} is a subclass of @code{ResourceManager}.
10474
10475 @enumerate
10476 @item
10477 The @code{System.Resources.ResourceManager} API.
10478
10479 This API works with resources in @code{.resources} format.
10480
10481 The creation of the @code{ResourceManager} is done through
10482 @smallexample
10483   new ResourceManager(domainname, Assembly.GetExecutingAssembly())
10484 @end smallexample
10485 @noindent
10486
10487 The @code{GetString} function returns a string's translation.  Note that this
10488 function returns null when a translation is missing (i.e.@: not even found in
10489 the fallback resource file).
10490
10491 @item
10492 The @code{GNU.Gettext.GettextResourceManager} API.
10493
10494 This API works with resources in @code{.dll} format.
10495
10496 Reference documentation is in the
10497 @uref{csharpdoc/index.html,csharpdoc directory}.
10498
10499 The creation of the @code{ResourceManager} is done through
10500 @smallexample
10501   new GettextResourceManager(domainname)
10502 @end smallexample
10503
10504 The @code{GetString} function returns a string's translation.  Note that when
10505 a translation is missing, the @var{msgid} argument is returned unchanged.
10506
10507 The @code{GetPluralString} function returns a string translation with plural
10508 handling, like the @code{ngettext} function in C.
10509
10510 The @code{GetParticularString} function returns a string's translation,
10511 specific to a particular context, like the @code{pgettext} function in C.
10512 Note that when a translation is missing, the @var{msgid} argument is returned
10513 unchanged.
10514
10515 The @code{GetParticularPluralString} function returns a string translation,
10516 specific to a particular context, with plural handling, like the
10517 @code{npgettext} function in C.
10518
10519 @cindex @code{libintl} for C#
10520 To use this API, one needs the @code{GNU.Gettext.dll} file which is part of
10521 the GNU gettext package and distributed under the LGPL.
10522 @end enumerate
10523
10524 You can also mix both approaches: use the
10525 @code{GNU.Gettext.GettextResourceManager} constructor, but otherwise use
10526 only the @code{ResourceManager} type and only the @code{GetString} method.
10527 This is appropriate when you want to profit from the tools for PO files,
but don't want to change existing source code that uses
10529 @code{ResourceManager} and don't (yet) need the @code{GetPluralString} method.
10530
10531 Two examples, using the second API, are available in the @file{examples}
10532 directory: @code{hello-csharp}, @code{hello-csharp-forms}.
10533
10534 Now, to make use of the API and define a shorthand for @samp{GetString},
10535 there are two idioms that you can choose from:
10536
10537 @itemize @bullet
10538 @item
In a single class of your project, say @samp{Util}, define a static variable
10540 holding the @code{ResourceManager} instance:
10541
10542 @smallexample
10543 public static GettextResourceManager MyResourceManager =
10544   new GettextResourceManager("domain-name");
10545 @end smallexample
10546
10547 All classes containing internationalized strings then contain
10548
10549 @smallexample
10550 private static GettextResourceManager Res = Util.MyResourceManager;
10551 private static String _(String s) @{ return Res.GetString(s); @}
10552 @end smallexample
10553
10554 @noindent
10555 and the shorthand is used like this:
10556
10557 @smallexample
10558 Console.WriteLine(_("Operation completed."));
10559 @end smallexample
10560
10561 @item
Add a class with a very short name, say @samp{S}, containing just the
10563 definition of the resource manager and of the shorthand:
10564
10565 @smallexample
10566 public class S @{
10567   public static GettextResourceManager MyResourceManager =
10568     new GettextResourceManager("domain-name");
10569   public static String _(String s) @{
10570      return MyResourceManager.GetString(s);
10571   @}
10572 @}
10573 @end smallexample
10574
10575 @noindent
10576 and the shorthand is used like this:
10577
10578 @smallexample
10579 Console.WriteLine(S._("Operation completed."));
10580 @end smallexample
10581 @end itemize
10582
Which of the two idioms you choose will depend on whether copying two lines
of code into every class is more acceptable in your project than a class
with a single-letter name.
10586
10587 @node gawk, Pascal, C#, List of Programming Languages
10588 @subsection GNU awk
10589 @cindex awk
10590 @cindex gawk
10591
10592 @table @asis
10593 @item RPMs
10594 gawk 3.1 or newer
10595
10596 @item File extension
10597 @code{awk}
10598
10599 @item String syntax
10600 @code{"abc"}
10601
10602 @item gettext shorthand
10603 @code{_"abc"}
10604
10605 @item gettext/ngettext functions
10606 @code{dcgettext}, missing @code{dcngettext} in gawk-3.1.0
10607
10608 @item textdomain
10609 @code{TEXTDOMAIN} variable
10610
10611 @item bindtextdomain
10612 @code{bindtextdomain} function
10613
10614 @item setlocale
10615 automatic, but missing @code{setlocale (LC_MESSAGES, "")} in gawk-3.1.0
10616
10617 @item Prerequisite
10618 ---
10619
10620 @item Use or emulate GNU gettext
10621 use
10622
10623 @item Extractor
10624 @code{xgettext}
10625
10626 @item Formatting with positions
10627 @code{printf "%2$d %1$d"} (GNU awk only)
10628
10629 @item Portability
10630 On platforms without gettext, no translation.  On non-GNU awks, you must
10631 define @code{dcgettext}, @code{dcngettext} and @code{bindtextdomain}
10632 yourself.
10633
10634 @item po-mode marking
10635 ---
10636 @end table
10637
10638 An example is available in the @file{examples} directory: @code{hello-gawk}.
10639
10640 @node Pascal, wxWidgets, gawk, List of Programming Languages
10641 @subsection Pascal - Free Pascal Compiler
10642 @cindex Pascal
10643 @cindex Free Pascal
10644 @cindex Object Pascal
10645
10646 @table @asis
10647 @item RPMs
10648 fpk
10649
10650 @item File extension
10651 @code{pp}, @code{pas}
10652
10653 @item String syntax
10654 @code{'abc'}
10655
10656 @item gettext shorthand
10657 automatic
10658
10659 @item gettext/ngettext functions
10660 ---, use @code{ResourceString} data type instead
10661
10662 @item textdomain
10663 ---, use @code{TranslateResourceStrings} function instead
10664
10665 @item bindtextdomain
10666 ---, use @code{TranslateResourceStrings} function instead
10667
10668 @item setlocale
10669 automatic, but uses only LANG, not LC_MESSAGES or LC_ALL
10670
10671 @item Prerequisite
10672 @code{@{$mode delphi@}} or @code{@{$mode objfpc@}}@*@code{uses gettext;}
10673
10674 @item Use or emulate GNU gettext
10675 emulate partially
10676
10677 @item Extractor
10678 @code{ppc386} followed by @code{xgettext} or @code{rstconv}
10679
10680 @item Formatting with positions
10681 @code{uses sysutils;}@*@code{format "%1:d %0:d"}
10682
10683 @item Portability
10684 ?
10685
10686 @item po-mode marking
10687 ---
10688 @end table
10689
10690 The Pascal compiler has special support for the @code{ResourceString} data
10691 type.  It generates a @code{.rst} file.  This is then converted to a
10692 @code{.pot} file by use of @code{xgettext} or @code{rstconv}.  At runtime,
10693 a @code{.mo} file corresponding to translations of this @code{.pot} file
10694 can be loaded using the @code{TranslateResourceStrings} function in the
10695 @code{gettext} unit.
10696
10697 An example is available in the @file{examples} directory: @code{hello-pascal}.
10698
10699 @node wxWidgets, YCP, Pascal, List of Programming Languages
10700 @subsection wxWidgets library
10701 @cindex @code{wxWidgets} library
10702
10703 @table @asis
10704 @item RPMs
10705 wxGTK, gettext
10706
10707 @item File extension
10708 @code{cpp}
10709
10710 @item String syntax
10711 @code{"abc"}
10712
10713 @item gettext shorthand
10714 @code{_("abc")}
10715
10716 @item gettext/ngettext functions
10717 @code{wxLocale::GetString}, @code{wxGetTranslation}
10718
10719 @item textdomain
10720 @code{wxLocale::AddCatalog}
10721
10722 @item bindtextdomain
10723 @code{wxLocale::AddCatalogLookupPathPrefix}
10724
10725 @item setlocale
10726 @code{wxLocale::Init}, @code{wxSetLocale}
10727
10728 @item Prerequisite
10729 @code{#include <wx/intl.h>}
10730
10731 @item Use or emulate GNU gettext
10732 emulate, see @code{include/wx/intl.h} and @code{src/common/intl.cpp}
10733
10734 @item Extractor
10735 @code{xgettext}
10736
10737 @item Formatting with positions
@code{wxString::Format} supports positions if and only if the system has
the @code{wprintf()} and @code{vswprintf()} functions and they support
positions according to POSIX.
10741
10742 @item Portability
10743 fully portable
10744
10745 @item po-mode marking
10746 yes
10747 @end table
10748
10749 @node YCP, Tcl, wxWidgets, List of Programming Languages
10750 @subsection YCP - YaST2 scripting language
10751 @cindex YCP
10752 @cindex YaST2 scripting language
10753
10754 @table @asis
10755 @item RPMs
10756 libycp, libycp-devel, yast2-core, yast2-core-devel
10757
10758 @item File extension
10759 @code{ycp}
10760
10761 @item String syntax
10762 @code{"abc"}
10763
10764 @item gettext shorthand
10765 @code{_("abc")}
10766
10767 @item gettext/ngettext functions
10768 @code{_()} with 1 or 3 arguments
10769
10770 @item textdomain
10771 @code{textdomain} statement
10772
10773 @item bindtextdomain
10774 ---
10775
10776 @item setlocale
10777 ---
10778
10779 @item Prerequisite
10780 ---
10781
10782 @item Use or emulate GNU gettext
10783 use
10784
10785 @item Extractor
10786 @code{xgettext}
10787
10788 @item Formatting with positions
10789 @code{sformat "%2 %1"}
10790
10791 @item Portability
10792 fully portable
10793
10794 @item po-mode marking
10795 ---
10796 @end table
10797
10798 An example is available in the @file{examples} directory: @code{hello-ycp}.
10799
10800 @node Tcl, Perl, YCP, List of Programming Languages
10801 @subsection Tcl - Tk's scripting language
10802 @cindex Tcl
10803 @cindex Tk's scripting language
10804
10805 @table @asis
10806 @item RPMs
10807 tcl
10808
10809 @item File extension
10810 @code{tcl}
10811
10812 @item String syntax
10813 @code{"abc"}
10814
10815 @item gettext shorthand
10816 @code{[_ "abc"]}
10817
10818 @item gettext/ngettext functions
10819 @code{::msgcat::mc}
10820
10821 @item textdomain
10822 ---
10823
10824 @item bindtextdomain
10825 ---, use @code{::msgcat::mcload} instead
10826
10827 @item setlocale
10828 automatic, uses LANG, but ignores LC_MESSAGES and LC_ALL
10829
10830 @item Prerequisite
10831 @code{package require msgcat}
10832 @*@code{proc _ @{s@} @{return [::msgcat::mc $s]@}}
10833
10834 @item Use or emulate GNU gettext
10835 ---, uses a Tcl specific message catalog format
10836
10837 @item Extractor
10838 @code{xgettext -k_}
10839
10840 @item Formatting with positions
10841 @code{format "%2\$d %1\$d"}
10842
10843 @item Portability
10844 fully portable
10845
10846 @item po-mode marking
10847 ---
10848 @end table
10849
10850 Two examples are available in the @file{examples} directory:
10851 @code{hello-tcl}, @code{hello-tcl-tk}.
10852
10853 Before marking strings as internationalizable, substitutions of variables
10854 into the string need to be converted to @code{format} applications.  For
10855 example, @code{"file $filename not found"} becomes
10856 @code{[format "file %s not found" $filename]}.
Only after this is done can the strings be marked and extracted.
10858 After marking, this example becomes
10859 @code{[format [_ "file %s not found"] $filename]} or
10860 @code{[msgcat::mc "file %s not found" $filename]}.  Note that the
10861 @code{msgcat::mc} function implicitly calls @code{format} when more than one
10862 argument is given.
10863
10864 @node Perl, PHP, Tcl, List of Programming Languages
10865 @subsection Perl
10866 @cindex Perl
10867
10868 @table @asis
10869 @item RPMs
10870 perl
10871
10872 @item File extension
10873 @code{pl}, @code{PL}, @code{pm}, @code{perl}, @code{cgi}
10874
10875 @item String syntax
10876 @itemize @bullet
10877
10878 @item @code{"abc"}
10879
10880 @item @code{'abc'}
10881
10882 @item @code{qq (abc)}
10883
10884 @item @code{q (abc)}
10885
10886 @item @code{qr /abc/}
10887
10888 @item @code{qx (/bin/date)}
10889
10890 @item @code{/pattern match/}
10891
10892 @item @code{?pattern match?}
10893
10894 @item @code{s/substitution/operators/}
10895
10896 @item @code{$tied_hash@{"message"@}}
10897
10898 @item @code{$tied_hash_reference->@{"message"@}}
10899
10900 @item etc., issue the command @samp{man perlsyn} for details
10901
10902 @end itemize
10903
10904 @item gettext shorthand
10905 @code{__} (double underscore)
10906
10907 @item gettext/ngettext functions
10908 @code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext},
10909 @code{dngettext}, @code{dcngettext}
10910
10911 @item textdomain
10912 @code{textdomain} function
10913
10914 @item bindtextdomain
10915 @code{bindtextdomain} function
10916
10917 @item bind_textdomain_codeset
10918 @code{bind_textdomain_codeset} function
10919
10920 @item setlocale
10921 Use @code{setlocale (LC_ALL, "");}
10922
10923 @item Prerequisite
10924 @code{use POSIX;}
10925 @*@code{use Locale::TextDomain;} (included in the package libintl-perl
10926 which is available on the Comprehensive Perl Archive Network CPAN,
10927 http://www.cpan.org/).
10928
10929 @item Use or emulate GNU gettext
10930 platform dependent: gettext_pp emulates, gettext_xs uses GNU gettext
10931
10932 @item Extractor
10933 @code{xgettext -k__ -k\$__ -k%__ -k__x -k__n:1,2 -k__nx:1,2 -k__xn:1,2 -kN__ -k}
10934
10935 @item Formatting with positions
10936 Both kinds of format strings support formatting with positions.
10937 @*@code{printf "%2\$d %1\$d", ...} (requires Perl 5.8.0 or newer)
10938 @*@code{__expand("[new] replaces [old]", old => $oldvalue, new => $newvalue)}
10939
10940 @item Portability
10941 The @code{libintl-perl} package is platform independent but is not
10942 part of the Perl core.  The programmer is responsible for
10943 providing a dummy implementation of the required functions if the
10944 package is not installed on the target system.
10945
10946 @item po-mode marking
10947 ---
10948
10949 @item Documentation
10950 Included in @code{libintl-perl}, available on CPAN
10951 (http://www.cpan.org/).
10952
10953 @end table
10954
10955 An example is available in the @file{examples} directory: @code{hello-perl}.
10956
10957 @cindex marking Perl sources
10958
10959 The @code{xgettext} parser backend for Perl differs significantly from
10960 the parser backends for other programming languages, just as Perl
10961 itself differs significantly from other programming languages.  The
10962 Perl parser backend offers many more string marking facilities than
the other backends, but it also has some Perl-specific limitations, the
worst probably being that it is imperfect.
10965
10966 @menu
10967 * General Problems::            General Problems Parsing Perl Code
10968 * Default Keywords::            Which Keywords Will xgettext Look For?
10969 * Special Keywords::            How to Extract Hash Keys
10970 * Quote-like Expressions::      What are Strings And Quote-like Expressions?
10971 * Interpolation I::             Invalid String Interpolation
10972 * Interpolation II::            Valid String Interpolation
10973 * Parentheses::                 When To Use Parentheses
10974 * Long Lines::                  How To Grok with Long Lines
10975 * Perl Pitfalls::               Bugs, Pitfalls, and Things That Do Not Work
10976 @end menu
10977
10978 @node General Problems, Default Keywords,  , Perl
10979 @subsubsection General Problems Parsing Perl Code
10980
It is often said that only Perl can parse Perl.  This is not true.
Perl cannot be @emph{parsed} at all; it can only be @emph{executed}.
10983 Perl has various built-in ambiguities that can only be resolved at runtime.
10984
10985 The following example may illustrate one common problem:
10986
10987 @example
10988 print gettext "Hello World!";
10989 @end example
10990
10991 Although this example looks like a bullet-proof case of a function
10992 invocation, it is not:
10993
10994 @example
10995 open gettext, ">testfile" or die;
10996 print gettext "Hello world!"
10997 @end example
10998
10999 In this context, the string @code{gettext} looks more like a
11000 file handle.  But not necessarily:
11001
11002 @example
11003 use Locale::Messages qw (:libintl_h);
11004 open gettext ">testfile" or die;
11005 print gettext "Hello world!";
11006 @end example
11007
11008 Now, the file is probably syntactically incorrect, provided that the module
11009 @code{Locale::Messages} found first in the Perl include path exports a
11010 function @code{gettext}.  But what if the module
11011 @code{Locale::Messages} really looks like this?
11012
11013 @example
11014 use vars qw (*gettext);
11015
11016 1;
11017 @end example
11018
11019 In this case, the string @code{gettext} will be interpreted as a file
11020 handle again, and the above example will create a file @file{testfile}
11021 and write the string ``Hello world!'' into it.  Even advanced
11022 control flow analysis will not really help:
11023
11024 @example
11025 if (0.5 < rand) @{
11026    eval "use Sane";
11027 @} else @{
11028    eval "use InSane";
11029 @}
11030 print gettext "Hello world!";
11031 @end example
11032
11033 If the module @code{Sane} exports a function @code{gettext} that does
11034 what we expect, and the module @code{InSane} opens a file for writing
11035 and associates the @emph{handle} @code{gettext} with this output
11036 stream, we are clueless again about what will happen at runtime.  It is
11037 completely unpredictable.  The truth is that Perl has so many ways to
11038 fill its symbol table at runtime that it is impossible to interpret a
11039 particular piece of code without executing it.
11040
11041 Of course, @code{xgettext} will not execute your Perl sources while
11042 scanning for translatable strings, but rather use heuristics in order
11043 to guess what you meant.
11044
11045 Another problem is the ambiguity of the slash and the question mark.
11046 Their interpretation depends on the context:
11047
11048 @example
11049 # A pattern match.
11050 print "OK\n" if /foobar/;
11051
11052 # A division.
11053 print 1 / 2;
11054
11055 # Another pattern match.
11056 print "OK\n" if ?foobar?;
11057
11058 # Conditional.
11059 print $x ? "foo" : "bar";
11060 @end example
11061
11062 The slash may either act as the division operator or introduce a
11063 pattern match, whereas the question mark may act as the ternary
11064 conditional operator or as a pattern match, too.  Other programming
11065 languages like @code{awk} present similar problems, but the consequences of a
11066 misinterpretation are particularly nasty with Perl sources.  In @code{awk}
11067 for instance, a statement can never exceed one line and the parser
11068 can recover from a parsing error at the next newline and interpret
11069 the rest of the input stream correctly.  Perl is different, as a
11070 pattern match is terminated by the next appearance of the delimiter
11071 (the slash or the question mark) in the input stream, regardless of
11072 the semantic context.  If a slash is really a division sign but
misinterpreted as a pattern match, the rest of the input file is most
11074 probably parsed incorrectly.
11075
There are certain cases where the ambiguity cannot be resolved at all:
11077
11078 @example
11079 $x = wantarray ? 1 : 0;
11080 @end example
11081
11082 The Perl built-in function @code{wantarray} does not accept any arguments.
11083 The Perl parser therefore knows that the question mark does not start
11084 a regular expression but is the ternary conditional operator.
11085
11086 @example
11087 sub wantarrays @{@}
11088 $x = wantarrays ? 1 : 0;
11089 @end example
11090
11091 Now the situation is different.  The function @code{wantarrays} takes
11092 a variable number of arguments (like any non-prototyped Perl function).
11093 The question mark is now the delimiter of a pattern match, and hence
11094 the piece of code does not compile.
11095
11096 @example
11097 sub wantarrays() @{@}
11098 $x = wantarrays ? 1 : 0;
11099 @end example
11100
Now that the function is prototyped, Perl knows that it does not accept any
arguments, and the question mark is therefore interpreted as the
ternary operator again.  But that unfortunately outsmarts @code{xgettext}.
11104
11105 The Perl parser in @code{xgettext} cannot know whether a function has
11106 a prototype and what that prototype would look like.  It therefore makes
11107 an educated guess.  If a function is known to be a Perl built-in and
11108 this function does not accept any arguments, a following question mark
11109 or slash is treated as an operator, otherwise as the delimiter of a
11110 following regular expression.  The Perl built-ins that do not accept
11111 arguments are @code{wantarray}, @code{fork}, @code{time}, @code{times},
11112 @code{getlogin}, @code{getppid}, @code{getpwent}, @code{getgrent},
11113 @code{gethostent}, @code{getnetent}, @code{getprotoent}, @code{getservent},
11114 @code{setpwent}, @code{setgrent}, @code{endpwent}, @code{endgrent},
11115 @code{endhostent}, @code{endnetent}, @code{endprotoent}, and
11116 @code{endservent}.
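
The heuristic described above can be sketched in C as follows.  This is a
simplified illustration, not the code that @code{xgettext} actually uses.
The function name @code{slash_or_qmark_is_operator} is hypothetical, and
the sketch assumes that a tokenizer has already isolated the bareword
immediately preceding the @samp{/} or @samp{?}.

@example
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Perl built-ins that accept no arguments, as listed above.  */
static const char *const no_arg_builtins[] = @{
  "wantarray", "fork", "time", "times", "getlogin", "getppid",
  "getpwent", "getgrent", "gethostent", "getnetent", "getprotoent",
  "getservent", "setpwent", "setgrent", "endpwent", "endgrent",
  "endhostent", "endnetent", "endprotoent", "endservent"
@};

/* Hypothetical helper.  PREV_WORD is the bareword that precedes a
   '/' or '?'.  Returns true if that character should be treated as
   an operator (division or ternary conditional), false if it should
   be treated as the opening delimiter of a pattern match.  */
bool
slash_or_qmark_is_operator (const char *prev_word)
@{
  size_t n = sizeof no_arg_builtins / sizeof no_arg_builtins[0];

  for (size_t i = 0; i < n; i++)
    if (strcmp (prev_word, no_arg_builtins[i]) == 0)
      return true;

  /* Any other function might take arguments, so assume a regex.  */
  return false;
@}
@end example

With this rule, @code{wantarrays ? 1 : 0} is always scanned as the start
of a pattern match, even when @code{wantarrays} has been declared with an
empty prototype.  This is exactly the case that outsmarts @code{xgettext}.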
11117
11118 If you find that @code{xgettext} fails to extract strings from
11119 portions of your sources, you should therefore look out for slashes
11120 and/or question marks preceding these sections.  You may have come
11121 across a bug in @code{xgettext}'s Perl parser (and of course you
should report that bug).  In the meantime, you should consider
reformulating your code in a manner less challenging to @code{xgettext}.
11124
11125 In particular, if the parser is too dumb to see that a function
11126 does not accept arguments, use parentheses:
11127
11128 @example
11129 $x = somefunc() ? 1 : 0;
11130 $y = (somefunc) ? 1 : 0;
11131 @end example
11132
11133 In fact the Perl parser itself has similar problems and warns you
11134 about such constructs.
11135
11136 @node Default Keywords, Special Keywords, General Problems, Perl
11137 @subsubsection Which keywords will xgettext look for?
11138 @cindex Perl default keywords
11139
11140 Unless you instruct @code{xgettext} otherwise by invoking it with one
11141 of the options @code{--keyword} or @code{-k}, it will recognize the
11142 following keywords in your Perl sources:
11143
11144 @itemize @bullet
11145
11146 @item @code{gettext}
11147
11148 @item @code{dgettext}
11149
11150 @item @code{dcgettext}
11151
11152 @item @code{ngettext:1,2}
11153
11154 The first (singular) and the second (plural) argument will be
11155 extracted.
11156
11157 @item @code{dngettext:1,2}
11158
11159 The first (singular) and the second (plural) argument will be
11160 extracted.
11161
11162 @item @code{dcngettext:1,2}
11163
11164 The first (singular) and the second (plural) argument will be
11165 extracted.
11166
11167 @item @code{gettext_noop}
11168
11169 @item @code{%gettext}
11170
11171 The keys of lookups into the hash @code{%gettext} will be extracted.
11172
11173 @item @code{$gettext}
11174
11175 The keys of lookups into the hash reference @code{$gettext} will be extracted.
11176
11177 @end itemize
11178
11179 @node Special Keywords, Quote-like Expressions, Default Keywords, Perl
11180 @subsubsection How to Extract Hash Keys
11181 @cindex Perl special keywords for hash-lookups
11182
11183 Translating messages at runtime is normally performed by looking up the
11184 original string in the translation database and returning the
11185 translated version.  The ``natural'' Perl implementation is a hash
11186 lookup, and, of course, @code{xgettext} supports such practice.
11187
11188 @example
11189 print __"Hello world!";
11190 print $__@{"Hello world!"@};
11191 print $__->@{"Hello world!"@};
11192 print $$__@{"Hello world!"@};
11193 @end example
11194
11195 The above four lines all do the same thing.  The Perl module
11196 @code{Locale::TextDomain} exports by default a hash @code{%__} that
11197 is tied to the function @code{__()}.  It also exports a reference
11198 @code{$__} to @code{%__}.
11199
If an argument to the @code{xgettext} option @code{--keyword}
(or @code{-k}) starts with a percent sign, the rest of the keyword is
11202 interpreted as the name of a hash.  If it starts with a dollar
11203 sign, the rest of the keyword is interpreted as a reference to a
11204 hash.
11205
11206 Note that you can omit the quotation marks (single or double) around
11207 the hash key (almost) whenever Perl itself allows it:
11208
11209 @example
11210 print $gettext@{Error@};
11211 @end example
11212
The exact rule is: you can omit the surrounding quotes when the hash
11214 key is a valid C (!) identifier, i.e.@: when it starts with an
11215 underscore or an ASCII letter and is followed by an arbitrary number
11216 of underscores, ASCII letters or digits.  Other Unicode characters
11217 are @emph{not} allowed, regardless of the @code{use utf8} pragma.
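
The rule can be expressed as a small C predicate.  This is only a sketch of
the rule as stated above, not code taken from @code{xgettext*}.  The name
@code{hash_key_can_be_unquoted} is hypothetical.  The sketch tests ASCII
ranges explicitly instead of calling @code{isalpha}, because the result of
@code{isalpha} may depend on the current locale.

@example
#include <stdbool.h>

/* True if C is an ASCII letter; locale-independent on purpose.  */
static bool
is_ascii_letter (char c)
@{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
@}

/* Hypothetical helper: true if KEY is a valid C identifier, i.e. it
   starts with an underscore or an ASCII letter, followed by any number
   of underscores, ASCII letters or ASCII digits.  Bytes outside ASCII,
   such as the parts of a UTF-8 sequence, always make it false.  */
bool
hash_key_can_be_unquoted (const char *key)
@{
  if (!(key[0] == '_' || is_ascii_letter (key[0])))
    return false;               /* also rejects the empty string */

  for (const char *p = key + 1; *p != '\0'; p++)
    if (!(*p == '_' || is_ascii_letter (*p) || (*p >= '0' && *p <= '9')))
      return false;

  return true;
@}
@end example

Under this rule, @code{$gettext@{Error@}} and @code{$gettext@{_x1@}} may be
written without quotes.  Keys such as @samp{two words}, @samp{9lives}, or
any key containing non-ASCII letters must stay quoted.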
11218
11219 @node Quote-like Expressions, Interpolation I, Special Keywords, Perl
11220 @subsubsection What are Strings And Quote-like Expressions?
11221 @cindex Perl quote-like expressions
11222
11223 Perl offers a plethora of different string constructs.  Those that can
11224 be used either as arguments to functions or inside braces for hash
11225 lookups are generally supported by @code{xgettext}.
11226
11227 @itemize @bullet
11228 @item @strong{double-quoted strings}
11229 @*
11230 @example
11231 print gettext "Hello World!";
11232 @end example
11233
11234 @item @strong{single-quoted strings}
11235 @*
11236 @example
11237 print gettext 'Hello World!';
11238 @end example
11239
11240 @item @strong{the operator qq}
11241 @*
11242 @example
11243 print gettext qq |Hello World!|;
11244 print gettext qq <E-mail: <guido\@@imperia.net>>;
11245 @end example
11246
11247 The operator @code{qq} is fully supported.  You can use arbitrary
11248 delimiters, including the four bracketing delimiters (round, angle,
11249 square, curly) that nest.
11250
11251 @item @strong{the operator q}
11252 @*
11253 @example
11254 print gettext q |Hello World!|;
11255 print gettext q <E-mail: <guido@@imperia.net>>;
11256 @end example
11257
11258 The operator @code{q} is fully supported.  You can use arbitrary
11259 delimiters, including the four bracketing delimiters (round, angle,
11260 square, curly) that nest.
11261
11262 @item @strong{the operator qx}
11263 @*
11264 @example
11265 print gettext qx ;LANGUAGE=C /bin/date;
11266 print gettext qx [/usr/bin/ls | grep '^[A-Z]*'];
11267 @end example
11268
11269 The operator @code{qx} is fully supported.  You can use arbitrary
11270 delimiters, including the four bracketing delimiters (round, angle,
11271 square, curly) that nest.
11272
11273 The example is actually a useless use of @code{gettext}.  It will
11274 invoke the @code{gettext} function on the output of the command
11275 specified with the @code{qx} operator.  The feature was included
11276 in order to make the interface consistent (the parser will extract
11277 all strings and quote-like expressions).
11278
11279 @item @strong{here documents}
11280 @*
11281 @example
11282 @group
11283 print gettext <<'EOF';
11284 program not found in $PATH
11285 EOF
11286
11287 print ngettext <<EOF, <<"EOF";
11288 one file deleted
11289 EOF
11290 several files deleted
11291 EOF
11292 @end group
11293 @end example
11294
11295 Here-documents are recognized.  If the delimiter is enclosed in single
11296 quotes, the string is not interpolated.  If it is enclosed in double
11297 quotes or has no quotes at all, the string is interpolated.
11298
11299 Delimiters that start with a digit are not supported!
11300
11301 @end itemize
11302
11303 @node Interpolation I, Interpolation II, Quote-like Expressions, Perl
11304 @subsubsection Invalid Uses Of String Interpolation
11305 @cindex Perl invalid string interpolation
11306
11307 Perl is capable of interpolating variables into strings.  This offers
11308 some nice features in localized programs but can also lead to
11309 problems.
11310
11311 A common error is a construct like the following:
11312
11313 @example
11314 print gettext "This is the program $0!\n";
11315 @end example
11316
11317 Perl will interpolate at runtime the value of the variable @code{$0}
11318 into the argument of the @code{gettext()} function.  Hence, this
11319 argument is not a string constant but a variable argument (@code{$0}
11320 is a global variable that holds the name of the Perl script being
11321 executed).  The interpolation is performed by Perl before the string
11322 argument is passed to @code{gettext()} and will therefore depend on
11323 the name of the script which can only be determined at runtime.
Consequently, it is almost impossible for a translation to be found
at runtime (except if, by accident, the interpolated string happens to be
in the message catalog).
11327
11328 The @code{xgettext} program will therefore terminate parsing with a fatal
11329 error if it encounters a variable inside of an extracted string.  In
11330 general, this will happen for all kinds of string interpolations that
11331 cannot be safely performed at compile time.  If you absolutely know
11332 what you are doing, you can always circumvent this behavior:
11333
11334 @example
11335 my $know_what_i_am_doing = "This is program $0!\n";
11336 print gettext $know_what_i_am_doing;
11337 @end example
11338
11339 Since the parser only recognizes strings and quote-like expressions,
11340 but not variables or other terms, the above construct will be
11341 accepted.  You will have to find another way, however, to let your
11342 original string make it into your message catalog.
11343
If invoked with the option @code{--extract-all} (or @code{-a}),
11345 variable interpolation will be accepted.  Rationale: You will
11346 generally use this option in order to prepare your sources for
11347 internationalization.
11348
11349 Please see the manual page @samp{man perlop} for details of strings and
11350 quote-like expressions that are subject to interpolation and those
11351 that are not.  Safe interpolations (that will not lead to a fatal
11352 error) are:
11353
11354 @itemize @bullet
11355
11356 @item the escape sequences @code{\t} (tab, HT, TAB), @code{\n}
11357 (newline, NL), @code{\r} (return, CR), @code{\f} (form feed, FF),
11358 @code{\b} (backspace, BS), @code{\a} (alarm, bell, BEL), and @code{\e}
11359 (escape, ESC).
11360
11361 @item octal chars, like @code{\033}
11362 @*
11363 Note that octal escapes in the range of 400-777 are translated into a
11364 UTF-8 representation, regardless of the presence of the @code{use utf8} pragma.
11365
11366 @item hex chars, like @code{\x1b}
11367
11368 @item wide hex chars, like @code{\x@{263a@}}
11369 @*
11370 Note that this escape is translated into a UTF-8 representation,
11371 regardless of the presence of the @code{use utf8} pragma.
11372
11373 @item control chars, like @code{\c[} (CTRL-[)
11374
11375 @item named Unicode chars, like @code{\N@{LATIN CAPITAL LETTER C WITH CEDILLA@}}
11376 @*
11377 Note that this escape is translated into a UTF-8 representation,
11378 regardless of the presence of the @code{use utf8} pragma.
11379 @end itemize
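
The following C sketch shows the standard UTF-8 encoding of a single code
point, which is what ``translated into a UTF-8 representation'' means for
these escapes.  It illustrates the encoding itself and is not an excerpt
from @code{xgettext}.  The function name @code{encode_utf8} is hypothetical.
The sketch rejects surrogates and values above U+10FFFF.

@example
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store the UTF-8 encoding of code point CP in BUF
   (at least 4 bytes) and return the number of bytes written, or 0 if CP
   is a surrogate or lies above U+10FFFF.  */
size_t
encode_utf8 (uint32_t cp, unsigned char *buf)
@{
  if (cp < 0x80)
    @{
      buf[0] = (unsigned char) cp;
      return 1;
    @}
  if (cp < 0x800)
    @{
      buf[0] = (unsigned char) (0xC0 | (cp >> 6));
      buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    @}
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;                   /* surrogates are not encodable */
  if (cp < 0x10000)
    @{
      buf[0] = (unsigned char) (0xE0 | (cp >> 12));
      buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    @}
  if (cp <= 0x10FFFF)
    @{
      buf[0] = (unsigned char) (0xF0 | (cp >> 18));
      buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
      return 4;
    @}
  return 0;
@}
@end example

For example, @code{\x@{263a@}} (U+263A) becomes the three bytes
@code{E2 98 BA}.  The octal escape @code{\777} (U+01FF) becomes the two
bytes @code{C7 BF}.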
11380
11381 The following escapes are considered partially safe:
11382
11383 @itemize @bullet
11384
11385 @item @code{\l} lowercase next char
11386
11387 @item @code{\u} uppercase next char
11388
11389 @item @code{\L} lowercase till \E
11390
11391 @item @code{\U} uppercase till \E
11392
11393 @item @code{\E} end case modification
11394
11395 @item @code{\Q} quote non-word characters till \E
11396
11397 @end itemize
11398
11399 These escapes are only considered safe if the string consists of
11400 ASCII characters only.  Translation of characters outside the range
11401 defined by ASCII is locale-dependent and can actually only be performed
11402 at runtime; @code{xgettext} doesn't do these locale-dependent translations
11403 at extraction time.
11404
11405 Except for the modifier @code{\Q}, these translations, albeit valid,
11406 are generally useless and only obfuscate your sources.  If a
11407 translation can be safely performed at compile time you can just as
11408 well write what you mean.
11409
11410 @node Interpolation II, Parentheses, Interpolation I, Perl
11411 @subsubsection Valid Uses Of String Interpolation
11412 @cindex Perl valid string interpolation
11413
11414 Perl is often used to generate sources for other programming languages
or arbitrary file formats.  Web applications that output HTML code
are a prominent example of such usage.
11417
11418 You will often come across situations where you want to intersperse
11419 code written in the target (programming) language with translatable
11420 messages, like in the following HTML example:
11421
11422 @example
11423 print gettext <<EOF;
11424 <h1>My Homepage</h1>
11425 <script language="JavaScript"><!--
11426 for (i = 0; i < 100; ++i) @{
11427     alert ("Thank you so much for visiting my homepage!");
11428 @}
11429 //--></script>
11430 EOF
11431 @end example
11432
11433 The parser will extract the entire here document, and it will appear
11434 entirely in the resulting PO file, including the JavaScript snippet
embedded in the HTML code.  If you overuse constructs like
the above, you run the risk that the translators of your package
will look for a less challenging project.  You should consider an
alternative expression here:
11439
11440 @example
11441 print <<EOF;
11442 <h1>$gettext@{"My Homepage"@}</h1>
11443 <script language="JavaScript"><!--
11444 for (i = 0; i < 100; ++i) @{
11445     alert ("$gettext@{'Thank you so much for visiting my homepage!'@}");
11446 @}
11447 //--></script>
11448 EOF
11449 @end example
11450
11451 Only the translatable portions of the code will be extracted here, and
11452 the resulting PO file will be considerably more readable.
11453
11454 You can interpolate hash lookups in all strings or quote-like
11455 expressions that are subject to interpolation (see the manual page
11456 @samp{man perlop} for details).  Double interpolation is invalid, however:
11457
11458 @example
11459 # TRANSLATORS: Replace "the earth" with the name of your planet.
11460 print gettext qq@{Welcome to $gettext->@{"the earth"@}@};
11461 @end example
11462
11463 The @code{qq}-quoted string is recognized by @code{xgettext} as an argument
11464 to @code{gettext} in the first place, and is therefore checked for invalid
11465 variable interpolation.  The dollar sign of the hash dereference inside it
11466 will terminate the parser with an ``invalid interpolation'' error.
11467
11468 It is valid to interpolate hash lookups in regular expressions:
11469
11470 @example
11471 if ($var =~ /$gettext@{"the earth"@}/) @{
11472    print gettext "Match!\n";
11473 @}
11474 s/$gettext@{"U. S. A."@}/$gettext@{"U. S. A."@} $gettext@{"(dial +0)"@}/g;
11475 @end example
11476
11477 @node Parentheses, Long Lines, Interpolation II, Perl
11478 @subsubsection When To Use Parentheses
11479 @cindex Perl parentheses
11480
11481 In Perl, parentheses around function arguments are mostly optional.
11482 @code{xgettext} will always assume that all
11483 recognized keywords (except for hashes and hash references) are names
11484 of properly prototyped functions, and will (hopefully) only require
11485 parentheses where Perl itself requires them.  All constructs in the
11486 following example are therefore ok to use:
11487
11488 @example
11489 @group
11490 print gettext ("Hello World!\n");
11491 print gettext "Hello World!\n";
11492 print dgettext ($package => "Hello World!\n");
11493 print dgettext $package, "Hello World!\n";
11494
11495 # The "fat comma" => turns the left-hand side argument into a
11496 # single-quoted string!
11497 print dgettext smellovision => "Hello World!\n";
11498
11499 # The following assignment only works with prototyped functions.
11500 # Otherwise, the functions will act as "greedy" list operators and
11501 # eat up all following arguments.
11502 my $anonymous_hash = @{
11503    planet => gettext "earth",
11504    cakes => ngettext "one cake", "several cakes", $n,
11505    still => $works,
11506 @};
11507 # The same without fat comma:
11508 my $other_hash = @{
11509    'planet', gettext "earth",
11510    'cakes', ngettext "one cake", "several cakes", $n,
11511    'still', $works,
11512 @};
11513
11514 # Parentheses are only significant for the first argument.
11515 print dngettext 'package', ("one cake", "several cakes", $n), $discarded;
11516 @end group
11517 @end example
11518
11519 @node Long Lines, Perl Pitfalls, Parentheses, Perl
11520 @subsubsection How To Grok with Long Lines
11521 @cindex Perl long lines
11522
11523 The necessity of long messages can often lead to a cumbersome or
11524 unreadable coding style.  Perl offers several ways to keep such code
11525 readable, and
11526 @code{xgettext} does its best to support them.  This is where the dot
11527 operator (the string concatenation operator) may come in handy:
11528
11529 @example
11530 @group
11531 print gettext ("This is a very long"
11532                . " message that is still"
11533                . " readable, because"
11534                . " it is split into"
11535                . " multiple lines.\n");
11536 @end group
11537 @end example
11538
11539 Perl is smart enough to concatenate these constant string fragments
11540 into one long string at compile time, and so is
11541 @code{xgettext}.  You will only find one long message in the resulting
11542 POT file.
11543
11544 Note that Perl 6 (now known as Raku) uses the tilde
11545 (@samp{~}) as the string concatenation operator, and the dot
11546 (@samp{.}) for method calls.  This syntax is not supported by
11547 @code{xgettext}.
11548
11549 If embedded newline characters are not an issue, or even desired, you
11550 may also insert newline characters inside quoted strings wherever you
11551 feel like it:
11552
11553 @example
11554 @group
11555 print gettext ("<em>In HTML output
11556 embedded newlines are generally no
11557 problem, since adjacent whitespace
11558 is always rendered into a single
11559 space character.</em>");
11560 @end group
11561 @end example
11562
11563 You may also consider using here documents:
11564
11565 @example
11566 @group
11567 print gettext <<EOF;
11568 <em>In HTML output
11569 embedded newlines are generally no
11570 problem, since adjacent whitespace
11571 is always rendered into a single
11572 space character.</em>
11573 EOF
11574 @end group
11575 @end example
11576
11577 Please do not forget that the line breaks are real, i.e.@: they
11578 translate into newline characters that will consequently show up in
11579 the resulting POT file.
11580
11581 @node Perl Pitfalls,  , Long Lines, Perl
11582 @subsubsection Bugs, Pitfalls, And Things That Do Not Work
11583 @cindex Perl pitfalls
11584
11585 The foregoing sections should have shown that
11586 @code{xgettext} is quite smart in extracting translatable strings from
11587 Perl sources.  Yet, some more or less exotic constructs that could be
11588 expected to work actually do not.
11589
11590 One of the more relevant limitations can be found in the
11591 implementation of variable interpolation inside quoted strings.  Only
11592 simple hash lookups can be used there:
11593
11594 @example
11595 print <<EOF;
11596 $gettext@{"The dot operator"
11597           . " does not work"
11598           . " here!"@}
11599 Likewise, you cannot @@@{[ gettext ("interpolate function calls") ]@}
11600 inside quoted strings or quote-like expressions.
11601 EOF
11602 @end example
11603
11604 This is valid Perl code and will actually trigger invocations of the
11605 @code{gettext} function at runtime.  Yet, the Perl parser in
11606 @code{xgettext} will fail to recognize the strings.  A less obvious
11607 example can be found in the interpolation of regular expressions:
11608
11609 @example
11610 s/<!--START_OF_WEEK-->/gettext ("Sunday")/e;
11611 @end example
11612
11613 The modifier @code{e} causes the replacement part of the substitution
11614 to be evaluated as Perl code.  Consequently, at runtime the function
11615 @code{gettext()} is called, but again, the parser fails to extract the
11616 string ``Sunday''.  Use a temporary variable as a simple workaround if
11617 you really happen to need this feature:
11618
11619 @example
11620 my $sunday = gettext "Sunday";
11621 s/<!--START_OF_WEEK-->/$sunday/;
11622 @end example
11623
11624 Hash slices would also be handy but are not recognized:
11625
11626 @example
11627 my @@weekdays = @@gettext@{'Sunday', 'Monday', 'Tuesday', 'Wednesday',
11628                         'Thursday', 'Friday', 'Saturday'@};
11629 # Or even:
11630 @@weekdays = @@gettext@{qw (Sunday Monday Tuesday Wednesday Thursday
11631                          Friday Saturday) @};
11632 @end example
11633
11634 This is perfectly valid usage of the tied hash @code{%gettext} but the
11635 strings are not recognized and therefore will not be extracted.
11636
11637 Another caveat of the current version is its rudimentary support for
11638 non-ASCII characters in identifiers.  You may encounter serious
11639 problems if you use identifiers with characters outside the range of
11640 'A'-'Z', 'a'-'z', '0'-'9' and the underscore '_'.
11641
11642 Maybe some of these missing features will be implemented in future
11643 versions, but since you can always make do without them at minimal effort,
11644 these todos have very low priority.
11645
11646 Brace format strings that already contain braces as part of the normal
11647 text pose a nasty problem; the usage strings typically encountered in
11648 programs are one example:
11649
11650 @example
11651 die "usage: $0 @{OPTIONS@} FILENAME...\n";
11652 @end example
11653
11654 If you want to internationalize this code with Perl brace format strings,
11655 you will run into a problem:
11656
11657 @example
11658 die __x ("usage: @{program@} @{OPTIONS@} FILENAME...\n", program => $0);
11659 @end example
11660
11661 Whereas @samp{@{program@}} is a placeholder, @samp{@{OPTIONS@}}
11662 is not and should probably be translated.  Yet, there is no way to teach
11663 the Perl parser in @code{xgettext} to recognize the first one and leave
11664 the other one alone.
11665
11666 There are two possible workarounds for this problem.  If you are
11667 sure that your program will run under Perl 5.8.0 or newer (these
11668 Perl versions handle positional parameters in @code{printf()}), or
11669 if you are sure that the translator will not have to reorder the arguments
11670 in her translation (for example, if you have only one brace placeholder
11671 in your string, or if the string describes a syntax, as this one does), you
11672 can mark the string as @code{no-perl-brace-format} and use @code{sprintf()}:
11673
11674 @example
11675 # xgettext: no-perl-brace-format
11676 die sprintf (__ ("usage: %s @{OPTIONS@} FILENAME...\n"), $0);
11677 @end example
11678
11679 If you want to use the more portable Perl brace format, you will have to
11680 put placeholders in place of the literal braces:
11681
11682 @example
11683 die __x ("usage: @{program@} @{[@}OPTIONS@{]@} FILENAME...\n",
11684          program => $0, '[' => '@{', ']' => '@}');
11685 @end example
11686
11687 Perl brace format strings know no escaping mechanism.  No matter what such
11688 an escaping mechanism looked like, it would either give the programmer a
11689 hard time, make translating Perl brace format strings heavy-going, or
11690 result in a performance penalty at runtime, when the format directives
11691 get executed.  Most of the time you will happily get along with
11692 @code{printf()} for this special case.
11693
11694 @node PHP, Pike, Perl, List of Programming Languages
11695 @subsection PHP Hypertext Preprocessor
11696 @cindex PHP
11697
11698 @table @asis
11699 @item RPMs
11700 mod_php4, mod_php4-core, phpdoc
11701
11702 @item File extension
11703 @code{php}, @code{php3}, @code{php4}
11704
11705 @item String syntax
11706 @code{"abc"}, @code{'abc'}
11707
11708 @item gettext shorthand
11709 @code{_("abc")}
11710
11711 @item gettext/ngettext functions
11712 @code{gettext}, @code{dgettext}, @code{dcgettext}; starting with PHP 4.2.0
11713 also @code{ngettext}, @code{dngettext}, @code{dcngettext}
11714
11715 @item textdomain
11716 @code{textdomain} function
11717
11718 @item bindtextdomain
11719 @code{bindtextdomain} function
11720
11721 @item setlocale
11722 Programmer must call @code{setlocale (LC_ALL, "")}
11723
11724 @item Prerequisite
11725 ---
11726
11727 @item Use or emulate GNU gettext
11728 use
11729
11730 @item Extractor
11731 @code{xgettext}
11732
11733 @item Formatting with positions
11734 @code{printf "%2\$d %1\$d"}
11735
11736 @item Portability
11737 On platforms without gettext, the functions are not available.
11738
11739 @item po-mode marking
11740 ---
11741 @end table
11742
11743 An example is available in the @file{examples} directory: @code{hello-php}.
11744
11745 @node Pike, GCC-source, PHP, List of Programming Languages
11746 @subsection Pike
11747 @cindex Pike
11748
11749 @table @asis
11750 @item RPMs
11751 roxen
11752
11753 @item File extension
11754 @code{pike}
11755
11756 @item String syntax
11757 @code{"abc"}
11758
11759 @item gettext shorthand
11760 ---
11761
11762 @item gettext/ngettext functions
11763 @code{gettext}, @code{dgettext}, @code{dcgettext}
11764
11765 @item textdomain
11766 @code{textdomain} function
11767
11768 @item bindtextdomain
11769 @code{bindtextdomain} function
11770
11771 @item setlocale
11772 @code{setlocale} function
11773
11774 @item Prerequisite
11775 @code{import Locale.Gettext;}
11776
11777 @item Use or emulate GNU gettext
11778 use
11779
11780 @item Extractor
11781 ---
11782
11783 @item Formatting with positions
11784 ---
11785
11786 @item Portability
11787 On platforms without gettext, the functions are not available.
11788
11789 @item po-mode marking
11790 ---
11791 @end table
11792
11793 @node GCC-source, Lua, Pike, List of Programming Languages
11794 @subsection GNU Compiler Collection sources
11795 @cindex GCC-source
11796
11797 @table @asis
11798 @item RPMs
11799 gcc
11800
11801 @item File extension
11802 @code{c}, @code{h}
11803
11804 @item String syntax
11805 @code{"abc"}
11806
11807 @item gettext shorthand
11808 @code{_("abc")}
11809
11810 @item gettext/ngettext functions
11811 @code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext},
11812 @code{dngettext}, @code{dcngettext}
11813
11814 @item textdomain
11815 @code{textdomain} function
11816
11817 @item bindtextdomain
11818 @code{bindtextdomain} function
11819
11820 @item setlocale
11821 Programmer must call @code{setlocale (LC_ALL, "")}
11822
11823 @item Prerequisite
11824 @code{#include "intl.h"}
11825
11826 @item Use or emulate GNU gettext
11827 use
11828
11829 @item Extractor
11830 @code{xgettext -k_}
11831
11832 @item Formatting with positions
11833 ---
11834
11835 @item Portability
11836 Uses autoconf macros
11837
11838 @item po-mode marking
11839 yes
11840 @end table
11841
11842 @node Lua, JavaScript, GCC-source, List of Programming Languages
11843 @subsection Lua
11844
11845 @table @asis
11846 @item RPMs
11847 lua
11848
11849 @item File extension
11850 @code{lua}
11851
11852 @item String syntax
11853 @itemize @bullet
11854
11855 @item @code{"abc"}
11856
11857 @item @code{'abc'}
11858
11859 @item @code{[[abc]]}
11860
11861 @item @code{[=[abc]=]}
11862
11863 @item @code{[==[abc]==]}
11864
11865 @item ...
11866
11867 @end itemize
11868
11869 @item gettext shorthand
11870 @code{_("abc")}
11871
11872 @item gettext/ngettext functions
11873 @code{gettext.gettext}, @code{gettext.dgettext}, @code{gettext.dcgettext},
11874 @code{gettext.ngettext}, @code{gettext.dngettext}, @code{gettext.dcngettext}
11875
11876 @item textdomain
11877 @code{textdomain} function
11878
11879 @item bindtextdomain
11880 @code{bindtextdomain} function
11881
11882 @item setlocale
11883 automatic
11884
11885 @item Prerequisite
11886 @code{require 'gettext'}, or running the @code{lua} interpreter with the @code{-l gettext} option
11887
11888 @item Use or emulate GNU gettext
11889 use
11890
11891 @item Extractor
11892 @code{xgettext}
11893
11894 @item Formatting with positions
11895 ---
11896
11897 @item Portability
11898 On platforms without gettext, the functions are not available.
11899
11900 @item po-mode marking
11901 ---
11902 @end table
11903
11904 @node JavaScript
11905 @subsection JavaScript
11906
11907 @table @asis
11908 @item RPMs
11909 js
11910
11911 @item File extension
11912 @code{js}
11913
11914 @item String syntax
11915 @itemize @bullet
11916
11917 @item @code{"abc"}
11918
11919 @item @code{'abc'}
11920
11921 @end itemize
11922
11923 @item gettext shorthand
11924 @code{_("abc")}
11925
11926 @item gettext/ngettext functions
11927 @code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext},
11928 @code{dngettext}
11929
11930 @item textdomain
11931 @code{textdomain} function
11932
11933 @item bindtextdomain
11934 @code{bindtextdomain} function
11935
11936 @item setlocale
11937 automatic
11938
11939 @item Prerequisite
11940 ---
11941
11942 @item Use or emulate GNU gettext
11943 use, or emulate
11944
11945 @item Extractor
11946 @code{xgettext}
11947
11948 @item Formatting with positions
11949 ---
11950
11951 @item Portability
11952 On platforms without gettext, the functions are not available.
11953
11954 @item po-mode marking
11955 ---
11956 @end table
11957
11958 @c This is the template for new languages.
11959 @ignore
11960
11961 @ node
11962 @ subsection
11963
11964 @table @asis
11965 @item RPMs
11966
11967 @item File extension
11968
11969 @item String syntax
11970
11971 @item gettext shorthand
11972
11973 @item gettext/ngettext functions
11974
11975 @item textdomain
11976
11977 @item bindtextdomain
11978
11979 @item setlocale
11980
11981 @item Prerequisite
11982
11983 @item Use or emulate GNU gettext
11984
11985 @item Extractor
11986
11987 @item Formatting with positions
11988
11989 @item Portability
11990
11991 @item po-mode marking
11992 @end table
11993
11994 @end ignore
11995
11996 @node List of Data Formats,  , List of Programming Languages, Programming Languages
11997 @section Internationalizable Data
11998
11999 Here is a list of other data formats which can be internationalized
12000 using GNU gettext.
12001
12002 @menu
12003 * POT::                         POT - Portable Object Template
12004 * RST::                         Resource String Table
12005 * Glade::                       Glade - GNOME user interface description
12006 @end menu
12007
12008 @node POT, RST, List of Data Formats, List of Data Formats
12009 @subsection POT - Portable Object Template
12010
12011 @table @asis
12012 @item RPMs
12013 gettext
12014
12015 @item File extension
12016 @code{pot}, @code{po}
12017
12018 @item Extractor
12019 @code{xgettext}
12020 @end table
12021
12022 @node RST, Glade, POT, List of Data Formats
12023 @subsection Resource String Table
12024 @cindex RST
12025
12026 @table @asis
12027 @item RPMs
12028 fpk
12029
12030 @item File extension
12031 @code{rst}
12032
12033 @item Extractor
12034 @code{xgettext}, @code{rstconv}
12035 @end table
12036
12037 @node Glade,   , RST, List of Data Formats
12038 @subsection Glade - GNOME user interface description
12039
12040 @table @asis
12041 @item RPMs
12042 glade, libglade, glade2, libglade2, intltool
12043
12044 @item File extension
12045 @code{glade}, @code{glade2}, @code{ui}
12046
12047 @item Extractor
12048 @code{xgettext}, @code{libglade-xgettext}, @code{xml-i18n-extract}, @code{intltool-extract}
12049 @end table
12050
12051 @c This is the template for new data formats.
12052 @ignore
12053
12054 @ node
12055 @ subsection
12056
12057 @table @asis
12058 @item RPMs
12059
12060 @item File extension
12061
12062 @item Extractor
12063 @end table
12064
12065 @end ignore
12066
12067 @node Conclusion, Language Codes, Programming Languages, Top
12068 @chapter Concluding Remarks
12069
12070 We would like to conclude this GNU @code{gettext} manual by presenting
12071 a history of the Translation Project so far.  Finally, we give
12072 a few pointers for those who want to do further research or reading
12073 about Native Language Support matters.
12074
12075 @menu
12076 * History::                     History of GNU @code{gettext}
12077 * References::                  Related Readings
12078 @end menu
12079
12080 @node History, References, Conclusion, Conclusion
12081 @section History of GNU @code{gettext}
12082 @cindex history of GNU @code{gettext}
12083
12084 Internationalization concerns and algorithms have been informally
12085 and casually discussed for years in GNU, sometimes around GNU
12086 @code{libc}, maybe around the upcoming @code{Hurd}, or otherwise
12087 (nobody clearly remembers).  And even then, when the work started for
12088 real, it proceeded somewhat independently of these previous discussions.
12089
12090 This all began in July 1994, when Patrick D'Cruze had the idea and
12091 initiative of internationalizing version 3.9.2 of GNU @code{fileutils}.
12092 He then asked Jim Meyering, the maintainer, how to get those changes
12093 folded into an official release.  That first draft was full of
12094 @code{#ifdef}s and somewhat disconcerting, and Jim wanted to find
12095 nicer ways.  Patrick and Jim shared some attempts and experiments
12096 in this area.  Then, feeling that this might eventually have a deeper
12097 impact on GNU, Jim wanted to know what the standards were, and contacted
12098 Richard Stallman, who very quickly and verbally described an overall
12099 design for what was, at that time, meant to become @code{glocale}.
12100
12101 Jim implemented @code{glocale} and got a lot of exhausting feedback
12102 from Patrick and Richard, of course, but also from Mitchum DSouza
12103 (who wrote a @code{catgets}-like package), Roland McGrath, maybe David
12104 MacKenzie, Fran@,{c}ois Pinard, and Paul Eggert, all pushing and
12105 pulling in various directions, not always compatible, to the extent
12106 that after a couple of test releases, @code{glocale} was torn apart.
12107 In particular, Paul Eggert -- always keeping an eye on developments
12108 in Solaris -- advocated the use of the @code{gettext} API over
12109 @code{glocale}'s @code{catgets}-based API.
12110
12111 While Jim took some distance and time and became a father for the second
12112 time, Roland wanted to get GNU @code{libc} internationalized, and
12113 got Ulrich Drepper involved in that project.  Instead of starting
12114 from @code{glocale}, Ulrich rewrote something from scratch, but
12115 more closely conforming to the set of guidelines that emerged from the
12116 @code{glocale} effort.  Then, Ulrich got people from the previous
12117 forum to get involved in this new project, and the switch
12118 from @code{glocale} to what was first named @code{msgutils}, renamed
12119 @code{nlsutils}, and later @code{gettext}, became officially accepted
12120 by Richard in May 1995 or so.
12121
12122 Let's summarize by saying that Ulrich Drepper wrote GNU @code{gettext}
12123 in April 1995.  The first official release of the package, including
12124 PO mode, occurred in July 1995, and was numbered 0.7.  Other people
12125 contributed to the effort by providing a discussion forum around
12126 Ulrich, writing little pieces of code, or testing.  These people are credited
12127 in the @code{THANKS} file which comes with the GNU @code{gettext}
12128 distribution.
12129
12130 While this was being done, Fran@,{c}ois adapted half a dozen
12131 GNU packages to @code{glocale} first, then later to @code{gettext},
12132 putting them in pretest, thus providing along the way an effective
12133 user environment for fine-tuning the evolving tools.  He also took
12134 the responsibility of organizing and coordinating the Translation
12135 Project.  After nearly a year of informal exchanges between people from
12136 many countries, translator teams started to exist in May 1995, through
12137 the creation and support by Patrick D'Cruze of twenty unmoderated
12138 mailing lists for that many native languages, and two moderated
12139 lists: one for reaching all teams at once, the other for reaching
12140 all willing maintainers of internationalized free software packages.
12141
12142 Fran@,{c}ois also wrote PO mode in June 1995 with the collaboration
12143 of Greg McGary, as a kind of contribution to Ulrich's package.
12144 He also gave a hand with the GNU @code{gettext} Texinfo manual.
12145
12146 In 1997, Ulrich Drepper released the GNU libc 2.0, which included the
12147 @code{gettext}, @code{textdomain} and @code{bindtextdomain} functions.
12148
12149 In 2000, Ulrich Drepper added plural form handling (the @code{ngettext}
12150 function) to GNU libc.  Later, in 2001, he released GNU libc 2.2.x,
12151 which was the first free C library with full internationalization support.
12152
12153 As Ulrich was quite busy in his role as General Maintainer of GNU libc,
12154 he handed over the GNU @code{gettext} maintenance to Bruno Haible in
12155 2000.  Bruno added the plural form handling to the tools as well, added
12156 support for UTF-8 and CJK locales, and wrote a few new tools for
12157 manipulating PO files.
12158
12159 @node References,  , History, Conclusion
12160 @section Related Readings
12161 @cindex related reading
12162 @cindex bibliography
12163
12164 @strong{Note:} This documentation section is outdated and needs to be
12165 revised.
12166
12167 Eugene H. Dorr (@file{dorre@@well.com}) maintains an interesting
12168 bibliography on internationalization matters, called
12169 @cite{Internationalization Reference List}, which is available as:
12170 @example
12171 ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/i18n-books.txt
12172 @end example
12173
12174 Michael Gschwind (@file{mike@@vlsivie.tuwien.ac.at}) maintains a
12175 Frequently Asked Questions (FAQ) list, entitled @cite{Programming for
12176 Internationalisation}.  This FAQ discusses writing programs which
12177 can handle different language conventions, character sets, etc.;
12178 and is applicable to all character set encodings, with particular
12179 emphasis on @w{ISO 8859-1}.  It is regularly published in Usenet
12180 groups @file{comp.unix.questions}, @file{comp.std.internat},
12181 @file{comp.software.international}, @file{comp.lang.c},
12182 @file{comp.windows.x}, @file{comp.std.c}, @file{comp.answers}
12183 and @file{news.answers}.  The home location of this document is:
12184 @example
12185 ftp://ftp.vlsivie.tuwien.ac.at/pub/8bit/ISO-programming
12186 @end example
12187
12188 Patrick D'Cruze (@file{pdcruze@@li.org}) wrote a tutorial about NLS
12189 matters, and Jochen Hein (@file{Hein@@student.tu-clausthal.de}) took
12190 over the responsibility of maintaining it.  It may be found as:
12191 @example
12192 ftp://sunsite.unc.edu/pub/Linux/utils/nls/catalogs/Incoming/...
12193      ...locale-tutorial-0.8.txt.gz
12194 @end example
12195 @noindent
12196 This site is mirrored in:
12197 @example
12198 ftp://ftp.ibp.fr/pub/linux/sunsite/
12199 @end example
12200
12201 A French version of the same tutorial should be available at:
12202 @example
12203 ftp://ftp.ibp.fr/pub/linux/french/docs/
12204 @end example
12205 @noindent
12206 together with French translations of many Linux-related documents.
12207
12208 @node Language Codes, Country Codes, Conclusion, Top
12209 @appendix Language Codes
12210 @cindex language codes
12211 @cindex ISO 639
12212
12213 The @w{ISO 639} standard defines two-letter codes for many languages, and
12214 three-letter codes for more rarely used languages.
12215 All abbreviations for languages used in the Translation Project should
12216 come from this standard.
12217
12218 @menu
12219 * Usual Language Codes::        Two-letter ISO 639 language codes
12220 * Rare Language Codes::         Three-letter ISO 639 language codes
12221 @end menu
12222
12223 @node Usual Language Codes, Rare Language Codes, Language Codes, Language Codes
12224 @appendixsec Usual Language Codes
12225
12226 For the commonly used languages, the @w{ISO 639-1} standard defines two-letter
12227 codes.
12228
12229 @table @samp
12230 @include iso-639.texi
12231 @end table
12232
12233 @node Rare Language Codes,  , Usual Language Codes, Language Codes
12234 @appendixsec Rare Language Codes
12235
12236 For rarely used languages, the @w{ISO 639-2} standard defines three-letter
12237 codes.  Here is the current list, reduced to only living languages with at least
12238 one million speakers.
12239
12240 @table @samp
12241 @include iso-639-2.texi
12242 @end table
12243
12244 @node Country Codes, Licenses, Language Codes, Top
12245 @appendix Country Codes
12246 @cindex country codes
12247 @cindex ISO 3166
12248
12249 The @w{ISO 3166} standard defines two-letter codes for many countries
12250 and territories.  All abbreviations for countries used in the Translation
12251 Project should come from this standard.
12252
12253 @table @samp
12254 @include iso-3166.texi
12255 @end table
12256
12257 @node Licenses, Program Index, Country Codes, Top
12258 @appendix Licenses
12259 @cindex Licenses
12260
12261 The files of this package are covered by the licenses indicated in each
12262 particular file or directory.  Here is a summary:
12263
12264 @itemize @bullet
12265 @item
12266 The @code{libintl} and @code{libasprintf} libraries are covered by the
12267 GNU Lesser General Public License (LGPL).
12268 A copy of the license is included in @ref{GNU LGPL}.
12269
12270 @item
12271 The executable programs of this package and the @code{libgettextpo} library
12272 are covered by the GNU General Public License (GPL).
12273 A copy of the license is included in @ref{GNU GPL}.
12274
12275 @item
12276 This manual is free documentation.  It is dually licensed under the
12277 GNU FDL and the GNU GPL.  This means that you can redistribute this
12278 manual under either of these two licenses, at your option.
12279 @*
12280 This manual is covered by the GNU FDL.  Permission is granted to copy,
12281 distribute and/or modify this document under the terms of the
12282 GNU Free Documentation License (FDL), either version 1.2 of the
12283 License, or (at your option) any later version published by the
12284 Free Software Foundation (FSF); with no Invariant Sections, with no
12285 Front-Cover Text, and with no Back-Cover Texts.
12286 A copy of the license is included in @ref{GNU FDL}.
12287 @*
12288 This manual is covered by the GNU GPL.  You can redistribute it and/or
12289 modify it under the terms of the GNU General Public License (GPL), either
12290 version 2 of the License, or (at your option) any later version published
12291 by the Free Software Foundation (FSF).
12292 A copy of the license is included in @ref{GNU GPL}.
12293 @end itemize
12294
12295 @menu
12296 * GNU GPL::                     GNU General Public License
12297 * GNU LGPL::                    GNU Lesser General Public License
12298 * GNU FDL::                     GNU Free Documentation License
12299 @end menu
12300
12301 @page
12302 @include gpl.texi
12303 @page
12304 @include lgpl.texi
12305 @page
12306 @include fdl.texi
12307
12308 @node Program Index, Option Index, Licenses, Top
12309 @unnumbered Program Index
12310
12311 @printindex pg
12312
12313 @node Option Index, Variable Index, Program Index, Top
12314 @unnumbered Option Index
12315
12316 @printindex op
12317
12318 @node Variable Index, PO Mode Index, Option Index, Top
12319 @unnumbered Variable Index
12320
12321 @printindex vr
12322
12323 @node PO Mode Index, Autoconf Macro Index, Variable Index, Top
12324 @unnumbered PO Mode Index
12325
12326 @printindex em
12327
12328 @node Autoconf Macro Index, Index, PO Mode Index, Top
12329 @unnumbered Autoconf Macro Index
12330
12331 @printindex am
12332
12333 @node Index,  , Autoconf Macro Index, Top
12334 @unnumbered General Index
12335
12336 @printindex cp
12337
12338 @bye
12339
12340 @c Local variables:
12341 @c texinfo-column-for-description: 32
12342 @c End: