manual/message.texi

   1 @node Message Translation
   2 @chapter Message Translation
   3
The program's interface with the user should be designed to make the
user's task easier.  One way to do this is to print messages in
whatever language the user prefers.
   7
Printing messages in different languages can be implemented in different
ways.  One could add all the different languages to the source code and
choose among the variants every time a message has to be printed.  This
is certainly not a good solution, since extending the set of languages
is difficult (the code must be changed) and the code itself can become
really big with dozens of message sets.
  14
A better solution is to keep the message sets for each language in
separate files, which are loaded at runtime depending on the language
selected by the user.
  18
The GNU C Library provides two different sets of functions to support
message translation.  The problem is that neither of the interfaces is
officially defined by the POSIX standard.  The @code{catgets} family of
functions is defined in the X/Open standard, but this standard codifies
existing industry practice and is therefore not necessarily based on
sound design decisions.
  25
As mentioned above, message catalog handling makes extension easy by
using external data files which contain the message translations.
That is, these files contain, for each message used in the program, a
translation into the appropriate language.  So the tasks of the message
handling functions are to

@itemize @bullet
@item
locate the external data file with the appropriate translations,
@item
load the data and make it possible to address the messages, and
@item
map a given key to the translated message.
@end itemize

The two approaches differ mainly in the implementation of this last
step.  The design decisions made for this step influence everything
else.
  43
  44 @menu
  45 * Message catalogs a la X/Open::  The @code{catgets} family of functions.
  46 * The Uniforum approach::         The @code{gettext} family of functions.
  47 @end menu
  48
  49
  50 @node Message catalogs a la X/Open
  51 @section X/Open Message Catalog Handling
  52
The @code{catgets} functions are based on a simple scheme:

@quotation
Associate every message to be translated in the source code with a
unique identifier.  Only the identifier is used to retrieve a message
from a catalog file.
@end quotation

This means that the author of the program has to make sure that the
identifier always has the same meaning in the program code and in the
message catalogs.
  64
Before a message can be translated, the catalog file must be located.
The user of the program must be able to guide the responsible function
to find whatever catalog the user wants.  This is independent of what
the programmer had in mind.
  69
All the types, constants and functions for the @code{catgets} functions
are defined or declared in the @file{nl_types.h} header file.
  72
  73 @menu
  74 * The catgets Functions::      The @code{catgets} function family.
  75 * The message catalog files::  Format of the message catalog files.
* The gencat program::         How to generate message catalog files which
                                can be used by the functions.
  78 * Common Usage::               How to use the @code{catgets} interface.
  79 @end menu
  80
  81
  82 @node The catgets Functions
  83 @subsection The @code{catgets} function family
  84
  85 @comment nl_types.h
  86 @comment X/Open
  87 @deftypefun nl_catd catopen (const char *@var{cat_name}, int @var{flag})
The @code{catopen} function tries to locate the message data file named
@var{cat_name} and loads it when found.  The return value is of an
opaque type and can be used in calls to the other functions to refer to
this loaded catalog.

If the function fails and no catalog is loaded, the return value is
@code{(nl_catd) -1} and the global variable @var{errno} contains a code
for the error that caused the failure.  But even if the call succeeds,
this does not mean that all messages can be translated.
  97
Locating the catalog file must happen in a way which lets the user of
the program influence the decision.  It is up to the user to decide
which language to use, and sometimes it is useful to use alternate
catalog files.  All this can be specified by the user by setting some
environment variables.
 103
The first problem is to find out where all the message catalogs are
stored.  Every program could have its own place to keep all the
different files, but usually the catalog files are grouped by language
and the catalogs for all programs are kept in the same place.
 108
 109 @cindex NLSPATH environment variable
To tell the @code{catopen} function where the catalog for the program
can be found, the user can set the environment variable @code{NLSPATH}
to a value which describes her/his choice.  Since this value must be
usable for different languages and locales, it cannot be a simple
string.  Instead it is a format string (similar to @code{printf}'s).  An
example is
 116
 117 @smallexample
 118 /usr/share/locale/%L/%N:/usr/share/locale/%L/LC_MESSAGES/%N
 119 @end smallexample
 120
First, one can see that more than one directory can be specified (with
the usual syntax of separating them by colons).  The next things to
observe are the format elements, @code{%L} and @code{%N} in this case.
The @code{catopen} function knows about several of them, and each of
them is replaced by a different value.
 126
 127 @table @code
 128 @item %N
This format element is substituted with the name of the catalog file.
This is the value of the @var{cat_name} argument given to
@code{catopen}.
 132
 133 @item %L
 134 This format element is substituted with the name of the currently
 135 selected locale for translating messages.  How this is determined is
 136 explained below.
 137
 138 @item %l
 139 (This is the lowercase ell.) This format element is substituted with the
 140 language element of the locale name.  The string describing the selected
 141 locale is expected to have the form
 142 @code{@var{lang}[_@var{terr}[.@var{codeset}]]} and this format uses the
 143 first part @var{lang}.
 144
 145 @item %t
 146 This format element is substituted by the territory part @var{terr} of
 147 the name of the currently selected locale.  See the explanation of the
 148 format above.
 149
 150 @item %c
 151 This format element is substituted by the codeset part @var{codeset} of
 152 the name of the currently selected locale.  See the explanation of the
 153 format above.
 154
 155 @item %%
Since @code{%} is used as a meta character, there must be a way to
express the @code{%} character in the result itself.  Using @code{%%}
does this just like it works for @code{printf}.
 159 @end table
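To make the substitution rules concrete, here is a minimal sketch in C
that expands a single @code{NLSPATH} element (one entry between colons).
The functions @code{expand_nlspath_element} and @code{append} are
hypothetical names, not part of the GNU C Library, and this is not how
@code{catopen} is actually implemented.  The sketch assumes that the
locale name has exactly the form
@code{@var{lang}[_@var{terr}[.@var{codeset}]]}; copying unknown elements
literally is also an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Append LEN bytes of S to OUT (of size SIZE) at offset *POS and keep
   OUT NUL-terminated.  Returns -1 if there is not enough room.  */
static int
append (char *out, size_t size, size_t *pos, const char *s, size_t len)
{
  if (*pos + len >= size)
    return -1;
  memcpy (out + *pos, s, len);
  *pos += len;
  out[*pos] = '\0';
  return 0;
}

/* Hypothetical sketch, not the glibc implementation: expand one
   NLSPATH element TMPL (without colons) for catalog NAME and locale
   LOCALE, assumed to have the form lang[_terr[.codeset]].  Writes the
   result to OUT and returns 0, or -1 if OUT is too small.  */
static int
expand_nlspath_element (const char *tmpl, const char *name,
                        const char *locale, char *out, size_t size)
{
  const char *us = strchr (locale, '_');
  const char *dot = strchr (locale, '.');
  const char *end = locale + strlen (locale);
  const char *lang_end = end;  /* Language ends at the first '_' or '.'.  */
  size_t pos = 0;

  if (us != NULL && us < lang_end)
    lang_end = us;
  if (dot != NULL && dot < lang_end)
    lang_end = dot;

  if (size == 0)
    return -1;
  out[0] = '\0';

  for (const char *p = tmpl; *p != '\0'; ++p)
    {
      int r;

      if (*p != '%' || p[1] == '\0')
        r = append (out, size, &pos, p, 1);   /* Ordinary character.  */
      else
        switch (*++p)
          {
          case 'N':
            r = append (out, size, &pos, name, strlen (name));
            break;
          case 'L':
            r = append (out, size, &pos, locale, (size_t) (end - locale));
            break;
          case 'l':
            r = append (out, size, &pos, locale, (size_t) (lang_end - locale));
            break;
          case 't':  /* Territory: after '_' up to '.' or the end.  */
            r = us == NULL ? 0
                : append (out, size, &pos, us + 1,
                          (size_t) ((dot != NULL && dot > us ? dot : end)
                                    - (us + 1)));
            break;
          case 'c':  /* Codeset: after '.' up to the end.  */
            r = dot == NULL ? 0
                : append (out, size, &pos, dot + 1, (size_t) (end - (dot + 1)));
            break;
          case '%':
            r = append (out, size, &pos, "%", 1);
            break;
          default:   /* Unknown element: copy "%X" literally.  */
            r = append (out, size, &pos, p - 1, 2);
            break;
          }
      if (r != 0)
        return -1;
    }
  return 0;
}
```

For example, expanding @code{/usr/share/locale/%L/LC_MESSAGES/%N} for
the catalog @code{hello.cat} and the locale @code{de_DE.UTF-8} yields
@code{/usr/share/locale/de_DE.UTF-8/LC_MESSAGES/hello.cat}.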
 160
 161
Using @code{NLSPATH} allows arbitrary directories to be searched for
message catalogs while still allowing different languages to be used.
If the @code{NLSPATH} environment variable is not set, the default
value is
 166
 167 @smallexample
 168 @var{prefix}/share/locale/%L/%N:@var{prefix}/share/locale/%L/LC_MESSAGES/%N
 169 @end smallexample
 170
 171 @noindent
 172 where @var{prefix} is given to @code{configure} while installing the GNU
 173 C Library (this value is in many cases @code{/usr} or the empty string).
 174
The remaining problem is to decide which locale must be used.  Its name
determines the substitution of the format elements mentioned above.
First of all, a path can be specified in the message catalog name
(i.e., the name contains a slash character).  In this situation the
@code{NLSPATH} environment variable is not used.  The catalog must exist
as specified in the program, perhaps relative to the current working
directory.  This situation is not desirable and catalog names should
never be written this way.  Besides this, this behavior is not portable
to all other platforms providing the @code{catgets} interface.
 184
 185 @cindex LC_ALL environment variable
 186 @cindex LC_MESSAGES environment variable
 187 @cindex LANG environment variable
Otherwise the values of environment variables from the standard
environment are examined (@pxref{Standard Environment}).  Which
variables are examined is decided by the @var{flag} parameter of
@code{catopen}.  If the value is @code{NL_CAT_LOCALE} (which is defined
in @file{nl_types.h}) then the @code{catopen} function examines the
environment variables @code{LC_ALL}, @code{LC_MESSAGES}, and
@code{LANG} in this order.  The first variable which is set in the
current environment will be used.

If @var{flag} is zero, only the @code{LANG} environment variable is
examined.  This is a left-over from the early days of this function,
when the other environment variables were not known.
 200
In any case the environment variable should have a value of the form
@code{@var{lang}[_@var{terr}[.@var{codeset}]]} as explained above.  If
no environment variable is set, the @code{"C"} locale is used, which
prevents any translation.
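The lookup order described above can be summarized by the following
sketch.  @code{catalog_locale_name} is a hypothetical helper written only
to illustrate the rules in this section; it is not the glibc source.
Treating an empty value like an unset variable is an assumption of the
sketch.

```c
#include <nl_types.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch of the variable lookup described above.  With
   NL_CAT_LOCALE, LC_ALL, LC_MESSAGES and LANG are tried in order;
   otherwise only LANG is consulted.  Empty values count as unset
   (an assumption).  */
static const char *
catalog_locale_name (int flag)
{
  static const char *const with_locale[] = { "LC_ALL", "LC_MESSAGES", "LANG" };
  static const char *const lang_only[] = { "LANG" };
  const char *const *vars = flag == NL_CAT_LOCALE ? with_locale : lang_only;
  size_t n = flag == NL_CAT_LOCALE ? 3 : 1;

  for (size_t i = 0; i < n; ++i)
    {
      const char *value = getenv (vars[i]);
      if (value != NULL && value[0] != '\0')
        return value;
    }
  /* Nothing set: the "C" locale, which means no translation.  */
  return "C";
}
```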
 205
Once a catalog is open, messages are looked up with @code{catgets},
described below.  The string it returns is in any case valid: either
it is a translation from a message catalog or it is the same as the
@var{string} parameter.  So a piece of code to decide whether a
translation actually happened must look like this:
 210
@smallexample
@{
  char *trans = catgets (desc, set, msg, input_string);
  if (trans == input_string)
    @{
      /* No translation was found (or an error occurred);
         TRANS points to INPUT_STRING itself.  */
    @}
@}
@end smallexample
 220
 221 @noindent
When an error occurs, the global variable @var{errno} is set to

@table @var
@item EBADF
The catalog does not exist.
@item ENOMSG
The set/message tuple does not name an existing element in the
message catalog.
@end table
 231
While it can sometimes be useful to test for errors, programs normally
avoid any test.  If the translation is not available, it is no big
problem if the original, untranslated message is printed.  Either the
user understands this message as well, or s/he will look for the reason
why the messages are not translated.
 237 @end deftypefun
 238
Please note that the currently selected locale does not depend on a call
to the @code{setlocale} function.  The locale data files for this locale
need not exist, and a call to @code{setlocale} need not succeed.  The
@code{catopen} function directly reads the values of the environment
variables.
 244
 245
 246 @deftypefun {char *} catgets (nl_catd @var{catalog_desc}, int @var{set}, int @var{message}, const char *@var{string})
The function @code{catgets} is used to access the message catalog
previously opened using the @code{catopen} function.  The
@var{catalog_desc} parameter must be a value previously returned by
@code{catopen}.
 251
The next two parameters, @var{set} and @var{message}, reflect the
internal organization of the message catalog files.  This will be
explained in detail below.  For now it is interesting to know that a
catalog can consist of several sets and the messages in each set are
individually numbered.  Neither the set numbers nor the message numbers
need to be consecutive; they can be chosen arbitrarily.  But each
message (unless equal to another one) must have its own unique pair of
set and message number.
 260
Since it is not guaranteed that the message catalog for the language
selected by the user exists, the last parameter @var{string} helps to
handle this case gracefully.  If no matching string can be found,
@var{string} is returned.  This means for the programmer that
 265
 266 @itemize @bullet
 267 @item
the @var{string} parameter should contain reasonable text (this also
helps in understanding the program, since otherwise there would be no
hint about the string which is expected to be returned).
 271 @item
 272 all @var{string} arguments should be written in the same language.
 273 @end itemize
 274 @end deftypefun
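Since @code{catgets} returns the @var{string} argument itself when no
translation is found, a pointer comparison is enough to detect the
fallback.  The helper below is a hypothetical sketch, not a library
function.  Comparing the contents with @code{strcmp} instead would
wrongly report a missing translation whenever the translated text
happens to equal the original.  In the GNU C Library, @code{catgets}
also returns @var{string} for the descriptor @code{(nl_catd) -1} from a
failed @code{catopen}, so the helper is safe to call in that case too.

```c
#include <nl_types.h>
#include <stdbool.h>

/* Hypothetical helper: report whether catgets found a translation.
   catgets hands back the STRING pointer itself when it falls back, so
   comparing pointers (not contents) is the reliable test.  */
static bool
was_translated (nl_catd desc, int set, int message, const char *string)
{
  const char *result = catgets (desc, set, message, string);
  return result != string;
}
```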
 275
It is somewhat uncomfortable to write a program using the @code{catgets}
functions if no supporting functionality is available.  Since each
set/message number tuple must be unique, the programmer must keep lists
of the messages while writing the code, and the work of several people
on the same project must be coordinated.  In @ref{Common Usage} we will
see how these problems can be relaxed a bit.
 283
 284 @deftypefun int catclose (nl_catd @var{catalog_desc})
The @code{catclose} function can be used to free the resources
associated with a message catalog which was previously opened by a call
to @code{catopen}.  If the resources can be successfully freed, the
function returns @code{0}.  Otherwise it returns @code{@minus{}1} and
the global variable @var{errno} is set.  Errors can occur if the catalog
descriptor @var{catalog_desc} is not valid, in which case @var{errno} is
set to @code{EBADF}.
 292 @end deftypefun
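When a message is only needed briefly, it has to be copied before the
catalog is closed, because the string returned by @code{catgets} points
into the loaded catalog and may be invalidated by @code{catclose}.  The
following wrapper is a hypothetical sketch of that pattern;
@code{lookup_once} is not a library function.  It relies on the GNU C
Library returning the default string for the descriptor
@code{(nl_catd) -1}, and it copies the text with a @code{"%s"} format
so that a translation containing @code{%} is never interpreted as a
format.

```c
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical wrapper (not a library function): open CATNAME, look
   up one message, copy it into BUF and close the catalog again.
   The copy is made before catclose because the pointer returned by
   catgets may be invalidated by catclose.  Returns BUF.  */
static char *
lookup_once (const char *catname, int set, int message,
             const char *def, char *buf, size_t size)
{
  nl_catd desc = catopen (catname, NL_CAT_LOCALE);

  /* For a failed catopen (desc == (nl_catd) -1) or a missing message,
     glibc's catgets returns DEF.  Use "%s" so the text is never
     treated as a format string.  */
  snprintf (buf, size, "%s", catgets (desc, set, message, def));

  /* Only close a catalog that was actually opened.  */
  if (desc != (nl_catd) -1)
    catclose (desc);
  return buf;
}
```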
 293
 294
 295 @node The message catalog files
 296 @subsection  Format of the message catalog files
 297
The only reasonable way to translate all the messages of a program and
store the result in a message catalog file which can be read by the
@code{catopen} function is to give all the message texts to the
translator and let her/him translate them all.  That is, we must have a
file with entries which associate the set/message tuple with a specific
translation.  This file format is specified in the X/Open standard and
is as follows:
 305
 306 @itemize @bullet
 307 @item
 308 Lines containing only whitespace characters or empty lines are ignored.
 309
 310 @item
Lines which contain as the first non-whitespace character a @code{$}
followed by a whitespace character are comments and are also ignored.
 313
 314 @item
 315 If a line contains as the first non-whitespace characters the sequence
 316 @code{$set} followed by a whitespace character an additional argument
 317 is required to follow.  This argument can either be:
 318
 319 @itemize @minus
 320 @item
 321 a number.  In this case the value of this number determines the set
 322 to which the following messages are added.
 323
 324 @item
an identifier consisting of alphanumeric characters plus the underscore
character.  In this case the set is automatically assigned a number,
which is one more than the largest set number that has appeared so far.
 328
How to use the symbolic names is explained in @ref{Common Usage}.
 330
 331 It is an error if a symbol name appears more than once.  All following
 332 messages are placed in a set with this number.
 333 @end itemize
 334
 335 @item
 336 If a line contains as the first non-whitespace characters the sequence
 337 @code{$delset} followed by a whitespace character an additional argument
 338 is required to follow.  This argument can either be:
 339
 340 @itemize @minus
 341 @item
 342 a number.  In this case the value of this number determines the set
 343 which will be deleted.
 344
 345 @item
 346 an identifier consisting of alphanumeric characters plus the underscore
 347 character.  This symbolic identifier must match a name for a set which
 348 previously was defined.  It is an error if the name is unknown.
 349 @end itemize
 350
In both cases all messages in the specified set will be removed.  They
will not appear in the output.  But if this set is later selected again
with a @code{$set} command, messages can be added again and these
messages will appear in the output.
 355
 356 @item
If a line contains, after leading whitespace, the sequence
@code{$quote}, the quoting character used for this input file is
changed to the first non-whitespace character following the
@code{$quote}.  If no non-whitespace character is present before the
line ends, quoting is disabled.

By default no quoting character is used.  In this mode strings are
terminated by the first unescaped line break.  If a quoting character
has been set with @code{$quote}, newlines need not be escaped.  Instead
a string is terminated by the first unescaped appearance of the quote
character.
 368
 369 A common usage of this feature would be to set the quote character to
 370 @code{"}.  Then any appearance of the @code{"} in the strings must
 371 be escaped using the backslash (i.e., @code{\"} must be written).
 372
 373 @item
Any other line must start with a number or an alphanumeric identifier
(with the underscore character included).  The following characters
(starting at the first non-whitespace character) will form the string
which gets associated with the currently selected set and the message
number represented by the number or identifier, respectively.
 379
 380 If the start of the line is a number the message number is obvious.  It
 381 is an error if the same message number already appeared for this set.
 382
If the leading token was an identifier, the message number gets
automatically assigned.  The value is the current maximum message
number for this set plus one.  It is an error if the identifier was
already used for a message in this set.  It is OK to reuse the
identifier for a message in another set.  How to use the symbolic
identifiers will be explained below (@pxref{Common Usage}).  There is
one limitation with the identifier: it must not be @code{Set}.  The
reason will be explained below.
 391
 392 Please note that you must use a quoting character if a message contains
 393 leading whitespace.  Since one cannot guarantee this never happens it is
 394 probably a good idea to always use quoting.
 395
The text of the messages can contain escape sequences.  The usual bunch
of sequences known from the @w{ISO C} language is recognized
(@code{\n}, @code{\t}, @code{\v}, @code{\b}, @code{\r}, @code{\f},
@code{\\}, and @code{\@var{nnn}}, where @var{nnn} is the octal coding of
a character code).
 401 @end itemize
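The escape processing can be sketched as follows.  This is a
hypothetical illustration of the rules listed above, not the code of
@code{gencat}; in particular, treating any other escaped character (such
as @code{\"} when the quote character is @code{"}) as the character
itself is an assumption.

```c
#include <stddef.h>

/* Hypothetical sketch (not gencat's source) of the escape processing
   described above.  OUT must have room for strlen (IN) + 1 bytes,
   which always suffices because escapes only shrink the text.
   Unrecognized escapes such as \" are assumed to stand for the
   escaped character itself.  Returns the length of the result.  */
static size_t
unescape_message (const char *in, char *out)
{
  size_t n = 0;

  while (*in != '\0')
    {
      if (*in != '\\' || in[1] == '\0')
        {
          out[n++] = *in++;
          continue;
        }
      ++in;  /* Skip the backslash.  */
      switch (*in)
        {
        case 'n': out[n++] = '\n'; ++in; break;
        case 't': out[n++] = '\t'; ++in; break;
        case 'v': out[n++] = '\v'; ++in; break;
        case 'b': out[n++] = '\b'; ++in; break;
        case 'r': out[n++] = '\r'; ++in; break;
        case 'f': out[n++] = '\f'; ++in; break;
        case '0': case '1': case '2': case '3':
        case '4': case '5': case '6': case '7':
          {
            /* Up to three octal digits form one character code.  */
            int value = 0;
            for (int i = 0; i < 3 && *in >= '0' && *in <= '7'; ++i)
              value = value * 8 + (*in++ - '0');
            out[n++] = (char) value;
            break;
          }
        default:  /* \\, \" and anything else: take the character as is.  */
          out[n++] = *in++;
          break;
        }
    }
  out[n] = '\0';
  return n;
}
```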
 402
 403 @strong{Important:} The handling of identifiers instead of numbers for
 404 the set and messages is a GNU extension.  Systems strictly following the
 405 X/Open specification do not have this feature.  An example for a message
 406 catalog file is this:
 407
 408 @smallexample
 409 $ This is a leading comment.
 410 $quote "
 411
 412 $set SetOne
 413 1 Message with ID 1.
 414 two "   Message with ID \"two\", which gets the value 2 assigned"
 415
 416 $set SetTwo
 417 $ Since the last set got the number 1 assigned this set has number 2.
 418 4000 "The numbers can be arbitrary, they need not start at one."
 419 @end smallexample
 420
 421 This small example shows various aspects:
 422 @itemize @bullet
 423 @item
Lines 1 and 9 are comments since they start with @code{$} followed by
a whitespace character.
@item
The quoting character is set to @code{"}.  Otherwise the quotes in the
message definition would have to be omitted, and in this case the
message with the identifier @code{two} would lose its leading whitespace.
@item
Mixing numbered messages with messages having symbolic names is no
problem and the numbering happens automatically.
 433 @end itemize
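The automatic numbering rules can be modeled with two running maxima,
one for sets and one per set for messages.  The following is a
hypothetical sketch, not @code{gencat}'s implementation; it omits the
duplicate checks described above and expects a @code{$set} line before
the first message.  Fed with the example file, it assigns 1 to
@code{SetOne}, 2 to @code{two}, and 2 to @code{SetTwo}.

```c
/* Hypothetical model of the automatic numbering rules described
   above; not gencat's code.  Set numbers are assumed to lie in
   1..MAX_SETS-1, and a $set line must precede the first message.  */
#define MAX_SETS 16

struct numbering
{
  int max_set;            /* Largest set number seen so far.  */
  int max_msg[MAX_SETS];  /* Largest message number in each set.  */
  int current;            /* Set selected by the last $set line.  */
};

/* "$set NAME": the set gets one more than the largest set number.  */
static int
set_by_name (struct numbering *st)
{
  if (st->max_set + 1 >= MAX_SETS)
    return -1;
  st->current = ++st->max_set;
  return st->current;
}

/* "$set N": select set N explicitly.  */
static int
set_by_number (struct numbering *st, int n)
{
  if (n < 1 || n >= MAX_SETS)
    return -1;
  if (n > st->max_set)
    st->max_set = n;
  st->current = n;
  return n;
}

/* A message line starting with the number N.  */
static int
message_by_number (struct numbering *st, int n)
{
  if (n > st->max_msg[st->current])
    st->max_msg[st->current] = n;
  return n;
}

/* A message line starting with an identifier: the next free number.  */
static int
message_by_name (struct numbering *st)
{
  return ++st->max_msg[st->current];
}
```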
 434
 435
While this file format is pretty easy, it is not the best possible for
use in a running program.  The @code{catopen} function would have to
parse the file and handle syntactic errors gracefully.  This is not so
easy and the whole process is pretty slow.  Therefore the @code{catgets}
functions expect the data in another, more compact and ready-to-use file
format.  There is a special program @code{gencat} which is explained in
detail in the next section.
 443
Files in this other format are not human readable.  To be easy to use by
programs, it is a binary file format.  But the format is byte order
independent, so translation files can be shared by systems of arbitrary
architecture (as long as they use the GNU C Library).
 448
 449 Details about the binary file format are not important to know since
 450 these files are always created by the @code{gencat} program.  The
 451 sources of the GNU C Library also provide the sources for the
 452 @code{gencat} program and so the interested reader can look through
 453 these source files to learn about the file format.
 454
 455
 456 @node The gencat program
@subsection Generate Message Catalog Files
 458
 459 @cindex gencat
The @code{gencat} program is specified in the X/Open standard and the
GNU implementation follows this specification, so it can process all
correctly formed input files.  Additionally, some extensions are
implemented which help to work in a more reasonable way with the
@code{catgets} functions.
 465
 466 The @code{gencat} program can be invoked in two ways:
 467
@example
gencat [@var{Option}]@dots{} [@var{Output-File} [@var{Input-File}]@dots{}]
@end example
 471
This is the interface defined in the X/Open standard.  If no
@var{Input-File} parameter is given, input will be read from standard
input.  Multiple input files will be read as if they are concatenated.
If @var{Output-File} is also missing, the output will be written to
standard output.  To provide the interface one is used to from other
programs, a second interface is provided.
 478
@smallexample
gencat [@var{Option}]@dots{} -o @var{Output-File} [@var{Input-File}]@dots{}
@end smallexample
 482
 483 The option @samp{-o} is used to specify the output file and all file
 484 arguments are used as input files.
 485
Besides this, one can use @file{-} or @file{/dev/stdin} for
@var{Input-File} to denote the standard input.  Correspondingly, one
can use @file{-} and @file{/dev/stdout} for @var{Output-File} to denote
standard output.  Using @file{-} as a file name is allowed in X/Open
while using the device names is a GNU extension.
 491
The @code{gencat} program works by concatenating all input files and
then @strong{merging} the resulting collection of message sets with a
possibly existing output file.  This is done by removing all messages
with set/message number tuples matching any of the generated messages
from the output file and then adding all the new messages.  To
regenerate a catalog file while ignoring the old contents therefore
requires removing the output file if it exists.  If the output is
written to standard output, no merging takes place.
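The merge rule can be sketched like this.  The @code{struct entry} type
and the functions @code{is_replaced} and @code{merge_catalog} are
hypothetical and only model the behavior described above; the real
catalog representation differs.  Using the @samp{--new} option
(described below) corresponds to starting with an empty old catalog.

```c
#include <stddef.h>

/* Hypothetical in-memory catalog entry; gencat's real data
   structures look different.  */
struct entry
{
  int set;
  int msg;
  const char *text;
};

/* Nonzero if E has the same set/message pair as one of the NEW_LEN
   entries in FRESH.  */
static int
is_replaced (const struct entry *e, const struct entry *fresh, size_t new_len)
{
  for (size_t j = 0; j < new_len; ++j)
    if (e->set == fresh[j].set && e->msg == fresh[j].msg)
      return 1;
  return 0;
}

/* Merge FRESH into OLD (capacity CAP, currently *OLD_LEN entries):
   drop every old entry whose set/message pair occurs in FRESH, then
   append all of FRESH.  Returns 0, or -1 (leaving OLD untouched) if
   the result would not fit.  */
static int
merge_catalog (struct entry *old, size_t *old_len, size_t cap,
               const struct entry *fresh, size_t new_len)
{
  size_t kept = 0;

  /* First pass: count survivors so OLD is not modified on failure.  */
  for (size_t i = 0; i < *old_len; ++i)
    if (!is_replaced (&old[i], fresh, new_len))
      ++kept;
  if (kept + new_len > cap)
    return -1;

  /* Second pass: compact the survivors, then append the new entries.  */
  kept = 0;
  for (size_t i = 0; i < *old_len; ++i)
    if (!is_replaced (&old[i], fresh, new_len))
      old[kept++] = old[i];
  for (size_t j = 0; j < new_len; ++j)
    old[kept++] = fresh[j];
  *old_len = kept;
  return 0;
}
```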
 500
 501 @noindent
 502 The following table shows the options understood by the @code{gencat}
 503 program.  The X/Open standard does not specify any option for the
 504 program so all of these are GNU extensions.
 505
 506 @table @samp
 507 @item -V
 508 @itemx --version
 509 Print the version information and exit.
 510 @item -h
 511 @itemx --help
 512 Print a usage message listing all available options, then exit successfully.
 513 @item --new
Never merge the new messages from the input files with the old content
of the output file.  The old content of the output file is discarded.
 516 @item -H
@itemx --header=@var{name}
 518 This option is used to emit the symbolic names given to sets and
 519 messages in the input files for use in the program.  Details about how
 520 to use this are given in the next section.  The @var{name} parameter to
 521 this option specifies the name of the output file.  It will contain a
 522 number of C preprocessor @code{#define}s to associate a name with a
 523 number.
 524
Please note that the generated file only contains the symbols from the
input files.  If the output is merged with the previous content of the
output file, symbols from the file(s) which generated the old output
file are not in the generated header file.
 529 @end table
 530
 531
 532 @node Common Usage
 533 @subsection How to use the @code{catgets} interface
 534
The @code{catgets} functions can be used in two different ways: by
strictly following the X/Open specification without relying on the
extensions, or by using the GNU extensions.  We will take a look at the
former method first to understand the benefits of the extensions.
 539
 540 @subsubsection Not using symbolic names
 541
 542 Since the X/Open format of the message catalog files does not allow
 543 symbol names we have to work with numbers all the time.  When we start
 544 writing a program we have to replace all appearances of translatable
 545 strings with something like
 546
 547 @smallexample
 548 catgets (catdesc, set, msg, "string")
 549 @end smallexample
 550
 551 @noindent
@var{catdesc} is retrieved from a call to @code{catopen} which is
normally done once at the program start.  The @code{"string"} is the
string we want to translate.  The problems start with the set and
message numbers.
 556
In a bigger program several programmers usually work at the same time on
the program, so coordinating the number allocation is crucial.
Although no two different strings may be indexed by the same tuple of
numbers, it is highly desirable to reuse the numbers for equal strings
with equal translations (please note that there might be strings which
are equal in one language but have different translations due to
different contexts).
 564
The allocation process can be relaxed a bit by using different set
numbers for different parts of the program, which reduces the number of
developers who have to coordinate the allocation.  But lists must still
be kept to track the allocation, and errors can easily happen.  These
errors cannot be discovered by the compiler or the @code{catgets}
functions.  Only the user of the program might see wrong messages
printed.  In the worst case the wrong messages are so plausible that
they cannot be recognized as wrong.  Think about the translations for
@code{"true"} and @code{"false"} being exchanged.  This could result in
a disaster.
 574
 575
 576 @subsubsection Using symbolic names
 577
 578 The problems mentioned in the last section derive from the fact that:
 579
 580 @enumerate
 581 @item
the numbers are allocated once and, due to their possibly frequent use,
it is difficult to change a number later.
@item
the numbers do not reveal anything about the string and
therefore collisions can easily happen.
 587 @end enumerate
 588
By consistently using symbolic names and by providing a method which
maps the string content to a symbolic name (however this will happen)
one can prevent both problems above.  The cost of this is that the
programmer has to write a complete message catalog file while s/he is
writing the program itself.
 594
This is necessary since the symbolic names must be mapped to numbers
before the program sources can be compiled.  In the last section it was
described how to generate a header containing the mapping of the names.
E.g., for the example message file given in the last section we could
call the @code{gencat} program as follows (assuming @file{ex.msg}
contains the sources).
 601
 602 @smallexample
 603 gencat -H ex.h -o ex.cat ex.msg
 604 @end smallexample
 605
 606 @noindent
 607 This generates a header file with the following content:
 608
 609 @smallexample
#define SetTwoSet 0x2   /* ex.msg:8 */

#define SetOneSet 0x1   /* ex.msg:4 */
#define SetOnetwo 0x2   /* ex.msg:6 */
 614 @end smallexample
 615
As can be seen, the various symbols given in the source file are mangled
to generate unique identifiers and these identifiers get numbers
assigned.  Reading the source file and knowing about the rules allows
one to predict the content of the header file (it is deterministic),
but this is not necessary.  The @code{gencat} program can take care of
everything.  All the programmer has to do is to put the generated header
file in the dependency list of the source files of her/his project and
to add a rule to regenerate the header if any of the input files
changes.
 625
 626 One word about the symbol mangling.  Every symbol consists of two parts:
 627 the name of the message set plus the name of the message or the special
 628 string @code{Set}.  So @code{SetOnetwo} means this macro can be used to
 629 access the translation with identifier @code{two} in the message set
 630 @code{SetOne}.
 631
 632 The other names denote the names of the message sets.  The special
 633 string @code{Set} is used in the place of the message identifier.
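The naming rule can be written as a small sketch.
@code{catalog_macro_name} is a hypothetical helper, not part of
@code{gencat}.  It also shows why a message may not be called
@code{Set}: in a set @code{Main}, such a message would produce the same
name, @code{MainSet}, as the set itself.

```c
#include <stdio.h>

/* Hypothetical sketch of the header naming rule: set name followed by
   the message identifier, or by "Set" for the set itself (MSG_NAME ==
   NULL).  Returns 0, or -1 if BUF (of size SIZE) is too small.  */
static int
catalog_macro_name (char *buf, size_t size, const char *set_name,
                    const char *msg_name)
{
  int len = snprintf (buf, size, "%s%s", set_name,
                      msg_name != NULL ? msg_name : "Set");
  return (len < 0 || (size_t) len >= size) ? -1 : 0;
}
```

For the example catalog it yields @code{SetOnetwo} for the message
@code{two} in the set @code{SetOne}, and @code{SetOneSet} for the set
itself, matching the header shown above.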
 634
If the second string of the set @code{SetOne} is used in the code, the
C code should look like this:
 637
 638 @smallexample
 639 catgets (catdesc, SetOneSet, SetOnetwo,
 640          "   Message with ID \"two\", which gets the value 2 assigned")
 641 @end smallexample
 642
Writing the function call this way allows the message number and even
the set number to be changed without requiring any change in the C
source code.  (The text of the string is normally not the same; this is
only for this example.)
 647
 648
@subsubsection How this helps development
 650
To illustrate the usual way to work with the symbolic names, here is a
little example.  Assume we want to write the very complex and famous
greeting program.  We start by writing the code as usual:
 654
 655 @smallexample
 656 #include <stdio.h>
 657 int
 658 main (void)
 659 @{
 660   printf ("Hello, world!\n");
 661   return 0;
 662 @}
 663 @end smallexample
 664
 665 Now we want to internationalize the message and therefore replace the
 666 message with whatever the user wants.
 667
@smallexample
#include <nl_types.h>
#include <stdio.h>
#include "msgnrs.h"
int
main (void)
@{
  nl_catd catdesc = catopen ("hello.cat", NL_CAT_LOCALE);
  /* Print the translation with fputs: passing it to printf as the
     format string would misbehave if the text contained a % sign.  */
  fputs (catgets (catdesc, MainSet, MainHello,
                  "Hello, world!\n"), stdout);
  catclose (catdesc);
  return 0;
@}
@end smallexample

We see how the catalog object is opened and the returned descriptor used
in the other function calls.  It is not really necessary to check for
failure of any of the functions since even in these situations the
functions behave reasonably.  In the worst case @code{catgets} simply
returns the untranslated string.

What remains unspecified here are the constants @code{MainSet} and
@code{MainHello}.  These are the symbolic names describing the
message.  To get the actual definitions which match the information in
the catalog file, we have to create the message catalog source file and
process it using the @code{gencat} program.
 694
 695 @smallexample
 696 $ Messages for the famous greeting program.
 697 $quote "
 698
 699 $set Main
 700 Hello "Hallo, Welt!\n"
 701 @end smallexample
 702
 703 Now we can start building the program (assume the message catalog source
 704 file is named @file{hello.msg} and the program source file @file{hello.c}):
 705
 706 @smallexample
 707 @cartouche
 708 % gencat -H msgnrs.h -o hello.cat hello.msg
 709 % cat msgnrs.h
 710 #define MainSet 0x1     /* hello.msg:4 */
 711 #define MainHello 0x1   /* hello.msg:5 */
 712 % gcc -o hello hello.c -I.
 713 % cp hello.cat /usr/share/locale/de/LC_MESSAGES
 714 % echo $LC_ALL
 715 de
 716 % ./hello
 717 Hallo, Welt!
 718 %
 719 @end cartouche
 720 @end smallexample
 721
The call to the @code{gencat} program creates the missing header file
@file{msgnrs.h} as well as the message catalog binary.  The former is
used in the compilation of @file{hello.c} while the latter is placed in a
 725 directory in which the @code{catopen} function will try to locate it.
 726 Please check the @code{LC_ALL} environment variable and the default path
 727 for @code{catopen} presented in the description above.
 728
 729
 730 @node The Uniforum approach
 731 @section The Uniforum approach to Message Translation
 732
Sun Microsystems tried to standardize a different approach to message
translation in the Uniforum group.  A real standard was never defined,
but the interface was still used in Sun's operating systems.  Since
this approach fits better into the development process of free
software, it is also used throughout the GNU project, and the GNU
@file{gettext} package provides support for it outside the GNU C
Library.
 740
 741 The code of the @file{libintl} from GNU @file{gettext} is the same as
 742 the code in the GNU C Library.  So the documentation in the GNU
@file{gettext} manual is also valid for the functionality here.  The
following text describes the library functions in detail, but the
numerous helper programs are not described in this manual.  For these,
please read the GNU @file{gettext} manual
 747 (@pxref{Top,,GNU gettext utilities,gettext,Native Language Support Library and Tools}).
 748 We will only give a short overview.
 749
Though the @code{catgets} functions are available by default on more
systems, the @code{gettext} interface is at least as portable as the
former: the GNU @file{gettext} package can be used wherever the
functions are not available.
 754
 755
 756 @menu
 757 * Message catalogs with gettext::  The @code{gettext} family of functions.
 758 * Helper programs for gettext::    Programs to handle message catalogs
 759                                     for @code{gettext}.
 760 @end menu
 761
 762
 763 @node Message catalogs with gettext
 764 @subsection The @code{gettext} family of functions
 765
Although the paradigm underlying the @code{gettext} approach to message
translation is different from that of the @code{catgets} functions, the
basic functionality is equivalent.  There are functions of the following
categories:
 770
 771 @menu
 772 * Translation with gettext::    What has to be done to translate a message.
 773 * Locating gettext catalog::    How to determine which catalog to be used.
 774 * Using gettextized software::  The possibilities of the user to influence
 775                                  the way @code{gettext} works.
 776 @end menu
 777
 778 @node Translation with gettext
 779 @subsubsection What has to be done to translate a message?
 780
 781 The @code{gettext} functions have a very simple interface.  The most
 782 basic function just takes the string which shall be translated as the
 783 argument and it returns the translation.  This is fundamentally
 784 different from the @code{catgets} approach where an extra key is
 785 necessary and the original string is only used for the error case.
 786
If the string which has to be translated is the only argument, this of
course means the string itself is the key.  I.e., the translation will
be selected based on the original string.  The message catalogs must
therefore contain the original strings plus one translation for each
such string.  The task of the @code{gettext} function is to compare the
argument string with the available strings in the catalog and return the
appropriate translation.  Of course this process is optimized so that
it is not more expensive than an access using an atomic key like in
@code{catgets}.
 796
 797 The @code{gettext} approach has some advantages but also some
 798 disadvantages.  Please see the GNU @file{gettext} manual for a detailed
 799 discussion of the pros and cons.
 800
All the definitions and declarations for @code{gettext} can be found in
the @file{libintl.h} header file.  On systems where these functions are
not part of the C library they can be found in a separate library named
@file{libintl.a} (or the corresponding shared library).
 805
 806 @deftypefun {char *} gettext (const char *@var{msgid})
 807 The @code{gettext} function searches the currently selected message
 808 catalogs for a string which is equal to @var{msgid}.  If there is such a
 809 string available it is returned.  Otherwise the argument string
 810 @var{msgid} is returned.
 811
Please note that although the return value is @code{char *}, the
returned string must not be changed.  This broken type results from the
history of the function and does not reflect the way the function should
be used.
 816
Please note that above we wrote ``message catalogs'' (plural).  This is
a speciality of the GNU implementation of these functions and we will
say more about this in @ref{Locating gettext catalog}, where we talk
about the ways message catalogs are selected.
 821
 822 The @code{gettext} function does not modify the value of the global
 823 @var{errno} variable.  This is necessary to make it possible to write
 824 something like
 825
 826 @smallexample
 827   printf (gettext ("Operation failed: %m\n"));
 828 @end smallexample
 829
Here the @var{errno} value is used in the @code{printf} function while
processing the @code{%m} format element.  If the @code{gettext} function
changed this value (it is called before @code{printf} is called), we
would get a wrong message.
 834
So there is no easy way to detect a missing message catalog besides
comparing the argument string with the result.  But it is normally the
task of the user to react to missing catalogs.  The program cannot guess
when a message catalog is really necessary, since a user who speaks the
language the program was developed in does not need any translation.
 840 @end deftypefun
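
The following sketch illustrates both points made above.  It assumes
the GNU implementation, which returns the @var{msgid} pointer itself
when no translation is found; the helper names @code{has_translation}
and @code{translate_keeping_errno} are made up for this illustration.

@smallexample
#include <errno.h>
#include <libintl.h>
#include <stdbool.h>

/* Hypothetical helper: true if MSGID has a translation in the
   current default domain.  Comparing pointers assumes, as in the GNU
   implementation, that an untranslated MSGID is returned as is.  */
bool
has_translation (const char *msgid)
@{
  return gettext (msgid) != msgid;
@}

/* Hypothetical helper: translate MSGID and report the value of errno
   seen right after the call, so a caller can check that gettext left
   it alone (as required for formats using %m).  */
const char *
translate_keeping_errno (const char *msgid, int *errno_after)
@{
  const char *result = gettext (msgid);
  *errno_after = errno;
  return result;
@}
@end smallexample

Comparing pointers rather than string contents has the advantage that a
translation which happens to be spelled like the original is still
recognized as a translation.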
 841
The remaining two functions to access the message catalog add some
functionality to select a message catalog which is not the default one.
This is important if parts of the program are developed independently.
Every part can have its own message catalog and all of them can be used
at the same time.  The C library itself is an example: internally it
uses the @code{gettext} functions, but since it must not depend on the
currently selected default message catalog it must explicitly specify
all information which would otherwise be ambiguous.
 850
 851 @deftypefun {char *} dgettext (const char *@var{domainname}, const char *@var{msgid})
The @code{dgettext} function acts just like the @code{gettext}
 853 function.  It only takes an additional first argument @var{domainname}
 854 which guides the selection of the message catalogs which are searched
 855 for the translation.  If the @var{domainname} parameter is the null
 856 pointer the @code{dgettext} function is exactly equivalent to
 857 @code{gettext} since the default value for the domain name is used.
 858
 859 As for @code{gettext} the return value type is @code{char *} which is an
 860 anachronism.  The returned string must never be modified.
 861 @end deftypefun
 862
 863 @deftypefun {char *} dcgettext (const char *@var{domainname}, const char *@var{msgid}, int @var{category})
The @code{dcgettext} function adds another argument to those which
@code{dgettext} takes.  This argument @var{category} specifies the last
piece of information needed to locate the message catalog.  I.e., the
domain name and the locale category exactly specify which message
catalog has to be used (relative to a given directory, see below).
 869
 870 The @code{dgettext} function can be expressed in terms of
 871 @code{dcgettext} by using
 872
 873 @smallexample
 874 dcgettext (domain, string, LC_MESSAGES)
 875 @end smallexample
 876
 877 @noindent
 878 instead of
 879
 880 @smallexample
 881 dgettext (domain, string)
 882 @end smallexample
 883
 884 This also shows which values are expected for the third parameter.  One
 885 has to use the available selectors for the categories available in
 886 @file{locale.h}.  Normally the available values are @code{LC_CTYPE},
 887 @code{LC_COLLATE}, @code{LC_MESSAGES}, @code{LC_MONETARY},
 888 @code{LC_NUMERIC}, and @code{LC_TIME}.  Please note that @code{LC_ALL}
must not be used and even though the names might suggest this, there is
no relation to the environment variables of the same names.
 891
The @code{dcgettext} function is only implemented for compatibility with
other systems which have @code{gettext} functions.  There is not really
any situation where it is necessary (or useful) to use a value other
than @code{LC_MESSAGES} for the @var{category} parameter.  We are
dealing with messages here and any other choice can only be irritating.
 897
 898 As for @code{gettext} the return value type is @code{char *} which is an
 899 anachronism.  The returned string must never be modified.
 900 @end deftypefun
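
To make the equivalence stated above concrete, the small function below
(its name, @code{same_translation}, is hypothetical) looks up a message
once with @code{dgettext} and once with @code{dcgettext} and the
@code{LC_MESSAGES} category; both lookups are expected to agree.

@smallexample
#include <libintl.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: nonzero if dgettext and dcgettext with
   LC_MESSAGES give the same result for MSGID in DOMAIN.  */
int
same_translation (const char *domain, const char *msgid)
@{
  const char *via_dgettext = dgettext (domain, msgid);
  const char *via_dcgettext = dcgettext (domain, msgid, LC_MESSAGES);
  return strcmp (via_dgettext, via_dcgettext) == 0;
@}
@end smallexample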
 901
When using the three functions above in a program it is a frequent case
that the @var{msgid} argument is a constant string.  So it is worth
optimizing this case.  A moment's thought shows that as long as no new
message catalog is loaded the translation of a message will not change.
I.e., the algorithm to determine the translation is deterministic.
 908
Exactly this is what the optimizations implemented in the
@file{libintl.h} header use.  Whenever a program is compiled with the
GNU C compiler, optimization is selected, and the @var{msgid} argument
to @code{gettext}, @code{dgettext} or @code{dcgettext} is a constant
string, the actual function call is only done the first time the
message is used, and afterwards only if a new message catalog was
loaded and so the result of the translation lookup might be different.
See the @file{libintl.h} header file for details.  For the user it is
only important to know that the result is always the same, independent
of the compiler or compiler options in use.
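
The idea behind this optimization can be sketched in portable C.  The
code below is not what @file{libintl.h} actually contains; the counter
@code{catalog_generation}, the function @code{expensive_lookup}, and the
other names are hypothetical stand-ins.  Each call site keeps its own
cache, which is refreshed only when the counter shows that a new catalog
was loaded.

@smallexample
#include <stddef.h>

/* Hypothetical: incremented whenever a new message catalog is
   loaded.  */
static int catalog_generation;
/* Counts how often the real lookup runs, to show the caching.  */
static int lookup_calls;

/* Hypothetical stand-in for the real catalog search; it pretends
   that no translation exists.  */
static const char *
expensive_lookup (const char *msgid)
@{
  ++lookup_calls;
  return msgid;
@}

struct translation_cache
@{
  const char *result;
  int generation;               /* -1 means "not filled yet" */
@};

/* Return the cached translation, repeating the lookup only if a
   catalog was loaded since the cache was filled.  */
static const char *
cached_translate (struct translation_cache *cache, const char *msgid)
@{
  if (cache->generation != catalog_generation)
    @{
      cache->result = expensive_lookup (msgid);
      cache->generation = catalog_generation;
    @}
  return cache->result;
@}

/* One call site with a constant message and its own cache.  */
static const char *
greeting (void)
@{
  static struct translation_cache cache = @{ NULL, -1 @};
  return cached_translate (&cache, "Hello, world!");
@}
@end smallexample

Calling @code{greeting} repeatedly performs the lookup only once; after
@code{catalog_generation} is incremented, the next call looks the
message up again.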
 919
 920
 921 @node Locating gettext catalog
 922 @subsubsection How to determine which catalog to be used
 923
The functions to retrieve the translations for a given message have a
remarkably simple interface.  But to still give the user of the program
the opportunity to select exactly the translation s/he wants, and to
give the programmer the possibility to influence the search for catalog
files, there is a quite complicated underlying mechanism which controls
all this.  The code is complicated, but its use is easy.
 931
 932 Basically we have two different tasks to perform which can also be
 933 performed by the @code{catgets} functions:
 934
 935 @enumerate
 936 @item
 937 Locate the set of message catalogs.  There are a number of files for
 938 different languages and which all belong to the package.  Usually they
 939 are all stored in the filesystem below a certain directory.
 940
There can be arbitrarily many packages installed and they can follow
 942 different guidelines for the placement of their files.
 943
 944 @item
 945 Relative to the location specified by the package the actual translation
 946 files must be searched, based on the wishes of the user.  I.e., for each
 947 language the user selects the program should be able to locate the
 948 appropriate file.
 949 @end enumerate
 950
 951 This is the functionality required by the specifications for
 952 @code{gettext} and this is also what the @code{catgets} functions are
able to do.  But there are some unresolved problems:
 954
 955 @itemize @bullet
 956 @item
 957 The language to be used can be specified in several different ways.
 958 There is no generally accepted standard for this and the user always
expects the program to understand what s/he means.  E.g., to select the
 960 German translation one could write @code{de}, @code{german}, or
 961 @code{deutsch} and the program should always react the same.
 962
 963 @item
Sometimes the specification of the user is too detailed.  If s/he, e.g.,
specifies @code{de_DE.ISO-8859-1}, which means German, spoken in Germany,
coded using the @w{ISO 8859-1} character set, there is the possibility
that a message catalog matching this exactly is not available.  But
there could be a catalog matching @code{de}, and if the character set
used on the machine is always @w{ISO 8859-1} there is no reason why this
latter message catalog should not be used.  (We call this @dfn{message
inheritance}.)
 972
 973 @item
 974 If a catalog for a wanted language is not available it is not always the
 975 second best choice to fall back on the language of the developer and
 976 simply not translate any message.  Instead a user might be better able
 977 to read the messages in another language and so the user of the program
should be able to define a precedence order of languages.
 979 @end itemize
 980
We can divide the configuration actions into two parts: one is
performed by the programmer, the other by the user.  We will start with
the functions the programmer can use since the user configuration will
be based on this.
 985
As the functions described in the last sections already suggest,
separate sets of messages can be selected by a @dfn{domain name}.  This
is a simple string which should be unique for each program part which
uses a separate domain.  It is possible to use arbitrarily many domains
in one program at the same time.  E.g., the GNU C Library itself uses a
domain named @code{libc} while the program using the C Library could use
a domain named @code{foo}.  The important point is that at any time
exactly one domain is active as the default.  This is controlled with
the following function.
 995
 996 @deftypefun {char *} textdomain (const char *@var{domainname})
 997 The @code{textdomain} function sets the default domain, which is used in
 998 all future @code{gettext} calls, to @var{domainname}.  Please note that
 999 @code{dgettext} and @code{dcgettext} calls are not influenced if the
1000 @var{domainname} parameter of these functions is not the null pointer.
1001
1002 Before the first call to @code{textdomain} the default domain is
@code{messages}.  This is the name prescribed by the specification of
1004 the @code{gettext} API.  This name is as good as any other name.  No
1005 program should ever really use a domain with this name since this can
1006 only lead to problems.
1007
1008 The function returns the value which is from now on taken as the default
1009 domain.  If the system went out of memory the returned value is
1010 @code{NULL} and the global variable @var{errno} is set to @code{ENOMEM}.
Despite the return value type being @code{char *}, the returned string
must not be changed.  It is allocated internally by the
@code{textdomain} function.
1014
1015 If the @var{domainname} parameter is the null pointer no new default
1016 domain is set.  Instead the currently selected default domain is
1017 returned.
1018
1019 If the @var{domainname} parameter is the empty string the default domain
1020 is reset to its initial value, the domain with the name @code{messages}.
Using this possibility is questionable since the domain @code{messages}
really should never be used.
1023 @end deftypefun
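
The different ways of calling @code{textdomain} can be combined as in
the following sketch; the helper @code{default_domain_is} is
hypothetical.  It relies on the behavior described above: a null
pointer only queries the default domain, and the empty string resets it
to @code{messages}.

@smallexample
#include <libintl.h>
#include <string.h>

/* Hypothetical helper: nonzero if the current default domain is
   NAME.  Passing a null pointer to textdomain changes nothing.  */
int
default_domain_is (const char *name)
@{
  const char *current = textdomain (NULL);
  return current != NULL && strcmp (current, name) == 0;
@}
@end smallexample

For example, @code{default_domain_is ("messages")} holds at program
start, and again after a call @code{textdomain ("")}.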
1024
1025 @deftypefun {char *} bindtextdomain (const char *@var{domainname}, const char *@var{dirname})
The @code{bindtextdomain} function can be used to specify the directory
which contains the message catalogs for domain @var{domainname} for the
different languages.  To be precise, this is the directory where the
hierarchy of directories is expected.  Details are explained below.
1030
For the programmer it is important to note that the translations which
come with the program have to be placed in a directory hierarchy
starting at, say, @file{/foo/bar}.  Then the program should make a
@code{bindtextdomain} call to bind the domain for the current program to
this directory.  This makes sure the catalogs are found; a correctly
running program does not depend on the user setting an environment
variable.
1038
The @code{bindtextdomain} function can be used several times; if the
@var{domainname} argument is different, the previously bound domains
are not overwritten.
1042
If a program which uses @code{bindtextdomain} changes the current
working directory at some point with the @code{chdir} function, it is
important that the @var{dirname} strings are absolute pathnames.
Otherwise the addressed directory might change over time.
1048
1049 If the @var{dirname} parameter is the null pointer @code{bindtextdomain}
1050 returns the currently selected directory for the domain with the name
1051 @var{domainname}.
1052
The @code{bindtextdomain} function returns a pointer to a string
containing the name of the selected directory.  The string is
allocated internally in the function and must not be changed by the
user.  If the system runs out of memory during the execution of
@code{bindtextdomain}, the return value is @code{NULL} and the global
variable @var{errno} is set accordingly.
1059 @end deftypefun
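
The following sketch binds a domain and reads the binding back by
passing a null pointer as @var{dirname}.  The helper name
@code{bind_and_check} and the directories used with it are only
examples.

@smallexample
#include <libintl.h>
#include <string.h>

/* Hypothetical helper: bind DOMAIN to DIRNAME, then query the
   binding.  Returns nonzero if the stored directory is DIRNAME.
   DIRNAME should be absolute if the program may call chdir.  */
int
bind_and_check (const char *domain, const char *dirname)
@{
  if (bindtextdomain (domain, dirname) == NULL)
    return 0;                   /* out of memory, errno is set */
  const char *bound = bindtextdomain (domain, NULL);
  return bound != NULL && strcmp (bound, dirname) == 0;
@}
@end smallexample

E.g., after @code{bind_and_check ("test-package",
"/usr/local/share/locale")}, binding a second domain to another
directory leaves the binding of @code{test-package} unchanged.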
1060
1061
1062 @node Using gettextized software
1063 @subsubsection User influence on @code{gettext}
1064
The last sections described what the programmer can do to
internationalize the messages of the program.  But in the end it is up
to the user to select the language of the messages s/he wants to see,
since s/he must be able to understand them.
1069
The POSIX locale model uses the environment variables @code{LC_COLLATE},
@code{LC_CTYPE}, @code{LC_MESSAGES}, @code{LC_MONETARY}, @code{LC_NUMERIC},
1072 and @code{LC_TIME} to select the locale which is to be used.  This way
1073 the user can influence lots of functions.  As we mentioned above the
1074 @code{gettext} functions also take advantage of this.
1075
1076 To understand how this happens it is necessary to take a look at the
1077 various components of the filename which gets computed to locate a
1078 message catalog.  It is composed as follows:
1079
1080 @smallexample
1081 @var{dir_name}/@var{locale}/LC_@var{category}/@var{domain_name}.mo
1082 @end smallexample
1083
1084 The default value for @var{dir_name} is system specific.  It is computed
1085 from the value given as the prefix while configuring the C library.
1086 This value normally is @file{/usr} or @file{/}.  For the former the
1087 complete @var{dir_name} is:
1088
1089 @smallexample
1090 /usr/share/locale
1091 @end smallexample
1092
We can use @file{/usr/share} since the @file{.mo} files containing the
message catalogs are system independent, so all systems can use the
same files.  If the program executed the @code{bindtextdomain} function
for the message domain that is currently handled, the @var{dir_name}
component is exactly the value which was given to the function as the
second parameter.  I.e., @code{bindtextdomain} allows overriding the
only system dependent and fixed value, making it possible to address
files anywhere in the filesystem.
1101
1102 The @var{category} is the name of the locale category which was selected
1103 in the program code.  For @code{gettext} and @code{dgettext} this is
1104 always @code{LC_MESSAGES}, for @code{dcgettext} this is selected by the
value of the third parameter.  As said above, a category other than
@code{LC_MESSAGES} should never be used.
1107
The @var{locale} component is computed based on the category used.  Just
like for the @code{setlocale} function, this is where the user's
selection comes into play.  Some environment variables are examined in a
fixed order and the first environment variable set determines the return
value of the lookup process.  In detail, for the category @code{LC_xxx}
the following variables are examined in this order:
1114
1115 @table @code
1116 @item LANGUAGE
1117 @item LC_ALL
1118 @item LC_xxx
1119 @item LANG
1120 @end table
1121
1122 This looks very familiar.  With the exception of the @code{LANGUAGE}
1123 environment variable this is exactly the lookup order the
@code{setlocale} function uses.  But why introduce the @code{LANGUAGE}
variable?
1126
The reason is that the syntax of the values these variables can have is
different from what is expected by the @code{setlocale} function.  If we
set @code{LC_ALL} to a value following the extended syntax, the
@code{setlocale} function would no longer be able to use the value of
this variable.  An additional variable removes this problem, and in
addition we can select the language independently of the locale
setting, which is sometimes useful.
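
The lookup order from the table above can be sketched as a loop over
the variable names.  This is a simplification for illustration with the
hypothetical name @code{lookup_locale_value}; it also assumes that a
variable set to the empty string counts as unset.

@smallexample
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: return the first non-empty value among
   LANGUAGE, LC_ALL, CATEGORY_VAR (e.g. "LC_MESSAGES") and LANG,
   or NULL if none of them is set.  */
const char *
lookup_locale_value (const char *category_var)
@{
  const char *names[] = @{ "LANGUAGE", "LC_ALL", category_var, "LANG" @};

  for (size_t i = 0; i < sizeof names / sizeof names[0]; ++i)
    @{
      const char *value = getenv (names[i]);
      if (value != NULL && value[0] != '\0')
        return value;
    @}
  return NULL;
@}
@end smallexample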
1134
While for the @code{LC_xxx} variables the value should consist of
exactly one specification of a locale, the @code{LANGUAGE} variable's
value can consist of a colon separated list of locale names.  The
attentive reader will realize that this is the way we manage to
implement one of our additional demands above: we want to be able to
specify an ordered list of languages.
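
Splitting such a list is straightforward.  The sketch below, with the
hypothetical function @code{split_language_list} and arbitrary size
limits, returns the entries in order of preference; skipping empty
entries is an assumption of this sketch.

@smallexample
#include <string.h>

enum @{ MAX_LANGS = 8, MAX_NAME = 64 @};

/* Hypothetical helper: store the entries of a colon separated
   LANGUAGE value in OUT, in order of preference.  Empty entries are
   skipped and overlong ones truncated.  Returns the number stored.  */
int
split_language_list (const char *value, char out[][MAX_NAME], int max)
@{
  int count = 0;

  while (*value != '\0' && count < max)
    @{
      size_t len = strcspn (value, ":");

      if (len > 0)
        @{
          size_t copy = len < (size_t) MAX_NAME - 1
                        ? len : (size_t) MAX_NAME - 1;
          memcpy (out[count], value, copy);
          out[count][copy] = '\0';
          ++count;
        @}
      value += len;
      if (*value == ':')
        ++value;
    @}
  return count;
@}
@end smallexample

For @code{"de_AT:de:en"} it stores @code{de_AT}, @code{de}, and
@code{en}, which would then be tried in this order.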
1141
Back to the constructed filename: only one component is still missing.
The @var{domain_name} part is the name which was either registered using
the @code{textdomain} function or which was given to @code{dgettext} or
@code{dcgettext} as the first parameter.  Now it becomes obvious that a
good choice for the domain name in the program code is a string which is
closely related to the program/package name.  E.g., for the GNU C
Library the domain name is @code{libc}.
1149
1150 @noindent
A little piece of example code shows how the programmer is supposed
to work:

@smallexample
#include <libintl.h>
#include <locale.h>
#include <stdio.h>
int
main (void)
@{
  setlocale (LC_ALL, "");
  textdomain ("test-package");
  bindtextdomain ("test-package", "/usr/local/share/locale");
  puts (gettext ("Hello, world!"));
  return 0;
@}
@end smallexample

The @code{setlocale} call makes the program honor the locale selected
in the user's environment; without it the program keeps running in the
@code{C} locale.  At the program start the default domain is
@code{messages}.  The @code{textdomain} call changes this to
@code{test-package}.  The @code{bindtextdomain} call specifies that the
message catalogs for the domain @code{test-package} can be found below
the directory @file{/usr/local/share/locale}.
1167
If the user now sets the environment variable @code{LANGUAGE} to
@code{de}, the @code{gettext} function will try to use the translations
from the file
1171
1172 @smallexample
1173 /usr/local/share/locale/de/LC_MESSAGES/test-package.mo
1174 @end smallexample
1175
1176 From the above descriptions it should be clear which component of this
1177 filename is determined by which source.
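
A minimal sketch of this composition with @code{snprintf} follows.  The
function name @code{catalog_path} is hypothetical, and the caller is
assumed to pass the complete category name, such as
@code{LC_MESSAGES}, rather than only the part after @code{LC_}.

@smallexample
#include <stdio.h>

/* Hypothetical helper: write DIR_NAME/LOCALE/CATEGORY/DOMAIN_NAME.mo
   into BUF of SIZE bytes.  Returns 0 on success, -1 if the result
   does not fit.  */
int
catalog_path (char *buf, size_t size, const char *dir_name,
              const char *locale, const char *category,
              const char *domain_name)
@{
  int n = snprintf (buf, size, "%s/%s/%s/%s.mo",
                    dir_name, locale, category, domain_name);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
@}
@end smallexample

With the values of the example above it produces
@file{/usr/local/share/locale/de/LC_MESSAGES/test-package.mo}.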
1178
In the above example we assumed that the @code{LANGUAGE} environment
variable is set to @code{de}.  This might be an appropriate selection,
but what happens if the user wants to use @code{LC_ALL} because of its
wider usability, and here the required value is
@code{de_DE.ISO-8859-1}?  We already mentioned above that a situation
like this is not infrequent.  E.g., a person might prefer reading a
dialect and, if this is not available, fall back on the standard
language.
1186
The @code{gettext} functions know about situations like this and can
handle them gracefully.  The functions recognize the format of the value
of the environment variable.  They can split the value into different
pieces and, by leaving out one or the other part, construct new values.
This happens of course in a predictable way.  To understand this one
must know the format of the environment variable value.  There are two
more or less standardized forms:
1194
1195 @table @emph
1196 @item X/Open Format
1197 @code{language[_territory[.codeset]][@@modifier]}
1198
1199 @item CEN Format (European Community Standard)
1200 @code{language[_territory][+audience][+special][,[sponsor][_revision]]}
1201 @end table
1202
The functions will automatically recognize which format is used.  Less
specific locale names are produced by stripping off the components in
the order of the following list:
1206
1207 @enumerate
1208 @item
1209 @code{revision}
1210 @item
1211 @code{sponsor}
1212 @item
1213 @code{special}
1214 @item
1215 @code{codeset}
1216 @item
1217 @code{normalized codeset}
1218 @item
1219 @code{territory}
1220 @item
1221 @code{audience}/@code{modifier}
1222 @end enumerate
1223
From the last entry one can see that the @code{modifier} field in the
X/Open format and the @code{audience} field in the CEN format have the
same meaning.  Besides, one can see that the @code{language} field for
obvious reasons will never be dropped.
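
A simplified sketch of this inheritance for names in the X/Open format
follows.  It is not the algorithm used by the GNU C Library: it only
drops the codeset, then the territory, then the modifier, one at a
time, and it neither generates normalized codesets nor handles the CEN
format.  The names @code{xopen_fallbacks}, @code{build_name},
@code{MAX_CAND}, and @code{CAND_LEN} are hypothetical.

@smallexample
#include <stdio.h>
#include <string.h>

enum @{ MAX_CAND = 4, CAND_LEN = 128 @};

/* Write LANG[_TERR][.CODE][@@MOD] into DST; null parts are left out.  */
static void
build_name (char *dst, const char *lang, const char *terr,
            const char *code, const char *mod)
@{
  snprintf (dst, CAND_LEN, "%s%s%s%s%s%s%s", lang,
            terr ? "_" : "", terr ? terr : "",
            code ? "." : "", code ? code : "",
            mod ? "@@" : "", mod ? mod : "");
@}

/* Hypothetical helper: split an X/Open locale NAME and store
   candidate names in OUT, most specific first.  Returns the number
   of candidates (at most MAX_CAND).  */
int
xopen_fallbacks (const char *name, char out[][CAND_LEN])
@{
  char buf[CAND_LEN];
  char *terr = NULL, *code = NULL, *mod = NULL, *p;
  int n = 0;

  snprintf (buf, sizeof buf, "%s", name);
  /* Split off the parts from the end: modifier, codeset, territory.  */
  if ((p = strchr (buf, '@@')) != NULL)
    @{
      *p = '\0';
      mod = p + 1;
    @}
  if ((p = strchr (buf, '.')) != NULL)
    @{
      *p = '\0';
      code = p + 1;
    @}
  if ((p = strchr (buf, '_')) != NULL)
    @{
      *p = '\0';
      terr = p + 1;
    @}

  build_name (out[n++], buf, terr, code, mod);    /* as given */
  if (code != NULL)
    build_name (out[n++], buf, terr, NULL, mod);  /* drop codeset */
  if (terr != NULL)
    build_name (out[n++], buf, NULL, NULL, mod);  /* drop territory */
  if (mod != NULL)
    build_name (out[n++], buf, NULL, NULL, NULL); /* drop modifier */
  return n;
@}
@end smallexample

For @code{de_DE.ISO-8859-1} this yields @code{de_DE.ISO-8859-1},
@code{de_DE}, and @code{de}, matching the message inheritance described
earlier.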
1228
The only new thing is the @code{normalized codeset} entry.  This is
another goodie which is introduced to help reduce the chaos which
derives from the inability of people to standardize the names of
character sets.  Instead of @w{ISO-8859-1} one can often see @w{8859-1},
@w{88591}, @w{iso8859-1}, or @w{iso_8859-1}.  The @code{normalized
codeset} value is generated from the user-provided character set name by
applying the following rules:
1236
1237 @enumerate
1238 @item
Remove all characters besides numbers and letters.
1240 @item
1241 Fold letters to lowercase.
1242 @item
If the result only contains digits, prepend the string @code{"iso"}.
1244 @end enumerate
1245
1246 @noindent
So all of the above names will be normalized to @code{iso88591}.  This
gives the program user much more freedom in choosing the locale name.
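
The three rules translate directly into C.  In this sketch the function
name @code{normalize_codeset} is hypothetical and the output buffer is
assumed to be at least four bytes longer than the input; letters and
digits are classified with @code{isalnum} in the current locale, which
in the default @code{C} locale means ASCII letters and digits.

@smallexample
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: apply the normalization rules to CODESET and
   store the result in OUT, which must hold strlen (CODESET) + 4
   bytes.  */
void
normalize_codeset (const char *codeset, char *out)
@{
  char *p = out;
  int only_digits = 1;

  /* Rules 1 and 2: keep letters and digits, fold to lowercase.  */
  for (; *codeset != '\0'; ++codeset)
    @{
      unsigned char c = (unsigned char) *codeset;
      if (isalnum (c))
        @{
          if (isalpha (c))
            only_digits = 0;
          *p++ = (char) tolower (c);
        @}
    @}
  *p = '\0';

  /* Rule 3: a purely numeric name gets the prefix "iso".  */
  if (only_digits)
    @{
      memmove (out + 3, out, (size_t) (p - out) + 1);
      memcpy (out, "iso", 3);
    @}
@}
@end smallexample

Each of the spellings listed above yields @code{iso88591}, while, for
instance, @code{UTF-8} becomes @code{utf8}.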
1249
1250 Even this extended functionality still does not help to solve the
1251 problem that completely different names can be used to denote the same
1252 locale (e.g., @code{de} and @code{german}).  To be of help in this
1253 situation the locale implementation and also the @code{gettext}
1254 functions know about aliases.
1255
1256 The file @file{/usr/share/locale/locale.alias} (replace @file{/usr} with
1257 whatever prefix you used for configuring the C library) contains a
1258 mapping of alternative names to more regular names.  The system manager
is free to add new entries to meet her/his own needs.  The selected
locale from the environment is compared with the entries in the first
column of this file, ignoring case.  If they match, the value of the
second column is used instead for the further handling.
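
The matching step can be sketched as follows.  To stay self-contained
the sketch reads the alias table from a string rather than from
@file{locale.alias}; the function name @code{lookup_alias} and the size
limits are hypothetical, and it assumes lines of two
whitespace-separated fields, with lines starting with @samp{#} being
comments.

@smallexample
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: search TABLE ("alias value" per line) for
   NAME, ignoring case.  On a match copy the value into VALUE
   (VALUE_SIZE bytes) and return 1; otherwise return 0.  */
int
lookup_alias (const char *table, const char *name,
              char *value, size_t value_size)
@{
  const char *line = table;

  while (*line != '\0')
    @{
      size_t len = strcspn (line, "\n");
      char copy[160], alias[64], target[64];

      if (line[0] != '#' && len < sizeof copy)
        @{
          memcpy (copy, line, len);
          copy[len] = '\0';
          if (sscanf (copy, "%63s %63s", alias, target) == 2
              && strcasecmp (alias, name) == 0)
            @{
              snprintf (value, value_size, "%s", target);
              return 1;
            @}
        @}
      line += len;
      if (*line == '\n')
        ++line;
    @}
  return 0;
@}
@end smallexample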
1263
In the description of the format of the environment variables we already
mentioned the character set as a factor in the selection of the message
catalog.  In fact, only catalogs which contain text written using the
character set of the system/program can be used (directly; a solution
for this will come some day).  This means that the user always has to
take care of this: if the collection of message catalogs contains files
for the same language coded using different character sets, the user
has to be careful.
1272
1273
1274 @node Helper programs for gettext
1275 @subsection Programs to handle message catalogs for @code{gettext}
1276
1277 The GNU C Library does not contain the source code for the programs to
1278 handle message catalogs for the @code{gettext} functions.  As part of
1279 the GNU project the GNU gettext package contains everything the
1280 developer needs.  The functionality provided by the tools in this
1281 package by far exceeds the abilities of the @code{gencat} program
1282 described above for the @code{catgets} functions.
1283
1284 There is a program @code{msgfmt} which is the equivalent program to the
1285 @code{gencat} program.  It generates from the human-readable and
1286 -editable form of the message catalog a binary file which can be used by
1287 the @code{gettext} functions.  But there are several more programs
1288 available.
1289
The @code{xgettext} program can be used to automatically extract the
translatable messages from a source file.  I.e., the programmer need not
take care of the translations and the list of messages which have to be
translated.  S/He will simply wrap the translatable strings in calls to
@code{gettext} et al.@: and the rest will be done by @code{xgettext}.
This program has a lot of options which help to customize the output or
help to understand the input better.
1297
Other programs help to manage the development cycle when new messages
appear in the source files or when a new translation of the messages
appears.  Here it should only be noted that using all the tools in GNU
gettext it is possible to @emph{completely} automate the handling of
message catalogs.  Besides marking the translatable strings in the
source code and generating the translations, the developers do not have
anything to do themselves.