gettext-tools/doc/gettext_4.html

   1 <HTML>
   2 <HEAD>
   3 <!-- This HTML file has been created by texi2html 1.52b
   4      from gettext.texi on 28 December 2015 -->
   5
   6 <META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
   7 <TITLE>GNU gettext utilities - 4  Preparing Program Sources</TITLE>
   8 </HEAD>
   9 <BODY>
  11 <P><HR><P>
  12
  13
  14 <H1><A NAME="SEC16" HREF="gettext_toc.html#TOC16">4  Preparing Program Sources</A></H1>
  15 <P>
  16 <A NAME="IDX124"></A>
  17
  18 </P>
  19
  20 <P>
  21 For the programmer, changes to the C source code fall into three
  22 categories.  First, you have to make the localization functions
  23 known to all modules needing message translation.  Second, you should
  24 properly trigger the operation of GNU <CODE>gettext</CODE> when the program
  25 initializes, usually from the <CODE>main</CODE> function.  Last, you should
  26 identify, adjust and mark all constant strings in your program
  27 needing translation.
  28
  29 </P>
  30
  31
  32
  33 <H2><A NAME="SEC17" HREF="gettext_toc.html#TOC17">4.1  Importing the <CODE>gettext</CODE> declaration</A></H2>
  34
  35 <P>
  36 Presuming that your set of programs, or package, has been adjusted
  37 so all needed GNU <CODE>gettext</CODE> files are available, and your
  38 <TT>&lsquo;Makefile&rsquo;</TT> files are adjusted (see section <A HREF="gettext_13.html#SEC213">13  The Maintainer's View</A>), each C module
  39 having translated C strings should contain the line:
  40
  41 </P>
  42 <P>
  43 <A NAME="IDX125"></A>
  44
  45 <PRE>
  46 #include &#60;libintl.h&#62;
  47 </PRE>
  48
  49 <P>
  50 Similarly, each C module containing <CODE>printf()</CODE>/<CODE>fprintf()</CODE>/...
  51 calls with a format string that could be a translated C string (even if
  52 the C string comes from a different C module) should contain the line:
  53
  54 </P>
  55
  56 <PRE>
  57 #include &#60;libintl.h&#62;
  58 </PRE>
  59
  60
  61
  62 <H2><A NAME="SEC18" HREF="gettext_toc.html#TOC18">4.2  Triggering <CODE>gettext</CODE> Operations</A></H2>
  63
  64 <P>
  65 <A NAME="IDX126"></A>
  66 The initialization of locale data should be done with more or less
  67 the same code in every program, as demonstrated below:
  68
  69 </P>
  70
  71 <PRE>
  72 int
  73 main (int argc, char *argv[])
  74 {
  75   ...
  76   setlocale (LC_ALL, "");
  77   bindtextdomain (PACKAGE, LOCALEDIR);
  78   textdomain (PACKAGE);
  79   ...
  80 }
  81 </PRE>
  82
  83 <P>
  84 <VAR>PACKAGE</VAR> and <VAR>LOCALEDIR</VAR> should be provided either by
  85 <TT>&lsquo;config.h&rsquo;</TT> or by the Makefile.  For now consult the <CODE>gettext</CODE>
  86 or <CODE>hello</CODE> sources for more information.
  87
  88 </P>
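<P>
As a minimal sketch of how these two names might be supplied, the fragment
below falls back to placeholder values when neither <TT>&lsquo;config.h&rsquo;</TT>
nor the Makefile defines them.  The values <CODE>"hello"</CODE> and
<CODE>"/usr/share/locale"</CODE> and the helper name <CODE>init_i18n</CODE> are
assumptions made for illustration; in a real package they come from the
build system.
</P>

```c
#include <libintl.h>
#include <locale.h>

/* Normally provided by config.h or by -D options in the Makefile.
   The fallback values below are placeholders for illustration. */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/share/locale"
#endif

/* Hypothetical helper wrapping the three initialization calls.
   Returns the name of the selected text domain, or NULL on failure. */
const char *
init_i18n (void)
{
  setlocale (LC_ALL, "");               /* adopt the user's locale */
  bindtextdomain (PACKAGE, LOCALEDIR);  /* where the catalogs live */
  return textdomain (PACKAGE);          /* select the default domain */
}
```

<P>
Calling such a helper once, early in <CODE>main</CODE>, keeps the three calls
together and in the order shown in the example above.
</P>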
  89 <P>
  90 <A NAME="IDX127"></A>
  91 <A NAME="IDX128"></A>
  92 The use of <CODE>LC_ALL</CODE> might not be appropriate for you.
  93 <CODE>LC_ALL</CODE> includes all locale categories and especially
  94 <CODE>LC_CTYPE</CODE>.  This latter category determines the character
  95 classes reported by <CODE>isalnum</CODE> and the related functions from
  96 <TT>&lsquo;ctype.h&rsquo;</TT>, which can be wrong for programs that process some
  97 kind of input language.  For example, it would mean that source code
  98 using the &ccedil; (c-cedilla) character is accepted in
  99 France but not in the U.S.
 100
 101 </P>
 102 <P>
 103 Some systems also have problems with parsing numbers using the
 104 <CODE>scanf</CODE> functions when <CODE>LC_ALL</CODE> selects a locale other than the
 105 <CODE>"C"</CODE> locale.  The standards say that formats in addition to the one known in the
 106 <CODE>"C"</CODE> locale might be recognized.  But some systems seem to reject
 107 numbers in the <CODE>"C"</CODE> locale format.  In some situations, it might
 108 also be a problem with the notation itself which makes it impossible to
 109 recognize whether the number is in the <CODE>"C"</CODE> locale or the local
 110 format.  This can happen if thousands separator characters are used.
 111 Some locales define this character according to the national
 112 conventions to <CODE>'.'</CODE> which is the same character used in the
 113 <CODE>"C"</CODE> locale to denote the decimal point.
 114
 115 </P>
 116 <P>
 117 So it is sometimes necessary to replace the <CODE>LC_ALL</CODE> line in the
 118 code above by a sequence of <CODE>setlocale</CODE> lines:
 119
 120 </P>
 121
 122 <PRE>
 123 {
 124   ...
 125   setlocale (LC_CTYPE, "");
 126   setlocale (LC_MESSAGES, "");
 127   ...
 128 }
 129 </PRE>
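<P>
The effect of this replacement can be sketched as follows.  Only
<CODE>LC_CTYPE</CODE> and <CODE>LC_MESSAGES</CODE> follow the environment, so
<CODE>LC_NUMERIC</CODE> stays in the <CODE>"C"</CODE> locale and number parsing keeps
expecting <CODE>'.'</CODE> as the decimal point.  The helper names are
hypothetical and not part of GNU <CODE>gettext</CODE>.
</P>

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: adopt the user's settings only for character
   classification and messages; every other category, LC_NUMERIC
   included, stays in the "C" locale the program started with. */
void
init_locale_selectively (void)
{
  setlocale (LC_CTYPE, "");
#ifdef LC_MESSAGES   /* may be missing on systems that are only ISO C */
  setlocale (LC_MESSAGES, "");
#endif
}

/* Because LC_NUMERIC was left untouched, strtod still expects '.'
   as the decimal point, whatever the user's environment says. */
double
parse_number (const char *text)
{
  return strtod (text, NULL);
}
```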
 130
 131 <P>
 132 <A NAME="IDX129"></A>
 133 <A NAME="IDX130"></A>
 134 <A NAME="IDX131"></A>
 135 <A NAME="IDX132"></A>
 136 <A NAME="IDX133"></A>
 137 <A NAME="IDX134"></A>
 138 <A NAME="IDX135"></A>
 139 On all POSIX conformant systems the locale categories <CODE>LC_CTYPE</CODE>,
 140 <CODE>LC_MESSAGES</CODE>, <CODE>LC_COLLATE</CODE>, <CODE>LC_MONETARY</CODE>,
 141 <CODE>LC_NUMERIC</CODE>, and <CODE>LC_TIME</CODE> are available.  On some systems
 142 which are only ISO C compliant, <CODE>LC_MESSAGES</CODE> is missing, but
 143 a substitute for it is defined in GNU gettext's <CODE>&#60;libintl.h&#62;</CODE> and
 144 in GNU gnulib's <CODE>&#60;locale.h&#62;</CODE>.
 145
 146 </P>
 147 <P>
 148 Note that changing the <CODE>LC_CTYPE</CODE> also affects the functions
 149 declared in the <CODE>&#60;ctype.h&#62;</CODE> standard header and some functions
 150 declared in the <CODE>&#60;string.h&#62;</CODE> and <CODE>&#60;stdlib.h&#62;</CODE> standard headers.
 151 If this is not
 152 desirable in your application (for example in a compiler's parser),
 153 you can use a set of substitute functions which hardwire the C locale,
 154 such as found in the modules <SAMP>&lsquo;c-ctype&rsquo;</SAMP>, <SAMP>&lsquo;c-strcase&rsquo;</SAMP>,
 155 <SAMP>&lsquo;c-strcasestr&rsquo;</SAMP>, <SAMP>&lsquo;c-strtod&rsquo;</SAMP>, <SAMP>&lsquo;c-strtold&rsquo;</SAMP> in the GNU gnulib
 156 source distribution.
 157
 158 </P>
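<P>
To illustrate what hardwiring the C locale means, here is a minimal
sketch of a locale-independent classification function.  It is not the
gnulib implementation; the name <CODE>c_isalnum_sketch</CODE> is invented for
this example, and an ASCII-compatible execution character set is assumed.
</P>

```c
#include <stdbool.h>

/* Locale-independent counterpart of isalnum(): the answer depends
   only on the ASCII range, never on the current LC_CTYPE setting.
   Assumes an ASCII-compatible execution character set. */
bool
c_isalnum_sketch (int c)
{
  return (c >= '0' && c <= '9')
         || (c >= 'A' && c <= 'Z')
         || (c >= 'a' && c <= 'z');
}
```

<P>
A parser built on such a function treats a byte like 0xE7 the same way
in every locale, which is the behaviour a compiler usually wants.
</P>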
 159 <P>
 160 It is also possible to switch the locale back and forth between the
 161 environment dependent locale and the C locale, but this approach is
 162 normally avoided because a <CODE>setlocale</CODE> call is expensive,
 163 because it is tedious to determine the places where a locale switch
 164 is needed in a large program's source, and because switching a locale
 165 is not multithread-safe.
 166
 167 </P>
 168
 169
 170 <H2><A NAME="SEC19" HREF="gettext_toc.html#TOC19">4.3  Preparing Translatable Strings</A></H2>
 171
 172 <P>
 173 <A NAME="IDX136"></A>
 174 Before strings can be marked for translation, they sometimes need to
 175 be adjusted.  Usually preparing a string for translation is done right
 176 before marking it, during the marking phase which is described in the
 177 next sections.  What you have to keep in mind while doing that is the
 178 following.
 179
 180 </P>
 181
 182 <UL>
 183 <LI>
 184
 185 Decent English style.
 186
 187 <LI>
 188
 189 Entire sentences.
 190
 191 <LI>
 192
 193 Split at paragraphs.
 194
 195 <LI>
 196
 197 Use format strings instead of string concatenation.
 198
 199 <LI>
 200
 201 Avoid unusual markup and unusual control characters.
 202 </UL>
 203
 204 <P>
 205 Let's look at some examples of these guidelines.
 206
 207 </P>
 208 <P>
 209 <A NAME="IDX137"></A>
 210 Translatable strings should be in good English style.  If slang language
 211 with abbreviations and shortcuts is used, often translators will not
 212 understand the message and will produce very inappropriate translations.
 213
 214 </P>
 215
 216 <PRE>
 217 "%s: is parameter\n"
 218 </PRE>
 219
 220 <P>
 221 This is nearly untranslatable: Is the displayed item <EM>a</EM> parameter or
 222 <EM>the</EM> parameter?
 223
 224 </P>
 225
 226 <PRE>
 227 "No match"
 228 </PRE>
 229
 230 <P>
 231 The ambiguity in this message makes it unintelligible: Is the program
 232 attempting to set something on fire? Does it mean "The given object does
 233 not match the template"? Does it mean "The template does not fit for any
 234 of the objects"?
 235
 236 </P>
 237 <P>
 238 <A NAME="IDX138"></A>
 239 In both cases, adding more words to the message will help both the
 240 translator and the English-speaking user.
 241
 242 </P>
 243 <P>
 244 <A NAME="IDX139"></A>
 245 Translatable strings should be entire sentences.  It is often not possible
 246 to translate single verbs or adjectives in a substitutable way.
 247
 248 </P>
 249
 250 <PRE>
 251 printf ("File %s is %s protected", filename, rw ? "write" : "read");
 252 </PRE>
 253
 254 <P>
 255 Most translators will not look at the source and will thus only see the
 256 string <CODE>"File %s is %s protected"</CODE>, which is unintelligible.  Change
 257 this to
 258
 259 </P>
 260
 261 <PRE>
 262 printf (rw ? "File %s is write protected" : "File %s is read protected",
 263         filename);
 264 </PRE>
 265
 266 <P>
 267 This way the translator will not only understand the message, she will
 268 also be able to find the appropriate grammatical construction.  A French
 269 translator for example translates "write protected" like "protected
 270 against writing".
 271
 272 </P>
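<P>
Once the strings are marked (marking is described in the following
sections), the same idea might look like the sketch below.  The helper
name <CODE>protection_format</CODE> is hypothetical; the point is only that
each alternative is a complete sentence handed to <CODE>gettext</CODE> on its
own.
</P>

```c
#include <libintl.h>

/* Hypothetical helper: each alternative is a complete sentence and is
   passed to gettext() on its own, so both appear as separate,
   self-explanatory entries for the translator. */
const char *
protection_format (int rw)
{
  return rw ? gettext ("File %s is write protected")
            : gettext ("File %s is read protected");
}
```

<P>
A caller would then write
<CODE>printf (protection_format (rw), filename);</CODE>.
</P>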
 273 <P>
 274 Entire sentences are also important because in many languages, the
 275 declension of some word in a sentence depends on the gender or the
 276 number (singular/plural) of another part of the sentence.  There are
 277 usually more interdependencies between words than in English.  The
 278 consequence is that asking a translator to translate two half-sentences
 279 and then combining these two half-sentences through dumb string concatenation
 280 will not work, for many languages, even though it would work for English.
 281 That's why translators need to handle entire sentences.
 282
 283 </P>
 284 <P>
 285 Often sentences don't fit into a single line.  If a sentence is output
 286 using two subsequent <CODE>printf</CODE> statements, like this
 287
 288 </P>
 289
 290 <PRE>
 291 printf ("Locale charset \"%s\" is different from\n", lcharset);
 292 printf ("input file charset \"%s\".\n", fcharset);
 293 </PRE>
 294
 295 <P>
 296 the translator would have to translate two half sentences, but nothing
 297 in the POT file would tell her that the two half sentences belong together.
 298 It is necessary to merge the two <CODE>printf</CODE> statements so that the
 299 translator can handle the entire sentence at once and decide at which
 300 place to insert a line break in the translation (if at all):
 301
 302 </P>
 303
 304 <PRE>
 305 printf ("Locale charset \"%s\" is different from\n\
 306 input file charset \"%s\".\n", lcharset, fcharset);
 307 </PRE>
 308
 309 <P>
 310 You may now ask: what about two or more adjacent sentences, as in this case?
 311
 312 </P>
 313
 314 <PRE>
 315 puts ("Apollo 13 scenario: Stack overflow handling failed.");
 316 puts ("On the next stack overflow we will crash!!!");
 317 </PRE>
 318
 319 <P>
 320 Should these two statements be merged into a single one? I would recommend
 321 merging them if the two sentences are related to each other, because then it
 322 makes it easier for the translator to understand and translate both.  On
 323 the other hand, if one of the two messages is a stereotypic one, occurring
 324 in other places as well, you will do a favour to the translator by not
 325 merging the two.  (Identical messages occurring in several places are
 326 combined by xgettext, so the translator has to handle them once only.)
 327
 328 </P>
 329 <P>
 330 <A NAME="IDX140"></A>
 331 Translatable strings should be limited to one paragraph; don't let a
 332 single message be longer than ten lines.  The reason is that when the
 333 translatable string changes, the translator is faced with the task of
 334 updating the entire translated string.  Maybe only a single word will
 335 have changed in the English string, but the translator doesn't see that
 336 (with the current translation tools), therefore she has to proofread
 337 the entire message.
 338
 339 </P>
 340 <P>
 341 <A NAME="IDX141"></A>
 342 Many GNU programs have a <SAMP>&lsquo;--help&rsquo;</SAMP> output that extends over several
 343 screen pages.  It is a courtesy towards the translators to split such a
 344 message into several ones of five to ten lines each.  While doing that,
 345 you can also attempt to split the documented options into groups,
 346 such as the input options, the output options, and the informative
 347 output options.  This will help every user to find the option he is
 348 looking for.
 349
 350 </P>
 351 <P>
 352 <A NAME="IDX142"></A>
 353 <A NAME="IDX143"></A>
 354 Hardcoded string concatenation is sometimes used to construct English
 355 strings:
 356
 357 </P>
 358
 359 <PRE>
 360 strcpy (s, "Replace ");
 361 strcat (s, object1);
 362 strcat (s, " with ");
 363 strcat (s, object2);
 364 strcat (s, "?");
 365 </PRE>
 366
 367 <P>
 368 In order to present to the translator only entire sentences, and also
 369 because in some languages the translator might want to swap the order
 370 of <CODE>object1</CODE> and <CODE>object2</CODE>, it is necessary to change this
 371 to use a format string:
 372
 373 </P>
 374
 375 <PRE>
 376 sprintf (s, "Replace %s with %s?", object1, object2);
 377 </PRE>
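<P>
The following sketch shows why a format string leaves room for
reordering.  The helper name <CODE>format_replace_prompt</CODE> is an
assumption for illustration, and <CODE>snprintf</CODE> is used instead of
<CODE>sprintf</CODE> so that the write is bounded by the buffer size.
</P>

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: build the prompt from a format string that
   would normally come back from gettext().  snprintf bounds the
   write to the size of the destination buffer. */
int
format_replace_prompt (char *buf, size_t size, const char *format,
                       const char *object1, const char *object2)
{
  return snprintf (buf, size, format, object1, object2);
}
```

<P>
On systems whose <CODE>printf</CODE> family supports POSIX numbered arguments,
a translated format can refer to the arguments as <CODE>%1$s</CODE> and
<CODE>%2$s</CODE> and thereby swap their order without any change to the
program.
</P>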
 378
 379 <P>
 380 <A NAME="IDX144"></A>
 381 A similar case is compile time concatenation of strings.  The ISO C 99
 382 include file <CODE>&#60;inttypes.h&#62;</CODE> contains a macro <CODE>PRId64</CODE> that
 383 can be used as a formatting directive for outputting an <SAMP>&lsquo;int64_t&rsquo;</SAMP>
 384 integer through <CODE>printf</CODE>.  It expands to a constant string, usually
 385 "d" or "ld" or "lld" or something like this, depending on the platform.
 386 Assume you have code like
 387
 388 </P>
 389
 390 <PRE>
 391 printf ("The amount is %0" PRId64 "\n", number);
 392 </PRE>
 393
 394 <P>
 395 The <CODE>gettext</CODE> tools and library have special support for these
 396 <CODE>&#60;inttypes.h&#62;</CODE> macros.  You can therefore simply write
 397
 398 </P>
 399
 400 <PRE>
 401 printf (gettext ("The amount is %0" PRId64 "\n"), number);
 402 </PRE>
 403
 404 <P>
 405 The PO file will contain the string "The amount is %0&#60;PRId64&#62;\n".
 406 The translators will provide a translation containing "%0&#60;PRId64&#62;"
 407 as well, and at runtime the <CODE>gettext</CODE> function's result will
 408 contain the appropriate constant string, "d" or "ld" or "lld".
 409
 410 </P>
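<P>
For completeness, a self-contained version of the call above might look
like this sketch.  The helper name <CODE>format_amount</CODE> and the use of
<CODE>snprintf</CODE> are choices made for this example only.
</P>

```c
#include <inttypes.h>
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: formats NUMBER into BUF using the marked
   format string.  The compiler pastes PRId64 into the literal;
   xgettext records it in the PO file as <PRId64>. */
int
format_amount (char *buf, size_t size, int64_t number)
{
  return snprintf (buf, size,
                   gettext ("The amount is %0" PRId64 "\n"), number);
}
```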
 411 <P>
 412 This works only for the predefined <CODE>&#60;inttypes.h&#62;</CODE> macros.  If
 413 you have defined your own similar macros, let's say <SAMP>&lsquo;MYPRId64&rsquo;</SAMP>,
 414 that are not known to <CODE>xgettext</CODE>, the solution for this problem
 415 is to change the code like this:
 416
 417 </P>
 418
 419 <PRE>
 420 char buf1[100];
 421 sprintf (buf1, "%0" MYPRId64, number);
 422 printf (gettext ("The amount is %s\n"), buf1);
 423 </PRE>
 424
 425 <P>
 426 This means, you put the platform dependent code in one statement, and the
 427 internationalization code in a different statement.  Note that a buffer length
 428 of 100 is safe, because all available hardware integer types are limited to
 429 128 bits, and to print a 128 bit integer one needs at most 54 characters,
 430 regardless of whether it is in decimal, octal or hexadecimal.
 431
 432 </P>
 433 <P>
 434 <A NAME="IDX145"></A>
 435 <A NAME="IDX146"></A>
 436 All this applies to other programming languages as well.  For example, in
 437 Java and C#, string concatenation is very frequently used, because it is a
 438 compiler built-in operator.  As in C, in Java you would change
 439
 440 </P>
 441
 442 <PRE>
 443 System.out.println("Replace "+object1+" with "+object2+"?");
 444 </PRE>
 445
 446 <P>
 447 into a statement involving a format string:
 448
 449 </P>
 450
 451 <PRE>
 452 System.out.println(
 453     MessageFormat.format("Replace {0} with {1}?",
 454                          new Object[] { object1, object2 }));
 455 </PRE>
 456
 457 <P>
 458 Similarly, in C#, you would change
 459
 460 </P>
 461
 462 <PRE>
 463 Console.WriteLine("Replace "+object1+" with "+object2+"?");
 464 </PRE>
 465
 466 <P>
 467 into a statement involving a format string:
 468
 469 </P>
 470
 471 <PRE>
 472 Console.WriteLine(
 473     String.Format("Replace {0} with {1}?", object1, object2));
 474 </PRE>
 475
 476 <P>
 477 <A NAME="IDX147"></A>
 478 <A NAME="IDX148"></A>
 479 Unusual markup or control characters should not be used in translatable
 480 strings.  Translators will likely not understand the particular meaning
 481 of the markup or control characters.
 482
 483 </P>
 484 <P>
 485 For example, if you have a convention that <SAMP>&lsquo;|&rsquo;</SAMP> delimits the
 486 left-hand and right-hand part of some GUI elements, translators will
 487 often not understand it without specific comments.  It might be
 488 better to have the translator translate the left-hand and right-hand
 489 part separately.
 490
 491 </P>
 492 <P>
 493 Another example is the <SAMP>&lsquo;argp&rsquo;</SAMP> convention to use a single <SAMP>&lsquo;\v&rsquo;</SAMP>
 494 (vertical tab) control character to delimit two sections inside a
 495 string.  This is flawed.  Some translators may convert it to a simple
 496 newline, some to blank lines.  With some PO file editors it may not be
 497 easy to even enter a vertical tab control character.  So, you cannot
 498 be sure that the translation will contain a <SAMP>&lsquo;\v&rsquo;</SAMP> character, at the
 499 corresponding position.  The solution is, again, to let the translator
 500 translate two separate strings and combine at run-time the two translated
 501 strings with the <SAMP>&lsquo;\v&rsquo;</SAMP> required by the convention.
 502
 503 </P>
 504 <P>
 505 HTML markup, however, is common enough that it's probably ok to use in
 506 translatable strings.  But please bear in mind that the GNU gettext tools
 507 don't verify that the translations are well-formed HTML.
 508
 509 </P>
 510
 511
 512 <H2><A NAME="SEC20" HREF="gettext_toc.html#TOC20">4.4  How Marks Appear in Sources</A></H2>
 513 <P>
 514 <A NAME="IDX149"></A>
 515
 516 </P>
 517 <P>
 518 All strings requiring translation should be marked in the C sources.  Marking
 519 is done in such a way that each translatable string appears to be
 520 the sole argument of some function or preprocessor macro.  There are
 521 only a few such possible functions or macros meant for translation,
 522 and their names are said to be marking keywords.  The marking is
 523 attached to strings themselves, rather than to what we do with them.
 524 This approach has more uses.  A blatant example is an error message
 525 produced by formatting.  The format string needs translation, as
 526 well as some strings inserted through some <SAMP>&lsquo;%s&rsquo;</SAMP> specification
 527 in the format, while the result from <CODE>sprintf</CODE> may have so many
 528 different instances that it is impractical to list them all in some
 529 <SAMP>&lsquo;error_string_out()&rsquo;</SAMP> routine, say.
 530
 531 </P>
 532 <P>
 533 This marking operation has two goals.  The first goal of marking
 534 is for triggering the retrieval of the translation, at run time.
 535 The keyword is possibly resolved into a routine able to dynamically
 536 return the proper translation, as far as possible or wanted, for the
 537 argument string.  Most localizable strings are found in executable
 538 positions, that is, attached to variables or given as parameters to
 539 functions.  But this is not universal usage, and some translatable
 540 strings appear in structured initializations.  See section <A HREF="gettext_4.html#SEC23">4.7  Special Cases of Translatable Strings</A>.
 541
 542 </P>
 543 <P>
 544 The second goal of the marking operation is to help <CODE>xgettext</CODE>
 545 properly extract all translatable strings when it scans a set
 546 of program sources and produces PO file templates.
 547
 548 </P>
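<P>
The two goals can be served by different keywords.  For strings in
structured initializations, mentioned above, a common convention is a
macro that marks the string for extraction but expands to the string
itself, with the translation looked up where the string is used.  The
sketch below assumes the conventional names <CODE>gettext_noop</CODE> and
<CODE>N_</CODE>, which this section does not define; the table and the helper
function are invented for illustration.
</P>

```c
#include <libintl.h>
#include <stddef.h>

/* Marks a string for extraction without calling any function, so it
   can appear in a static initializer.  Conventional names, assumed. */
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)

/* Hypothetical table: initializers must be constant expressions,
   so gettext() cannot be called here. */
static const char *const weekday_names[] = {
  N_("Monday"),
  N_("Tuesday"),
  N_("Wednesday")
};

/* The translation is looked up at run time, where the string is used. */
const char *
weekday_name (size_t index)
{
  return gettext (weekday_names[index]);
}
```

<P>
For the strings to be extracted, <CODE>xgettext</CODE> has to be told about the
<CODE>N_</CODE> keyword, for instance through its <SAMP>&lsquo;--keyword&rsquo;</SAMP> option.
</P>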
 549 <P>
 550 The canonical keyword for marking translatable strings is
 551 <SAMP>&lsquo;gettext&rsquo;</SAMP>; it gave its name to the whole GNU <CODE>gettext</CODE>
 552 package.  For packages making only light use of the <SAMP>&lsquo;gettext&rsquo;</SAMP>
 553 keyword, macro or function, it is easily used <EM>as is</EM>.  However,
 554 for packages using the <CODE>gettext</CODE> interface more heavily, it
 555 is usually more convenient to give the main keyword a shorter, less
 556 obtrusive name.  Indeed, the keyword might appear on a lot of strings
 557 all over the package, and programmers usually neither want nor need
 558 their program sources to remind them forcefully, all the time, that they
 559 are internationalized.  Further, a long keyword has the disadvantage
 560 of using more horizontal space, forcing more indentation work on
 561 sources for those trying to keep them within 79 or 80 columns.
 562
 563 </P>
 564 <P>
 565 <A NAME="IDX150"></A>
 566 Many packages use <SAMP>&lsquo;_&rsquo;</SAMP> (a simple underline) as a keyword,
 567 and write <SAMP>&lsquo;_("Translatable string")&rsquo;</SAMP> instead of <SAMP>&lsquo;gettext
 568 ("Translatable string")&rsquo;</SAMP>.  Further, the rule from the GNU coding
 569 standards that requires a space between the keyword and the opening
 570 parenthesis is relaxed, in practice, for this particular usage.
 571 So, the textual overhead per translatable string is reduced to
 572 only three characters: the underline and the two parentheses.
 573 However, even if GNU <CODE>gettext</CODE> uses this convention internally,
 574 it does not offer it officially.  The real, genuine keyword is truly
 575 <SAMP>&lsquo;gettext&rsquo;</SAMP> indeed.  It is fairly easy for those wanting to use
 576 <SAMP>&lsquo;_&rsquo;</SAMP> instead of <SAMP>&lsquo;gettext&rsquo;</SAMP> to declare:
 577
 578 </P>
 579
 580 <PRE>
 581 #include &#60;libintl.h&#62;
 582 #define _(String) gettext (String)
 583 </PRE>
 584
 585 <P>
 586 instead of merely using <SAMP>&lsquo;#include &#60;libintl.h&#62;&rsquo;</SAMP>.
 587
 588 </P>
 589 <P>
 590 The marking keywords <SAMP>&lsquo;gettext&rsquo;</SAMP> and <SAMP>&lsquo;_&rsquo;</SAMP> take the translatable
 591 string as sole argument.  It is also possible to define marking functions
 592 that take it at another argument position.  It is even possible to make
 593 the marked argument position depend on the total number of arguments of
 594 the function call; this is useful in C++.  All this is achieved using
 595 <CODE>xgettext</CODE>'s <SAMP>&lsquo;--keyword&rsquo;</SAMP> option.  How to pass such an option
 596 to <CODE>xgettext</CODE>, assuming that <CODE>gettextize</CODE> is used, is described
 597 in section <A HREF="gettext_13.html#SEC220">13.4.3  <TT>&lsquo;Makevars&rsquo;</TT> in <TT>&lsquo;po/&rsquo;</TT></A> and section <A HREF="gettext_13.html#SEC237">13.5.6  AM_XGETTEXT_OPTION in <TT>&lsquo;po.m4&rsquo;</TT></A>.
 598
 599 </P>
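<P>
As a small sketch of a marking function whose translatable string is
not the first argument, consider the following.  The function name
<CODE>log_message</CODE> and its signature are assumptions made for this
example.
</P>

```c
#include <libintl.h>

/* Hypothetical marking function: the translatable string is the
   SECOND argument.  xgettext could be told so with an option such as
       --keyword=log_message:2
   (the function name is an assumption for this sketch). */
const char *
log_message (int severity, const char *msgid)
{
  (void) severity;           /* unused in this sketch */
  return gettext (msgid);
}
```

<P>
With that option in effect, a call such as
<CODE>log_message (1, "Disk full")</CODE> contributes the string
<CODE>"Disk full"</CODE> to the PO template.
</P>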
 600 <P>
 601 Note also that long strings can be split across lines, into multiple
 602 adjacent string tokens.  Automatic string concatenation is performed
 603 at compile time according to ISO C and ISO C++; <CODE>xgettext</CODE> also
 604 supports this syntax.
 605
 606 </P>
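<P>
A short sketch of such a split string follows.  The program name
<SAMP>&lsquo;frob&rsquo;</SAMP> and the function name <CODE>usage_text</CODE> are invented
for the example.
</P>

```c
#include <libintl.h>

/* Hypothetical help text: the three adjacent literals are joined by
   the compiler into one string, and xgettext likewise extracts them
   as a single message. */
const char *
usage_text (void)
{
  return gettext ("Usage: frob [OPTION]... FILE\n"
                  "Process FILE and write the result to standard output.\n"
                  "With no FILE, read standard input.\n");
}
```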
 607 <P>
 608 Later on, the maintenance is relatively easy.  If, as a programmer,
 609 you add or modify a string, you will have to ask yourself if the
 610 new or altered string requires translation, and include it within
 611 <SAMP>&lsquo;_()&rsquo;</SAMP> if you think it should be translated.  For example, <SAMP>&lsquo;"%s"&rsquo;</SAMP>
 612 is a string <EM>not</EM> requiring translation.  But
 613 <SAMP>&lsquo;"%s: %d"&rsquo;</SAMP> <EM>does</EM> require translation, because in French, unlike
 614 in English, it's customary to put a space before a colon.
 615
 616 </P>
 617
 618
 619 <H2><A NAME="SEC21" HREF="gettext_toc.html#TOC21">4.5  Marking Translatable Strings</A></H2>
 620 <P>
 621 <A NAME="IDX151"></A>
 622
 623 </P>
 624 <P>
 625 In PO mode, one set of features is meant more for the programmer than
 626 for the translator, and allows him to interactively mark which strings,
 627 in a set of program sources, are translatable, and which are not.
 628 Even if it is a fairly easy job for a programmer to find and mark
 629 such strings by other means, using any editor of his choice, PO mode
 630 makes this work more comfortable.  Further, this gives translators
 631 who feel a little like programmers, or programmers who feel a little
 632 like translators, a tool letting them work at marking translatable
 633 strings in the program sources, while simultaneously producing a set of
 634 translations in some language, for the package being internationalized.
 635
 636 </P>
 637 <P>
 638 <A NAME="IDX152"></A>
 639 The set of program sources, targeted by the PO mode commands described
 640 here, should have an Emacs tags table constructed for your project,
 641 prior to using these PO file commands.  This is easy to do.  In any
 642 shell window, change the directory to the root of your project, then
 643 execute a command resembling:
 644
 645 </P>
 646
 647 <PRE>
 648 etags src/*.[hc] lib/*.[hc]
 649 </PRE>
 650
 651 <P>
 652 presuming here you want to process all <TT>&lsquo;.h&rsquo;</TT> and <TT>&lsquo;.c&rsquo;</TT> files
 653 from the <TT>&lsquo;src/&rsquo;</TT> and <TT>&lsquo;lib/&rsquo;</TT> directories.  This command will
 654 explore all said files and create a <TT>&lsquo;TAGS&rsquo;</TT> file in your root
 655 directory, somewhat summarizing the contents using a special file
 656 format Emacs can understand.
 657
 658 </P>
 659 <P>
 660 <A NAME="IDX153"></A>
 661 For packages following the GNU coding standards, there is
 662 a make goal <CODE>tags</CODE> or <CODE>TAGS</CODE> which constructs the tag files in
 663 all directories and for all files containing source code.
 664
 665 </P>
 666 <P>
 667 Once your <TT>&lsquo;TAGS&rsquo;</TT> file is ready, the following commands assist
 668 the programmer in marking translatable strings in his set of sources.
 669 But these commands are necessarily driven from within a PO file
 670 window, and it is likely that you do not even have such a PO file yet.
 671 This is not a problem at all, as you may safely open a new, empty PO
 672 file, mainly for using these commands.  This empty PO file will slowly
 673 fill in while you mark strings as translatable in your program sources.
 674
 675 </P>
 676 <DL COMPACT>
 677
 678 <DT><KBD>,</KBD>
 679 <DD>
 680 <A NAME="IDX154"></A>
 681 Search through program sources for a string which looks like a
 682 candidate for translation (<CODE>po-tags-search</CODE>).
 683
 684 <DT><KBD>M-,</KBD>
 685 <DD>
 686 <A NAME="IDX155"></A>
 687 Mark the last string found with <SAMP>&lsquo;_()&rsquo;</SAMP> (<CODE>po-mark-translatable</CODE>).
 688
 689 <DT><KBD>M-.</KBD>
 690 <DD>
 691 <A NAME="IDX156"></A>
 692 Mark the last string found with a keyword taken from a set of possible
 693 keywords.  This command with a prefix allows some management of these
 694 keywords (<CODE>po-select-mark-and-mark</CODE>).
 695
 696 </DL>
 697
 698 <P>
 699 <A NAME="IDX157"></A>
 700 The <KBD>,</KBD> (<CODE>po-tags-search</CODE>) command searches for the next
 701 occurrence of a string which looks like a possible candidate for
 702 translation, and displays the program source in another Emacs window,
 703 positioned in such a way that the string is near the top of this other
 704 window.  If the string is too big to fit whole in this window, it is
 705 positioned so only its end is shown.  In any case, the cursor
 706 is left in the PO file window.  If the shown string would be better
 707 presented differently in different native languages, you may mark it
 708 using <KBD>M-,</KBD> or <KBD>M-.</KBD>.  Otherwise, you might rather ignore it
 709 and skip to the next string by merely repeating the <KBD>,</KBD> command.
 710
 711 </P>
 712 <P>
 713 A string is a good candidate for translation if it contains a sequence
 714 of three or more letters.  A string containing at most two letters in
 715 a row will be considered as a candidate if it has more letters than
 716 non-letters.  The command disregards strings containing no letters,
 717 or isolated letters only.  It also disregards strings within comments,
 718 or strings already marked with some keyword PO mode knows (see below).
 719
 720 </P>
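<P>
The candidate rule described above can be rendered roughly in C as
follows.  This is an illustration of the stated rule only, not PO mode's
own code; the function name <CODE>looks_translatable</CODE> is hypothetical,
and the handling of comments and already marked strings is left out.
</P>

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Rough C rendering of the candidate rule stated above; an
   illustration only, not PO mode's own code. */
bool
looks_translatable (const char *s)
{
  size_t letters = 0, others = 0, run = 0, longest = 0;

  for (; *s != '\0'; s++)
    {
      if (isalpha ((unsigned char) *s))
        {
          letters++;
          if (++run > longest)
            longest = run;
        }
      else
        {
          others++;
          run = 0;
        }
    }

  if (longest >= 3)
    return true;                 /* three or more letters in a row */
  if (longest == 2)
    return letters > others;     /* short runs: letters must dominate */
  return false;                  /* no letters, or isolated letters only */
}
```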
 721 <P>
 722 If you have never told Emacs about some <TT>&lsquo;TAGS&rsquo;</TT> file to use, the
 723 command will request that you specify one from the minibuffer, the
 724 first time you use the command.  You may later change your <TT>&lsquo;TAGS&rsquo;</TT>
 725 file by using the regular Emacs command <KBD>M-x visit-tags-table</KBD>,
 726 which will ask you to name the precise <TT>&lsquo;TAGS&rsquo;</TT> file you want
 727 to use.  See section ‘Tag Tables’ in <CITE>The Emacs Editor</CITE>.
 728
 729 </P>
 730 <P>
 731 Each time you use the <KBD>,</KBD> command, the search resumes from where it was
 732 left by the previous search, and goes through all program sources,
 733 obeying the <TT>&lsquo;TAGS&rsquo;</TT> file, until all sources have been processed.
 734 However, by giving a prefix argument to the command (<KBD>C-u
 735 ,</KBD>), you may request that the search be restarted all over again
 736 from the first program source; but in this case, strings that you
 737 recently marked as translatable will be automatically skipped.
 738
 739 </P>
 740 <P>
 741 Using this <KBD>,</KBD> command does not prevent the use of other regular
 742 Emacs tags commands.  For example, regular <CODE>tags-search</CODE> or
 743 <CODE>tags-query-replace</CODE> commands may be used without disrupting the
 744 independent <KBD>,</KBD> search sequence.  However, as implemented, the
 745 <EM>initial</EM> <KBD>,</KBD> command (or the <KBD>,</KBD> command used with a
 746 prefix) might also reinitialize the regular Emacs tags searching to the
 747 first tags file; this reinitialization might be considered spurious.
 748
 749 </P>
 750 <P>
 751 <A NAME="IDX158"></A>
 752 <A NAME="IDX159"></A>
 753 The <KBD>M-,</KBD> (<CODE>po-mark-translatable</CODE>) command will mark the
 754 recently found string with the <SAMP>&lsquo;_&rsquo;</SAMP> keyword.  The <KBD>M-.</KBD>
 755 (<CODE>po-select-mark-and-mark</CODE>) command will request that you type
 756 one keyword from the minibuffer and use that keyword for marking
 757 the string.  Both commands will automatically create a new PO file
 758 untranslated entry for the string being marked, and make it the
 759 current entry (making it easy for you to immediately proceed to its
 760 translation, if you feel like doing it right away).  It is possible
 761 that the modifications made to the program source by <KBD>M-,</KBD> or
 762 <KBD>M-.</KBD> render some source line longer than 80 columns, forcing you
 763 to break and re-indent this line differently.  You may use the <KBD>O</KBD>
 764 command from PO mode, or any other window changing command from
 765 Emacs, to break out into the program source window, and do any
 766 needed adjustments.  You will have to use some regular Emacs command
 767 to return the cursor to the PO file window, if you want command
 768 <KBD>,</KBD> for the next string, say.
 769
 770 </P>
 771 <P>
 772 The <KBD>M-.</KBD> command has a few built-in speedups, so you do not
 773 have to explicitly type all keywords all the time.  The first such
 774 speedup is that you are presented with a <EM>preferred</EM> keyword,
 775 which you may accept by merely typing <KBD><KBD>RET</KBD></KBD> at the prompt.
 776 The second speedup is that you may type any non-ambiguous prefix of the
 777 keyword you really mean, and the command will complete it automatically
 778 for you.  This also means that PO mode has to <EM>know</EM> all
 779 your possible keywords, and that it will not accept mistyped keywords.
 780
 781 </P>
 782 <P>
 783 If you reply <KBD>?</KBD> to the keyword request, the command gives a
 784 list of all known keywords, from which you may choose.  When the
 785 command is prefixed by an argument (<KBD>C-u M-.</KBD>), it inhibits
 786 updating any program source or PO file buffer, and does some simple
 787 keyword management instead.  In this case, the command asks for a
 788 keyword, written in full, which becomes a new allowed keyword for
 789 later <KBD>M-.</KBD> commands.  Moreover, this new keyword automatically
 790 becomes the <EM>preferred</EM> keyword for later commands.  By typing
 791 an already known keyword in response to <KBD>C-u M-.</KBD>, one merely
 792 changes the <EM>preferred</EM> keyword and does nothing more.
 793
 794 </P>
 795 <P>
 796 All keywords known for <KBD>M-.</KBD> are recognized by the <KBD>,</KBD> command
 797 when scanning for strings, and strings already marked by any of those
 798 known keywords are automatically skipped.  If many PO files are opened
 799 simultaneously, each one has its own independent set of known keywords.
PO mode currently has no provision for deleting a known keyword; you
have to quit the file (maybe using <KBD>q</KBD>) and reopen
it afresh.  When a PO file is newly brought up in an Emacs window, only
<SAMP>&lsquo;gettext&rsquo;</SAMP> and <SAMP>&lsquo;_&rsquo;</SAMP> are known as keywords, and <SAMP>&lsquo;gettext&rsquo;</SAMP>
is preferred for the <KBD>M-.</KBD> command.  In fact, it would not be useful to
prefer <SAMP>&lsquo;_&rsquo;</SAMP>, as this one is already built into the <KBD>M-,</KBD> command.
 806
 807 </P>
 808
 809
 810 <H2><A NAME="SEC22" HREF="gettext_toc.html#TOC22">4.6  Special Comments preceding Keywords</A></H2>
 811
 812 <P>
 813 <A NAME="IDX160"></A>
 814 In C programs strings are often used within calls of functions from the
 815 <CODE>printf</CODE> family.  The special thing about these format strings is
 816 that they can contain format specifiers introduced with <KBD>%</KBD>.  Assume
 817 we have the code
 818
 819 </P>
 820
 821 <PRE>
printf (gettext ("String `%s' has %d characters\n"), s, (int) strlen (s));
 823 </PRE>
 824
 825 <P>
 826 A possible German translation for the above string might be:
 827
 828 </P>
 829
 830 <PRE>
 831 "%d Zeichen lang ist die Zeichenkette `%s'"
 832 </PRE>
 833
 834 <P>
A C programmer, even one who cannot speak German, will recognize that
there is something wrong here.  The order of the two format specifiers
has changed, but of course the order of the arguments in the
<CODE>printf</CODE> call has not.  This will most probably lead to problems,
because the length of the string is now interpreted as an address.
 840
 841 </P>
 842 <P>
 843 To prevent errors at runtime caused by translations, the <CODE>msgfmt</CODE>
 844 tool can check statically whether the arguments in the original and the
 845 translation string match in type and number.  If this is not the case
 846 and the <SAMP>&lsquo;-c&rsquo;</SAMP> option has been passed to <CODE>msgfmt</CODE>, <CODE>msgfmt</CODE>
 847 will give an error and refuse to produce a MO file.  Thus consistent
 848 use of <SAMP>&lsquo;msgfmt -c&rsquo;</SAMP> will catch the error, so that it cannot cause
 849 problems at runtime.
 850
 851 </P>
 852 <P>
If the word order of the above German translation is to be kept, one
has to write
 855
 856 </P>
 857
 858 <PRE>
 859 "%2$d Zeichen lang ist die Zeichenkette `%1$s'"
 860 </PRE>
 861
 862 <P>
 863 The routines in <CODE>msgfmt</CODE> know about this special notation.
 864
 865 </P>
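<P>
The effect of the numbered notation can be tried out with a small sketch
like the following.  It is only an illustration: the function name
<CODE>format_length_message</CODE> is hypothetical, the translated string
is hard-coded where a real program would call <CODE>gettext</CODE>, and the
<CODE>%1$s</CODE> style of directive is assumed to be supported by the C
library (it is specified by POSIX, not by ISO C).
</P>

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the message into BUF.  The arguments
   are always passed in the order of the original English string
   (string first, length second); the numbered directives let the
   translation use them in a different order.  */
int
format_length_message (char *buf, size_t size, const char *s)
{
  /* Stands in for the translation that gettext would return.  */
  const char *translated = "%2$d Zeichen lang ist die Zeichenkette `%1$s'";

  return snprintf (buf, size, translated, s, (int) strlen (s));
}
```

<P>
Because each directive names the argument it consumes, the call site
does not need to change when a translator reorders the words.
</P>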
 866 <P>
Because not all strings in a program are format strings, it is not
useful for <CODE>msgfmt</CODE> to test all the strings in the <TT>&lsquo;.po&rsquo;</TT> file.
Doing so could cause problems, because a string might contain what looks
like a format specifier without ever being passed to <CODE>printf</CODE>.
 871
 872 </P>
 873 <P>
 874 Therefore <CODE>xgettext</CODE> adds a special tag to those messages it
 875 thinks might be a format string.  There is no absolute rule for this,
 876 only a heuristic.  In the <TT>&lsquo;.po&rsquo;</TT> file the entry is marked using the
 877 <CODE>c-format</CODE> flag in the <CODE>#,</CODE> comment line (see section <A HREF="gettext_3.html#SEC15">3  The Format of PO Files</A>).
 878
 879 </P>
 880 <P>
 881 <A NAME="IDX161"></A>
 882 <A NAME="IDX162"></A>
The careful reader might now object that this, too, can cause problems:
the heuristic might guess wrong.  This is true, and therefore
<CODE>xgettext</CODE> knows about a special kind of comment which lets
the programmer take over the decision.  If the <CODE>xgettext</CODE>
program finds a comment containing the words
<CODE>xgettext:c-format</CODE> on the same line as the <CODE>gettext</CODE>
keyword or on the immediately preceding line, it will mark the string
with the <CODE>c-format</CODE> flag in any case.  This kind of comment
should be used when <CODE>xgettext</CODE> does not recognize the string as
a format string although it really is one and should be checked.  Please
note that when the comment is on the same line as the <CODE>gettext</CODE>
keyword, it must come before the string to be translated.
 895
 896 </P>
 897 <P>
 898 This situation happens quite often.  The <CODE>printf</CODE> function is often
 899 called with strings which do not contain a format specifier.  Of course
 900 one would normally use <CODE>fputs</CODE> but it does happen.  In this case
 901 <CODE>xgettext</CODE> does not recognize this as a format string but what
 902 happens if the translation introduces a valid format specifier?  The
 903 <CODE>printf</CODE> function will try to access one of the parameters but none
 904 exists because the original code does not pass any parameters.
 905
 906 </P>
 907 <P>
Of course <CODE>xgettext</CODE> could also make the wrong decision the other
way round, i.e. mark a string as a format string although it actually is
not one.  In this case <CODE>msgfmt</CODE> might give spurious diagnostics and
hinder the translation of the <TT>&lsquo;.po&rsquo;</TT> file.  The method to prevent
this wrong decision is similar to the one used above, only the comment
to use must contain the string <CODE>xgettext:no-c-format</CODE>.
 914
 915 </P>
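<P>
As a rough illustration of both comments, consider the sketch below.
The function names and message texts are hypothetical and chosen only
for this example; the sketch assumes that the callers pass the first
string to <CODE>printf</CODE> as a format and write the second one with
<CODE>fputs</CODE>.
</P>

```c
#include <libintl.h>

/* This string contains no '%', so the heuristic would not flag it,
   yet the caller is assumed to pass it to printf as the format.  */
const char *
ready_message (void)
{
  /* xgettext:c-format */
  return gettext ("Ready.\n");
}

/* Here "% c" looks like a format directive, but the caller is assumed
   to write the string with fputs, so it must not be checked as one.  */
const char *
progress_message (void)
{
  /* xgettext:no-c-format */
  return gettext ("Progress: 100% complete\n");
}
```

<P>
In each function the special comment sits on the line immediately
preceding the <CODE>gettext</CODE> keyword, which is one of the two
placements described above.
</P>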
 916 <P>
 917 If a string is marked with <CODE>c-format</CODE> and this is not correct the
 918 user can find out who is responsible for the decision.  See
 919 section <A HREF="gettext_5.html#SEC28">5.1  Invoking the <CODE>xgettext</CODE> Program</A> to see how the <CODE>--debug</CODE> option can be
 920 used for solving this problem.
 921
 922 </P>
 923
 924
 925 <H2><A NAME="SEC23" HREF="gettext_toc.html#TOC23">4.7  Special Cases of Translatable Strings</A></H2>
 926
 927 <P>
 928 <A NAME="IDX163"></A>
The attentive reader might now point out that it is not always possible
to mark translatable strings with <CODE>gettext</CODE> or a similar keyword.
Consider the following case:
 932
 933 </P>
 934
 935 <PRE>
 936 {
 937   static const char *messages[] = {
 938     "some very meaningful message",
 939     "and another one"
 940   };
 941   const char *string;
 942   ...
 943   string
 944     = index &#62; 1 ? "a default message" : messages[index];
 945
  fputs (string, stdout);
 947   ...
 948 }
 949 </PRE>
 950
 951 <P>
While it is no problem to mark the string <CODE>"a default message"</CODE>, it
is not possible to mark the string initializers for <CODE>messages</CODE>.
What is to be done?  We have to fulfill two tasks.  First we have to mark the
strings so that the <CODE>xgettext</CODE> program (see section <A HREF="gettext_5.html#SEC28">5.1  Invoking the <CODE>xgettext</CODE> Program</A>)
can find them, and second we have to translate the strings at runtime
before printing them.
 958
 959 </P>
 960 <P>
 961 The first task can be fulfilled by creating a new keyword, which names a
 962 no-op.  For the second we have to mark all access points to a string
 963 from the array.  So one solution can look like this:
 964
 965 </P>
 966
 967 <PRE>
 968 #define gettext_noop(String) String
 969
 970 {
 971   static const char *messages[] = {
 972     gettext_noop ("some very meaningful message"),
 973     gettext_noop ("and another one")
 974   };
 975   const char *string;
 976   ...
 977   string
 978     = index &#62; 1 ? gettext ("a default message") : gettext (messages[index]);
 979
  fputs (string, stdout);
 981   ...
 982 }
 983 </PRE>
 984
 985 <P>
Please convince yourself that the string which is written by
<CODE>fputs</CODE> is translated in any case.  How to make <CODE>xgettext</CODE> recognize
the additional keyword <CODE>gettext_noop</CODE> is explained in section <A HREF="gettext_5.html#SEC28">5.1  Invoking the <CODE>xgettext</CODE> Program</A>.
 989
 990 </P>
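<P>
The same technique carries over to other statically initialized data,
such as a lookup table of structures.  The following is a minimal
sketch under that assumption; <CODE>status_entry</CODE>,
<CODE>status_table</CODE>, <CODE>status_text</CODE> and the message texts
are hypothetical names introduced only for this illustration.
</P>

```c
#include <libintl.h>
#include <stddef.h>

#define gettext_noop(String) String

struct status_entry
{
  int code;
  const char *text;
};

/* The initializers cannot call gettext, so the strings are only
   marked for extraction here.  */
static const struct status_entry status_table[] = {
  { 0, gettext_noop ("operation succeeded") },
  { 1, gettext_noop ("file not found") }
};

/* Every return path translates its string exactly once.  */
const char *
status_text (int code)
{
  size_t i;

  for (i = 0; i < sizeof status_table / sizeof status_table[0]; i++)
    if (status_table[i].code == code)
      return gettext (status_table[i].text);
  return gettext ("unknown status");
}
```

<P>
Keeping the call to <CODE>gettext</CODE> in the single accessor function
means no other code needs to know that the table holds untranslated
strings.
</P>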
 991 <P>
The above is of course not the only solution.  You could also use
the following one:
 994
 995 </P>
 996
 997 <PRE>
 998 #define gettext_noop(String) String
 999
1000 {
1001   static const char *messages[] = {
1002     gettext_noop ("some very meaningful message"),
1003     gettext_noop ("and another one")
1004   };
1005   const char *string;
1006   ...
1007   string
1008     = index &#62; 1 ? gettext_noop ("a default message") : messages[index];
1009
  fputs (gettext (string), stdout);
1011   ...
1012 }
1013 </PRE>
1014
1015 <P>
But this has a drawback.  The programmer has to take care to use
<CODE>gettext_noop</CODE> for the string <CODE>"a default message"</CODE>.
Using <CODE>gettext</CODE> there would pass the string through
<CODE>gettext</CODE> twice, which could in rare cases have unpredictable results.
1019
1020 </P>
1021 <P>
One advantage of this second method is that you need not perform a
control flow analysis to make sure the output is really translated in
every case.  Such an analysis is generally not very difficult, however.
Should it become difficult in some situation, you can use the second
method there.
1026
1027 </P>
1028
1029
1030 <H2><A NAME="SEC24" HREF="gettext_toc.html#TOC24">4.8  Letting Users Report Translation Bugs</A></H2>
1031
1032 <P>
1033 Code sometimes has bugs, but translations sometimes have bugs too.  The
1034 users need to be able to report them.  Reporting translation bugs to the
1035 programmer or maintainer of a package is not very useful, since the
1036 maintainer must never change a translation, except on behalf of the
1037 translator.  Hence the translation bugs must be reported to the
1038 translators.
1039
1040 </P>
1041 <P>
1042 Here is a way to organize this so that the maintainer does not need to
1043 forward translation bug reports, nor even keep a list of the addresses of
1044 the translators or their translation teams.
1045
1046 </P>
1047 <P>
Every program has a place where it shows the bug report address.  For
1049 GNU programs, it is the code which handles the “--help” option,
1050 typically in a function called “usage”.  In this place, instruct the
1051 translator to add her own bug reporting address.  For example, if that
1052 code has a statement
1053
1054 </P>
1055
1056 <PRE>
1057 printf (_("Report bugs to &#60;%s&#62;.\n"), PACKAGE_BUGREPORT);
1058 </PRE>
1059
1060 <P>
1061 you can add some translator instructions like this:
1062
1063 </P>
1064
1065 <PRE>
1066 /* TRANSLATORS: The placeholder indicates the bug-reporting address
1067    for this package.  Please add _another line_ saying
1068    "Report translation bugs to &#60;...&#62;\n" with the address for translation
1069    bugs (typically your translation team's web or email address).  */
1070 printf (_("Report bugs to &#60;%s&#62;.\n"), PACKAGE_BUGREPORT);
1071 </PRE>
1072
1073 <P>
1074 These will be extracted by <SAMP>&lsquo;xgettext&rsquo;</SAMP>, leading to a .pot file that
1075 contains this:
1076
1077 </P>
1078
1079 <PRE>
1080 #. TRANSLATORS: The placeholder indicates the bug-reporting address
1081 #. for this package.  Please add _another line_ saying
1082 #. "Report translation bugs to &#60;...&#62;\n" with the address for translation
1083 #. bugs (typically your translation team's web or email address).
1084 #: src/hello.c:178
1085 #, c-format
1086 msgid "Report bugs to &#60;%s&#62;.\n"
1087 msgstr ""
1088 </PRE>
1089
1090
1091
1092 <H2><A NAME="SEC25" HREF="gettext_toc.html#TOC25">4.9  Marking Proper Names for Translation</A></H2>
1093
1094 <P>
1095 Should names of persons, cities, locations etc. be marked for translation
1096 or not?  People who only know languages that can be written with Latin
1097 letters (English, Spanish, French, German, etc.) are tempted to say “no”,
1098 because names usually do not change when transported between these languages.
1099 However, in general when translating from one script to another, names
1100 are translated too, usually phonetically or by transliteration.  For
1101 example, Russian or Greek names are converted to the Latin alphabet when
1102 being translated to English, and English or French names are converted
1103 to the Katakana script when being translated to Japanese.  This is
1104 necessary because the speakers of the target language in general cannot
1105 read the script the name is originally written in.
1106
1107 </P>
1108 <P>
1109 As a programmer, you should therefore make sure that names are marked
1110 for translation, with a special comment telling the translators that it
1111 is a proper name and how to pronounce it.  In its simple form, it looks
1112 like this:
1113
1114 </P>
1115
1116 <PRE>
1117 printf (_("Written by %s.\n"),
1118         /* TRANSLATORS: This is a proper name.  See the gettext
1119            manual, section Names.  Note this is actually a non-ASCII
1120            name: The first name is (with Unicode escapes)
1121            "Fran\u00e7ois" or (with HTML entities) "Fran&#38;ccedil;ois".
1122            Pronunciation is like "fraa-swa pee-nar".  */
1123         _("Francois Pinard"));
1124 </PRE>
1125
1126 <P>
1127 The GNU gnulib library offers a module <SAMP>&lsquo;propername&rsquo;</SAMP>
1128 (<A HREF="http://www.gnu.org/software/gnulib/MODULES.html#module=propername">http://www.gnu.org/software/gnulib/MODULES.html#module=propername</A>)
1129 which takes care to automatically append the original name, in parentheses,
1130 to the translated name.  For names that cannot be written in ASCII, it
1131 also frees the translator from the task of entering the appropriate non-ASCII
1132 characters if no script change is needed.  In this more comfortable form,
1133 it looks like this:
1134
1135 </P>
1136
1137 <PRE>
1138 printf (_("Written by %s and %s.\n"),
1139         proper_name ("Ulrich Drepper"),
1140         /* TRANSLATORS: This is a proper name.  See the gettext
1141            manual, section Names.  Note this is actually a non-ASCII
1142            name: The first name is (with Unicode escapes)
1143            "Fran\u00e7ois" or (with HTML entities) "Fran&#38;ccedil;ois".
1144            Pronunciation is like "fraa-swa pee-nar".  */
1145         proper_name_utf8 ("Francois Pinard", "Fran\303\247ois Pinard"));
1146 </PRE>
1147
1148 <P>
1149 You can also write the original name directly in Unicode (rather than with
1150 Unicode escapes or HTML entities) and denote the pronunciation using the
1151 International Phonetic Alphabet (see
1152 <A HREF="http://www.wikipedia.org/wiki/International_Phonetic_Alphabet">http://www.wikipedia.org/wiki/International_Phonetic_Alphabet</A>).
1153
1154 </P>
1155 <P>
1156 As a translator, you should use some care when translating names, because
1157 it is frustrating if people see their names mutilated or distorted.
1158
1159 </P>
1160 <P>
1161 If your language uses the Latin script, all you need to do is to reproduce
1162 the name as perfectly as you can within the usual character set of your
language.  In this particular case, this means providing a translation
1164 containing the c-cedilla character.  If your language uses a different
1165 script and the people speaking it don't usually read Latin words, it means
1166 transliteration.  If the programmer used the simple case, you should still
1167 give, in parentheses, the original writing of the name -- for the sake of
1168 the people that do read the Latin script.  If the programmer used the
1169 <SAMP>&lsquo;propername&rsquo;</SAMP> module mentioned above, you don't need to give the original
1170 writing of the name in parentheses, because the program will already do so.
1171 Here is an example, using Greek as the target script:
1172
1173 </P>
1174
1175 <PRE>
1176 #. This is a proper name.  See the gettext
1177 #. manual, section Names.  Note this is actually a non-ASCII
1178 #. name: The first name is (with Unicode escapes)
1179 #. "Fran\u00e7ois" or (with HTML entities) "Fran&#38;ccedil;ois".
1180 #. Pronunciation is like "fraa-swa pee-nar".
1181 msgid "Francois Pinard"
msgstr "φρασοα πιναρ"
1183        " (Francois Pinard)"
1184 </PRE>
1185
1186 <P>
1187 Because translation of names is such a sensitive domain, it is a good
1188 idea to test your translation before submitting it.
1189
1190 </P>
1191
1192
1193 <H2><A NAME="SEC26" HREF="gettext_toc.html#TOC26">4.10  Preparing Library Sources</A></H2>
1194
1195 <P>
1196 When you are preparing a library, not a program, for the use of
1197 <CODE>gettext</CODE>, only a few details are different.  Here we assume that
1198 the library has a translation domain and a POT file of its own.  (If
1199 it uses the translation domain and POT file of the main program, then
1200 the previous sections apply without changes.)
1201
1202 </P>
1203
1204 <OL>
1205 <LI>
1206
1207 The library code doesn't call <CODE>setlocale (LC_ALL, "")</CODE>.  It's the
1208 responsibility of the main program to set the locale.  The library's
1209 documentation should mention this fact, so that developers of programs
1210 using the library are aware of it.
1211
1212 <LI>
1213
1214 The library code doesn't call <CODE>textdomain (PACKAGE)</CODE>, because it
1215 would interfere with the text domain set by the main program.
1216
1217 <LI>
1218
1219 The initialization code for a program was
1220
1221
1222 <PRE>
1223   setlocale (LC_ALL, "");
1224   bindtextdomain (PACKAGE, LOCALEDIR);
1225   textdomain (PACKAGE);
1226 </PRE>
1227
1228 For a library it is reduced to
1229
1230
1231 <PRE>
1232   bindtextdomain (PACKAGE, LOCALEDIR);
1233 </PRE>
1234
1235 If your library's API doesn't already have an initialization function,
1236 you need to create one, containing at least the <CODE>bindtextdomain</CODE>
1237 invocation.  However, you usually don't need to export and document this
1238 initialization function: It is sufficient that all entry points of the
1239 library call the initialization function if it hasn't been called before.
The typical idiom used to achieve this is a static boolean variable that
indicates whether the initialization function has been called, like this:
1242
1243
1244 <PRE>
#include &#60;stdbool.h&#62;
#include &#60;libintl.h&#62;

static bool libfoo_initialized;
1246
1247 static void
1248 libfoo_initialize (void)
1249 {
1250   bindtextdomain (PACKAGE, LOCALEDIR);
1251   libfoo_initialized = true;
1252 }
1253
1254 /* This function is part of the exported API.  */
1255 struct foo *
1256 create_foo (...)
1257 {
1258   /* Must ensure the initialization is performed.  */
1259   if (!libfoo_initialized)
1260     libfoo_initialize ();
1261   ...
1262 }
1263
1264 /* This function is part of the exported API.  The argument must be
1265    non-NULL and have been created through create_foo().  */
1266 int
1267 foo_refcount (struct foo *argument)
1268 {
1269   /* No need to invoke the initialization function here, because
1270      create_foo() must already have been called before.  */
1271   ...
1272 }
1273 </PRE>
1274
1275 <LI>
1276
1277 The usual declaration of the <SAMP>&lsquo;_&rsquo;</SAMP> macro in each source file was
1278
1279
1280 <PRE>
1281 #include &#60;libintl.h&#62;
1282 #define _(String) gettext (String)
1283 </PRE>
1284
1285 for a program.  For a library, which has its own translation domain,
1286 it reads like this:
1287
1288
1289 <PRE>
1290 #include &#60;libintl.h&#62;
1291 #define _(String) dgettext (PACKAGE, String)
1292 </PRE>
1293
1294 In other words, <CODE>dgettext</CODE> is used instead of <CODE>gettext</CODE>.
1295 Similarly, the <CODE>dngettext</CODE> function should be used in place of the
1296 <CODE>ngettext</CODE> function.
1297 </OL>
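<P>
Put together, the points above might look like the following sketch of
a single library source file.  It is not a definitive implementation:
the fallback values for <CODE>PACKAGE</CODE> and <CODE>LOCALEDIR</CODE>, the
function <CODE>foo_describe_error</CODE> and its messages are assumptions
made only for this illustration, since a real build system would
normally supply the two macros.
</P>

```c
#include <libintl.h>
#include <stdbool.h>

/* Assumed fallback values; normally defined by the build system.  */
#ifndef PACKAGE
# define PACKAGE "libfoo"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/share/locale"
#endif

/* Look up messages in the library's own domain, not the program's.  */
#define _(String) dgettext (PACKAGE, String)

static bool libfoo_initialized;

/* Neither setlocale nor textdomain is called here; both are left
   to the main program.  */
static void
libfoo_initialize (void)
{
  bindtextdomain (PACKAGE, LOCALEDIR);
  libfoo_initialized = true;
}

/* Hypothetical exported entry point.  */
const char *
foo_describe_error (int code)
{
  if (!libfoo_initialized)
    libfoo_initialize ();
  return code == 0 ? _("no error") : _("unknown error");
}
```

<P>
Note that the plain boolean flag in this sketch is not synchronized
between threads; a library used from several threads would need its own
provision for that.
</P>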
1298
1301 </BODY>
1302 </HTML>