3 <!-- This HTML file has been created by texi2html 1.52b
4 from gettext.texi on 11 June 2016 -->
6 <META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
7 <TITLE>GNU gettext utilities - 6 Creating a New PO File</TITLE>
14 <H1><A NAME="SEC37" HREF="gettext_toc.html#TOC37">6 Creating a New PO File</A></H1>
20 When starting a new translation, the translator creates a file called
21 <TT>‘<VAR>LANG</VAR>.po’</TT>, as a copy of the <TT>‘<VAR>package</VAR>.pot’</TT> template
22 file with modifications in the initial comments (at the beginning of the file)
23 and in the header entry (the first entry, near the beginning of the file).
27 The easiest way to do so is by use of the <SAMP>‘msginit’</SAMP> program.
33 $ cd <VAR>PACKAGE</VAR>-<VAR>VERSION</VAR>
39 The alternative way is to do the copy and modifications by hand.
40 To do so, the translator copies <TT>‘<VAR>package</VAR>.pot’</TT> to
41 <TT>‘<VAR>LANG</VAR>.po’</TT>. Then she modifies the initial comments and
42 the header entry of this file.
48 <H2><A NAME="SEC38" HREF="gettext_toc.html#TOC38">6.1 Invoking the <CODE>msginit</CODE> Program</A></H2>
55 msginit [<VAR>option</VAR>]
61 The <CODE>msginit</CODE> program creates a new PO file, initializing the meta
62 information with values from the user's environment.
66 Here are more details. The following header fields of a PO file are
67 automatically filled, when possible.
72 <DT><SAMP>‘Project-Id-Version’</SAMP>
74 The value is guessed from the <CODE>configure</CODE> script or any other files
75 in the current directory.
77 <DT><SAMP>‘PO-Revision-Date’</SAMP>
The value is taken from the <CODE>POT-Creation-Date</CODE> field in the input POT
file, or the current date is used.
82 <DT><SAMP>‘Last-Translator’</SAMP>
The value is taken from the user's password file entry and the mailer
87 <DT><SAMP>‘Language-Team, Language’</SAMP>
89 These values are set according to the current locale and the predefined
90 list of translation teams.
92 <DT><SAMP>‘MIME-Version, Content-Type, Content-Transfer-Encoding’</SAMP>
94 These values are set according to the content of the POT file and the
95 current locale. If the POT file contains charset=UTF-8, it means that
96 the POT file contains non-ASCII characters, and we keep the UTF-8
97 encoding. Otherwise, when the POT file is plain ASCII, we use the
100 <DT><SAMP>‘Plural-Forms’</SAMP>
The value is first looked up in the embedded table.
104 As an experimental feature, you can instruct <CODE>msginit</CODE> to use the
105 information from Unicode CLDR, by setting the <CODE>GETTEXTCLDRDIR</CODE>
106 environment variable.
112 <H3><A NAME="SEC39" HREF="gettext_toc.html#TOC39">6.1.1 Input file location</A></H3>
116 <DT><SAMP>‘-i <VAR>inputfile</VAR>’</SAMP>
118 <DT><SAMP>‘--input=<VAR>inputfile</VAR>’</SAMP>
120 <A NAME="IDX255"></A>
121 <A NAME="IDX256"></A>
127 If no <VAR>inputfile</VAR> is given, the current directory is searched for the
128 POT file. If it is <SAMP>‘-’</SAMP>, standard input is read.
133 <H3><A NAME="SEC40" HREF="gettext_toc.html#TOC40">6.1.2 Output file location</A></H3>
137 <DT><SAMP>‘-o <VAR>file</VAR>’</SAMP>
139 <DT><SAMP>‘--output-file=<VAR>file</VAR>’</SAMP>
141 <A NAME="IDX257"></A>
142 <A NAME="IDX258"></A>
143 Write output to specified PO file.
If no output file is given, the output file name depends on the <SAMP>‘--locale’</SAMP> option or the
user's locale setting. If it is <SAMP>‘-’</SAMP>, the results are written to
standard output.
155 <H3><A NAME="SEC41" HREF="gettext_toc.html#TOC41">6.1.3 Input file syntax</A></H3>
159 <DT><SAMP>‘-P’</SAMP>
161 <DT><SAMP>‘--properties-input’</SAMP>
163 <A NAME="IDX259"></A>
164 <A NAME="IDX260"></A>
165 Assume the input file is a Java ResourceBundle in Java <CODE>.properties</CODE>
166 syntax, not in PO file syntax.
168 <DT><SAMP>‘--stringtable-input’</SAMP>
170 <A NAME="IDX261"></A>
171 Assume the input file is a NeXTstep/GNUstep localized resource file in
172 <CODE>.strings</CODE> syntax, not in PO file syntax.
178 <H3><A NAME="SEC42" HREF="gettext_toc.html#TOC42">6.1.4 Output details</A></H3>
182 <DT><SAMP>‘-l <VAR>ll_CC</VAR>’</SAMP>
184 <DT><SAMP>‘--locale=<VAR>ll_CC</VAR>’</SAMP>
186 <A NAME="IDX262"></A>
187 <A NAME="IDX263"></A>
188 Set target locale. <VAR>ll</VAR> should be a language code, and <VAR>CC</VAR> should
189 be a country code. The command <SAMP>‘locale -a’</SAMP> can be used to output a list
190 of all installed locales. The default is the user's locale setting.
192 <DT><SAMP>‘--no-translator’</SAMP>
194 <A NAME="IDX264"></A>
195 Declares that the PO file will not have a human translator and is instead
196 automatically generated.
198 <DT><SAMP>‘--color’</SAMP>
200 <DT><SAMP>‘--color=<VAR>when</VAR>’</SAMP>
202 <A NAME="IDX265"></A>
203 Specify whether or when to use colors and other text attributes.
204 See section <A HREF="gettext_9.html#SEC150">9.11.1 The <CODE>--color</CODE> option</A> for details.
206 <DT><SAMP>‘--style=<VAR>style_file</VAR>’</SAMP>
208 <A NAME="IDX266"></A>
209 Specify the CSS style rule file to use for <CODE>--color</CODE>.
210 See section <A HREF="gettext_9.html#SEC152">9.11.3 The <CODE>--style</CODE> option</A> for details.
212 <DT><SAMP>‘-p’</SAMP>
214 <DT><SAMP>‘--properties-output’</SAMP>
216 <A NAME="IDX267"></A>
217 <A NAME="IDX268"></A>
218 Write out a Java ResourceBundle in Java <CODE>.properties</CODE> syntax. Note
219 that this file format doesn't support plural forms and silently drops
222 <DT><SAMP>‘--stringtable-output’</SAMP>
224 <A NAME="IDX269"></A>
225 Write out a NeXTstep/GNUstep localized resource file in <CODE>.strings</CODE> syntax.
226 Note that this file format doesn't support plural forms.
228 <DT><SAMP>‘-w <VAR>number</VAR>’</SAMP>
230 <DT><SAMP>‘--width=<VAR>number</VAR>’</SAMP>
232 <A NAME="IDX270"></A>
233 <A NAME="IDX271"></A>
Set the output page width. Long strings in the output files will be
split across multiple lines in order to ensure that each line's width
(= number of screen columns) is less than or equal to the given <VAR>number</VAR>.
238 <DT><SAMP>‘--no-wrap’</SAMP>
240 <A NAME="IDX272"></A>
241 Do not break long message lines. Message lines whose width exceeds the
242 output page width will not be split into several lines. Only file reference
243 lines which are wider than the output page width will be split.
249 <H3><A NAME="SEC43" HREF="gettext_toc.html#TOC43">6.1.5 Informative output</A></H3>
253 <DT><SAMP>‘-h’</SAMP>
255 <DT><SAMP>‘--help’</SAMP>
257 <A NAME="IDX273"></A>
258 <A NAME="IDX274"></A>
259 Display this help and exit.
261 <DT><SAMP>‘-V’</SAMP>
263 <DT><SAMP>‘--version’</SAMP>
265 <A NAME="IDX275"></A>
266 <A NAME="IDX276"></A>
267 Output version information and exit.
273 <H2><A NAME="SEC44" HREF="gettext_toc.html#TOC44">6.2 Filling in the Header Entry</A></H2>
275 <A NAME="IDX277"></A>
The initial comments "SOME DESCRIPTIVE TITLE", "YEAR" and
"FIRST AUTHOR &lt;EMAIL@ADDRESS&gt;, YEAR" ought to be replaced by sensible
information. This can be done in any text editor; if you use Emacs
and it has switched to PO mode automatically (because it recognized
the file's suffix), you can leave PO mode by typing <KBD>M-x fundamental-mode</KBD>.
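<P>
The placeholder replacement described above can also be scripted. The
following is a minimal Python sketch, not part of gettext: the function name
<CODE>fill_initial_comments</CODE>, the sample template and the sample values
are all hypothetical. It touches only the leading comment lines, so a
placeholder such as <CODE>YEAR</CODE> inside the header entry is left alone.
</P>

```python
import re

# Placeholders named in the text above. The longest one comes first so that
# the bare "YEAR" alternative does not match inside the author placeholder.
_PLACEHOLDERS = re.compile(
    r"FIRST AUTHOR <EMAIL@ADDRESS>, YEAR|SOME DESCRIPTIVE TITLE|YEAR"
)


def fill_initial_comments(pot_text, title, year, author, email):
    """Replace the placeholders in the initial comments of a PO template.

    Hypothetical helper: only the leading lines that start with '#' are
    rewritten; everything from the first non-comment line on is untouched.
    """
    values = {
        "FIRST AUTHOR <EMAIL@ADDRESS>, YEAR": f"{author} <{email}>, {year}",
        "SOME DESCRIPTIVE TITLE": title,
        "YEAR": str(year),
    }
    lines = pot_text.split("\n")
    for i, line in enumerate(lines):
        if not line.startswith("#"):
            break  # the initial comments end here
        lines[i] = _PLACEHOLDERS.sub(lambda m: values[m.group(0)], line)
    return "\n".join(lines)


# Hypothetical, shortened template used only for illustration.
SAMPLE_POT = "\n".join([
    "# SOME DESCRIPTIVE TITLE.",
    "# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.",
    "#",
    'msgid ""',
    'msgstr ""',
    r'"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"',
])

if __name__ == "__main__":
    print(fill_initial_comments(
        SAMPLE_POT, "German translation of hello", 2016,
        "Jane Doe", "jane@example.org"))
```

<P>
The header entry itself is deliberately not modified by this sketch; its
fields are filled in as described below.
</P>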
287 Modifying the header entry can already be done using PO mode: in Emacs,
288 type <KBD>M-x po-mode RET</KBD> and then <KBD>RET</KBD> again to start editing the
289 entry. You should fill in the following fields.
294 <DT>Project-Id-Version
296 This is the name and version of the package. Fill it in if it has not
297 already been filled in by <CODE>xgettext</CODE>.
299 <DT>Report-Msgid-Bugs-To
301 This has already been filled in by <CODE>xgettext</CODE>. It contains an email
302 address or URL where you can report bugs in the untranslated strings:
306 <LI>Strings which are not entire sentences, see the maintainer guidelines
308 in section <A HREF="gettext_4.html#SEC19">4.3 Preparing Translatable Strings</A>.
309 <LI>Strings which use unclear terms or require additional context to be
312 <LI>Strings which make invalid assumptions about notation of date, time or
315 <LI>Pluralisation problems.
317 <LI>Incorrect English spelling.
319 <LI>Incorrect formatting.
323 <DT>POT-Creation-Date
325 This has already been filled in by <CODE>xgettext</CODE>.
329 You don't need to fill this in. It will be filled by the PO file editor
330 when you save the file.
334 Fill in your name and email address (without double quotes).
338 Fill in the English name of the language, and the email address or
339 homepage URL of the language team you are part of.
Before starting a translation, it is a good idea to get in touch with
your translation team, not only to make sure you don't duplicate work,
but also to coordinate difficult linguistic issues.
345 <A NAME="IDX278"></A>
346 In the Free Translation Project, each translation team has its own mailing
347 list. The up-to-date list of teams can be found at the Free Translation
348 Project's homepage, <A HREF="http://translationproject.org/">http://translationproject.org/</A>, in the "Teams"
353 Fill in the language code of the language. This can be in one of three
360 <SAMP>‘<VAR>ll</VAR>’</SAMP>, an ISO 639 two-letter language code (lowercase).
361 See section <A HREF="gettext_17.html#SEC331">A Language Codes</A> for the list of codes.
365 <SAMP>‘<VAR>ll</VAR>_<VAR>CC</VAR>’</SAMP>, where <SAMP>‘<VAR>ll</VAR>’</SAMP> is an ISO 639 two-letter
366 language code (lowercase) and <SAMP>‘<VAR>CC</VAR>’</SAMP> is an ISO 3166 two-letter
country code (uppercase). The country code specification is not redundant:
some languages have dialects in different countries. For example,
369 <SAMP>‘de_AT’</SAMP> is used for Austria, and <SAMP>‘pt_BR’</SAMP> for Brazil. The country
370 code serves to distinguish the dialects. See section <A HREF="gettext_17.html#SEC331">A Language Codes</A> and
371 section <A HREF="gettext_18.html#SEC334">B Country Codes</A> for the lists of codes.
375 <SAMP>‘<VAR>ll</VAR>_<VAR>CC</VAR>@<VAR>variant</VAR>’</SAMP>, where <SAMP>‘<VAR>ll</VAR>’</SAMP> is an
376 ISO 639 two-letter language code (lowercase), <SAMP>‘<VAR>CC</VAR>’</SAMP> is an
377 ISO 3166 two-letter country code (uppercase), and <SAMP>‘<VAR>variant</VAR>’</SAMP> is
378 a variant designator. The variant designator (lowercase) can be a script
379 designator, such as <SAMP>‘latin’</SAMP> or <SAMP>‘cyrillic’</SAMP>.
382 The naming convention <SAMP>‘<VAR>ll</VAR>_<VAR>CC</VAR>’</SAMP> is also the way locales are
383 named on systems based on GNU libc. But there are three important differences:
389 In this PO file field, but not in locale names, <SAMP>‘<VAR>ll</VAR>_<VAR>CC</VAR>’</SAMP>
390 combinations denoting a language's main dialect are abbreviated as
391 <SAMP>‘<VAR>ll</VAR>’</SAMP>. For example, <SAMP>‘de’</SAMP> is equivalent to <SAMP>‘de_DE’</SAMP>
392 (German as spoken in Germany), and <SAMP>‘pt’</SAMP> to <SAMP>‘pt_PT’</SAMP> (Portuguese as
393 spoken in Portugal) in this context.
397 In this PO file field, suffixes like <SAMP>‘.<VAR>encoding</VAR>’</SAMP> are not used.
401 In this PO file field, variant designators that are not relevant to message
402 translation, such as <SAMP>‘@euro’</SAMP>, are not used.
405 So, if your locale name is <SAMP>‘de_DE.UTF-8’</SAMP>, the language specification in
406 PO files is just <SAMP>‘de’</SAMP>.
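<P>
The three differences between locale names and this field can be sketched
in Python as follows. This is an illustration only, not how
<CODE>msginit</CODE> is implemented: the function name
<CODE>po_language</CODE> is hypothetical, and the two small tables are
assumptions that cover just the examples mentioned in this section.
</P>

```python
# Assumed, deliberately tiny table: language -> country of its main dialect.
# Only the two examples from the text are included.
MAIN_COUNTRY = {"de": "DE", "pt": "PT"}

# Assumed set of variant designators that are irrelevant to translation.
IRRELEVANT_VARIANTS = {"euro"}


def po_language(locale_name):
    """Derive a value for the Language header field from a locale name.

    Expects names of the form ll[_CC][.encoding][@variant].
    """
    base, _, variant = locale_name.partition("@")
    base = base.split(".", 1)[0]          # '.encoding' suffixes are not used
    language, _, country = base.partition("_")
    if country and MAIN_COUNTRY.get(language) == country:
        country = ""                      # main dialect is abbreviated to 'll'
    result = language
    if country:
        result += "_" + country
    if variant and variant not in IRRELEVANT_VARIANTS:
        result += "@" + variant           # keep only relevant variants
    return result


if __name__ == "__main__":
    for name in ("de_DE.UTF-8", "pt_BR.UTF-8", "de_AT@euro", "sr_RS.UTF-8@latin"):
        print(name, "->", po_language(name))
```

<P>
A real tool would need a complete table of main dialects; the one above
exists only to make the sketch runnable.
</P>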
410 <A NAME="IDX279"></A>
411 <A NAME="IDX280"></A>
412 Replace <SAMP>‘CHARSET’</SAMP> with the character encoding used for your language,
413 in your locale, or UTF-8. This field is needed for correct operation of the
414 <CODE>msgmerge</CODE> and <CODE>msgfmt</CODE> programs, as well as for users whose
415 locale's character encoding differs from yours (see section <A HREF="gettext_11.html#SEC188">11.2.4 How to specify the output character set <CODE>gettext</CODE> uses</A>).
417 <A NAME="IDX281"></A>
418 You get the character encoding of your locale by running the shell command
419 <SAMP>‘locale charmap’</SAMP>. If the result is <SAMP>‘C’</SAMP> or <SAMP>‘ANSI_X3.4-1968’</SAMP>,
420 which is equivalent to <SAMP>‘ASCII’</SAMP> (= <SAMP>‘US-ASCII’</SAMP>), it means that your
421 locale is not correctly configured. In this case, ask your translation
422 team which charset to use. <SAMP>‘ASCII’</SAMP> is not usable for any language
425 <A NAME="IDX282"></A>
426 Because the PO files must be portable to operating systems with less advanced
427 internationalization facilities, the character encodings that can be used
428 are limited to those supported by both GNU <CODE>libc</CODE> and GNU
429 <CODE>libiconv</CODE>. These are:
430 <CODE>ASCII</CODE>, <CODE>ISO-8859-1</CODE>, <CODE>ISO-8859-2</CODE>, <CODE>ISO-8859-3</CODE>,
431 <CODE>ISO-8859-4</CODE>, <CODE>ISO-8859-5</CODE>, <CODE>ISO-8859-6</CODE>, <CODE>ISO-8859-7</CODE>,
432 <CODE>ISO-8859-8</CODE>, <CODE>ISO-8859-9</CODE>, <CODE>ISO-8859-13</CODE>, <CODE>ISO-8859-14</CODE>,
433 <CODE>ISO-8859-15</CODE>,
434 <CODE>KOI8-R</CODE>, <CODE>KOI8-U</CODE>, <CODE>KOI8-T</CODE>,
435 <CODE>CP850</CODE>, <CODE>CP866</CODE>, <CODE>CP874</CODE>,
436 <CODE>CP932</CODE>, <CODE>CP949</CODE>, <CODE>CP950</CODE>, <CODE>CP1250</CODE>, <CODE>CP1251</CODE>,
437 <CODE>CP1252</CODE>, <CODE>CP1253</CODE>, <CODE>CP1254</CODE>, <CODE>CP1255</CODE>, <CODE>CP1256</CODE>,
438 <CODE>CP1257</CODE>, <CODE>GB2312</CODE>, <CODE>EUC-JP</CODE>, <CODE>EUC-KR</CODE>, <CODE>EUC-TW</CODE>,
439 <CODE>BIG5</CODE>, <CODE>BIG5-HKSCS</CODE>, <CODE>GBK</CODE>, <CODE>GB18030</CODE>, <CODE>SHIFT_JIS</CODE>,
440 <CODE>JOHAB</CODE>, <CODE>TIS-620</CODE>, <CODE>VISCII</CODE>, <CODE>GEORGIAN-PS</CODE>, <CODE>UTF-8</CODE>.
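<P>
A quick way to check a candidate charset name against the list above is
sketched below in Python. The names <CODE>PORTABLE_CHARSETS</CODE> and
<CODE>is_portable_charset</CODE> are hypothetical; the set is copied from
the list in this section. The comparison ignores case, since the encoding
name may be written in either upper or lower case.
</P>

```python
# Encodings listed in this section as usable in PO files.
PORTABLE_CHARSETS = frozenset([
    "ASCII",
    "ISO-8859-1", "ISO-8859-2", "ISO-8859-3", "ISO-8859-4", "ISO-8859-5",
    "ISO-8859-6", "ISO-8859-7", "ISO-8859-8", "ISO-8859-9", "ISO-8859-13",
    "ISO-8859-14", "ISO-8859-15",
    "KOI8-R", "KOI8-U", "KOI8-T",
    "CP850", "CP866", "CP874", "CP932", "CP949", "CP950",
    "CP1250", "CP1251", "CP1252", "CP1253", "CP1254", "CP1255", "CP1256",
    "CP1257",
    "GB2312", "EUC-JP", "EUC-KR", "EUC-TW", "BIG5", "BIG5-HKSCS", "GBK",
    "GB18030", "SHIFT_JIS", "JOHAB", "TIS-620", "VISCII", "GEORGIAN-PS",
    "UTF-8",
])


def is_portable_charset(name):
    """Return True if 'name' is one of the encodings listed above."""
    return name.upper() in PORTABLE_CHARSETS


if __name__ == "__main__":
    for candidate in ("utf-8", "ISO-8859-15", "ISO-8859-10", "UTF-16"):
        print(candidate, is_portable_charset(candidate))
```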
442 <A NAME="IDX283"></A>
443 In the GNU system, the following encodings are frequently used for the
444 corresponding languages.
446 <A NAME="IDX284"></A>
449 <LI><CODE>ISO-8859-1</CODE> for
451 Afrikaans, Albanian, Basque, Breton, Catalan, Cornish, Danish, Dutch,
452 English, Estonian, Faroese, Finnish, French, Galician, German,
453 Greenlandic, Icelandic, Indonesian, Irish, Italian, Malay, Manx,
454 Norwegian, Occitan, Portuguese, Spanish, Swedish, Tagalog, Uzbek,
456 <LI><CODE>ISO-8859-2</CODE> for
458 Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak,
460 <LI><CODE>ISO-8859-3</CODE> for Maltese,
462 <LI><CODE>ISO-8859-5</CODE> for Macedonian, Serbian,
464 <LI><CODE>ISO-8859-6</CODE> for Arabic,
466 <LI><CODE>ISO-8859-7</CODE> for Greek,
468 <LI><CODE>ISO-8859-8</CODE> for Hebrew,
470 <LI><CODE>ISO-8859-9</CODE> for Turkish,
472 <LI><CODE>ISO-8859-13</CODE> for Latvian, Lithuanian, Maori,
474 <LI><CODE>ISO-8859-14</CODE> for Welsh,
476 <LI><CODE>ISO-8859-15</CODE> for
478 Basque, Catalan, Dutch, English, Finnish, French, Galician, German, Irish,
479 Italian, Portuguese, Spanish, Swedish, Walloon,
480 <LI><CODE>KOI8-R</CODE> for Russian,
482 <LI><CODE>KOI8-U</CODE> for Ukrainian,
484 <LI><CODE>KOI8-T</CODE> for Tajik,
486 <LI><CODE>CP1251</CODE> for Bulgarian, Belarusian,
488 <LI><CODE>GB2312</CODE>, <CODE>GBK</CODE>, <CODE>GB18030</CODE>
490 for simplified writing of Chinese,
491 <LI><CODE>BIG5</CODE>, <CODE>BIG5-HKSCS</CODE>
493 for traditional writing of Chinese,
494 <LI><CODE>EUC-JP</CODE> for Japanese,
496 <LI><CODE>EUC-KR</CODE> for Korean,
498 <LI><CODE>TIS-620</CODE> for Thai,
500 <LI><CODE>GEORGIAN-PS</CODE> for Georgian,
502 <LI><CODE>UTF-8</CODE> for any language, including those listed above.
506 <A NAME="IDX285"></A>
507 <A NAME="IDX286"></A>
508 When single quote characters or double quote characters are used in
509 translations for your language, and your locale's encoding is one of the
510 ISO-8859-* charsets, it is best if you create your PO files in UTF-8
511 encoding, instead of your locale's encoding. This is because in UTF-8
512 the real quote characters can be represented (single quote characters:
U+2018, U+2019, double quote characters: U+201C, U+201D), whereas none of
the ISO-8859-* charsets has them all. Users in UTF-8 locales will see the
515 real quote characters, whereas users in ISO-8859-* locales will see the
516 vertical apostrophe and the vertical double quote instead (because that's
517 what the character set conversion will transliterate them to).
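<P>
The difference between the encodings can be observed with a short Python
sketch. It relies on Python's own codecs rather than on the conversion that
<CODE>gettext</CODE> performs, and the fallback table
<CODE>ASCII_FALLBACK</CODE> is an assumption that only imitates the
transliteration described above.
</P>

```python
# The four "real" quote characters named in the text.
QUOTES = "\u2018\u2019\u201c\u201d"

# Assumed fallback table imitating the transliteration to the vertical
# apostrophe and the vertical double quote.
ASCII_FALLBACK = {
    0x2018: "'", 0x2019: "'",
    0x201C: '"', 0x201D: '"',
}


def can_encode(text, encoding):
    """Return True if every character of 'text' exists in 'encoding'."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    for encoding in ("utf-8", "iso-8859-1", "iso-8859-15"):
        print(encoding, can_encode(QUOTES, encoding))
    # What a user in a locale without these characters would roughly see.
    print("\u201cHello\u201d".translate(ASCII_FALLBACK))
```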
519 <A NAME="IDX287"></A>
520 To enter such quote characters under X11, you can change your keyboard
521 mapping using the <CODE>xmodmap</CODE> program. The X11 names of the quote
522 characters are "leftsinglequotemark", "rightsinglequotemark",
523 "leftdoublequotemark", "rightdoublequotemark", "singlelowquotemark",
524 "doublelowquotemark".
526 Note that only recent versions of GNU Emacs support the UTF-8 encoding:
527 Emacs 20 with Mule-UCS, and Emacs 21. As of January 2001, XEmacs doesn't
528 support the UTF-8 encoding.
530 The character encoding name can be written in either upper or lower case.
531 Usually upper case is preferred.
533 <DT>Content-Transfer-Encoding
535 Set this to <CODE>8bit</CODE>.
539 This field is optional. It is only needed if the PO file has plural forms.
540 You can find them by searching for the <SAMP>‘msgid_plural’</SAMP> keyword. The
541 format of the plural forms field is described in section <A HREF="gettext_11.html#SEC190">11.2.6 Additional functions for plural forms</A> and
542 section <A HREF="gettext_12.html#SEC211">12.6 Translating plural forms</A>.
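<P>
Searching for the <SAMP>‘msgid_plural’</SAMP> keyword can be automated. The
Python sketch below is a simplified check, not a PO parser: the function
name <CODE>needs_plural_forms</CODE> is hypothetical, and it only looks for
lines that begin with the keyword, so the keyword inside a comment line is
ignored.
</P>

```python
import re

# Matches the keyword only at the start of a line, followed by whitespace.
_MSGID_PLURAL = re.compile(r"^msgid_plural\s", re.MULTILINE)


def needs_plural_forms(po_text):
    """Return True if the PO text contains at least one plural entry."""
    return _MSGID_PLURAL.search(po_text) is not None


# Hypothetical PO fragments used only for illustration.
WITH_PLURAL = "\n".join([
    'msgid "One file removed"',
    'msgid_plural "%d files removed"',
    'msgstr[0] ""',
    'msgstr[1] ""',
])
WITHOUT_PLURAL = "\n".join([
    "# msgid_plural is mentioned here only in a comment",
    'msgid "Hello"',
    'msgstr ""',
])

if __name__ == "__main__":
    print(needs_plural_forms(WITH_PLURAL))
    print(needs_plural_forms(WITHOUT_PLURAL))
```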