3 <!-- This HTML file has been created by texi2html 1.52b
4 from gettext.texi on 28 December 2015 -->
6 <META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
7 <TITLE>GNU gettext utilities - 3 The Format of PO Files</TITLE>
10 Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_2.html">previous</A>, <A HREF="gettext_4.html">next</A>, <A HREF="gettext_25.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
14 <H1><A NAME="SEC15" HREF="gettext_toc.html#TOC15">3 The Format of PO Files</A></H1>
The GNU <CODE>gettext</CODE> toolset helps programmers and translators
produce, update and use translation files, mainly PO files, which are
textual, editable files. This chapter explains the format of PO files.
28 A PO file is made up of many entries, each entry holding the relation
29 between an original untranslated string and its corresponding
30 translation. All entries in a given PO file usually pertain
31 to a single project, and all translations are expressed in a single
target language. One PO file <EM>entry</EM> has the following schematic
structure:
<VAR>white-space</VAR>
# <VAR>translator-comments</VAR>
#. <VAR>extracted-comments</VAR>
#: <VAR>reference</VAR>...
#, <VAR>flag</VAR>...
#| msgid <VAR>previous-untranslated-string</VAR>
msgid <VAR>untranslated-string</VAR>
msgstr <VAR>translated-string</VAR>
49 The general structure of a PO file should be well understood by
50 the translator. When using PO mode, very little has to be known
51 about the format details, as PO mode takes care of them for her.
55 A simple entry can look like this:
61 msgid "Unknown system error"
62 msgstr "Error desconegut del sistema"
69 Entries begin with some optional white space. Usually, when generated
70 through GNU <CODE>gettext</CODE> tools, there is exactly one blank line
71 between entries. Then comments follow, on lines all starting with the
72 character <CODE>#</CODE>. There are two kinds of comments: those which have
some white space immediately following the <CODE>#</CODE>, the <VAR>translator
comments</VAR>, which are created and maintained exclusively by the
translator; and those which have some non-white character just after the
<CODE>#</CODE>, the <VAR>automatic comments</VAR>, which are created and
maintained automatically by GNU <CODE>gettext</CODE> tools. Comment lines
78 starting with <CODE>#.</CODE> contain comments given by the programmer, directed
79 at the translator; these comments are called <VAR>extracted comments</VAR>
80 because the <CODE>xgettext</CODE> program extracts them from the program's
81 source code. Comment lines starting with <CODE>#:</CODE> contain references to
82 the program's source code. Comment lines starting with <CODE>#,</CODE> contain
83 flags; more about these below. Comment lines starting with <CODE>#|</CODE>
contain the previous untranslated string for which the translator gave
a translation.
89 All comments, of either kind, are optional.
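As a rough illustration of the comment kinds described above, the following
Python sketch classifies one PO comment line by the character that follows
the <CODE>#</CODE>. The function name <CODE>classify_comment</CODE> and the labels it
returns are invented for this example and are not part of GNU <CODE>gettext</CODE>;
treating a bare <CODE>#</CODE> as a translator comment is an assumption of the sketch.

```python
def classify_comment(line):
    """Return a label for one PO comment line, or None if it is not a comment."""
    if not line.startswith("#"):
        return None
    # Assumption: a bare "#" counts as a translator comment, like "# text".
    if len(line) == 1 or line[1].isspace():
        return "translator"
    # Any non-white character right after "#" marks an automatic comment.
    kinds = {".": "extracted", ":": "reference", ",": "flag", "|": "previous"}
    return kinds.get(line[1], "automatic")


samples = [
    "# checked by hand",
    "#. shown in the title bar",
    "#: src/main.c:12",
    "#, fuzzy",
    '#| msgid "old text"',
]
for sample in samples:
    print(classify_comment(sample), "<-", sample)
```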
95 After white space and comments, entries show two strings, namely
96 first the untranslated string as it appears in the original program
97 sources, and then, the translation of this string. The original
98 string is introduced by the keyword <CODE>msgid</CODE>, and the translation,
99 by <CODE>msgstr</CODE>. The two strings, untranslated and translated,
100 are quoted in various ways in the PO file, using <CODE>"</CODE>
101 delimiters and <CODE>\</CODE> escapes, but the translator does not really
102 have to pay attention to the precise quoting format, as PO mode fully
103 takes care of quoting for her.
107 The <CODE>msgid</CODE> strings, as well as automatic comments, are produced
108 and managed by other GNU <CODE>gettext</CODE> tools, and PO mode does not
provide means for the translator to alter these. The most she can
do is delete them, and only by deleting the whole entry.
111 On the other hand, the <CODE>msgstr</CODE> string, as well as translator
112 comments, are really meant for the translator, and PO mode gives her
113 the full control she needs.
117 The comment lines beginning with <CODE>#,</CODE> are special because they are
118 not completely ignored by the programs as comments generally are. The
comma-separated list of <VAR>flag</VAR>s is used by the <CODE>msgfmt</CODE>
120 program to give the user some better diagnostic messages. Currently
121 there are two forms of flags defined:
126 <DT><CODE>fuzzy</CODE>
129 This flag can be generated by the <CODE>msgmerge</CODE> program or it can be
130 inserted by the translator herself. It shows that the <CODE>msgstr</CODE>
131 string might not be a correct translation (anymore). Only the translator
132 can judge if the translation requires further modification, or is
133 acceptable as is. Once satisfied with the translation, she then removes
this <CODE>fuzzy</CODE> attribute. The <CODE>msgmerge</CODE> program inserts this
flag when it has matched the <CODE>msgid</CODE> and <CODE>msgstr</CODE> entries by fuzzy
search only. See section <A HREF="gettext_8.html#SEC64">8.3.6 Fuzzy Entries</A>.
138 <DT><CODE>c-format</CODE>
141 <DT><CODE>no-c-format</CODE>
144 These flags should not be added by a human. Instead only the
145 <CODE>xgettext</CODE> program adds them. In an automated PO file processing
146 system as proposed here, the user's changes would be thrown away again as
147 soon as the <CODE>xgettext</CODE> program generates a new template file.
149 The <CODE>c-format</CODE> flag indicates that the untranslated string and the
150 translation are supposed to be C format strings. The <CODE>no-c-format</CODE>
151 flag indicates that they are not C format strings, even though the untranslated
152 string happens to look like a C format string (with <SAMP>‘%’</SAMP> directives).
154 When the <CODE>c-format</CODE> flag is given for a string the <CODE>msgfmt</CODE>
155 program does some more tests to check the validity of the translation.
156 See section <A HREF="gettext_10.html#SEC157">10.1 Invoking the <CODE>msgfmt</CODE> Program</A>, section <A HREF="gettext_4.html#SEC22">4.6 Special Comments preceding Keywords</A> and section <A HREF="gettext_15.html#SEC252">15.3.1 C Format Strings</A>.
158 <DT><CODE>objc-format</CODE>
161 <DT><CODE>no-objc-format</CODE>
164 Likewise for Objective C, see section <A HREF="gettext_15.html#SEC253">15.3.2 Objective C Format Strings</A>.
166 <DT><CODE>sh-format</CODE>
169 <DT><CODE>no-sh-format</CODE>
172 Likewise for Shell, see section <A HREF="gettext_15.html#SEC254">15.3.3 Shell Format Strings</A>.
174 <DT><CODE>python-format</CODE>
177 <DT><CODE>no-python-format</CODE>
180 Likewise for Python, see section <A HREF="gettext_15.html#SEC255">15.3.4 Python Format Strings</A>.
182 <DT><CODE>python-brace-format</CODE>
185 <DT><CODE>no-python-brace-format</CODE>
188 Likewise for Python brace, see section <A HREF="gettext_15.html#SEC255">15.3.4 Python Format Strings</A>.
190 <DT><CODE>lisp-format</CODE>
193 <DT><CODE>no-lisp-format</CODE>
196 Likewise for Lisp, see section <A HREF="gettext_15.html#SEC256">15.3.5 Lisp Format Strings</A>.
198 <DT><CODE>elisp-format</CODE>
201 <DT><CODE>no-elisp-format</CODE>
204 Likewise for Emacs Lisp, see section <A HREF="gettext_15.html#SEC257">15.3.6 Emacs Lisp Format Strings</A>.
206 <DT><CODE>librep-format</CODE>
209 <DT><CODE>no-librep-format</CODE>
212 Likewise for librep, see section <A HREF="gettext_15.html#SEC258">15.3.7 librep Format Strings</A>.
214 <DT><CODE>scheme-format</CODE>
217 <DT><CODE>no-scheme-format</CODE>
220 Likewise for Scheme, see section <A HREF="gettext_15.html#SEC259">15.3.8 Scheme Format Strings</A>.
222 <DT><CODE>smalltalk-format</CODE>
225 <DT><CODE>no-smalltalk-format</CODE>
228 Likewise for Smalltalk, see section <A HREF="gettext_15.html#SEC260">15.3.9 Smalltalk Format Strings</A>.
230 <DT><CODE>java-format</CODE>
233 <DT><CODE>no-java-format</CODE>
236 Likewise for Java, see section <A HREF="gettext_15.html#SEC261">15.3.10 Java Format Strings</A>.
238 <DT><CODE>csharp-format</CODE>
241 <DT><CODE>no-csharp-format</CODE>
244 Likewise for C#, see section <A HREF="gettext_15.html#SEC262">15.3.11 C# Format Strings</A>.
246 <DT><CODE>awk-format</CODE>
249 <DT><CODE>no-awk-format</CODE>
252 Likewise for awk, see section <A HREF="gettext_15.html#SEC263">15.3.12 awk Format Strings</A>.
254 <DT><CODE>object-pascal-format</CODE>
257 <DT><CODE>no-object-pascal-format</CODE>
260 Likewise for Object Pascal, see section <A HREF="gettext_15.html#SEC264">15.3.13 Object Pascal Format Strings</A>.
262 <DT><CODE>ycp-format</CODE>
265 <DT><CODE>no-ycp-format</CODE>
268 Likewise for YCP, see section <A HREF="gettext_15.html#SEC265">15.3.14 YCP Format Strings</A>.
270 <DT><CODE>tcl-format</CODE>
273 <DT><CODE>no-tcl-format</CODE>
276 Likewise for Tcl, see section <A HREF="gettext_15.html#SEC266">15.3.15 Tcl Format Strings</A>.
278 <DT><CODE>perl-format</CODE>
281 <DT><CODE>no-perl-format</CODE>
284 Likewise for Perl, see section <A HREF="gettext_15.html#SEC267">15.3.16 Perl Format Strings</A>.
286 <DT><CODE>perl-brace-format</CODE>
289 <DT><CODE>no-perl-brace-format</CODE>
292 Likewise for Perl brace, see section <A HREF="gettext_15.html#SEC267">15.3.16 Perl Format Strings</A>.
294 <DT><CODE>php-format</CODE>
297 <DT><CODE>no-php-format</CODE>
299 <A NAME="IDX100"></A>
300 Likewise for PHP, see section <A HREF="gettext_15.html#SEC268">15.3.17 PHP Format Strings</A>.
302 <DT><CODE>gcc-internal-format</CODE>
304 <A NAME="IDX101"></A>
305 <DT><CODE>no-gcc-internal-format</CODE>
307 <A NAME="IDX102"></A>
308 Likewise for the GCC sources, see section <A HREF="gettext_15.html#SEC269">15.3.18 GCC internal Format Strings</A>.
310 <DT><CODE>gfc-internal-format</CODE>
312 <A NAME="IDX103"></A>
313 <DT><CODE>no-gfc-internal-format</CODE>
315 <A NAME="IDX104"></A>
316 Likewise for the GNU Fortran Compiler sources, see section <A HREF="gettext_15.html#SEC270">15.3.19 GFC internal Format Strings</A>.
318 <DT><CODE>qt-format</CODE>
320 <A NAME="IDX105"></A>
321 <DT><CODE>no-qt-format</CODE>
323 <A NAME="IDX106"></A>
324 Likewise for Qt, see section <A HREF="gettext_15.html#SEC271">15.3.20 Qt Format Strings</A>.
326 <DT><CODE>qt-plural-format</CODE>
328 <A NAME="IDX107"></A>
329 <DT><CODE>no-qt-plural-format</CODE>
331 <A NAME="IDX108"></A>
332 Likewise for Qt plural forms, see section <A HREF="gettext_15.html#SEC272">15.3.21 Qt Format Strings</A>.
334 <DT><CODE>kde-format</CODE>
336 <A NAME="IDX109"></A>
337 <DT><CODE>no-kde-format</CODE>
339 <A NAME="IDX110"></A>
340 Likewise for KDE, see section <A HREF="gettext_15.html#SEC273">15.3.22 KDE Format Strings</A>.
342 <DT><CODE>boost-format</CODE>
344 <A NAME="IDX111"></A>
345 <DT><CODE>no-boost-format</CODE>
347 <A NAME="IDX112"></A>
348 Likewise for Boost, see section <A HREF="gettext_15.html#SEC275">15.3.24 Boost Format Strings</A>.
350 <DT><CODE>lua-format</CODE>
352 <A NAME="IDX113"></A>
353 <DT><CODE>no-lua-format</CODE>
355 <A NAME="IDX114"></A>
356 Likewise for Lua, see section <A HREF="gettext_15.html#SEC276">15.3.25 Lua Format Strings</A>.
358 <DT><CODE>javascript-format</CODE>
360 <A NAME="IDX115"></A>
361 <DT><CODE>no-javascript-format</CODE>
363 <A NAME="IDX116"></A>
364 Likewise for JavaScript, see section <A HREF="gettext_15.html#SEC277">15.3.26 JavaScript Format Strings</A>.
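The comma-separated flag list described above can be read with a few lines
of code. This is only a sketch: the helper <CODE>parse_flags</CODE> is a hypothetical
name, and real tools such as <CODE>msgfmt</CODE> may handle more cases than shown here.

```python
def parse_flags(line):
    """Split a '#,' comment line into a set of flag names."""
    if not line.startswith("#,"):
        raise ValueError("not a flag comment: %r" % line)
    # Drop the "#," marker, split on commas, ignore empty pieces.
    return {flag.strip() for flag in line[2:].split(",") if flag.strip()}


flags = parse_flags("#, fuzzy, c-format")
needs_review = "fuzzy" in flags      # the translation may no longer be correct
check_format = "c-format" in flags   # the strings are meant to be C format strings
print(sorted(flags), needs_review, check_format)
```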
369 <A NAME="IDX117"></A>
370 <A NAME="IDX118"></A>
It is also possible to have entries with a context specifier. They look like
this:
377 <VAR>white-space</VAR>
378 # <VAR>translator-comments</VAR>
379 #. <VAR>extracted-comments</VAR>
380 #: <VAR>reference</VAR>...
381 #, <VAR>flag</VAR>...
382 #| msgctxt <VAR>previous-context</VAR>
383 #| msgid <VAR>previous-untranslated-string</VAR>
384 msgctxt <VAR>context</VAR>
385 msgid <VAR>untranslated-string</VAR>
386 msgstr <VAR>translated-string</VAR>
390 The context serves to disambiguate messages with the same
391 <VAR>untranslated-string</VAR>. It is possible to have several entries with
392 the same <VAR>untranslated-string</VAR> in a PO file, provided that they each
393 have a different <VAR>context</VAR>. Note that an empty <VAR>context</VAR> string
394 and an absent <CODE>msgctxt</CODE> line do not mean the same thing.
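One way to picture the role of the context is a lookup table keyed by the
pair of context and untranslated string. The sketch below assumes such a
table; the names <CODE>catalog</CODE> and <CODE>add_entry</CODE> and the placeholder
translations are invented for illustration. An absent <CODE>msgctxt</CODE> line is
modelled as <CODE>None</CODE>, so that it stays distinct from an empty context.

```python
catalog = {}


def add_entry(msgctxt, msgid, msgstr):
    """Store a translation under the pair (context, untranslated string)."""
    key = (msgctxt, msgid)
    if key in catalog:
        raise ValueError("duplicate entry: %r" % (key,))
    catalog[key] = msgstr


# The same untranslated string three times, each with a different context.
add_entry(None, "Open", "placeholder translation 1")    # no msgctxt line
add_entry("", "Open", "placeholder translation 2")      # msgctxt ""
add_entry("menu", "Open", "placeholder translation 3")  # msgctxt "menu"
print(len(catalog))
```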
398 <A NAME="IDX119"></A>
399 <A NAME="IDX120"></A>
A different kind of entry is used for translations which involve
plural forms.
<VAR>white-space</VAR>
# <VAR>translator-comments</VAR>
#. <VAR>extracted-comments</VAR>
#: <VAR>reference</VAR>...
#, <VAR>flag</VAR>...
#| msgid <VAR>previous-untranslated-string-singular</VAR>
#| msgid_plural <VAR>previous-untranslated-string-plural</VAR>
msgid <VAR>untranslated-string-singular</VAR>
msgid_plural <VAR>untranslated-string-plural</VAR>
msgstr[0] <VAR>translated-string-case-0</VAR>
...
msgstr[N] <VAR>translated-string-case-n</VAR>
421 Such an entry can look like this:
426 #: src/msgcmp.c:338 src/po-lex.c:699
428 msgid "found %d fatal error"
429 msgid_plural "found %d fatal errors"
430 msgstr[0] "s'ha trobat %d error fatal"
431 msgstr[1] "s'han trobat %d errors fatals"
Here also, a <CODE>msgctxt</CODE> context can be specified before <CODE>msgid</CODE>,
like above.
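To show how the indexed <CODE>msgstr</CODE> lines might be used at run time, here
is a small sketch based on the example entry above. The rule in
<CODE>plural_index</CODE> (index 0 when the count is 1, index 1 otherwise) is an
assumption made for this example; this chapter does not define how the index
is chosen, and the function names are hypothetical.

```python
# Translations copied from the example entry above.
msgstr = [
    "s'ha trobat %d error fatal",
    "s'han trobat %d errors fatals",
]


def plural_index(n):
    """Assumed rule for this sketch: index 0 for n == 1, index 1 otherwise."""
    return 0 if n == 1 else 1


def translate_plural(n):
    """Pick the translated string for the count n and fill in the number."""
    return msgstr[plural_index(n)] % n


print(translate_plural(1))
print(translate_plural(3))
```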
440 Here, additional kinds of flags can be used:
445 <DT><CODE>range:</CODE>
447 <A NAME="IDX121"></A>
448 This flag is followed by a range of non-negative numbers, using the syntax
449 <CODE>range: <VAR>minimum-value</VAR>..<VAR>maximum-value</VAR></CODE>. It designates the
450 possible values that the numeric parameter of the message can take. In some
451 languages, translators may produce slightly better translations if they know
452 that the value can only take on values between 0 and 10, for example.
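The <CODE>range:</CODE> syntax above is simple enough to parse with a regular
expression. The following is a minimal sketch, assuming that both bounds are
written as plain decimal digits; <CODE>parse_range</CODE> is a hypothetical helper,
not a <CODE>gettext</CODE> function.

```python
import re


def parse_range(flag):
    """Parse 'range: MIN..MAX' and return (minimum, maximum) as integers."""
    match = re.fullmatch(r"range:\s*(\d+)\.\.(\d+)", flag.strip())
    if match is None:
        raise ValueError("not a range flag: %r" % flag)
    low, high = int(match.group(1)), int(match.group(2))
    if high < low:
        raise ValueError("minimum exceeds maximum in %r" % flag)
    return low, high


print(parse_range("range: 0..10"))
```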
The <VAR>previous-untranslated-string</VAR> is optionally inserted by the
<CODE>msgmerge</CODE> program, at the same time as it marks a message fuzzy.
It helps the translator to see what changes the developers made
to the <VAR>untranslated-string</VAR>.
Sometimes a few lines, usually white space or comments, follow the
very last entry of a PO file. Such lines are not part of any entry.
They will be dropped when the PO file is processed by the tools, and
they may disturb some PO file editors.
470 The remainder of this section may be safely skipped by those using
471 a PO file editor, yet it may be interesting for everybody to have a better
472 idea of the precise format of a PO file. On the other hand, those
wishing to modify PO files by hand should read on carefully.
477 An empty <VAR>untranslated-string</VAR> is reserved to contain the header
478 entry with the meta information (see section <A HREF="gettext_6.html#SEC44">6.2 Filling in the Header Entry</A>). This header
479 entry should be the first entry of the file. The empty
480 <VAR>untranslated-string</VAR> is reserved for this purpose and must
481 not be used anywhere else.
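The rule about the empty <VAR>untranslated-string</VAR> can be expressed as a
small check. This sketch assumes the <CODE>msgid</CODE> values of a file have
already been collected into a list in file order; the function name
<CODE>check_header_position</CODE> is invented for the example.

```python
def check_header_position(msgids):
    """Return True if the empty msgid appears at most once, and only first."""
    positions = [i for i, msgid in enumerate(msgids) if msgid == ""]
    return positions in ([], [0])


print(check_header_position(["", "Unknown system error"]))
print(check_header_position(["Unknown system error", ""]))
```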
485 Each of <VAR>untranslated-string</VAR> and <VAR>translated-string</VAR> respects
486 the C syntax for a character string, including the surrounding quotes
487 and embedded backslashed escape sequences. When the time comes
488 to write multi-line strings, one should not use escaped newlines.
489 Instead, a closing quote should follow the last character on the
490 line to be continued, and an opening quote should resume the string
491 at the beginning of the following PO file line. For example:
msgid ""
"Here is an example of how one might continue a very long string\n"
"for the common case the string represents multi-line output.\n"
502 In this example, the empty string is used on the first line, to
503 allow better alignment of the <CODE>H</CODE> from the word <SAMP>‘Here’</SAMP>
504 over the <CODE>f</CODE> from the word <SAMP>‘for’</SAMP>. In this example, the
505 <CODE>msgid</CODE> keyword is followed by three strings, which are meant
506 to be concatenated. Concatenating the empty string does not change
507 the resulting overall string, but it is a way for us to comply with
508 the necessity of <CODE>msgid</CODE> to be followed by a string on the same
509 line, while keeping the multi-line presentation left-justified, as
510 we find this to be a cleaner disposition. The empty string could have
been omitted, but only if the string starting with <SAMP>‘Here’</SAMP> was
moved up to the first line, right after <CODE>msgid</CODE>.<A NAME="DOCF2" HREF="gettext_foot.html#FOOT2">(2)</A> Nor was it really necessary
to switch between the last two quoted strings immediately after
the newline <SAMP>‘\n’</SAMP>; the switch could have occurred after <EM>any</EM>
other character. We just did it this way because it is neater.
519 <A NAME="IDX122"></A>
One should carefully distinguish between line ends marked as
<SAMP>‘\n’</SAMP> <EM>inside</EM> quotes, which are part of the represented
string, and line ends in the PO file itself, outside string quotes,
which have no effect on the represented string.
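The concatenation of adjacent quoted strings can be sketched as follows.
As a shortcut, this example lets Python's own string-literal rules stand in
for the C syntax; the two agree for common escapes such as <CODE>\n</CODE> and
<CODE>\"</CODE>, but not necessarily for every escape sequence. The name
<CODE>join_po_strings</CODE> is hypothetical.

```python
import ast


def join_po_strings(lines):
    """Concatenate the quoted strings found on consecutive PO lines."""
    parts = []
    for line in lines:
        text = line.strip()
        if text.startswith("msgid "):
            text = text[len("msgid "):]
        # Assumption: Python literal syntax approximates C string syntax here.
        parts.append(ast.literal_eval(text))
    return "".join(parts)


po_lines = [
    'msgid ""',
    r'"Here is an example of how one might continue a very long string\n"',
    r'"for the common case the string represents multi-line output.\n"',
]
result = join_po_strings(po_lines)
# Only the two \n escapes inside the quotes produce newlines in the result;
# the line ends of the PO file itself contribute nothing.
print(result.count("\n"))
```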
527 <A NAME="IDX123"></A>
528 Outside strings, white lines and comments may be used freely.
529 Comments start at the beginning of a line with <SAMP>‘#’</SAMP> and extend
530 until the end of the PO file line. Comments written by translators
531 should have the initial <SAMP>‘#’</SAMP> immediately followed by some white
532 space. If the <SAMP>‘#’</SAMP> is not immediately followed by white space,
533 this comment is most likely generated and managed by specialized GNU
534 tools, and might disappear or be replaced unexpectedly when the PO
535 file is given to <CODE>msgmerge</CODE>.