3 <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
4 <title>Localization</title>
5 <link rel="stylesheet" href="../../../../../../doc/src/boostbook.css" type="text/css">
6 <meta name="generator" content="DocBook XSL Stylesheets V1.76.1">
7 <link rel="home" href="../../index.html" title="Boost.Regex">
8 <link rel="up" href="../background_information.html" title="Background Information">
9 <link rel="prev" href="headers.html" title="Headers">
10 <link rel="next" href="thread_safety.html" title="Thread Safety">
12 <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
13 <table cellpadding="2" width="100%"><tr>
14 <td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../boost.png"></td>
15 <td align="center"><a href="../../../../../../index.html">Home</a></td>
16 <td align="center"><a href="../../../../../../libs/libraries.htm">Libraries</a></td>
17 <td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
18 <td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
19 <td align="center"><a href="../../../../../../more/index.htm">More</a></td>
22 <div class="spirit-nav">
23 <a accesskey="p" href="headers.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../background_information.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="thread_safety.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
26 <div class="titlepage"><div><div><h3 class="title">
27 <a name="boost_regex.background_information.locale"></a><a class="link" href="locale.html" title="Localization">Localization</a>
28 </h3></div></div></div>
Boost.Regex provides extensive support for run-time localization. The localization
model can be split into two parts: front-end and back-end.
Front-end localization deals with everything the user sees: error
messages and the regular expression syntax itself. For example, a French
application could change [[:word:]] to [[:mot:]] and \w to \m. Modifying
the front-end locale requires active support from the developer, who must provide
the library with a message catalogue containing the localized strings.
The front-end locale is affected by the LC_MESSAGES category only.
Back-end localization deals with everything that occurs after the expression
has been parsed, in other words everything that the user does not see or
interact with directly: case conversion, collation, and character
class membership. The back-end locale does not require any intervention from
the developer; the library acquires all the information it requires
for the current locale from the underlying operating system or run-time library.
This means that if the program user does not interact with regular expressions
directly (for example, if the expressions are embedded in your C++ code),
then no explicit localization is required, as the library will take care
of everything for you. For example, embedding the expression [[:word:]]+ in
your code will always match a whole word. If the program is run on a machine
with, say, a Greek locale, it will still match a whole word,
but in Greek characters rather than Latin ones. The back-end locale is affected
by the LC_CTYPE and LC_COLLATE categories.
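As a rough illustration of the three kinds of back-end query (case conversion, collation, and character class membership), the sketch below asks a <code class="computeroutput">std::locale</code> for each of them. It uses only the standard library rather than Boost.Regex itself, and the helper names <code class="computeroutput">fold_case</code>, <code class="computeroutput">collates_before</code> and <code class="computeroutput">is_word_like</code> are hypothetical. Treating the underscore as a word character is an assumption made for this example.

```cpp
#include <locale>
#include <string>

// Case conversion: delegate to the locale's std::ctype facet.
char fold_case(char c, const std::locale& loc) {
    return std::use_facet<std::ctype<char>>(loc).tolower(c);
}

// Collation: delegate to the locale's std::collate facet.
bool collates_before(const std::string& a, const std::string& b,
                     const std::locale& loc) {
    const std::collate<char>& coll = std::use_facet<std::collate<char>>(loc);
    return coll.compare(a.data(), a.data() + a.size(),
                        b.data(), b.data() + b.size()) < 0;
}

// Character class membership: alphanumeric according to the locale,
// plus '_' (an assumption about what counts as a "word" character).
bool is_word_like(char c, const std::locale& loc) {
    return c == '_' || std::isalnum(c, loc);
}
```

Passing a different <code class="computeroutput">std::locale</code> to these helpers changes the answers without any change to the calling code, which is the property the back-end relies on.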
58 There are three separate localization mechanisms supported by Boost.Regex:
61 <a name="boost_regex.background_information.locale.h0"></a>
62 <span><a name="boost_regex.background_information.locale.win32_localization_model_"></a></span><a class="link" href="locale.html#boost_regex.background_information.locale.win32_localization_model_">Win32
63 localization model.</a>
This is the default model when the library is compiled under Win32, and is
encapsulated by the traits class <code class="computeroutput"><span class="identifier">w32_regex_traits</span></code>.
When this model is in effect, each <a class="link" href="../ref/basic_regex.html" title="basic_regex"><code class="computeroutput"><span class="identifier">basic_regex</span></code></a> object gets its own
LCID. By default this is the user's default setting as returned by GetUserDefaultLCID,
but you can call imbue on the <code class="computeroutput"><span class="identifier">basic_regex</span></code>
object to set its locale to some other LCID if you wish. All the settings
used by Boost.Regex are acquired directly from the operating system, bypassing
the C run-time library. Front-end localization requires a resource dll containing
a string table with the user-defined strings. The traits class exports the function:
77 <pre class="programlisting"><span class="keyword">static</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">set_message_catalogue</span><span class="special">(</span><span class="keyword">const</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">&</span> <span class="identifier">s</span><span class="special">);</span>
which needs to be called with a string identifying the name of the resource
dll, before your code compiles any regular expressions (but not necessarily
before you construct any <code class="computeroutput"><span class="identifier">basic_regex</span></code> instances):
85 <pre class="programlisting"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">w32_regex_traits</span><span class="special"><</span><span class="keyword">char</span><span class="special">>::</span><span class="identifier">set_message_catalogue</span><span class="special">(</span><span class="string">"mydll.dll"</span><span class="special">);</span>
The library provides full Unicode support under NT. Under Windows 9x the
library degrades gracefully: characters 0 to 255 are supported, and the remainder
are treated as "unknown" graphic characters.
93 <a name="boost_regex.background_information.locale.h1"></a>
94 <span><a name="boost_regex.background_information.locale.c_localization_model_"></a></span><a class="link" href="locale.html#boost_regex.background_information.locale.c_localization_model_">C
95 localization model.</a>
This model has been deprecated in favor of the C++ locale for all non-Windows
compilers that support it. It is encapsulated by the traits class
<code class="computeroutput"><span class="identifier">c_regex_traits</span></code>. Win32 users
can force this model to take effect by defining the pre-processor symbol
BOOST_REGEX_USE_C_LOCALE. When this model is in effect there is a single
global locale, as set by <code class="computeroutput"><span class="identifier">setlocale</span></code>.
All settings are acquired from your run-time library; consequently, Unicode
support depends on your run-time library implementation.
Front-end localization is not supported.
Note that calling setlocale invalidates all compiled regular expressions.
Calling <code class="computeroutput"><span class="identifier">setlocale</span><span class="special">(</span><span class="identifier">LC_ALL</span><span class="special">,</span> <span class="string">"C"</span><span class="special">)</span></code>
will make this library behave like most traditional regular expression
libraries, including version 1 of this library.
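Because the C model reads a single process-wide locale, it can help to check what that locale is before compiling any expressions. The sketch below uses only the standard C library, not Boost.Regex; <code class="computeroutput">category_name</code> and <code class="computeroutput">use_classic_locale</code> are hypothetical helper names. It queries a single category such as LC_CTYPE or LC_COLLATE (the categories that govern character classification and collation) and resets every category to "C".

```cpp
#include <clocale>
#include <string>

// Return the name of the locale currently set for one category.
// Passing a null pointer queries the setting without changing it.
std::string category_name(int category) {
    const char* name = std::setlocale(category, nullptr);
    return name ? std::string(name) : std::string();
}

// Switch every category to the classic "C" locale; returns false on failure.
// Under the C model, expressions compiled before this call would need to be
// recompiled afterwards.
bool use_classic_locale() {
    return std::setlocale(LC_ALL, "C") != nullptr;
}
```

A program might call <code class="computeroutput">use_classic_locale()</code> once at start-up, before any regular expression is compiled, so that no later locale change is needed.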
117 <a name="boost_regex.background_information.locale.h2"></a>
118 <span><a name="boost_regex.background_information.locale.c___localization_model_"></a></span><a class="link" href="locale.html#boost_regex.background_information.locale.c___localization_model_">C++
119 localization model.</a>
122 This model is the default for non-Windows compilers.
When this model is in effect, each instance of <a class="link" href="../ref/basic_regex.html" title="basic_regex"><code class="computeroutput"><span class="identifier">basic_regex</span></code></a> has its own instance
of <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">locale</span></code>. Class <a class="link" href="../ref/basic_regex.html" title="basic_regex"><code class="computeroutput"><span class="identifier">basic_regex</span></code></a> also has a member function
<code class="computeroutput"><span class="identifier">imbue</span></code>, which allows the locale
for the expression to be set on a per-instance basis. Front-end localization
requires a POSIX message catalogue, which will be loaded via the <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">messages</span></code>
facet of the expression's locale. The traits class exports the symbol:
132 <pre class="programlisting"><span class="keyword">static</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">set_message_catalogue</span><span class="special">(</span><span class="keyword">const</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">&</span> <span class="identifier">s</span><span class="special">);</span>
135 which needs to be called with a string identifying the name of the message
136 catalogue, before your code compiles any regular expressions (but not necessarily
137 before you construct any basic_regex instances):
139 <pre class="programlisting"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">cpp_regex_traits</span><span class="special"><</span><span class="keyword">char</span><span class="special">>::</span><span class="identifier">set_message_catalogue</span><span class="special">(</span><span class="string">"mycatalogue"</span><span class="special">);</span>
142 Note that calling <code class="computeroutput"><span class="identifier">basic_regex</span><span class="special"><>::</span><span class="identifier">imbue</span></code>
143 will invalidate any expression currently compiled in that instance of <a class="link" href="../ref/basic_regex.html" title="basic_regex"><code class="computeroutput"><span class="identifier">basic_regex</span></code></a>.
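The ordering this implies (imbue first, compile afterwards) can be sketched with <code class="computeroutput">std::regex</code> from the standard library. It is used here only as a stand-in, because its <code class="computeroutput">imbue</code> member is specified to leave the object matching nothing until a new pattern is assigned; this is not Boost.Regex code, and <code class="computeroutput">make_word_regex</code> is a hypothetical helper.

```cpp
#include <locale>
#include <regex>

// Build a regex bound to a specific locale. imbue() discards any
// previously compiled pattern, so it is called before assign().
std::regex make_word_regex(const std::locale& loc) {
    std::regex re;
    re.imbue(loc);              // set the per-instance locale first
    re.assign("[[:alpha:]]+");  // then compile the expression under it
    return re;
}
```

Calling <code class="computeroutput">imbue</code> again on the returned object would require another <code class="computeroutput">assign</code> before it could be used for matching.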
146 Finally note that if you build the library with a non-default localization
147 model, then the appropriate pre-processor symbol (BOOST_REGEX_USE_C_LOCALE
148 or BOOST_REGEX_USE_CPP_LOCALE) must be defined both when you build the support
149 library, and when you include <code class="computeroutput"><span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">regex</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span></code>
150 or <code class="computeroutput"><span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">cregex</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span></code>
151 in your code. The best way to ensure this is to add the #define to <code class="computeroutput"><span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">regex</span><span class="special">/</span><span class="identifier">user</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span></code>.
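For instance, selecting the C++ model on a build where it is not the default might look like the fragment below, placed in <code class="computeroutput">boost/regex/user.hpp</code>. The rest of that header is not shown and is assumed to be left unchanged.

```cpp
// In boost/regex/user.hpp, so that the library build and every
// translation unit that includes Boost.Regex see the same setting:
#define BOOST_REGEX_USE_CPP_LOCALE
```

Keeping the definition in one shared header avoids the case where the library and the client code are compiled with different localization models.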
154 <a name="boost_regex.background_information.locale.h3"></a>
155 <span><a name="boost_regex.background_information.locale.providing_a_message_catalogue"></a></span><a class="link" href="locale.html#boost_regex.background_information.locale.providing_a_message_catalogue">Providing
156 a message catalogue</a>
In order to localize the front end of the library, you need to provide the
library with the appropriate message strings contained either in a resource
dll's string table (Win32 model), or a POSIX message catalogue (C++ model).
In the latter case the messages must appear in message set zero of the catalogue.
The messages and their IDs are as follows:
165 <div class="informaltable"><table class="table">
202 The character used to start a sub-expression.
219 The character used to end a sub-expression declaration.
236 The character used to denote an end of line assertion.
253 The character used to denote the start of line assertion.
The character used to denote the "match any character" expression.
287 The match zero or more times repetition operator.
304 The match one or more repetition operator.
321 The match zero or one repetition operator.
338 The character set opening character.
355 The character set closing character.
372 The alternation operator.
389 The escape character.
406 The hash character (not currently used).
440 The repetition operator opening character.
457 The repetition operator closing character.
474 The digit characters.
491 The character which when preceded by an escape character represents
492 the word boundary assertion.
509 The character which when preceded by an escape character represents
510 the non-word boundary assertion.
527 The character which when preceded by an escape character represents
528 the word-start boundary assertion.
545 The character which when preceded by an escape character represents
546 the word-end boundary assertion.
The character which when preceded by an escape character represents
any word character.
581 The character which when preceded by an escape character represents
582 a non-word character.
599 The character which when preceded by an escape character represents
600 a start of buffer assertion.
617 The character which when preceded by an escape character represents
618 an end of buffer assertion.
635 The newline character.
669 The character which when preceded by an escape character represents
687 The character which when preceded by an escape character represents
688 the form feed character.
705 The character which when preceded by an escape character represents
706 the newline character.
723 The character which when preceded by an escape character represents
724 the carriage return character.
741 The character which when preceded by an escape character represents
759 The character which when preceded by an escape character represents
760 the vertical tab character.
777 The character which when preceded by an escape character represents
778 the start of a hexadecimal character constant.
795 The character which when preceded by an escape character represents
796 the start of an ASCII escape character.
830 The equals character.
847 The character which when preceded by an escape character represents
848 the ASCII escape character.
865 The character which when preceded by an escape character represents
866 any lower case character.
883 The character which when preceded by an escape character represents
884 any non-lower case character.
901 The character which when preceded by an escape character represents
902 any upper case character.
919 The character which when preceded by an escape character represents
920 any non-upper case character.
The character which when preceded by an escape character represents
any space character.
955 The character which when preceded by an escape character represents
956 any non-space character.
The character which when preceded by an escape character represents
any digit character.
991 The character which when preceded by an escape character represents
992 any non-digit character.
1009 The character which when preceded by an escape character represents
1010 the end quote operator.
1027 The character which when preceded by an escape character represents
1028 the start quote operator.
1045 The character which when preceded by an escape character represents
1046 a Unicode combining character sequence.
1063 The character which when preceded by an escape character represents
1064 any single character.
The character which when preceded by an escape character represents
the end of buffer operator.
1099 The character which when preceded by an escape character represents
1100 the continuation assertion.
The character which when preceded by (? indicates a zero-width
negated forward lookahead assertion.
1130 Custom error messages are loaded as follows:
1132 <div class="informaltable"><table class="table">
1186 "Invalid regular expression"
1203 "Invalid collation character"
1220 "Invalid character class name"
1237 "Trailing backslash"
1254 "Invalid back reference"
"Unmatched [ or [^"
1320 "Invalid content of \{\}"
1371 "Invalid preceding regular expression"
1388 "Premature end of regular expression"
1405 "Regular expression too big"
Custom character class names are loaded as follows:
1465 <div class="informaltable"><table class="table">
1484 Equivalent default class name
1497 The character class name for alphanumeric characters.
1514 The character class name for alphabetic characters.
1531 The character class name for control characters.
1548 The character class name for digit characters.
1565 The character class name for graphics characters.
1582 The character class name for lower case characters.
1599 The character class name for printable characters.
1616 The character class name for punctuation characters.
1633 The character class name for space characters.
1650 The character class name for upper case characters.
1667 The character class name for hexadecimal characters.
1684 The character class name for blank characters.
1701 The character class name for word characters.
1718 The character class name for Unicode characters.
1730 Finally, custom collating element names are loaded starting from message
1731 id 400, and terminating when the first load thereafter fails. Each message
1732 looks something like: "tagname string" where tagname is the name
1733 used inside [[.tagname.]] and string is the actual text of the collating
1734 element. Note that the value of collating element [[.zero.]] is used for
1735 the conversion of strings to numbers - if you replace this with another value
1736 then that will be used for string parsing - for example use the Unicode character
1737 0x0660 for [[.zero.]] if you want to use Unicode Arabic-Indic digits in your
1738 regular expressions in place of Latin digits.
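The loading rule described above can be sketched as follows. This is an illustration only, not the library's implementation: the catalogue is modelled as a <code class="computeroutput">std::map</code> from message id to text, and <code class="computeroutput">Catalogue</code> and <code class="computeroutput">load_collating_elements</code> are hypothetical names. The sketch reads ids 400, 401 and so on until the first missing id, and splits each "tagname string" message at the first space.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a message catalogue: message id -> message text.
using Catalogue = std::map<int, std::string>;

// Read custom collating element names starting at id 400 and stop at the
// first id that fails to load. Each message is "tagname string"; the result
// maps tagname -> collating element text.
std::map<std::string, std::string> load_collating_elements(const Catalogue& cat) {
    std::map<std::string, std::string> elements;
    for (int id = 400; ; ++id) {
        const auto it = cat.find(id);
        if (it == cat.end()) break;             // first failed load ends the scan
        const std::string& msg = it->second;
        const std::string::size_type sp = msg.find(' ');
        if (sp == std::string::npos) continue;  // assumption: skip malformed entries
        elements[msg.substr(0, sp)] = msg.substr(sp + 1);
    }
    return elements;
}
```

With entries at ids 400, 401 and 403, only the first two are picked up, because the missing id 402 ends the scan.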
Note that the POSIX-defined names for character classes and collating elements
are always available, even if custom names are defined. In contrast, custom
error messages and custom syntax messages replace the default ones.
1746 <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
1747 <td align="left"></td>
1748 <td align="right"><div class="copyright-footer">Copyright © 1998-2010 John Maddock<p>
1749 Distributed under the Boost Software License, Version 1.0. (See accompanying
1750 file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)