3 <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
4 <title>POSIX Basic Regular Expression Syntax</title>
5 <link rel="stylesheet" href="../../../../../../doc/src/boostbook.css" type="text/css">
6 <meta name="generator" content="DocBook XSL Stylesheets V1.78.1">
7 <link rel="home" href="../../index.html" title="Boost.Regex 5.0.0">
8 <link rel="up" href="../syntax.html" title="Regular Expression Syntax">
9 <link rel="prev" href="basic_extended.html" title="POSIX Extended Regular Expression Syntax">
10 <link rel="next" href="character_classes.html" title="Character Class Names">
12 <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
13 <table cellpadding="2" width="100%"><tr>
14 <td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../boost.png"></td>
15 <td align="center"><a href="../../../../../../index.html">Home</a></td>
16 <td align="center"><a href="../../../../../../libs/libraries.htm">Libraries</a></td>
17 <td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
18 <td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
19 <td align="center"><a href="../../../../../../more/index.htm">More</a></td>
22 <div class="spirit-nav">
23 <a accesskey="p" href="basic_extended.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../syntax.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="character_classes.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
26 <div class="titlepage"><div><div><h3 class="title">
27 <a name="boost_regex.syntax.basic_syntax"></a><a class="link" href="basic_syntax.html" title="POSIX Basic Regular Expression Syntax">POSIX Basic Regular
29 </h3></div></div></div>
31 <a name="boost_regex.syntax.basic_syntax.h0"></a>
32 <span class="phrase"><a name="boost_regex.syntax.basic_syntax.synopsis"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.synopsis">Synopsis</a>
35 The POSIX-Basic regular expression syntax is used by the Unix utility <code class="computeroutput"><span class="identifier">sed</span></code>, and variations are used by <code class="computeroutput"><span class="identifier">grep</span></code> and <code class="computeroutput"><span class="identifier">emacs</span></code>.
36 You can construct POSIX basic regular expressions in Boost.Regex by passing
37 the flag <code class="computeroutput"><span class="identifier">basic</span></code> to the regex
38 constructor (see <a class="link" href="../ref/syntax_option_type.html" title="syntax_option_type"><code class="computeroutput"><span class="identifier">syntax_option_type</span></code></a>), for example:
40 <pre class="programlisting"><span class="comment">// e1 is a case sensitive POSIX-Basic expression:</span>
41 <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e1</span><span class="special">(</span><span class="identifier">my_expression</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">basic</span><span class="special">);</span>
42 <span class="comment">// e2 is a case insensitive POSIX-Basic expression:</span>
43 <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e2</span><span class="special">(</span><span class="identifier">my_expression</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">basic</span><span class="special">|</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">icase</span><span class="special">);</span>
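<p>
  A minimal sketch of what the two flag combinations do when matching. As an assumption of this sketch, the standard library's <code class="computeroutput">std::regex</code> is used as a stand-in so that it builds without Boost; the <code class="computeroutput">basic</code> and <code class="computeroutput">icase</code> flags are the ones shown above, while the pattern <code class="computeroutput">a*b</code> and the function names are purely illustrative.
</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Illustrative pattern only: zero or more 'a' followed by 'b'.
static const std::string my_expression = "a*b";

// Whole-string match against a case sensitive POSIX-Basic expression.
bool matches_e1(const std::string& text)
{
    static const std::regex e1(my_expression, std::regex::basic);
    return std::regex_match(text, e1);
}

// Whole-string match against a case insensitive POSIX-Basic expression.
bool matches_e2(const std::string& text)
{
    static const std::regex e2(my_expression,
                               std::regex::basic | std::regex::icase);
    return std::regex_match(text, e2);
}

void check_basic_flags()
{
    assert(matches_e1("aab"));
    assert(!matches_e1("AAB")); // case matters without icase
    assert(matches_e2("AAB"));  // icase ignores the difference
}
```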
45 <a name="boost_regex.posix_basic"></a><h4>
46 <a name="boost_regex.syntax.basic_syntax.h1"></a>
47 <span class="phrase"><a name="boost_regex.syntax.basic_syntax.posix_basic_syntax"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.posix_basic_syntax">POSIX
51 In POSIX-Basic regular expressions, all characters match themselves except
52 for the following special characters:
54 <pre class="programlisting">.[\*^$</pre>
56 <a name="boost_regex.syntax.basic_syntax.h2"></a>
57 <span class="phrase"><a name="boost_regex.syntax.basic_syntax.wildcard_"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.wildcard_">Wildcard:</a>
60 The single character '.' when used outside of a character set will match
61 any single character except:
63 <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
65 The NULL character when the flag <code class="computeroutput"><span class="identifier">match_not_dot_null</span></code>
66 is passed to the matching algorithms.
69 The newline character when the flag <code class="computeroutput"><span class="identifier">match_not_dot_newline</span></code>
70 is passed to the matching algorithms.
74 <a name="boost_regex.syntax.basic_syntax.h3"></a>
75 <span class="phrase"><a name="boost_regex.syntax.basic_syntax.anchors_"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.anchors_">Anchors:</a>
78 A '^' character shall match the start of a line when used as the first character
79 of an expression, or the first character of a sub-expression.
82 A '$' character shall match the end of a line when used as the last character
83 of an expression, or the last character of a sub-expression.
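<p>
  The two anchors can be exercised with a search rather than a whole-string match. This is a sketch under the assumption that <code class="computeroutput">std::regex</code> with the <code class="computeroutput">basic</code> flag stands in for Boost.Regex; the helper name <code class="computeroutput">found_in</code> and the sample strings are hypothetical.
</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when `pattern`, compiled as a POSIX-Basic expression, matches
// somewhere inside `text`.
bool found_in(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern, std::regex::basic);
    return std::regex_search(text, re);
}

void check_anchors()
{
    assert(found_in("^abc", "abcdef"));    // "abc" sits at the start
    assert(!found_in("^abc", "xabcdef"));  // present, but not at the start
    assert(found_in("def$", "abcdef"));    // "def" sits at the end
    assert(!found_in("def$", "abcdefx"));  // present, but not at the end
}
```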
86 <a name="boost_regex.syntax.basic_syntax.h4"></a>
87 <span class="phrase"><a name="boost_regex.syntax.basic_syntax.marked_sub_expressions_"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.marked_sub_expressions_">Marked
91 A section beginning <code class="computeroutput"><span class="special">\(</span></code> and ending
92 <code class="computeroutput"><span class="special">\)</span></code> acts as a marked sub-expression.
93 Whatever matched the sub-expression is split out in a separate field by the
94 matching algorithms. Marked sub-expressions can also be repeated, or referred to
98 <a name="boost_regex.syntax.basic_syntax.h5"></a>
99 <span class="phrase"><a name="boost_regex.syntax.basic_syntax.repeats_"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.repeats_">Repeats:</a>
102 Any atom (a single character, a marked sub-expression, or a character class)
103 can be repeated with the * operator.
106 For example <code class="computeroutput"><span class="identifier">a</span><span class="special">*</span></code>
107 will match the letter a repeated zero or more times (an atom
108 repeated zero times matches an empty string), so the expression <code class="computeroutput"><span class="identifier">a</span><span class="special">*</span><span class="identifier">b</span></code>
109 will match any of the following:
111 <pre class="programlisting">b
116 An atom can also be repeated with a bounded repeat:
119 <code class="computeroutput"><span class="identifier">a</span><span class="special">\{</span><span class="identifier">n</span><span class="special">\}</span></code> Matches
120 'a' repeated exactly n times.
123 <code class="computeroutput"><span class="identifier">a</span><span class="special">\{</span><span class="identifier">n</span><span class="special">,\}</span></code> Matches
124 'a' repeated n or more times.
127 <code class="computeroutput"><span class="identifier">a</span><span class="special">\{</span><span class="identifier">n</span><span class="special">,</span> <span class="identifier">m</span><span class="special">\}</span></code> Matches 'a' repeated between n and m times
133 <pre class="programlisting">^a\{2,3\}$</pre>
135 Will match either of:
137 <pre class="programlisting">aa
143 <pre class="programlisting">a
147 It is an error to use a repeat operator if the preceding construct cannot
148 be repeated, for example:
150 <pre class="programlisting">a\(*\)</pre>
152 Will raise an error, as there is nothing for the * operator to be applied
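<p>
  The repeat forms described in this section can be checked with a short sketch. It assumes <code class="computeroutput">std::regex</code> with the <code class="computeroutput">basic</code> flag as a stand-in for Boost.Regex, and the helper name <code class="computeroutput">whole_match</code> is hypothetical. Note that the braces of a bounded repeat are escaped, and that no space is written inside them.
</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when the whole of `text` matches the POSIX-Basic `pattern`.
bool whole_match(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern, std::regex::basic);
    return std::regex_match(text, re);
}

void check_repeats()
{
    // a*b : zero or more 'a', then 'b'.
    assert(whole_match("a*b", "b"));
    assert(whole_match("a*b", "ab"));
    assert(whole_match("a*b", "aaaaaaaab"));
    assert(!whole_match("a*b", "a"));

    // a\{2,3\} : between two and three 'a' inclusive.
    assert(whole_match(R"(a\{2,3\})", "aa"));
    assert(whole_match(R"(a\{2,3\})", "aaa"));
    assert(!whole_match(R"(a\{2,3\})", "a"));
    assert(!whole_match(R"(a\{2,3\})", "aaaa"));

    // a\{3\} : exactly three; a\{2,\} : two or more.
    assert(whole_match(R"(a\{3\})", "aaa"));
    assert(!whole_match(R"(a\{3\})", "aa"));
    assert(whole_match(R"(a\{2,\})", "aaaaa"));
    assert(!whole_match(R"(a\{2,\})", "a"));
}
```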
156 <a name="boost_regex.syntax.basic_syntax.h6"></a>
157 <span class="phrase"><a name="boost_regex.syntax.basic_syntax.back_references_"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.back_references_">Back
161 An escape character followed by a digit <span class="emphasis"><em>n</em></span>, where <span class="emphasis"><em>n</em></span>
162 is in the range 1-9, matches the same string that was matched by sub-expression
163 <span class="emphasis"><em>n</em></span>. For example the expression:
165 <pre class="programlisting">^\(a*\)[^a]*\1$</pre>
167 Will match the string:
169 <pre class="programlisting">aaabbaaa</pre>
173 <pre class="programlisting">aaabba</pre>
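<p>
  A back-reference can be tried out as follows. This sketch assumes <code class="computeroutput">std::regex</code> with the <code class="computeroutput">basic</code> flag as a stand-in for Boost.Regex, and the function name <code class="computeroutput">first_group</code> is hypothetical. The middle of the pattern is written as <code class="computeroutput">[^a]*</code> so that it cannot consume any of the letters that sub-expression 1 has to capture; the text matched by that sub-expression is then read back from field 1 of the match results.
</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the text captured by sub-expression 1 when `text` matches
// ^\(a*\)[^a]*\1$ as a whole, or "<no match>" otherwise.
std::string first_group(const std::string& text)
{
    static const std::regex re(R"(^\(a*\)[^a]*\1$)", std::regex::basic);
    std::smatch m;
    if (!std::regex_match(text, m, re))
        return "<no match>";
    return m[1].str();
}

void check_back_reference()
{
    // Leading "aaa" is captured and must reappear at the end.
    assert(first_group("aaabbaaa") == "aaa");
    // Only one trailing 'a', so no choice of capture fits.
    assert(first_group("aaabba") == "<no match>");
}
```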
175 <a name="boost_regex.syntax.basic_syntax.h7"></a>
176 <span class="phrase"><a name="boost_regex.syntax.basic_syntax.character_sets_"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.character_sets_">Character
180 A character set is a bracket-expression starting with [ and ending with ];
181 it defines a set of characters, and matches any single character that is
182 a member of that set.
185 A bracket expression may contain any combination of the following:
188 <a name="boost_regex.syntax.basic_syntax.h8"></a>
189 <span class="phrase"><a name="boost_regex.syntax.basic_syntax.single_characters_"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.single_characters_">Single
193 For example <code class="computeroutput"><span class="special">[</span><span class="identifier">abc</span><span class="special">]</span></code> will match any of the characters 'a', 'b',
197 <a name="boost_regex.syntax.basic_syntax.h9"></a>
198 <span class="phrase"><a name="boost_regex.syntax.basic_syntax.character_ranges_"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.character_ranges_">Character
202 For example <code class="computeroutput"><span class="special">[</span><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span><span class="special">]</span></code>
203 will match any single character in the range 'a' to 'c'. By default, for
204 POSIX-Basic regular expressions, a character <span class="emphasis"><em>x</em></span> is within
205 the range <span class="emphasis"><em>y</em></span> to <span class="emphasis"><em>z</em></span>, if it collates
206 within that range; this results in locale specific behavior. This behavior
207 can be turned off by unsetting the <code class="computeroutput"><span class="identifier">collate</span></code>
208 option flag when constructing the regular expression - in which case whether
209 a character appears within a range is determined by comparing the code points
210 of the characters only.
213 <a name="boost_regex.syntax.basic_syntax.h10"></a>
214 <span class="phrase"><a name="boost_regex.syntax.basic_syntax.negation_"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.negation_">Negation:</a>
217 If the bracket-expression begins with the ^ character, then it matches the
218 complement of the characters it contains, for example <code class="computeroutput"><span class="special">[^</span><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span><span class="special">]</span></code> matches any character that is not in the
222 <a name="boost_regex.syntax.basic_syntax.h11"></a>
223 <span class="phrase"><a name="boost_regex.syntax.basic_syntax.character_classes_"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.character_classes_">Character
227 An expression of the form <code class="computeroutput"><span class="special">[[:</span><span class="identifier">name</span><span class="special">:]]</span></code>
228 matches the named character class "name", for example <code class="computeroutput"><span class="special">[[:</span><span class="identifier">lower</span><span class="special">:]]</span></code> matches any lower case character. See
229 <a class="link" href="character_classes.html" title="Character Class Names">character class names</a>.
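<p>
  The bracket-expression forms covered so far (single characters, ranges, negation and named classes) can be compared side by side. The sketch assumes <code class="computeroutput">std::regex</code> with the <code class="computeroutput">basic</code> flag as a stand-in for Boost.Regex and the default "C" locale; the helper name <code class="computeroutput">set_matches</code> is hypothetical.
</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when the single-character string `text` is matched by the
// POSIX-Basic bracket expression `set`.
bool set_matches(const std::string& set, const std::string& text)
{
    const std::regex re(set, std::regex::basic);
    return std::regex_match(text, re);
}

void check_character_sets()
{
    assert(set_matches("[abc]", "a"));         // listed character
    assert(!set_matches("[abc]", "d"));

    assert(set_matches("[a-c]", "b"));         // inside the range
    assert(!set_matches("[a-c]", "d"));

    assert(set_matches("[^a-c]", "d"));        // complement of the range
    assert(!set_matches("[^a-c]", "a"));

    assert(set_matches("[[:lower:]]", "q"));   // named character class
    assert(!set_matches("[[:lower:]]", "Q"));
}
```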
232 <a name="boost_regex.syntax.basic_syntax.h12"></a>
233 <span class="phrase"><a name="boost_regex.syntax.basic_syntax.collating_elements_"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.collating_elements_">Collating
237 An expression of the form <code class="computeroutput"><span class="special">[[.</span><span class="identifier">col</span><span class="special">.]]</span></code> matches
238 the collating element <span class="emphasis"><em>col</em></span>. A collating element is any
239 single character, or any sequence of characters that collates as a single
240 unit. Collating elements may also be used as the end point of a range, for
241 example: <code class="computeroutput"><span class="special">[[.</span><span class="identifier">ae</span><span class="special">.]-</span><span class="identifier">c</span><span class="special">]</span></code>
242 matches the character sequence "ae", plus any single character
243 in the range "ae"-c, assuming that "ae" is treated as
244 a single collating element in the current locale.
247 Collating elements may be used in place of escapes (which are not normally
248 allowed inside character sets), for example <code class="computeroutput"><span class="special">[[.^.]</span><span class="identifier">abc</span><span class="special">]</span></code> would
249 match any one of the characters 'a', 'b', 'c' or '^'.
252 As an extension, a collating element may also be specified via its symbolic
255 <pre class="programlisting">[[.NUL.]]</pre>
257 matches a 'NUL' character. See <a class="link" href="collating_names.html" title="Collating Names">collating
261 <a name="boost_regex.syntax.basic_syntax.h13"></a>
262 <span class="phrase"><a name="boost_regex.syntax.basic_syntax.equivalence_classes_"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.equivalence_classes_">Equivalence
266 An expression of the form <code class="computeroutput"><span class="special">[[=</span><span class="identifier">col</span><span class="special">=]]</span></code>
267 matches any character or collating element whose primary sort key is the
268 same as that for collating element <span class="emphasis"><em>col</em></span>. As with collating
269 elements, the name <span class="emphasis"><em>col</em></span> may be a <a class="link" href="collating_names.html" title="Collating Names">collating
270 symbolic name</a>. A primary sort key is one that ignores case, accentuation,
271 or locale-specific tailorings; so for example <code class="computeroutput"><span class="special">[[=</span><span class="identifier">a</span><span class="special">=]]</span></code> matches
272 any of the characters: a, À, Á, Â, Ã, Ä, Å, A, à, á, â, ã, ä and å. Unfortunately, the implementation
273 of this relies on the platform's collation and localisation support;
274 this feature cannot be relied upon to work portably across all platforms,
275 or even all locales on one platform.
278 <a name="boost_regex.syntax.basic_syntax.h14"></a>
279 <span class="phrase"><a name="boost_regex.syntax.basic_syntax.combinations_"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.combinations_">Combinations:</a>
282 All of the above can be combined in one character set declaration, for example:
283 <code class="computeroutput"><span class="special">[[:</span><span class="identifier">digit</span><span class="special">:]</span><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span><span class="special">[.</span><span class="identifier">NUL</span><span class="special">.]].</span></code>
286 <a name="boost_regex.syntax.basic_syntax.h15"></a>
287 <span class="phrase"><a name="boost_regex.syntax.basic_syntax.escapes"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.escapes">Escapes</a>
290 With the exception of the escape sequences \{, \}, \(, and \), which are
291 documented above, an escape followed by any character matches that character.
292 This can be used to make the special characters
294 <pre class="programlisting">.[\*^$</pre>
296 "ordinary". Note that the escape character loses its special meaning
297 inside a character set, so <code class="computeroutput"><span class="special">[\^]</span></code>
298 will match either a literal '\' or a '^'.
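<p>
  The difference between an escape outside and inside a character set can be seen in a short sketch. It assumes <code class="computeroutput">std::regex</code> with the <code class="computeroutput">basic</code> flag as a stand-in for Boost.Regex, and the helper name <code class="computeroutput">escape_match</code> is hypothetical. Only escapes of the special characters listed above are exercised here.
</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when the whole of `text` matches the POSIX-Basic `pattern`.
bool escape_match(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern, std::regex::basic);
    return std::regex_match(text, re);
}

void check_escapes()
{
    // An unescaped '.' is a wildcard; "\." is an ordinary dot.
    assert(escape_match("a.b", "axb"));
    assert(escape_match(R"(a\.b)", "a.b"));
    assert(!escape_match(R"(a\.b)", "axb"));

    // "\*" is an ordinary asterisk rather than a repeat.
    assert(escape_match(R"(a\*)", "a*"));
    assert(!escape_match(R"(a\*)", "aa"));

    // Inside a set the backslash is just another member of the set.
    assert(escape_match(R"([\^])", "\\"));
    assert(escape_match(R"([\^])", "^"));
    assert(!escape_match(R"([\^])", "a"));
}
```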
301 <a name="boost_regex.syntax.basic_syntax.h16"></a>
302 <span class="phrase"><a name="boost_regex.syntax.basic_syntax.what_gets_matched"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.what_gets_matched">What
306 When there is more than one way to match a regular expression, the "best"
307 possible match is obtained using the <a class="link" href="leftmost_longest_rule.html" title="The Leftmost Longest Rule">leftmost-longest
311 <a name="boost_regex.syntax.basic_syntax.h17"></a>
312 <span class="phrase"><a name="boost_regex.syntax.basic_syntax.variations"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.variations">Variations</a>
314 <a name="boost_regex.grep_syntax"></a><h5>
315 <a name="boost_regex.syntax.basic_syntax.h18"></a>
316 <span class="phrase"><a name="boost_regex.syntax.basic_syntax.grep"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.grep">Grep</a>
319 When an expression is compiled with the flag <code class="computeroutput"><span class="identifier">grep</span></code>
320 set, then the expression is treated as a newline-separated list of <a class="link" href="basic_syntax.html#boost_regex.posix_basic">POSIX-Basic expressions</a>; a match
321 is found if any of the expressions in the list match, for example:
323 <pre class="programlisting"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e</span><span class="special">(</span><span class="string">"abc\ndef"</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">grep</span><span class="special">);</span>
326 will match either of the <a class="link" href="basic_syntax.html#boost_regex.posix_basic">POSIX-Basic
327 expressions</a> "abc" or "def".
330 As its name suggests, this behavior is consistent with the Unix utility grep.
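<p>
  A sketch of the newline-separated list in use. It assumes the standard library's <code class="computeroutput">std::regex</code>, which also provides a <code class="computeroutput">grep</code> flag, as a stand-in for Boost.Regex; the function name <code class="computeroutput">grep_match</code> is hypothetical.
</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Whole-string match against the two-entry list "abc" / "def".
bool grep_match(const std::string& text)
{
    static const std::regex e("abc\ndef", std::regex::grep);
    return std::regex_match(text, e);
}

void check_grep()
{
    assert(grep_match("abc"));   // first expression in the list
    assert(grep_match("def"));   // second expression in the list
    assert(!grep_match("abd"));  // neither expression matches
}
```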
333 <a name="boost_regex.syntax.basic_syntax.h19"></a>
334 <span class="phrase"><a name="boost_regex.syntax.basic_syntax.emacs"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.emacs">emacs</a>
337 In addition to the <a class="link" href="basic_syntax.html#boost_regex.posix_basic">POSIX-Basic features</a>
338 the following characters are also special:
340 <div class="informaltable"><table class="table">
366 repeats the preceding atom one or more times.
378 repeats the preceding atom zero or one times.
390 A non-greedy version of *.
402 A non-greedy version of +.
414 A non-greedy version of ?.
421 And the following escape sequences are also recognised:
423 <div class="informaltable"><table class="table">
449 specifies an alternative.
461 is a non-marking grouping construct: it allows you to lexically group
462 something without producing an extra sub-expression.
474 matches any word character.
486 matches any non-word character.
498 matches any character in the syntax group x, the following emacs
499 groupings are supported: 's', ' ', '_', 'w', '.', ')', '(', '"',
500 '\'', '>' and '<'. Refer to the emacs docs for details.
512 matches any character not in the syntax grouping x.
524 These are not supported.
536 matches zero characters only at the start of a buffer (or string
549 matches zero characters only at the end of a buffer (or string
562 matches zero characters at a word boundary.
574 matches zero characters, not at a word boundary.
586 matches zero characters only at the start of a word.
598 matches zero characters only at the end of a word.
605 Finally, you should note that emacs style regular expressions are matched
606 according to the <a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.what_gets_matched">Perl
607 "depth first search" rules</a>. Emacs expressions are matched
608 this way because they contain Perl-like extensions that do not interact
609 well with the <a class="link" href="leftmost_longest_rule.html" title="The Leftmost Longest Rule">POSIX-style
610 leftmost-longest rule</a>.
613 <a name="boost_regex.syntax.basic_syntax.h20"></a>
614 <span class="phrase"><a name="boost_regex.syntax.basic_syntax.options"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.options">Options</a>
617 There are a <a class="link" href="../ref/syntax_option_type/syntax_option_type_basic.html" title="Options for POSIX Basic Regular Expressions">variety
618 of flags</a> that may be combined with the <code class="computeroutput"><span class="identifier">basic</span></code>
619 and <code class="computeroutput"><span class="identifier">grep</span></code> options when constructing
620 the regular expression. In particular, note that the <a class="link" href="../ref/syntax_option_type/syntax_option_type_basic.html" title="Options for POSIX Basic Regular Expressions"><code class="computeroutput"><span class="identifier">newline_alt</span></code>, <code class="computeroutput"><span class="identifier">no_char_classes</span></code>,
621 <code class="computeroutput"><span class="identifier">no_intervals</span></code>, <code class="computeroutput"><span class="identifier">bk_plus_qm</span></code>
622 and <code class="computeroutput"><span class="identifier">bk_vbar</span></code></a> options
623 all alter the syntax, while the <a class="link" href="../ref/syntax_option_type/syntax_option_type_basic.html" title="Options for POSIX Basic Regular Expressions"><code class="computeroutput"><span class="identifier">collate</span></code> and <code class="computeroutput"><span class="identifier">icase</span></code>
624 options</a> modify how the case and locale sensitivity are to be applied.
627 <a name="boost_regex.syntax.basic_syntax.h21"></a>
628 <span class="phrase"><a name="boost_regex.syntax.basic_syntax.references"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.references">References</a>
631 <a href="http://www.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap09.html" target="_top">IEEE
632 Std 1003.1-2001, Portable Operating System Interface (POSIX), Base Definitions
633 and Headers, Section 9, Regular Expressions (FWD.1).</a>
636 <a href="http://www.opengroup.org/onlinepubs/000095399/utilities/grep.html" target="_top">IEEE
637 Std 1003.1-2001, Portable Operating System Interface (POSIX), Shells and
638 Utilities, Section 4, Utilities, grep (FWD.1).</a>
641 <a href="http://www.gnu.org/software/emacs/" target="_top">Emacs Version 21.3.</a>
644 <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
645 <td align="left"></td>
646 <td align="right"><div class="copyright-footer">Copyright © 1998-2013 John Maddock<p>
647 Distributed under the Boost Software License, Version 1.0. (See accompanying
648 file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)