<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
<title>Understanding Marked Sub-Expressions and Captures</title>
<link rel="stylesheet" href="../../../../../doc/src/boostbook.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
<link rel="home" href="../index.html" title="Boost.Regex 5.1.3">
<link rel="up" href="../index.html" title="Boost.Regex 5.1.3">
<link rel="prev" href="unicode.html" title="Unicode and Boost.Regex">
<link rel="next" href="partial_matches.html" title="Partial Matches">
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="boost_regex.captures"></a><a class="link" href="captures.html" title="Understanding Marked Sub-Expressions and Captures">Understanding Marked Sub-Expressions and Captures</a>
</h2></div></div></div>
<p>
      Captures are the iterator ranges that are "captured" by marked sub-expressions
      as a regular expression is matched. Each marked sub-expression can produce
      more than one capture if it is matched more than once. This document explains
      how captures and marked sub-expressions are represented and accessed in
      Boost.Regex.
</p>
<h4>
<a name="boost_regex.captures.h0"></a>
<span class="phrase"><a name="boost_regex.captures.marked_sub_expressions"></a></span><a class="link" href="captures.html#boost_regex.captures.marked_sub_expressions">Marked Sub-Expressions</a>
</h4>
<p>
      Every time a Perl regular expression contains a parenthesis group <code class="computeroutput"><span class="special">()</span></code>, it produces an extra field, known as a
      marked sub-expression. For example, the expression:
</p>
<pre class="programlisting">(\w+)\W+(\w+)</pre>
<p>
      has two marked sub-expressions (known as $1 and $2 respectively). In addition,
      the complete match is known as $&amp;, everything before the match as
      $`, and everything after the match as $'. So if the above expression is searched
      for within <code class="computeroutput"><span class="string">"@abc def--"</span></code>,
      then we obtain:
</p>
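<div class="informaltable"><table class="table">
<colgroup><col><col></colgroup>
<thead><tr><th><p>Sub-expression</p></th><th><p>Text found</p></th></tr></thead>
<tbody>
<tr><td><p>$`</p></td><td><p>"@"</p></td></tr>
<tr><td><p>$&amp;</p></td><td><p>"abc def"</p></td></tr>
<tr><td><p>$1</p></td><td><p>"abc"</p></td></tr>
<tr><td><p>$2</p></td><td><p>"def"</p></td></tr>
<tr><td><p>$'</p></td><td><p>"--"</p></td></tr>
</tbody>
</table></div>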
<p>
      In Boost.Regex all of these are accessible via the <a class="link" href="ref/match_results.html" title="match_results"><code class="computeroutput"><span class="identifier">match_results</span></code></a> class, which is filled
      in when calling one of the regular expression matching algorithms (<a class="link" href="ref/regex_search.html" title="regex_search"><code class="computeroutput"><span class="identifier">regex_search</span></code></a>, <a class="link" href="ref/regex_match.html" title="regex_match"><code class="computeroutput"><span class="identifier">regex_match</span></code></a>, or <a class="link" href="ref/regex_iterator.html" title="regex_iterator"><code class="computeroutput"><span class="identifier">regex_iterator</span></code></a>). So given:
</p>
<pre class="programlisting"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">match_results</span><span class="special">&lt;</span><span class="identifier">IteratorType</span><span class="special">&gt;</span> <span class="identifier">m</span><span class="special">;</span>
</pre>
<p>
      The Perl and Boost.Regex equivalents are as follows:
</p>
<div class="informaltable"><table class="table">
<colgroup><col><col></colgroup>
<thead><tr><th><p>Perl</p></th><th><p>Boost.Regex</p></th></tr></thead>
<tbody>
<tr><td><p>$`</p></td><td><p>
<code class="computeroutput"><span class="identifier">m</span><span class="special">.</span><span class="identifier">prefix</span><span class="special">()</span></code>
</p></td></tr>
<tr><td><p>$&amp;</p></td><td><p>
<code class="computeroutput"><span class="identifier">m</span><span class="special">[</span><span class="number">0</span><span class="special">]</span></code>
</p></td></tr>
<tr><td><p>$n</p></td><td><p>
<code class="computeroutput"><span class="identifier">m</span><span class="special">[</span><span class="identifier">n</span><span class="special">]</span></code>
</p></td></tr>
<tr><td><p>$'</p></td><td><p>
<code class="computeroutput"><span class="identifier">m</span><span class="special">.</span><span class="identifier">suffix</span><span class="special">()</span></code>
</p></td></tr>
</tbody>
</table></div>
<p>
      In Boost.Regex each sub-expression match is represented by a <a class="link" href="ref/sub_match.html" title="sub_match"><code class="computeroutput"><span class="identifier">sub_match</span></code></a> object. This is basically
      just a pair of iterators denoting the start and end position of the sub-expression
      match, but some additional operators are provided so that objects of
      type <a class="link" href="ref/sub_match.html" title="sub_match"><code class="computeroutput"><span class="identifier">sub_match</span></code></a>
      behave a lot like a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">basic_string</span></code>: for example, they are implicitly
      convertible to a <code class="computeroutput"><span class="identifier">basic_string</span></code>,
      and they can be compared to a string, added to a string, or streamed out to an
      output stream.
</p>
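<p>
      As a rough illustration of the correspondence above, the following sketch
      splits a text into the pieces that Perl would call $`, $&amp;, $1, $2 and $'.
      It is only a sketch: the <code class="computeroutput">MatchParts</code> struct and the
      <code class="computeroutput">split_around_match</code> helper are hypothetical names, and it
      uses the standard library's <code class="computeroutput">std::regex</code> in place of
      Boost.Regex so that it builds without Boost. With Boost.Regex the same member
      calls (<code class="computeroutput">prefix()</code>, <code class="computeroutput">suffix()</code>,
      <code class="computeroutput">operator[]</code> and <code class="computeroutput">str()</code>) would be
      made on <code class="computeroutput">boost::smatch</code>.
</p>

```cpp
#include <regex>
#include <string>

// Hypothetical holder for the pieces of a search result, named after the
// Perl variables they mirror.
struct MatchParts
{
    bool found = false;
    std::string prefix;  // $`  : everything before the match
    std::string whole;   // $&  : the complete match
    std::string first;   // $1
    std::string second;  // $2
    std::string suffix;  // $'  : everything after the match
};

// Hypothetical helper. std::regex stands in for boost::regex here (an
// assumption made so the sketch needs only the standard library).
MatchParts split_around_match(const std::string& text)
{
    static const std::regex e("(\\w+)\\W+(\\w+)");
    std::smatch m;
    MatchParts parts;
    if (std::regex_search(text, m, e))
    {
        parts.found  = true;
        parts.prefix = m.prefix().str();
        parts.whole  = m[0].str();
        parts.first  = m[1].str();
        parts.second = m[2].str();
        parts.suffix = m.suffix().str();
    }
    return parts;
}
```

<p>
      Calling <code class="computeroutput">split_around_match("@abc def--")</code> fills the
      fields with the values discussed in this section; a text containing no word
      characters leaves <code class="computeroutput">found</code> set to false.
</p>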
<h4>
<a name="boost_regex.captures.h1"></a>
<span class="phrase"><a name="boost_regex.captures.unmatched_sub_expressions"></a></span><a class="link" href="captures.html#boost_regex.captures.unmatched_sub_expressions">Unmatched Sub-Expressions</a>
</h4>
<p>
      When a regular expression match is found, not all of the marked
      sub-expressions need to have participated in the match. For example, the expression:
</p>
<pre class="programlisting">(abc)|(def)</pre>
<p>
      can match with either $1 or $2 participating, but never both at the same time. In Boost.Regex
      you can determine which sub-expressions matched by accessing the <code class="computeroutput"><span class="identifier">sub_match</span><span class="special">::</span><span class="identifier">matched</span></code> data member.
</p>
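<p>
      A minimal sketch of that check follows. The helper name
      <code class="computeroutput">participating_group</code> is hypothetical, and
      <code class="computeroutput">std::regex</code> is used in place of Boost.Regex purely so
      the example builds without Boost; <code class="computeroutput">std::sub_match</code>
      has a <code class="computeroutput">matched</code> member with the same meaning.
</p>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Returns the index (1 or 2) of the marked sub-expression that took part in
// a match of "(abc)|(def)" against the whole text, or 0 if there is no match.
// std::regex stands in for boost::regex here (an assumption for illustration).
int participating_group(const std::string& text)
{
    static const std::regex e("(abc)|(def)");
    std::smatch m;
    if (!std::regex_match(text, m, e))
        return 0;
    // m[0] is the whole match, so start at the first marked sub-expression.
    for (std::size_t i = 1; i < m.size(); ++i)
    {
        if (m[i].matched)
            return static_cast<int>(i);
    }
    return 0;
}
```

<p>
      For instance, <code class="computeroutput">participating_group("def")</code> reports
      sub-expression 2, while sub-expression 1 is left with
      <code class="computeroutput">matched == false</code>.
</p>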
<h4>
<a name="boost_regex.captures.h2"></a>
<span class="phrase"><a name="boost_regex.captures.repeated_captures"></a></span><a class="link" href="captures.html#boost_regex.captures.repeated_captures">Repeated Captures</a>
</h4>
<p>
      When a marked sub-expression is repeated, the sub-expression gets "captured"
      multiple times; normally, however, only the final capture is available. For example,
      if the expression:
</p>
<pre class="programlisting">(?:(\w+)\W+)+</pre>
<p>
      is searched for within the text:
</p>
<pre class="programlisting">one fine day</pre>
<p>
      then $1 will contain the string "fine", and all the previous captures
      will have been forgotten. (The final word "day" is not captured, because nothing
      follows it for \W+ to match.)
</p>
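<p>
      The "only the final capture survives" behaviour can be observed with a short
      sketch. The helper name <code class="computeroutput">last_capture</code> is hypothetical,
      <code class="computeroutput">std::regex</code> is used in place of Boost.Regex so that the
      example builds without Boost, and the sample inputs are assumed to end in a
      non-word character so that the last word is followed by something for \W+ to match.
</p>

```cpp
#include <regex>
#include <string>

// Searches text for "(?:(\w+)\W+)+" and returns what $1 holds afterwards,
// i.e. the capture from the last repetition only. Returns an empty string
// if nothing matches. std::regex stands in for boost::regex here (an
// assumption for illustration).
std::string last_capture(const std::string& text)
{
    static const std::regex e("(?:(\\w+)\\W+)+");
    std::smatch m;
    if (!std::regex_search(text, m, e))
        return std::string();
    return m[1].str();
}
```

<p>
      With the input <code class="computeroutput">"one fine day."</code> the group is entered
      three times, yet only the text from the third repetition is left in $1.
</p>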
<p>
      However, Boost.Regex has an experimental feature that allows all the capture
      information to be retained. It is accessed via either the <code class="computeroutput"><span class="identifier">match_results</span><span class="special">::</span><span class="identifier">captures</span></code> member function or the <code class="computeroutput"><span class="identifier">sub_match</span><span class="special">::</span><span class="identifier">captures</span></code> member function. These functions
      return a container holding the sequence of all the captures obtained during
      the regular expression matching. The following example program shows how this
      information may be used:
</p>
<pre class="programlisting"><span class="preprocessor">#include</span> <span class="special">&lt;</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">regex</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">&gt;</span>
<span class="preprocessor">#include</span> <span class="special">&lt;</span><span class="identifier">iostream</span><span class="special">&gt;</span>

<span class="keyword">void</span> <span class="identifier">print_captures</span><span class="special">(</span><span class="keyword">const</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">&amp;</span> <span class="identifier">regx</span><span class="special">,</span> <span class="keyword">const</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">&amp;</span> <span class="identifier">text</span><span class="special">)</span>
<span class="special">{</span>
   <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e</span><span class="special">(</span><span class="identifier">regx</span><span class="special">);</span>
   <span class="identifier">boost</span><span class="special">::</span><span class="identifier">smatch</span> <span class="identifier">what</span><span class="special">;</span>
   <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">"Expression: \""</span> <span class="special">&lt;&lt;</span> <span class="identifier">regx</span> <span class="special">&lt;&lt;</span> <span class="string">"\"\n"</span><span class="special">;</span>
   <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">"Text: \""</span> <span class="special">&lt;&lt;</span> <span class="identifier">text</span> <span class="special">&lt;&lt;</span> <span class="string">"\"\n"</span><span class="special">;</span>
   <span class="keyword">if</span><span class="special">(</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex_match</span><span class="special">(</span><span class="identifier">text</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">e</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">match_extra</span><span class="special">))</span>
   <span class="special">{</span>
      <span class="keyword">unsigned</span> <span class="identifier">i</span><span class="special">,</span> <span class="identifier">j</span><span class="special">;</span>
      <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">"** Match found **\n Sub-Expressions:\n"</span><span class="special">;</span>
      <span class="keyword">for</span><span class="special">(</span><span class="identifier">i</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> <span class="identifier">i</span> <span class="special">&lt;</span> <span class="identifier">what</span><span class="special">.</span><span class="identifier">size</span><span class="special">();</span> <span class="special">++</span><span class="identifier">i</span><span class="special">)</span>
         <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">" $"</span> <span class="special">&lt;&lt;</span> <span class="identifier">i</span> <span class="special">&lt;&lt;</span> <span class="string">" = \""</span> <span class="special">&lt;&lt;</span> <span class="identifier">what</span><span class="special">[</span><span class="identifier">i</span><span class="special">]</span> <span class="special">&lt;&lt;</span> <span class="string">"\"\n"</span><span class="special">;</span>
      <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">" Captures:\n"</span><span class="special">;</span>
      <span class="keyword">for</span><span class="special">(</span><span class="identifier">i</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> <span class="identifier">i</span> <span class="special">&lt;</span> <span class="identifier">what</span><span class="special">.</span><span class="identifier">size</span><span class="special">();</span> <span class="special">++</span><span class="identifier">i</span><span class="special">)</span>
      <span class="special">{</span>
         <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">" $"</span> <span class="special">&lt;&lt;</span> <span class="identifier">i</span> <span class="special">&lt;&lt;</span> <span class="string">" = {"</span><span class="special">;</span>
         <span class="keyword">for</span><span class="special">(</span><span class="identifier">j</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> <span class="identifier">j</span> <span class="special">&lt;</span> <span class="identifier">what</span><span class="special">.</span><span class="identifier">captures</span><span class="special">(</span><span class="identifier">i</span><span class="special">).</span><span class="identifier">size</span><span class="special">();</span> <span class="special">++</span><span class="identifier">j</span><span class="special">)</span>
         <span class="special">{</span>
            <span class="keyword">if</span><span class="special">(</span><span class="identifier">j</span><span class="special">)</span>
               <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">", "</span><span class="special">;</span>
            <span class="keyword">else</span>
               <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">" "</span><span class="special">;</span>
            <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">"\""</span> <span class="special">&lt;&lt;</span> <span class="identifier">what</span><span class="special">.</span><span class="identifier">captures</span><span class="special">(</span><span class="identifier">i</span><span class="special">)[</span><span class="identifier">j</span><span class="special">]</span> <span class="special">&lt;&lt;</span> <span class="string">"\""</span><span class="special">;</span>
         <span class="special">}</span>
         <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">" }\n"</span><span class="special">;</span>
      <span class="special">}</span>
   <span class="special">}</span>
   <span class="keyword">else</span>
   <span class="special">{</span>
      <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">"** No Match found **\n"</span><span class="special">;</span>
   <span class="special">}</span>
<span class="special">}</span>

<span class="keyword">int</span> <span class="identifier">main</span><span class="special">(</span><span class="keyword">int</span> <span class="special">,</span> <span class="keyword">char</span><span class="special">*</span> <span class="special">[])</span>
<span class="special">{</span>
   <span class="identifier">print_captures</span><span class="special">(</span><span class="string">"(([[:lower:]]+)|([[:upper:]]+))+"</span><span class="special">,</span> <span class="string">"aBBcccDDDDDeeeeeeee"</span><span class="special">);</span>
   <span class="identifier">print_captures</span><span class="special">(</span><span class="string">"(.*)bar|(.*)bah"</span><span class="special">,</span> <span class="string">"abcbar"</span><span class="special">);</span>
   <span class="identifier">print_captures</span><span class="special">(</span><span class="string">"(.*)bar|(.*)bah"</span><span class="special">,</span> <span class="string">"abcbah"</span><span class="special">);</span>
   <span class="identifier">print_captures</span><span class="special">(</span><span class="string">"^(?:(\\w+)|(?&gt;\\W+))*$"</span><span class="special">,</span>
      <span class="string">"now is the time for all good men to come to the aid of the party"</span><span class="special">);</span>
   <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span>
<span class="special">}</span>
</pre>
<p>
      This produces the following output:
</p>
<pre class="programlisting">Expression: "(([[:lower:]]+)|([[:upper:]]+))+"
Text: "aBBcccDDDDDeeeeeeee"
** Match found **
 Sub-Expressions:
 $0 = "aBBcccDDDDDeeeeeeee"
 $1 = "eeeeeeee"
 $2 = "eeeeeeee"
 $3 = "DDDDD"
 Captures:
 $0 = { "aBBcccDDDDDeeeeeeee" }
 $1 = { "a", "BB", "ccc", "DDDDD", "eeeeeeee" }
 $2 = { "a", "ccc", "eeeeeeee" }
 $3 = { "BB", "DDDDD" }
Expression: "(.*)bar|(.*)bah"
Text: "abcbar"
** Match found **
 Sub-Expressions:
 $0 = "abcbar"
 $1 = "abc"
 $2 = ""
 Captures:
 $0 = { "abcbar" }
 $1 = { "abc" }
 $2 = { }
Expression: "(.*)bar|(.*)bah"
Text: "abcbah"
** Match found **
 Sub-Expressions:
 $0 = "abcbah"
 $1 = ""
 $2 = "abc"
 Captures:
 $0 = { "abcbah" }
 $1 = { }
 $2 = { "abc" }
Expression: "^(?:(\w+)|(?&gt;\W+))*$"
Text: "now is the time for all good men to come to the aid of the party"
** Match found **
 Sub-Expressions:
 $0 = "now is the time for all good men to come to the aid of the party"
 $1 = "party"
 Captures:
 $0 = { "now is the time for all good men to come to the aid of the party" }
 $1 = { "now", "is", "the", "time", "for", "all", "good", "men", "to",
        "come", "to", "the", "aid", "of", "the", "party" }
</pre>
<p>
      Unfortunately, enabling this feature has an impact on performance even if you
      do not use it, and a much bigger impact if you do. Therefore, to use
      this feature you need to:
</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
          Define BOOST_REGEX_MATCH_EXTRA for all translation units, including the
          library source (the best way to do this is to uncomment this define in
          boost/regex/user.hpp and then rebuild everything).
</li>
<li class="listitem">
          Pass the match_extra flag to the particular algorithms where you actually
          need the captures information (regex_search, regex_match, or regex_iterator).
</li>
</ul></div>
<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
<td align="left"></td>
<td align="right"><div class="copyright-footer">Copyright &#169; 1998-2013 John Maddock<p>
        Distributed under the Boost Software License, Version 1.0. (See accompanying
        file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
      </p>
</div></td>
</tr></table>