1 <html>
2 <head>
3 <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
4 <title>Boyer-Moore-Horspool Search</title>
5 <link rel="stylesheet" href="../../../../../../doc/src/boostbook.css" type="text/css">
6 <meta name="generator" content="DocBook XSL Stylesheets V1.76.1">
7 <link rel="home" href="../../index.html" title="The Boost Algorithm Library">
8 <link rel="up" href="../../algorithm/Searching.html" title="Searching Algorithms">
9 <link rel="prev" href="../../algorithm/Searching.html" title="Searching Algorithms">
10 <link rel="next" href="KnuthMorrisPratt.html" title="Knuth-Morris-Pratt Search">
11 </head>
12 <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
13 <table cellpadding="2" width="100%"><tr>
14 <td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../boost.png"></td>
15 <td align="center"><a href="../../../../../../index.html">Home</a></td>
16 <td align="center"><a href="../../../../../../libs/libraries.htm">Libraries</a></td>
17 <td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
18 <td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
19 <td align="center"><a href="../../../../../../more/index.htm">More</a></td>
20 </tr></table>
21 <hr>
22 <div class="spirit-nav">
23 <a accesskey="p" href="../../algorithm/Searching.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../../algorithm/Searching.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="KnuthMorrisPratt.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
24 </div>
25 <div class="section">
26 <div class="titlepage"><div><div><h3 class="title">
27 <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool"></a><a class="link" href="BoyerMooreHorspool.html" title="Boyer-Moore-Horspool Search">Boyer-Moore-Horspool
28       Search</a>
29 </h3></div></div></div>
30 <h5>
31 <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h0"></a>
32         <span><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.overview"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.overview">Overview</a>
33       </h5>
34 <p>
35         The header file 'boyer_moore_horspool.hpp' contains an implementation
36         of the Boyer-Moore-Horspool algorithm for searching sequences of values.
37       </p>
38 <p>
39         The Boyer-Moore-Horspool search algorithm was published by Nigel Horspool
40         in 1980. It is a refinement of the Boyer-Moore algorithm that trades time
41         for space: it uses less space for internal tables than Boyer-Moore, but has
42         poorer worst-case performance.
43       </p>
44 <p>
45         Unlike <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">search</span></code>, the Boyer-Moore-Horspool
46         algorithm cannot be used with comparison predicates.
47       </p>
48 <h5>
49 <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h1"></a>
50         <span><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.interface"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.interface">Interface</a>
51       </h5>
52 <p>
53         Nomenclature: I refer to the sequence being searched for as the "pattern",
54         and the sequence being searched in as the "corpus".
55       </p>
56 <p>
57         For flexibility, the Boyer-Moore-Horspool algorithm has two interfaces:
58         an object-based interface and a procedural one. The object-based interface
59         builds the tables in the constructor, and uses operator () to perform the
60         search. The procedural interface builds the table and does the search all
61         in one step. If you are going to be searching for the same pattern in multiple
62         corpora, then you should use the object interface, and only build the tables
63         once.
64       </p>
65 <p>
66         Here is the object interface:
67 </p>
68 <pre class="programlisting"><span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">typename</span> <span class="identifier">patIter</span><span class="special">&gt;</span>
69 <span class="keyword">class</span> <span class="identifier">boyer_moore_horspool</span> <span class="special">{</span>
70 <span class="keyword">public</span><span class="special">:</span>
71     <span class="identifier">boyer_moore_horspool</span> <span class="special">(</span> <span class="identifier">patIter</span> <span class="identifier">first</span><span class="special">,</span> <span class="identifier">patIter</span> <span class="identifier">last</span> <span class="special">);</span>
72     <span class="special">~</span><span class="identifier">boyer_moore_horspool</span> <span class="special">();</span>
73
74     <span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">typename</span> <span class="identifier">corpusIter</span><span class="special">&gt;</span>
75     <span class="identifier">corpusIter</span> <span class="keyword">operator</span> <span class="special">()</span> <span class="special">(</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_first</span><span class="special">,</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_last</span> <span class="special">);</span>
76     <span class="special">};</span>
77 </pre>
78 <p>
79       </p>
80 <p>
81         and here is the corresponding procedural interface:
82       </p>
83 <p>
84 </p>
85 <pre class="programlisting"><span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">typename</span> <span class="identifier">patIter</span><span class="special">,</span> <span class="keyword">typename</span> <span class="identifier">corpusIter</span><span class="special">&gt;</span>
86 <span class="identifier">corpusIter</span> <span class="identifier">boyer_moore_horspool_search</span> <span class="special">(</span>
87         <span class="identifier">corpusIter</span> <span class="identifier">corpus_first</span><span class="special">,</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_last</span><span class="special">,</span>
88         <span class="identifier">patIter</span> <span class="identifier">pat_first</span><span class="special">,</span> <span class="identifier">patIter</span> <span class="identifier">pat_last</span> <span class="special">);</span>
89 </pre>
90 <p>
91       </p>
92 <p>
93         Each of the functions is passed two pairs of iterators. The first two define
94         the corpus and the second two define the pattern. Note that the two pairs
95         need not be of the same type, but they do need to "point" at the
96         same type. In other words, <code class="computeroutput"><span class="identifier">patIter</span><span class="special">::</span><span class="identifier">value_type</span></code>
97         and <code class="computeroutput"><span class="identifier">corpusIter</span><span class="special">::</span><span class="identifier">value_type</span></code> need to be the same type.
98       </p>
99 <p>
100         The return value of the function is an iterator pointing to the start of
101         the pattern in the corpus. If the pattern is not found, it returns the end
102         of the corpus (<code class="computeroutput"><span class="identifier">corpus_last</span></code>).
103       </p>
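<p>
        As a rough illustration of the calling convention described above, the
        following self-contained sketch mimics the procedural interface for
        sequences of <code class="computeroutput">char</code>. It is not the library's implementation:
        the name <code class="computeroutput">horspool_search_sketch</code>, the fixed 256-entry table, and
        the handling of an empty pattern are assumptions made for this example.
      </p>

```cpp
#include <array>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical stand-in for the procedural interface, limited to char
// sequences so that a fixed 256-entry skip table is enough.
// Argument order follows the text: corpus range first, pattern range second.
template <typename patIter, typename corpusIter>
corpusIter horspool_search_sketch(corpusIter corpus_first, corpusIter corpus_last,
                                  patIter pat_first, patIter pat_last)
{
    typedef typename std::iterator_traits<corpusIter>::difference_type diff_t;
    const diff_t m = std::distance(pat_first, pat_last);
    const diff_t n = std::distance(corpus_first, corpus_last);
    if (m == 0) return corpus_first;  // assumption: empty pattern matches at the start
    if (n < m)  return corpus_last;   // pattern longer than corpus: not found

    // Skip table: shift distance for the corpus character aligned with the
    // last pattern position. Characters absent from the pattern shift by m.
    std::array<diff_t, 256> skip;
    skip.fill(m);
    for (diff_t i = 0; i < m - 1; ++i)
        skip[static_cast<unsigned char>(pat_first[i])] = m - 1 - i;

    diff_t pos = 0;
    while (pos <= n - m) {
        // Compare the current window against the pattern, right to left.
        diff_t j = m - 1;
        while (j >= 0 && corpus_first[pos + j] == pat_first[j]) --j;
        if (j < 0) return corpus_first + pos;  // start of the match
        pos += skip[static_cast<unsigned char>(corpus_first[pos + m - 1])];
    }
    return corpus_last;  // not found
}
```

<p>
        The two iterator types may differ, as long as both refer to the same value
        type: the sketch accepts, for instance, <code class="computeroutput">std::string</code> iterators for the
        corpus and a pair of <code class="computeroutput">const char*</code> for the pattern.
      </p>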
104 <h5>
105 <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h2"></a>
106         <span><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.performance"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.performance">Performance</a>
107       </h5>
108 <p>
109         The execution time of the Boyer-Moore-Horspool algorithm is, on average, linear in the
110         size of the string being searched; it can have a significantly lower constant
111         factor than many other search algorithms: it doesn't need to check every
112         character of the string to be searched, but rather skips over some of them.
113         Generally the algorithm gets faster as the pattern being searched for becomes
114         longer. Its efficiency derives from the fact that with each unsuccessful
115         attempt to find a match between the search string and the text it is searching,
116         it uses the information gained from that attempt to rule out as many positions
117         of the text as possible where the string cannot match.
118       </p>
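<p>
        The skipping behaviour described above can be made visible by counting how
        many alignments of the pattern are actually examined. The sketch below is
        an illustration only; <code class="computeroutput">make_skip_table</code> and <code class="computeroutput">count_alignments</code>
        are hypothetical helpers, not part of the library, and they assume 8-bit
        characters.
      </p>

```cpp
#include <array>
#include <cstddef>
#include <string>

// Builds the Horspool shift table for a char pattern (hypothetical helper).
// Every character defaults to a shift of the full pattern length; characters
// occurring before the last position get the distance to the pattern's end.
std::array<std::size_t, 256> make_skip_table(const std::string& pattern)
{
    std::array<std::size_t, 256> skip;
    skip.fill(pattern.size());
    for (std::size_t i = 0; i + 1 < pattern.size(); ++i)
        skip[static_cast<unsigned char>(pattern[i])] = pattern.size() - 1 - i;
    return skip;
}

// Counts how many pattern alignments are examined before the first match is
// found or the corpus is exhausted (hypothetical helper).
std::size_t count_alignments(const std::string& corpus, const std::string& pattern)
{
    const std::size_t m = pattern.size();
    const std::size_t n = corpus.size();
    if (m == 0 || n < m) return 0;

    const std::array<std::size_t, 256> skip = make_skip_table(pattern);
    std::size_t tried = 0;
    for (std::size_t pos = 0; pos + m <= n; ) {
        ++tried;
        if (corpus.compare(pos, m, pattern) == 0) break;  // match found
        // Shift by the table entry for the character under the pattern's end.
        pos += skip[static_cast<unsigned char>(corpus[pos + m - 1])];
    }
    return tried;
}
```

<p>
        For the corpus "the quick brown fox" and the pattern "brown", this sketch
        examines 3 alignments, where a scan that advances one position at a time
        would examine 11 before reaching the match.
      </p>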
119 <h5>
120 <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h3"></a>
121         <span><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.memory_use"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.memory_use">Memory
122         Use</a>
123       </h5>
124 <p>
125         The algorithm uses an internal table that has one entry for each member of the
126         "alphabet" in the pattern. For (8-bit) character types, this table
127         contains 256 entries.
128       </p>
129 <h5>
130 <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h4"></a>
131         <span><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.complexity"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.complexity">Complexity</a>
132       </h5>
133 <p>
134         The worst-case performance is <span class="emphasis"><em>O(m x n)</em></span>, where <span class="emphasis"><em>m</em></span>
135         is the length of the pattern and <span class="emphasis"><em>n</em></span> is the length of
136         the corpus. The average time is <span class="emphasis"><em>O(n)</em></span>. The best case
137         performance is sub-linear, and is, in fact, identical to Boyer-Moore, but
138         the initialization is quicker and the internal loop is simpler than Boyer-Moore.
139       </p>
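<p>
        To make the gap between the worst and best cases concrete, the following
        sketch counts individual character comparisons. The helper
        <code class="computeroutput">count_comparisons</code> is hypothetical and assumes 8-bit characters and
        right-to-left comparison within each alignment.
      </p>

```cpp
#include <array>
#include <cstddef>
#include <string>

// Counts character comparisons made by a Horspool-style search until the
// first match or the end of the corpus (hypothetical helper).
std::size_t count_comparisons(const std::string& corpus, const std::string& pattern)
{
    const std::size_t m = pattern.size();
    const std::size_t n = corpus.size();
    if (m == 0 || n < m) return 0;

    std::array<std::size_t, 256> skip;
    skip.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i)
        skip[static_cast<unsigned char>(pattern[i])] = m - 1 - i;

    std::size_t comparisons = 0;
    for (std::size_t pos = 0; pos + m <= n; ) {
        std::size_t j = m;  // number of pattern characters still to verify
        while (j > 0) {
            ++comparisons;
            if (corpus[pos + j - 1] != pattern[j - 1]) break;
            --j;
        }
        if (j == 0) break;  // every character matched
        pos += skip[static_cast<unsigned char>(corpus[pos + m - 1])];
    }
    return comparisons;
}
```

<p>
        With the pattern "baaa" and a corpus of twenty 'a' characters, every
        alignment matches three characters before failing on the fourth and then
        shifts by one, giving 17 x 4 = 68 comparisons, in line with <span class="emphasis"><em>O(m x n)</em></span>.
        With a corpus of twenty 'x' characters, each alignment fails on its first
        comparison and shifts by four, giving only 5 comparisons.
      </p>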
140 <h5>
141 <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h5"></a>
142         <span><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.exception_safety"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.exception_safety">Exception
143         Safety</a>
144       </h5>
145 <p>
146         Both the object-oriented and procedural versions of the Boyer-Moore-Horspool
147         algorithm take their parameters by value and do not use any information other
148         than what is passed in. Therefore, both interfaces provide the strong exception
149         guarantee.
150       </p>
151 <h5>
152 <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h6"></a>
153         <span><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.notes"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.notes">Notes</a>
154       </h5>
155 <div class="itemizedlist"><ul class="itemizedlist" type="disc">
156 <li class="listitem">
157             When using the object-based interface, the pattern must remain unchanged
158             during the searches; i.e., from the time the object is constructed
159             until the final call to operator () returns.
160           </li>
161 <li class="listitem">
162             The Boyer-Moore-Horspool algorithm requires random-access iterators for
163             both the pattern and the corpus.
164           </li>
165 </ul></div>
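<p>
        Both notes can be seen in a small sketch of an object-style searcher. The
        class name <code class="computeroutput">horspool_searcher_sketch</code> and its members are hypothetical,
        and the table is fixed at 256 entries for <code class="computeroutput">char</code> data. The sketch keeps
        only iterators into the pattern rather than a copy, which is one way the
        requirement that the pattern remain unchanged can arise, and it indexes both
        sequences directly, which is why random-access iterators are needed.
      </p>

```cpp
#include <array>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical object-style searcher: the skip table is built once in the
// constructor and reused by every call to operator().
template <typename patIter>
class horspool_searcher_sketch {
    typedef typename std::iterator_traits<patIter>::difference_type diff_t;

public:
    horspool_searcher_sketch(patIter first, patIter last)
        : pat_first_(first), pat_last_(last), m_(std::distance(first, last))
    {
        skip_.fill(m_);
        for (diff_t i = 0; i + 1 < m_; ++i)
            skip_[static_cast<unsigned char>(first[i])] = m_ - 1 - i;
    }

    template <typename corpusIter>
    corpusIter operator()(corpusIter corpus_first, corpusIter corpus_last) const
    {
        const diff_t n = std::distance(corpus_first, corpus_last);
        if (m_ == 0) return corpus_first;  // assumption: empty pattern matches at the start
        for (diff_t pos = 0; pos + m_ <= n; ) {
            // The pattern is re-read through the stored iterators here, so it
            // must not have changed since construction.
            diff_t j = m_ - 1;
            while (j >= 0 && corpus_first[pos + j] == pat_first_[j]) --j;
            if (j < 0) return corpus_first + pos;
            pos += skip_[static_cast<unsigned char>(corpus_first[pos + m_ - 1])];
        }
        return corpus_last;
    }

private:
    patIter pat_first_, pat_last_;   // iterators into the caller's pattern
    diff_t m_;                       // pattern length
    std::array<diff_t, 256> skip_;   // shift table, built once
};
```

<p>
        A single searcher constructed from the pattern "needle" can then be applied
        to any number of corpora without rebuilding its table.
      </p>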
166 <h5>
167 <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h7"></a>
168         <span><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.customization_points"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.customization_points">Customization
169         points</a>
170       </h5>
171 <p>
172         The Boyer-Moore-Horspool object takes a traits template parameter which enables
173         the caller to customize how the precomputed table is stored. This table,
174         called the skip table, contains (logically) one entry for every possible
175         value that the pattern can contain. When searching 8-bit character data,
176         this table contains 256 elements. The traits class defines the table to be
177         used.
178       </p>
179 <p>
180         The default traits class uses a <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">array</span></code>
181         for small 'alphabets' and a <code class="computeroutput"><span class="identifier">tr1</span><span class="special">::</span><span class="identifier">unordered_map</span></code>
182         for larger ones. The array-based skip table gives excellent performance,
183         but could be prohibitively large when the 'alphabet' of elements to be searched
184         grows. The unordered_map based version only grows as the number of unique
185         elements in the pattern, but makes many more heap allocations, and gives
186         slower lookup performance.
187       </p>
188 <p>
189         To use a different skip table, you should define your own skip table object
190         and your own traits class, and use them to instantiate the Boyer-Moore-Horspool
191         object. The interface to these objects is not yet documented (TBD).
192       </p>
193 </div>
194 <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
195 <td align="left"></td>
196 <td align="right"><div class="copyright-footer">Copyright &#169; 2010-2012 Marshall Clow<p>
197         Distributed under the Boost Software License, Version 1.0. (See accompanying
198         file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
199       </p>
200 </div></td>
201 </tr></table>
202 <hr>
203 <div class="spirit-nav">
204 <a accesskey="p" href="../../algorithm/Searching.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../../algorithm/Searching.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="KnuthMorrisPratt.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
205 </div>
206 </body>
207 </html>