Imported Upstream version 1.72.0
[platform/upstream/boost.git] / libs / sort / doc / html / sort / single_thread / spreadsort / sort_hpp / string_sort.html
1 <html>
2 <head>
3 <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
4 <title>String Sort</title>
5 <link rel="stylesheet" href="../../../../../../../../doc/src/boostbook.css" type="text/css">
6 <meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
7 <link rel="home" href="../../../../index.html" title="Boost.Sort">
8 <link rel="up" href="../sort_hpp.html" title="Spreadsort">
9 <link rel="prev" href="float_sort.html" title="Float Sort">
10 <link rel="next" href="rationale.html" title="Rationale">
11 </head>
12 <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
13 <table cellpadding="2" width="100%"><tr>
14 <td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../../../boost.png"></td>
15 <td align="center"><a href="../../../../../../../../index.html">Home</a></td>
16 <td align="center"><a href="../../../../../../../../libs/libraries.htm">Libraries</a></td>
17 <td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
18 <td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
19 <td align="center"><a href="../../../../../../../../more/index.htm">More</a></td>
20 </tr></table>
21 <hr>
22 <div class="spirit-nav">
23 <a accesskey="p" href="float_sort.html"><img src="../../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../sort_hpp.html"><img src="../../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../../index.html"><img src="../../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="rationale.html"><img src="../../../../../../../../doc/src/images/next.png" alt="Next"></a>
24 </div>
25 <div class="section">
26 <div class="titlepage"><div><div><h5 class="title">
27 <a name="sort.single_thread.spreadsort.sort_hpp.string_sort"></a><a class="link" href="string_sort.html" title="String Sort">String
28           Sort</a>
29 </h5></div></div></div>
30 <p>
31             <code class="literal"><code class="computeroutput"><a class="link" href="../../../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>
32             is a hybrid radix-based/comparison-based algorithm that sorts strings
33             of characters (or arrays of binary data) in ascending order.
34           </p>
35 <p>
36             The simplest version (no functors) sorts strings of items that can cast
37             to an unsigned data type (such as an unsigned char), have a &amp;lt;
38             operator, have a size function, and have a data() function that returns
39             a pointer to an array of characters, such as a std::string. The functor
40             version can sort any data type that has a strict weak ordering, via templating,
41             but requires definitions of a get_char (acts like x[offset] on a string
42             or a byte array), get_length (returns length of the string being sorted),
43             and a comparison functor. Individual characters returned by get_char
44             must support the != operator and have an unsigned value that defines
45             their lexicographical order.
46           </p>
47 <p>
48             This algorithm is not efficient for character types larger than 2 bytes
49             each, and is optimized for one-byte character strings. For this reason,
50             <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
51             will be called instead if the character type is of size &gt; 2.
52           </p>
53 <p>
54             <code class="literal"><code class="computeroutput"><a class="link" href="../../../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>
55             has a special optimization for identical substrings. This adds some overhead
56             on random data, but identical substrings are common in real strings.
57           </p>
58 <p>
59             reverse_string_sort sorts strings in reverse (decending) order, but is
60             otherwise identical. <code class="literal"><code class="computeroutput"><a class="link" href="../../../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>
61             is sufficiently flexible that it should sort any data type that <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
62             can, assuming the user provides appropriate functors that index into
63             a key.
64           </p>
65 <p>
66             <a href="../../../../../../doc/graph/windows_string_sort.htm" target="_top">Windows String Sort</a>
67           </p>
68 <p>
69             <a href="../../../../../../doc/graph/osx_string_sort.htm" target="_top">OSX String Sort</a>
70           </p>
71 <div class="section">
72 <div class="titlepage"><div><div><h6 class="title">
73 <a name="sort.single_thread.spreadsort.sort_hpp.string_sort.stringsort_examples"></a><a class="link" href="string_sort.html#sort.single_thread.spreadsort.sort_hpp.string_sort.stringsort_examples" title="String Sort Examples">String
74             Sort Examples</a>
75 </h6></div></div></div>
76 <p>
77               See <a href="../../../../../../example/stringfunctorsample.cpp" target="_top">stringfunctorsample.cpp</a>
78               for an example of how to sort structs using a string key and all functors:
79             </p>
80 <pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">lessthan</span> <span class="special">{</span>
81   <span class="keyword">inline</span> <span class="keyword">bool</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&amp;</span><span class="identifier">x</span><span class="special">,</span> <span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&amp;</span><span class="identifier">y</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span>
82     <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">a</span> <span class="special">&lt;</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">a</span><span class="special">;</span>
83   <span class="special">}</span>
84 <span class="special">};</span>
85 </pre>
86 <pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">bracket</span> <span class="special">{</span>
87   <span class="keyword">inline</span> <span class="keyword">unsigned</span> <span class="keyword">char</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&amp;</span><span class="identifier">x</span><span class="special">,</span> <span class="identifier">size_t</span> <span class="identifier">offset</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span>
88     <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">a</span><span class="special">[</span><span class="identifier">offset</span><span class="special">];</span>
89   <span class="special">}</span>
90 <span class="special">};</span>
91 </pre>
92 <pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">getsize</span> <span class="special">{</span>
93   <span class="keyword">inline</span> <span class="identifier">size_t</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&amp;</span><span class="identifier">x</span><span class="special">)</span> <span class="keyword">const</span><span class="special">{</span> <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">a</span><span class="special">.</span><span class="identifier">size</span><span class="special">();</span> <span class="special">}</span>
94 <span class="special">};</span>
95 </pre>
96 <p>
97               and these functors are used thus:
98             </p>
99 <pre class="programlisting"><span class="identifier">string_sort</span><span class="special">(</span><span class="identifier">array</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">array</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">bracket</span><span class="special">(),</span> <span class="identifier">getsize</span><span class="special">(),</span> <span class="identifier">lessthan</span><span class="special">());</span>
100 </pre>
101 <p>
102               See <a href="../../../../../../example/generalizedstruct.cpp" target="_top">generalizedstruct.cpp</a>
103               for a working example of a generalized approach to sort structs by
104               a sequence of integer, float, and multiple string keys using string_sort:
105             </p>
106 <pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">DATA_TYPE</span> <span class="special">{</span>
107   <span class="identifier">time_t</span> <span class="identifier">birth</span><span class="special">;</span>
108   <span class="keyword">float</span> <span class="identifier">net_worth</span><span class="special">;</span>
109   <span class="identifier">string</span> <span class="identifier">first_name</span><span class="special">;</span>
110   <span class="identifier">string</span> <span class="identifier">last_name</span><span class="special">;</span>
111 <span class="special">};</span>
112
113 <span class="keyword">static</span> <span class="keyword">const</span> <span class="keyword">int</span> <span class="identifier">birth_size</span> <span class="special">=</span> <span class="keyword">sizeof</span><span class="special">(</span><span class="identifier">time_t</span><span class="special">);</span>
114 <span class="keyword">static</span> <span class="keyword">const</span> <span class="keyword">int</span> <span class="identifier">first_name_offset</span> <span class="special">=</span> <span class="identifier">birth_size</span> <span class="special">+</span> <span class="keyword">sizeof</span><span class="special">(</span><span class="keyword">float</span><span class="special">);</span>
115 <span class="keyword">static</span> <span class="keyword">const</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">uint64_t</span> <span class="identifier">base_mask</span> <span class="special">=</span> <span class="number">0xff</span><span class="special">;</span>
116
117 <span class="keyword">struct</span> <span class="identifier">lessthan</span> <span class="special">{</span>
118   <span class="keyword">inline</span> <span class="keyword">bool</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&amp;</span><span class="identifier">x</span><span class="special">,</span> <span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&amp;</span><span class="identifier">y</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span>
119     <span class="keyword">if</span> <span class="special">(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">birth</span> <span class="special">!=</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">birth</span><span class="special">)</span> <span class="special">{</span>
120       <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">birth</span> <span class="special">&lt;</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">birth</span><span class="special">;</span>
121     <span class="special">}</span>
122     <span class="keyword">if</span> <span class="special">(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">net_worth</span> <span class="special">!=</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">net_worth</span><span class="special">)</span> <span class="special">{</span>
123       <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">net_worth</span> <span class="special">&lt;</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">net_worth</span><span class="special">;</span>
124     <span class="special">}</span>
125     <span class="keyword">if</span> <span class="special">(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">first_name</span> <span class="special">!=</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">first_name</span><span class="special">)</span> <span class="special">{</span>
126       <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">first_name</span> <span class="special">&lt;</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">first_name</span><span class="special">;</span>
127     <span class="special">}</span>
128     <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">last_name</span> <span class="special">&lt;</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">last_name</span><span class="special">;</span>
129   <span class="special">}</span>
130 <span class="special">};</span>
131
132 <span class="keyword">struct</span> <span class="identifier">bracket</span> <span class="special">{</span>
133   <span class="keyword">inline</span> <span class="keyword">unsigned</span> <span class="keyword">char</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&amp;</span><span class="identifier">x</span><span class="special">,</span> <span class="identifier">size_t</span> <span class="identifier">offset</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span>
134     <span class="comment">// Sort date as a signed int, returning the appropriate byte.</span>
135     <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special">&lt;</span> <span class="identifier">birth_size</span><span class="special">)</span> <span class="special">{</span>
136       <span class="keyword">const</span> <span class="keyword">int</span> <span class="identifier">bit_shift</span> <span class="special">=</span> <span class="number">8</span> <span class="special">*</span> <span class="special">(</span><span class="identifier">birth_size</span> <span class="special">-</span> <span class="identifier">offset</span> <span class="special">-</span> <span class="number">1</span><span class="special">);</span>
137       <span class="keyword">unsigned</span> <span class="keyword">char</span> <span class="identifier">result</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">birth</span> <span class="special">&amp;</span> <span class="special">(</span><span class="identifier">base_mask</span> <span class="special">&lt;&lt;</span> <span class="identifier">bit_shift</span><span class="special">))</span> <span class="special">&gt;&gt;</span> <span class="identifier">bit_shift</span><span class="special">;</span>
138       <span class="comment">// Handling the sign bit.  Unnecessary if the data is always positive.</span>
139       <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special">==</span> <span class="number">0</span><span class="special">)</span> <span class="special">{</span>
140         <span class="keyword">return</span> <span class="identifier">result</span> <span class="special">^</span> <span class="number">128</span><span class="special">;</span>
141       <span class="special">}</span>
142
143       <span class="keyword">return</span> <span class="identifier">result</span><span class="special">;</span>
144     <span class="special">}</span>
145
146     <span class="comment">// Sort a signed float.  This requires reversing the order of negatives</span>
147     <span class="comment">// because of the way floats are represented in bits.</span>
148     <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special">&lt;</span> <span class="identifier">first_name_offset</span><span class="special">)</span> <span class="special">{</span>
149       <span class="keyword">const</span> <span class="keyword">int</span> <span class="identifier">bit_shift</span> <span class="special">=</span> <span class="number">8</span> <span class="special">*</span> <span class="special">(</span><span class="identifier">first_name_offset</span> <span class="special">-</span> <span class="identifier">offset</span> <span class="special">-</span> <span class="number">1</span><span class="special">);</span>
150       <span class="keyword">unsigned</span> <span class="identifier">key</span> <span class="special">=</span> <span class="identifier">float_mem_cast</span><span class="special">&lt;</span><span class="keyword">float</span><span class="special">,</span> <span class="keyword">unsigned</span><span class="special">&gt;(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">net_worth</span><span class="special">);</span>
151       <span class="keyword">unsigned</span> <span class="keyword">char</span> <span class="identifier">result</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">key</span> <span class="special">&amp;</span> <span class="special">(</span><span class="identifier">base_mask</span> <span class="special">&lt;&lt;</span> <span class="identifier">bit_shift</span><span class="special">))</span> <span class="special">&gt;&gt;</span> <span class="identifier">bit_shift</span><span class="special">;</span>
152       <span class="comment">// Handling the sign.</span>
153       <span class="keyword">if</span> <span class="special">(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">net_worth</span> <span class="special">&lt;</span> <span class="number">0</span><span class="special">)</span> <span class="special">{</span>
154         <span class="keyword">return</span> <span class="number">255</span> <span class="special">-</span> <span class="identifier">result</span><span class="special">;</span>
155       <span class="special">}</span>
156       <span class="comment">// Increasing positives so they are higher than negatives.</span>
157       <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special">==</span> <span class="identifier">birth_size</span><span class="special">)</span> <span class="special">{</span>
158         <span class="keyword">return</span> <span class="number">128</span> <span class="special">+</span> <span class="identifier">result</span><span class="special">;</span>
159       <span class="special">}</span>
160
161       <span class="keyword">return</span> <span class="identifier">result</span><span class="special">;</span>
162     <span class="special">}</span>
163
164     <span class="comment">// Sort a string that is before the end.  This approach supports embedded</span>
165     <span class="comment">// nulls.  If embedded nulls are not required, then just delete the "* 2"</span>
166     <span class="comment">// and the inside of the following if just becomes:</span>
167     <span class="comment">// return x.first_name[offset - first_name_offset];</span>
168     <span class="keyword">const</span> <span class="keyword">unsigned</span> <span class="identifier">first_name_end_offset</span> <span class="special">=</span>
169       <span class="identifier">first_name_offset</span> <span class="special">+</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">first_name</span><span class="special">.</span><span class="identifier">size</span><span class="special">()</span> <span class="special">*</span> <span class="number">2</span><span class="special">;</span>
170     <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special">&lt;</span> <span class="identifier">first_name_end_offset</span><span class="special">)</span> <span class="special">{</span>
171       <span class="keyword">int</span> <span class="identifier">char_offset</span> <span class="special">=</span> <span class="identifier">offset</span> <span class="special">-</span> <span class="identifier">first_name_offset</span><span class="special">;</span>
172       <span class="comment">// This signals that the string continues.</span>
173       <span class="keyword">if</span> <span class="special">(!(</span><span class="identifier">char_offset</span> <span class="special">&amp;</span> <span class="number">1</span><span class="special">))</span> <span class="special">{</span>
174         <span class="keyword">return</span> <span class="number">1</span><span class="special">;</span>
175       <span class="special">}</span>
176       <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">first_name</span><span class="special">[</span><span class="identifier">char_offset</span> <span class="special">&gt;&gt;</span> <span class="number">1</span><span class="special">];</span>
177     <span class="special">}</span>
178
179     <span class="comment">// This signals that the string has ended, so that shorter strings come</span>
180     <span class="comment">// before longer ones.</span>
181     <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special">==</span> <span class="identifier">first_name_end_offset</span><span class="special">)</span> <span class="special">{</span>
182       <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span>
183     <span class="special">}</span>
184
185     <span class="comment">// The final string needs no special consideration.</span>
186     <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">last_name</span><span class="special">[</span><span class="identifier">offset</span> <span class="special">-</span> <span class="identifier">first_name_end_offset</span> <span class="special">-</span> <span class="number">1</span><span class="special">];</span>
187   <span class="special">}</span>
188 <span class="special">};</span>
189
190 <span class="keyword">struct</span> <span class="identifier">getsize</span> <span class="special">{</span>
191   <span class="keyword">inline</span> <span class="identifier">size_t</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&amp;</span><span class="identifier">x</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span>
192     <span class="keyword">return</span> <span class="identifier">first_name_offset</span> <span class="special">+</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">first_name</span><span class="special">.</span><span class="identifier">size</span><span class="special">()</span> <span class="special">*</span> <span class="number">2</span> <span class="special">+</span> <span class="number">1</span> <span class="special">+</span>
193       <span class="identifier">x</span><span class="special">.</span><span class="identifier">last_name</span><span class="special">.</span><span class="identifier">size</span><span class="special">();</span>
194   <span class="special">}</span>
195 <span class="special">};</span>
196 </pre>
197 <pre class="programlisting"><span class="identifier">string_sort</span><span class="special">(</span><span class="identifier">array</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">array</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">bracket</span><span class="special">(),</span> <span class="identifier">getsize</span><span class="special">(),</span> <span class="identifier">lessthan</span><span class="special">());</span>
198 </pre>
199 <p>
200               Other examples:
201             </p>
202 <p>
203               <a href="../../../../../../example/stringsample.cpp" target="_top">String sort.</a>
204             </p>
205 <p>
206               <a href="../../../../../../example/reversestringsample.cpp" target="_top">Reverse string sort.</a>
207             </p>
208 <p>
209               <a href="../../../../../../example/wstringsample.cpp" target="_top">Wide character string
210               sort.</a>
211             </p>
212 <p>
213               <a href="../../../../../../example/caseinsensitive.cpp" target="_top">Case insensitive string
214               sort.</a>
215             </p>
216 <p>
217               <a href="../../../../../../example/charstringsample.cpp" target="_top">Sort structs using
218               a string key and indexing functors.</a>
219             </p>
220 <p>
221               <a href="../../../../../../example/reversestringfunctorsample.cpp" target="_top">Sort structs
222               using a string keynd all functors in reverse order.</a>
223             </p>
224 </div>
225 </div>
226 <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
227 <td align="left"></td>
228 <td align="right"><div class="copyright-footer">Copyright &#169; 2014-2017 Steven
229       Ross, Francisco Tapia, Orson Peters<p>
230         Distributed under the <a href="http://boost.org/LICENSE_1_0.txt" target="_top">Boost
231         Software License, Version 1.0</a>.
232       </p>
233 </div></td>
234 </tr></table>
235 <hr>
236 <div class="spirit-nav">
237 <a accesskey="p" href="float_sort.html"><img src="../../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../sort_hpp.html"><img src="../../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../../index.html"><img src="../../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="rationale.html"><img src="../../../../../../../../doc/src/images/next.png" alt="Next"></a>
238 </div>
239 </body>
240 </html>