3 <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
4 <title>String Sort</title>
5 <link rel="stylesheet" href="../../../../../../../../doc/src/boostbook.css" type="text/css">
6 <meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
7 <link rel="home" href="../../../../index.html" title="Boost.Sort">
8 <link rel="up" href="../sort_hpp.html" title="Spreadsort">
9 <link rel="prev" href="float_sort.html" title="Float Sort">
10 <link rel="next" href="rationale.html" title="Rationale">
12 <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
13 <table cellpadding="2" width="100%"><tr>
14 <td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../../../boost.png"></td>
15 <td align="center"><a href="../../../../../../../../index.html">Home</a></td>
16 <td align="center"><a href="../../../../../../../../libs/libraries.htm">Libraries</a></td>
17 <td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
18 <td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
19 <td align="center"><a href="../../../../../../../../more/index.htm">More</a></td>
22 <div class="spirit-nav">
23 <a accesskey="p" href="float_sort.html"><img src="../../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../sort_hpp.html"><img src="../../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../../index.html"><img src="../../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="rationale.html"><img src="../../../../../../../../doc/src/images/next.png" alt="Next"></a>
26 <div class="titlepage"><div><div><h5 class="title">
27 <a name="sort.single_thread.spreadsort.sort_hpp.string_sort"></a><a class="link" href="string_sort.html" title="String Sort">String
29 </h5></div></div></div>
31 <code class="literal"><code class="computeroutput"><a class="link" href="../../../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>
32 is a hybrid radix-based/comparison-based algorithm that sorts strings
33 of characters (or arrays of binary data) in ascending order.
36 The simplest version (no functors) sorts strings of items that can cast
37 to an unsigned data type (such as an unsigned char), have a &lt;
38 operator, have a size function, and have a data() function that returns
39 a pointer to an array of characters, such as a std::string. The functor
40 version can sort any data type that has a strict weak ordering, via templating,
41 but requires definitions of a get_char (acts like x[offset] on a string
42 or a byte array), get_length (returns length of the string being sorted),
43 and a comparison functor. Individual characters returned by get_char
44 must support the != operator and have an unsigned value that defines
45 their lexicographical order.
48 This algorithm is not efficient for character types larger than 2 bytes
49 each, and is optimized for one-byte character strings. For this reason,
50 <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
51 will be called instead if the character type is of size > 2.
54 <code class="literal"><code class="computeroutput"><a class="link" href="../../../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>
55 has a special optimization for identical substrings. This adds some overhead
56 on random data, but identical substrings are common in real strings.
59 reverse_string_sort sorts strings in reverse (decending) order, but is
60 otherwise identical. <code class="literal"><code class="computeroutput"><a class="link" href="../../../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>
61 is sufficiently flexible that it should sort any data type that <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
62 can, assuming the user provides appropriate functors that index into
66 <a href="../../../../../../doc/graph/windows_string_sort.htm" target="_top">Windows String Sort</a>
69 <a href="../../../../../../doc/graph/osx_string_sort.htm" target="_top">OSX String Sort</a>
72 <div class="titlepage"><div><div><h6 class="title">
73 <a name="sort.single_thread.spreadsort.sort_hpp.string_sort.stringsort_examples"></a><a class="link" href="string_sort.html#sort.single_thread.spreadsort.sort_hpp.string_sort.stringsort_examples" title="String Sort Examples">String
75 </h6></div></div></div>
77 See <a href="../../../../../../example/stringfunctorsample.cpp" target="_top">stringfunctorsample.cpp</a>
78 for an example of how to sort structs using a string key and all functors:
80 <pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">lessthan</span> <span class="special">{</span>
81 <span class="keyword">inline</span> <span class="keyword">bool</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">x</span><span class="special">,</span> <span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">y</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span>
82 <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">a</span> <span class="special"><</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">a</span><span class="special">;</span>
83 <span class="special">}</span>
84 <span class="special">};</span>
86 <pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">bracket</span> <span class="special">{</span>
87 <span class="keyword">inline</span> <span class="keyword">unsigned</span> <span class="keyword">char</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">x</span><span class="special">,</span> <span class="identifier">size_t</span> <span class="identifier">offset</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span>
88 <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">a</span><span class="special">[</span><span class="identifier">offset</span><span class="special">];</span>
89 <span class="special">}</span>
90 <span class="special">};</span>
92 <pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">getsize</span> <span class="special">{</span>
93 <span class="keyword">inline</span> <span class="identifier">size_t</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">x</span><span class="special">)</span> <span class="keyword">const</span><span class="special">{</span> <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">a</span><span class="special">.</span><span class="identifier">size</span><span class="special">();</span> <span class="special">}</span>
94 <span class="special">};</span>
97 and these functors are used thus:
99 <pre class="programlisting"><span class="identifier">string_sort</span><span class="special">(</span><span class="identifier">array</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">array</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">bracket</span><span class="special">(),</span> <span class="identifier">getsize</span><span class="special">(),</span> <span class="identifier">lessthan</span><span class="special">());</span>
102 See <a href="../../../../../../example/generalizedstruct.cpp" target="_top">generalizedstruct.cpp</a>
103 for a working example of a generalized approach to sort structs by
104 a sequence of integer, float, and multiple string keys using string_sort:
106 <pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">DATA_TYPE</span> <span class="special">{</span>
107 <span class="identifier">time_t</span> <span class="identifier">birth</span><span class="special">;</span>
108 <span class="keyword">float</span> <span class="identifier">net_worth</span><span class="special">;</span>
109 <span class="identifier">string</span> <span class="identifier">first_name</span><span class="special">;</span>
110 <span class="identifier">string</span> <span class="identifier">last_name</span><span class="special">;</span>
111 <span class="special">};</span>
113 <span class="keyword">static</span> <span class="keyword">const</span> <span class="keyword">int</span> <span class="identifier">birth_size</span> <span class="special">=</span> <span class="keyword">sizeof</span><span class="special">(</span><span class="identifier">time_t</span><span class="special">);</span>
114 <span class="keyword">static</span> <span class="keyword">const</span> <span class="keyword">int</span> <span class="identifier">first_name_offset</span> <span class="special">=</span> <span class="identifier">birth_size</span> <span class="special">+</span> <span class="keyword">sizeof</span><span class="special">(</span><span class="keyword">float</span><span class="special">);</span>
115 <span class="keyword">static</span> <span class="keyword">const</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">uint64_t</span> <span class="identifier">base_mask</span> <span class="special">=</span> <span class="number">0xff</span><span class="special">;</span>
117 <span class="keyword">struct</span> <span class="identifier">lessthan</span> <span class="special">{</span>
118 <span class="keyword">inline</span> <span class="keyword">bool</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">x</span><span class="special">,</span> <span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">y</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span>
119 <span class="keyword">if</span> <span class="special">(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">birth</span> <span class="special">!=</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">birth</span><span class="special">)</span> <span class="special">{</span>
120 <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">birth</span> <span class="special"><</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">birth</span><span class="special">;</span>
121 <span class="special">}</span>
122 <span class="keyword">if</span> <span class="special">(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">net_worth</span> <span class="special">!=</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">net_worth</span><span class="special">)</span> <span class="special">{</span>
123 <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">net_worth</span> <span class="special"><</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">net_worth</span><span class="special">;</span>
124 <span class="special">}</span>
125 <span class="keyword">if</span> <span class="special">(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">first_name</span> <span class="special">!=</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">first_name</span><span class="special">)</span> <span class="special">{</span>
126 <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">first_name</span> <span class="special"><</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">first_name</span><span class="special">;</span>
127 <span class="special">}</span>
128 <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">last_name</span> <span class="special"><</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">last_name</span><span class="special">;</span>
129 <span class="special">}</span>
130 <span class="special">};</span>
132 <span class="keyword">struct</span> <span class="identifier">bracket</span> <span class="special">{</span>
133 <span class="keyword">inline</span> <span class="keyword">unsigned</span> <span class="keyword">char</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">x</span><span class="special">,</span> <span class="identifier">size_t</span> <span class="identifier">offset</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span>
134 <span class="comment">// Sort date as a signed int, returning the appropriate byte.</span>
135 <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special"><</span> <span class="identifier">birth_size</span><span class="special">)</span> <span class="special">{</span>
136 <span class="keyword">const</span> <span class="keyword">int</span> <span class="identifier">bit_shift</span> <span class="special">=</span> <span class="number">8</span> <span class="special">*</span> <span class="special">(</span><span class="identifier">birth_size</span> <span class="special">-</span> <span class="identifier">offset</span> <span class="special">-</span> <span class="number">1</span><span class="special">);</span>
137 <span class="keyword">unsigned</span> <span class="keyword">char</span> <span class="identifier">result</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">birth</span> <span class="special">&</span> <span class="special">(</span><span class="identifier">base_mask</span> <span class="special"><<</span> <span class="identifier">bit_shift</span><span class="special">))</span> <span class="special">>></span> <span class="identifier">bit_shift</span><span class="special">;</span>
138 <span class="comment">// Handling the sign bit. Unnecessary if the data is always positive.</span>
139 <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special">==</span> <span class="number">0</span><span class="special">)</span> <span class="special">{</span>
140 <span class="keyword">return</span> <span class="identifier">result</span> <span class="special">^</span> <span class="number">128</span><span class="special">;</span>
141 <span class="special">}</span>
143 <span class="keyword">return</span> <span class="identifier">result</span><span class="special">;</span>
144 <span class="special">}</span>
146 <span class="comment">// Sort a signed float. This requires reversing the order of negatives</span>
147 <span class="comment">// because of the way floats are represented in bits.</span>
148 <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special"><</span> <span class="identifier">first_name_offset</span><span class="special">)</span> <span class="special">{</span>
149 <span class="keyword">const</span> <span class="keyword">int</span> <span class="identifier">bit_shift</span> <span class="special">=</span> <span class="number">8</span> <span class="special">*</span> <span class="special">(</span><span class="identifier">first_name_offset</span> <span class="special">-</span> <span class="identifier">offset</span> <span class="special">-</span> <span class="number">1</span><span class="special">);</span>
150 <span class="keyword">unsigned</span> <span class="identifier">key</span> <span class="special">=</span> <span class="identifier">float_mem_cast</span><span class="special"><</span><span class="keyword">float</span><span class="special">,</span> <span class="keyword">unsigned</span><span class="special">>(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">net_worth</span><span class="special">);</span>
151 <span class="keyword">unsigned</span> <span class="keyword">char</span> <span class="identifier">result</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">key</span> <span class="special">&</span> <span class="special">(</span><span class="identifier">base_mask</span> <span class="special"><<</span> <span class="identifier">bit_shift</span><span class="special">))</span> <span class="special">>></span> <span class="identifier">bit_shift</span><span class="special">;</span>
152 <span class="comment">// Handling the sign.</span>
153 <span class="keyword">if</span> <span class="special">(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">net_worth</span> <span class="special"><</span> <span class="number">0</span><span class="special">)</span> <span class="special">{</span>
154 <span class="keyword">return</span> <span class="number">255</span> <span class="special">-</span> <span class="identifier">result</span><span class="special">;</span>
155 <span class="special">}</span>
156 <span class="comment">// Increasing positives so they are higher than negatives.</span>
157 <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special">==</span> <span class="identifier">birth_size</span><span class="special">)</span> <span class="special">{</span>
158 <span class="keyword">return</span> <span class="number">128</span> <span class="special">+</span> <span class="identifier">result</span><span class="special">;</span>
159 <span class="special">}</span>
161 <span class="keyword">return</span> <span class="identifier">result</span><span class="special">;</span>
162 <span class="special">}</span>
164 <span class="comment">// Sort a string that is before the end. This approach supports embedded</span>
165 <span class="comment">// nulls. If embedded nulls are not required, then just delete the "* 2"</span>
166 <span class="comment">// and the inside of the following if just becomes:</span>
167 <span class="comment">// return x.first_name[offset - first_name_offset];</span>
168 <span class="keyword">const</span> <span class="keyword">unsigned</span> <span class="identifier">first_name_end_offset</span> <span class="special">=</span>
169 <span class="identifier">first_name_offset</span> <span class="special">+</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">first_name</span><span class="special">.</span><span class="identifier">size</span><span class="special">()</span> <span class="special">*</span> <span class="number">2</span><span class="special">;</span>
170 <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special"><</span> <span class="identifier">first_name_end_offset</span><span class="special">)</span> <span class="special">{</span>
171 <span class="keyword">int</span> <span class="identifier">char_offset</span> <span class="special">=</span> <span class="identifier">offset</span> <span class="special">-</span> <span class="identifier">first_name_offset</span><span class="special">;</span>
172 <span class="comment">// This signals that the string continues.</span>
173 <span class="keyword">if</span> <span class="special">(!(</span><span class="identifier">char_offset</span> <span class="special">&</span> <span class="number">1</span><span class="special">))</span> <span class="special">{</span>
174 <span class="keyword">return</span> <span class="number">1</span><span class="special">;</span>
175 <span class="special">}</span>
176 <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">first_name</span><span class="special">[</span><span class="identifier">char_offset</span> <span class="special">>></span> <span class="number">1</span><span class="special">];</span>
177 <span class="special">}</span>
179 <span class="comment">// This signals that the string has ended, so that shorter strings come</span>
180 <span class="comment">// before longer ones.</span>
181 <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special">==</span> <span class="identifier">first_name_end_offset</span><span class="special">)</span> <span class="special">{</span>
182 <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span>
183 <span class="special">}</span>
185 <span class="comment">// The final string needs no special consideration.</span>
186 <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">last_name</span><span class="special">[</span><span class="identifier">offset</span> <span class="special">-</span> <span class="identifier">first_name_end_offset</span> <span class="special">-</span> <span class="number">1</span><span class="special">];</span>
187 <span class="special">}</span>
188 <span class="special">};</span>
190 <span class="keyword">struct</span> <span class="identifier">getsize</span> <span class="special">{</span>
191 <span class="keyword">inline</span> <span class="identifier">size_t</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">x</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span>
192 <span class="keyword">return</span> <span class="identifier">first_name_offset</span> <span class="special">+</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">first_name</span><span class="special">.</span><span class="identifier">size</span><span class="special">()</span> <span class="special">*</span> <span class="number">2</span> <span class="special">+</span> <span class="number">1</span> <span class="special">+</span>
193 <span class="identifier">x</span><span class="special">.</span><span class="identifier">last_name</span><span class="special">.</span><span class="identifier">size</span><span class="special">();</span>
194 <span class="special">}</span>
195 <span class="special">};</span>
197 <pre class="programlisting"><span class="identifier">string_sort</span><span class="special">(</span><span class="identifier">array</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">array</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">bracket</span><span class="special">(),</span> <span class="identifier">getsize</span><span class="special">(),</span> <span class="identifier">lessthan</span><span class="special">());</span>
203 <a href="../../../../../../example/stringsample.cpp" target="_top">String sort.</a>
206 <a href="../../../../../../example/reversestringsample.cpp" target="_top">Reverse string sort.</a>
209 <a href="../../../../../../example/wstringsample.cpp" target="_top">Wide character string
213 <a href="../../../../../../example/caseinsensitive.cpp" target="_top">Case insensitive string
217 <a href="../../../../../../example/charstringsample.cpp" target="_top">Sort structs using
218 a string key and indexing functors.</a>
221 <a href="../../../../../../example/reversestringfunctorsample.cpp" target="_top">Sort structs
222 using a string keynd all functors in reverse order.</a>
226 <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
227 <td align="left"></td>
228 <td align="right"><div class="copyright-footer">Copyright © 2014-2017 Steven
229 Ross, Francisco Tapia, Orson Peters<p>
230 Distributed under the <a href="http://boost.org/LICENSE_1_0.txt" target="_top">Boost
231 Software License, Version 1.0</a>.
236 <div class="spirit-nav">
237 <a accesskey="p" href="float_sort.html"><img src="../../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../sort_hpp.html"><img src="../../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../../index.html"><img src="../../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="rationale.html"><img src="../../../../../../../../doc/src/images/next.png" alt="Next"></a>