Imported Upstream version 1.72.0
[platform/upstream/boost.git] / libs / sort / doc / html / sort / single_thread / spreadsort.html
1 <html>
2 <head>
3 <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
4 <title>2.1.-Spreadsort</title>
5 <link rel="stylesheet" href="../../../../../../doc/src/boostbook.css" type="text/css">
6 <meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
7 <link rel="home" href="../../index.html" title="Boost.Sort">
8 <link rel="up" href="../single_thread.html" title="2.- Single Thread Algorithms">
9 <link rel="prev" href="../single_thread.html" title="2.- Single Thread Algorithms">
10 <link rel="next" href="spreadsort/sort_hpp.html" title="Spreadsort">
11 </head>
12 <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
13 <table cellpadding="2" width="100%"><tr>
14 <td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../boost.png"></td>
15 <td align="center"><a href="../../../../../../index.html">Home</a></td>
16 <td align="center"><a href="../../../../../../libs/libraries.htm">Libraries</a></td>
17 <td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
18 <td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
19 <td align="center"><a href="../../../../../../more/index.htm">More</a></td>
20 </tr></table>
21 <hr>
22 <div class="spirit-nav">
23 <a accesskey="p" href="../single_thread.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../single_thread.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="spreadsort/sort_hpp.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
24 </div>
25 <div class="section">
26 <div class="titlepage"><div><div><h3 class="title">
27 <a name="sort.single_thread.spreadsort"></a><a class="link" href="spreadsort.html" title="2.1.-Spreadsort">2.1.-Spreadsort</a>
28 </h3></div></div></div>
29 <div class="section">
30 <div class="titlepage"><div><div><h4 class="title">
31 <a name="sort.single_thread.spreadsort.overview"></a><a class="link" href="spreadsort.html#sort.single_thread.spreadsort.overview" title="Overview">Overview</a>
32 </h4></div></div></div>
33 <div class="section">
34 <div class="titlepage"><div><div><h5 class="title">
35 <a name="sort.single_thread.spreadsort.overview.intro"></a><a class="link" href="spreadsort.html#sort.single_thread.spreadsort.overview.intro" title="Introduction">Introduction</a>
36 </h5></div></div></div>
37 <p>
38             Spreadsort combines generic implementations of multiple high-speed sorting
39             algorithms that outperform those in the C++ standard in both average
40             and worst case performance when there are over 1000 elements in the list
41             to sort.
42           </p>
43 <p>
44             They fall back to <a href="http://www.cplusplus.com/reference/algorithm/sort/" target="_top">STL
45             std::sort</a> on small data sets.
46           </p>
47 <div class="warning"><table border="0" summary="Warning">
48 <tr>
49 <td rowspan="2" align="center" valign="top" width="25"><img alt="[Warning]" src="../../../../../../doc/src/images/warning.png"></td>
50 <th align="left">Warning</th>
51 </tr>
52 <tr><td align="left" valign="top"><p>
53               These algorithms all only work on <a href="http://www.cplusplus.com/reference/iterator/RandomAccessIterator/" target="_top">random
54               access iterators</a>.
55             </p></td></tr>
56 </table></div>
57 <p>
58             They are hybrids using both radix and comparison-based sorting, specialized
59             to sorting common data types, such as integers, floats, and strings.
60           </p>
61 <p>
62             These algorithms are encoded in a generic fashion and accept functors,
63             enabling them to sort any object that can be processed like these basic
64             data types. In the case of <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>,
65             this includes anything with a defined strict-weak-ordering that <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
66             can sort, but writing efficient functors for some complex key types may
67             not be worth the additional effort relative to just using <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>,
68             depending on how important speed is to your application. Sample usages
69             are available in the example directory.
70           </p>
71 <p>
72             Unlike many radix-based algorithms, the underlying <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/spreadso_idm46048203074304.html" title="Function template spreadsort">spreadsort</a></code></code>
73             algorithm is designed around <span class="bold"><strong>worst-case performance</strong></span>.
74             It performs better on chunky data (where it is not widely distributed),
75             so that on real data it can perform substantially better than on random
76             data. Conceptually, <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/spreadso_idm46048203074304.html" title="Function template spreadsort">spreadsort</a></code></code>
77             can sort any data for which an absolute ordering can be determined, and
78             <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>
79             is sufficiently flexible that this should be possible.
80           </p>
81 <p>
82             Situations where <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/spreadso_idm46048203074304.html" title="Function template spreadsort">spreadsort</a></code></code>
83             is fastest relative to <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>:
84           </p>
85 <div class="orderedlist"><ol class="orderedlist" type="1">
86 <li class="listitem">
87                 Large number of elements to sort (<span class="emphasis"><em>N</em></span> &gt;= 10000).
88               </li>
89 <li class="listitem">
90                 Slow comparison function (such as floating-point numbers on x86 processors
91                 or strings).
92               </li>
93 <li class="listitem">
94                 Large data elements (such as key + data sorted on a key).
95               </li>
96 <li class="listitem">
97                 Completely sorted data when <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/spreadso_idm46048203074304.html" title="Function template spreadsort">spreadsort</a></code></code>
98                 has an optimization to quit early in this case.
99               </li>
100 </ol></div>
101 <p>
102             Situations where <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/spreadso_idm46048203074304.html" title="Function template spreadsort">spreadsort</a></code></code>
103             is slower than <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>:
104           </p>
105 <div class="orderedlist"><ol class="orderedlist" type="1">
106 <li class="listitem">
107                 Data sorted in reverse order. Both <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
108                 and <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/spreadso_idm46048203074304.html" title="Function template spreadsort">spreadsort</a></code></code>
109                 are faster on reverse-ordered data than randomized data, but <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
110                 speeds up more in this special case.
111               </li>
112 <li class="listitem">
113                 Very small amounts of data (&lt; 1000 elements). For this reason
114                 there is a fallback in <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/spreadso_idm46048203074304.html" title="Function template spreadsort">spreadsort</a></code></code>
115                 to <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
116                 if the input size is less than 1000, so performance is identical
117                 for small amounts of data in practice.
118               </li>
119 </ol></div>
120 <p>
121             These functions are defined in <code class="computeroutput">namespace boost::sort::spreadsort</code>.
122           </p>
123 </div>
124 <div class="section">
125 <div class="titlepage"><div><div><h5 class="title">
126 <a name="sort.single_thread.spreadsort.overview.overloading"></a><a class="link" href="spreadsort.html#sort.single_thread.spreadsort.overview.overloading" title="Overloading">Overloading</a>
127 </h5></div></div></div>
128 <div class="tip"><table border="0" summary="Tip">
129 <tr>
130 <td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="../../../../../../doc/src/images/tip.png"></td>
131 <th align="left">Tip</th>
132 </tr>
133 <tr><td align="left" valign="top"><p>
134               In the Boost.Sort C++ Reference section, click on the appropriate overload,
135               for example <code class="computeroutput">float_sort(RandomAccessIter, RandomAccessIter, Right_shift,
136               Compare);</code> to get full details of that overload.
137             </p></td></tr>
138 </table></div>
139 <p>
140             Each of <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/integer__idm46048203222928.html" title="Function template integer_sort">integer_sort</a></code></code>,
141             <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/float_so_idm46048204416688.html" title="Function template float_sort">float_sort</a></code></code>,
142             and <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>
143             have 3 main versions: The base version, which takes a first iterator
144             and a last iterator, just like <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>:
145           </p>
146 <pre class="programlisting">integer_sort(array.begin(), array.end());
147 float_sort(array.begin(), array.end());
148 string_sort(array.begin(), array.end());
149 </pre>
150 <p>
151             The version with an overridden shift functor, providing flexibility in
152             case the <code class="computeroutput">operator&gt;&gt;</code> already does something other than
153             a bitshift. The rightshift functor takes two args, first the data type,
154             and second a natural number of bits to shift right.
155           </p>
156 <p>
157             For <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>
158             this variant is slightly different; it needs a bracket functor equivalent
159             to <code class="computeroutput">operator</code>[], taking a number corresponding to the character
160             offset, along with a second <code class="computeroutput">getlength</code> functor to get the
161             length of the string in characters. In all cases, this operator must
162             return an integer type that compares with the <code class="computeroutput">operator&lt;</code>
163             to provide the intended order (integers can be negated to reverse their
164             order).
165           </p>
166 <p>
167             In other words (aside from negative floats, which are inverted as ints):
168           </p>
169 <pre class="programlisting">rightshift(A, n) &lt; rightshift(B, n) -&gt; A &lt; B
170 A &lt; B -&gt; rightshift(A, 0) &lt; rightshift(B, 0)
171 </pre>
172 <pre class="programlisting"><span class="identifier">integer_sort</span><span class="special">(</span><span class="identifier">array</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">array</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">rightshift</span><span class="special">());</span>
173 </pre>
174 <pre class="programlisting"><span class="identifier">string_sort</span><span class="special">(</span><span class="identifier">array</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">array</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">bracket</span><span class="special">(),</span> <span class="identifier">getsize</span><span class="special">());</span>
175 </pre>
176 <p>
177             See <a href="../../../../example/rightshiftsample.cpp" target="_top">rightshiftsample.cpp</a>
178             for a working example of integer sorting with a rightshift functor.
179           </p>
180 <p>
181             And a version with a comparison functor for maximum flexibility. This
182             functor must provide the same sorting order as the integers returned
183             by the rightshift (aside from negative floats):
184           </p>
185 <pre class="programlisting">rightshift(A, n) &lt; rightshift(B, n) -&gt; compare(A, B)
186 compare(A, B) -&gt; rightshift(A, 0) &lt; rightshift(B, 0)
187 </pre>
188 <pre class="programlisting"><span class="identifier">integer_sort</span><span class="special">(</span><span class="identifier">array</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">array</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">negrightshift</span><span class="special">(),</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">greater</span><span class="special">&lt;</span><span class="identifier">DATA_TYPE</span><span class="special">&gt;());</span>
189 </pre>
190 <p>
191             Examples of functors are:
192           </p>
193 <pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">lessthan</span> <span class="special">{</span>
194   <span class="keyword">inline</span> <span class="keyword">bool</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&amp;</span><span class="identifier">x</span><span class="special">,</span> <span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&amp;</span><span class="identifier">y</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span>
195     <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">a</span> <span class="special">&lt;</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">a</span><span class="special">;</span>
196   <span class="special">}</span>
197 <span class="special">};</span>
198 </pre>
199 <pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">bracket</span> <span class="special">{</span>
200   <span class="keyword">inline</span> <span class="keyword">unsigned</span> <span class="keyword">char</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&amp;</span><span class="identifier">x</span><span class="special">,</span> <span class="identifier">size_t</span> <span class="identifier">offset</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span>
201     <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">a</span><span class="special">[</span><span class="identifier">offset</span><span class="special">];</span>
202   <span class="special">}</span>
203 <span class="special">};</span>
204 </pre>
205 <pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">getsize</span> <span class="special">{</span>
206   <span class="keyword">inline</span> <span class="identifier">size_t</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&amp;</span><span class="identifier">x</span><span class="special">)</span> <span class="keyword">const</span><span class="special">{</span> <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">a</span><span class="special">.</span><span class="identifier">size</span><span class="special">();</span> <span class="special">}</span>
207 <span class="special">};</span>
208 </pre>
209 <p>
210             and these functors are used thus:
211           </p>
212 <pre class="programlisting"><span class="identifier">string_sort</span><span class="special">(</span><span class="identifier">array</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">array</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">bracket</span><span class="special">(),</span> <span class="identifier">getsize</span><span class="special">(),</span> <span class="identifier">lessthan</span><span class="special">());</span>
213 </pre>
214 <p>
215             See <a href="../../../../example/stringfunctorsample.cpp" target="_top">stringfunctorsample.cpp</a>
216             for a working example of sorting strings with all functors.
217           </p>
218 </div>
219 <div class="section">
220 <div class="titlepage"><div><div><h5 class="title">
221 <a name="sort.single_thread.spreadsort.overview.performance"></a><a class="link" href="spreadsort.html#sort.single_thread.spreadsort.overview.performance" title="Performance">Performance</a>
222 </h5></div></div></div>
223 <p>
224             The <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/spreadso_idm46048203074304.html" title="Function template spreadsort">spreadsort</a></code></code>
225             algorithm is a hybrid algorithm; when the number of elements being sorted
226             is below a certain number, comparison-based sorting is used. Above it,
227             radix sorting is used. The radix-based algorithm will thus cut up the
228             problem into small pieces, and either completely sort the data based
229             upon its radix if the data is clustered, or finish sorting the cut-down
230             pieces with comparison-based sorting.
231           </p>
232 <p>
233             The Spreadsort algorithm dynamically chooses either comparison-based
234             or radix-based sorting when recursing, whichever provides better worst-case
235             performance. This way worst-case performance is guaranteed to be the
236             better of <span class="emphasis"><em>&#119926;(N&#8901;log2(N))</em></span> comparisons and <span class="emphasis"><em>&#119926;(N&#8901;log2(K/S
237             + S))</em></span> operations where
238           </p>
239 <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
240 <li class="listitem">
241                 <span class="emphasis"><em>N</em></span> is the number of elements being sorted,
242               </li>
243 <li class="listitem">
244                 <span class="emphasis"><em>K</em></span> is the length in bits of the key, and
245               </li>
246 <li class="listitem">
247                 <span class="emphasis"><em>S</em></span> is a constant.
248               </li>
249 </ul></div>
250 <p>
251             This results in substantially improved performance for large <span class="emphasis"><em>N</em></span>;
252             <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/integer__idm46048203222928.html" title="Function template integer_sort">integer_sort</a></code></code>
253             tends to be 50% to 2X faster than <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>,
254             while <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/float_so_idm46048204416688.html" title="Function template float_sort">float_sort</a></code></code>
255             and _string_sort are roughly 2X faster than <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>.
256           </p>
257 <p>
258             Performance graphs are provided for <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/integer__idm46048203222928.html" title="Function template integer_sort">integer_sort</a></code></code>,
259             <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/float_so_idm46048204416688.html" title="Function template float_sort">float_sort</a></code></code>,
260             and <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>
261             in their description.
262           </p>
263 <p>
264             Runtime Performance comparisons and graphs were made on a Core 2 Duo
265             laptop running Windows Vista 64 with MSVC 8.0, and an old G4 laptop running
266             Mac OSX with gcc. <a href="http://www.boost.org/build/doc/html/" target="_top">Boost
267             bjam/b2</a> was used to control compilation.
268           </p>
269 <p>
270             Direct performance comparisons on a newer x86 system running Ubuntu,
271             with the fallback to <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
272             at lower input sizes disabled are below.
273           </p>
274 <div class="note"><table border="0" summary="Note">
275 <tr>
276 <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../../../../doc/src/images/note.png"></td>
277 <th align="left">Note</th>
278 </tr>
279 <tr><td align="left" valign="top"><p>
280               The fallback to <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
281               for smaller input sizes prevents the worse performance seen on the
282               left sides of the first two graphs.
283             </p></td></tr>
284 </table></div>
285 <p>
286             <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/integer__idm46048203222928.html" title="Function template integer_sort">integer_sort</a></code></code>
287             starts to become faster than <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
288             at about 1000 integers (4000 bytes), and <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>
289             becomes faster than <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
290             at slightly fewer bytes (as few as 30 strings).
291           </p>
292 <div class="note"><table border="0" summary="Note">
293 <tr>
294 <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../../../../doc/src/images/note.png"></td>
295 <th align="left">Note</th>
296 </tr>
297 <tr><td align="left" valign="top"><p>
298               The 4-threaded graph has 4 threads doing <span class="bold"><strong>separate
299               sorts simultaneously</strong></span> (not splitting up a single sort) as
300               a test for thread cache collision and other multi-threaded performance
301               issues.
302             </p></td></tr>
303 </table></div>
304 <p>
305             <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/float_so_idm46048204416688.html" title="Function template float_sort">float_sort</a></code></code>
306             times are very similar to <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/integer__idm46048203222928.html" title="Function template integer_sort">integer_sort</a></code></code>
307             times.
308           </p>
309 <p>
310             <span class="inlinemediaobject"><img src="../../../images/single_threaded.png"></span>
311             <span class="inlinemediaobject"><img src="../../../images/4_threaded.png"></span>
312             <span class="inlinemediaobject"><img src="../../../images/entropy.png"></span>
313             <span class="inlinemediaobject"><img src="../../../images/bits_per_byte.png"></span>
314           </p>
315 <p>
316             Histogramming with a fixed maximum number of splits is used because it
317             reduces the number of cache misses, thus improving performance relative
318             to the approach described in detail in the <a href="http://en.wikipedia.org/wiki/Spreadsort" target="_top">original
319             SpreadSort publication</a>.
320           </p>
321 <p>
322             The importance of cache-friendly histogramming is described in <a href="http://www.nik.no/2002/Maus.pdf" target="_top">Arne Maus, Adaptive Left Reflex</a>,
323             though without the worst-case handling described below.
324           </p>
325 <p>
326             The time taken per radix iteration is:
327           </p>
328 <p>
329             <span class="emphasis"><em>&#119926;(N)</em></span> iterations over the data
330           </p>
331 <p>
332             <span class="emphasis"><em>&#119926;(N)</em></span> integer-type comparisons (even for _float_sort
333             and <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>)
334           </p>
335 <p>
336             <span class="emphasis"><em>&#119926;(N)</em></span> swaps
337           </p>
338 <p>
339             <span class="emphasis"><em>&#119926;(2<sup>S</sup>)</em></span> bin operations.
340           </p>
341 <p>
342             To obtain <span class="emphasis"><em>&#119926;(N)</em></span> worst-case performance per iteration,
343             the restriction <span class="emphasis"><em>S &lt;= log2(N)</em></span> is applied, and
344             <span class="emphasis"><em>&#119926;(2<sup>S</sup>)</em></span> becomes <span class="emphasis"><em>&#119926;(N)</em></span>. For each
345             such iteration, the number of unsorted bits log2(range) (referred to
346             as <span class="emphasis"><em>K</em></span>) per element is reduced by <span class="emphasis"><em>S</em></span>.
347             As <span class="emphasis"><em>S</em></span> decreases depending upon the amount of elements
348             being sorted, it can drop from a maximum of <span class="emphasis"><em>S<sub>max</sub></em></span>
349             to the minimum of <span class="emphasis"><em>S<sub>min</sub></em></span>.
350           </p>
351 <p>
352             Assumption: <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
353             is assumed to be <span class="emphasis"><em>&#119926;(N*log2(N))</em></span>, as <a href="http://en.wikipedia.org/wiki/Introsort" target="_top">introsort</a>
354             exists and is commonly used. (If you have a quibble with this please
355             take it up with the implementor of your <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>;
356             you're welcome to replace the recursive calls to <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
357             to calls to <a href="http://en.wikipedia.org/wiki/Introsort" target="_top">introsort</a>
358             if your <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
359             library call is poorly implemented).
360           </p>
361 <p>
362             <a href="http://en.wikipedia.org/wiki/Introsort" target="_top">Introsort</a>
363             is not included with this algorithm for simplicity and because the implementor
364             of the <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
365             call is assumed to know what they're doing.
366           </p>
367 <p>
368             To maintain a minimum value for <span class="emphasis"><em>S (S<sub>min</sub>)</em></span>, comparison-based
369             sorting has to be used to sort when <span class="emphasis"><em>n &lt;= log2(meanbinsize)</em></span>,
370             where <span class="emphasis"><em>log2(meanbinsize) (lbs)</em></span> is a small constant,
371             usually between 0 and 4, used to minimize bin overhead per element. There
372             is a small corner-case where if <span class="emphasis"><em>K &lt; S<sub>min</sub></em></span> and
373             <span class="emphasis"><em>n &gt;= 2^K</em></span>, then the data can be sorted in a single
374             radix-based iteration with an <span class="emphasis"><em>S = K</em></span> (this bucketsorting
375             special case is by default only applied to <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/float_so_idm46048204416688.html" title="Function template float_sort">float_sort</a></code></code>).
376             So for the final recursion, worst-case performance is:
377           </p>
378 <p>
379             1 radix-based iteration if <span class="emphasis"><em>K &lt;= S<sub>min</sub></em></span>,
380           </p>
381 <p>
382             or <span class="emphasis"><em>S<sub>min</sub> + lbs</em></span> comparison-based iterations if <span class="emphasis"><em>K
383             &gt; S<sub>min</sub></em></span> but <span class="emphasis"><em>n &lt;= 2<sup>(S<sub>min</sub> + lbs)</sup></em></span>.
384           </p>
385 <p>
386             So for the final iteration, worst-case runtime is <span class="emphasis"><em>&#119926;(N*(S<sub>min</sub> +
387             lbs))</em></span> but if <span class="emphasis"><em>K &gt; S<sub>min</sub></em></span> and <span class="emphasis"><em>N
388             &gt; 2<sup>(S<sub>min</sub> + lbs)</sup></em></span> then more than 1 radix recursion will be
389             required.
390           </p>
391 <p>
392             For the second to last iteration, <span class="emphasis"><em>K &lt;= S<sub>min</sub> * 2 + 1</em></span>
393             can be handled, (if the data is divided into <span class="emphasis"><em>2<sup>(S<sub>min</sub> + 1)</sup></em></span>
394             pieces) or if <span class="emphasis"><em>N &lt; 2<sup>(S<sub>min</sub> + lbs + 1)</sup></em></span>, then it is
395             faster to fallback to <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>.
396           </p>
397 <p>
398             In the case of a radix-based sort plus recursion, it will take <span class="emphasis"><em>&#119926;(N*(S<sub>min</sub> +
399             lbs)) + &#119926;(N) = &#119926;(N*(S<sub>min</sub> + lbs + 1))</em></span> worst-case time, as <span class="emphasis"><em>K_remaining
400             = K_start - (S<sub>min</sub> + 1)</em></span>, and <span class="emphasis"><em>K_start &lt;= S<sub>min</sub> * 2
401             + 1</em></span>.
402           </p>
403 <p>
404             Alternatively, comparison-based sorting is used if <span class="emphasis"><em>N &lt; 2<sup>(S<sub>min</sub> +
405             lbs + 1)</sup></em></span>, which will take <span class="emphasis"><em>&#119926;(N*(S<sub>min</sub> + lbs + 1))</em></span>
406             time.
407           </p>
408 <p>
409             So either way <span class="emphasis"><em>&#119926;(N*(S<sub>min</sub> + lbs + 1))</em></span> is the worst-case
410             time for the second to last iteration, which occurs if <span class="emphasis"><em>K &lt;=
411             S<sub>min</sub> * 2 + </em></span>1 or <span class="emphasis"><em>N &lt; 2<sup>(S<sub>min</sub> + lbs + 1)</sup></em></span>.
412           </p>
413 <p>
414             This continues as long as <span class="emphasis"><em>S<sub>min</sub> &lt;= S &lt;= S<sub>max</sub></em></span>,
415             so that for <span class="emphasis"><em>K_m &lt;= K_(m-1) + S<sub>min</sub> + m</em></span> where <span class="emphasis"><em>m</em></span>
416             is the maximum number of iterations after this one has finished, or where
417             <span class="emphasis"><em>N &lt; 2<sup>(S<sub>min</sub> + lbs + m)</sup></em></span>, then the worst-case runtime
418             is <span class="emphasis"><em>&#119926;(N*(S<sub>min</sub> + lbs + m))</em></span>.
419           </p>
420 <p>
421             &#160; &#160;<span class="emphasis"><em>K_m</em></span> at <span class="emphasis"><em>m &lt;= (S<sub>max</sub> - S<sub>min</sub>)</em></span>
422             works out to:
423           </p>
424 <p>
425             &#160; &#160;<span class="emphasis"><em>K_1 &lt;= (S<sub>min</sub>) + S<sub>min</sub> + 1 &lt;= 2S<sub>min</sub> + 1</em></span>
426           </p>
427 <p>
428             &#160; &#160;<span class="emphasis"><em>K_2 &lt;= (2S<sub>min</sub> + 1) + S<sub>min</sub> + 2</em></span>
429           </p>
430 <p>
431             as the sum from 0 to <span class="emphasis"><em>m</em></span> is <span class="emphasis"><em>m(m + 1)/2</em></span>
432           </p>
433 <p>
434             &#160; &#160;<span class="emphasis"><em>K_m &lt;= (m + 1)S<sub>min</sub> + m(m + 1)/2 &lt;= (S<sub>min</sub> + m/2)(m + 1)</em></span>
435           </p>
436 <p>
437             substituting in S<sub>max</sub> - S<sub>min</sub> for m
438           </p>
439 <p>
440             &#160; &#160;<span class="emphasis"><em>K_(S<sub>max</sub> - S<sub>min</sub>) &lt;= (S<sub>min</sub> + (S<sub>max</sub> - S<sub>min</sub>)/2)*(S<sub>max</sub> - S<sub>min</sub> + 1)</em></span>
441           </p>
442 <p>
443             &#160; &#160;<span class="emphasis"><em>K_(S<sub>max</sub> - S<sub>min</sub>) &lt;= (S<sub>min</sub> + S<sub>max</sub>) * (S<sub>max</sub> - S<sub>min</sub> + 1)/2</em></span>
444           </p>
445 <p>
446             Since this involves <span class="emphasis"><em>S<sub>max</sub> - S<sub>min</sub> + 1</em></span> iterations, this
447             works out to dividing <span class="emphasis"><em>K</em></span> into an average <span class="emphasis"><em>(S<sub>min</sub> +
448             S<sub>max</sub>)</em></span>/2 pieces per iteration.
449           </p>
450 <p>
451             To finish the problem from this point takes <span class="emphasis"><em>&#119926;(N * (S<sub>max</sub> - S<sub>min</sub>))</em></span>
452             for <span class="emphasis"><em>m</em></span> iterations, plus the worst-case of <span class="emphasis"><em>&#119926;(N*(S<sub>min</sub> +
453             lbs))</em></span> for the last iteration, for a total of <span class="emphasis"><em>&#119926;(N
454             *(S<sub>max</sub> + lbs))</em></span> time.
455           </p>
456 <p>
457             When <span class="emphasis"><em>m &gt; S<sub>max</sub> - S<sub>min</sub></em></span>, the problem is divided into
458             <span class="emphasis"><em>S<sub>max</sub></em></span> pieces per iteration, or <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
459             is called if <span class="emphasis"><em>N &lt; 2^(m + S<sub>min</sub> + lbs)</em></span>. For this
460             range:
461           </p>
462 <p>
463             &#160; &#160;<span class="emphasis"><em>K_m &lt;= K_(m - 1) + S<sub>max</sub></em></span>, providing runtime of
464           </p>
465 <p>
466             &#160; &#160;<span class="emphasis"><em>&#119926;(N *((K - K_(S<sub>max</sub> - S<sub>min</sub>))/S<sub>max</sub> + S<sub>max</sub> + lbs))</em></span> if recursive,
467           </p>
468 <p>
469             or <span class="emphasis"><em>&#119926;(N * log(2^(m + S<sub>min</sub> + lbs)))</em></span> if comparison-based,
470           </p>
471 <p>
472             which simplifies to <span class="emphasis"><em>&#119926;(N * (m + S<sub>min</sub> + lbs))</em></span>, which
473             substitutes to <span class="emphasis"><em>&#119926;(N * ((m - (S<sub>max</sub> - S<sub>min</sub>)) + S<sub>max</sub> + lbs))</em></span>,
474             which given that <span class="emphasis"><em>m - (S<sub>max</sub> - S<sub>min</sub>) &lt;= (K - K_(S<sub>max</sub> - S<sub>min</sub>))/S<sub>max</sub></em></span>
475             (otherwise a lesser number of radix-based iterations would be used)
476           </p>
477 <p>
478             also comes out to <span class="emphasis"><em>&#119926;(N *((K - K_(S<sub>max</sub> - S<sub>min</sub>))/S<sub>max</sub> + S<sub>max</sub> + lbs))</em></span>.
479           </p>
480 <p>
481             Asymptotically, for large <span class="emphasis"><em>N</em></span> and large <span class="emphasis"><em>K</em></span>,
482             this simplifies to:
483           </p>
484 <p>
485             &#160; &#160;<span class="emphasis"><em>&#119926;(N * (K/S<sub>max</sub> + S<sub>max</sub> + lbs))</em></span>,
486           </p>
487 <p>
488             simplifying out the constants related to the <span class="emphasis"><em>S<sub>max</sub> - S<sub>min</sub></em></span>
489             range, providing an additional <span class="emphasis"><em>&#119926;(N * (S<sub>max</sub> + lbs))</em></span>
490             runtime on top of the <span class="emphasis"><em>&#119926;(N * (K/S))</em></span> performance of
491             LSD <a href="http://en.wikipedia.org/wiki/Radix_sort" target="_top">radix sort</a>,
492             but without the <span class="emphasis"><em>&#119926;(N)</em></span> memory overhead. For simplicity,
493             because <span class="emphasis"><em>lbs</em></span> is a small constant (0 can be used,
494             and performs reasonably), it is ignored when summarizing the performance
495             in further discussions. By checking whether comparison-based sorting
496             is better, Spreadsort is also <span class="emphasis"><em>&#119926;(N*log(N))</em></span>, whichever
497             is better, and unlike LSD <a href="http://en.wikipedia.org/wiki/Radix_sort" target="_top">radix
498             sort</a>, can perform much better than the worst-case if the data
499             is either evenly distributed or highly clustered.
500           </p>
501 <p>
502             This analysis was for <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/integer__idm46048203222928.html" title="Function template integer_sort">integer_sort</a></code></code>
503             and <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/float_so_idm46048204416688.html" title="Function template float_sort">float_sort</a></code></code>.
504             <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>
505             differs in that <span class="emphasis"><em>S<sub>min</sub> = S<sub>max</sub> = sizeof(Char_type) * 8</em></span>,
506             <span class="emphasis"><em>lbs</em></span> is 0, and that <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>'s
507             comparison is not a constant-time operation, so strictly speaking <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>
508             runtime is
509           </p>
510 <p>
511             &#160; &#160;<span class="emphasis"><em>&#119926;(N * (K/S<sub>max</sub> + (S<sub>max</sub> comparisons)))</em></span>.
512           </p>
513 <p>
514             Worst-case, this ends up being <span class="emphasis"><em>&#119926;(N * K)</em></span> (where <span class="emphasis"><em>K</em></span>
515             is the mean string length in bytes), as described for <a href="http://en.wikipedia.org/wiki/American_flag_sort" target="_top">American
516             flag sort</a>, which is better than the
517           </p>
518 <p>
519             &#160; &#160;<span class="emphasis"><em>&#119926;(N * K * log(N))</em></span>
520           </p>
521 <p>
522             worst-case for comparison-based sorting.
523           </p>
524 </div>
525 <div class="section">
526 <div class="titlepage"><div><div><h5 class="title">
527 <a name="sort.single_thread.spreadsort.overview.tuning"></a><a class="link" href="spreadsort.html#sort.single_thread.spreadsort.overview.tuning" title="Tuning">Tuning</a>
528 </h5></div></div></div>
529 <p>
530             <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/integer__idm46048203222928.html" title="Function template integer_sort">integer_sort</a></code></code>
531             and <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/float_so_idm46048204416688.html" title="Function template float_sort">float_sort</a></code></code>
532             have tuning constants that control how the radix-sorting portion of those
533             algorithms work. The ideal constant values for <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/integer__idm46048203222928.html" title="Function template integer_sort">integer_sort</a></code></code>
534             and <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/float_so_idm46048204416688.html" title="Function template float_sort">float_sort</a></code></code>
535             vary depending on the platform, compiler, and data being sorted. By far
536             the most important constant is <span class="emphasis"><em>max_splits</em></span>, which
537             defines how many pieces the radix-sorting portion splits the data into
538             per iteration.
539           </p>
540 <p>
541             The ideal value of <span class="emphasis"><em>max_splits</em></span> depends upon the size
542             of the L1 processor cache, and is between 10 and 13 on many systems.
543             A default value of 11 is used. For mostly-sorted data, a much larger
544             value is better, as swaps (and thus cache misses) are rare, but this
545             hurts runtime severely for unsorted data, so is not recommended.
546           </p>
547 <p>
548             On some x86 systems, when the total number of elements being sorted is
549             small ( less than 1 million or so), the ideal <span class="emphasis"><em>max_splits</em></span>
550             can be substantially larger, such as 17. This is suspected to be because
551             all the data fits into the L2 cache, and misses from L1 cache to L2 cache
552             do not impact performance as severely as misses to main memory. Modifying
553             tuning constants other than <span class="emphasis"><em>max_splits</em></span> is not recommended,
554             as the performance improvement for changing other constants is usually
555             minor.
556           </p>
557 <p>
558             If you can afford to let it run for a day, and have at least 1GB of free
559             memory, the perl command: <code class="computeroutput">./tune.pl -large -tune</code> (UNIX)
560             or <code class="computeroutput">perl tune.pl -large -tune -windows</code> (Windows) can be used
561             to automatically tune these constants. This should be run from the <code class="computeroutput">libs/sort
562             directory</code> inside the boost home directory. This will work to identify
563             the <code class="computeroutput">ideal constants.hpp</code> settings for your system, testing
564             on various distributions in a 20 million element (80MB) file, and additionally
565             verifies that all sorting routines sort correctly across various data
566             distributions. Alternatively, you can test with the file size you're
567             most concerned with <code class="computeroutput">./tune.pl number -tune</code> (UNIX) or <code class="computeroutput">perl
568             tune.pl number -tune -windows</code> (Windows). Substitute the number
569             of elements you want to test with for <code class="computeroutput">number</code>. Otherwise,
570             just use the options it comes with, they're decent. With default settings
571             <code class="computeroutput">./tune.pl -tune</code> (UNIX) <code class="computeroutput">perl tune.pl -tune -windows</code>
572             (Windows), the script will take hours to run (less than a day), but may
573             not pick the correct <span class="emphasis"><em>max_splits</em></span> if it is over 10.
574             Alternatively, you can add the <code class="computeroutput">-small</code> option to make it
575             take just a few minutes, tuning for smaller vector sizes (one hundred
576             thousand elements), but the resulting constants may not be good for large
577             files (see above note about <span class="emphasis"><em>max_splits</em></span> on Windows).
578           </p>
579 <p>
580             The tuning script can also be used just to verify that sorting works
581             correctly on your system, and see how much of a speedup it gets, by omiting
582             the "-tune" option. This runs at the end of tuning runs. Default
583             args will take about an hour to run and give accurate results on decent-sized
584             test vectors. <code class="computeroutput">./tune.pl -small</code> (UNIX) <code class="computeroutput">perl tune.pl
585             -small -windows</code> (Windows) is a faster option, that tests on smaller
586             vectors and isn't as accurate.
587           </p>
588 <p>
589             If any differences are encountered during tuning, please call <code class="computeroutput">tune.pl</code>
590             with <code class="computeroutput">-debug &gt; log_file_name</code>. If the resulting log file
591             contains compilation or permissions issues, it is likely an issue with
592             your setup. If some other type of error is encountered (or result differences),
593             please send them to the library author at spreadsort@gmail.com. Including
594             the zipped <code class="computeroutput">input.txt</code> that was being used is also helpful.
595           </p>
596 </div>
597 </div>
598 </div>
599 <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
600 <td align="left"></td>
601 <td align="right"><div class="copyright-footer">Copyright &#169; 2014-2017 Steven
602       Ross, Francisco Tapia, Orson Peters<p>
603         Distributed under the <a href="http://boost.org/LICENSE_1_0.txt" target="_top">Boost
604         Software License, Version 1.0</a>.
605       </p>
606 </div></td>
607 </tr></table>
608 <hr>
609 <div class="spirit-nav">
610 <a accesskey="p" href="../single_thread.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../single_thread.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="spreadsort/sort_hpp.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
611 </div>
612 </body>
613 </html>