3 <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
4 <title>2.1.-Spreadsort</title>
5 <link rel="stylesheet" href="../../../../../../doc/src/boostbook.css" type="text/css">
6 <meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
7 <link rel="home" href="../../index.html" title="Boost.Sort">
8 <link rel="up" href="../single_thread.html" title="2.- Single Thread Algorithms">
9 <link rel="prev" href="../single_thread.html" title="2.- Single Thread Algorithms">
10 <link rel="next" href="spreadsort/sort_hpp.html" title="Spreadsort">
12 <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
13 <table cellpadding="2" width="100%"><tr>
14 <td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../boost.png"></td>
15 <td align="center"><a href="../../../../../../index.html">Home</a></td>
16 <td align="center"><a href="../../../../../../libs/libraries.htm">Libraries</a></td>
17 <td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
18 <td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
19 <td align="center"><a href="../../../../../../more/index.htm">More</a></td>
22 <div class="spirit-nav">
23 <a accesskey="p" href="../single_thread.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../single_thread.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="spreadsort/sort_hpp.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
26 <div class="titlepage"><div><div><h3 class="title">
27 <a name="sort.single_thread.spreadsort"></a><a class="link" href="spreadsort.html" title="2.1.-Spreadsort">2.1.-Spreadsort</a>
28 </h3></div></div></div>
30 <div class="titlepage"><div><div><h4 class="title">
31 <a name="sort.single_thread.spreadsort.overview"></a><a class="link" href="spreadsort.html#sort.single_thread.spreadsort.overview" title="Overview">Overview</a>
32 </h4></div></div></div>
34 <div class="titlepage"><div><div><h5 class="title">
35 <a name="sort.single_thread.spreadsort.overview.intro"></a><a class="link" href="spreadsort.html#sort.single_thread.spreadsort.overview.intro" title="Introduction">Introduction</a>
36 </h5></div></div></div>
38 Spreadsort combines generic implementations of multiple high-speed sorting
39 algorithms that outperform those in the C++ standard in both average
40 and worst case performance when there are over 1000 elements in the list
44 They fall back to <a href="http://www.cplusplus.com/reference/algorithm/sort/" target="_top">STL
45 std::sort</a> on small data sets.
47 <div class="warning"><table border="0" summary="Warning">
49 <td rowspan="2" align="center" valign="top" width="25"><img alt="[Warning]" src="../../../../../../doc/src/images/warning.png"></td>
50 <th align="left">Warning</th>
52 <tr><td align="left" valign="top"><p>
53 These algorithms all only work on <a href="http://www.cplusplus.com/reference/iterator/RandomAccessIterator/" target="_top">random
58 They are hybrids using both radix and comparison-based sorting, specialized
59 to sorting common data types, such as integers, floats, and strings.
62 These algorithms are encoded in a generic fashion and accept functors,
63 enabling them to sort any object that can be processed like these basic
64 data types. In the case of <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>,
65 this includes anything with a defined strict-weak-ordering that <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
66 can sort, but writing efficient functors for some complex key types may
67 not be worth the additional effort relative to just using <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>,
68 depending on how important speed is to your application. Sample usages
69 are available in the example directory.
72 Unlike many radix-based algorithms, the underlying <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/spreadso_idm46048203074304.html" title="Function template spreadsort">spreadsort</a></code></code>
73 algorithm is designed around <span class="bold"><strong>worst-case performance</strong></span>.
74 It performs better on chunky data (where it is not widely distributed),
75 so that on real data it can perform substantially better than on random
76 data. Conceptually, <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/spreadso_idm46048203074304.html" title="Function template spreadsort">spreadsort</a></code></code>
77 can sort any data for which an absolute ordering can be determined, and
78 <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>
79 is sufficiently flexible that this should be possible.
82 Situations where <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/spreadso_idm46048203074304.html" title="Function template spreadsort">spreadsort</a></code></code>
83 is fastest relative to <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>:
85 <div class="orderedlist"><ol class="orderedlist" type="1">
87 Large number of elements to sort (<span class="emphasis"><em>N</em></span> >= 10000).
90 Slow comparison function (such as floating-point numbers on x86 processors
94 Large data elements (such as key + data sorted on a key).
97 Completely sorted data when <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/spreadso_idm46048203074304.html" title="Function template spreadsort">spreadsort</a></code></code>
98 has an optimization to quit early in this case.
102 Situations where <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/spreadso_idm46048203074304.html" title="Function template spreadsort">spreadsort</a></code></code>
103 is slower than <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>:
105 <div class="orderedlist"><ol class="orderedlist" type="1">
106 <li class="listitem">
107 Data sorted in reverse order. Both <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
108 and <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/spreadso_idm46048203074304.html" title="Function template spreadsort">spreadsort</a></code></code>
109 are faster on reverse-ordered data than randomized data, but <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
110 speeds up more in this special case.
112 <li class="listitem">
113 Very small amounts of data (< 1000 elements). For this reason
114 there is a fallback in <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/spreadso_idm46048203074304.html" title="Function template spreadsort">spreadsort</a></code></code>
115 to <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
116 if the input size is less than 1000, so performance is identical
117 for small amounts of data in practice.
121 These functions are defined in <code class="computeroutput">namespace boost::sort::spreadsort</code>.
124 <div class="section">
125 <div class="titlepage"><div><div><h5 class="title">
126 <a name="sort.single_thread.spreadsort.overview.overloading"></a><a class="link" href="spreadsort.html#sort.single_thread.spreadsort.overview.overloading" title="Overloading">Overloading</a>
127 </h5></div></div></div>
128 <div class="tip"><table border="0" summary="Tip">
130 <td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="../../../../../../doc/src/images/tip.png"></td>
131 <th align="left">Tip</th>
133 <tr><td align="left" valign="top"><p>
134 In the Boost.Sort C++ Reference section, click on the appropriate overload,
135 for example <code class="computeroutput">float_sort(RandomAccessIter, RandomAccessIter, Right_shift,
136 Compare);</code> to get full details of that overload.
140 Each of <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/integer__idm46048203222928.html" title="Function template integer_sort">integer_sort</a></code></code>,
141 <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/float_so_idm46048204416688.html" title="Function template float_sort">float_sort</a></code></code>,
142 and <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>
143 have 3 main versions: The base version, which takes a first iterator
144 and a last iterator, just like <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>:
146 <pre class="programlisting">integer_sort(array.begin(), array.end());
147 float_sort(array.begin(), array.end());
148 string_sort(array.begin(), array.end());
151 The version with an overridden shift functor, providing flexibility in
152 case the <code class="computeroutput">operator>></code> already does something other than
153 a bitshift. The rightshift functor takes two args, first the data type,
154 and second a natural number of bits to shift right.
157 For <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>
158 this variant is slightly different; it needs a bracket functor equivalent
159 to <code class="computeroutput">operator</code>[], taking a number corresponding to the character
160 offset, along with a second <code class="computeroutput">getlength</code> functor to get the
161 length of the string in characters. In all cases, this operator must
162 return an integer type that compares with the <code class="computeroutput">operator<</code>
163 to provide the intended order (integers can be negated to reverse their
167 In other words (aside from negative floats, which are inverted as ints):
169 <pre class="programlisting">rightshift(A, n) < rightshift(B, n) -> A < B
170 A < B -> rightshift(A, 0) < rightshift(B, 0)
172 <pre class="programlisting"><span class="identifier">integer_sort</span><span class="special">(</span><span class="identifier">array</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">array</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">rightshift</span><span class="special">());</span>
174 <pre class="programlisting"><span class="identifier">string_sort</span><span class="special">(</span><span class="identifier">array</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">array</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">bracket</span><span class="special">(),</span> <span class="identifier">getsize</span><span class="special">());</span>
177 See <a href="../../../../example/rightshiftsample.cpp" target="_top">rightshiftsample.cpp</a>
178 for a working example of integer sorting with a rightshift functor.
181 And a version with a comparison functor for maximum flexibility. This
182 functor must provide the same sorting order as the integers returned
183 by the rightshift (aside from negative floats):
185 <pre class="programlisting">rightshift(A, n) < rightshift(B, n) -> compare(A, B)
186 compare(A, B) -> rightshift(A, 0) < rightshift(B, 0)
188 <pre class="programlisting"><span class="identifier">integer_sort</span><span class="special">(</span><span class="identifier">array</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">array</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">negrightshift</span><span class="special">(),</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">greater</span><span class="special"><</span><span class="identifier">DATA_TYPE</span><span class="special">>());</span>
191 Examples of functors are:
193 <pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">lessthan</span> <span class="special">{</span>
194 <span class="keyword">inline</span> <span class="keyword">bool</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">x</span><span class="special">,</span> <span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">y</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span>
195 <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">a</span> <span class="special"><</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">a</span><span class="special">;</span>
196 <span class="special">}</span>
197 <span class="special">};</span>
199 <pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">bracket</span> <span class="special">{</span>
200 <span class="keyword">inline</span> <span class="keyword">unsigned</span> <span class="keyword">char</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">x</span><span class="special">,</span> <span class="identifier">size_t</span> <span class="identifier">offset</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span>
201 <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">a</span><span class="special">[</span><span class="identifier">offset</span><span class="special">];</span>
202 <span class="special">}</span>
203 <span class="special">};</span>
205 <pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">getsize</span> <span class="special">{</span>
206 <span class="keyword">inline</span> <span class="identifier">size_t</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">x</span><span class="special">)</span> <span class="keyword">const</span><span class="special">{</span> <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">a</span><span class="special">.</span><span class="identifier">size</span><span class="special">();</span> <span class="special">}</span>
207 <span class="special">};</span>
210 and these functors are used thus:
212 <pre class="programlisting"><span class="identifier">string_sort</span><span class="special">(</span><span class="identifier">array</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">array</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">bracket</span><span class="special">(),</span> <span class="identifier">getsize</span><span class="special">(),</span> <span class="identifier">lessthan</span><span class="special">());</span>
215 See <a href="../../../../example/stringfunctorsample.cpp" target="_top">stringfunctorsample.cpp</a>
216 for a working example of sorting strings with all functors.
219 <div class="section">
220 <div class="titlepage"><div><div><h5 class="title">
221 <a name="sort.single_thread.spreadsort.overview.performance"></a><a class="link" href="spreadsort.html#sort.single_thread.spreadsort.overview.performance" title="Performance">Performance</a>
222 </h5></div></div></div>
224 The <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/spreadso_idm46048203074304.html" title="Function template spreadsort">spreadsort</a></code></code>
225 algorithm is a hybrid algorithm; when the number of elements being sorted
226 is below a certain number, comparison-based sorting is used. Above it,
227 radix sorting is used. The radix-based algorithm will thus cut up the
228 problem into small pieces, and either completely sort the data based
229 upon its radix if the data is clustered, or finish sorting the cut-down
230 pieces with comparison-based sorting.
233 The Spreadsort algorithm dynamically chooses either comparison-based
234 or radix-based sorting when recursing, whichever provides better worst-case
235 performance. This way worst-case performance is guaranteed to be the
236 better of <span class="emphasis"><em>𝑶(N⋅log2(N))</em></span> comparisons and <span class="emphasis"><em>𝑶(N⋅log2(K/S
237 + S))</em></span> operations where
239 <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
240 <li class="listitem">
241 <span class="emphasis"><em>N</em></span> is the number of elements being sorted,
243 <li class="listitem">
244 <span class="emphasis"><em>K</em></span> is the length in bits of the key, and
246 <li class="listitem">
247 <span class="emphasis"><em>S</em></span> is a constant.
251 This results in substantially improved performance for large <span class="emphasis"><em>N</em></span>;
252 <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/integer__idm46048203222928.html" title="Function template integer_sort">integer_sort</a></code></code>
253 tends to be 50% to 2X faster than <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>,
254 while <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/float_so_idm46048204416688.html" title="Function template float_sort">float_sort</a></code></code>
255 and _string_sort are roughly 2X faster than <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>.
258 Performance graphs are provided for <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/integer__idm46048203222928.html" title="Function template integer_sort">integer_sort</a></code></code>,
259 <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/float_so_idm46048204416688.html" title="Function template float_sort">float_sort</a></code></code>,
260 and <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>
261 in their description.
264 Runtime Performance comparisons and graphs were made on a Core 2 Duo
265 laptop running Windows Vista 64 with MSVC 8.0, and an old G4 laptop running
266 Mac OSX with gcc. <a href="http://www.boost.org/build/doc/html/" target="_top">Boost
267 bjam/b2</a> was used to control compilation.
270 Direct performance comparisons on a newer x86 system running Ubuntu,
271 with the fallback to <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
272 at lower input sizes disabled are below.
274 <div class="note"><table border="0" summary="Note">
276 <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../../../../doc/src/images/note.png"></td>
277 <th align="left">Note</th>
279 <tr><td align="left" valign="top"><p>
280 The fallback to <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
281 for smaller input sizes prevents the worse performance seen on the
282 left sides of the first two graphs.
286 <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/integer__idm46048203222928.html" title="Function template integer_sort">integer_sort</a></code></code>
287 starts to become faster than <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
288 at about 1000 integers (4000 bytes), and <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>
289 becomes faster than <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
290 at slightly fewer bytes (as few as 30 strings).
292 <div class="note"><table border="0" summary="Note">
294 <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../../../../doc/src/images/note.png"></td>
295 <th align="left">Note</th>
297 <tr><td align="left" valign="top"><p>
298 The 4-threaded graph has 4 threads doing <span class="bold"><strong>separate
299 sorts simultaneously</strong></span> (not splitting up a single sort) as
300 a test for thread cache collision and other multi-threaded performance
305 <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/float_so_idm46048204416688.html" title="Function template float_sort">float_sort</a></code></code>
306 times are very similar to <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/integer__idm46048203222928.html" title="Function template integer_sort">integer_sort</a></code></code>
310 <span class="inlinemediaobject"><img src="../../../images/single_threaded.png"></span>
311 <span class="inlinemediaobject"><img src="../../../images/4_threaded.png"></span>
312 <span class="inlinemediaobject"><img src="../../../images/entropy.png"></span>
313 <span class="inlinemediaobject"><img src="../../../images/bits_per_byte.png"></span>
316 Histogramming with a fixed maximum number of splits is used because it
317 reduces the number of cache misses, thus improving performance relative
318 to the approach described in detail in the <a href="http://en.wikipedia.org/wiki/Spreadsort" target="_top">original
319 SpreadSort publication</a>.
322 The importance of cache-friendly histogramming is described in <a href="http://www.nik.no/2002/Maus.pdf" target="_top">Arne Maus, Adaptive Left Reflex</a>,
323 though without the worst-case handling described below.
326 The time taken per radix iteration is:
329 <span class="emphasis"><em>𝑶(N)</em></span> iterations over the data
332 <span class="emphasis"><em>𝑶(N)</em></span> integer-type comparisons (even for _float_sort
333 and <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>)
336 <span class="emphasis"><em>𝑶(N)</em></span> swaps
339 <span class="emphasis"><em>𝑶(2<sup>S</sup>)</em></span> bin operations.
342 To obtain <span class="emphasis"><em>𝑶(N)</em></span> worst-case performance per iteration,
343 the restriction <span class="emphasis"><em>S <= log2(N)</em></span> is applied, and
344 <span class="emphasis"><em>𝑶(2<sup>S</sup>)</em></span> becomes <span class="emphasis"><em>𝑶(N)</em></span>. For each
345 such iteration, the number of unsorted bits log2(range) (referred to
346 as <span class="emphasis"><em>K</em></span>) per element is reduced by <span class="emphasis"><em>S</em></span>.
347 As <span class="emphasis"><em>S</em></span> decreases depending upon the amount of elements
348 being sorted, it can drop from a maximum of <span class="emphasis"><em>S<sub>max</sub></em></span>
349 to the minimum of <span class="emphasis"><em>S<sub>min</sub></em></span>.
352 Assumption: <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
353 is assumed to be <span class="emphasis"><em>𝑶(N*log2(N))</em></span>, as <a href="http://en.wikipedia.org/wiki/Introsort" target="_top">introsort</a>
354 exists and is commonly used. (If you have a quibble with this please
355 take it up with the implementor of your <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>;
356 you're welcome to replace the recursive calls to <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
357 to calls to <a href="http://en.wikipedia.org/wiki/Introsort" target="_top">introsort</a>
358 if your <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
359 library call is poorly implemented).
362 <a href="http://en.wikipedia.org/wiki/Introsort" target="_top">Introsort</a>
363 is not included with this algorithm for simplicity and because the implementor
364 of the <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
365 call is assumed to know what they're doing.
368 To maintain a minimum value for <span class="emphasis"><em>S (S<sub>min</sub>)</em></span>, comparison-based
369 sorting has to be used to sort when <span class="emphasis"><em>n <= log2(meanbinsize)</em></span>,
370 where <span class="emphasis"><em>log2(meanbinsize) (lbs)</em></span> is a small constant,
371 usually between 0 and 4, used to minimize bin overhead per element. There
372 is a small corner-case where if <span class="emphasis"><em>K < S<sub>min</sub></em></span> and
373 <span class="emphasis"><em>n >= 2^K</em></span>, then the data can be sorted in a single
374 radix-based iteration with an <span class="emphasis"><em>S = K</em></span> (this bucketsorting
375 special case is by default only applied to <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/float_so_idm46048204416688.html" title="Function template float_sort">float_sort</a></code></code>).
376 So for the final recursion, worst-case performance is:
379 1 radix-based iteration if <span class="emphasis"><em>K <= S<sub>min</sub></em></span>,
382 or <span class="emphasis"><em>S<sub>min</sub> + lbs</em></span> comparison-based iterations if <span class="emphasis"><em>K
383 > S<sub>min</sub></em></span> but <span class="emphasis"><em>n <= 2<sup>(S<sub>min</sub> + lbs)</sup></em></span>.
386 So for the final iteration, worst-case runtime is <span class="emphasis"><em>𝑶(N*(S<sub>min</sub> +
387 lbs))</em></span> but if <span class="emphasis"><em>K > S<sub>min</sub></em></span> and <span class="emphasis"><em>N
388 > 2<sup>(S<sub>min</sub> + lbs)</sup></em></span> then more than 1 radix recursion will be
392 For the second to last iteration, <span class="emphasis"><em>K <= S<sub>min</sub> * 2 + 1</em></span>
393 can be handled, (if the data is divided into <span class="emphasis"><em>2<sup>(S<sub>min</sub> + 1)</sup></em></span>
394 pieces) or if <span class="emphasis"><em>N < 2<sup>(S<sub>min</sub> + lbs + 1)</sup></em></span>, then it is
395 faster to fallback to <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>.
398 In the case of a radix-based sort plus recursion, it will take <span class="emphasis"><em>𝑶(N*(S<sub>min</sub> +
399 lbs)) + 𝑶(N) = 𝑶(N*(S<sub>min</sub> + lbs + 1))</em></span> worst-case time, as <span class="emphasis"><em>K_remaining
400 = K_start - (S<sub>min</sub> + 1)</em></span>, and <span class="emphasis"><em>K_start <= S<sub>min</sub> * 2
404 Alternatively, comparison-based sorting is used if <span class="emphasis"><em>N < 2<sup>(S<sub>min</sub> +
405 lbs + 1)</sup></em></span>, which will take <span class="emphasis"><em>𝑶(N*(S<sub>min</sub> + lbs + 1))</em></span>
409 So either way <span class="emphasis"><em>𝑶(N*(S<sub>min</sub> + lbs + 1))</em></span> is the worst-case
410 time for the second to last iteration, which occurs if <span class="emphasis"><em>K <=
411 S<sub>min</sub> * 2 + </em></span>1 or <span class="emphasis"><em>N < 2<sup>(S<sub>min</sub> + lbs + 1)</sup></em></span>.
414 This continues as long as <span class="emphasis"><em>S<sub>min</sub> <= S <= S<sub>max</sub></em></span>,
415 so that for <span class="emphasis"><em>K_m <= K_(m-1) + S<sub>min</sub> + m</em></span> where <span class="emphasis"><em>m</em></span>
416 is the maximum number of iterations after this one has finished, or where
417 <span class="emphasis"><em>N < 2<sup>(S<sub>min</sub> + lbs + m)</sup></em></span>, then the worst-case runtime
418 is <span class="emphasis"><em>𝑶(N*(S<sub>min</sub> + lbs + m))</em></span>.
421    <span class="emphasis"><em>K_m</em></span> at <span class="emphasis"><em>m <= (S<sub>max</sub> - S<sub>min</sub>)</em></span>
425    <span class="emphasis"><em>K_1 <= (S<sub>min</sub>) + S<sub>min</sub> + 1 <= 2S<sub>min</sub> + 1</em></span>
428    <span class="emphasis"><em>K_2 <= (2S<sub>min</sub> + 1) + S<sub>min</sub> + 2</em></span>
431 as the sum from 0 to <span class="emphasis"><em>m</em></span> is <span class="emphasis"><em>m(m + 1)/2</em></span>
434    <span class="emphasis"><em>K_m <= (m + 1)S<sub>min</sub> + m(m + 1)/2 <= (S<sub>min</sub> + m/2)(m + 1)</em></span>
437 substituting in S<sub>max</sub> - S<sub>min</sub> for m
440    <span class="emphasis"><em>K_(S<sub>max</sub> - S<sub>min</sub>) <= (S<sub>min</sub> + (S<sub>max</sub> - S<sub>min</sub>)/2)*(S<sub>max</sub> - S<sub>min</sub> + 1)</em></span>
443    <span class="emphasis"><em>K_(S<sub>max</sub> - S<sub>min</sub>) <= (S<sub>min</sub> + S<sub>max</sub>) * (S<sub>max</sub> - S<sub>min</sub> + 1)/2</em></span>
446 Since this involves <span class="emphasis"><em>S<sub>max</sub> - S<sub>min</sub> + 1</em></span> iterations, this
447 works out to dividing <span class="emphasis"><em>K</em></span> into an average <span class="emphasis"><em>(S<sub>min</sub> +
448 S<sub>max</sub>)</em></span>/2 pieces per iteration.
451 To finish the problem from this point takes <span class="emphasis"><em>𝑶(N * (S<sub>max</sub> - S<sub>min</sub>))</em></span>
452 for <span class="emphasis"><em>m</em></span> iterations, plus the worst-case of <span class="emphasis"><em>𝑶(N*(S<sub>min</sub> +
453 lbs))</em></span> for the last iteration, for a total of <span class="emphasis"><em>𝑶(N
454 *(S<sub>max</sub> + lbs))</em></span> time.
457 When <span class="emphasis"><em>m > S<sub>max</sub> - S<sub>min</sub></em></span>, the problem is divided into
458 <span class="emphasis"><em>S<sub>max</sub></em></span> pieces per iteration, or <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
459 is called if <span class="emphasis"><em>N < 2^(m + S<sub>min</sub> + lbs)</em></span>. For this
463    <span class="emphasis"><em>K_m <= K_(m - 1) + S<sub>max</sub></em></span>, providing runtime of
466    <span class="emphasis"><em>𝑶(N *((K - K_(S<sub>max</sub> - S<sub>min</sub>))/S<sub>max</sub> + S<sub>max</sub> + lbs))</em></span> if recursive,
469 or <span class="emphasis"><em>𝑶(N * log(2^(m + S<sub>min</sub> + lbs)))</em></span> if comparison-based,
472 which simplifies to <span class="emphasis"><em>𝑶(N * (m + S<sub>min</sub> + lbs))</em></span>, which
473 substitutes to <span class="emphasis"><em>𝑶(N * ((m - (S<sub>max</sub> - S<sub>min</sub>)) + S<sub>max</sub> + lbs))</em></span>,
474 which given that <span class="emphasis"><em>m - (S<sub>max</sub> - S<sub>min</sub>) <= (K - K_(S<sub>max</sub> - S<sub>min</sub>))/S<sub>max</sub></em></span>
475 (otherwise a lesser number of radix-based iterations would be used)
478 also comes out to <span class="emphasis"><em>𝑶(N *((K - K_(S<sub>max</sub> - S<sub>min</sub>))/S<sub>max</sub> + S<sub>max</sub> + lbs))</em></span>.
481 Asymptotically, for large <span class="emphasis"><em>N</em></span> and large <span class="emphasis"><em>K</em></span>,
485    <span class="emphasis"><em>𝑶(N * (K/S<sub>max</sub> + S<sub>max</sub> + lbs))</em></span>,
488 simplifying out the constants related to the <span class="emphasis"><em>S<sub>max</sub> - S<sub>min</sub></em></span>
489 range, providing an additional <span class="emphasis"><em>𝑶(N * (S<sub>max</sub> + lbs))</em></span>
490 runtime on top of the <span class="emphasis"><em>𝑶(N * (K/S))</em></span> performance of
491 LSD <a href="http://en.wikipedia.org/wiki/Radix_sort" target="_top">radix sort</a>,
492 but without the <span class="emphasis"><em>𝑶(N)</em></span> memory overhead. For simplicity,
493 because <span class="emphasis"><em>lbs</em></span> is a small constant (0 can be used,
494 and performs reasonably), it is ignored when summarizing the performance
495 in further discussions. By checking whether comparison-based sorting
496 is better, Spreadsort is also <span class="emphasis"><em>𝑶(N*log(N))</em></span>, whichever
497 is better, and unlike LSD <a href="http://en.wikipedia.org/wiki/Radix_sort" target="_top">radix
498 sort</a>, can perform much better than the worst-case if the data
499 is either evenly distributed or highly clustered.
502 This analysis was for <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/integer__idm46048203222928.html" title="Function template integer_sort">integer_sort</a></code></code>
503 and <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/float_so_idm46048204416688.html" title="Function template float_sort">float_sort</a></code></code>.
504 <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>
505 differs in that <span class="emphasis"><em>S<sub>min</sub> = S<sub>max</sub> = sizeof(Char_type) * 8</em></span>,
506 <span class="emphasis"><em>lbs</em></span> is 0, and that <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>'s
507 comparison is not a constant-time operation, so strictly speaking <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/string_s_idm46048202990736.html" title="Function template string_sort">string_sort</a></code></code>
511    <span class="emphasis"><em>𝑶(N * (K/S<sub>max</sub> + (S<sub>max</sub> comparisons)))</em></span>.
514 Worst-case, this ends up being <span class="emphasis"><em>𝑶(N * K)</em></span> (where <span class="emphasis"><em>K</em></span>
515 is the mean string length in bytes), as described for <a href="http://en.wikipedia.org/wiki/American_flag_sort" target="_top">American
516 flag sort</a>, which is better than the
519    <span class="emphasis"><em>𝑶(N * K * log(N))</em></span>
522 worst-case for comparison-based sorting.
525 <div class="section">
526 <div class="titlepage"><div><div><h5 class="title">
527 <a name="sort.single_thread.spreadsort.overview.tuning"></a><a class="link" href="spreadsort.html#sort.single_thread.spreadsort.overview.tuning" title="Tuning">Tuning</a>
528 </h5></div></div></div>
530 <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/integer__idm46048203222928.html" title="Function template integer_sort">integer_sort</a></code></code>
531 and <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/float_so_idm46048204416688.html" title="Function template float_sort">float_sort</a></code></code>
532 have tuning constants that control how the radix-sorting portion of those
533 algorithms work. The ideal constant values for <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/integer__idm46048203222928.html" title="Function template integer_sort">integer_sort</a></code></code>
534 and <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/float_so_idm46048204416688.html" title="Function template float_sort">float_sort</a></code></code>
535 vary depending on the platform, compiler, and data being sorted. By far
536 the most important constant is <span class="emphasis"><em>max_splits</em></span>, which
537 defines how many pieces the radix-sorting portion splits the data into
541 The ideal value of <span class="emphasis"><em>max_splits</em></span> depends upon the size
542 of the L1 processor cache, and is between 10 and 13 on many systems.
543 A default value of 11 is used. For mostly-sorted data, a much larger
544 value is better, as swaps (and thus cache misses) are rare, but this
545 hurts runtime severely for unsorted data, so is not recommended.
548 On some x86 systems, when the total number of elements being sorted is
549 small ( less than 1 million or so), the ideal <span class="emphasis"><em>max_splits</em></span>
550 can be substantially larger, such as 17. This is suspected to be because
551 all the data fits into the L2 cache, and misses from L1 cache to L2 cache
552 do not impact performance as severely as misses to main memory. Modifying
553 tuning constants other than <span class="emphasis"><em>max_splits</em></span> is not recommended,
554 as the performance improvement for changing other constants is usually
558 If you can afford to let it run for a day, and have at least 1GB of free
559 memory, the perl command: <code class="computeroutput">./tune.pl -large -tune</code> (UNIX)
560 or <code class="computeroutput">perl tune.pl -large -tune -windows</code> (Windows) can be used
561 to automatically tune these constants. This should be run from the <code class="computeroutput">libs/sort
562 directory</code> inside the boost home directory. This will work to identify
563 the <code class="computeroutput">ideal constants.hpp</code> settings for your system, testing
564 on various distributions in a 20 million element (80MB) file, and additionally
565 verifies that all sorting routines sort correctly across various data
566 distributions. Alternatively, you can test with the file size you're
567 most concerned with <code class="computeroutput">./tune.pl number -tune</code> (UNIX) or <code class="computeroutput">perl
568 tune.pl number -tune -windows</code> (Windows). Substitute the number
569 of elements you want to test with for <code class="computeroutput">number</code>. Otherwise,
570 just use the options it comes with, they're decent. With default settings
571 <code class="computeroutput">./tune.pl -tune</code> (UNIX) <code class="computeroutput">perl tune.pl -tune -windows</code>
572 (Windows), the script will take hours to run (less than a day), but may
573 not pick the correct <span class="emphasis"><em>max_splits</em></span> if it is over 10.
574 Alternatively, you can add the <code class="computeroutput">-small</code> option to make it
575 take just a few minutes, tuning for smaller vector sizes (one hundred
576 thousand elements), but the resulting constants may not be good for large
577 files (see above note about <span class="emphasis"><em>max_splits</em></span> on Windows).
580 The tuning script can also be used just to verify that sorting works
581 correctly on your system, and see how much of a speedup it gets, by omiting
582 the "-tune" option. This runs at the end of tuning runs. Default
583 args will take about an hour to run and give accurate results on decent-sized
584 test vectors. <code class="computeroutput">./tune.pl -small</code> (UNIX) <code class="computeroutput">perl tune.pl
585 -small -windows</code> (Windows) is a faster option, that tests on smaller
586 vectors and isn't as accurate.
589 If any differences are encountered during tuning, please call <code class="computeroutput">tune.pl</code>
590 with <code class="computeroutput">-debug > log_file_name</code>. If the resulting log file
591 contains compilation or permissions issues, it is likely an issue with
592 your setup. If some other type of error is encountered (or result differences),
593 please send them to the library author at spreadsort@gmail.com. Including
594 the zipped <code class="computeroutput">input.txt</code> that was being used is also helpful.
599 <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
600 <td align="left"></td>
601 <td align="right"><div class="copyright-footer">Copyright © 2014-2017 Steven
602 Ross, Francisco Tapia, Orson Peters<p>
603 Distributed under the <a href="http://boost.org/LICENSE_1_0.txt" target="_top">Boost
604 Software License, Version 1.0</a>.
609 <div class="spirit-nav">
610 <a accesskey="p" href="../single_thread.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../single_thread.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="spreadsort/sort_hpp.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>