Imported Upstream version 1.72.0
[/
Copyright (c) 2019 Nick Thompson
Use, modification and distribution are subject to the
Boost Software License, Version 1.0. (See accompanying file
LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
]

[section:anderson_darling The Anderson-Darling Test]

[heading Synopsis]

```
#include <boost/math/statistics/anderson_darling.hpp>

namespace boost { namespace math { namespace statistics {

template<class RandomAccessContainer>
auto anderson_darling_normality_statistic(RandomAccessContainer const & v,
                                          typename RandomAccessContainer::value_type mu = std::numeric_limits<typename RandomAccessContainer::value_type>::quiet_NaN(),
                                          typename RandomAccessContainer::value_type sd = std::numeric_limits<typename RandomAccessContainer::value_type>::quiet_NaN());

}}}
```

[heading Background]

The Anderson-Darling test for normality asks whether a given sequence of numbers is drawn from a normal distribution by computing an integral over the empirical cumulative distribution function.
The test statistic /A/[super 2] is given by

[$../graphs/anderson_darling_definition.svg]

where /F/[sub /n/] is the empirical cumulative distribution function and /F/ is the CDF of the normal distribution.
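
To make the two functions in the definition concrete, the following is a minimal sketch, not the library's implementation, of an empirical CDF over sorted data and of the normal CDF written in terms of `std::erfc`.
The names `empirical_cdf` and `normal_cdf` are hypothetical and are not part of Boost.Math; the sketch assumes a non-empty, sorted input.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper: empirical CDF F_n(x), the fraction of samples <= x.
// Assumes `sorted` is non-empty and sorted in ascending order.
double empirical_cdf(std::vector<double> const & sorted, double x) {
    auto it = std::upper_bound(sorted.begin(), sorted.end(), x);
    return static_cast<double>(it - sorted.begin()) / static_cast<double>(sorted.size());
}

// Hypothetical helper: CDF of the normal distribution with mean mu and
// standard deviation sd (sd > 0), expressed through the complementary error function.
double normal_cdf(double x, double mu, double sd) {
    return 0.5 * std::erfc(-(x - mu) / (sd * std::sqrt(2.0)));
}
```

For the sorted samples -1, 0, 1, 2, `empirical_cdf` evaluated at 0 gives 0.5, since two of the four samples are less than or equal to zero.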

The value returned by the routine is /A/[super 2].

If /A/[super 2]\/n converges to zero as /n/ goes to infinity, then the hypothesis that the data is normally distributed is supported by the test.

If /A/[super 2]\/n converges to a finite positive value as /n/ goes to infinity, then the hypothesis is not supported by the test.

An example usage is demonstrated below:

```
#include <algorithm>
#include <iostream>
#include <random>
#include <vector>
#include <boost/math/statistics/anderson_darling.hpp>
using boost::math::statistics::anderson_darling_normality_statistic;

int main() {
    std::random_device rd;
    std::normal_distribution<double> dis(0, 1);
    std::vector<double> v(8192);
    for (auto & x : v) { x = dis(rd); }
    // The statistic requires sorted data:
    std::sort(v.begin(), v.end());
    // Hypothesis matching the generating distribution N(0, 1):
    double presumed_mean = 0;
    double presumed_standard_deviation = 1;
    double Asq = anderson_darling_normality_statistic(v, presumed_mean, presumed_standard_deviation);
    std::cout << "A^2/n = " << Asq/v.size() << "\n";
    // Sample output: 5.39e-05 (should be small; the value varies from run to run)
    // Now use an incorrect hypothesis:
    presumed_mean = 4;
    Asq = anderson_darling_normality_statistic(v, presumed_mean, presumed_standard_deviation);
    std::cout << "A^2/n = " << Asq/v.size() << "\n";
    // Sample output: 7.41 (should be somewhat large)
}
```

The Anderson-Darling normality test requires sorted data.
If the data are not sorted, an exception is thrown.
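
One way to satisfy the sortedness requirement without modifying the caller's data is to sort a copy before computing the statistic.
The following is a minimal sketch using only the standard library; the helper name `sorted_copy` is hypothetical and is not part of Boost.Math.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: returns an ascending-sorted copy of the input.
// The argument is taken by value, so the caller's vector is left untouched.
std::vector<double> sorted_copy(std::vector<double> v) {
    // Skip the O(n log n) sort when the data is already in order.
    if (!std::is_sorted(v.begin(), v.end())) {
        std::sort(v.begin(), v.end());
    }
    return v;
}
```

The result of `sorted_copy(v)` can then be passed to `anderson_darling_normality_statistic` in place of `v`.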

If you simply wish to know whether the data is normally distributed, and not whether it is normally distributed with a presumed mean and standard deviation,
then you can call the function without the final two arguments, and the mean and standard deviation will be estimated from the data themselves:

```
double Asq = anderson_darling_normality_statistic(v);
```
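
To illustrate what estimating the parameters from the data means, the following sketch computes a sample mean and sample standard deviation with the standard library.
This is an assumption-laden illustration: the helper name `sample_mean_and_sd` is hypothetical, and the choice of the /n/ - 1 denominator is ours; this page does not state which estimator the library uses internally.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical helper: returns {mean, standard deviation} of the samples.
// Assumption: uses the n - 1 (unbiased variance) denominator and needs
// at least two samples; the library's internal estimator may differ.
std::pair<double, double> sample_mean_and_sd(std::vector<double> const & v) {
    double const n = static_cast<double>(v.size());
    double mean = 0;
    for (double x : v) { mean += x; }
    mean /= n;
    double sum_sq = 0;
    for (double x : v) { sum_sq += (x - mean) * (x - mean); }
    return {mean, std::sqrt(sum_sq / (n - 1))};
}
```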

The following graph demonstrates the convergence of the test statistic.
Each data point represents a vector of length /n/ which is filled with normally distributed data.
The test statistic is computed over this vector, divided by /n/, and passed to the natural logarithm.
This exhibits the (admittedly slow) convergence of the integral to zero when the hypothesis is true.

[$../graphs/anderson_darling_simulation.svg]

[heading Performance]

```
---------------------------------------------------------------
Benchmark                                              Time
---------------------------------------------------------------
AndersonDarlingNormalityTest<float>/8                224 ns    bytes_per_second=136.509M/s
AndersonDarlingNormalityTest<float>/16               435 ns    bytes_per_second=140.254M/s
AndersonDarlingNormalityTest<float>/32               898 ns    bytes_per_second=135.995M/s
AndersonDarlingNormalityTest<float>/64              1773 ns    bytes_per_second=137.675M/s
AndersonDarlingNormalityTest<float>/128             3455 ns    bytes_per_second=141.338M/s
AndersonDarlingNormalityTest<float>/256             7001 ns    bytes_per_second=139.488M/s
AndersonDarlingNormalityTest<float>/512            13996 ns    bytes_per_second=139.551M/s
AndersonDarlingNormalityTest<float>/1024           28129 ns    bytes_per_second=138.868M/s
AndersonDarlingNormalityTest<float>/2048           55723 ns    bytes_per_second=140.206M/s
AndersonDarlingNormalityTest<float>/4096          112008 ns    bytes_per_second=139.501M/s
AndersonDarlingNormalityTest<float>/8192          224643 ns    bytes_per_second=139.11M/s
AndersonDarlingNormalityTest<float>/16384         450320 ns    bytes_per_second=138.791M/s
AndersonDarlingNormalityTest<float>/32768         896409 ns    bytes_per_second=139.45M/s
AndersonDarlingNormalityTest<float>/65536        1797800 ns    bytes_per_second=139.058M/s
AndersonDarlingNormalityTest<float>/131072       3604995 ns    bytes_per_second=138.698M/s
AndersonDarlingNormalityTest<float>/262144       7235625 ns    bytes_per_second=138.207M/s
AndersonDarlingNormalityTest<float>/524288      14502815 ns    bytes_per_second=137.904M/s
AndersonDarlingNormalityTest<float>/1048576     29058087 ns    bytes_per_second=137.659M/s
AndersonDarlingNormalityTest<float>/2097152     58470439 ns    bytes_per_second=136.824M/s
AndersonDarlingNormalityTest<float>/4194304    117476365 ns    bytes_per_second=136.201M/s
AndersonDarlingNormalityTest<float>/8388608    239887895 ns    bytes_per_second=133.397M/s
AndersonDarlingNormalityTest<float>/16777216   488787211 ns    bytes_per_second=130.94M/s
AndersonDarlingNormalityTest<float>_BigO           28.96 N         28.96 N
AndersonDarlingNormalityTest<double>/8               470 ns    bytes_per_second=129.733M/s
AndersonDarlingNormalityTest<double>/16              911 ns    bytes_per_second=133.989M/s
AndersonDarlingNormalityTest<double>/32             1773 ns    bytes_per_second=137.723M/s
AndersonDarlingNormalityTest<double>/64             3368 ns    bytes_per_second=144.966M/s
AndersonDarlingNormalityTest<double>/128            6627 ns    bytes_per_second=147.357M/s
AndersonDarlingNormalityTest<double>/256           12458 ns    bytes_per_second=156.777M/s
AndersonDarlingNormalityTest<double>/512           23060 ns    bytes_per_second=169.395M/s
AndersonDarlingNormalityTest<double>/1024          44529 ns    bytes_per_second=175.45M/s
AndersonDarlingNormalityTest<double>/2048          88735 ns    bytes_per_second=176.087M/s
AndersonDarlingNormalityTest<double>/4096         175583 ns    bytes_per_second=177.978M/s
AndersonDarlingNormalityTest<double>/8192         348042 ns    bytes_per_second=179.577M/s
AndersonDarlingNormalityTest<double>/16384        701439 ns    bytes_per_second=178.206M/s
AndersonDarlingNormalityTest<double>/32768       1394597 ns    bytes_per_second=179.262M/s
AndersonDarlingNormalityTest<double>/65536       2777943 ns    bytes_per_second=179.994M/s
AndersonDarlingNormalityTest<double>/131072      5571455 ns    bytes_per_second=179.487M/s
AndersonDarlingNormalityTest<double>/262144     11161456 ns    bytes_per_second=179.193M/s
AndersonDarlingNormalityTest<double>/524288     22048950 ns    bytes_per_second=181.417M/s
AndersonDarlingNormalityTest<double>/1048576    44094409 ns    bytes_per_second=181.429M/s
AndersonDarlingNormalityTest<double>/2097152    88300185 ns    bytes_per_second=181.199M/s
AndersonDarlingNormalityTest<double>/4194304   176140378 ns    bytes_per_second=181.678M/s
AndersonDarlingNormalityTest<double>/8388608   352102955 ns    bytes_per_second=181.769M/s
AndersonDarlingNormalityTest<double>/16777216  706160246 ns    bytes_per_second=181.267M/s
AndersonDarlingNormalityTest<double>_BigO          42.06 N
```

[heading Caveats]

Some authors, including [@https://www.itl.nist.gov/div898/handbook/eda/section3/eda35e.htm NIST], give the following definition of the Anderson-Darling test statistic:

[$../graphs/alternative_anderson_darling_definition.svg]

This is an approximation to the quadrature sum we use as our definition.
Boost.Math /does not compute this quantity/.
(However, with a sufficiently large amount of data the two definitions seem to agree to two digits, so the importance of making a clear distinction between the two is unclear.)
Our computation of the Anderson-Darling test statistic agrees with Mathematica.
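
For readers who wish to compare the two definitions, the following is a minimal sketch of the NIST-style sum, written against the standard library only.
It is not part of Boost.Math and is not what the library computes; the names `gaussian_cdf` and `nist_anderson_darling` are hypothetical, and the sketch assumes sorted input with no sample so extreme that the CDF evaluates to exactly 0 or 1.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: normal CDF with mean mu and standard deviation sd > 0.
double gaussian_cdf(double x, double mu, double sd) {
    return 0.5 * std::erfc(-(x - mu) / (sd * std::sqrt(2.0)));
}

// Hypothetical helper implementing the NIST handbook form
//   A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) * (ln F(Y_i) + ln(1 - F(Y_{n+1-i})))
// where Y_1 <= ... <= Y_n are the sorted samples.
double nist_anderson_darling(std::vector<double> const & sorted, double mu, double sd) {
    std::size_t const n = sorted.size();
    double sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // Zero-based i: the weight 2i - 1 (one-based) becomes 2i + 1.
        double const Fi = gaussian_cdf(sorted[i], mu, sd);
        double const Fj = gaussian_cdf(sorted[n - 1 - i], mu, sd);
        sum += (2.0 * static_cast<double>(i) + 1.0) * (std::log(Fi) + std::log1p(-Fj));
    }
    return -static_cast<double>(n) - sum / static_cast<double>(n);
}
```

As in the example above, a presumed mean far from the data should give a much larger value than a well-matched one.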

[endsect]
[/section:anderson_darling]