libs/math/doc/statistics/signal_statistics.qbk

   1 [/
   2   Copyright 2018 Nick Thompson
   3
   4   Distributed under the Boost Software License, Version 1.0.
   5   (See accompanying file LICENSE_1_0.txt or copy at
   6   http://www.boost.org/LICENSE_1_0.txt).
   7 ]
   8
   9 [section:signal_statistics Signal Statistics]
  10
  11 [heading Synopsis]
  12
  13 ``
  14 #include <boost/math/statistics/signal_statistics.hpp>
  15
  16 namespace boost::math::statistics {
  17
  18     template<class Container>
  19     auto absolute_gini_coefficient(Container & c);
  20
  21     template<class ForwardIterator>
  22     auto absolute_gini_coefficient(ForwardIterator first, ForwardIterator last);
  23
  24     template<class Container>
  25     auto sample_absolute_gini_coefficient(Container & c);
  26
  27     template<class ForwardIterator>
  28     auto sample_absolute_gini_coefficient(ForwardIterator first, ForwardIterator last);
  29
  30     template<class Container>
  31     auto hoyer_sparsity(Container const & c);
  32
  33     template<class ForwardIterator>
  34     auto hoyer_sparsity(ForwardIterator first, ForwardIterator last);
  35
  36     template<class Container>
  37     auto oracle_snr(Container const & signal, Container const & noisy_signal);
  38
  39     template<class Container>
  40     auto oracle_snr_db(Container const & signal, Container const & noisy_signal);
  41
  42     template<class ForwardIterator>
  43     auto m2m4_snr_estimator(ForwardIterator first, ForwardIterator last, decltype(*first) estimated_signal_kurtosis=1, decltype(*first) estimated_noise_kurtosis=3);
  44
  45     template<class Container>
  46     auto m2m4_snr_estimator(Container const & noisy_signal, typename Container::value_type estimated_signal_kurtosis=1, typename Container::value_type estimate_noise_kurtosis=3);
  47
  48     template<class ForwardIterator>
  49     auto m2m4_snr_estimator_db(ForwardIterator first, ForwardIterator last, decltype(*first) estimated_signal_kurtosis=1, decltype(*first) estimated_noise_kurtosis=3);
  50
  51     template<class Container>
  52     auto m2m4_snr_estimator_db(Container const & noisy_signal,typename Container::value_type estimated_signal_kurtosis=1, typename Container::value_type estimate_noise_kurtosis=3);
  53
  54 }
  55 ``
  56
  57 [heading Description]
  58
  59 The file `boost/math/statistics/signal_statistics.hpp` is a set of facilities for computing quantities commonly used in signal analysis.
  60
  61 Our examples use `std::vector<double>` to hold the data, but this not required.
  62 In general, you can store your data in an Eigen array, and Armadillo vector, `std::array`, and for many of the routines, a `std::forward_list`.
  63 These routines are usable in float, double, long double, and Boost.Multiprecision precision, as well as their complex extensions whenever the computation is well-defined.
  64
  65
  66 [heading Absolute Gini Coefficient]
  67
  68 The Gini coefficient, first used to measure wealth inequality, is also one of the best measures of the sparsity of an expansion in a basis.
  69 A sparse expansion has most of its norm concentrated in just a few coefficients, making the connection with wealth inequality obvious.
  70 See [@https://arxiv.org/pdf/0811.4706.pdf Hurley and Rickard] for details.
  71 However, for measuring sparsity, the phase of the numbers is irrelevant, so we provide the `absolute_gini_coefficient`:
  72
  73     using boost::math::statistics::sample_absolute_gini_coefficient;
  74     using boost::math::statistics::absolute_gini_coefficient;
  75     std::vector<std::complex<double>> v{{0,1}, {0,0}, {0,0}, {0,0}};
  76     double abs_gini = sample_absolute_gini_coefficient(v);
  77     // now abs_gini = 1; maximally unequal
  78
  79     std::vector<std::complex<double>> w{{0,1}, {1,0}, {0,-1}, {-1,0}};
  80     abs_gini = absolute_gini_coefficient(w);
  81     // now abs_gini = 0; every element of the vector has equal magnitude
  82
  83     std::vector<double> u{-1, 1, -1};
  84     abs_gini = absolute_gini_coefficient(u);
  85     // now abs_gini = 0
  86     // Alternative call useful for computing over subset of the input:
  87     abs_gini = absolute_gini_coefficient(u.begin(), u.begin() + 1);
  88
  89
  90 The sample Gini coefficient returns unity for a vector which has only one nonzero coefficient.
  91 The population Gini coefficient of a vector with one non-zero element is dependent on the length of the input.
  92
  93 The sample Gini coefficient lacks one desirable property of the population Gini coefficient,
  94 namely that "cloning" a vector has the same Gini coefficient; though cloning holds to very high accuracy with the sample Gini coefficient and can easily be recovered by a rescaling.
  95
  96 If sorting the input data is too much expense for a sparsity measure (is it going to be perfect anyway?),
  97 consider calculating the Hoyer sparsity instead.
  98
  99 [heading Hoyer Sparsity]
 100
 101 The Hoyer sparsity measures a normalized ratio of the \u2113[super 1] and \u2113[super 2] norms.
 102 As the name suggests, it is used to measure the sparsity of an expansion in some basis.
 103
 104 The Hoyer sparsity computes ([radic]/N/ - \u2113[super 1](v)/\u2113[super 2](v))/([radic]N -1).
 105 For details, see [@http://www.jmlr.org/papers/volume5/hoyer04a/hoyer04a.pdf Hoyer] as well as [@https://arxiv.org/pdf/0811.4706.pdf Hurley and Rickard].
 106
 107 A few special cases will serve to clarify the intended use:
 108 If /v/ has only one nonzero coefficient, the Hoyer sparsity attains its maxima of 1.
 109 If the coefficients of /v/ all have the same magnitude, then the Hoyer sparsity attains its minima of zero.
 110 If the elements of /v/ are uniformly distributed on an interval [0, /b/], then the Hoyer sparsity is approximately 0.133.
 111
 112
 113 Usage:
 114
 115     std::vector<Real> v{1,0,0};
 116     Real hs = boost::math::statistics::hoyer_sparsity(v);
 117     // hs = 1
 118     std::vector<Real> v{1,-1,1};
 119     Real hs = boost::math::statistics::hoyer_sparsity(v.begin(), v.end());
 120     // hs = 0
 121
 122 The container must be forward iterable and the contents are not modified.
 123 Accepts real, complex, and integer inputs.
 124 If the input is an integral type, the output is a double precision float.
 125
 126
 127 [heading Oracle Signal-to-noise ratio]
 128
 129 The function `oracle_snr` computes the ratio \u2016 /s/ \u2016[sub 2][super 2] / \u2016 /s/ - /x/ \u2016[sub 2][super 2], where /s/ is signal and /x/ is a noisy signal.
 130 The function `oracle_snr_db` computes 10`log`[sub 10](\u2016 /s/ \u2016[super 2] / \u2016 /s/ - /x/ \u2016[super 2]).
 131 The functions are so named because in general, one does not know how to decompose a real signal /x/ into /s/ + /w/ and as such /s/ is regarded as oracle information.
 132 Hence this function is mainly useful for unit testing other SNR estimators.
 133
 134 Usage:
 135
 136     std::vector<double> signal(500, 3.2);
 137     std::vector<double> noisy_signal(500);
 138     // fill 'noisy_signal' signal + noise
 139     double snr_db = boost::math::statistics::oracle_snr_db(signal, noisy_signal);
 140     double snr = boost::math::statistics::oracle_snr(signal, noisy_signal);
 141
 142 The input can be real, complex, or integral.
 143 Integral inputs produce double precision floating point outputs.
 144 The input data is not modified and must satisfy the requirements of a `RandomAccessContainer`.
 145
 146 [heading /M/[sub 2]/M/[sub 4] SNR Estimation]
 147
 148 Estimates the SNR of a noisy signal via the /M/[sub 2]/M/[sub 4] method.
 149 See [@https://doi.org/10.1109/26.871393  Pauluzzi and N.C. Beaulieu] and [@https://doi.org/10.1109/ISIT.1994.394869 Matzner and Englberger] for details.
 150
 151     std::vector<double> noisy_signal(512);
 152     // fill noisy_signal with data contaminated by Gaussian white noise:
 153     double est_snr_db = boost::math::statistics::m2m4_snr_estimator_db(noisy_signal);
 154
 155 The /M/[sub 2]/M/[sub 4] SNR estimator is an "in-service" estimator, meaning that the estimate is made using the noisy, data-bearing signal, and does not require a background estimate.
 156 This estimator has been found to be work best between roughly -3 and 15db, tending to overestimate the noise below -3db, and underestimate the noise above 15db.
 157 See [@https://www.mdpi.com/2078-2489/8/3/75/pdf Xue et al] for details.
 158
 159 The /M/[sub 2]/M/[sub 4] SNR estimator, by default, assumes that the kurtosis of the signal is 1 and the kurtosis of the noise is 3, the latter corresponding to Gaussian noise.
 160 These parameters, however, can be overridden:
 161
 162     std::vector<double> noisy_signal(512);
 163     // fill noisy_signal with the data:
 164     double signal_kurtosis = 1.5;
 165     // Noise is assumed to follow Laplace distribution, which has kurtosis of 6:
 166     double noise_kurtosis = 6;
 167     double est_snr = boost::math::statistics::m2m4_snr_estimator_db(noisy_signal, signal_kurtosis, noise_kurtosis);
 168
 169 Now, technically the method is a "blind SNR estimator", meaning that the no /a-priori/ information about the signal is required to use the method.
 170 However, the performance of the method is /vastly/ better if you can come up with a better estimate of the signal and noise kurtosis.
 171 How can we do this? Suppose we know that the SNR is much greater than 1.
 172 Then we can estimate the signal kurtosis simply by using the noisy signal kurtosis.
 173 If the SNR is much less than one, this method breaks down as the noisy signal kurtosis will tend to the noise kurtosis-though in this limit we have an excellent estimator of the noise kurtosis!
 174 In addition, if you have a model of what your signal should look like, you can precompute the signal kurtosis.
 175 For example, sinusoids have a kurtosis of 1.5.
 176 See [@http://www.jcomputers.us/vol8/jcp0808-21.pdf here] for a study which uses estimates of this sort to improve the performance of the /M/[sub 2]/M/[sub 4] estimator.
 177
 178
 179 /Nota bene/: The traditional definition of SNR is /not/ mean invariant.
 180 By this we mean that if a constant is added to every sample of a signal, the SNR is changed.
 181 For example, adding DC bias to a signal changes its SNR.
 182 For most use cases, this is really not what you intend; for example a signal consisting of zeros plus Gaussian noise has an SNR of zero,
 183 whereas a signal with a constant DC bias and random Gaussian noise might have a very large SNR.
 184
 185 The /M/[sub 2]/M/[sub 4] SNR estimator is computed from mean-invariant quantities,
 186 and hence it should really be compared to the mean-invariant SNR.
 187
 188 /Nota bene/: This computation requires the solution of a system of quadratic equations involving the noise kurtosis, the signal kurtosis, and the second and fourth moments of the data.
 189 There is no guarantee that a solution of this system exists for all value of these parameters, in fact nonexistence can easily be demonstrated for certain data.
 190 If there is no solution to the system, then failure is communicated by returning NaNs.
 191 This happens distressingly often; if a user is aware of any blind SNR estimators which do not suffer from this drawback, please open a github ticket and let us know.
 192
 193 The author has not managed to fully characterize the conditions under which a real solution with /S > 0/ and /N >0/ exists.
 194 However, a very intuitive example demonstrates why nonexistence can occur.
 195 Suppose the signal and noise kurtosis are equal.
 196 Then the method has no way to distinguish between the signal and the noise, and the solution is non-unique.
 197
 198
 199 [heading References]
 200
 201 * Mallat, Stephane. ['A wavelet tour of signal processing: the sparse way.] Academic press, 2008.
 202 * Hurley, Niall, and Scott Rickard. ['Comparing measures of sparsity.] IEEE Transactions on Information Theory 55.10 (2009): 4723-4741.
 203 * Jensen, Arne, and Anders la Cour-Harbo. ['Ripples in mathematics: the discrete wavelet transform.] Springer Science & Business Media, 2001.
 204 * D. R. Pauluzzi and N. C. Beaulieu, ['A comparison of SNR estimation techniques for the AWGN channel,] IEEE Trans. Communications, Vol. 48, No. 10, pp. 1681-1691, 2000.
 205 * Hoyer, Patrik O. ['Non-negative matrix factorization with sparseness constraints.], Journal of machine learning research 5.Nov (2004): 1457-1469.
 206
 207 [endsect]
 208 [/section:signal_statistics Signal Statistics]