1 [section:geometric_dist Geometric Distribution]
3 ``#include <boost/math/distributions/geometric.hpp>``
5 namespace boost{ namespace math{
7 template <class RealType = double,
8 class ``__Policy`` = ``__policy_class`` >
9 class geometric_distribution;
11 typedef geometric_distribution<> geometric;
13 template <class RealType, class ``__Policy``>
14 class geometric_distribution
17 typedef RealType value_type;
18 typedef Policy policy_type;
19 // Constructor from success_fraction:
20 geometric_distribution(RealType p);
22 // Parameter accessors:
23 RealType success_fraction() const;
24 RealType successes() const;
26 // Bounds on success fraction:
27 static RealType find_lower_bound_on_p(
30 RealType probability); // alpha
31 static RealType find_upper_bound_on_p(
34 RealType probability); // alpha
36 // Estimate min/max number of trials:
37 static RealType find_minimum_number_of_trials(
38 RealType k, // Number of failures.
39 RealType p, // Success fraction.
40 RealType probability); // Probability threshold alpha.
41 static RealType find_maximum_number_of_trials(
42 RealType k, // Number of failures.
43 RealType p, // Success fraction.
44 RealType probability); // Probability threshold alpha.
49 The class type `geometric_distribution` represents a
50 [@http://en.wikipedia.org/wiki/geometric_distribution geometric distribution]:
51 it is used when there are exactly two mutually exclusive outcomes of a
52 [@http://en.wikipedia.org/wiki/Bernoulli_trial Bernoulli trial]:
53 these outcomes are labelled "success" and "failure".
55 For [@http://en.wikipedia.org/wiki/Bernoulli_trial Bernoulli trials]
56 each with success fraction /p/, the geometric distribution gives
57 the probability of observing /k/ trials (failures, events, occurrences, or arrivals)
58 before the first success.
60 [note For this implementation, the set of trials *includes zero*
61 (unlike another definition where the set of trials starts at one, sometimes named /shifted/).]
62 The geometric distribution assumes that success_fraction /p/ is fixed for all /k/ trials.
64 The probability that there are /k/ failures before the first success
66 [expression Pr(Y=/k/) = (1-/p/)[super /k/] /p/]
68 For example, when throwing a 6-face dice the success probability /p/ = 1/6 = 0.1666[recur].
69 Throwing repeatedly until a /three/ appears,
70 the probability distribution of the number of times /not-a-three/ is thrown is geometric.
72 Geometric distribution has the Probability Density Function PDF:
74 [expression (1-/p/)[super /k/] /p/]
76 The following graph illustrates how the PDF and CDF vary for three examples
77 of the success fraction /p/,
78 (when considering the geometric distribution as a continuous function),
80 [graph geometric_pdf_2]
82 [graph geometric_cdf_2]
86 [graph geometric_pdf_discrete]
88 [graph geometric_cdf_discrete]
91 [h4 Related Distributions]
93 The geometric distribution is a special case of
94 the __negative_binomial_distrib with successes parameter /r/ = 1,
95 so only one first and only success is required : thus by definition
96 __spaces `geometric(p) == negative_binomial(1, p)`
98 negative_binomial_distribution(RealType r, RealType success_fraction);
99 negative_binomial nb(1, success_fraction);
100 geometric g(success_fraction);
101 ASSERT(pdf(nb, 1) == pdf(g, 1));
103 This implementation uses real numbers for the computation throughout
104 (because it uses the *real-valued* power and exponential functions).
105 So to obtain a conventional strictly-discrete geometric distribution
106 you must ensure that an integer value is provided for the number of trials
107 (random variable) /k/,
108 and take integer values (floor or ceil functions) from functions that return
109 a number of successes.
111 [discrete_quantile_warning geometric]
113 [h4 Member Functions]
117 geometric_distribution(RealType p);
119 Constructor: /p/ or success_fraction is the probability of success of a single trial.
121 Requires: `0 <= p <= 1`.
125 RealType success_fraction() const; // successes / trials (0 <= p <= 1)
127 Returns the success_fraction parameter /p/ from which this distribution was constructed.
129 RealType successes() const; // required successes always one,
130 // included for compatibility with negative binomial distribution
131 // with successes r == 1.
135 The following functions are equivalent to those provided for the negative binomial,
136 with successes = 1, but are provided here for completeness.
138 The best method of calculation for the following functions is disputed:
139 see __binomial_distrib and __negative_binomial_distrib for more discussion.
141 [h5 Lower Bound on success_fraction Parameter ['p]]
143 static RealType find_lower_bound_on_p(
145 RealType probability) // (0 <= alpha <= 1), 0.05 equivalent to 95% confidence.
147 Returns a *lower bound* on the success fraction:
150 [[failures][The total number of failures before the 1st success.]]
151 [[alpha][The largest acceptable probability that the true value of
152 the success fraction is [*less than] the value returned.]]
155 For example, if you observe /k/ failures from /n/ trials
156 the best estimate for the success fraction is simply 1/['n], but if you
157 want to be 95% sure that the true value is [*greater than] some value,
160 p``[sub min]`` = geometric_distribution<RealType>::
161 find_lower_bound_on_p(failures, 0.05);
163 [link math_toolkit.stat_tut.weg.neg_binom_eg.neg_binom_conf See negative_binomial confidence interval example.]
165 This function uses the Clopper-Pearson method of computing the lower bound on the
166 success fraction, whilst many texts refer to this method as giving an "exact"
167 result in practice it produces an interval that guarantees ['at least] the
168 coverage required, and may produce pessimistic estimates for some combinations
169 of /failures/ and /successes/. See:
171 [@http://www.ucs.louisiana.edu/~kxk4695/Discrete_new.pdf
172 Yong Cai and K. Krishnamoorthy, A Simple Improved Inferential Method for Some Discrete Distributions.
173 Computational statistics and data analysis, 2005, vol. 48, no3, 605-621].
175 [h5 Upper Bound on success_fraction Parameter p]
177 static RealType find_upper_bound_on_p(
179 RealType alpha); // (0 <= alpha <= 1), 0.05 equivalent to 95% confidence.
181 Returns an *upper bound* on the success fraction:
184 [[trials][The total number of trials conducted.]]
185 [[alpha][The largest acceptable probability that the true value of
186 the success fraction is [*greater than] the value returned.]]
189 For example, if you observe /k/ successes from /n/ trials the
190 best estimate for the success fraction is simply ['k/n], but if you
191 want to be 95% sure that the true value is [*less than] some value,
194 p``[sub max]`` = geometric_distribution<RealType>::find_upper_bound_on_p(
197 [link math_toolkit.stat_tut.weg.neg_binom_eg.neg_binom_conf See negative binomial confidence interval example.]
199 This function uses the Clopper-Pearson method of computing the lower bound on the
200 success fraction, whilst many texts refer to this method as giving an "exact"
201 result in practice it produces an interval that guarantees ['at least] the
202 coverage required, and may produce pessimistic estimates for some combinations
203 of /failures/ and /successes/. See:
205 [@http://www.ucs.louisiana.edu/~kxk4695/Discrete_new.pdf
206 Yong Cai and K. Krishnamoorthy, A Simple Improved Inferential Method for Some Discrete Distributions.
207 Computational statistics and data analysis, 2005, vol. 48, no3, 605-621].
209 [h5 Estimating Number of Trials to Ensure at Least a Certain Number of Failures]
211 static RealType find_minimum_number_of_trials(
212 RealType k, // number of failures.
213 RealType p, // success fraction.
214 RealType alpha); // probability threshold (0.05 equivalent to 95%).
216 This functions estimates the number of trials required to achieve a certain
217 probability that [*more than ['k] failures will be observed].
220 [[k][The target number of failures to be observed.]]
221 [[p][The probability of ['success] for each trial.]]
222 [[alpha][The maximum acceptable ['risk] that only ['k] failures or fewer will be observed.]]
227 geometric_distribution<RealType>::find_minimum_number_of_trials(10, 0.5, 0.05);
229 Returns the smallest number of trials we must conduct to be 95% (1-0.05) sure
230 of seeing 10 failures that occur with frequency one half.
232 [link math_toolkit.stat_tut.weg.neg_binom_eg.neg_binom_size_eg Worked Example.]
234 This function uses numeric inversion of the geometric distribution
235 to obtain the result: another interpretation of the result is that it finds
236 the number of trials (failures) that will lead to an /alpha/ probability
237 of observing /k/ failures or fewer.
239 [h5 Estimating Number of Trials to Ensure a Maximum Number of Failures or Less]
241 static RealType find_maximum_number_of_trials(
242 RealType k, // number of failures.
243 RealType p, // success fraction.
244 RealType alpha); // probability threshold (0.05 equivalent to 95%).
246 This functions estimates the maximum number of trials we can conduct and achieve
247 a certain probability that [*k failures or fewer will be observed].
250 [[k][The maximum number of failures to be observed.]]
251 [[p][The probability of ['success] for each trial.]]
252 [[alpha][The maximum acceptable ['risk] that more than ['k] failures will be observed.]]
257 geometric_distribution<RealType>::find_maximum_number_of_trials(0, 1.0-1.0/1000000, 0.05);
259 Returns the largest number of trials we can conduct and still be 95% sure
260 of seeing no failures that occur with frequency one in one million.
262 This function uses numeric inversion of the geometric distribution
263 to obtain the result: another interpretation of the result, is that it finds
264 the number of trials that will lead to an /alpha/ probability
265 of observing more than k failures.
267 [h4 Non-member Accessors]
269 All the [link math_toolkit.dist_ref.nmp usual non-member accessor functions]
270 that are generic to all distributions are supported: __usual_accessors.
272 However it's worth taking a moment to define what these actually mean in
273 the context of this distribution:
275 [table Meaning of the non-member accessors.
276 [[Function][Meaning]]
278 [The probability of obtaining [*exactly k failures] from /k/ trials
279 with success fraction p. For example:
281 ``pdf(geometric(p), k)``]]
283 [The probability of obtaining [*k failures or fewer] from /k/ trials
284 with success fraction p and success on the last trial. For example:
286 ``cdf(geometric(p), k)``]]
288 [The probability of obtaining [*more than k failures] from /k/ trials
289 with success fraction p and success on the last trial. For example:
291 ``cdf(complement(geometric(p), k))``]]
293 [The [*greatest] number of failures /k/ expected to be observed from /k/ trials
294 with success fraction /p/, at probability /P/. Note that the value returned
295 is a real-number, and not an integer. Depending on the use case you may
296 want to take either the floor or ceiling of the real result. For example:
297 ``quantile(geometric(p), P)``]]
299 [The [*smallest] number of failures /k/ expected to be observed from /k/ trials
300 with success fraction /p/, at probability /P/. Note that the value returned
301 is a real-number, and not an integer. Depending on the use case you may
302 want to take either the floor or ceiling of the real result. For example:
303 ``quantile(complement(geometric(p), P))``]]
308 This distribution is implemented using the pow and exp functions, so most results
309 are accurate within a few epsilon for the RealType.
310 For extreme values of `double` /p/, for example 0.9999999999,
311 accuracy can fall significantly, for example to 10 decimal digits (from 16).
315 In the following table, /p/ is the probability that any one trial will
316 be successful (the success fraction), /k/ is the number of failures,
317 /p/ is the probability and /q = 1-p/,
318 /x/ is the given probability to estimate
319 the expected number of failures using the quantile.
322 [[Function][Implementation Notes]]
323 [[pdf][pdf = p * pow(q, k)]]
324 [[cdf][cdf = 1 - q[super k=1]]]
325 [[cdf complement][exp(log1p(-p) * (k+1))]]
326 [[quantile][k = log1p(-x) / log1p(-p) -1]]
327 [[quantile from the complement][k = log(x) / log1p(-p) -1]]
329 [[variance][(1-p)/p[sup2]]]
331 [[skewness][(2-p)/[sqrt]q]]
332 [[kurtosis][9+p[sup2]/q]]
333 [[kurtosis excess][6 +p[sup2]/q]]
334 [[parameter estimation member functions][See __negative_binomial_distrib]]
335 [[`find_lower_bound_on_p`][See __negative_binomial_distrib]]
336 [[`find_upper_bound_on_p`][See __negative_binomial_distrib]]
337 [[`find_minimum_number_of_trials`][See __negative_binomial_distrib]]
338 [[`find_maximum_number_of_trials`][See __negative_binomial_distrib]]
341 [endsect] [/section:geometric_dist geometric]
344 Copyright 2010 John Maddock and Paul A. Bristow.
345 Distributed under the Boost Software License, Version 1.0.
346 (See accompanying file LICENSE_1_0.txt or copy at
347 http://www.boost.org/LICENSE_1_0.txt).