3 <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
4 <title>Calculating confidence intervals on the mean with the Students-t distribution</title>
5 <link rel="stylesheet" href="../../../../math.css" type="text/css">
6 <meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
7 <link rel="home" href="../../../../index.html" title="Math Toolkit 2.11.0">
8 <link rel="up" href="../st_eg.html" title="Student's t Distribution Examples">
9 <link rel="prev" href="../st_eg.html" title="Student's t Distribution Examples">
10 <link rel="next" href="tut_mean_test.html" title='Testing a sample mean for difference from a "true" mean'>
12 <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
13 <table cellpadding="2" width="100%"><tr>
14 <td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../../../boost.png"></td>
15 <td align="center"><a href="../../../../../../../../index.html">Home</a></td>
16 <td align="center"><a href="../../../../../../../../libs/libraries.htm">Libraries</a></td>
17 <td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
18 <td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
19 <td align="center"><a href="../../../../../../../../more/index.htm">More</a></td>
22 <div class="spirit-nav">
23 <a accesskey="p" href="../st_eg.html"><img src="../../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../st_eg.html"><img src="../../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../../index.html"><img src="../../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="tut_mean_test.html"><img src="../../../../../../../../doc/src/images/next.png" alt="Next"></a>
26 <div class="titlepage"><div><div><h5 class="title">
27 <a name="math_toolkit.stat_tut.weg.st_eg.tut_mean_intervals"></a><a class="link" href="tut_mean_intervals.html" title="Calculating confidence intervals on the mean with the Students-t distribution">Calculating
28 confidence intervals on the mean with the Students-t distribution</a>
29 </h5></div></div></div>
If you have a sample mean, you may wish to know what confidence
32 intervals you can place on that mean. Colloquially: "I want an interval
33 that I can be P% sure contains the true mean". (On a technical point,
34 note that the interval either contains the true mean or it does not:
35 the meaning of the confidence level is subtly different from this colloquialism.
36 More background information can be found on the <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm" target="_top">NIST
40 The formula for the interval can be expressed as:
42 <div class="blockquote"><blockquote class="blockquote"><p>
43 <span class="inlinemediaobject"><img src="../../../../../equations/dist_tutorial4.svg"></span>
45 </p></blockquote></div>
where <span class="emphasis"><em>Y<sub>s</sub></em></span> is the sample mean, <span class="emphasis"><em>s</em></span>
is the sample standard deviation, <span class="emphasis"><em>N</em></span> is the sample
size, <span class="emphasis"><em>α</em></span> is the desired significance level and <span class="emphasis"><em>t<sub>(α/2,N-1)</sub></em></span>
50 is the upper critical value of the Students-t distribution with <span class="emphasis"><em>N-1</em></span>
53 <div class="note"><table border="0" summary="Note">
55 <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../../../../../../doc/src/images/note.png"></td>
56 <th align="left">Note</th>
58 <tr><td align="left" valign="top">
60 The quantity α is the maximum acceptable risk of falsely rejecting the
null hypothesis. The smaller the value of α, the greater the strength
The confidence level of the test is defined as 1 - α, and is often expressed
as a percentage. For example, a significance level of 0.05 is equivalent
to a 95% confidence level. Refer to <a href="http://www.itl.nist.gov/div898/handbook/prc/section1/prc14.htm" target="_top">"What
are confidence intervals?"</a> in the <a href="http://www.itl.nist.gov/div898/handbook/" target="_top">NIST/SEMATECH
e-Handbook of Statistical Methods</a> for more information.
73 <div class="note"><table border="0" summary="Note">
75 <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../../../../../../doc/src/images/note.png"></td>
76 <th align="left">Note</th>
78 <tr><td align="left" valign="top"><p>
79 The usual assumptions of <a href="http://en.wikipedia.org/wiki/Independent_and_identically-distributed_random_variables" target="_top">independent
80 and identically distributed (i.i.d.)</a> variables and <a href="http://en.wikipedia.org/wiki/Normal_distribution" target="_top">normal
81 distribution</a> of course apply here, as they do in other examples.
85 From the formula, it should be clear that:
87 <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
89 The width of the confidence interval decreases as the sample size
93 The width increases as the standard deviation increases.
96 The width increases as the <span class="emphasis"><em>confidence level increases</em></span>
97 (0.5 towards 0.99999 - stronger).
100 The width increases as the <span class="emphasis"><em>significance level decreases</em></span>
101 (0.5 towards 0.00000...01 - stronger).
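<p>
The points above can be read off by writing the half-width of the interval explicitly. The symbol <span class="emphasis"><em>w</em></span> is introduced here only for illustration (it does not appear in the formula image above); it corresponds to the quantity tabulated as "Interval Width" in the sample output later in this section:
</p>

```latex
w = t_{(\alpha/2,\,N-1)} \, \frac{s}{\sqrt{N}},
\qquad
\text{interval} = \left[\, Y_s - w,\; Y_s + w \,\right]
```

<p>
Read this way, <span class="emphasis"><em>w</em></span> is proportional to <span class="emphasis"><em>s</em></span>, shrinks roughly like 1/&#8730;<span class="emphasis"><em>N</em></span> (the critical value also falls slowly as <span class="emphasis"><em>N</em></span> grows), and grows as α decreases because the critical value increases.
</p>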
105 The following example code is taken from the example program <a href="../../../../../../example/students_t_single_sample.cpp" target="_top">students_t_single_sample.cpp</a>.
108 We'll begin by defining a procedure to calculate intervals for various
109 confidence levels; the procedure will print these out as a table:
<pre class="programlisting"><span class="comment">// Needed includes:</span>
<span class="preprocessor">#include</span> <span class="special">&lt;</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">math</span><span class="special">/</span><span class="identifier">distributions</span><span class="special">/</span><span class="identifier">students_t</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">&gt;</span>
<span class="preprocessor">#include</span> <span class="special">&lt;</span><span class="identifier">iostream</span><span class="special">&gt;</span>
<span class="preprocessor">#include</span> <span class="special">&lt;</span><span class="identifier">iomanip</span><span class="special">&gt;</span>
<span class="comment">// Bring everything into global namespace for ease of use:</span>
<span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">;</span>
<span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">std</span><span class="special">;</span>
<span class="keyword">void</span> <span class="identifier">confidence_limits_on_mean</span><span class="special">(</span>
   <span class="keyword">double</span> <span class="identifier">Sm</span><span class="special">,</span>           <span class="comment">// Sm = Sample Mean.</span>
   <span class="keyword">double</span> <span class="identifier">Sd</span><span class="special">,</span>           <span class="comment">// Sd = Sample Standard Deviation.</span>
   <span class="keyword">unsigned</span> <span class="identifier">Sn</span><span class="special">)</span>         <span class="comment">// Sn = Sample Size.</span>
<span class="special">{</span>
   <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">std</span><span class="special">;</span>
   <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">;</span>
   <span class="comment">// Print out general info:</span>
   <span class="identifier">cout</span> <span class="special">&lt;&lt;</span>
      <span class="string">"__________________________________\n"</span>
      <span class="string">"2-Sided Confidence Limits For Mean\n"</span>
      <span class="string">"__________________________________\n\n"</span><span class="special">;</span>
   <span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">setprecision</span><span class="special">(</span><span class="number">7</span><span class="special">);</span>
   <span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">40</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">left</span> <span class="special">&lt;&lt;</span> <span class="string">"Number of Observations"</span> <span class="special">&lt;&lt;</span> <span class="string">"= "</span> <span class="special">&lt;&lt;</span> <span class="identifier">Sn</span> <span class="special">&lt;&lt;</span> <span class="string">"\n"</span><span class="special">;</span>
   <span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">40</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">left</span> <span class="special">&lt;&lt;</span> <span class="string">"Mean"</span> <span class="special">&lt;&lt;</span> <span class="string">"= "</span> <span class="special">&lt;&lt;</span> <span class="identifier">Sm</span> <span class="special">&lt;&lt;</span> <span class="string">"\n"</span><span class="special">;</span>
   <span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">40</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">left</span> <span class="special">&lt;&lt;</span> <span class="string">"Standard Deviation"</span> <span class="special">&lt;&lt;</span> <span class="string">"= "</span> <span class="special">&lt;&lt;</span> <span class="identifier">Sd</span> <span class="special">&lt;&lt;</span> <span class="string">"\n"</span><span class="special">;</span>
138 We'll define a table of significance/risk levels for which we'll compute
141 <pre class="programlisting"><span class="keyword">double</span> <span class="identifier">alpha</span><span class="special">[]</span> <span class="special">=</span> <span class="special">{</span> <span class="number">0.5</span><span class="special">,</span> <span class="number">0.25</span><span class="special">,</span> <span class="number">0.1</span><span class="special">,</span> <span class="number">0.05</span><span class="special">,</span> <span class="number">0.01</span><span class="special">,</span> <span class="number">0.001</span><span class="special">,</span> <span class="number">0.0001</span><span class="special">,</span> <span class="number">0.00001</span> <span class="special">};</span>
Note that these are the complements of the confidence/probability levels:
0.5, 0.75, 0.9 ... 0.99999.
Next we'll declare the distribution object we'll need. Note that the
149 <span class="emphasis"><em>degrees of freedom</em></span> parameter is the sample size
152 <pre class="programlisting"><span class="identifier">students_t</span> <span class="identifier">dist</span><span class="special">(</span><span class="identifier">Sn</span> <span class="special">-</span> <span class="number">1</span><span class="special">);</span>
155 Most of what follows in the program is pretty printing, so let's focus
156 on the calculation of the interval. First we need the t-statistic, computed
157 using the <span class="emphasis"><em>quantile</em></span> function and our significance
158 level. Note that since the significance levels are the complement of
159 the probability, we have to wrap the arguments in a call to <span class="emphasis"><em>complement(...)</em></span>:
161 <pre class="programlisting"><span class="keyword">double</span> <span class="identifier">T</span> <span class="special">=</span> <span class="identifier">quantile</span><span class="special">(</span><span class="identifier">complement</span><span class="special">(</span><span class="identifier">dist</span><span class="special">,</span> <span class="identifier">alpha</span><span class="special">[</span><span class="identifier">i</span><span class="special">]</span> <span class="special">/</span> <span class="number">2</span><span class="special">));</span>
164 Note that alpha was divided by two, since we'll be calculating both the
upper and lower bounds: had we been interested in a single-sided interval
166 then we would have omitted this step.
169 Now to complete the picture, we'll get the (one-sided) width of the interval
170 from the t-statistic by multiplying by the standard deviation, and dividing
171 by the square root of the sample size:
173 <pre class="programlisting"><span class="keyword">double</span> <span class="identifier">w</span> <span class="special">=</span> <span class="identifier">T</span> <span class="special">*</span> <span class="identifier">Sd</span> <span class="special">/</span> <span class="identifier">sqrt</span><span class="special">(</span><span class="keyword">double</span><span class="special">(</span><span class="identifier">Sn</span><span class="special">));</span>
176 The two-sided interval is then the sample mean plus and minus this width.
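<p>
As a minimal sketch of this last step, the helper below (a hypothetical name, not taken from the example program) turns an already-computed critical value into the lower and upper limits. It uses only the standard library, so the critical value <code class="computeroutput">T</code> must be supplied by the caller:
</p>

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Hypothetical helper: returns (lower limit, upper limit) of the two-sided
// interval, given the sample mean Sm, sample standard deviation Sd,
// sample size Sn and a precomputed critical value T.
std::pair<double, double> mean_confidence_limits(
    double Sm, double Sd, unsigned Sn, double T)
{
    assert(Sn > 1 && Sd >= 0.0);
    // One-sided width of the interval, as in the text above.
    double w = T * Sd / std::sqrt(double(Sn));
    return std::make_pair(Sm - w, Sm + w);
}
```

<p>
For instance, taking T = 1.972, Sd = 0.02278881 and Sn = 195 from the 95% row of the first sample output below, and assuming a mean of 9.26146 (the midpoint of that row's tabulated limits), the helper gives limits that agree with the table to about five decimal places.
</p>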
Apart from some more pretty-printing, that completes the procedure.
182 Let's take a look at some sample output, first using the <a href="http://www.itl.nist.gov/div898/handbook/eda/section4/eda428.htm" target="_top">Heat
183 flow data</a> from the NIST site. The data set was collected by Bob
184 Zarr of NIST in January, 1990 from a heat flow meter calibration and
185 stability analysis. The corresponding dataplot output for this test can
186 be found in <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm" target="_top">section
187 3.5.2</a> of the <a href="http://www.itl.nist.gov/div898/handbook/" target="_top">NIST/SEMATECH
e-Handbook of Statistical Methods</a>.
190 <pre class="programlisting"> __________________________________
191 2-Sided Confidence Limits For Mean
192 __________________________________
194 Number of Observations = 195
196 Standard Deviation = 0.02278881
___________________________________________________________________
Confidence       T           Interval          Lower          Upper
 Value (%)     Value          Width            Limit          Limit
___________________________________________________________________
    50.000     0.676       1.103e-003        9.26036        9.26256
    75.000     1.154       1.883e-003        9.25958        9.26334
    90.000     1.653       2.697e-003        9.25876        9.26416
    95.000     1.972       3.219e-003        9.25824        9.26468
    99.000     2.601       4.245e-003        9.25721        9.26571
    99.900     3.341       5.453e-003        9.25601        9.26691
    99.990     3.973       6.484e-003        9.25498        9.26794
    99.999     4.537       7.404e-003        9.25406        9.26886
As you can see, the large sample size (195) and small standard deviation
(0.023) have combined to give very narrow intervals; indeed, we can be
very confident that the true mean is close to 9.26.
For comparison, the data for the next example is taken from <span class="emphasis"><em>P. K. Hou,
O. W. Lau &amp; M. C. Wong, Analyst (1983) vol. 108, p. 64, and from Statistics
for Analytical Chemistry, 3rd ed. (1994), pp. 54-55, J. C. Miller and J.
N. Miller, Ellis Horwood, ISBN 0 13 0309907.</em></span> The values result
222 from the determination of mercury by cold-vapour atomic absorption.
224 <pre class="programlisting"> __________________________________
225 2-Sided Confidence Limits For Mean
226 __________________________________
228 Number of Observations = 3
230 Standard Deviation = 0.9643650
___________________________________________________________________
Confidence       T           Interval          Lower          Upper
 Value (%)     Value          Width            Limit          Limit
___________________________________________________________________
    50.000     0.816            0.455       37.34539       38.25461
    75.000     1.604            0.893       36.90717       38.69283
    90.000     2.920            1.626       36.17422       39.42578
    95.000     4.303            2.396       35.40438       40.19562
    99.000     9.925            5.526       32.27408       43.32592
    99.900    31.599           17.594       20.20639       55.39361
    99.990    99.992           55.673      -17.87346       93.47346
    99.999   316.225          176.067     -138.26683      213.86683
This time, the fact that there are only three measurements leads to much
wider intervals; indeed, the intervals are so wide that it's hard to be very
confident in the location of the mean.
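<p>
As a quick check of the 95% row in this second table, the arithmetic can be worked by hand. The mean of 37.8 used here is inferred as the midpoint of the tabulated limits, and <span class="emphasis"><em>w</em></span> is an illustrative symbol for the tabulated interval width:
</p>

```latex
w = 4.303 \times \frac{0.9643650}{\sqrt{3}} \approx 2.396,
\qquad
37.8 \pm 2.396 \;\Rightarrow\; [\,35.404,\ 40.196\,]
```

<p>
With only two degrees of freedom the critical value (4.303) is more than twice the 1.972 seen for 195 observations, and the division by &#8730;3 does little to offset it.
</p>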
252 <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
253 <td align="left"></td>
254 <td align="right"><div class="copyright-footer">Copyright © 2006-2019 Nikhar
255 Agrawal, Anton Bikineev, Paul A. Bristow, Marco Guazzone, Christopher Kormanyos,
256 Hubert Holin, Bruno Lalande, John Maddock, Jeremy Murphy, Matthew Pulver, Johan
257 Råde, Gautam Sewani, Benjamin Sobotta, Nicholas Thompson, Thijs van den Berg,
258 Daryle Walker and Xiaogang Zhang<p>
259 Distributed under the Boost Software License, Version 1.0. (See accompanying
260 file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
265 <div class="spirit-nav">