libs/math/doc/html/math_toolkit/stat_tut/weg/st_eg/tut_mean_intervals.html

   1 <html>
   2 <head>
   3 <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
   4 <title>Calculating confidence intervals on the mean with the Students-t distribution</title>
   5 <link rel="stylesheet" href="../../../../math.css" type="text/css">
   6 <meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
   7 <link rel="home" href="../../../../index.html" title="Math Toolkit 2.11.0">
   8 <link rel="up" href="../st_eg.html" title="Student's t Distribution Examples">
   9 <link rel="prev" href="../st_eg.html" title="Student's t Distribution Examples">
  10 <link rel="next" href="tut_mean_test.html" title='Testing a sample mean for difference from a "true" mean'>
  11 </head>
  12 <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
  13 <table cellpadding="2" width="100%"><tr>
  14 <td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../../../boost.png"></td>
  15 <td align="center"><a href="../../../../../../../../index.html">Home</a></td>
  16 <td align="center"><a href="../../../../../../../../libs/libraries.htm">Libraries</a></td>
  17 <td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
  18 <td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
  19 <td align="center"><a href="../../../../../../../../more/index.htm">More</a></td>
  20 </tr></table>
  21 <hr>
  22 <div class="spirit-nav">
  23 <a accesskey="p" href="../st_eg.html"><img src="../../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../st_eg.html"><img src="../../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../../index.html"><img src="../../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="tut_mean_test.html"><img src="../../../../../../../../doc/src/images/next.png" alt="Next"></a>
  24 </div>
  25 <div class="section">
  26 <div class="titlepage"><div><div><h5 class="title">
  27 <a name="math_toolkit.stat_tut.weg.st_eg.tut_mean_intervals"></a><a class="link" href="tut_mean_intervals.html" title="Calculating confidence intervals on the mean with the Students-t distribution">Calculating
  28           confidence intervals on the mean with the Students-t distribution</a>
  29 </h5></div></div></div>
  30 <p>
  31             If you have a sample mean, you may wish to know what confidence
  32             intervals you can place on that mean. Colloquially: "I want an interval
  33             that I can be P% sure contains the true mean". (On a technical point,
  34             note that the interval either contains the true mean or it does not:
  35             the meaning of the confidence level is subtly different from this colloquialism.
  36             More background information can be found on the <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm" target="_top">NIST
  37             site</a>.)
  38           </p>
  39 <p>
  40             The formula for the interval can be expressed as:
  41           </p>
  42 <div class="blockquote"><blockquote class="blockquote"><p>
  43               <span class="inlinemediaobject"><img src="../../../../../equations/dist_tutorial4.svg" alt="Y_s &#177; t_(&#945;/2,N-1) &#183; s / &#8730;N"></span>
  44
  45             </p></blockquote></div>
  46 <p>
  47             Where, <span class="emphasis"><em>Y<sub>s</sub></em></span> is the sample mean, <span class="emphasis"><em>s</em></span>
  48             is the sample standard deviation, <span class="emphasis"><em>N</em></span> is the sample
  49             size, <span class="emphasis"><em>&#945;</em></span> is the desired significance level and <span class="emphasis"><em>t<sub>(&#945;/2,N-1)</sub></em></span>
  50             is the upper critical value of the Students-t distribution with <span class="emphasis"><em>N-1</em></span>
  51             degrees of freedom.
  52           </p>
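<p>
            The three inputs to the formula can be computed from raw observations.
            The following is a minimal sketch, assuming the data are held in a
            std::vector of double; the names <span class="emphasis"><em>sample_stats</em></span> and
            <span class="emphasis"><em>summarize</em></span> are hypothetical helpers introduced here for
            illustration and are not part of Boost.Math or of the example program.
            Note that the sample standard deviation divides by <span class="emphasis"><em>N-1</em></span>,
            the same quantity that appears as the degrees of freedom.
          </p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper type (not part of Boost.Math):
// the summary statistics that the interval formula needs.
struct sample_stats
{
   double mean;      // Sample mean, Y_s in the formula.
   double sd;        // Sample standard deviation, s in the formula.
   std::size_t n;    // Sample size, N in the formula.
};

sample_stats summarize(const std::vector<double>& data)
{
   // A standard deviation needs at least two observations.
   assert(data.size() > 1);

   double sum = 0;
   for (double x : data)
      sum += x;
   const double mean = sum / data.size();

   // Sum of squared deviations from the sample mean.
   double ss = 0;
   for (double x : data)
      ss += (x - mean) * (x - mean);

   // Divide by N - 1 (not N) to get the sample standard deviation.
   const double sd = std::sqrt(ss / (data.size() - 1));
   return { mean, sd, data.size() };
}
```

<p>
            The resulting mean, standard deviation and size correspond to the
            <span class="emphasis"><em>Sm</em></span>, <span class="emphasis"><em>Sd</em></span> and <span class="emphasis"><em>Sn</em></span>
            arguments of the procedure defined later in this section.
          </p>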
  53 <div class="note"><table border="0" summary="Note">
  54 <tr>
  55 <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../../../../../../doc/src/images/note.png"></td>
  56 <th align="left">Note</th>
  57 </tr>
  58 <tr><td align="left" valign="top">
  59 <p>
  60               The quantity &#945; is the maximum acceptable risk of falsely rejecting the
  61               null-hypothesis. The smaller the value of &#945; the greater the strength
  62               of the test.
  63             </p>
  64 <p>
  65               The confidence level of the test is defined as 1 - &#945;, and often expressed
  66               as a percentage. So, for example, a significance level of 0.05 is equivalent
  67               to a 95% confidence level. Refer to <a href="http://www.itl.nist.gov/div898/handbook/prc/section1/prc14.htm" target="_top">"What
  68               are confidence intervals?"</a> in the <a href="http://www.itl.nist.gov/div898/handbook/" target="_top">NIST/SEMATECH
  69               e-Handbook of Statistical Methods</a> for more information.
  70             </p>
  71 </td></tr>
  72 </table></div>
  73 <div class="note"><table border="0" summary="Note">
  74 <tr>
  75 <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../../../../../../doc/src/images/note.png"></td>
  76 <th align="left">Note</th>
  77 </tr>
  78 <tr><td align="left" valign="top"><p>
  79               The usual assumptions of <a href="http://en.wikipedia.org/wiki/Independent_and_identically-distributed_random_variables" target="_top">independent
  80               and identically distributed (i.i.d.)</a> variables and <a href="http://en.wikipedia.org/wiki/Normal_distribution" target="_top">normal
  81               distribution</a> of course apply here, as they do in other examples.
  82             </p></td></tr>
  83 </table></div>
  84 <p>
  85             From the formula, it should be clear that:
  86           </p>
  87 <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
  88 <li class="listitem">
  89                 The width of the confidence interval decreases as the sample size
  90                 increases.
  91               </li>
  92 <li class="listitem">
  93                 The width increases as the standard deviation increases.
  94               </li>
  95 <li class="listitem">
  96                 The width increases as the <span class="emphasis"><em>confidence level increases</em></span>
  97                 (0.5 towards 0.99999 - stronger).
  98               </li>
  99 <li class="listitem">
 100                 The width increases as the <span class="emphasis"><em>significance level decreases</em></span>
 101                 (0.5 towards 0.00000...01 - stronger).
 102               </li>
 103 </ul></div>
 104 <p>
 105             The following example code is taken from the example program <a href="../../../../../../example/students_t_single_sample.cpp" target="_top">students_t_single_sample.cpp</a>.
 106           </p>
 107 <p>
 108             We'll begin by defining a procedure to calculate intervals for various
 109             confidence levels; the procedure will print these out as a table:
 110           </p>
 111 <pre class="programlisting"><span class="comment">// Needed includes:</span>
 112 <span class="preprocessor">#include</span> <span class="special">&lt;</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">math</span><span class="special">/</span><span class="identifier">distributions</span><span class="special">/</span><span class="identifier">students_t</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">&gt;</span>
 113 <span class="preprocessor">#include</span> <span class="special">&lt;</span><span class="identifier">iostream</span><span class="special">&gt;</span>
 114 <span class="preprocessor">#include</span> <span class="special">&lt;</span><span class="identifier">iomanip</span><span class="special">&gt;</span>
 115 <span class="comment">// Bring everything into global namespace for ease of use:</span>
 116 <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">;</span>
 117 <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">std</span><span class="special">;</span>
 118
 119 <span class="keyword">void</span> <span class="identifier">confidence_limits_on_mean</span><span class="special">(</span>
 120    <span class="keyword">double</span> <span class="identifier">Sm</span><span class="special">,</span>           <span class="comment">// Sm = Sample Mean.</span>
 121    <span class="keyword">double</span> <span class="identifier">Sd</span><span class="special">,</span>           <span class="comment">// Sd = Sample Standard Deviation.</span>
 122    <span class="keyword">unsigned</span> <span class="identifier">Sn</span><span class="special">)</span>         <span class="comment">// Sn = Sample Size.</span>
 123 <span class="special">{</span>
 124    <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">std</span><span class="special">;</span>
 125    <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">;</span>
 126
 127    <span class="comment">// Print out general info:</span>
 128    <span class="identifier">cout</span> <span class="special">&lt;&lt;</span>
 129       <span class="string">"__________________________________\n"</span>
 130       <span class="string">"2-Sided Confidence Limits For Mean\n"</span>
 131       <span class="string">"__________________________________\n\n"</span><span class="special">;</span>
 132    <span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">setprecision</span><span class="special">(</span><span class="number">7</span><span class="special">);</span>
 133    <span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">40</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">left</span> <span class="special">&lt;&lt;</span> <span class="string">"Number of Observations"</span> <span class="special">&lt;&lt;</span> <span class="string">"=  "</span> <span class="special">&lt;&lt;</span> <span class="identifier">Sn</span> <span class="special">&lt;&lt;</span> <span class="string">"\n"</span><span class="special">;</span>
 134    <span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">40</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">left</span> <span class="special">&lt;&lt;</span> <span class="string">"Mean"</span> <span class="special">&lt;&lt;</span> <span class="string">"=  "</span> <span class="special">&lt;&lt;</span> <span class="identifier">Sm</span> <span class="special">&lt;&lt;</span> <span class="string">"\n"</span><span class="special">;</span>
 135    <span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">40</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">left</span> <span class="special">&lt;&lt;</span> <span class="string">"Standard Deviation"</span> <span class="special">&lt;&lt;</span> <span class="string">"=  "</span> <span class="special">&lt;&lt;</span> <span class="identifier">Sd</span> <span class="special">&lt;&lt;</span> <span class="string">"\n"</span><span class="special">;</span>
 136 </pre>
 137 <p>
 138             We'll define a table of significance/risk levels for which we'll compute
 139             intervals:
 140           </p>
 141 <pre class="programlisting"><span class="keyword">double</span> <span class="identifier">alpha</span><span class="special">[]</span> <span class="special">=</span> <span class="special">{</span> <span class="number">0.5</span><span class="special">,</span> <span class="number">0.25</span><span class="special">,</span> <span class="number">0.1</span><span class="special">,</span> <span class="number">0.05</span><span class="special">,</span> <span class="number">0.01</span><span class="special">,</span> <span class="number">0.001</span><span class="special">,</span> <span class="number">0.0001</span><span class="special">,</span> <span class="number">0.00001</span> <span class="special">};</span>
 142 </pre>
 143 <p>
 144             Note that these are the complements of the confidence/probability levels:
 145             0.5, 0.75, 0.9, ..., 0.99999.
 146           </p>
 147 <p>
 148             Next we'll declare the distribution object we'll need; note that the
 149             <span class="emphasis"><em>degrees of freedom</em></span> parameter is the sample size
 150             less one:
 151           </p>
 152 <pre class="programlisting"><span class="identifier">students_t</span> <span class="identifier">dist</span><span class="special">(</span><span class="identifier">Sn</span> <span class="special">-</span> <span class="number">1</span><span class="special">);</span>
 153 </pre>
 154 <p>
 155             Most of what follows in the program is pretty printing, so let's focus
 156             on the calculation of the interval. First we need the t-statistic, computed
 157             using the <span class="emphasis"><em>quantile</em></span> function and our significance
 158             level. Note that since the significance levels are the complement of
 159             the probability, we have to wrap the arguments in a call to <span class="emphasis"><em>complement(...)</em></span>:
 160           </p>
 161 <pre class="programlisting"><span class="keyword">double</span> <span class="identifier">T</span> <span class="special">=</span> <span class="identifier">quantile</span><span class="special">(</span><span class="identifier">complement</span><span class="special">(</span><span class="identifier">dist</span><span class="special">,</span> <span class="identifier">alpha</span><span class="special">[</span><span class="identifier">i</span><span class="special">]</span> <span class="special">/</span> <span class="number">2</span><span class="special">));</span>
 162 </pre>
 163 <p>
 164             Note that alpha was divided by two, since we'll be calculating both the
 165             upper and lower bounds: had we been interested in a single sided interval
 166             then we would have omitted this step.
 167           </p>
 168 <p>
 169             Now to complete the picture, we'll get the (one-sided) width of the interval
 170             from the t-statistic by multiplying by the standard deviation, and dividing
 171             by the square root of the sample size:
 172           </p>
 173 <pre class="programlisting"><span class="keyword">double</span> <span class="identifier">w</span> <span class="special">=</span> <span class="identifier">T</span> <span class="special">*</span> <span class="identifier">Sd</span> <span class="special">/</span> <span class="identifier">sqrt</span><span class="special">(</span><span class="keyword">double</span><span class="special">(</span><span class="identifier">Sn</span><span class="special">));</span>
 174 </pre>
 175 <p>
 176             The two-sided interval is then the sample mean plus and minus this width.
 177           </p>
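<p>
            As a sketch of this last step, the limits can be returned as a pair
            rather than printed. The <span class="emphasis"><em>interval</em></span> type and the
            <span class="emphasis"><em>two_sided_interval</em></span> function are hypothetical names for
            illustration; <span class="emphasis"><em>T</em></span> is assumed to be the critical value
            already obtained from the quantile call.
          </p>

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type: the two limits of a confidence interval.
struct interval
{
   double lower;
   double upper;
};

// Sm, Sd and Sn have the same meaning as in confidence_limits_on_mean;
// T is the critical value computed beforehand.
interval two_sided_interval(double Sm, double Sd, unsigned Sn, double T)
{
   assert(Sn > 1);
   // Distance from the sample mean to either limit.
   const double w = T * Sd / std::sqrt(double(Sn));
   return { Sm - w, Sm + w };
}
```

<p>
            For example, a mean of 10, a standard deviation of 2, a sample size of 4
            and a critical value of 3 give w = 3 * 2 / 2 = 3, so the interval runs
            from 7 to 13.
          </p>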
 178 <p>
 179             Apart from some more pretty-printing, that completes the procedure.
 180           </p>
 181 <p>
 182             Let's take a look at some sample output, first using the <a href="http://www.itl.nist.gov/div898/handbook/eda/section4/eda428.htm" target="_top">Heat
 183             flow data</a> from the NIST site. The data set was collected by Bob
 184             Zarr of NIST in January, 1990 from a heat flow meter calibration and
 185             stability analysis. The corresponding dataplot output for this test can
 186             be found in <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm" target="_top">section
 187             3.5.2</a> of the <a href="http://www.itl.nist.gov/div898/handbook/" target="_top">NIST/SEMATECH
 188             e-Handbook of Statistical Methods</a>.
 189           </p>
 190 <pre class="programlisting">   __________________________________
 191    2-Sided Confidence Limits For Mean
 192    __________________________________
 193
 194    Number of Observations                  =  195
 195    Mean                                    =  9.26146
 196    Standard Deviation                      =  0.02278881
 197
 198
 199    ___________________________________________________________________
 200    Confidence       T           Interval          Lower          Upper
 201     Value (%)     Value          Width            Limit          Limit
 202    ___________________________________________________________________
 203        50.000     0.676       1.103e-003        9.26036        9.26256
 204        75.000     1.154       1.883e-003        9.25958        9.26334
 205        90.000     1.653       2.697e-003        9.25876        9.26416
 206        95.000     1.972       3.219e-003        9.25824        9.26468
 207        99.000     2.601       4.245e-003        9.25721        9.26571
 208        99.900     3.341       5.453e-003        9.25601        9.26691
 209        99.990     3.973       6.484e-003        9.25498        9.26794
 210        99.999     4.537       7.404e-003        9.25406        9.26886
 211 </pre>
 212 <p>
 213             As you can see, the large sample size (195) and small standard deviation
 214             (0.023) have combined to give very small intervals: even at the 99.999%
 215             confidence level the limits are 9.25406 and 9.26886, so we can be very confident that the true mean is close to 9.26.
 216           </p>
 217 <p>
 218             For comparison the next example data output is taken from <span class="emphasis"><em>P.K.Hou,
 219             O. W. Lau &amp; M.C. Wong, Analyst (1983) vol. 108, p 64. and from Statistics
 220             for Analytical Chemistry, 3rd ed. (1994), pp 54-55 J. C. Miller and J.
 221             N. Miller, Ellis Horwood ISBN 0 13 0309907.</em></span> The values result
 222             from the determination of mercury by cold-vapour atomic absorption.
 223           </p>
 224 <pre class="programlisting">   __________________________________
 225    2-Sided Confidence Limits For Mean
 226    __________________________________
 227
 228    Number of Observations                  =  3
 229    Mean                                    =  37.8000000
 230    Standard Deviation                      =  0.9643650
 231
 232
 233    ___________________________________________________________________
 234    Confidence       T           Interval          Lower          Upper
 235     Value (%)     Value          Width            Limit          Limit
 236    ___________________________________________________________________
 237        50.000     0.816            0.455       37.34539       38.25461
 238        75.000     1.604            0.893       36.90717       38.69283
 239        90.000     2.920            1.626       36.17422       39.42578
 240        95.000     4.303            2.396       35.40438       40.19562
 241        99.000     9.925            5.526       32.27408       43.32592
 242        99.900    31.599           17.594       20.20639       55.39361
 243        99.990    99.992           55.673      -17.87346       93.47346
 244        99.999   316.225          176.067     -138.26683      213.86683
 245 </pre>
 246 <p>
 247             This time the fact that there are only three measurements leads to much
 248             wider intervals: indeed, the intervals are so large that it's hard to be very
 249             confident in the location of the mean.
 250           </p>
 251 </div>
 252 <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
 253 <td align="left"></td>
 254 <td align="right"><div class="copyright-footer">Copyright &#169; 2006-2019 Nikhar
 255       Agrawal, Anton Bikineev, Paul A. Bristow, Marco Guazzone, Christopher Kormanyos,
 256       Hubert Holin, Bruno Lalande, John Maddock, Jeremy Murphy, Matthew Pulver, Johan
 257       R&#229;de, Gautam Sewani, Benjamin Sobotta, Nicholas Thompson, Thijs van den Berg,
 258       Daryle Walker and Xiaogang Zhang<p>
 259         Distributed under the Boost Software License, Version 1.0. (See accompanying
 260         file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
 261       </p>
 262 </div></td>
 263 </tr></table>
 264 <hr>
 265 <div class="spirit-nav">
 266 <a accesskey="p" href="../st_eg.html"><img src="../../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../st_eg.html"><img src="../../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../../index.html"><img src="../../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="tut_mean_test.html"><img src="../../../../../../../../doc/src/images/next.png" alt="Next"></a>
 267 </div>
 268 </body>
 269 </html>