3 <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
4 <title>Comparing the means of two samples with the Students-t test</title>
5 <link rel="stylesheet" href="../../../../math.css" type="text/css">
6 <meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
7 <link rel="home" href="../../../../index.html" title="Math Toolkit 2.11.0">
8 <link rel="up" href="../st_eg.html" title="Student's t Distribution Examples">
9 <link rel="prev" href="tut_mean_size.html" title="Estimating how large a sample size would have to become in order to give a significant Students-t test result with a single sample test">
10 <link rel="next" href="paired_st.html" title="Comparing two paired samples with the Student's t distribution">
12 <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
13 <table cellpadding="2" width="100%"><tr>
14 <td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../../../boost.png"></td>
15 <td align="center"><a href="../../../../../../../../index.html">Home</a></td>
16 <td align="center"><a href="../../../../../../../../libs/libraries.htm">Libraries</a></td>
17 <td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
18 <td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
19 <td align="center"><a href="../../../../../../../../more/index.htm">More</a></td>
22 <div class="spirit-nav">
23 <a accesskey="p" href="tut_mean_size.html"><img src="../../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../st_eg.html"><img src="../../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../../index.html"><img src="../../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="paired_st.html"><img src="../../../../../../../../doc/src/images/next.png" alt="Next"></a>
26 <div class="titlepage"><div><div><h5 class="title">
27 <a name="math_toolkit.stat_tut.weg.st_eg.two_sample_students_t"></a><a class="link" href="two_sample_students_t.html" title="Comparing the means of two samples with the Students-t test">Comparing
28 the means of two samples with the Students-t test</a>
29 </h5></div></div></div>
31 Imagine that we have two samples, and we wish to determine whether their
32 means are different or not. This situation often arises when determining
33 whether a new process or treatment is better than an old one.
36 In this example, we'll be using the <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda3531.htm" target="_top">Car
37 Mileage sample data</a> from the <a href="http://www.itl.nist.gov" target="_top">NIST
38 website</a>. The data compares miles per gallon of US cars with miles
39 per gallon of Japanese cars.
42 The sample code is in <a href="../../../../../../example/students_t_two_samples.cpp" target="_top">students_t_two_samples.cpp</a>.
45 There are two ways in which this test can be conducted: we can assume
46 that the true standard deviations of the two samples are equal or not.
47 If the standard deviations are assumed to be equal, then the calculation
48 of the t-statistic is greatly simplified, so we'll examine that case
first. In real life we should verify whether this assumption is valid
with a test for equal variances, such as an F-test.
We begin by defining a procedure that will conduct our test assuming
equal variances:
56 <pre class="programlisting"><span class="comment">// Needed headers:</span>
57 <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">math</span><span class="special">/</span><span class="identifier">distributions</span><span class="special">/</span><span class="identifier">students_t</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span>
58 <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span>
59 <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iomanip</span><span class="special">></span>
60 <span class="comment">// Simplify usage:</span>
61 <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">;</span>
62 <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">std</span><span class="special">;</span>
64 <span class="keyword">void</span> <span class="identifier">two_samples_t_test_equal_sd</span><span class="special">(</span>
65 <span class="keyword">double</span> <span class="identifier">Sm1</span><span class="special">,</span> <span class="comment">// Sm1 = Sample 1 Mean.</span>
66 <span class="keyword">double</span> <span class="identifier">Sd1</span><span class="special">,</span> <span class="comment">// Sd1 = Sample 1 Standard Deviation.</span>
67 <span class="keyword">unsigned</span> <span class="identifier">Sn1</span><span class="special">,</span> <span class="comment">// Sn1 = Sample 1 Size.</span>
68 <span class="keyword">double</span> <span class="identifier">Sm2</span><span class="special">,</span> <span class="comment">// Sm2 = Sample 2 Mean.</span>
69 <span class="keyword">double</span> <span class="identifier">Sd2</span><span class="special">,</span> <span class="comment">// Sd2 = Sample 2 Standard Deviation.</span>
70 <span class="keyword">unsigned</span> <span class="identifier">Sn2</span><span class="special">,</span> <span class="comment">// Sn2 = Sample 2 Size.</span>
71 <span class="keyword">double</span> <span class="identifier">alpha</span><span class="special">)</span> <span class="comment">// alpha = Significance Level.</span>
72 <span class="special">{</span>
Our procedure begins by calculating the t-statistic. Assuming equal
variances, the needed formulae are:
78 <div class="blockquote"><blockquote class="blockquote"><p>
79 <span class="inlinemediaobject"><img src="../../../../../equations/dist_tutorial1.svg"></span>
81 </p></blockquote></div>
83 where Sp is the "pooled" standard deviation of the two samples,
84 and <span class="emphasis"><em>v</em></span> is the number of degrees of freedom of the
85 two combined samples. We can now write the code to calculate the t-statistic:
87 <pre class="programlisting"><span class="comment">// Degrees of freedom:</span>
88 <span class="keyword">double</span> <span class="identifier">v</span> <span class="special">=</span> <span class="identifier">Sn1</span> <span class="special">+</span> <span class="identifier">Sn2</span> <span class="special">-</span> <span class="number">2</span><span class="special">;</span>
89 <span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">55</span><span class="special">)</span> <span class="special"><<</span> <span class="identifier">left</span> <span class="special"><<</span> <span class="string">"Degrees of Freedom"</span> <span class="special"><<</span> <span class="string">"= "</span> <span class="special"><<</span> <span class="identifier">v</span> <span class="special"><<</span> <span class="string">"\n"</span><span class="special">;</span>
<span class="comment">// Pooled standard deviation:</span>
91 <span class="keyword">double</span> <span class="identifier">sp</span> <span class="special">=</span> <span class="identifier">sqrt</span><span class="special">(((</span><span class="identifier">Sn1</span><span class="special">-</span><span class="number">1</span><span class="special">)</span> <span class="special">*</span> <span class="identifier">Sd1</span> <span class="special">*</span> <span class="identifier">Sd1</span> <span class="special">+</span> <span class="special">(</span><span class="identifier">Sn2</span><span class="special">-</span><span class="number">1</span><span class="special">)</span> <span class="special">*</span> <span class="identifier">Sd2</span> <span class="special">*</span> <span class="identifier">Sd2</span><span class="special">)</span> <span class="special">/</span> <span class="identifier">v</span><span class="special">);</span>
92 <span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">55</span><span class="special">)</span> <span class="special"><<</span> <span class="identifier">left</span> <span class="special"><<</span> <span class="string">"Pooled Standard Deviation"</span> <span class="special"><<</span> <span class="string">"= "</span> <span class="special"><<</span> <span class="identifier">sp</span> <span class="special"><<</span> <span class="string">"\n"</span><span class="special">;</span>
93 <span class="comment">// t-statistic:</span>
94 <span class="keyword">double</span> <span class="identifier">t_stat</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">Sm1</span> <span class="special">-</span> <span class="identifier">Sm2</span><span class="special">)</span> <span class="special">/</span> <span class="special">(</span><span class="identifier">sp</span> <span class="special">*</span> <span class="identifier">sqrt</span><span class="special">(</span><span class="number">1.0</span> <span class="special">/</span> <span class="identifier">Sn1</span> <span class="special">+</span> <span class="number">1.0</span> <span class="special">/</span> <span class="identifier">Sn2</span><span class="special">));</span>
95 <span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">55</span><span class="special">)</span> <span class="special"><<</span> <span class="identifier">left</span> <span class="special"><<</span> <span class="string">"T Statistic"</span> <span class="special"><<</span> <span class="string">"= "</span> <span class="special"><<</span> <span class="identifier">t_stat</span> <span class="special"><<</span> <span class="string">"\n"</span><span class="special">;</span>
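<p>
As a rough illustration, the same calculation can be written as a small stand-alone function that returns its results instead of printing them. The names <code>pooled_result</code> and <code>pooled_t</code> are invented for this sketch; they are not part of the sample program or of Boost.Math, and only the standard library is assumed.
</p>

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type for this sketch (not part of the sample program).
struct pooled_result
{
   double v;      // Degrees of freedom.
   double sp;     // Pooled standard deviation.
   double t_stat; // t-statistic.
};

// Equal-variance two-sample t-statistic computed from summary statistics.
pooled_result pooled_t(double Sm1, double Sd1, unsigned Sn1,
                       double Sm2, double Sd2, unsigned Sn2)
{
   assert(Sn1 > 1 && Sn2 > 1); // Each sample needs at least two observations.
   pooled_result r;
   // Degrees of freedom of the two combined samples:
   r.v = static_cast<double>(Sn1) + Sn2 - 2.0;
   // Pooled standard deviation (square root of the pooled variance):
   r.sp = std::sqrt(((Sn1 - 1) * Sd1 * Sd1 + (Sn2 - 1) * Sd2 * Sd2) / r.v);
   // Difference of the means divided by its estimated standard error:
   r.t_stat = (Sm1 - Sm2) / (r.sp * std::sqrt(1.0 / Sn1 + 1.0 / Sn2));
   return r;
}
```

<p>
Called with the rounded car mileage summary statistics (means 20.145 and 30.481, standard deviations 6.4147 and 6.1077, sample sizes 249 and 79), this should give 326 degrees of freedom and a pooled standard deviation of about 6.34.
</p>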
98 The next step is to define our distribution object, and calculate the
99 complement of the probability:
101 <pre class="programlisting"><span class="identifier">students_t</span> <span class="identifier">dist</span><span class="special">(</span><span class="identifier">v</span><span class="special">);</span>
102 <span class="keyword">double</span> <span class="identifier">q</span> <span class="special">=</span> <span class="identifier">cdf</span><span class="special">(</span><span class="identifier">complement</span><span class="special">(</span><span class="identifier">dist</span><span class="special">,</span> <span class="identifier">fabs</span><span class="special">(</span><span class="identifier">t_stat</span><span class="special">)));</span>
103 <span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">55</span><span class="special">)</span> <span class="special"><<</span> <span class="identifier">left</span> <span class="special"><<</span> <span class="string">"Probability that difference is due to chance"</span> <span class="special"><<</span> <span class="string">"= "</span>
104 <span class="special"><<</span> <span class="identifier">setprecision</span><span class="special">(</span><span class="number">3</span><span class="special">)</span> <span class="special"><<</span> <span class="identifier">scientific</span> <span class="special"><<</span> <span class="number">2</span> <span class="special">*</span> <span class="identifier">q</span> <span class="special"><<</span> <span class="string">"\n\n"</span><span class="special">;</span>
107 Here we've used the absolute value of the t-statistic, because we initially
108 want to know simply whether there is a difference or not (a two-sided
109 test). However, we can also test whether the mean of the second sample
110 is greater or is less (one-sided test) than that of the first: all the
111 possible tests are summed up in the following table:
113 <div class="informaltable"><table class="table">
134 The Null-hypothesis: there is <span class="bold"><strong>no difference</strong></span>
Reject if complement of CDF for |t| < significance level / 2:
144 <code class="computeroutput"><span class="identifier">cdf</span><span class="special">(</span><span class="identifier">complement</span><span class="special">(</span><span class="identifier">dist</span><span class="special">,</span>
145 <span class="identifier">fabs</span><span class="special">(</span><span class="identifier">t</span><span class="special">)))</span>
146 <span class="special"><</span> <span class="identifier">alpha</span>
147 <span class="special">/</span> <span class="number">2</span></code>
154 The Alternative-hypothesis: there is a <span class="bold"><strong>difference</strong></span>
Reject if complement of CDF for |t| > significance level / 2:
164 <code class="computeroutput"><span class="identifier">cdf</span><span class="special">(</span><span class="identifier">complement</span><span class="special">(</span><span class="identifier">dist</span><span class="special">,</span>
165 <span class="identifier">fabs</span><span class="special">(</span><span class="identifier">t</span><span class="special">)))</span>
166 <span class="special">></span> <span class="identifier">alpha</span>
167 <span class="special">/</span> <span class="number">2</span></code>
174 The Alternative-hypothesis: Sample 1 Mean is <span class="bold"><strong>less</strong></span>
180 Reject if CDF of t > significance level:
183 <code class="computeroutput"><span class="identifier">cdf</span><span class="special">(</span><span class="identifier">dist</span><span class="special">,</span>
184 <span class="identifier">t</span><span class="special">)</span>
185 <span class="special">></span> <span class="identifier">alpha</span></code>
192 The Alternative-hypothesis: Sample 1 Mean is <span class="bold"><strong>greater</strong></span>
198 Reject if complement of CDF of t > significance level:
201 <code class="computeroutput"><span class="identifier">cdf</span><span class="special">(</span><span class="identifier">complement</span><span class="special">(</span><span class="identifier">dist</span><span class="special">,</span>
202 <span class="identifier">t</span><span class="special">))</span>
203 <span class="special">></span> <span class="identifier">alpha</span></code>
209 <div class="note"><table border="0" summary="Note">
211 <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../../../../../../doc/src/images/note.png"></td>
212 <th align="left">Note</th>
214 <tr><td align="left" valign="top"><p>
215 For a two-sided test we must compare against alpha / 2 and not alpha.
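<p>
The rejection rules in the table can be collected into one function, sketched here under some assumptions. The tail probabilities are taken as inputs, on the assumption that they were obtained beforehand from <code>cdf(dist, t)</code> and <code>cdf(complement(dist, t))</code>. The type <code>t_test_decisions</code> and the function <code>decide</code> are hypothetical names used only for this illustration. The sketch also relies on the symmetry of the Student's t distribution, so that the complement of the CDF at |t| is the smaller of the two tails.
</p>

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical result type: true means the named hypothesis is rejected.
struct t_test_decisions
{
   bool reject_null;      // Null: there is no difference in means.
   bool reject_different; // Alternative: there is a difference in means.
   bool reject_less;      // Alternative: Sample 1 Mean < Sample 2 Mean.
   bool reject_greater;   // Alternative: Sample 1 Mean > Sample 2 Mean.
};

// p_lower is assumed to hold cdf(dist, t),
// p_upper is assumed to hold cdf(complement(dist, t)).
t_test_decisions decide(double p_lower, double p_upper, double alpha)
{
   assert(alpha > 0 && alpha < 1);
   // By symmetry, cdf(complement(dist, fabs(t))) is the smaller tail.
   const double p_abs = std::min(p_lower, p_upper);
   t_test_decisions d;
   d.reject_null      = p_abs < alpha / 2; // Two-sided: compare with alpha / 2.
   d.reject_different = p_abs > alpha / 2;
   d.reject_less      = p_lower > alpha;   // One-sided: compare with alpha.
   d.reject_greater   = p_upper > alpha;
   return d;
}
```

<p>
For a strongly negative t-statistic the lower tail is tiny and the upper tail is close to 1, so the sketch rejects the null hypothesis and the "greater" alternative, and keeps the "difference" and "less" alternatives.
</p>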
219 Most of the rest of the sample program is pretty-printing, so we'll skip
220 over that, and take a look at the sample output for alpha=0.05 (a 95%
221 probability level). For comparison the dataplot output for the same data
222 is in <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm" target="_top">section
223 1.3.5.3</a> of the <a href="http://www.itl.nist.gov/div898/handbook/" target="_top">NIST/SEMATECH
e-Handbook of Statistical Methods</a>.
226 <pre class="programlisting"> ________________________________________________
227 Student t test for two samples (equal variances)
228 ________________________________________________
230 Number of Observations (Sample 1) = 249
231 Sample 1 Mean = 20.145
232 Sample 1 Standard Deviation = 6.4147
233 Number of Observations (Sample 2) = 79
234 Sample 2 Mean = 30.481
235 Sample 2 Standard Deviation = 6.1077
236 Degrees of Freedom = 326
237 Pooled Standard Deviation = 6.3426
238 T Statistic = -12.621
239 Probability that difference is due to chance = 5.273e-030
241 Results for Alternative Hypothesis and alpha = 0.0500
243 Alternative Hypothesis Conclusion
244 Sample 1 Mean != Sample 2 Mean NOT REJECTED
245 Sample 1 Mean < Sample 2 Mean NOT REJECTED
246 Sample 1 Mean > Sample 2 Mean REJECTED
249 So with a probability that the difference is due to chance of just 5.273e-030,
250 we can safely conclude that there is indeed a difference.
253 The tests on the alternative hypothesis show that we must also reject
254 the hypothesis that Sample 1 Mean is greater than that for Sample 2:
in this case Sample 1 represents the miles per gallon for US cars,
and Sample 2 the miles per gallon for Japanese cars, so we conclude that Japanese
cars are on average more fuel efficient.
260 Now that we have the simple case out of the way, let's look for a moment
261 at the more complex one: that the standard deviations of the two samples
262 are not equal. In this case the formula for the t-statistic becomes:
264 <div class="blockquote"><blockquote class="blockquote"><p>
265 <span class="inlinemediaobject"><img src="../../../../../equations/dist_tutorial2.svg"></span>
267 </p></blockquote></div>
And for the combined degrees of freedom we use the <a href="http://en.wikipedia.org/wiki/Welch-Satterthwaite_equation" target="_top">Welch-Satterthwaite</a>
approximation:
272 <div class="blockquote"><blockquote class="blockquote"><p>
273 <span class="inlinemediaobject"><img src="../../../../../equations/dist_tutorial3.svg"></span>
275 </p></blockquote></div>
Note that this is one of the rare situations where the degrees-of-freedom
parameter to the Student's t distribution is a real number, and not an
integer.
281 <div class="note"><table border="0" summary="Note">
283 <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../../../../../../doc/src/images/note.png"></td>
284 <th align="left">Note</th>
286 <tr><td align="left" valign="top"><p>
287 Some statistical packages truncate the effective degrees of freedom
288 to an integer value: this may be necessary if you are relying on lookup
289 tables, but since our code fully supports non-integer degrees of freedom
290 there is no need to truncate in this case. Also note that when the
291 degrees of freedom is small then the Welch-Satterthwaite approximation
292 may be a significant source of error.
296 Putting these formulae into code we get:
298 <pre class="programlisting"><span class="comment">// Degrees of freedom:</span>
299 <span class="keyword">double</span> <span class="identifier">v</span> <span class="special">=</span> <span class="identifier">Sd1</span> <span class="special">*</span> <span class="identifier">Sd1</span> <span class="special">/</span> <span class="identifier">Sn1</span> <span class="special">+</span> <span class="identifier">Sd2</span> <span class="special">*</span> <span class="identifier">Sd2</span> <span class="special">/</span> <span class="identifier">Sn2</span><span class="special">;</span>
300 <span class="identifier">v</span> <span class="special">*=</span> <span class="identifier">v</span><span class="special">;</span>
301 <span class="keyword">double</span> <span class="identifier">t1</span> <span class="special">=</span> <span class="identifier">Sd1</span> <span class="special">*</span> <span class="identifier">Sd1</span> <span class="special">/</span> <span class="identifier">Sn1</span><span class="special">;</span>
302 <span class="identifier">t1</span> <span class="special">*=</span> <span class="identifier">t1</span><span class="special">;</span>
303 <span class="identifier">t1</span> <span class="special">/=</span> <span class="special">(</span><span class="identifier">Sn1</span> <span class="special">-</span> <span class="number">1</span><span class="special">);</span>
304 <span class="keyword">double</span> <span class="identifier">t2</span> <span class="special">=</span> <span class="identifier">Sd2</span> <span class="special">*</span> <span class="identifier">Sd2</span> <span class="special">/</span> <span class="identifier">Sn2</span><span class="special">;</span>
305 <span class="identifier">t2</span> <span class="special">*=</span> <span class="identifier">t2</span><span class="special">;</span>
306 <span class="identifier">t2</span> <span class="special">/=</span> <span class="special">(</span><span class="identifier">Sn2</span> <span class="special">-</span> <span class="number">1</span><span class="special">);</span>
307 <span class="identifier">v</span> <span class="special">/=</span> <span class="special">(</span><span class="identifier">t1</span> <span class="special">+</span> <span class="identifier">t2</span><span class="special">);</span>
308 <span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">55</span><span class="special">)</span> <span class="special"><<</span> <span class="identifier">left</span> <span class="special"><<</span> <span class="string">"Degrees of Freedom"</span> <span class="special"><<</span> <span class="string">"= "</span> <span class="special"><<</span> <span class="identifier">v</span> <span class="special"><<</span> <span class="string">"\n"</span><span class="special">;</span>
309 <span class="comment">// t-statistic:</span>
310 <span class="keyword">double</span> <span class="identifier">t_stat</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">Sm1</span> <span class="special">-</span> <span class="identifier">Sm2</span><span class="special">)</span> <span class="special">/</span> <span class="identifier">sqrt</span><span class="special">(</span><span class="identifier">Sd1</span> <span class="special">*</span> <span class="identifier">Sd1</span> <span class="special">/</span> <span class="identifier">Sn1</span> <span class="special">+</span> <span class="identifier">Sd2</span> <span class="special">*</span> <span class="identifier">Sd2</span> <span class="special">/</span> <span class="identifier">Sn2</span><span class="special">);</span>
311 <span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">55</span><span class="special">)</span> <span class="special"><<</span> <span class="identifier">left</span> <span class="special"><<</span> <span class="string">"T Statistic"</span> <span class="special"><<</span> <span class="string">"= "</span> <span class="special"><<</span> <span class="identifier">t_stat</span> <span class="special"><<</span> <span class="string">"\n"</span><span class="special">;</span>
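<p>
A more compact way to express the unequal-variance calculation is sketched below as a stand-alone function. The names <code>welch_result</code> and <code>welch_t</code> are made up for this illustration and do not appear in the sample program; the arithmetic is intended to match the step-by-step code above.
</p>

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type for this sketch (not part of the sample program).
struct welch_result
{
   double v;      // Effective (generally non-integer) degrees of freedom.
   double t_stat; // t-statistic.
};

// Unequal-variance two-sample t-statistic computed from summary statistics.
welch_result welch_t(double Sm1, double Sd1, unsigned Sn1,
                     double Sm2, double Sd2, unsigned Sn2)
{
   assert(Sn1 > 1 && Sn2 > 1); // Each sample needs at least two observations.
   const double a = Sd1 * Sd1 / Sn1; // Variance of the mean of sample 1.
   const double b = Sd2 * Sd2 / Sn2; // Variance of the mean of sample 2.
   welch_result r;
   // Welch-Satterthwaite approximation for the degrees of freedom:
   r.v = (a + b) * (a + b) / (a * a / (Sn1 - 1) + b * b / (Sn2 - 1));
   // Difference of the means divided by its estimated standard error:
   r.t_stat = (Sm1 - Sm2) / std::sqrt(a + b);
   return r;
}
```

<p>
The returned degrees of freedom can be passed directly to the <code>students_t</code> constructor without truncation.
</p>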
Thereafter the code and the tests proceed exactly as before. Using
our car mileage data again, here's what the output looks like:
317 <pre class="programlisting"> __________________________________________________
318 Student t test for two samples (unequal variances)
319 __________________________________________________
321 Number of Observations (Sample 1) = 249
322 Sample 1 Mean = 20.145
323 Sample 1 Standard Deviation = 6.4147
324 Number of Observations (Sample 2) = 79
325 Sample 2 Mean = 30.481
326 Sample 2 Standard Deviation = 6.1077
327 Degrees of Freedom = 136.87
328 T Statistic = -12.946
329 Probability that difference is due to chance = 1.571e-025
331 Results for Alternative Hypothesis and alpha = 0.0500
333 Alternative Hypothesis Conclusion
334 Sample 1 Mean != Sample 2 Mean NOT REJECTED
335 Sample 1 Mean < Sample 2 Mean NOT REJECTED
336 Sample 1 Mean > Sample 2 Mean REJECTED
339 This time allowing the variances in the two samples to differ has yielded
340 a higher likelihood that the observed difference is down to chance alone
341 (1.571e-025 compared to 5.273e-030 when equal variances were assumed).
342 However, the conclusion remains the same: US cars are less fuel efficient
343 than Japanese models.
346 <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
347 <td align="left"></td>
348 <td align="right"><div class="copyright-footer">Copyright © 2006-2019 Nikhar
349 Agrawal, Anton Bikineev, Paul A. Bristow, Marco Guazzone, Christopher Kormanyos,
350 Hubert Holin, Bruno Lalande, John Maddock, Jeremy Murphy, Matthew Pulver, Johan
351 Råde, Gautam Sewani, Benjamin Sobotta, Nicholas Thompson, Thijs van den Berg,
352 Daryle Walker and Xiaogang Zhang<p>
353 Distributed under the Boost Software License, Version 1.0. (See accompanying
354 file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
359 <div class="spirit-nav">
360 <a accesskey="p" href="tut_mean_size.html"><img src="../../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../st_eg.html"><img src="../../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../../index.html"><img src="../../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="paired_st.html"><img src="../../../../../../../../doc/src/images/next.png" alt="Next"></a>