HYPOTHESIS TESTING:
• 2 Sample t-Test
• One Way ANOVA

M. A. Sibley Consulting – All Rights Reserved

Hypothesis Testing
• How do you sift through variables to separate the "Vital Few" from the "Trivial Many"?
• There are many tools:
   – The Cause & Effect (Fishbone) Diagram
   – The Cause & Effect Matrix
   – Failure Modes and Effects Analysis (FMEA)
   – Prior Knowledge
   – Graphical Analysis
   – Hypothesis Testing

• Why would we want to use Hypothesis Testing?
• Hypothesis Testing takes a practical question like "Is the polymer viscosity higher with the new solvent?" and frames it in statistical terms.
• It quantifies the risk that your conclusion is wrong, so you can justify spending more time and money on experiments, scale-ups, etc.
• It reduces subjectivity in decision making.
• There are many forms of hypothesis test, which can answer questions about differences in means or variability, proportions of defects, counts of occurrences, etc.

Population → Sampling Scheme (select a representative sample) → Sample → Data → Conclusions about the Population

• Hypothesis tests are statements about Population Parameters based on Sample Statistics:

                          Population Parameter    Sample Statistic
   Mean                   μ                       x̄
   Standard Deviation     σ                       s
   Proportion             π                       p

Hypothesis Testing Roadmap
1. Select the X and Y for your test, e.g. Y = Breaking Strength; X = Additive Presence.
2. Find data or, better still, run experiments to obtain data.
3. Graph the data.
4. If the graphical analysis looks promising, continue.
5. Write competing hypotheses, e.g.:
   • Null Hypothesis (Ho): the additive does not affect breaking strength
   • Alternative Hypothesis (Ha): the additive does affect breaking strength
6. Select the appropriate statistical test, e.g. a 2 Sample t-Test.
7. Verify that the data are acceptable for the test, e.g. the data are roughly normally distributed at each level (with and without the additive).
8. Perform the analysis.
9. Draw conclusions, e.g. "We have statistical evidence that the presence of the additive increases Breaking Strength."

Statistical Significance:
• Ho is assumed; the burden of proof is on Ha.
• Look for strong evidence to reject Ho; only then do we accept Ha.
• Typically we test at a significance level of α = 0.05, often described as requiring "95% confidence" before rejecting Ho.
• Ha is sometimes called the "Research Claim".
• 2 outcomes:
   • Reject Ho and accept Ha: statistically significant
   • Fail to reject Ho: not statistically significant

• Consider a Canadian Court of Law, where Ho is "the accused is innocent":

In everyday terms:
                              Verdict: Innocent       Verdict: Guilty
   True state: Innocent       Correct Decision        Incorrect Decision
   True state: Guilty         Incorrect Decision      Correct Decision

In statistical terms:
                              Conclusion: Not Reject Ho    Conclusion: Reject Ho
   True state: Ho true        Correct Decision             Type 1 (α) Error
   True state: Ho false       Type 2 (β) Error             Correct Decision

Which error are we more tolerant of?

Even if there is a statistically significant difference, it may not be practically significant!

Risks:
• α Risk: conclude there is a difference when there is not.
   • We make changes and investments that are not needed.
• β Risk: conclude there is no difference when there is a difference.
   • Status quo; missed opportunity.
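To make the α and β risks concrete, here is a small Monte Carlo sketch. It is an illustration under stated assumptions, not part of the original method: it uses a two-sided two-sample z-test with a known σ (a simplification of the t-tests used later), normally distributed data, 10 samples per group, and a hypothetical true difference of 0.5σ. When Ho is true, every rejection is a Type 1 error; when Ho is false, every failure to reject is a Type 2 error.

```python
import random
from statistics import NormalDist, fmean

def z_test_rejects(a, b, sigma, alpha):
    """Two-sided two-sample z-test with KNOWN sigma (a simplifying assumption).
    Returns True if Ho (equal means) is rejected at significance level alpha."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

def rejection_rate(true_diff, n=10, sigma=1.0, alpha=0.05, trials=4000, seed=1):
    """Fraction of simulated experiments in which Ho is rejected."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(true_diff, sigma) for _ in range(n)]
        b = [rng.gauss(0.0, sigma) for _ in range(n)]
        rejections += z_test_rejects(a, b, sigma, alpha)
    return rejections / trials

# Ho true (no real difference): each rejection is a Type 1 error.
alpha_risk = rejection_rate(true_diff=0.0)
# Ho false (hypothetical true difference of 0.5 sigma): each non-rejection is a Type 2 error.
beta_risk = 1 - rejection_rate(true_diff=0.5)
# Same difference with 40 samples per group: more data should lower the beta risk.
beta_risk_n40 = 1 - rejection_rate(true_diff=0.5, n=40)

print(f"alpha risk ~ {alpha_risk:.3f}")
print(f"beta risk (n=10) ~ {beta_risk:.3f}")
print(f"beta risk (n=40) ~ {beta_risk_n40:.3f}")
```

Under these assumptions the estimated α risk should land near the chosen 0.05, while the β risk is large at n = 10 (theory for this z-test gives roughly 0.8) and drops to roughly 0.4 at n = 40, which is why sample size matters.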
• Each statistical test calculates a p-value (probability value): the probability of getting results at least as extreme as those observed if Ho were true.
• Decision Rule:
   • Reject Ho if the p-value is less than the significance level (α) you chose beforehand; Ha is supported.
   • If it is not less, we cannot reject Ho; Ha is not supported.
• Choosing α (also called Pcritical or the α risk):
   • Typically use 0.05
   • For a critical safety system you might choose 0.003
   • For a marketing decision you might choose 0.2
"If p is low the Null must go"

Major Hypothesis Tests
   Y Type               X Type                 # of X's   # Subgroups   Test
   Continuous           Discrete               1          1             1 Sample t-Test
   Continuous           Discrete               1          2             2 Sample t-Test
   Continuous           Discrete               1          2+            One Way ANOVA (ANOVA = Analysis of Variance)
   Continuous           Continuous/Discrete    2+         2+            ANOVA GLM (GLM = General Linear Model)
   Continuous           Continuous             1          n/a           (Linear) Regression
   Continuous           Continuous             2+         n/a           Multiple Regression
   Discrete             Discrete               1          2+            Analysis of Proportions
   Discrete or Binary   Continuous             1+         2+            Binary Logistic Regression
   Continuous           n/a                                             Test for Equal Variance
The 2 Sample t-Test and the One Way ANOVA are covered in this presentation.

Acceptable data:
• Representative
• No outliers, little auto-correlation (where each value is likely to be similar to the previous one), no severe skewing (for most tests), no distinct bi-modality; a roughly bell-shaped distribution
• Correctly measured and recorded
• Good measurement capability
   • Total Variance = Process Variance + Measurement Variance
   • If Measurement Variance is too high a proportion of the Total Variance, it can be very difficult to see the benefit of a process improvement. This topic is called "Measurement Systems Analysis" (MSA); the premier tool is Gauge R & R.
• Enough data for a sufficiently sensitive test
   • "Power & Sample Size" (Stat > Power and Sample Size)
   • More samples reduce the β risk, i.e. they increase the likelihood of detecting a difference if there really is one.

2 Sample t-Test:
• For a sample drawn from a normal distribution, the standardized sample mean t = (x̄ − μ) / (s/√n) follows a "Student t" distribution with n − 1 degrees of freedom. This is a bell-shaped distribution like the normal distribution, but with heavier tails; as the sample size increases, the t distribution approaches the normal distribution.
• This distribution lets us draw conclusions about the range in which the population mean is likely to be found.
• There are 2 types of 2 sample t-test:
   • 2 Sample (independent samples), e.g. before/after a process change; batches on Night Shift vs Day Shift
   • Paired, e.g. the same sample analyzed on instrument A and on instrument B; hair coverage of a group of individuals before and after using Rogaine. Paired data could be analyzed with a simple 2 sample t-test, but that test is then less "powerful".

• Open the worksheet: 2 Sample t Test.MTW
• Perform a 2 Sample t-Test (Stat > Basic Statistics > 2 Sample t…)
Output:

   Two-sample T for Viscosity

   vs New Air Dryer
   Installation        N    Mean   StDev   SE Mean
   After              20   63.04    1.76      0.39
   Before             23   62.02    1.93      0.40

   Difference = mu (After) - mu (Before)
   Estimate for difference:  1.028
   95% CI for difference:  (-0.109, 2.165)
   T-Test of difference = 0 (vs not =): T-Value = 1.83  P-Value = 0.075  DF = 40

Not statistically significant! End of story?

• We look at the data graphically (which we should have done beforehand!) and see an "outlier". The comment for this point says "Faulty sample valve operation; water in sample", so we are justified in removing this point from the analysis.
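The first Minitab output above can be approximately reproduced from its summary statistics alone. This sketch assumes the test is Welch's unequal-variance t-test; the DF values shown (40, and 38 after the outlier is removed, rather than the pooled-variance 41 and 40) are consistent with that. The two-sided p-value comes from the Student t distribution via the regularized incomplete beta function, implemented here with the standard continued-fraction method so only the Python standard library is needed. Because the means and standard deviations in the output are rounded, the results differ slightly from Minitab's, which uses the raw data.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),            # even term
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd term
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor close to 1: converged
            break
    return h

def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _betacf(a, b, x) / a
    # Symmetry I_x(a, b) = 1 - I_{1-x}(b, a), where the fraction converges faster.
    return 1.0 - math.exp(log_front) * _betacf(b, a, 1.0 - x) / b

def t_two_sided_p(t, df):
    """Two-sided p-value P(|T| >= |t|) for Student's t with df degrees of freedom."""
    return betainc_reg(df / 2.0, 0.5, df / (df + t * t))

def welch_t_test(n1, mean1, sd1, n2, mean2, sd2):
    """Welch two-sample t-test (unequal variances) from summary statistics.
    Returns (T-Value, DF, two-sided P-Value)."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2   # squared standard errors of the two means
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df_exact = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))  # Welch-Satterthwaite
    df = math.floor(df_exact)  # rounded down; this reproduces the DF values in the outputs
    return t, df, t_two_sided_p(t, df)

# Summary statistics from the first output: After (n=20) vs Before (n=23).
t, df, p = welch_t_test(20, 63.04, 1.76, 23, 62.02, 1.93)
print(f"T-Value = {t:.2f}  P-Value = {p:.3f}  DF = {df}")
```

Calling `welch_t_test(20, 63.04, 1.76, 22, 61.79, 1.64)` with the summary statistics after the outlier is removed should likewise give a T-Value near 2.37 and a p-value near 0.02, close to Minitab's 2.39 and 0.022.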
[Figure: Boxplot of Viscosity vs New Air Dryer Installation (After, Before), showing the outlier]

• Output after removal (the data point is replaced by an asterisk, Minitab's missing-data symbol):

   Two-sample T for Viscosity

   vs New Air Dryer
   Installation        N    Mean   StDev   SE Mean
   After              20   63.04    1.76      0.39
   Before             22   61.79    1.64      0.35

   Difference = mu (After) - mu (Before)
   Estimate for difference:  1.254
   95% CI for difference:  (0.191, 2.317)
   T-Test of difference = 0 (vs not =): T-Value = 2.39  P-Value = 0.022  DF = 38

[Figure: Boxplot of Viscosity vs New Air Dryer Installation (After, Before), after removing the outlier]

Statistically significant! Reject Ho.

• We notice that the graph shows "After" before "Before"! We can fix this:
   • Right-click on the column and change the order of its values so that "Before" is listed first. After we click OK, we right-click on the graph and select Update Graph Now, or we rerun the analysis that generated it.

One Way ANOVA:
ANOVA, or Analysis of Variance, is the hypothesis-test equivalent of the BoxPlot, i.e. it is used to judge whether a discrete X can explain variability in a Y.

Theory: Consider these 9 observations (Y), 3 per colour (X):

   Colour (X)   j    i = 1   i = 2   i = 3   Row Avg
   Red          1     11      10       9       10
   Blue         2     15      13      14       14
   Green        3     16      13      16       15
   Grand Avg                                   13

If the numbers within each row are close to their row average, but the row averages are quite different from each other, then Colour might be a significant factor in explaining the overall variability in Y. The ANOVA test uses this principle to calculate a p-value that you can use to judge whether the factor is significant in explaining the variability.
One Way ANOVA: Theory cont'd (Reference Slides)
Here are the formulae, for p groups (colours) with n_j observations x_{ij} in group j, where \bar{x}_j is the average of group j (the row average) and \bar{\bar{x}} is the grand average:

$$SS_{total} = \sum_{j=1}^{p} \sum_{i=1}^{n_j} \left(x_{ij} - \bar{\bar{x}}\right)^2$$
$$SS_{between} = \sum_{j=1}^{p} n_j \left(\bar{x}_j - \bar{\bar{x}}\right)^2$$
$$SS_{within} = \sum_{j=1}^{p} \sum_{i=1}^{n_j} \left(x_{ij} - \bar{x}_j\right)^2$$

• Here is the same table, but now showing *squared* differences from the grand average:

   Colour    j    i = 1   i = 2   i = 3
   Red       1      4       9      16
   Blue      2      4       0       1
   Green     3      9       0       9

The sum of these squared differences is $SS_{total} = 52$.

• In a similar fashion we calculate $SS_{between} = 42$ and $SS_{within} = 10$. Note that $SS_{total} = SS_{between} + SS_{within}$ (52 = 42 + 10).
• Degrees of freedom (df) is the number of values in the final calculation of a statistic that are free to vary.
   • For the total sum of squares, the n deviations from the grand average must sum to zero, so once n − 1 of them are known the last one is determined (it cannot vary); hence df = n − 1.
• Knowing this, we calculate the Mean Squares (MS):

$$MS = \frac{SS}{df} = s^2$$

   Source     SS    df           MS
   Between    42    3 − 1 = 2    21.0
   Within     10    9 − 3 = 6    1.67
   Total      52    9 − 1 = 8

• For normally distributed data, a sample variance scaled by the population variance, (n − 1)s²/σ², follows a Chi-Square distribution.
• The ratio of 2 independent Chi-Square variables, each divided by its degrees of freedom, follows the F distribution.
• Since Mean Squares are variances, for our ANOVA test we calculate the test statistic F:

$$F = \frac{MS_{between}}{MS_{within}} = \frac{21.0}{1.67} = 12.6$$

• Looking up 12.6 in the F distribution with 2 and 6 degrees of freedom gives an upper-tail probability of 0.007, which is the p-value for our ANOVA test.
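The whole calculation on these reference slides can be sketched in Python with only the standard library, reusing the 9 observations from the Colour table. The F-distribution tail probability is computed with the regularized incomplete beta function (continued-fraction method); that numerical route is a choice of this sketch and not necessarily how Minitab computes it.

```python
import math
from statistics import fmean

def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),            # even term
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd term
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _betacf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _betacf(b, a, 1.0 - x) / b

def f_upper_tail(f, df1, df2):
    """P(F >= f) for an F(df1, df2) variable, via I_{df2/(df2 + df1*f)}(df2/2, df1/2)."""
    return betainc_reg(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))

def one_way_anova(groups):
    """One-way ANOVA for a dict {level: [observations]}; returns the ANOVA table entries."""
    values = [x for obs in groups.values() for x in obs]
    grand = fmean(values)                                          # grand average
    means = {level: fmean(obs) for level, obs in groups.items()}   # row averages
    ss_total = sum((x - grand) ** 2 for x in values)
    ss_between = sum(len(obs) * (means[lvl] - grand) ** 2 for lvl, obs in groups.items())
    ss_within = sum((x - means[lvl]) ** 2 for lvl, obs in groups.items() for x in obs)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    ms_between, ms_within = ss_between / df_between, ss_within / df_within
    f = ms_between / ms_within
    return {"SS_between": ss_between, "SS_within": ss_within, "SS_total": ss_total,
            "df_between": df_between, "df_within": df_within,
            "MS_between": ms_between, "MS_within": ms_within,
            "F": f, "p": f_upper_tail(f, df_between, df_within)}

colour = {"Red": [11, 10, 9], "Blue": [15, 13, 14], "Green": [16, 13, 16]}
for name, value in one_way_anova(colour).items():
    print(f"{name:>10}: {value:.4g}")
```

For the Colour data this should reproduce the hand calculation: SS_between = 42, SS_within = 10, F = 12.6 and p ≈ 0.007.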
One Way ANOVA: Theory cont'd (Reference Slides)
• Here is Minitab's output for this data:

   One-way ANOVA: Value versus Colour

   Source   DF      SS      MS       F       P
   Colour    2   42.00   21.00   12.60   0.007
   Error     6   10.00    1.67
   Total     8   52.00

   S = 1.291   R-Sq = 80.77%   R-Sq(adj) = 74.36%

   Level   N     Mean   StDev
   Blue    3   14.000   1.000
   Green   3   15.000   1.732
   Red     3   10.000   1.000

   [Character plot: Individual 95% CIs for the mean of each level, based on pooled StDev; scale 10.0 to 17.5]

   Pooled StDev = 1.291

P is low, the Null must go. Statistically significant! Reject Ho; accept Ha.

• Minitab's graphical output for this data:
[Figure: Individual Value Plot of Value vs Colour (Red, Blue, Green)]

Exercise (5 minutes)
1. Open the dataset: HOMECAR.MTW
2. Create a BoxPlot of Consumption vs Month. (You will have to create the variable Month from the date.)
3. Perform a hypothesis test to see if there is a statistically significant difference in Fuel Consumption vs Month. Note: you will need to use the hypothesis-test equivalent of the BoxPlot, namely ANOVA (Analysis of Variance): Stat > ANOVA > One-Way
If time permits, repeat the exercise for Quarter (of the year).

• We create a new variable "Month" from Date.

• We perform a one way ANOVA (Stat > ANOVA > One-Way).
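Outside Minitab, deriving the Month factor (and the Quarter factor used later) from a date is a small calculation. A minimal sketch, assuming the dates are available as Python `datetime.date` values; the example dates are made up for illustration and are not taken from HOMECAR.MTW.

```python
from datetime import date

def month_of(d: date) -> int:
    """Calendar month, 1 (January) to 12 (December)."""
    return d.month

def quarter_of(d: date) -> int:
    """Calendar quarter: months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4."""
    return (d.month - 1) // 3 + 1

# Hypothetical fill-up dates, for illustration only.
fill_dates = [date(2024, 1, 15), date(2024, 5, 2), date(2024, 8, 30), date(2024, 12, 31)]
months = [month_of(d) for d in fill_dates]
quarters = [quarter_of(d) for d in fill_dates]
print(months)    # [1, 5, 8, 12]
print(quarters)  # [1, 2, 3, 4]
```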
Boxplot output from the ANOVA shows a suggestive pattern vs month, but it is still in the "grey area".
[Figure: Boxplot of Consumption(L/100Km) vs Month (1 to 12)]

ANOVA does not take the order of the categories into account, i.e. our eye sees a non-random seasonal pattern, but ANOVA gives the same output no matter how the categories are ordered.

We look for problems in the residuals.
[Figure: Residual Plots for Consumption(L/100Km): Normal Probability Plot, Versus Fits, Histogram, Versus Order]
• Normal Probability Plot and Histogram: the distribution looks to be roughly normal.
• Versus Fits: no obvious change in variance or curvature in the residuals.
• Versus Order: a random pattern vs order of observations (a good thing!).

The ANOVA table in the session window:

   One-way ANOVA: Consumption(L/100Km) versus Month

   Source    DF       SS      MS      F       P
   Month     11    8.480   0.771   1.38   0.190
   Error    126   70.391   0.559
   Total    137   78.871

   S = 0.7474   R-Sq = 10.75%   R-Sq(adj) = 2.96%

The p-value is not below 0.05, so we cannot reject Ho, i.e. we have no (or insufficient) evidence that there is a relationship between month and fuel consumption.
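The MS and F columns and the summary line under the ANOVA table (S, R-Sq, R-Sq(adj)) all follow from the SS and DF columns. A minimal sketch, assuming the usual one-way ANOVA definitions: S is the square root of the error mean square, R-Sq = SS_factor / SS_total, and R-Sq(adj) = 1 − MS_error / (SS_total / DF_total).

```python
import math

def anova_summary(ss_factor, df_factor, ss_error, df_error):
    """Derive MS, F, S, R-Sq and R-Sq(adj) from the SS and DF columns of a one-way ANOVA table."""
    ss_total = ss_factor + ss_error
    df_total = df_factor + df_error
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    return {
        "MS factor": ms_factor,
        "MS error": ms_error,
        "F": ms_factor / ms_error,
        "S": math.sqrt(ms_error),                                 # pooled standard deviation
        "R-Sq %": 100 * ss_factor / ss_total,                     # share of total variation explained
        "R-Sq(adj) %": 100 * (1 - ms_error / (ss_total / df_total)),  # penalized for df used
    }

# SS and DF values from the Month ANOVA table.
for name, value in anova_summary(8.480, 11, 70.391, 126).items():
    print(f"{name:>12}: {value:.4f}")
```

With the Quarter table's values, `anova_summary(6.595, 3, 72.275, 134)` gives F ≈ 4.08, R-Sq ≈ 8.36% and R-Sq(adj) ≈ 6.31%. The large gap between R-Sq and R-Sq(adj) for Month (10.75% vs 2.96%) reflects the penalty R-Sq(adj) applies for the 11 degrees of freedom the Month factor uses.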
   Level    N     Mean    StDev
   1        7   8.0276   0.6886
   2        7   7.7260   0.5718
   3        8   7.7382   0.8514
   4        9   7.5029   0.5035
   5        9   7.1770   0.5295
   6       11   7.2524   0.8357
   7       20   7.3601   0.9416
   8       21   7.2793   0.8750
   9       10   7.0942   0.5938
   10      13   7.5054   0.4882
   11      12   7.7771   0.4345
   12      11   7.6613   0.9480

   [Character plot: Individual 95% CIs for the mean of each month, based on pooled StDev; scale 7.00 to 8.50]

   Pooled StDev = 0.7474

The confidence intervals are overlapping.

• There was a pattern in the data that suggested a seasonal effect. If we use fewer categories, with more data in each category, we might see a statistically significant effect.
• We create a new variable "Quarter" in a similar fashion to how we created "Month".

[Figure: Boxplot of Consumption(L/100Km) vs Quarter (1 to 4)]
The graph is suggestive of a difference, but it is not completely clear-cut, so we judge by the ANOVA table in the session window:

   One-way ANOVA: Consumption(L/100Km) versus Quarter

   Source     DF       SS      MS      F       P
   Quarter     3    6.595   2.198   4.08   0.008
   Error     134   72.275   0.539
   Total     137   78.871

   S = 0.7344   R-Sq = 8.36%   R-Sq(adj) = 6.31%

The p-value is below 0.05, so we reject Ho and accept Ha, i.e. we have evidence that there is a relationship between quarter and fuel consumption. This source of variation explains about 8% of the variance in fuel consumption (R-Sq = 8.36%).
   Level    N     Mean    StDev
   1       22   7.8264   0.7002
   2       29   7.3067   0.6488
   3       51   7.2747   0.8462
   4       36   7.6436   0.6412

   [Character plot: Individual 95% CIs for the mean of each quarter, based on pooled StDev; scale 7.20 to 8.10]

The confidence intervals are not completely overlapping. The maximum seasonal difference in fuel consumption is about 0.55 L/100 km (Quarter 1 vs Quarter 3).
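The "not completely overlapping" reading can be checked numerically. This sketch assumes each interval is the level mean ± t × (pooled StDev)/√N, with t from the Student t distribution on the error degrees of freedom (134) and the pooled StDev equal to S = 0.7344 from the Quarter ANOVA table. The t critical value is approximated by an asymptotic expansion around the normal quantile (first two correction terms), which is very accurate at this many degrees of freedom; that approximation is a choice of the sketch, not Minitab's method.

```python
import math
from statistics import NormalDist

def t_critical(conf, df):
    """Approximate two-sided Student t critical value: normal quantile plus the
    first two terms of an asymptotic expansion in 1/df (accurate for large df)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    g1 = (z ** 3 + z) / 4
    g2 = (5 * z ** 5 + 16 * z ** 3 + 3 * z) / 96
    return z + g1 / df + g2 / df ** 2

# (N, Mean) per quarter from the level table; S and error DF from the Quarter ANOVA table.
quarters = {1: (22, 7.8264), 2: (29, 7.3067), 3: (51, 7.2747), 4: (36, 7.6436)}
pooled_sd, df_error = 0.7344, 134

t = t_critical(0.95, df_error)
intervals = {}
for q, (n, mean) in quarters.items():
    half_width = t * pooled_sd / math.sqrt(n)
    intervals[q] = (mean - half_width, mean + half_width)
    print(f"Quarter {q}: mean {mean:.4f}  95% CI ({intervals[q][0]:.3f}, {intervals[q][1]:.3f})")

means = [mean for _, mean in quarters.values()]
print(f"Max difference between quarter means: {max(means) - min(means):.4f} L/100 km")
# Overlap test: Q1's lower limit compared with Q3's upper limit.
print("Q1 and Q3 intervals overlap:", intervals[1][0] <= intervals[3][1])
```

Under these assumptions the Quarter 1 interval sits entirely above the Quarter 3 interval, while Quarter 1 and Quarter 2 overlap slightly, matching the visual impression of intervals that are "not completely overlapping".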
© Copyright 2025