HYPOTHESIS TESTING:
• 2 Sample t-Test
• One Way ANOVA
M. A. Sibley Consulting – All Rights Reserved
HypothesisTest
Hypothesis Testing
• How do you sift through variables to
separate the “Vital Few” from the “Trivial
Many”?
• There are many tools:
– The Cause & Effect (Fishbone) Diagram
– The Cause & Effect Matrix
– Failure Modes and Effect Analysis (FMEA)
– Prior Knowledge
– Graphical Analysis
– Hypothesis Testing
Hypothesis Testing
• Why would we want to use Hypothesis
Testing?
• Hypothesis Testing takes a practical question like:
“Is the polymer viscosity higher with the new
solvent?” and frames it in statistical terms
• It quantifies the risk that your conclusion is right
or wrong so you can justify spending more time &
money on experiments, scale ups, etc.
• It reduces subjectivity in decision making
• There are many forms of hypothesis tests which
can answer questions related to differences in
means or variability, proportion of defects, counts
of occurrences, etc.
Hypothesis Testing cont’d
[Diagram: Population → Sampling Scheme (should select a representative sample) → Sample → Data → Conclusions about the Population]
Hypothesis Testing cont’d
• Hypothesis Tests are statements
about Population Parameters based
on Sample Statistics
                     Population Parameter   Sample Statistic
Mean                 μ                      x̄
Standard Deviation   σ                      s
Proportion           π                      p
Hypothesis Testing cont’d
Hypothesis Testing Roadmap
1. Select the X and Y for your test, e.g. Y = Breaking Strength; X = Additive Presence
2. Find, or better still, run experiments to obtain data.
3. Graph.
4. If graphical analysis looks promising, continue.
5. Write competing hypotheses, e.g.
• Null Hypothesis (Ho) = additive does not affect breaking strength
• Alternative Hypothesis (Ha) = additive does affect breaking strength
6. Select the appropriate statistical test, e.g. 2 Sample t-test.
7. Verify the data is acceptable for the test, e.g. data is normally distributed at each level (with or without additive).
8. Perform analysis.
9. Draw conclusions, e.g. We have statistical evidence that the presence of the Additive increases Breaking Strength.
Hypothesis Testing cont’d
• Statistical Significance:
• Ho is assumed; the burden of proof is on Ha.
• Look for strong evidence to reject Ho. Then
we accept Ha.
• Typically we look for 95% confidence, i.e. we accept no more than a 5% risk (α = 0.05) of rejecting Ho when it is actually true.
• Ha is sometimes called the “Research Claim”.
• 2 outcomes:
• Reject Ho and accept Ha. → statistically significant
• Fail to reject Ho. → not statistically significant
Hypothesis Testing cont’d
• Consider a Canadian Court of Law:
In everyday terms:
                          Conclusion (Verdict)
True State     Innocent              Guilty
Innocent       Correct Decision      Incorrect Decision
Guilty         Incorrect Decision    Correct Decision

In statistical terms:
                          Conclusion (Verdict)
True State     Not Reject Ho         Reject Ho
Ho true        Correct Decision      Type 1 (α) Error
Ho false       Type 2 (β) Error      Correct Decision

Which error are we more tolerant of?

Even if there is a statistically significant difference, it may not be practically significant!
Hypothesis Testing cont’d
Risks:
• α Risk: conclude there is a difference
when there is not
• Make changes, investments that are not
needed
• β Risk: Conclude there is no
difference when there is a difference.
• Status quo; missed opportunity
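The α risk can be made concrete with a small simulation. The sketch below is illustrative only: it draws two samples from the same population (so Ho is true by construction) many times and counts how often a test still declares a difference. For simplicity it assumes a known standard deviation and uses a z-test rather than the t-tests used elsewhere in this presentation. The function name, sample size and number of trials are arbitrary choices.

```python
import random
from statistics import NormalDist, mean

def false_alarm_rate(alpha=0.05, n=20, sigma=1.0, trials=4000, seed=1):
    """Fraction of simulated experiments that reject Ho even though Ho is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    se = sigma * (2 / n) ** 0.5                   # std. error of the difference in means
    rejections = 0
    for _ in range(trials):
        # Both samples come from the same population, so any "difference" is noise.
        a = [rng.gauss(0.0, sigma) for _ in range(n)]
        b = [rng.gauss(0.0, sigma) for _ in range(n)]
        z = (mean(a) - mean(b)) / se
        if abs(z) > z_crit:
            rejections += 1                       # a Type 1 (alpha) error
    return rejections / trials

rate = false_alarm_rate()
print(f"Observed alpha risk: {rate:.3f}")
```

The observed rate should land close to the chosen α. Lowering α reduces false alarms but, for the same amount of data, increases the β risk.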
Hypothesis Testing cont’d
• All statistical tests calculate a p-value
(Probability-value)
• Decision Rule:
• Reject Ho if the p-value is less than the “critical”
value you choose beforehand. Ha is supported.
• If not less, cannot reject Ho; Ha is not supported.
• Critical values (Pcritical or α risk):
• Typically use 0.05
• For a critical safety system might choose 0.003
• For a marketing decision might choose 0.2
“If p is low the Null must go”
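The decision rule is simple enough to write down as code. This is a minimal sketch; the function name `decide` and the p-value of 0.075 are illustrative choices, not Minitab output.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject Ho only when p is below the chosen alpha."""
    if p_value < alpha:
        return "Reject Ho (statistically significant)"
    return "Fail to reject Ho (not statistically significant)"

# The same p-value can lead to different decisions depending on the
# alpha risk that was chosen before the analysis.
for alpha in (0.003, 0.05, 0.2):
    print(f"alpha = {alpha}: {decide(0.075, alpha)}")
```

This is why the critical value has to be chosen before looking at the result: picking it afterwards lets the conclusion follow the data rather than the risk you are prepared to accept.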
Hypothesis Testing cont’d
Major Hypothesis Tests
Y Type               X Type                 # of X’s   # Subgroups   Test
Continuous           Discrete               1          1             1 Sample t-Test
Continuous           Discrete               1          2             2 Sample t-Test
Continuous           Discrete               1          2+            1 way ANOVA (ANOVA = Analysis of Variance)
Continuous           Continuous/Discrete    2+         2+            ANOVA GLM (GLM = General Linear Model)
Continuous           Continuous             1          n/a           (Linear) Regression
Continuous           Continuous             2+         n/a           Multiple Regression
Discrete             Discrete               1          2+            Analysis of Proportions (π)
Discrete or Binary   Continuous             1+         2+            Binary Logistic Regression (π)
Continuous           n/a                                             Test for Equal Variance

Highlighted tests (2 Sample t-Test and 1 way ANOVA) will be shown in this presentation.
Hypothesis Testing cont’d
• Acceptable data:
• Representative
• No outliers, little auto-correlation (which is where next value is likely
to be similar to the previous value), no severe skewing (for most
tests), no distinct bi-modality, roughly bell shaped distribution
• Correctly measured, recorded
• Good measurement capability
• Total Variance = Process Variance + Measurement Variance
• If Measurement Variance is too high a proportion of the total
variance, then seeing the benefit of a process improvement can
be very difficult. This topic is called “Measurement Systems
Analysis” (MSA). The premier tool is Gauge R & R.
• Enough data to have a sensitive enough test
• “Power & Sample Size” (Stat > Power and Sample Size)
• More samples reduce the β risk, i.e. increase the likelihood of
seeing a difference if there really is one.
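To get a feel for how sample size, α risk and β risk trade off, here is a rough sketch using the textbook normal approximation for comparing two means. It is not what Minitab's Power and Sample Size tool computes internally (that is based on the t distribution, so its answers will be slightly larger). The standard deviation of 1.8 and the differences tried are assumed values for illustration.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate samples per group to detect a true difference `delta`
    between two means (two-sided test, equal group sizes, normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # controls the alpha risk
    z_beta = z.inv_cdf(power)           # power = 1 - beta risk
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Smaller differences need far more data to detect.
for delta in (0.5, 1.0, 2.0):
    print(f"difference = {delta}: about {n_per_group(delta, sigma=1.8)} per group")
```

Halving the difference you want to detect roughly quadruples the required sample size, which is why it pays to decide beforehand what size of difference is practically significant.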
Hypothesis Testing cont’d
2 Sample t-Test:
• The mean of a sample drawn from a normal distribution, standardized using
the sample standard deviation, follows a “Student-t” distribution, which is a
bell shaped distribution like the normal distribution but with heavier tails. As the
number of samples increases the corresponding t-distribution becomes closer to normal.
• This distribution lets us draw conclusions about the range in
which the population mean is likely to be found.
• There are 2 types of 2 sample t-tests:
• 2 Sample – e.g. Before / After a process change, Batches on
Night Shift vs Day Shift
• Paired – e.g. Same sample analyzed on instrument A vs
instrument B; hair coverage before and after use of Rogaine
by a group of individuals. This data could be analyzed by a
simple 2 sample t-test but the test is then less “powerful”.
Hypothesis Testing cont’d
• Open the worksheet: 2 Sample t Test.MTW
• Perform a 2 Sample t-Test (Stat > Basic
Statistics > 2 Sample t…)
Output:
Two-sample T for Viscosity vs New Air Dryer Installation

          N   Mean  StDev  SE Mean
After    20  63.04   1.76     0.39
Before   23  62.02   1.93     0.40

Difference = mu (After) - mu (Before)
Estimate for difference: 1.028
95% CI for difference: (-0.109, 2.165)
T-Test of difference = 0 (vs not =): T-Value = 1.83  P-Value = 0.075  DF = 40

Not statistically significant! End of story?
Hypothesis Testing cont’d
• We look at the data graphically (which we should
have done beforehand!)
We see an “outlier”. The
comment for this point says
“Faulty sample valve operation;
water in sample” so we are
justified in removing this point
from the analysis.
[Figure: Boxplot of Viscosity vs New Air Dryer Installation (After, Before)]
Hypothesis Testing cont’d
• Output after removal (replacement of data point by an asterisk, Minitab’s missing data symbol):

Two-sample T for Viscosity vs New Air Dryer Installation

          N   Mean  StDev  SE Mean
After    20  63.04   1.76     0.39
Before   22  61.79   1.64     0.35

Difference = mu (After) - mu (Before)
Estimate for difference: 1.254
95% CI for difference: (0.191, 2.317)
T-Test of difference = 0 (vs not =): T-Value = 2.39  P-Value = 0.022  DF = 38

Statistically significant! Reject H0

[Figure: Boxplot of Viscosity vs New Air Dryer Installation (After, Before), outlier removed]
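The T-Value and DF in the two outputs can be checked by hand from the N, Mean and StDev columns. The sketch below assumes the unequal-variance (Welch) form of the 2 sample t-test, which is consistent with the reported DF of 40 and 38 once truncated to a whole number. Because it starts from the rounded means in the tables, the t values come out slightly below Minitab's 1.83 and 2.39. The function name is an illustrative choice.

```python
from math import sqrt, floor

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic, Welch degrees of freedom and standard error of the
    difference, computed from summary statistics only."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2   # squared standard error of each mean
    se = sqrt(v1 + v2)                   # standard error of the difference
    t = (mean1 - mean2) / se
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df, se

# After vs Before using all 23 "Before" points
t_all, df_all, se_all = welch_t(63.04, 1.76, 20, 62.02, 1.93, 23)
# After vs Before with the faulty-sample point removed
t_clean, df_clean, se_clean = welch_t(63.04, 1.76, 20, 61.79, 1.64, 22)

print(f"All data:        t = {t_all:.2f}, df = {floor(df_all)}")
print(f"Outlier removed: t = {t_clean:.2f}, df = {floor(df_clean)}")
```

Removing the one bad point both lowers the "Before" mean and shrinks its standard deviation, and both changes push the t statistic up. That is how a single outlier can hide a real difference.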
Hypothesis Testing cont’d
• We notice that the graph shows “After”
before “Before”! We can fix this:
• Right Click on the column:
After we click on
OK, we right click
on the graph and
select Update
Graph Now or we
run the analysis
which generated
it again.
Hypothesis Testing cont’d
One Way ANOVA:
ANOVA or Analysis of Variance is the hypothesis test equivalent of the
BoxPlot i.e. it is used to judge if a discrete X can explain variability in a Y.
Theory: Consider these 9 observations (Y) (3 per colour)
Colour (X)   j   i = 1   i = 2   i = 3   Row Avg ($\bar{x}_j$)
Red          1      11      10       9   10
Blue         2      15      13      14   14
Green        3      16      13      16   15
Grand Avg ($\bar{\bar{x}}$): 13
We can see that if the numbers within a row are similar to the row
average but the row averages are quite different from each other, then
Colour might be a significant factor in explaining overall variability in the
Y. The ANOVA test uses this principle to calculate the p-Value that you
can use to judge if the factor is significant in explaining the variability.
Hypothesis Testing cont’d
Reference Slide
One Way ANOVA: Theory cont’d:
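Here are the formulae:
• $SS_{total} = \sum_{j=1}^{p} \sum_{i=1}^{n_j} (x_{ij} - \bar{\bar{x}})^2$
• $SS_{between} = \sum_{j=1}^{p} n_j (\bar{x}_j - \bar{\bar{x}})^2$
• $SS_{within} = \sum_{j=1}^{p} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$
where $x_{ij}$ is observation i of colour j, $\bar{x}_j$ is the row average, $\bar{\bar{x}}$ is the grand average, p is the number of colours and $n_j$ is the number of observations for colour j.
• Here is the same table but now showing *Squared* Differences from the Grand Average:
Colour   j   i = 1   i = 2   i = 3
Red      1       4       9      16
Blue     2       4       0       1
Green    3       9       0       9
The sum of these squared differences: $SS_{total}$ = 52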
Hypothesis Testing cont’d
Reference Slide
One Way ANOVA: Theory cont’d:
• In a similar fashion we calculate $SS_{between}$ and $SS_{within}$.
• Degrees of freedom (df) is the number of values in the final calculation of a statistic that are free to vary.
• So for our total sum of squares, if we know the sum then once we know n-1 of the values that make up the sum, the last value is known (i.e. can’t vary), so df = n-1.
• Knowing this we calculate Mean Squares (MS): $MS = SS / df = s^2$

Source    SS   df      MS
Between   42   3-1=2   21.0
Within    10   9-3=6   1.67
Total     52   9-1=8
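The sums of squares, degrees of freedom and mean squares for the nine colour observations can be reproduced in a few lines. This is a plain-Python sketch of the formulae, not Minitab's implementation. The data values are the nine observations from the colour example (Red 11, 10, 9; Blue 15, 13, 14; Green 16, 13, 16), and the variable names are illustrative.

```python
data = {
    "Red":   [11, 10, 9],
    "Blue":  [15, 13, 14],
    "Green": [16, 13, 16],
}

all_values = [x for group in data.values() for x in group]
grand_avg = sum(all_values) / len(all_values)

# Total: every observation against the grand average
ss_total = sum((x - grand_avg) ** 2 for x in all_values)
# Between: each row average against the grand average, weighted by row size
ss_between = sum(len(g) * (sum(g) / len(g) - grand_avg) ** 2 for g in data.values())
# Within: every observation against its own row average
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in data.values() for x in g)

df_between = len(data) - 1                # p - 1
df_within = len(all_values) - len(data)   # n - p
ms_between = ss_between / df_between
ms_within = ss_within / df_within

print(f"SS total = {ss_total}, between = {ss_between}, within = {ss_within}")
print(f"MS between = {ms_between:.2f}, MS within = {ms_within:.2f}")
```

Note that the between and within sums of squares add up to the total, so any two of the three determine the third.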
Hypothesis Testing cont’d
Reference Slide
One Way ANOVA: Theory cont’d:
• Sample variances of normally distributed data follow a (scaled) distribution known as Chi-Square
• The ratio of 2 scaled Chi-Squared variables follows the F Distribution
• For our ANOVA test since Mean Squares are variances we calculate our test statistic F:
$F = \frac{MS_{between}}{MS_{within}} = \frac{21.0}{1.67} = 12.6$
• Looking up 12.6 in the F distribution with the
right degrees of freedom gives us a value of
0.007 which is the P-Value for our ANOVA test.
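For this particular example the "look up" can be done by hand, because the F distribution has a simple closed-form upper tail when the numerator degrees of freedom equal 2. This shortcut is a side note, not the general method: for other numerator degrees of freedom you need tables or software.

```latex
% Upper-tail probability of an F distribution with 2 numerator df
% and \nu denominator df (closed form valid only for numerator df = 2)
P(F_{2,\nu} > f) = \left(1 + \frac{2f}{\nu}\right)^{-\nu/2}

% With f = 12.6 and \nu = 6:
p = \left(1 + \frac{2 \times 12.6}{6}\right)^{-3} = 5.2^{-3} = \frac{1}{140.608} \approx 0.0071
```

This agrees with the P-Value of 0.007 after rounding.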
Hypothesis Testing cont’d
Reference Slide
One Way ANOVA: Theory cont’d:
• Here is Minitab’s output for this data:

One-way ANOVA: Value versus Colour

Source  DF     SS     MS      F      P
Colour   2  42.00  21.00  12.60  0.007
Error    6  10.00   1.67
Total    8  52.00

S = 1.291   R-Sq = 80.77%   R-Sq(adj) = 74.36%

P is low, the null must go.
Statistically significant!
Reject H0; accept Ha

Level  N    Mean  StDev
Blue   3  14.000  1.000
Green  3  15.000  1.732
Red    3  10.000  1.000

[Individual 95% CIs For Mean Based on Pooled StDev; axis 10.0 to 17.5]

Pooled StDev = 1.291
Hypothesis Testing cont’d
Reference Slide
One Way ANOVA: Theory cont’d:
• Here is Minitab’s output for this data:
[Figure: Individual Value Plot of Value vs Colour (Red, Blue, Green)]
Hypothesis Testing cont’d
Exercise
5 minutes
1. Open the dataset: HOMECAR.MTW
2. Create a BoxPlot of Consumption vs Month. (You will have to
create the variable Month from the date).
3. Perform a Hypothesis test to see if there is a statistically
significant difference in Fuel Consumption vs Month.
Note: you will need to use the Hypothesis Test equivalent of the
BoxPlot, namely ANOVA (Analysis of Variance).
Stat > ANOVA > One-Way
If time permits, repeat the exercise for Quarter (of the year).
Hypothesis Testing cont’d
• We create a new variable “Month” from Date
Hypothesis Testing cont’d
• We perform a one way ANOVA:
Hypothesis Testing cont’d
[Figure: Boxplot of Consumption(L/100Km) vs Month (1 to 12)]
Boxplot output from the ANOVA shows a suggestive pattern vs month but it is still in the “grey area”.
ANOVA does not differentiate based on the order of the categories – i.e.
our eye sees a non random seasonal pattern but ANOVA has the same
output no matter what the ordering of the categories is.
Hypothesis Testing cont’d
We look for problems in the residuals.
[Figure: Residual Plots for Consumption(L/100Km), four panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order]
• Normal Probability Plot and Histogram: the distribution looks to be roughly normal.
• Versus Fits: no obvious change in variance or curvature in the residuals.
• Versus Order: a random pattern vs order of observations (a good thing!).
Hypothesis Testing cont’d
The ANOVA table in the session window:
One-way ANOVA: Consumption(L/100Km) versus Month

Source   DF      SS     MS     F      P
Month    11   8.480  0.771  1.38  0.190
Error   126  70.391  0.559
Total   137  78.871

S = 0.7474   R-Sq = 10.75%   R-Sq(adj) = 2.96%

P-Value is not below 0.05 so we cannot reject Ho i.e. we have no / insufficient evidence that there is a relationship between month and fuel consumption.
Hypothesis Testing cont’d
Level   N    Mean   StDev
1       7  8.0276  0.6886
2       7  7.7260  0.5718
3       8  7.7382  0.8514
4       9  7.5029  0.5035
5       9  7.1770  0.5295
6      11  7.2524  0.8357
7      20  7.3601  0.9416
8      21  7.2793  0.8750
9      10  7.0942  0.5938
10     13  7.5054  0.4882
11     12  7.7771  0.4345
12     11  7.6613  0.9480

[Individual 95% CIs For Mean Based on Pooled StDev; axis 7.00 to 8.50] The confidence intervals are overlapping.

Pooled StDev = 0.7474
Hypothesis Testing cont’d
• There was a pattern in the data that
suggested a seasonal effect. If we use
fewer categories, but more data in
each category, we might see a
statistically significant effect.
• We create a new variable “Quarter” in
a similar fashion to how we created
“Month”.
Hypothesis Testing cont’d
[Figure: Boxplot of Consumption(L/100Km) vs Quarter (1 to 4)]
The graph is suggestive of a difference, but it is not completely clear cut, so we judge by the ANOVA table in the session window.
Hypothesis Testing cont’d
One-way ANOVA: Consumption(L/100Km) versus Quarter

Source    DF      SS     MS     F      P
Quarter    3   6.595  2.198  4.08  0.008
Error    134  72.275  0.539
Total    137  78.871

S = 0.7344   R-Sq = 8.36%   R-Sq(adj) = 6.31%

P-Value is below 0.05 so we reject Ho and accept Ha i.e. we have evidence that there is a relationship between quarter and fuel consumption.
This source of variation explains 8% of the variance in fuel consumption.

Level   N    Mean   StDev
1      22  7.8264  0.7002
2      29  7.3067  0.6488
3      51  7.2747  0.8462
4      36  7.6436  0.6412

[Individual 95% CIs For Mean Based on Pooled StDev; axis 7.20 to 8.10] The confidence intervals are not completely overlapping.

The max. seasonal difference in fuel consumption is about 0.5 L/100Km
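The ANOVA table can be cross-checked against the per-level summary (N, Mean, StDev) without the raw data. The sketch below rebuilds SS, F and R-Sq for Quarter from those three columns. The function name is an illustrative choice, and small differences from Minitab's figures are due to rounding in the printed summary. It does not compute the P-Value, which needs the F distribution from tables or software.

```python
def anova_from_summary(ns, means, sds):
    """One-way ANOVA quantities rebuilt from per-level N, Mean and StDev."""
    n_total = sum(ns)
    grand = sum(n * m for n, m in zip(ns, means)) / n_total   # weighted grand mean
    ss_between = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    df_between = len(ns) - 1
    df_within = n_total - len(ns)
    f = (ss_between / df_between) / (ss_within / df_within)
    r_sq = ss_between / (ss_between + ss_within)
    return ss_between, ss_within, f, r_sq

# Per-quarter summary taken from the Level table
ss_b, ss_w, f_stat, r_sq = anova_from_summary(
    ns=[22, 29, 51, 36],
    means=[7.8264, 7.3067, 7.2747, 7.6436],
    sds=[0.7002, 0.6488, 0.8462, 0.6412],
)
print(f"SS between = {ss_b:.3f}, SS within = {ss_w:.3f}")
print(f"F = {f_stat:.2f}, R-Sq = {r_sq:.2%}")
```

The same function can be applied to the twelve Month levels. With the same total variation spread over 11 degrees of freedom instead of 3, the mean square for the factor is much smaller, which is why Month was not significant while Quarter is.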