Power Curves for t-test α = .05, X1, . . . , Xn ∼ N (µ, 1) One-Sample t-Test 1.0 n=20 0.0 0.2 data: rnorm(20, mean = 2, sd = 2) t = 2.384, df = 19, p-value = 0.02771 alternative hypothesis: true mean is not equal to 0 95 percent confidence interval: 0.1558692 2.3985594 sample estimates: mean of x 1.277214 0.6 One Sample t-test 0.4 power 0.8 > t.test(rnorm(20,mean=2,sd=2)) 0.0 0.5 1.0 1.5 2.0 2.5 3.0 2.0 2.5 3.0 2.0 2.5 3.0 mu power 0.2 One Sample t-test 0.4 > t.test(rnorm(20,mean=0,sd=2)) 0.6 0.8 1.0 n=10 0.0 data: rnorm(20, mean = 0, sd = 2) t = -0.9111, df = 19, p-value = 0.3737 alternative hypothesis: true mean is not equal to 0 95 percent confidence interval: -1.5884003 0.6249483 sample estimates: mean of x -0.481726 0.0 0.5 1.0 1.5 mu 0.6 0.0 0.2 > 0.4 power 0.8 1.0 n=5 0.0 0.5 1.0 30 1.5 mu 31 Shoe Wear > library(MASS) > shoes $A [1] 13.2 8.2 10.9 14.3 10.7 6.6 9.5 10.8 $B [1] 14.0 8.8 11.2 14.2 11.8 6.4 9.8 11.3 > layout(rbind(c(1,2),c(1,3))) > boxplot(list(A=shoes$A, B=shoes$B), notch=T, horizontal=T, boxwex=.5) > qqnorm(shoes$A); qqline(shoes$A) > qqnorm(shoes$B); qqline(shoes$B) Paired t-Test 8.8 13.3 > t.test(shoes$A, shoes$B, paired=T) 9.3 13.6 Paired t-test data: shoes$A and shoes$B t = -3.3489, df = 9, p-value = 0.008539 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -0.6869539 -0.1330461 sample estimates: mean of the differences -0.41 12 10 > t.test(shoes$A-shoes$B) 8 B Sample Quantiles 14 Normal Q−Q Plot One Sample t-test −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 data: shoes$A - shoes$B t = -3.3489, df = 9, p-value = 0.008539 alternative hypothesis: true mean is not equal to 0 95 percent confidence interval: -0.6869539 -0.1330461 sample estimates: mean of x -0.41 Theoretical Quantiles 12 10 8 A Sample Quantiles 14 Normal Q−Q Plot > 8 10 12 14 −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 Theoretical Quantiles 32 33 Non-normality of Traffic Data Paired vs. Unpaired > t.test(shoes$A, shoes$B, paired=T) Paired t-test data: shoes$A and shoes$B t = -3.3489, df = 9, p-value = 0.008539 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -0.6869539 -0.1330461 sample estimates: mean of the differences -0.41 Normal Q−Q Plot # bad idea: using paired=F Normal Q−Q Plot 30 > t.test(shoes$A, shoes$B) > library(MASS) > attach(Traffic) > qqnorm(y[limit=="yes"]); qqline(y[limit=="yes"]) > qqnorm(y[limit=="no"]); qqline(y[limit=="no"]) > shapiro.test(y[limit=="yes"]) Shapiro-Wilk normality test data: y[limit == "yes"] W = 0.9213, p-value = 0.0003330 > shapiro.test(y[limit=="no"]) Shapiro-Wilk normality test data: y[limit == "no"] W = 0.9516, p-value = 0.0003928 > 20 15 Sample Quantiles 5 −2 > 10 20 15 10 5 data: shoes$A and shoes$B t = -0.3689, df = 17.987, p-value = 0.7165 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -2.745046 1.925046 sample estimates: mean of x mean of y 10.63 11.04 Sample Quantiles 25 25 Welch Two Sample t-test −1 0 1 Theoretical Quantiles 2 −3 −2 −1 0 1 2 3 Theoretical Quantiles 35 34 Two-Sample t-Test Normality (?) of Log Traffic Data > ly.yes <- log(y[limit=="yes"]) > ly.no <- log(y[limit=="no"]) > t.test(ly.yes, ly.no) Welch Two Sample t-test data: ly.yes and ly.no t = -3.2954, df = 147.673, p-value = 0.001231 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -0.31472006 -0.07876154 sample estimates: mean of x mean of y 2.866770 3.063511 > var.test(ly.yes, ly.no) F test to compare two variances data: ly.yes and ly.no F = 0.9262, num df = 68, denom df = 114, p-value = 0.7389 alternative hypothesis: true ratio of variances is not equal to 1 95 percent confidence interval: 0.6109935 1.4372433 sample estimates: ratio of variances 0.9262063 > t.test(ly.yes, ly.no, var.equal=T) Two Sample t-test data: ly.yes and ly.no t = -3.2638, df = 182, p-value = 0.001313 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -0.31567636 -0.07780524 sample estimates: mean of x mean of y 2.866770 3.063511 > > qqnorm(log(y[limit=="yes"])) > qqnorm(log(y[limit=="no"])) > shapiro.test(log(y[limit=="yes"])) Shapiro-Wilk normality test data: log(y[limit == "yes"]) W = 0.9832, p-value = 0.4814 > shapiro.test(log(y[limit=="no"])) Shapiro-Wilk normality test data: log(y[limit == "no"]) W = 0.9868, p-value = 0.3235 > Normal Q−Q Plot 3.0 2.5 2.0 Sample Quantiles 1.0 1.5 2.5 2.0 1.5 1.0 Sample Quantiles 3.0 Normal Q−Q Plot −2 −1 0 1 Theoretical Quantiles 2 −3 −2 −1 0 1 2 3 Theoretical Quantiles 36 37 Approximate Z-Tests in R Wilcoxon Signed-Rank Test Instead of log-transforming traffic data, let’s do an approximate Z-test on the untransformed data: • For one sample: • Same statistic, so use same test function: > x <- c(8.5,8.6,6.4,12.1,8.2,7.4,7.8,8.3,10.3,8.4) > wilcox.test(x, mu=10, conf.int=T) Wilcoxon signed rank test data: x V = 8, p-value = 0.04883 alternative hypothesis: true mu is not equal to 10 95 percent confidence interval: 7.50 9.95 sample estimates: (pseudo)median 8.35 > t.test(y[limit=="yes"],y[limit=="no"]) Welch Two Sample t-test data: y[limit == "yes"] and y[limit == "no"] t = -3.3995, df = 165.545, p-value = 0.000846 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -6.666816 -1.767967 sample estimates: mean of x mean of y 18.91304 23.13043 • For paired data: > wilcox.test(shoes$A, shoes$B, paired=T) Wilcoxon signed rank test with continuity correction data: shoes$A and shoes$B V = 3, p-value = 0.01431 alternative hypothesis: true mu is not equal to 0 Warning message: Cannot compute exact p-value with ties in: wilcox.test.default(shoes$A, shoes$B, paired = T) > wilcox.test(shoes$A-shoes$B) Wilcoxon signed rank test with continuity correction data: shoes$A - shoes$B V = 3, p-value = 0.01431 alternative hypothesis: true mu is not equal to 0 Warning message: Cannot compute exact p-value with ties in: wilcox.test.default(shoes$A - shoes$B) > • If you want to be picky, use p-value based on Z instead of t(165.545): > 2*pt(-3.3995,df=165.545,lower.tail=T) [1] 0.0008459695 > 2*pnorm(-3.3995,lower.tail=T) [1] 0.0006750918 > but this is all so approximate anyway, why not take the more conservative t-based p-value? Recall that “exact” p-value for log-transformed data assuming normality and equal variances was p = 0.001313. 39 38 Why I Love the Wilcoxon Test 1.5 1.5 2.0 2.5 3.0 0.0 0.5 0.6 t mu 2.0 2.5 3.0 power 0.5 1.0 1.0 0.8 0.6 power 0.4 0.2 0.0 0.5 1.0 1.5 mu 2.0 0.5 1.0 1.0 w 0.8 2.0 2.5 2.5 3.0 0.6 3.0 w w w w wt wt w t w t w t 0.0 t t t w t t w w w w t t t t t t t t t t 0.5 1.0 1.5 w w 2.0 2.5 w w w w w w w w w w t t t t t t t t t t w w w t 1.5 w 2.0 w t 1.0 w w w Mixture of 90% N(mu,sd=1) and 10% N(mu,sd=15), n=20 t 0.5 2.0 t 1.0 0.0 wt t n=2 t t wt w mu t t wt t−test (theoretical under normality) t−test (simulated) Wilcoxon (simulated) mu t t wt 1.5 w 0.4 power 1.5 t 0.10 t t t t w w w w w w w w w w w 0.0 t 0.0 0.4 power t 0.2 1.5 0.0 t wt wt w w w w w w w w w t 0.25 t t t t t t t w w w w w w w w w w w 0.20 1.0 0.8 0.6 0.4 t t t t 0.0 wt t−Dist Sample, df=2, n=20 1.0 3.0 t 0.2 power 2.5 t t 0.0 2.0 wt mu t n=3 t wt 0.0 t 0.6 0.2 1.5 t t 3.0 0.2 w 1.0 2.5 w 0.0 t t w w 2.0 wt wt w power t w w mu t 1.0 1.5 t t w n=4 0.5 1.0 wt wt n=5 t w t w t t w w 0.4 t mu 0.0 0.5 0.8 1.0 0.8 t 0.6 power 0.2 0.0 0.0 wt 1.0 0.0 wt wt wt mu t 0.4 1.0 0.8 0.6 0.4 wt 0.2 power t w 0.5 3.0 n=6 wt wt wt wt wt t w 0.0 2.5 mu n=7 t w 2.0 0.0 1.0 wt t w 0.8 0.5 mu wt 0.8 0.0 t w 0.0 wt wt 0.6 3.0 power 2.5 wt t w 0.4 2.0 wt wt 0.2 1.5 wt t w 0.15 1.0 0.05 0.0 0.0 0.5 t w t w 0.2 0.2 t w t w wt 0.0 power t w wt t w 0.6 0.6 power t w Normal Sample, sd=1, n=20 w w w w t wt t t t t w 0.4 0.8 t w t w t w t t w t w w 0.4 0.8 0.6 0.4 w t 0.2 power wt n=8 1.0 n=9 1.0 1.0 n=10 t wt wt wt wt wt wt w Why I Love It Even More 3.0 mu w wt 0.0 w t w t t t t t 0.5 t t t t 1.0 1.5 2.0 mu 40 41 Power of Shapiro Test Two-Sample Wilcoxon Test > wilcox.test(y.yes,y.no) Wilcoxon rank sum test with continuity correction data: y.yes and y.no W = 2878, p-value = 0.001830 alternative hypothesis: true mu is not equal to 0 > wilcox.test(log(y.yes),log(y.no)) Wilcoxon rank sum test with continuity correction data: log(y.yes) and log(y.no) W = 2878, p-value = 0.001830 alternative hypothesis: true mu is not equal to 0 > > rejects <- 0 > for (i in 1:10000) { + x <- rnorm(30, mean=45, sd=20) + if (shapiro.test(x)$p.value < 0.05) rejects <- rejects + 1 + } > rejects [1] 530 > rejects <- 0 > for (i in 1:10000) { + x <- rgamma(30, shape=2, rate=20) + if (shapiro.test(x)$p.value < 0.05) rejects <- rejects + 1 + } > rejects [1] 7513 > rejects <- 0 > for (i in 1:10000) { + x <- 30+45*rt(30, df=4) + if (shapiro.test(x)$p.value < 0.05) rejects <- rejects + 1 + } > rejects [1] 3249 > 43 42 Power of Shapiro Test Multigroup Location Tests 100 Normal Sample (Shapiro p=0.24) 60 40 20 Sample Quantiles 80 > names(PlantGrowth) [1] "weight" "group" > levels(PlantGrowth$group) [1] "ctrl" "trt1" "trt2" > oneway.test(weight ~ group, data=PlantGrowth) 0 One-way analysis of means (not assuming equal variances) −2 −1 0 1 data: weight and group F = 5.181, num df = 2.000, denom df = 17.128, p-value = 0.01739 2 Theoretical Quantiles > oneway.test(weight ~ group, data=PlantGrowth, var.equal=T) 0.5 Gamma Sample (Shapiro p=0.000063) 0.2 0.3 data: weight and group F = 4.8461, num df = 2, denom df = 27, p-value = 0.01591 > summary(aov(weight ~ group, data=PlantGrowth)) Df Sum Sq Mean Sq F value Pr(>F) group 2 3.7663 1.8832 4.8461 0.01591 * Residuals 27 10.4921 0.3886 --Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > kruskal.test(weight ~ group, data=PlantGrowth) 0.1 Sample Quantiles 0.4 One-way analysis of means −2 −1 0 1 2 Theoretical Quantiles 50 Kruskal-Wallis rank sum test −50 0 data: weight by group Kruskal-Wallis chi-squared = 7.9882, df = 2, p-value = 0.01842 > −100 Sample Quantiles 100 t Sample (Shapiro p=0.097) −2 −1 0 Theoretical Quantiles 1 2 44 45 Testing Two-Way Contingencies Testing Two-Way Contingencies > table(matlab,stat404) stat404 matlab - n y 0 5 9 17 1 0 6 5 2 1 1 1 3 0 0 2 > chisq.test(table(matlab,stat404)) > table(r,stat404) stat404 r - n y 0 2 4 0 1 3 4 1 2 0 7 14 3 1 1 10 > chisq.test(table(r,stat404)) Pearson’s Chi-squared test Pearson’s Chi-squared test data: table(matlab, stat404) X-squared = 6.3824, df = 6, p-value = 0.3817 data: table(r, stat404) X-squared = 21.9431, df = 6, p-value = 0.00124 Warning message: Chi-squared approximation may be incorrect in: chisq.test(table(matlab, stat404)) > fisher.test(table(matlab,stat404)) Warning message: Chi-squared approximation may be incorrect in: chisq.test(table(r, stat404)) > fisher.test(table(r,stat404)) Fisher’s Exact Test for Count Data Fisher’s Exact Test for Count Data data: table(matlab, stat404) p-value = 0.3786 alternative hypothesis: two.sided data: table(r, stat404) p-value = 0.0001139 alternative hypothesis: two.sided > > 47 46 Bigger Example > library(MASS) > names(Melanoma) [1] "time" "status" "sex" [6] "thickness" "ulcer" > attach(Melanoma) > table(sex,ulcer) ulcer sex 0 1 0 79 47 1 36 43 > chisq.test(table(sex,ulcer)) For Unpaired (Independent) Samples "age" "year" Independent, iid samples: X1 , . . . , X20 ∼ N (µX , 1) Y1, . . . , Y20 ∼ N (µY , 1) At α = 0.05, Pearson’s Chi-squared test with Yates’ continuity correction Simulated Power for Unpaired Samples (n=20) 1.0 data: table(sex, ulcer) X-squared = 5.1099, df = 1, p-value = 0.02379 0.8 u w p 0.6 power data: table(sex, ulcer) p-value = 0.02061 alternative hypothesis: true odds ratio is not equal to 1 95 percent confidence interval: 1.089871 3.700065 sample estimates: odds ratio 2.000701 p u w p u w p u w u p w u w p 0.4 Fisher’s Exact Test for Count Data u w p u w p u w p u w p u w p 0.2 > fisher.test(table(sex,ulcer)) u w p p u w paired (wrong!) unpaired (okay) Welch (okay) p u w p u w 0.0 p u w p u w 0.5 1.0 1.5 difference in means > 48 49 For Paired Data Summary of Tests (X1, Y1 ), . . . , (X20 , Y20 ) iid bivariate normal. Location tests: At α = 0.05, • t.test: t-tests for one sample, paired, two-sample (equal variance and Welch’s approximation). Can also be used for approximate Z-tests. • Positive correlation (usual) ρ = +0.9 Simulated Power for Paired Data (n=20) p p p 0.8 p u w p u w p u w p u w p u w p u w p u w p u w u w p 0.4 power p p u w 0.0 p p u w u w p u w u w u w u w u w 0.0 0.5 paired (okay) unpaired (wrong!) Welch (wrong!) 1.0 1.5 difference in means 1.0 power 0.6 0.2 u w u w p 0.0 u w p u w p u w p u w p u w p p u w u w p p p u w 1.0 u w p u w p u w p u w p p p p 0.5 u w • kruskal.test: Kruskal-Wallis rank-sum test for difference in means of two or more samples. • chisq.test: χ2 -test for one-way and interaction in two-way contingency tables. Simulated Power for Paired Data (n=20) u w • oneway.test: F -test for difference in means of two or more samples (equal variance and Welch’s approximation). Categorical data tests: • Negative correlation (unusual) ρ = −0.9 u w • wilcox.test: Wilcoxon signed-rank test for one sample or paired data, Wilcoxon/Mann-Whitney rank-sum test for two samples. paired (okay) unpaired (wrong!) Welch (wrong!) 1.5 • fisher.test: Fisher’s exact test for interaction in two-way contingency tables. Other tests: • shapiro.test: Shapiro-Wilk test for H0 : sample is normal. • var.test: F -test for H0 : samples have equal variances. difference in means 50 51
© Copyright 2024