Study Design I. Sample Size Consideration Tuan V. Nguyen

Study Design I.
Sample Size Consideration
Tuan V. Nguyen
Garvan Institute of Medical Research
Sydney, Australia
The classical hypothesis testing
• Define a null hypothesis:
– Ho: Xt = Xc
• Define an alternative hypothesis:
– Ha: Xt > Xc, Xt < Xc, or Xt ≠ Xc.
• Perform a test of significance on the null hypothesis.
– Assume that the null hypothesis is true.
– Determine the probability of obtaining the observations
found in the data.
• Accept or reject Ho
– If Ho is rejected, the alternative hypothesis is accepted.
– But there are many alternative hypotheses!
Diagnosis and statistical reasoning

Diagnosis:
                Disease status
Test result     Present                    Absent
+ve             True +ve (sensitivity)     False +ve
-ve             False -ve                  True -ve (specificity)

Significance test:
                Difference is
Test result     Present (Ho not true)      Absent (Ho is true)
Reject Ho       No error (1−β)             Type I err. (α)
Accept Ho       Type II err. (β)           No error (1−α)

α : significance level
1−β : power
Study Design Issues
• Setting
• Participants: inclusion / exclusion criteria
• Design: cross-sectional, longitudinal
• Measurements: outcome, covariates / risk factors
• Analysis
• Sample size / power issues
Sample size issues
• How many observations / subjects?
– Practical and statistical issues
– Ethical issues
• Ethical issues in clinical studies
– Unnecessarily large number of patients may be
deemed unethical
– Too small a sample may also be unethical as
the study can’t show anything.
Large difference vs Statistical significance

Status          Group A   Group B
Improved           9        18
Not improved      21        12
Total             30        30
Chi-square: 5.4; P < 0.05
“Statistically significant”

Status          Group A   Group B
Improved           6        12
Not improved      14         8
Total             20        20
Chi-square: 3.6; P > 0.05
“Statistically insignificant”
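The comparison of the two tables can be checked with a short calculation. The sketch below assumes the Pearson chi-square statistic without continuity correction (the slide does not say which variant was used); the function name and the constant are illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]], with rows = status and columns = group."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Critical value of chi-square with 1 degree of freedom at alpha = 0.05.
CRITICAL_5PCT_1DF = 3.841

# Same improvement rates (30% vs 60%) in both tables, different group sizes.
larger = chi_square_2x2(9, 18, 21, 12)   # 30 subjects per group
smaller = chi_square_2x2(6, 12, 14, 8)   # 20 subjects per group

for label, stat in (("30 per group", larger), ("20 per group", smaller)):
    verdict = "P < 0.05" if stat > CRITICAL_5PCT_1DF else "P > 0.05"
    print(f"{label}: chi-square = {stat:.2f}, {verdict}")
```

The proportions improved are identical in the two tables, so only the sample size changes the conclusion.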
Effect of sample size: a simulation

               True mean: 100, True SD: 15    True mean: 100, True SD: 35
Sample size    Est. M     SD                  Est. M     SD
10              98.0     11.0                 108.9     32.2
50             100.4     13.6                  95.3     41.4
100            101.3     14.4                  99.1     35.5
200             99.9     15.2                 100.3     33.2
500             99.8     15.3                  98.9     33.8
1000            99.5     15.1                  99.9     35.0
2000            99.7     15.0                  99.9     34.7
10000          100.1     15.0                  99.9     35.0
100000         100.0     15.0                 100.0     35.0
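A simulation of this kind can be sketched in a few lines. This is not the code behind the table above: it assumes Normally distributed data and an arbitrary random seed, so the individual estimates will differ from the slide while showing the same pattern of convergence.

```python
import random
import statistics

def simulate(true_mean, true_sd, n, rng):
    """Draw n Normal observations and return (sample mean, sample SD)."""
    sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
    return statistics.fmean(sample), statistics.stdev(sample)

rng = random.Random(2024)  # arbitrary seed, used only for reproducibility
sizes = (10, 50, 100, 200, 500, 1000, 2000, 10000, 100000)

print("      n   M(SD=15)  SD   M(SD=35)  SD")
for n in sizes:
    m15, s15 = simulate(100, 15, n, rng)
    m35, s35 = simulate(100, 35, n, rng)
    print(f"{n:>7}   {m15:6.1f} {s15:5.1f}   {m35:6.1f} {s35:5.1f}")
```

Small samples give unstable estimates, and more so when the true SD is larger.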
Specification for sample size determination
• Parameter of major interest
• Magnitude of difference in the parameter
• Variability of the parameter
• Bound of errors (type I and type II error rates)
Parameter of Interest
• Type of measurement of primary interest:
– Continuous or categorical outcome
• Examples:
– Mortality: proportion (or probability) of death/survival
– Blood pressure: difference in BP in mmHg
– Quality of life: change in QoL scores
Variability of the Parameter of Interest
• If the parameter is a continuous variable:
– What is the standard deviation (SD) ?
• If the parameter is a categorical variable:
– SD can be estimated from the proportion/probability p, as √(p(1 − p)).
Magnitude of Difference of Interest
• Distinction between clinical and statistical relevance.
• Change from baseline or difference between groups.
• Examples:
– Probability of survival: 85% vs 80%
– Blood pressure: difference between groups by 1 SD.
– Quality of life: difference in the change in QoL
between groups by 5%.
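[Figure: standard Normal curves. One-sided: area 0.95 below Z1 = 1.64, with 0.05 in the upper tail. Two-sided: area 0.95 between Z2 = −1.96 and 1.96, with 0.025 in each tail.]

Prob.    Z1      Z2
0.80     0.84    1.28
0.90     1.28    1.64
0.95     1.64    1.96
0.99     2.33    2.81

Alpha    Zα (One-sided)    Zα/2 (Two-sided)
0.20     0.84              1.28
0.10     1.28              1.64
0.05     1.64              1.96
0.01     2.33              2.81

Power    Z1−β
0.80     0.84
0.90     1.28
0.95     1.64
0.99     2.33

Note: 2.81 is the two-sided value used in these slides for alpha = 0.01. The exact two-sided deviate for alpha = 0.01 is 2.58; 2.81 corresponds to a two-sided alpha of 0.005.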
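The Normal deviates in these tables can be recomputed rather than looked up. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper names are illustrative. Exact values differ slightly from the two-decimal table entries, and the exact two-sided deviate for alpha = 0.01 is about 2.576.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1

def z_one_sided(alpha):
    """Deviate leaving probability alpha in the upper tail."""
    return STD_NORMAL.inv_cdf(1 - alpha)

def z_two_sided(alpha):
    """Deviate leaving probability alpha / 2 in each tail."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2)

def z_power(power):
    """Deviate Z(1 - beta) for a given power = 1 - beta."""
    return STD_NORMAL.inv_cdf(power)

for alpha in (0.20, 0.10, 0.05, 0.01):
    print(f"alpha={alpha:.2f}  one-sided={z_one_sided(alpha):.3f}  "
          f"two-sided={z_two_sided(alpha):.3f}")
for power in (0.80, 0.90, 0.95, 0.99):
    print(f"power={power:.2f}  z={z_power(power):.3f}")
```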
• The serum cholesterol levels of Californian children have a mean of 175 mg/100ml and a standard deviation of 30 mg/100ml. The distribution of the cholesterol levels is Normal.
• 95% of the children should have cholesterol levels between 175 ± (1.96 × 30), i.e. between 116 and 234 mg/100ml.
• If we let X be the cholesterol level for any child, then X can be converted to a variable with mean = 0 and SD = 1:
Z = (X – 175) / 30
[Figure: Normal curve of cholesterol level (mg/100ml) marked at 116, 175 and 234, with the corresponding Z scale marked at −1.96, 0 and 1.96; both tails beyond ±1.96 are labelled "Abnormal?".]
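As a quick check of the arithmetic, the 95% range and the standardisation can be written out directly. The constants come from the example; the function name is illustrative.

```python
MEAN = 175.0  # mg/100ml, from the example
SD = 30.0     # mg/100ml, from the example

def z_score(x, mean=MEAN, sd=SD):
    """Convert a cholesterol level to a standard Normal deviate."""
    return (x - mean) / sd

# Central 95% range of a Normal distribution: mean +/- 1.96 SD.
low = MEAN - 1.96 * SD
high = MEAN + 1.96 * SD
print(f"95% range: {low:.1f} to {high:.1f} mg/100ml")

for x in (116, 175, 234):
    print(f"X = {x}: Z = {z_score(x):.2f}")
```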
Study design and Outcome
• Single population
• Two populations
• Continuous measurement
• Categorical outcome
• Correlation
Single Population
Sample size for estimating a population mean
• How close to the true mean
• Confidence around the sample mean
• Type I error.
• N = (Zα/2)² σ² / d²
σ: standard deviation.
d: the accuracy of the estimate (how close to the true mean).
Zα/2: a Normal deviate reflecting the type I error.
• Example: we want to estimate the average weight in a population, and we want the error of estimation to be less than 2 kg from the true mean, with a probability of 95% (i.e., an error rate of 5%).
• N = (1.96)² σ² / 2²
Std Dev (σ)    Sample size
10              96
12             138
14             188
16             246
18             311
20             384
[Figure: sample size (0 to 450) plotted against standard deviation (0 to 25).]
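The table can be reproduced from the formula N = (Zα/2)² σ² / d². This is a minimal sketch with an illustrative function name; it returns the unrounded value, and the table appears to round to the nearest integer (rounding up would be the more conservative choice).

```python
def n_for_mean(sd, d, z=1.96):
    """Sample size to estimate a mean to within +/- d.

    sd: assumed standard deviation; z: two-sided Normal deviate
    (1.96 for 95% confidence). Returns the unrounded value.
    """
    return (z ** 2) * (sd ** 2) / (d ** 2)

# Weight example: error of estimation within 2 kg, 95% confidence.
for sd in (10, 12, 14, 16, 18, 20):
    print(f"SD = {sd:>2} kg -> N = {round(n_for_mean(sd, d=2))}")
```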
Sample size for estimating a population proportion
• How close to the true proportion
• Confidence around the sample
proportion.
• Type I error.
• N = (Zα/2)² p(1 − p) / d²
p: proportion to be estimated.
d: the accuracy of the estimate (how close to the true proportion).
Zα/2: a Normal deviate reflecting the type I error.
• Example: The prevalence of osteoporosis in the general population is around 30%. We want to estimate the prevalence p in a community to within 2%, with 95% confidence.
• N = (1.96)² (0.3)(0.7) / 0.02² = 2017 subjects.
[Figure: sample size (0 to 2500) plotted against a horizontal axis running from 0 to 0.1, labelled "Standard deviation" on the slide.]
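The osteoporosis example can be verified with a direct implementation of N = (Zα/2)² p(1 − p) / d². The function name is illustrative, and the result is rounded up to a whole number of subjects.

```python
import math

def n_for_proportion(p, d, z=1.96):
    """Sample size to estimate a proportion p to within +/- d.

    z is the two-sided Normal deviate (1.96 for 95% confidence).
    Returns the unrounded value.
    """
    return (z ** 2) * p * (1 - p) / (d ** 2)

# Expected prevalence 30%, margin of error 2%.
n = n_for_proportion(p=0.30, d=0.02)
print(f"N = {n:.1f}, i.e. {math.ceil(n)} subjects")

# The required N falls quickly as the margin of error is relaxed.
for d in (0.02, 0.04, 0.06, 0.08, 0.10):
    print(f"d = {d:.2f} -> N = {math.ceil(n_for_proportion(0.30, d))}")
```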
Sample size for estimating a correlation coefficient
• In observational studies that involve estimating a correlation (r) between two variables of interest, say X and Y, a typical hypothesis is of the form:
– Ho: r = 0 vs H1: r ≠ 0.
• The test statistic is based on Fisher's z transformation, which can be written as:
$$ t = \frac{1}{2} \log_e\left(\frac{1+r}{1-r}\right) \sqrt{n-3} $$
• Where n is the sample size and r is the observed correlation coefficient.
• It can be shown that, under Ho, t is approximately Normally distributed with mean 0 and unit variance, and the sample size needed to detect a statistically significant t can be derived as:
$$ N = \frac{(Z_\alpha + Z_{1-\beta})^2}{\frac{1}{4}\left[\log_e\left(\frac{1+r}{1-r}\right)\right]^2} + 3 $$
Sample size for estimating r: example
• Example: According to the literature, the correlation between salt intake and systolic blood pressure is around 0.3. A study is conducted to test the correlation in a population, with a significance level of 1% (one-sided, Zα = 2.33) and power of 90% (Z1−β = 1.28). The sample size for such a study can be estimated as follows:
$$ N = \frac{(2.33 + 1.28)^2}{\frac{1}{4}\left[\log_e\left(\frac{1+0.3}{1-0.3}\right)\right]^2} + 3 = \frac{13.03}{0.0958} + 3 \approx 139 $$
• A sample size of at least 139 subjects is required for the study.
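The calculation can be checked numerically. This is a minimal sketch of the formula N = (Zα + Z1−β)² / {¼[ln((1 + r)/(1 − r))]²} + 3, with the squared logarithm in the denominator; the function name is illustrative and the Normal deviates are passed in explicitly.

```python
import math

def n_for_correlation(r, z_alpha, z_power):
    """Sample size to detect a correlation r against Ho: r = 0.

    Uses Fisher's z transformation, c = 0.5 * ln((1 + r) / (1 - r)),
    and N = ((z_alpha + z_power) / c)^2 + 3. Returns the unrounded value.
    """
    c = 0.5 * math.log((1 + r) / (1 - r))
    return ((z_alpha + z_power) / c) ** 2 + 3

# r = 0.3, one-sided alpha = 1% (2.33), power = 90% (1.28).
n = n_for_correlation(0.3, z_alpha=2.33, z_power=1.28)
print(f"N = {n:.1f}")
```

Dividing by c² is the same as dividing by ¼[ln((1 + r)/(1 − r))]², so the function is the slide's formula written more compactly.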
Sample size for difference between two means
• Hypotheses:
Ho: µ1 = µ2 vs. Ha: µ1 = µ2 + d
• Let n1 and n2 be the sample sizes for group 1 and 2, respectively; N = n1 + n2; r = n1 / n2; σ: standard deviation of the variable of interest.
• Then, the total sample size is given by:
$$ N = \frac{(r+1)^2 (Z_\alpha + Z_{1-\beta})^2 \sigma^2}{r d^2} $$
Where Zα and Z1−β are Normal deviates.
• If we let Z = d / σ be the “effect size”, then:
$$ N = \frac{(r+1)^2 (Z_\alpha + Z_{1-\beta})^2}{r Z^2} $$
• If n1 = n2, power = 0.90 and alpha = 0.05 (two-sided), then (Zα + Z1−β)² = (1.96 + 1.28)² = 10.5, and the equation reduces to:
$$ N = \frac{42}{Z^2} $$
that is, 21 / Z² subjects in each group.
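The Normal-approximation calculation for two means can be sketched per group, which makes the bookkeeping explicit. The sketch assumes r = n1 / n2 and effect size Z = d / σ; the function name is illustrative. With equal groups, Zα = 1.96 and Z1−β = 1.28, each group needs about 21 / Z² subjects, so the total is about 42 / Z².

```python
def n_two_means(effect_size, z_alpha, z_power, r=1.0):
    """Group sizes (n1, n2) for comparing two means.

    effect_size: d / sigma; r: allocation ratio n1 / n2.
    Derived from Var(mean1 - mean2) = sigma^2 * (1/n1 + 1/n2).
    Returns unrounded values.
    """
    n2 = (r + 1) * (z_alpha + z_power) ** 2 / (r * effect_size ** 2)
    n1 = r * n2
    return n1, n2

for es in (0.2, 0.5, 1.0):
    n1, n2 = n_two_means(es, z_alpha=1.96, z_power=1.28)
    print(f"effect size {es}: n1 = {n1:.1f}, n2 = {n2:.1f}, total = {n1 + n2:.1f}")
```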
Two Populations
Sample size for two means vs. “effect size”
[Figure: total sample size (N, 0 to 2400) plotted against effect size (d / σ, 0 to 1.8), for a power of 80% and a significance level of 5%.]
Sample size for difference between 2 proportions
• Hypotheses:
Ho: π1 = π2 vs. Ha: π1 = π2 + d.
• Let p1 and p2 be the sample proportions (i.e. estimates of π1 and π2) for group 1 and group 2. Then, the sample size to test the hypothesis is:
$$ n = \frac{\left[ Z_\alpha \sqrt{2p(1-p)} + Z_{1-\beta} \sqrt{p_1(1-p_1) + p_2(1-p_2)} \right]^2}{(p_1 - p_2)^2} $$
Where: n = sample size for each group; p = (p1 + p2) / 2; Zα and Z1−β are Normal deviates.
A better (more conservative) suggestion for sample size is:
$$ n_a = \frac{n}{4} \left[ 1 + \sqrt{1 + \frac{4}{n\,|p_1 - p_2|}} \right]^2 $$
Sample size for difference between 2 prevalences
• For most diseases, the prevalence in the general population is small (e.g. 1 per 1000 subjects). Therefore, a different formulation is required.
• Let p1 and p2 be the prevalence for population 1 and population 2. Then, the sample size to test the hypothesis is:
$$ n = \frac{(Z_\alpha + Z_{1-\beta})^2}{0.00061 \left( \arcsin\sqrt{p_1} - \arcsin\sqrt{p_2} \right)^2} $$
Where: n = sample size for each group; Zα and Z1−β are Normal deviates; the arcsine is expressed in degrees (0.00061 ≈ 2(π/180)²).
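The arcsine formula is easier to apply in radians, where the constant 0.00061 (which is approximately 2(π/180)², a degrees-to-radians conversion) becomes a plain factor of 2. The sketch below makes that assumption explicit; the function name and the prevalences in the usage lines are hypothetical.

```python
import math

def n_two_prevalences(p1, p2, z_alpha, z_power):
    """Per-group sample size for comparing two small prevalences.

    Uses the arcsine square-root transformation with angles in radians:
    n = (z_alpha + z_power)^2 / (2 * (asin(sqrt(p1)) - asin(sqrt(p2)))^2).
    Returns the unrounded value.
    """
    delta = math.asin(math.sqrt(p1)) - math.asin(math.sqrt(p2))
    return (z_alpha + z_power) ** 2 / (2 * delta ** 2)

# Hypothetical prevalences of 1 and 2 per 1000; alpha = 5% two-sided, power 80%.
n = n_two_prevalences(0.001, 0.002, z_alpha=1.96, z_power=0.84)
print(f"n per group = {math.ceil(n)}")
```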
Sample size for two proportions: example
• Example: In a condition, the remission rate is expected to be 70% for a new treatment, and 60% for a conventional treatment. A trial is planned to show the difference at a significance level of 1% and power of 90%.
• The sample size can be calculated as follows:
– p1 = 0.6; p2 = 0.7; p = (0.6 + 0.7)/2 = 0.65; Zα = 2.81 (the two-sided value tabulated above for alpha = 0.01); Z1−β = 1.28 (power of 0.9).
– The sample size required for each group should be:
$$ n = \frac{\left( 2.81 \sqrt{2 \times 0.65 \times 0.35} + 1.28 \sqrt{0.6 \times 0.4 + 0.7 \times 0.3} \right)^2}{(0.6 - 0.7)^2} \approx 759 $$
• Adjusted / conservative sample size is:
$$ n_a = \frac{759}{4} \left[ 1 + \sqrt{1 + \frac{4}{759 \times |0.6 - 0.7|}} \right]^2 \approx 779 $$
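The worked example can be checked with a direct implementation of the two formulas, including the square roots in the numerator and in the adjustment. Function names are illustrative; the Normal deviates are passed in explicitly so that the slide's values (2.81 and 1.28) can be used as given.

```python
import math

def n_two_proportions(p1, p2, z_alpha, z_power):
    """Unadjusted per-group sample size for comparing two proportions."""
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return numerator / (p1 - p2) ** 2

def n_adjusted(n, p1, p2):
    """Conservative adjustment: (n / 4) * [1 + sqrt(1 + 4 / (n |p1 - p2|))]^2."""
    return n / 4 * (1 + math.sqrt(1 + 4 / (n * abs(p1 - p2)))) ** 2

n = math.ceil(n_two_proportions(0.6, 0.7, z_alpha=2.81, z_power=1.28))
n_a = math.ceil(n_adjusted(n, 0.6, 0.7))
print(f"unadjusted n per group = {n}")
print(f"adjusted n per group   = {n_a}")
```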
Sample size for two proportions vs. effect size

Sample size per group, by P1 (columns) and difference from P1 (rows):

Difference    P1: 0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8
0.1               424    625    759    825    825    759    625    424
0.2               131    173    198    206    198    173    131     73
0.3                67     82     89     89     82     67     45      .
0.4                41     47     50     47     41     31      .      .
0.5                28     30     30     28     22      .      .      .
0.6                19     20     19     17      .      .      .      .
0.7                14     14     13      .      .      .      .      .
0.8                10      9      8      .      .      .      .      .

Note: these values are “unadjusted” sample sizes
Sample size for estimating an odds ratio
• In a case-control study the data are usually summarized by an odds ratio (OR), rather than a difference between two proportions.
• If p1 and p2 are the proportions of cases and controls, respectively, exposed to a risk factor, then:
$$ OR = \frac{p_1 (1 - p_2)}{p_2 (1 - p_1)} $$
• If we know the proportion of exposure in the general population (p), the total sample size N for estimating an OR is:
$$ N = \frac{(1 + r)^2 (Z_\alpha + Z_{1-\beta})^2}{r (\ln OR)^2 \, p(1 - p)} $$
• Where r = n1 / n2 is the ratio of sample sizes for group 1 and group 2; p is the prevalence of exposure in the controls; and OR is the hypothetical odds ratio. If n1 = n2 (so that r = 1) then the formula reduces to:
$$ N = \frac{4 (Z_\alpha + Z_{1-\beta})^2}{(\ln OR)^2 \, p(1 - p)} $$
Sample size for an odds ratio: example
• Example: The prevalence of vertebral fracture in a population is 25%. We are interested in estimating the effect of smoking on fracture, with an odds ratio of 2, at a significance level of 5% (one-sided test) and power of 80%.
• The total sample size for the study can be estimated by:
$$ N = \frac{4 (1.64 + 0.85)^2}{(\ln 2)^2 \times 0.25 \times 0.75} = 275 $$
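The odds ratio calculation can be reproduced as follows. This is a minimal sketch with an illustrative function name, using the deviates exactly as they appear in the example (1.64 and 0.85).

```python
import math

def n_odds_ratio(odds_ratio, p, z_alpha, z_power, r=1.0):
    """Total sample size N for detecting a given odds ratio.

    p: prevalence of exposure; r: ratio of group sizes n1 / n2.
    N = (1 + r)^2 (z_alpha + z_power)^2 / (r * ln(OR)^2 * p * (1 - p)).
    Returns the unrounded value.
    """
    return ((1 + r) ** 2 * (z_alpha + z_power) ** 2
            / (r * math.log(odds_ratio) ** 2 * p * (1 - p)))

n = n_odds_ratio(odds_ratio=2, p=0.25, z_alpha=1.64, z_power=0.85)
print(f"Total N = {n:.1f}")

# Smaller odds ratios are much harder to detect.
for odds in (1.5, 2.0, 3.0):
    print(f"OR = {odds}: N = {math.ceil(n_odds_ratio(odds, 0.25, 1.64, 0.85))}")
```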
Sample size for 2 correlation coefficients
• To detect a relevant difference between two correlation coefficients r1 and r2 obtained from two independent samples of sizes n1 and n2, respectively, we first need to transform these coefficients into z values as follows:
$$ z_1 = 0.5 \log_e\left(\frac{1 + r_1}{1 - r_1}\right), \qquad z_2 = 0.5 \log_e\left(\frac{1 + r_2}{1 - r_2}\right) $$
• The total sample size N required to detect the difference between two correlation coefficients r1 and r2, with a significance level of α and power 1−β, can be estimated by:
$$ N = \frac{4 (Z_\alpha + Z_{1-\beta})^2}{(z_1 - z_2)^2} $$
Where Zα and Z1−β are Normal deviates.
Sample size for two r’s: example
• The sample size required to detect the difference between r1 = 0.8 and r2 = 0.4 with a significance level of 5% (two-tailed) and power of 90% can be found as follows:
– z1 = 0.5 ln((1 + 0.8) / (1 − 0.8)) = 1.098
– z2 = 0.5 ln((1 + 0.4) / (1 − 0.4)) = 0.424
$$ N = \frac{4 (1.96 + 1.28)^2}{(1.098 - 0.424)^2} = 92 $$
• 46 subjects are needed in each group.
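The two-correlation example can be verified numerically. Fisher's z transformation, 0.5 ln((1 + r)/(1 − r)), is the inverse hyperbolic tangent, so the sketch uses `math.atanh`; the function names are illustrative.

```python
import math

def fisher_z(r):
    """Fisher's z transformation: 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

def n_two_correlations(r1, r2, z_alpha, z_power):
    """Total sample size to detect a difference between r1 and r2.

    N = 4 (z_alpha + z_power)^2 / (z1 - z2)^2, split equally between groups.
    Returns the unrounded value.
    """
    return 4 * (z_alpha + z_power) ** 2 / (fisher_z(r1) - fisher_z(r2)) ** 2

print(f"z1 = {fisher_z(0.8):.3f}, z2 = {fisher_z(0.4):.3f}")
n = n_two_correlations(0.8, 0.4, z_alpha=1.96, z_power=1.28)
print(f"Total N = {n:.1f}, i.e. about {round(n / 2)} per group")
```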
Some Comments
• The formulae presented are theoretical.
• They are all based on the assumption of Normal distribution.
• The estimator [of sample size] has its own variability.
• The calculated sample size is only an approximation.
• Non-response must be allowed for in the calculation.
Computer Programs
• Software program for sample size and power evaluation
– PS (Power and Sample size), from Vanderbilt Medical Center. This can
be obtained from me by sending email to (t.nguyen@garvan.org.au or
t.v.nguyen@unsw.edu.au). Free.
• On-line calculator:
– http://ebook.stat.ucla.edu/calculators/powercalc/
• References:
– Florey CD. Sample size for beginners. BMJ 1993 May 1;306(6886):1181-4
– Day SJ, Graham DF. Sample size and power for comparing two or more treatment groups in
clinical trials. BMJ 1989 Sep 9;299(6700):663-5.
– Miller DK, Homan SM. Graphical aid for determining power of clinical trials involving two groups.
BMJ 1988 Sep 10;297(6649):672-6
– Campbell MJ, Julious SA, Altman DG. Estimating sample sizes for binary, ordered
categorical, and continuous outcomes in two group comparisons. BMJ 1995 Oct
28;311(7013):1145-8.
– Sahai H, Khurshid A. Formulae and tables for the determination of sample sizes and power in clinical trials
for testing differences in proportions for the two-sample design: a review. Stat Med 1996 Jan 15;15(1):1-21.
– Kieser M, Hauschke D. Approximate sample sizes for testing hypotheses about the ratio and difference of
two means. J Biopharm Stat 1999 Nov;9(4):641-50.