Sample Size and Power

Introduction to Biostatistics, Harvard Extension School
© Scott Evans, Ph.D.
Sample Size Considerations
A pharmaceutical company calls and says, “We believe we have found a cure for the common cold. How many patients do I need to study to get our product approved by the FDA?”
Where to begin?
N = (Total Budget / Cost per patient)?
Hopefully not!
Where to begin?
• Understand the research question
  - Learn about the application and the problem.
  - Learn about the disease and the medicine.
• Crystal Ball
  - Visualize the final analysis and the statistical methods to be used.
Where to begin?
• Analysis determines sample size. Sample size calculations are based upon the planned method of analysis.
• If you don’t know how the data will be analyzed (e.g., 2-sample t-test), then you cannot accurately estimate the sample size.
Sample Size Calculation
• Formulate a PRIMARY research question.
• Identify:
  1. A hypothesis to test (write down H0 and HA), or
  2. A quantity to estimate (e.g., using confidence intervals)
Sample Size Calculation
• Determine the endpoint or outcome measure associated with the hypothesis test or quantity to be estimated.
  - How do we “measure” or “quantify” the responses?
  - Is the measure continuous, binary, or a time-to-event?
Sample Size Calculation
• Based upon the PRIMARY outcome
• Other analyses (i.e., secondary outcomes) may be planned, but the study may not be powered to detect effects for these outcomes.
Sample Size Calculation
• Two strategies
  - Hypothesis Testing
  - Estimation with Precision
Sample Size Calculation Using
Hypothesis Testing
• By far, the most common approach.
• The idea is to choose a sample size such that both of the following conditions simultaneously hold:
  - If the null hypothesis is true, then the probability of incorrectly rejecting it is (no more than) α.
  - If the alternative hypothesis is true, then the probability of correctly rejecting the null hypothesis is (at least) 1-β = power.
                     Reality
Test result          H0 true              H0 false
Reject H0            Type I error (α)     Power (1-β)
Do not reject H0     1-α                  Type II error (β)
Determinants of Sample Size:
Hypothesis Testing Approach
• α
• β
• An “effect size” to detect
• Estimates of variability
What is Needed to Determine
the Sample-Size?
• α
  - Up to the investigator or FDA regulation (often = 0.05)
  - How much type I (false positive) error can you afford?
What is Needed to Determine the
Sample-Size?
• 1-β (power)
  - Up to the investigator (often 80%-90%)
  - How much type II (false negative) error can you afford?
  - Not regulated by FDA
Choosing α and β
• Weigh the cost of a Type I error versus a Type II error.
  - In early phase clinical trials, we often do not want to “miss” a significant result and thus often consider designing a study for higher power (perhaps 90%) and may consider relaxing the α error (perhaps 0.10).
  - In order to approve a new drug, the FDA requires significance in two Phase III trials strictly designed with α error no greater than 0.05 (Power = 1-β is often set to 80%).
Effect Size
• The “minimum difference (between groups) that is clinically relevant or meaningful”.
  - Not readily apparent
  - Requires clinical input
  - Often difficult to agree upon
• Note: for noninferiority studies, we identify the “maximum irrelevant or non-meaningful difference”.
Estimates of Variability
• Often obtained from prior studies
• Explore the literature and data from ongoing studies for estimates needed in calculations
• Consider conducting a pilot study to estimate this
• May need to validate this estimate later
Other Considerations
• 1-sample vs. 2-sample
• Independent samples or paired
• 1-sided vs. 2-sided
Example: Cluster Headaches
• An experimental drug is being compared with placebo for the treatment of cluster headaches.
• The design of the study is to randomize an equal number of participants to the new drug and placebo.
• The participants will be administered the drug or matching placebo. One hour later, the participants will score their pain using the visual analog scale (VAS) for pain.
  - A continuous measure ranging from 0 (no pain) to 10 (severe pain).
Example: Cluster Headaches
• The planned analysis is a 2-sample t-test (independent groups) comparing the mean VAS score between groups, one hour after drug (or placebo) initiation
• H0: μ1=μ2 vs. HA: μ1≠μ2
Example: Cluster Headaches
• It is desirable to detect differences as small as 2 units (on the VAS scale).
• Using α=0.05 and power=0.80 (i.e., β=0.20), and an assumed standard deviation (SD) of responses of 4 (in both groups), 63 participants per group (126 total) are required.
  - Stata command: sampsi 0 2, sd(4) a(0.05) p(.80)
    - Note: only the difference between the first two numbers (the two means) matters; here it must equal 2.
  - http://newton.stat.ubc.ca/~rollin/stats/ssize/n2.html
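As a cross-check on the figure of 63 per group, the usual normal-approximation formula for a two-sided, two-sample comparison of means, n per group = 2 (z_{1-α/2} + z_{1-β})^2 σ^2 / Δ^2, can be evaluated directly. The following is a minimal Python sketch and not the Stata implementation; the function name n_per_group_means is made up for this illustration, and it assumes equal allocation and a common SD in both groups.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_means(delta, sd, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample comparison of means.

    Normal approximation, equal allocation, common SD (illustrative helper).
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    z_beta = z.inv_cdf(power)             # quantile matching the desired power
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return ceil(n)                        # always round up to whole participants


n = n_per_group_means(delta=2, sd=4)
print(n, 2 * n)  # per group, total
```

With a difference of 2 and an SD of 4 this gives 63 per group and 126 in total, in agreement with the numbers quoted for this example.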
Example: Part 2
• Let’s say that instead of measuring pain on a continuous scale using the VAS, we simply measured “response” (i.e., the headache is gone) vs. non-response.
Example: Part 2
• The planned analysis is a 2-sample test (independent groups) comparing the proportion of responders, one hour after drug (or placebo) initiation
• H0: p1=p2 vs. HA: p1≠p2
Example: Part 2
• It is desirable to detect a difference between response rates of 25% and 50%.
• Using α=0.05 and power=0.80 (i.e., β=0.20):
  - Stata command: sampsi 0.25 0.50, a(0.05) p(.80)
    - 66 per group (132 total) with continuity correction
  - http://newton.stat.ubc.ca/~rollin/stats/ssize/b2.html
    - 58 per group (116 total) without continuity correction
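Both figures can be approximated by hand. The sketch below, in Python, uses the standard large-sample formula for comparing two proportions and then applies a widely cited continuity correction (usually attributed to Fleiss and colleagues). That this is the exact correction Stata applies is an assumption made here, and the function names are invented for the illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_props_uncorrected(p1, p2, alpha=0.05, power=0.80):
    """Unrounded per-group n for a two-sided test of two proportions."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2                 # pooled proportion under H0
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return numerator / (p1 - p2) ** 2


def apply_continuity_correction(n, p1, p2):
    """Inflate an uncorrected per-group n (assumed form of the correction)."""
    d = abs(p1 - p2)
    return n / 4 * (1 + sqrt(1 + 4 / (n * d))) ** 2


n_plain = n_props_uncorrected(0.25, 0.50)
n_cc = apply_continuity_correction(n_plain, 0.25, 0.50)
print(ceil(n_plain), ceil(n_cc))  # without and with the correction
```

Under these assumptions the sketch reproduces 58 per group without the correction and 66 per group with it.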
Notes for Testing Proportions
• One does not need to specify a variability since it is determined from the proportion.
• The required sample size for detecting a difference between 0.25 and 0.50 is different from the required sample size for detecting a difference between 0.70 and 0.95 (even though both are 0.25 differences) because the variability is different.
  - This is not the case for means.
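To make the point concrete, the variance of a proportion is p(1-p), which is largest near 0.5 and shrinks toward 0 or 1. The Python sketch below compares the two scenarios using the uncorrected large-sample formula; the helper name is hypothetical, and α=0.05 with 80% power is assumed for both scenarios.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_props(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for two proportions, no continuity correction."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


for p1, p2 in [(0.25, 0.50), (0.70, 0.95)]:
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)   # variability implied by the proportions
    print(p1, p2, round(variance_sum, 4), n_per_group_props(p1, p2))
```

The second scenario has the smaller implied variability and therefore needs fewer participants per group, even though the difference to detect is the same.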
Caution for Testing Proportions
• Some software computes the sample size for testing the null hypothesis of the equality of two proportions using a “continuity correction” while others calculate sample size without this correction.
  - Answers will differ slightly, although either method is acceptable.
  - Stata uses a continuity correction
  - The website does not
Sample Size Calculation Using
Estimation with Precision
• Not nearly as common, but equally valid.
• The idea is to estimate a parameter with enough “precision” to be meaningful.
  - E.g., the width of a confidence interval is narrow enough
Determinants of Sample Size:
Estimation Approach
• α
• Estimates of variability
• Precision
  - E.g., the (maximum) desired width of a confidence interval
Example: Evaluating a Diagnostic
Examination
• It is desirable to estimate the sensitivity of an examination by trained site nurses relative to an oral medicine specialist for the diagnosis of Oral Candidiasis (OC) in HIV-infected people.
• Precision: It is desirable to estimate the sensitivity such that the width of a 95% confidence interval is 15%.
Example: Evaluating a Diagnostic
Examination
• Note: sensitivity is a proportion
• The (large sample) CI for a proportion is:
$$\left[\, \hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \,\right]$$
Example: Evaluating a Diagnostic
Examination
• We wish the width of the CI to be <0.15
• Using an estimated proportion of 0.25 and α=0.05, we can calculate n=129.
• Since sensitivity is a conditional probability, we need 129 participants who are OC+ as diagnosed by the oral medicine specialist. If the prevalence of OC is ~20%, then we would need to enroll or screen ~129/(0.20)=645.
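The arithmetic follows from setting the full width of the large-sample interval, 2 z_{α/2} sqrt(p(1-p)/n), to at most 0.15 and solving for n. A minimal Python sketch under that assumption is given here; the function name is illustrative only.

```python
from math import ceil
from statistics import NormalDist


def n_for_ci_width(p, width, conf_level=0.95):
    """Smallest n whose large-sample CI for a proportion is no wider than `width`."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    # width = 2 * z * sqrt(p * (1 - p) / n)  =>  n = (2 * z)^2 * p * (1 - p) / width^2
    return ceil((2 * z) ** 2 * p * (1 - p) / width ** 2)


n_oc_positive = n_for_ci_width(p=0.25, width=0.15)   # OC+ participants needed
prevalence = 0.20                                     # assumed OC prevalence from the example
n_to_screen = ceil(n_oc_positive / prevalence)        # participants to enroll or screen
print(n_oc_positive, n_to_screen)
```

This gives 129 OC+ participants and roughly 645 people to screen, as in the example.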
Sensitivity Analyses
• Sample size calculations require assumptions and estimates.
• It is prudent to investigate how sensitive the sample size estimates are to changes in these assumptions (as they may be inaccurate).
• Thus, provide numbers for a range of scenarios and various combinations of parameters (e.g., for various combinations of α, β, estimates of variance, effect sizes, etc.)
Example: Sample Size Sensitivity Analyses
for the Study of Cluster Headaches
Required sample size per group:

μ1    μ2    SD     Power=80%    Power=90%
0     2     3.5    49           65
0     2     4.0    63           85
0     2     4.5    80           107
0     3     3.5    22           29
0     3     4.0    28           38
0     3     4.5    36           48
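A table like this is easy to generate by looping over the scenarios. The Python sketch below uses the normal-approximation formula for two means with α=0.05 (two-sided) and equal allocation; the helper function is hypothetical, and software based on the t distribution may give slightly different values.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_means(delta, sd, alpha=0.05, power=0.80):
    """Per-group n, two-sided two-sample comparison of means (normal approximation)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(2 * z_sum ** 2 * sd ** 2 / delta ** 2)


rows = []
for mu2 in (2, 3):                      # mean in the second group (mu1 is fixed at 0)
    for sd in (3.5, 4.0, 4.5):          # plausible range for the SD
        n80 = n_per_group_means(mu2 - 0, sd, power=0.80)
        n90 = n_per_group_means(mu2 - 0, sd, power=0.90)
        rows.append((0, mu2, sd, n80, n90))

print("mu1  mu2  SD   n(80%)  n(90%)")
for mu1, mu2, sd, n80, n90 in rows:
    print(f"{mu1:<4} {mu2:<4} {sd:<4} {n80:<7} {n90}")
```

Extending the two loops to other values of α, SD, or effect size gives a quick view of how fragile a proposed sample size is.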
Effects of Determinants
• In general, the following increase the required sample size (with all else being equal):
  - Lower α
  - Lower β
  - Higher variability
  - Smaller effect size to detect
  - More precision required
Caution
• In general, higher sample size implies higher power.
• Does this mean that a higher sample size is always better?
  - Not necessarily. Studies can be very costly. It is wasteful to power studies to detect between-group differences that are clinically irrelevant.
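The relationship between sample size and power can be sketched numerically. The Python example below uses the normal approximation for a two-sided, two-sample comparison of means and ignores the negligible chance of rejecting in the wrong direction. The function name, the sample sizes tried, and the 0.5 unit difference (taken here as a stand-in for a clinically irrelevant effect) are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_per_group, delta, sd, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    se = sd * sqrt(2 / n_per_group)       # standard error of the difference in means
    return z.cdf(abs(delta) / se - z_alpha)


# Power to detect a 2 unit difference (SD = 4) grows with n per group.
for n in (30, 63, 85, 200):
    print(n, round(approx_power(n, delta=2, sd=4), 3))

# With a very large study, even a 0.5 unit difference is detected with high power.
print(round(approx_power(2000, delta=0.5, sd=4), 3))
```

The last line illustrates the caution above: a large enough study will reliably detect a difference that nobody considers meaningful.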
Sample Size Adjustments
• Complications (e.g., loss-to-follow-up, poor adherence, etc.) during clinical trials can impact study power.
  - This may be less of a factor in lab experiments.
• Expect these complications and plan for them BEFORE the study begins.
  - Adjust the sample size estimates to account for these complications.
Complications that Decrease
Power
• Missing data
• Poor adherence
• Multiple tests
• Unequal group sizes
• Use of nonparametric testing (vs. parametric)
• Noninferiority or equivalence trials (vs. superiority trials)
• Inadvertent enrollment of ineligible subjects or subjects that cannot respond
Adjustment for Lost-to-Follow-up
• Loss-to-Follow-Up (LFU) refers to when a participant’s endpoint status is not available (missing data).
• If one assumes that the LFU is non-informative or ignorable (i.e., random and not related to treatment), then a simple sample size adjustment can be made.
  - This is a very strong assumption as LFU is often associated with treatment. The assumption is also difficult to validate.
  - Researchers need to consider the potential bias of examining only subjects with non-missing data.
Adjustment for Lost-to-Follow-up
• Calculate the sample size N.
• Let x = proportion expected to be lost-to-follow-up.
• N_adj = N/(1-x)
• Note: no LFU adjustment is necessary if you plan to impute missing values. However, if you use imputation, an adjustment for a “dilution effect” may be warranted.
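The LFU adjustment amounts to a single division followed by rounding up. A small Python sketch follows; the function name and the 15% loss rate are hypothetical values chosen only to show the calculation.

```python
from math import ceil


def adjust_for_lfu(n, lfu_rate):
    """Inflate n so that about n participants remain after loss-to-follow-up."""
    if not 0 <= lfu_rate < 1:
        raise ValueError("lfu_rate must be in [0, 1)")
    return ceil(n / (1 - lfu_rate))


# 126 evaluable participants needed, 15% expected to be lost (illustrative rate)
print(adjust_for_lfu(126, 0.15))
```

Rounding up rather than to the nearest integer keeps the expected number of evaluable participants at or above the target.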
Adjustment for Poor Adherence
• Adjustment for the “dilution effect” due to poor adherence or the inclusion (perhaps inadvertently) of subjects that cannot respond:
  - Calculate the sample size N.
  - Let x = proportion expected to be non-adherent.
  - N_adj = N/(1-x)^2
Inflation Factor for Non-adherence
Proportion non-adherent    0.05    0.10    0.20    0.30    0.50
Inflation factor           1.11    1.23    1.56    2.04    4.00
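These inflation factors are the values of 1/(1-x)^2 at each non-adherence proportion x. A short Python sketch that regenerates the table is shown here; the function name is invented for the example.

```python
def inflation_factor(non_adherent):
    """Multiplier applied to N when a proportion of participants is non-adherent."""
    return 1 / (1 - non_adherent) ** 2


factors = {x: inflation_factor(x) for x in (0.05, 0.10, 0.20, 0.30, 0.50)}
for x, factor in factors.items():
    print(f"{x:.2f}  {factor:.2f}")
```

The squared term is why non-adherence is so costly: 50% non-adherence quadruples the required sample size rather than doubling it.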
Adjustment for Unequal Allocation
• When comparing groups, power is maximized when group sizes are equal (with all else being equal)
• There may be other reasons, however, to have some group sizes larger than others
  - E.g., having more people on an experimental therapy (rather than placebo) to obtain more safety information on the product
Adjustment for Unequal Allocation
• Adjustment for unequal allocation in two groups:
  - Let Q_E and Q_C be the sample fractions such that Q_E + Q_C = 1.
  - Note power is optimized when Q_E = Q_C = 0.5
  - Calculate sample size N_bal for equal sample sizes (i.e., Q_E = Q_C = 0.5)
  - N_unbal = N_bal × (Q_E^{-1} + Q_C^{-1})/4
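The unequal-allocation formula can be checked with a few lines of Python. In this sketch the 2:1 allocation and the balanced total of 126 are illustrative inputs, and the function name is an assumption rather than part of any package.

```python
from math import ceil


def unbalanced_total(n_bal, q_e):
    """Total N under unequal allocation, given the balanced total n_bal.

    q_e is the fraction allocated to the experimental arm; q_c = 1 - q_e.
    """
    if not 0 < q_e < 1:
        raise ValueError("q_e must be strictly between 0 and 1")
    q_c = 1 - q_e
    factor = (1 / q_e + 1 / q_c) / 4     # equals 1 when q_e = q_c = 0.5
    return ceil(n_bal * factor)


print(unbalanced_total(126, 0.5))      # balanced design: no inflation
print(unbalanced_total(126, 2 / 3))    # 2:1 allocation (hypothetical)
```

With 2:1 allocation the factor is 1.125, so the total rises from 126 to 142; more extreme allocations inflate the total quickly.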
Adjustment for Nonparametric Testing
• Most sample-size calculations are performed expecting use of parametric methods (e.g., t-test).
  - This is often done because formulas (and software) for these methods are readily available
• However, parametric assumptions (e.g., normality) do not always hold.
• Thus nonparametric methods may be required.
Adjustment for Nonparametric Testing
• Pitman Efficiency
  - Applicable for 1 and 2 sample t-tests
• Method
  - Calculate sample size N_par.
  - N_nonpar = N_par/(0.864)
Example: Cluster Headaches
• Recall the cluster headache example in which the required sample size was 126 (total) for detecting a 2 unit (VAS scale) difference in means.
• If we expect 10% of the participants to be non-adherent, then an appropriate inflation is needed:
  - 126/(1-0.1)^2 = 156
• If we further expect that we will have to perform a nonparametric test (instead of a t-test) due to non-normality, then further inflation is required:
  - 156/(0.864) = 181
  - Round to 182 to have an equal number (91) in each group
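The chain of adjustments can be written out step by step. The Python sketch below simply repeats the arithmetic of this example, rounding up at each stage; the variable names are illustrative.

```python
from math import ceil

n_total = 126                  # unadjusted total from the cluster headache example
non_adherent = 0.10            # expected proportion of non-adherent participants
pitman_efficiency = 0.864      # efficiency factor used for the nonparametric adjustment

n_adherence = ceil(n_total / (1 - non_adherent) ** 2)   # 155.6 rounds up to 156
n_nonpar = ceil(n_adherence / pitman_efficiency)        # 180.6 rounds up to 181
n_final = n_nonpar + (n_nonpar % 2)                     # round up to an even total
n_per_group = n_final // 2

print(n_adherence, n_nonpar, n_final, n_per_group)
```

The final total of 182 corresponds to 91 participants in each of the two groups.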
Adjustment: Noninferiority/Equivalence
Studies
• Calculate sample size for standard superiority trial but reverse the roles of α and β.
• Works for large sample binary and continuous data.
• Does not work for time-to-event data.
More Adjustments?
• Adjustments are needed if:
  - You plan interim analyses
    - Group sequential designs
  - You have more than one primary test to be conducted
    - Multiple comparison adjustments
      - E.g., Bonferroni (if 2 tests or comparisons are to be made, then power each at α/2).
• Additional adjustments may be needed for stratification, blocking, or matching.
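To see what a Bonferroni adjustment costs, the sample size can be recomputed at α/2. The Python sketch below reuses the cluster headache inputs (difference of 2, SD of 4, 80% power) and assumes two primary comparisons purely for illustration; the helper function is hypothetical and uses the normal approximation.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_means(delta, sd, alpha=0.05, power=0.80):
    """Per-group n, two-sided two-sample comparison of means (normal approximation)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(2 * z_sum ** 2 * sd ** 2 / delta ** 2)


n_tests = 2                                                 # number of primary comparisons (assumed)
n_single = n_per_group_means(2, 4, alpha=0.05)              # one test at alpha = 0.05
n_bonferroni = n_per_group_means(2, 4, alpha=0.05 / n_tests)  # each test at alpha / 2
print(n_single, n_bonferroni)
```

Each additional primary comparison tightens the per-test α and pushes the required sample size up, which is one reason to settle on a single primary question.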
Sample Size Re-estimation
• Hot Topic in clinical trials
• Re-estimating sample size based on interim data
• Complicated
  - Must be done carefully to maintain scientific integrity and blinding.