Sample Size Determination for Clinical Trials with Two Correlated Time-to-Event Co-primary Endpoints
The 7th IASC-ARS Joint Taipei Symposium 2011
Academia Sinica, Taipei, Taiwan, December 16-20, 2011
Toshimitsu Hamasaki, PhD
Osaka University Graduate School of Medicine
Scott Evans, PhD
Harvard University School of Public Health
Tomoyuki Sugimoto, PhD
Hirosaki University Graduate School of Mathematical Science
Takashi Sozu, PhD
Kyoto University School of Public Health
This research is financially supported by the following research grants from the MEXT Grant-in-Aid for Scientific Research (C)
(No. 23500348), Pfizer Health Research Foundation, Japan and Statistical and Data Management Center of the Adult AIDS
Clinical Trials Group grant 1 U01 068634
1. Introduction
Background and Objectives
Clinical Trials with Multiple Endpoints
Background
• In clinical trials, historically, a single outcome is selected as the primary endpoint and is used as the basis for the trial design, including sample size determination, as well as for interim monitoring and final analyses.
• Many recent clinical trials have become more complex, utilizing more than one primary endpoint:
  • Oncology
    E1: Time until clinical progression
    E2: Time to death
  • Prevention of Mother-to-Child HIV/Hepatitis B Transmission
    E1: Time to infant HIV infection
    E2: Time to Hepatitis B infection
  • Cardiovascular Disease Therapy
    E1: Time until the first of MI, stroke, or death
    E2: Time until hospitalization or death
• The rationale for this is that the assessment of an intervention using a single endpoint may not provide a comprehensive picture of the intervention's effects.
Strategies for Multiple Endpoints
Background
T1) Significance on all endpoints being sufficient for proof of effect
• Each hypothesis should be rejected at the same significance level
• No adjustment is needed to control type I error
• Type II error increases as the number of outcomes to be tested increases
• "Multiple Co-Primary Endpoints" (Hung, Wang, 2009)
T2) Significance on at least one endpoint being sufficient for proof of effect, with a prespecified ordering or non-ordering of outcomes
• Type I error increases as the number of outcomes to be tested increases
• An adjustment to control type I error is required
Hung HMJ, Wang SJ (2009). J Biopharm Statist 19, 1-11.
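The two error-rate statements above can be illustrated with a small sketch. It assumes independent endpoints, which is a simplifying assumption made only for this illustration (the talk is about correlated endpoints), and the function names are hypothetical.

```python
def prob_at_least_one_false_positive(alpha: float, k: int) -> float:
    """T2 without adjustment: chance that at least one of k independent
    true null hypotheses is rejected when each is tested at level alpha."""
    return 1.0 - (1.0 - alpha) ** k


def prob_all_significant(power_each: float, k: int) -> float:
    """T1: chance that all k independent endpoints reach significance
    when each endpoint alone has power power_each."""
    return power_each ** k


for k in (1, 2, 3):
    print(
        k,
        round(prob_at_least_one_false_positive(0.025, k), 4),
        round(prob_all_significant(0.8, k), 3),
    )
```

Under this independence assumption, the first column of output grows with the number of endpoints (the T2 type I error inflation) while the second shrinks (the T1 loss of power).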
Arising Natural Questions
Background
• How large should the sample be for T1 and T2?
• Is there any considerable overestimation or underestimation of the sample size when the correlation is ignored?
• Is there any considerable reduction or increase in the sample size when the correlation is taken into account in the sample size calculation?
Our Research Focus
Objectives
• To discuss power and sample size determination for superiority comparative clinical trials with two possibly correlated time-to-event endpoints evaluated as primary variables for the design and analysis, with particular attention to T1
• To consider a simpler approach that assumes that the time-to-event outcomes are exponentially distributed
• Sugimoto et al (2011) discuss an approach to sizing clinical trials with two correlated time-to-event outcomes based on the log-rank statistics
  • Implementing that method requires technical knowledge, sophisticated programming skill, and expensive computations
• We focus on the hazard ratio; results for the difference in hazard rates are very similar to those for the hazard ratio
Sugimoto T, Hamasaki T, Sozu T (2011). In Abstract of the 7th International Conference on Multiple Comparison Procedures, 121, Washington DC, USA, August 29-September 1, 2011.
Co-Primary Endpoints Sample Sizing
Related Research
All Continuous Normal Endpoints
  Xiong et al (2005, Controlled Clinical Trials); Sozu et al (2006, Japanese Journal of Biometric Society); Eaton, Muirhead (2007, Journal of Statistical Planning and Inference); Senn, Bretz (2007, Pharmaceutical Statistics); Hung, Wang (2009, Journal of Biopharmaceutical Statistics); Sozu, Sugimoto, Hamasaki (2010, Statistics in Medicine; 2011, Journal of Biopharmaceutical Statistics); Sugimoto, Sozu, Hamasaki (2011, Pharmaceutical Statistics); Kordzakhia, Siddiqui, Huque (2010, Statistics in Medicine)
All Binary Endpoints
  Song (2009, Computational Statistics and Data Analysis); Sozu, Sugimoto, Hamasaki (2010, 2011); Hamasaki, Evans (2011, presented at 2011 Symposium on Applied Statistics)
All Time-to-Event Endpoints
  Sugimoto, Hamasaki, Sozu (2011, presented at MCP2011)
Mixed Endpoints
  Sozu, Sugimoto, Hamasaki (2010, presented at IBC2010, mixed continuous and binary endpoints); Sugimoto, Sozu, Hamasaki (2011, presented at MCP2011, mixed binary and time-to-event endpoints)
Outline
1. Background and Objectives
2. Comparing Log-Transformed Hazard Ratios (HR) from Two Correlated Exponential Time-to-Event Endpoints
   • Statistical Settings
   • Conjunctive Power and Sample Size Calculation: Without Censoring / Limited Recruitment and Censoring
3. Behaviors of Sample Size and Empirical Power
   • Bivariate Exponential Distributions: Clayton Copula / Positive Stable Copula / Fatal-Shock Model
4. Further Developments
5. Summary
* Results for the difference in hazard rates are available.
2. Required Sample Size to Compare Hazard Ratio from Two Correlated Exponential Time-to-Event Endpoints
Statistical Setting
Conjunctive Power and Sample Size Calculation
Statistical Settings
Trial Design, Endpoints Distribution
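Total sample size: $N = n_T + n_C$, with allocation $n_T : n_C = r : 1 - r$
Test treatment ($n_T = rN$)
  Time-to-event endpoint 1: $T_{T1i} \sim \mathrm{Exp}(\lambda_{T1})$
  Time-to-event endpoint 2: $T_{T2i} \sim \mathrm{Exp}(\lambda_{T2})$
  $\mathrm{corr}[T_{T1i}, T_{T2i}] = \rho_T > 0$
Control treatment ($n_C = (1 - r)N$)
  Time-to-event endpoint 1: $T_{C1j} \sim \mathrm{Exp}(\lambda_{C1})$
  Time-to-event endpoint 2: $T_{C2j} \sim \mathrm{Exp}(\lambda_{C2})$
  $\mathrm{corr}[T_{C1j}, T_{C2j}] = \rho_C > 0$
• Randomized, controlled, superiority clinical trial comparing two treatments with two time-to-event endpoints
• $T_{Tki}$ and $T_{Ckj}$ follow the exponential distribution with constant hazard rates $\lambda_{Tk}$ and $\lambda_{Ck}$ ($k = 1, 2$; $i = 1, \ldots, n_T$; $j = 1, \ldots, n_C$)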
Statistical Settings
Distribution of log Hazard Ratio (HR)
Assumption
• Participants are followed until the event of interest
• No participant is lost to follow-up
Distributions for large sample
• Log-transformed hazard rates → approximately normally distributed
$$\log\hat\lambda_{Tk} \overset{\text{approx}}{\sim} N\left(\log\lambda_{Tk},\ n_T^{-1}\right), \qquad \log\hat\lambda_{Ck} \overset{\text{approx}}{\sim} N\left(\log\lambda_{Ck},\ n_C^{-1}\right)$$
• Log-transformed hazard ratio → approximately normally distributed
$$\log\hat\psi_k = \log\hat\lambda_{Tk} - \log\hat\lambda_{Ck}, \qquad \log\psi_k = \log\lambda_{Tk} - \log\lambda_{Ck}$$
$$\log\hat\psi_k \overset{\text{approx}}{\sim} N\left(\log\psi_k,\ n_T^{-1} + n_C^{-1}\right), \quad k = 1, 2$$
Collett D (2003). Modelling Survival Data in Medical Research. 2nd Edition. Chapman & Hall.
Gross AJ, Clark VA (1975). Survival Distributions. John Wiley & Sons.
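The large-sample variance of the log hazard ratio can be checked by simulation. The sketch below is only an illustration under the slide's assumptions (exponential event times, every participant followed to the event); the function names and parameter values are hypothetical. It uses the fact that the sum of n exponential times with rate λ has a gamma distribution with shape n and scale 1/λ, so the maximum likelihood estimate n / (total time) can be drawn without simulating individual participants.

```python
import math
import random
from statistics import variance


def log_hazard_mle(n, lam, rng):
    """Log of the exponential MLE n / (total follow-up time) for n complete observations."""
    total_time = rng.gammavariate(n, 1.0 / lam)  # sum of n Exp(rate=lam) times
    return math.log(n / total_time)


def simulate_log_hr_variance(n_t, n_c, lam_t, lam_c, reps=20000, seed=1):
    """Monte Carlo variance of log(psi_hat) = log(lam_hat_T) - log(lam_hat_C)."""
    rng = random.Random(seed)
    draws = [
        log_hazard_mle(n_t, lam_t, rng) - log_hazard_mle(n_c, lam_c, rng)
        for _ in range(reps)
    ]
    return variance(draws)


n_t = n_c = 100
print("simulated variance :", simulate_log_hr_variance(n_t, n_c, 0.3335, 0.5))
print("asymptotic variance:", 1 / n_t + 1 / n_c)
```

With 100 participants per group the two numbers should agree closely; the agreement is expected to loosen for small groups, where the normal approximation is rougher.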
Statistical Setting
Joint Distribution of log HRs
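Joint distribution of the two log-transformed HRs for large sample
$$(\log\hat\psi_1, \log\hat\psi_2) \overset{\text{approx}}{\sim} N_2(\mu, \Sigma), \qquad \mu = \begin{pmatrix} \log\psi_1 \\ \log\psi_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix}$$
$$\sigma_k^2 = \frac{1}{N}\left(\frac{1}{r} + \frac{1}{1-r}\right) \ (k = k'), \qquad \sigma_{kk'} = \frac{1}{N}\left(\frac{\rho_T}{r} + \frac{\rho_C}{1-r}\right) \ (k \ne k')$$
Correlation between the two log-transformed HRs for large sample
$$\rho_{HR} = \mathrm{corr}\left[\log\hat\psi_1, \log\hat\psi_2\right] = \frac{\sigma_{12}}{\sigma_1\sigma_2} \approx (1 - r)\rho_T + r\rho_C$$
• Common correlation $\rho = \rho_T = \rho_C$: $\rho_{HR} = \rho$
• Continuous endpoints, mean difference: $\rho_D = \rho$
• Binary endpoints, risk difference: $\rho_{RD} \le \rho$; relative risk: $\rho_{RR} \le \rho$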
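As a small numerical illustration, the entries of Σ given on this slide can be evaluated directly and the correlation of the two log hazard ratios taken as the ratio of the off-diagonal entry to the common diagonal entry. The function names are hypothetical and the inputs are arbitrary example values.

```python
def log_hr_covariance(n_total, r, rho_t, rho_c):
    """Return (variance, covariance) of the two log hazard ratio estimates.

    variance   = (1/N) * (1/r + 1/(1-r))
    covariance = (1/N) * (rho_T/r + rho_C/(1-r))
    """
    var = (1.0 / n_total) * (1.0 / r + 1.0 / (1.0 - r))
    cov = (1.0 / n_total) * (rho_t / r + rho_c / (1.0 - r))
    return var, cov


def rho_hr(r, rho_t, rho_c):
    """Correlation of the two log hazard ratios; N cancels in the ratio."""
    var, cov = log_hr_covariance(1.0, r, rho_t, rho_c)
    return cov / var


print(rho_hr(0.5, 0.3, 0.7))   # equal allocation: simple average of the two correlations
print(rho_hr(0.25, 0.4, 0.8))  # unequal allocation
print(rho_hr(0.25, 0.6, 0.6))  # common correlation: returns that common value
```

Because 1/r + 1/(1 − r) = 1/(r(1 − r)), the ratio simplifies algebraically to (1 − r)ρ_T + rρ_C, so the allocation ratio matters only when the two within-group correlations differ.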
Statistical Setting
Hypothesis, Statistics and Rejection Region
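Hypothesis for a joint significance
$$H_0: \log\psi_1 \ge 0 \ \text{or} \ \log\psi_2 \ge 0 \qquad \text{versus} \qquad H_1: \log\psi_1 < 0 \ \text{and} \ \log\psi_2 < 0$$
Test statistics for hypothesis
$$Z_k = \frac{\log\hat\psi_k}{\sqrt{\dfrac{1}{N}\left(\dfrac{1}{r} + \dfrac{1}{1-r}\right)}}, \quad k = 1, 2$$
Rejection region of $H_0$
$$\{Z_1 < -z_\alpha\} \cap \{Z_2 < -z_\alpha\}$$
• Significance level for hypothesis testing: $\alpha$
• $z_\alpha$ is the upper $100\alpha$th percent point of the standard normal distribution
[Figure: rejection region of $H_0$ in the $(Z_1, Z_2)$ plane, bounded by $Z_1 = -z_\alpha$ and $Z_2 = -z_\alpha$]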
Overall Power and Sample Size
Without Censoring
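Overall power for showing a joint statistical significance
$$1 - \beta = \Pr\left[\bigcap_{k=1}^{2} \{Z_k < -z_\alpha\}\right] \approx \Pr\left[\bigcap_{k=1}^{2} \{Z_k^* > c_k\}\right] = \Phi_2(-c_1, -c_2 \mid \rho_{HR})$$
$$Z_k^* = \frac{-\log\hat\psi_k + \log\psi_k}{\sqrt{\dfrac{1}{N}\left(\dfrac{1}{r} + \dfrac{1}{1-r}\right)}}, \qquad c_k = z_\alpha + \frac{\log\psi_k}{\sqrt{\dfrac{1}{N}\left(\dfrac{1}{r} + \dfrac{1}{1-r}\right)}}$$
• $\Phi_2(\cdot, \cdot \mid \rho_{HR})$: distribution function of the standard bivariate normal distribution with correlation $\rho_{HR}$
• "Conjunctive Power" or "Complete Power" (Senn, Bretz, 2007)
Sample size
$$N_{NC} = \begin{cases} N & \text{if } N \text{ is an integer} \\ [N] + 1 & \text{otherwise} \end{cases}$$
• $N$ is the smallest value satisfying the overall power $1 - \beta$
• $[N]$ is the greatest integer less than $N$
Senn S, Bretz F (2007). Pharm Statist 6, 161-170.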
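A minimal sketch of the no-censoring calculation follows. It is not the authors' program: the bivariate normal distribution function is approximated here with a simple Simpson-rule integration (an implementation choice made only to keep the example free of third-party libraries), the search over N is a plain linear scan, and all function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bvn_cdf(a, b, rho, steps=400):
    """Pr[X <= a, Y <= b] for a standard bivariate normal pair, |rho| < 1.

    Integrates pdf(x) * Phi((b - rho*x) / sqrt(1 - rho^2)) over x in [-8, a]
    with Simpson's rule (steps must be even).
    """
    lower = -8.0
    if a <= lower:
        return 0.0
    a = min(a, 8.0)  # mass beyond 8 standard deviations is negligible
    scale = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / steps

    def integrand(x):
        return STD_NORMAL.pdf(x) * STD_NORMAL.cdf((b - rho * x) / scale)

    total = integrand(lower) + integrand(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lower + i * h)
    return total * h / 3.0


def conjunctive_power(n_total, psi1, psi2, rho_hr, r=0.5, alpha=0.025):
    """Phi_2(-c_1, -c_2 | rho_HR) with c_k = z_alpha + log(psi_k) / se."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    se = math.sqrt((1.0 / n_total) * (1.0 / r + 1.0 / (1.0 - r)))
    c1 = z_alpha + math.log(psi1) / se
    c2 = z_alpha + math.log(psi2) / se
    return bvn_cdf(-c1, -c2, rho_hr)


def sample_size(psi1, psi2, rho_hr, power=0.8, r=0.5, alpha=0.025):
    """Smallest integer total sample size whose conjunctive power reaches the target."""
    n_total = 2
    while conjunctive_power(n_total, psi1, psi2, rho_hr, r, alpha) < power:
        n_total += 1
    return n_total


for rho in (0.0, 0.5, 0.8):
    print(rho, sample_size(0.667, 0.667, rho))
```

The loop prints the required total sample size for a few correlations with both hazard ratios set to 0.667, which makes the dependence on the correlation visible without any censoring adjustment.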
Asymptotic Variance for HR
Limited Recruitment and Censoring
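[Figure: study timeline from 0 to $T$, with the recruitment period from 0 to $T_0$ and the follow-up period of length $T - T_0$ from $T_0$ to $T$]
• Participants are recruited for the study over the interval from zero to $T_0$
• All recruited participants are followed until the time of the terminal event or until time $T$ ($T > T_0$)
Asymptotic variance of log-transformed HR for large sample
$$\mathrm{var}\left[\log\hat\psi_k\right] \approx \begin{cases} \dfrac{1}{N\phi(\lambda_k)}\left(\dfrac{1}{r} + \dfrac{1}{1-r}\right) & \text{homogeneous variance (null hypothesis)} \\[2ex] \dfrac{1}{N}\left(\dfrac{1}{r\phi(\lambda_{Tk})} + \dfrac{1}{(1-r)\phi(\lambda_{Ck})}\right) & \text{heterogeneous variance (alternative hypothesis)} \end{cases}$$
$$\lambda_k = r\lambda_{Tk} + (1 - r)\lambda_{Ck}, \qquad \phi(\lambda_k) = 1 - \frac{\exp(-\lambda_k T + \lambda_k T_0) - \exp(-\lambda_k T)}{\lambda_k T_0}$$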
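The quantities on this slide are simple to evaluate numerically. The sketch below uses hypothetical function names; the example values (T0 = 2, T = 5, control hazard 0.5, hazard ratio 0.667, r = 0.5) are the ones that appear in the figures later in the talk, and N = 400 is an arbitrary choice for illustration.

```python
import math


def event_prob(lam, t0, t):
    """phi(lam) = 1 - {exp(-lam*T + lam*T0) - exp(-lam*T)} / (lam*T0)."""
    return 1.0 - (math.exp(-lam * t + lam * t0) - math.exp(-lam * t)) / (lam * t0)


def var_log_hr_null(n_total, lam_t, lam_c, r, t0, t):
    """Homogeneous variance, using the pooled hazard r*lam_T + (1-r)*lam_C."""
    lam_pooled = r * lam_t + (1.0 - r) * lam_c
    return (1.0 / (n_total * event_prob(lam_pooled, t0, t))) * (1.0 / r + 1.0 / (1.0 - r))


def var_log_hr_alt(n_total, lam_t, lam_c, r, t0, t):
    """Heterogeneous variance, using each group's own hazard."""
    return (1.0 / n_total) * (
        1.0 / (r * event_prob(lam_t, t0, t))
        + 1.0 / ((1.0 - r) * event_prob(lam_c, t0, t))
    )


lam_c = 0.5
lam_t = 0.667 * lam_c
print("phi(control)   :", event_prob(lam_c, 2.0, 5.0))
print("phi(treatment) :", event_prob(lam_t, 2.0, 5.0))
print("null variance  :", var_log_hr_null(400, lam_t, lam_c, 0.5, 2.0, 5.0))
print("alt. variance  :", var_log_hr_alt(400, lam_t, lam_c, 0.5, 2.0, 5.0))
```

As the total study time T grows, φ approaches one and both expressions reduce to the no-censoring variance (1/N)(1/r + 1/(1 − r)).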
Conjunctive Power and Sample Size
Limited Recruitment and Censoring
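Overall power for showing a joint statistical significance
$$1 - \beta = \Phi_2(-c_1, -c_2 \mid \rho_{HR})$$
$$c_k = \frac{z_\alpha\sqrt{\dfrac{1}{N\phi(\lambda_k)}\left(\dfrac{1}{r} + \dfrac{1}{1-r}\right)} + \log\psi_k}{\sqrt{\dfrac{1}{N}\left(\dfrac{1}{r\phi(\lambda_{Tk})} + \dfrac{1}{(1-r)\phi(\lambda_{Ck})}\right)}}$$
Sample size
$$N_{CN} = \begin{cases} N & \text{if } N \text{ is an integer} \\ [N] + 1 & \text{otherwise} \end{cases}$$
Simplified sample size, with $c_k$ replaced by
$$c_k' = z_\alpha + \frac{\log\psi_k}{\sqrt{\dfrac{1}{N}\left(\dfrac{1}{r\phi(\lambda_{Tk})} + \dfrac{1}{(1-r)\phi(\lambda_{Ck})}\right)}}$$
$$N^*_{CN} = \begin{cases} N & \text{if } N \text{ is an integer} \\ [N] + 1 & \text{otherwise} \end{cases}$$
$$\frac{\lambda_{Tk}^2}{r} + \frac{\lambda_{Ck}^2}{1-r} \ge \lambda_k^2\left(\frac{1}{r} + \frac{1}{1-r}\right)$$
→ Improving the approximation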
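To see how the two critical values on this slide differ, the following sketch evaluates both for a single endpoint. Function names are hypothetical, the hazard ratio is taken as ψ = λ_T / λ_C, and the default design values (T0 = 2, T = 5, r = 0.5, α = 0.025) are taken from the figures in the talk. The joint power would follow by plugging either pair of values into the bivariate normal distribution function; only the marginal power is printed here.

```python
import math
from statistics import NormalDist


def event_prob(lam, t0, t):
    """phi(lam): event probability under uniform recruitment on [0, T0] and follow-up to T."""
    return 1.0 - (math.exp(-lam * (t - t0)) - math.exp(-lam * t)) / (lam * t0)


def critical_values(n_total, psi, lam_c, r=0.5, alpha=0.025, t0=2.0, t=5.0):
    """Return (c_k, c_k') for one endpoint with hazard ratio psi = lam_T / lam_C."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    lam_t = psi * lam_c
    lam_pooled = r * lam_t + (1.0 - r) * lam_c
    var_null = (1.0 / (n_total * event_prob(lam_pooled, t0, t))) * (1.0 / r + 1.0 / (1.0 - r))
    var_alt = (1.0 / n_total) * (
        1.0 / (r * event_prob(lam_t, t0, t))
        + 1.0 / ((1.0 - r) * event_prob(lam_c, t0, t))
    )
    c = (z_alpha * math.sqrt(var_null) + math.log(psi)) / math.sqrt(var_alt)
    c_simplified = z_alpha + math.log(psi) / math.sqrt(var_alt)
    return c, c_simplified


c, c_simplified = critical_values(400, 0.667, 0.5)
print("c_k  :", c, " marginal power:", NormalDist().cdf(-c))
print("c_k' :", c_simplified, " marginal power:", NormalDist().cdf(-c_simplified))
```

In this example the simplified value c_k' is slightly larger than c_k, so the simplified formula gives slightly lower power at a fixed N and therefore a slightly larger required sample size.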
Conjunctive Power
Limited Recruitment and Censoring
• The overall power increases as the correlation goes toward one.
• The lowest overall power occurs when the correlation is zero and the two hazard ratios are equal, with equal control-group hazard rates for the two endpoints.
[Figure: conjunctive power (vertical axis, 0.60 to 0.80) versus correlation (horizontal axis, 0.0 to 1.0) for ψ2 = 0.50, 0.556, 0.625, 0.667; T0 = 2.0, T = 5.0, ψ1 = 0.667, λC1 = 0.5, λC2 = 0.5, α = 0.025, 1 − β = 0.8, r = 0.5]
3. Behaviors of Sample Size and Empirical Power
Bivariate Exponential Distributions
Sample Size Behavior
Empirical Power for Log-Rank Test
Models for Correlation
Bivariate Exponential Distributions
1. Clayton Copula Model (Clayton, 1978)
$$S_0(u, v; \theta) = \left(u^{-\theta} + v^{-\theta} - 1\right)^{-1/\theta}, \qquad 0 \le \theta$$
• $\theta$: association parameter
• Times are positively associated, $0 \le \rho < 1$
• Late dependency
2. Positive Stable Copula Model (Hougaard, 1984)
$$S_0(u, v; \theta) = \exp\left[-\left\{(-\log u)^{1/\theta} + (-\log v)^{1/\theta}\right\}^{\theta}\right], \qquad 0 \le \theta \le 1$$
• $\theta$: association parameter
• Times are positively associated, $0 \le \rho < 1$
• Early dependency
3. Fatal-Shock Model/Marshall-Olkin's Model (Marshall, Olkin, 1967)
$$S_0(u, v; \theta_{12}) = \begin{cases} \exp\{-\theta_1 u - (\theta_2 + \theta_{12})v\} & 0 \le u \le v \\ \exp\{-(\theta_1 + \theta_{12})u - \theta_2 v\} & 0 \le v \le u \end{cases}$$
• $\theta_1, \theta_2, \theta_{12}$: hazard parameters
• The range is restricted, $0 \le \rho < \min(\lambda_1/\lambda_2, \lambda_2/\lambda_1)$
• Linear dependency
Clayton DG (1978). Biometrika 65, 141-151.
Hougaard P (1984). Biometrika 71, 75-83.
Marshall AW, Olkin I (1967). J Amer Statist Assoc 62, 30-44.
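Of the three models, the fatal-shock model is the easiest to simulate, so it makes a convenient illustration. The sketch below uses the standard shock construction of the Marshall-Olkin distribution: each time is the minimum of an individual exponential shock and a shared exponential shock. Under that construction the marginal hazards are θ1 + θ12 and θ2 + θ12 and the correlation is θ12 / (θ1 + θ2 + θ12). Function names and parameter values are hypothetical.

```python
import math
import random


def sample_fatal_shock(theta1, theta2, theta12, n, seed=0):
    """Draw n pairs (time 1, time 2) from the fatal-shock (Marshall-Olkin) model."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        own1 = rng.expovariate(theta1)     # shock affecting endpoint 1 only
        own2 = rng.expovariate(theta2)     # shock affecting endpoint 2 only
        common = rng.expovariate(theta12)  # shock affecting both endpoints
        pairs.append((min(own1, common), min(own2, common)))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


pairs = sample_fatal_shock(0.3, 0.3, 0.2, 50000)
print("sample correlation   :", pearson(pairs))
print("theoretical value    :", 0.2 / (0.3 + 0.3 + 0.2))
print("mean of time 1       :", sum(x for x, _ in pairs) / len(pairs))
print("theoretical mean     :", 1.0 / (0.3 + 0.2))
```

Because the shared shock produces exact ties between the two times, the dependence shows up as points on the diagonal of a scatter plot, which is consistent with the "linear dependency" label on the slide.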
Relationship between Two Endpoints
Bivariate Exponential Distributions
[Figure: scatter plots of TIME 2 versus TIME 1 (both axes 0.0 to 8.0) for the Clayton, Positive Stable, and Fatal-Shock models, one row per model, with columns for ρ = 0.0, 0.3, 0.5, 0.8, 0.95; λT1/λC1 = λT2/λC2]
Sample Size Behavior
Limited Recruitment and Censoring
[Figure: total sample size required (vertical axis, 350 to 550) versus correlation (horizontal axis, 0.0 to 1.0), comparing $N_{CN}$ and $N^*_{CN}$ in three panels: ψ1 = 0.667 with ψ2 = 0.667, ψ2 = 0.625, and ψ2 = 0.50; T0 = 2.0, T = 5.0, λC1 = 0.5, λC2 = 0.5, α = 0.025, 1 − β = 0.8, r = 0.5]
• All of the sample sizes decrease as the correlation goes toward one. However, the decrease is smaller when the difference between the hazard ratios is larger.
• The largest sample sizes are observed when the hazard ratios are equal and the correlation is zero.
• The value of $N^*_{CN}$ is always larger than that of $N_{CN}$.
Empirical Power for Log-Rank Test
Clayton Copula Model
[Figure: empirical conjunctive power (vertical axis, 0.70 to 0.90) versus correlation (horizontal axis, 0.0 to 1.0), comparing $N_{CN}$ and $N^*_{CN}$ in three panels: ψ1 = 0.667 with ψ2 = 0.667, ψ2 = 0.625, and ψ2 = 0.50; T0 = 2.0, T = 5.0, λC1 = 0.5, λC2 = 0.5, α = 0.025, 1 − β = 0.8, r = 0.5]
• All of the empirical powers decrease as the correlation goes toward one.
• In particular, the empirical powers exceed the desired power of 0.8 when the correlation is below approximately 0.4 and fall below 0.8 when the correlation is above approximately 0.4.
• The empirical power of $N^*_{CN}$ is always higher than that of $N_{CN}$.
* 100,000 Monte-Carlo trials
Empirical Power for Log-Rank Test
Positive Stable Copula Model
[Figure: empirical conjunctive power (vertical axis, 0.70 to 0.90) versus correlation (horizontal axis, 0.0 to 1.0), comparing $N_{CN}$ and $N^*_{CN}$ in three panels: ψ1 = 0.667 with ψ2 = 0.667, ψ2 = 0.625, and ψ2 = 0.50; T0 = 2.0, T = 5.0, λC1 = 0.5, λC2 = 0.5, α = 0.025, 1 − β = 0.8, r = 0.5]
• The empirical powers change little with the correlation and attain the desired power of 0.8.
• The empirical power of $N^*_{CN}$ is always slightly larger than that of $N_{CN}$.
* 100,000 Monte-Carlo trials
Empirical Power for Log-Rank Test
Fatal-Shock Model
[Figure: empirical conjunctive power (vertical axis, 0.70 to 0.90) versus correlation (horizontal axis, 0.0 to 1.0), comparing $N_{CN}$ and $N^*_{CN}$ in three panels: ψ1 = 0.667 with ψ2 = 0.667, ψ2 = 0.625, and ψ2 = 0.50; T0 = 2.0, T = 5.0, λC1 = 0.5, λC2 = 0.5, α = 0.025, 1 − β = 0.8, r = 0.5]
• The empirical powers change little with the correlation and attain the desired power of 0.8.
• The empirical power of $N^*_{CN}$ is always slightly larger than that of $N_{CN}$.
* 100,000 Monte-Carlo trials
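The idea of an empirical power check can be sketched in a few lines, with several simplifications that differ from the study reported on these slides: the test below is the normal-approximation test on the log hazard ratio from exponential maximum likelihood estimates (not the log-rank test), the two endpoints are generated independently (correlation zero), and there is no censoring. The function name, the seed, and the number of trials are hypothetical choices.

```python
import math
import random
from statistics import NormalDist


def empirical_conjunctive_power(n_total, psi1, psi2, lam_c1=0.5, lam_c2=0.5,
                                r=0.5, alpha=0.025, trials=20000, seed=2011):
    """Fraction of simulated trials in which both endpoints are significant.

    Assumes independent endpoints, exponential times, and complete follow-up.
    """
    rng = random.Random(seed)
    n_t = round(r * n_total)
    n_c = n_total - n_t
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    se = math.sqrt(1.0 / n_t + 1.0 / n_c)

    def log_hazard(n, lam):
        # The sum of n Exp(rate=lam) times is Gamma(shape=n, scale=1/lam).
        return math.log(n / rng.gammavariate(n, 1.0 / lam))

    hits = 0
    for _ in range(trials):
        z1 = (log_hazard(n_t, psi1 * lam_c1) - log_hazard(n_c, lam_c1)) / se
        z2 = (log_hazard(n_t, psi2 * lam_c2) - log_hazard(n_c, lam_c2)) / se
        if z1 < -z_alpha and z2 < -z_alpha:
            hits += 1
    return hits / trials


print(empirical_conjunctive_power(252, 0.667, 0.667))
```

Here 252 is the total sample size that the no-censoring formula gives for two hazard ratios of 0.667 at zero correlation, so the printed value is expected to be close to the target power of 0.8. Reproducing the slides' results would require the log-rank test, censoring, and one of the three dependence models.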
4. Further Developments
At Least One Statistical Significance
Non-Inferiority Hypothesis
Mixed Binary and Time-to-Event Endpoints
At Least One Statistical Significance
Power for Bonferroni Adjustment
Overall power for showing statistical significance for at least one endpoint with Bonferroni adjustment
$$1 - \beta = 1 - \Pr\left[\bigcap_{k=1}^{2} \{Z_k > -z_{\alpha/2}\}\right]$$
"Disjunctive power" or "Minimal power" (Senn, Bretz, 2007).
[Figure: two panels plotted against correlation (0.0 to 1.0), one showing the disjunctive power and one showing the ratio of total sample size required, each for ψ2 = 0.50, 0.556, 0.625, 0.667; T0 = 2.0, T = 5.0, ψ1 = 0.667, λC1 = 0.5, λC2 = 0.5, α = 0.025, 1 − β = 0.8, r = 0.5]
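As a worked step, the disjunctive power can be written with the same bivariate normal distribution function used for the conjunctive power. The sketch assumes the no-censoring test statistics and large-sample bivariate normality; the slide does not state which variance is used, so this is an assumption, and $W_k$, $s$, and $d_k$ are notation introduced only for this illustration.

```latex
\begin{aligned}
s &= \sqrt{\frac{1}{N}\left(\frac{1}{r} + \frac{1}{1-r}\right)}, \qquad
W_k = \frac{\log\hat\psi_k - \log\psi_k}{s}, \qquad
d_k = z_{\alpha/2} + \frac{\log\psi_k}{s}, \\
Z_k > -z_{\alpha/2}
  &\iff \frac{\log\hat\psi_k}{s} > -z_{\alpha/2}
  \iff W_k > -d_k, \\
1 - \beta
  &= 1 - \Pr\left[\,W_1 > -d_1,\ W_2 > -d_2\,\right]
   = 1 - \Pr\left[\,-W_1 < d_1,\ -W_2 < d_2\,\right]
   \approx 1 - \Phi_2\left(d_1, d_2 \mid \rho_{HR}\right).
\end{aligned}
```

The last step uses the fact that $(-W_1, -W_2)$ is approximately standard bivariate normal with the same correlation $\rho_{HR}$ as $(W_1, W_2)$.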
Non-Inferiority Hypothesis
Power and Sample Size
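NI hypothesis
$$H_0: \log\psi_1 \ge \log M_1 \ \text{or} \ \log\psi_2 \ge \log M_2 \qquad \text{versus} \qquad H_1: \log\psi_1 < \log M_1 \ \text{and} \ \log\psi_2 < \log M_2$$
• $M_1, M_2$: non-inferiority margins
Test statistics
$$Z_k = \frac{\log\hat\psi_k - \log M_k}{\sqrt{\dfrac{1}{N}\left(\dfrac{1}{r} + \dfrac{1}{1-r}\right)}}$$
Overall power for showing a joint statistical significance (heterogeneous variance)
$$1 - \beta = \Phi_2(-c_1, -c_2 \mid \rho_{HR}), \qquad c_k = z_\alpha + \frac{\log\psi_k - \log M_k}{\sqrt{\dfrac{1}{N}\left(\dfrac{1}{r\phi(\lambda_{Tk})} + \dfrac{1}{(1-r)\phi(\lambda_{Ck})}\right)}}$$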
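The effect of the margin on the critical value can be seen with a short sketch. Function names are hypothetical, the hazard ratio is taken as ψ = λ_T / λ_C, the default design values are those used in the talk's figures, and the margin of 1.3 is an arbitrary example value.

```python
import math
from statistics import NormalDist


def event_prob(lam, t0, t):
    """phi(lam): event probability under uniform recruitment on [0, T0] and follow-up to T."""
    return 1.0 - (math.exp(-lam * (t - t0)) - math.exp(-lam * t)) / (lam * t0)


def ni_critical_value(n_total, psi, margin, lam_c, r=0.5, alpha=0.025, t0=2.0, t=5.0):
    """c_k = z_alpha + (log psi - log M) / sd, with the heterogeneous variance."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    lam_t = psi * lam_c
    var_alt = (1.0 / n_total) * (
        1.0 / (r * event_prob(lam_t, t0, t))
        + 1.0 / ((1.0 - r) * event_prob(lam_c, t0, t))
    )
    return z_alpha + (math.log(psi) - math.log(margin)) / math.sqrt(var_alt)


for margin in (1.0, 1.3):
    c = ni_critical_value(400, 0.9, margin, 0.5)
    print(margin, c, NormalDist().cdf(-c))  # margin, c_k, marginal power
```

With a margin of 1 the expression reduces to the superiority form, and a wider margin lowers c_k and so raises the marginal power for the same N.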
Binary and Time-to-Event Outcomes
Correlation
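Correlation between hazard ratio and relative risk
$$\mathrm{corr}\left[\log\frac{\hat\lambda_T}{\hat\lambda_C},\ \log\frac{\hat p_T}{\hat p_C}\right] \approx -\frac{(1 - r)\rho_T\lambda_T\sqrt{p_T q_T} + r\rho_C\lambda_C\sqrt{p_C q_C}}{\sqrt{\{(1 - r)\lambda_T^2 + r\lambda_C^2\}\{(1 - r)p_T q_T + r p_C q_C\}}}$$
Binary endpoint
  $Y_{Ti} \sim \mathrm{Bin}(1, p_T)$, $E[Y_{Ti}] = p_T$, $\mathrm{var}[Y_{Ti}] = p_T q_T$
  $Y_{Cj} \sim \mathrm{Bin}(1, p_C)$, $E[Y_{Cj}] = p_C$, $\mathrm{var}[Y_{Cj}] = p_C q_C$
Time-to-event endpoint
  $S_{Ti} \sim \mathrm{Exp}(\lambda_T)$, $E[S_{Ti}] = \lambda_T^{-1}$, $\mathrm{var}[S_{Ti}] = \lambda_T^{-2}$
  $S_{Cj} \sim \mathrm{Exp}(\lambda_C)$, $E[S_{Cj}] = \lambda_C^{-1}$, $\mathrm{var}[S_{Cj}] = \lambda_C^{-2}$
• One issue is how to define the correlation: one option is to use the correlation from the joint distribution obtained as a limiting distribution of copulas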
5. Summary
Summary
• We described power and sample size determination for comparative clinical trials with two correlated time-to-event endpoints to be evaluated as primary variables.
  • We considered a simpler approach that assumes that the time-to-event endpoints are exponentially distributed.
  • The focus was on showing significance on both endpoints as proof of an acceptable efficacy profile.
• The method may work well when the dependency structure is early or linear. Careful use of the method is recommended when high late dependency is observed.
• Our research is restricted to "two treatment comparison and two time-to-event endpoints".
  • The results for two endpoints give insight into the case of more than two endpoints.
  • Extending the results to more than two hazard ratios is not difficult, although other issues will arise.
Thank you for your kind attention
If you have any questions, please e-mail
hamasakt@medstat.med.osaka-u.ac.jp