Sample Size and Power for the Comparison of Cost and Effect Henry Glick Applications of Statistical Considerations in Health Economic Evaluations ISPOR 13th International Meeting May 4, 2008 Goal of Sample Size Calculation • Prior to the literature that described confidence intervals for cost-effectiveness ratios, sample size was commonly based on the larger of the sample sizes needed for estimating pre-specified cost and effect differences – i.e., what sample size was required to identify a $1000 difference in costs, and what was required to identify a 10% reduction in mortality • Current sample size calculations are based on the number of study subjects needed to rule out that the therapy is unacceptable (equivalent to ruling out that the net monetary benefits of the intervention are less than 0) Sample Size Formula, Common SDs • Sample size for NMB uses the following formula: n= { (α+β)2 2 sdc2 + 2 W 2 sd2q - 2W ρ (2sdc2 )0.5 (2sd2q )0.5 } (W ΔQ − ΔC)2 where n = n/group; tα/2 and tβ = t-statistics for α and β errors; sd = standard deviation for cost (c) and effect (q); W = maximum willingness to pay one wishes to rule out; and ρ = correlation of the difference in cost and effect http://www.uphs.upenn.edu/dgimhsr/stat%20samps.htm 1 Sample Size Formula, SDs Differ • When the SDs differ, the formula becomes: n= { 2 2 2 0.5 (t α/2 + t β )2 (sdc0 + sd2c1 ) + W 2 (sd2q0 + sd2q1 ) - 2W ρ(sdc0 + sdc1 ) (sd2q0 + sd2q1 )0.5 } (WQ − C)2 where n = n/group; tα/2 and tβ = t-statistics for α and β errors; sd = standard deviation for cost (c) and effect (q); W = maximum willingness to pay one wishes to rule out; and ρ = correlation of the difference in cost and effect Correlation Between Costs and Effects $93,500 $46,200 $27,500 Incremental Costs 45000 $22,670 30000 $11,545 15000 0 0.00 0.50 Point Estimate CER: $27,500 Means and S.D.s identical Correlations: 0.95; -0.95 1.00 1.50 2.00 Incremental QALYS Correlation Between Costs and Effects • All else equal, the required sample size is less when the therapies have a Win/Lose (positive) correlation – As the effectiveness increases, the cost increases (e.g., stroke care) • All else equal, the required sample size is greater when the therapies have a Win/Win (negative) correlation – As the effectiveness increases, the cost decreases (e.g., asthma care) • Extreme values of correlation between costs and effects can have dramatic effects on the confidence interval for the cost effectiveness ratio/NMB and thus on the sample size required to demonstrate value for the cost 2 Where to Obtain the Necessary Data? • When therapies are already in use: Expected differences in outcomes and standard deviations can be derived from feasibility studies or from records of patients – Potential sources • Medical charts of administrative data sets • Patient logs of their health care resource use • Asking patients and experts about the kinds of care received by those with the condition under study – In addition, at least one study has suggested that the simple correlation between costs and effects observed in these data may be an adequate proxy for the measure of correlation used for estimating sample size Obtaining Data for Novel Therapies • For novel therapies, information about the magnitude of the incremental costs and outcomes may not be available – May need to be generated by assumption – Data on the standard deviations for those who receive usual care/placebo may be obtained from feasibility studies or patient records • One may assume that the standard deviation will apply equally to both treatment groups, or one may make alternative assumptions about their relative magnitudes – The correlation also would be obtained from such data ssizeprg.do • quietly do ssizeprg • Contains 4 “immediate form” PROGRAMS related to sample size and power to detect NMB differences that are greater than 0 – The command do ssizeprg simply loads these programs; it does not calculate anything • Documentation program: ssizeprgdoc 3 ssizeprg.do (cont.) • Programs for calculating sample size and power – cess1i: Calculates sample size under the assumption that the standard deviations for cost and effect are common between the 2 treatment groups – cess2i: Calculates sample size under the assumption that the standard deviations for cost and effect differ between the 2 treatment groups – cepow1i: Calculates power to detect NMB greater than 0 under the assumption of common standard deviations – cepow2i: Calculates power to detect NMB greater than 0 under the assumption that the standard deviations differ ssizeprg.do (cont.) • All 4 programs presume two arm trials and a common sample size for both treatment groups • These programs yield results that are identical to those derived from the NHB formula in: Willan AR. Analysis, sample size, and power for estimating incremental net health benefit from clinical trial data. Control Clin Trials 2001;22:228-237 ssizeprgdoc: cess1i * PROGRAM: CESS1I * cess1i is used to estimate sample size when one assumes * there are common standard deviations for cost and effect * between the 2 treatment groups (SDs, not SEs for the difference * in cost and effect). * COMMAND LINE: cess1i [diffc] [diffe] [sdc] [sde] [corr] [wtp] [alpha] [beta] * The 8 arguments are all numbers ** `1' Difference in costs ** `2' Difference in effects ** `3' Standard deviation, costs (assumed the same for both groups) ** `4' Standard deviation, effects (assumed the same for both groups) ** `5' Correlation, difference in costs and effects ** `6' Maximum willingness to pay ** `7' Two-tailed alpha level (e.g., 0.05) ** `8' One-tailed beta level (e.g., 0.80) 4 ssizeprgdoc: cess1i (cont.) • Saved results (scalars) * r(diffc) * r(diffq) * r(sd_c) * r(sd_e) * r(rho) * r(wtp) * r(alpha) * r(beta) * r(nmb) * r(sampsize) Implementing cess1i • Suppose the expected difference in cost is 25; the expected difference in QALYs is 0.05; the expected SDs for cost and QALYs are 1000 and 0.195, respectively; the expected correlation of the difference is -0.1; your maximum WTP is 75,000; and you want a 2-tailed alpha of .05 and a 1-tailed beta of 0.8: – Point estimate = 25 / 0.5 = 500 Implementing cess1i (cont.) . cess1i 25 .05 1000 .195 -.1 75000 .05 .8 SAMPLE SIZE CALCULATION (Common SD Costs and Effects) Assumptions Difference in costs: Difference in effects: 25 .05 Standard deviation, costs: Standard deviation, effects: Correlation, difference in costs and effects: 1000 .195 -.1 Willingness to pay: Two-tailed alpha level: One-tailed beta level: 75000 .05 .8 Expected NMB: 3725 *** SAMPLE SIZE PER GROUP *** 246 5 Implementing cess1i (cont.) . return list scalars: r(diffc) r(diffq) r(sd_c) r(sd_e) r(rho) r(wtp) r(alpha) r(beta) r(nmb) r(sampsize) = = = = = = = = = = 25 .05 1000 .195 -.1 75000 .05 .8 3725 246 Calculate Sample Sizes • Compare the sample sizes required for the following expected results: – 25 0.05 1000 0.195 -0.1 75000 0.05 0.8 – 25 0.05 2000 0.195 -0.1 75000 0.05 0.8 – 25 0.05 1000 0.390 -0.1 75000 0.05 0.8 • What is happening? Sample Size Calculations Sample Size Parameters 25 0.05 1000 0.195 -0.1 75000 0.05 0.8 Sample Size 246 25 0.05 2000 0.195 -0.1 75000 0.05 0.8 253 25 0.05 1000 0.390 -0.1 75000 0.05 0.8 976 6 Sample Size Often More Sensitive to SDq than to SDc • The sample size formula is symmetric for the SDs of cost and effect except for the following: 2 2 (sdc0 + sdc1 ) + W 2 (sd2q0 + sd2q1 ) • Changes in the square of the QALY SD are weighted by the square of WTP; changes in the square of the cost SD are unweighted – When WTP is substantially greater than SD for cost, percentage changes in the QALY SD will have a greater effect on sample size than will equivalent percentage changes in cost SD Calculate Sample Sizes (II) • Compare the sample sizes required for the following expected results: – 25 0.05 1000 0.195 -0.5 75000 0.05 0.8 – 25 0.05 2000 0.195 0.5 75000 0.05 0.8 • What is happening? Sample Size Calculations Sample Size Parameters Sample Size 25 0.05 1000 0.195 -0.5 75000 0.05 0.8 260 25 0.05 1000 0.195 +0.5 75000 0.05 0.8 227 • Holding all else equal, when the correlation of the difference in cost and effect is negative, one needs a larger sample than when the correlation of the difference is positive 7 Calculate Sample Sizes (III) • Compare the sample sizes required for the following expected results: – 25 0.05 1000 0.195 -0.1 900 0.05 0.8 – 25 0.05 1000 0.195 -0.1 100 0.05 0.8 • What is happening? When WTP ~ PE, NMB ~ 0, Sample Size Sample Size Parameters Sample Size 25 0.05 1000 0.195 -0.1 900 0.05 0.8 41,831 25 0.05 1000 0.390 -0.1 100 0.05 0.8 39,412 • At willingnesses to pay of 900 and 100, the expected value of NMB approaches 0 (when WTP = 900, NMB = 20; when WTP = 100, NMB = -20 • Power to detect a difference is lowest as NMB appoaches 0 Checking Your Sample Size Calculation • Based on your original design criteria, the sample size formula indicated you need 246 per group • You decide to use ceapowersimulator to double check the sample size – The program draws random samples of size 492 with the appropriate means, sds, and correlation quietly do ceapowersimulator ceapowersimulator 25 .05 1000 .195 -.1 75000 .05 246 • When you look at the results, you find that 99.8% of the point estimates from your repeated samples -- which look very much like the cloud of points we plot on the CE plane -- are acceptable 8 Distribution of Point Estimates 300 WTP: 75, 000 Dfi f er ence n i Cost 150 0 - 150 - 300 - 0. 02 - 0. 00 0. 02 N = 246 / gr oup 0. 04 0. 06 0. 08 Dfi f er ence n i Q ALYs 0. 10 0. 12 • Are we using the wrong formulas or are we looking at the wrong outcome of our simulation? Point Estimates Address the Wrong Question • What are we trying to insure when we calculate sample size with an alpha of 0.05 and a 1-beta of 0.8? • While 99.8% of the point estimates satisfy our willingness to pay of 75,000 per QALY, in only 79.9% of repeated experiments, do the 95% CI allow us to be 95% confident that the therapy is good value • Implication: Sample size calculations are about CI in repeated experiments, they aren’t about the distribution of point estimates from repeated experiments Experiments that Do and Do Not Yield Confidence 300 WTP: 75,000 Difference in Cost 150 0 -150 -300 -0.02 -0.00 N = 246 / group 0.02 0.04 0.06 0.08 Difference in QALYs 0.10 0.12 9 ssizeprgdoc: cess2i * PROGRAM: CESS2I * cess2i is used to assess sample size when one * assumes there are Rx-specific standard deviations * for the 2 treatment groups' costs and effects (SDs, • not SEs for the difference in costs and effects) * COMMAND LINE: cess2i [diffc] [diffe] [sdc0] 9sdc1 [sde0] [sde1] [corr] [wtp] [alpha] [beta] * The 10 arguments are all numbers * `1' Difference in costs * `2' Difference in effects * `3' Standard deviation, costs, group 0 * `4' Standard deviation, costs, group 1 * `5' Standard deviation, effects, group 0 * `6' Standard deviation, effects, group 1 * `7' Correlation, difference in costs and effects * `8' Willingness to pay * `9' Two-tailed alpha level (e.g., 0.05) * `10' One-tailed beta level (e.g., 0.80) ssizeprgdoc: cess2i (cont.) * Saved results (scalars) * r(diffc) * r(diffq) * r(sd_c0) * r(sd_c1) * r(sd_e0) * r(sd_e1) * r(rho) * r(wtp) * r(alpha) * r(beta) * r(nmb) * r(sampsize) Implementing cess2i • Suppose the expected difference in cost is 25; the expected difference in QALYs is 0.05; the expected SDs for cost are 800 and 1200; the expected SDs for QALYs are 0.19 and 0.20; the expected correlation of the difference is -0.1; your maximum WTP is 75,000; and you want a 2-tailed alpha of .05 and a 1-tailed beta of 0.8: 10 Implementing cess2i (cont.) . cess2i 25 .05 800 1200 .19 .20 -.1 75000 .05 .8 SAMPLE SIZE CALCULATION (Different SD, Costs and Effects) Assumptions Difference in costs: Difference in effects: 25 .05 Standard deviation, costs, group 0: Standard deviation, costs, group 1: Standard deviation, effects, group 0: Standard deviation, effects, group 1: Correlation, difference in costs and effects: 800 1200 .19 .2 -.1 Ceiling ratio: Two-tailed alpha level: One-tailed beta level: 75000 .05 .8 Expected NMB: 3725 *** SAMPLE SIZE PER GROUP *** 247 Implementing cess2i (cont.) . return list scalars: r(diffc) r(diffq) r(sd_c0) r(sd_c1) r(sd_e0) r(sd_e1) r(rho) r(wtp) r(alpha) r(beta) r(nmb) r(sampsize) = = = = = = = = = = = = 25 .05 800 1200 .19 .2 -.1 75000 .05 .8 3725 247 Calculate Sample Sizes • Calculate the sample size for the case when the SDs for cost and QALYs were 500, 1500, 0.145 and 0.245 • Calculate the sample size for common SDs of 1035 and .201825 (3.5% increases over the original example) 11 Separate SDs Tend to Increase Sample Size Sample Size Parameters Sample Size 25 .05 500 1500 0.145 0.245 -.1 75000 .05 .8 263 25 .05 1035 .201825 -.1 75000 .05 .8 264 ssizeprgdoc: cepow1i • PROGRAM: CEPOW1i * cepow1i is used to assess power when one assumes * that the 2 treatment groups have common standard * deviations for costs and effects (SDs, not SEs for • the difference in cost and effect) * COMMAND LINE: cepow1i [diffc] [diffe] [sdc] [sde] [corr] [wtp] [alpha] • [sampsize] * The 8 arguments are all numbers * `1' Difference in costs * `2' Difference in effects * `3' Standard deviation, costs (assumed the same for both groups) * `4' Standard deviation, effects (assumed the same for both groups) * `5' Correlation, difference in costs and effects * `6' Willingness to pay * `7' Two-tailed level (e.g., 0.05) * `8' Sample size per group ssizeprgdoc: cepow1i • Saved results (scalars) * r(diffc) * r(diffq) * r(sd_c) * r(sd_e) * r(rho) * r(wtp) * r(alpha) * r(sampsize) * r(nmb) * r(power) 12 Implementing cepow1i • Suppose the expected difference in cost is 25; the expected difference in QALYs is 0.05; the expected SDs for cost and QALYs are 1000 and 0.195, respectively; the expected correlation of the difference is -0.1; your maximum WTP is 75,000; you want a 2-tailed alpha of .05; and the current sample size plans are for 246 per group Implementing cepow1i (cont.) . cepow1i 25 .05 1000 .195 -.1 75000 .05 246 POWER CALCULATION (Common SD Costs and Effects) Assumptions Difference in costs: Difference in effects: 25 .05 Standard deviation, costs: Standard deviation, effects: Correlation, difference in costs and effects: 1000 .195 -.1 Willingness to pay: Two-tailed alpha level: Sample size per group 75000 .05 246 Expected NMB: 3725 *** POWER TO DETECT DIFFERENCE *** .799 Implementing cpow1i (cont.) . return list scalars: r(diffc) r(diffq) r(sd_c) r(sd_e) r(rho) r(wtp) r(alpha) r(sampsize) r(nmb) r(power) = = = = = = = = = = 25 .05 1000 .195 -.1 75000 .05 246 3725 .799 13 Power Table (Example 1) Power for WTP = 75,000 Sample Size 150 0.584 200 0.714 246 0.799 300 0.871 350 0.916 Power Graph: 25 .05 1000 .195 -.1 WTP .05 250 1.00 0.80 Power 0.10 0.60 0.08 0.05 0.40 0.03 0.20 0.00 0.00 0 25000 50000 0 250 500 75000 750 1000 1250 1500 100000 125000 150000 WTP ssizeprgdoc: cepow2i • PROGRAM: CEPOW2I * cepow2i is used to assess power when one assumes * there are Rx-specific standard deviations for for the * 2 treatment groups' costs and effects (SDs, not SEs * for the difference in costs and effects) * COMMAND LINE: cepow2i [diffc] [diffe] [sdc0] 9sdc1 [sde0] [sde1] [corr] [wtp] [alpha] [sampsize] * The 10 arguments are all numbers * `1' Difference in costs * `2' Difference in effects * `3' Standard deviation, costs, group 0 * `4' Standard deviation, costs, group 1 * `5' Standard deviation, effects, group 0 * `6' Standard deviation, effects, group 1 * `7' Correlation, difference in costs and effects * `8' Willingness to pay * `9' Two-tailed alpha level (e.g., 0.05) * `10’ Sample size 14 ssizeprgdoc: cepow2i • Saved results (scalars) * r(diffc) * r(diffq) * r(sd_c0) * r(sd_c1) * r(sd_e0) * r(sd_e1) * r(rho) * r(wtp) * r(alpha) * r(sampsize) * r(nmb) * r(power) Implementing cepow2i • Suppose the expected difference in cost is 25; the expected difference in QALYs is 0.05; the expected SDs for cost are 800 and 1200; the expected SDs for QALYs are 0.19 and 0.20; the expected correlation of the difference is -0.1; your maximum WTP is 75,000; you want a 2-tailed alpha of .05; and the current sample size plans are for 246 per group Implementing cepow2i (cont.) . cepow2i 25 .05 800 1200 .19 .20 -.1 75000 .05 246 POWER CALCULATION (Different SD, Costs and Effects) Assumptions Difference in costs: Difference in effects: 25 .05 Standard deviation, costs, group 0: Standard deviation, costs, group 1: Standard deviation, effects, group 0: Standard deviation, effects, group 1: Correlation, difference in costs and effects: 800 1200 .19 .2 -.1 Ceiling ratio: Two-tailed alpha level: Sample Size: 75000 .05 246 Expected NMB: *** POWER TO DETECT DIFFERENCE *** 3725 .799 15 Implementing cpow2i (cont.) . return list scalars: r(diffc) r(diffq) r(sd_c0) r(sd_c1) r(sd_e0) r(sd_e1) r(rho) r(wtp) r(alpha) r(sampsize) r(nmb) r(power) = = = = = = = = = = = = 25 .05 800 1200 .19 .2 -.1 75000 .05 246 3725 .799 Sample Size and Power Table (Example 1) Sample Size For 80% power Power for N = 246 50,000 251 .0792 75,000 246 0.799 100,000 244 0.803 150,000 242 0.806 200,000 241 0.807 WTP • For this type of experiment, as one increases the WTP, sample size decreases and power increases Cost Minimization • Suppose we are performing what we expect will be a cost-minimization study (cost savings but no difference in effect) – We expect a cost savings of 1250 and difference in QALYs of 0.0; we expect an SD for cost of 5500 and for effect of 0.25; we expect the correlation of the difference to be 0.5; and we want an alpha and beta of 0.05 and 0.8, respectively 16 Calculate Sample Size and Power • Sample size – Calculate the required sample size for a WTP of 50,000 – Calculate the required sample size for a WTP of 200,000 • Power: Assuming a sample size of 1183 – Calculate power for a WTP of 50,000 – Calculate power for a WTP of 200,000 • What happened? Sample Size Calculations Sample Size / Power Parameters Sample Size -1250 0 5500 .25 .5 50000 .05 .8 1183 -1250 0 5500 .25 .5 200000 .05 .8 22,658 Power -1250 0 5500 .25 .5 50000 .05 1183 0.8 -1250 0 5500 .25 .5 200000 .05 1183 0.093 Sample Size and Power Table Sample Size for 80% power Power for N = 1185 50,000 1183 0.801 75,000 2800 0.445 100,000 5202 0.267 150,000 12,360 0.137 200,000 22,658 0.094 WTP • For this kind of experiment, as WTP increases, the required sample size increases and power decreases • Why? 17 Ca Significantly Different from Cb and Qa Not Significantly Different from Qb C = -1250; SE = 492; Q = 0; SE = .027; S.D. for effectiveness = 0.30 500 = 0.5; DOF = 498 1000 Difference in Costs 0 -500 UL: -8,388 -1000 -1500 -2000 -2500 LL: 26,610 -3000 -3500 -0.080 -0.040 0.000 0.040 0.080 Difference in QALYs Dominance • Similar kinds of issues can arise if you design your trial with the idea that you will document dominance – In other words, you may be in a situation in which as you increase the WTP, the sample size decreases and power increases (e.g., if the lower limit remains in the lower right quadrant, and the upper limit is moving up into the upper right quadrant – Or you may be in a situation where as the WTP increases, the sample size increases and power decreases (if the lower limit rotates into the lower left quadrant) Summary • We’ve provided you with programs to calculate sample size and power for the comparison of cost and effect • In many cases, as one’s WTP increases, the necessary sample size will be decreased and power will increase (a pattern 1 experiment) • In other cases, as one’s WTP increases, the necessary sample size will be increase and power will decrease (a pattern 2 experiment) 18
© Copyright 2025