How to Reduce the Number of Animals by Improving Experimental Design and Statistics in Drug Development

Michael FW Festing
c/o Understanding Animal Research, 25 Shaftesbury Av., London, UK
michaelfesting@aol.com

"...the standard of design of experimental investigations is poor and the basic principles of design are widely ignored..."
Mead (1990) The Design of Experiments, Cambridge Univ. Press

Poor agreement between animal and human responses

Intervention | Human results | Animal results (meta-analysis) | Agree?
Corticosteroids for head injury | No improvement | Improved neurological outcome (n=17) | No
Antifibrinolytics for surgery | Reduces blood loss | Too little good quality data (n=8) | No
Thrombolysis with TPA for acute ischaemic stroke | Reduces death | Reduces death, but publication bias and overstatement (n=113) | Yes
Tirilazad for stroke | Increases risk of death | Reduced infarct volume and improved behavioural score (n=18) | No
Corticosteroids for premature birth | Reduces mortality | Reduces mortality (n=56) | Yes
Bisphosphonates for osteoporosis | Increase bone density | Increase bone density (n=16) | Yes

Perel et al (2007) BMJ 334:197-200

Good experimental design
• Saves animals, so easier to justify ethically
• Saves time and money
• Badly designed experiments are wasteful and may give invalid results

Types of experiment
• Pilot study
  - Logistics and preliminary information
• Exploratory experiment
  - Aim is to provide data to generate hypotheses
  - May "work" or "not work"
  - Often many outcomes
  - Statistical analysis may be problematical (many characters measured, data snooping); p-values may not be correct
• Confirmatory experiment
  - Clear specification of aims of the experiment
  - Simple formal hypothesis stated a priori
  - Choice of model, treatments and dependent variables

A well designed experiment
• Absence of bias
  - Correct experimental unit, randomisation, blinding
• High power
  - Low noise (uniform material, blocking, covariance)
  - High signal (sensitive subjects, high dose)
  - Large sample size
• Wide range of applicability
  - Replicate over other factors (e.g. sex, strain): factorial designs
• Simplicity
• Amenable to a statistical analysis

Experimental Unit
The smallest division of the experimental material such that any two experimental units can receive different treatments.
• Unit of randomisation
• Unit of statistical analysis

The animal as the experimental unit
Animals individually treated. May be individually housed or grouped.
N=8

A cage as the experimental unit
Treatment in water or diet. Animals within a cage cannot receive different treatments.
N=4

An animal for a period of time: repeated measures or crossover design
[Diagram: animals 1 to 3, each shown under Treatment 1 and Treatment 2.]
N=16

Teratology: mother treated, young measured
Mother is the experimental unit.
N=2

Aim: to detect strain differences in diurnal pattern of blood alcohol
[Figure: panels labelled "ELD group".]
Single cage of 8 mice killed at each time point (288 mice in total)

Randomisation
Minimises the chance of a systematic difference between groups causing bias.
Method:
• Physical, using cards
• Spreadsheet

Original  Randomised  Animal number
1         2           1
1         3           2
1         3           3
1         1           4
2         2           5
2         1           6
2         2           7
2         1           8
3         3           9
3         2           10
3         3           11
3         1           12

Randomisation, blinding and cage assignment
Using the same original and randomised allocation of animals 1 to 12, the treatment numbers found in successive cages under different housing schemes are:
• Individually housed: 2 | 3 | 3 | 1 | etc
• Individual + companion: 2,X | 3,X | 3,X | etc
• Grouped at random: 2,3,3,1 | 2,1,2,1 | etc
• Randomised block: 1,2,3 | 1,2,3 | etc
• Two/box, box is the experimental unit: 1,1 | 1,1 | 2,2 | 2,2 | etc
• By treatment, box is the experimental unit: 1,1,1,1,1 | 2,2,2,2 | etc

Failure to randomise and/or blind leads to more "positive" results

Comparison                                   Odds ratio  95% CI
Blind / not blind                            3.4         1.7-6.9
Random / not random                          3.2         1.3-7.7
Blind and random / not blind, not random     5.2         2.0-13.5

290 animal studies scored for blinding, randomisation and positive/negative outcome, as defined by authors.
Bebarta et al (2003) Acad. Emerg. Med. 10:684-687

A well designed experiment
• Absence of bias
  - Identify the experimental unit, randomisation, blinding
• High power
  - Large sample size
  - Low noise
  - Good signal
• Wide range of applicability
  - Replicate over other factors (e.g. sex, strain)
• Simplicity
• Amenable to a statistical analysis

Sample size determination
• Power analysis: mathematical combination of six variables
  - Use for clinical trials (e.g. simple but expensive)
  - Difficult to use for complex designs
  - Needs estimate of standard deviation
  - Subjective estimate of effect size of clinical interest (signal)
• Resource equation: law of diminishing returns
  - Quick, easy, approximate
  - Good for inexpensive complex non-clinical designs

Power analysis: the variables
• Effect size of scientific interest (signal)
• Chance of a false positive result: significance level (0.05)
• Sample size
• Sidedness of statistical test (usually 2-sided)
• Power of the experiment (80-90%?)
• Variability of the experimental material (noise)

Group size and signal/noise ratio
[Figure: group size (0 to 140) against signal/noise ratio, i.e. effect size in standard deviations (0 to 3), with curves for 80% and 90% power; regions labelled "Bad", "Neutral" and "Good".]
Assuming 2-sample, 2-sided t-test and 5% significance level

Comparison of two anaesthetics for dogs under clinical conditions (Vet. Anaesthes. Analges.)
Unsexed healthy clinic dogs:
• Weight 3.8 to 42.6 kg
• Systolic BP 141 (SD 36) mm Hg
Assume:
• a 20 mm Hg difference between groups is of clinical importance
• a significance level of α = 0.05
• a power of 90%
• a 2-sided t-test
Signal/noise ratio (standardised effect size): δ = |μ1 − μ2|/σ = 20/36 = 0.56
Required sample size: 68/group

Power and sample size calculations using nQuery Advisor
[Screenshot of nQuery Advisor.]

A second paper described:
• Male Beagles, weight 17-23 kg
• Mean BP 108 (SD 9) mm Hg
• Want to detect a 20 mm Hg difference between groups (as before)
With the same assumptions as the previous slide:
Signal/noise ratio = 20/9 = 2.22
Required sample size: 6/group

Summary for two sources of dogs: aim is to be able to detect a 20 mm Hg change in blood pressure

Type of dog    SDev  Signal/noise  Sample size/gp (1)  % Power, n=8 (2)
Random dogs    36    0.56          68                  18
Male beagles   9     2.22          6                   98

(1) Sample size for 90% power
(2) Power with a sample size of 8/group
Assumes α = 5%, 2-sided t-test and effect size 20 mm Hg

Inbred strains are more uniform. Does a new drug cause anaemia?
Specify effect size (signal): anaemia if RBC count* reduced by 0.50
Assume 5% significance and 2-sided test
• Previous data on outbred CD-1 mice: mean RBC count 9.00, Std. Dev. 0.68 (noise). Signal/noise ratio is 0.5/0.68 = 0.73
• Previous data on inbred C57BL/6 mice: mean RBC count 9.60, Std. Dev. 0.25 (noise). Signal/noise ratio is 0.5/0.25 = 2.00
* x 10^12/l

Group size and signal/noise ratio
[Figure: group size (0 to 140) against signal/noise ratio, i.e. effect size in standard deviations (0 to 3), with curves for 80% and 90% power; points marked "Using CD-1 mice" and "Using C57BL/6".]
Assuming 2-sample, 2-sided t-test and 5% significance level

Variation in kidney weight in 58 groups of rats
[Figure: variability (0 to 90) by sample number (1 to 57), with groups labelled Mycoplasma, Outbred, F1 and F2.]
Gartner, K. (1990) Laboratory Animals 24:71-77

Required sample sizes

Factor    Type             Std. Dev.  Signal*/noise  Sample size  Power**
Genetics  F1 hybrid        13.5       0.74           30           80
Genetics  F2 hybrid        18.4       0.54           55           53
Genetics  Outbred          20.1       0.49           67           46
Disease   Mycoplasma free  18.6       0.54           55           53
Disease   With Mycoplasma  43.3       0.23           298          14

* Signal is 10 units, two-sided t-test, α = 0.05, power = 80%
** Assuming fixed sample size of 30/group

Isogenic strains to control within-strain variation

Isogenic strains (inbred, F1)
• Isogenic (animals identical)
• Homozygous, breed true (not F1)
• Phenotypically uniform
• Defined (quality control)
• Genetically stable
• Extensive background data with genetic profile
• Internationally distributed
Like immortal clones of genetically identical individuals. Several hundred strains available. Most common rat strain is F344.

Outbred stocks
• Each individual different
• Do not breed true
• Phenotypically variable
• Not defined (no QC)
• Genetic drift can be rapid
• Validity of background data questionable. No genetic profile
• Not internationally distributed
Stocks with the same name will be different due to genetic drift and selection. Most common rat stock is "Sprague-Dawley".

The randomised block design: another method of controlling noise
Treatments A, B & C

Block  Order of treatments
B1     B C A
B2     A C B
B3     B A C
B4     A C B
B5     B C A

• Randomisation is within-block
• Can be multiple differences between blocks
• Heterogeneous age/weight
• Different shelves/rooms
• Natural structure (litters)
• Split experiment in time
Common with in-vitro studies where the "experiment" (block) is repeated on several days. Should be more widely used in animal research.

A randomised block experiment
Apoptosis score

Week  Control  CGP  STAU
1     365      398  421
2     423      432  459
3     308      320  329

Analysed using a 2-way ANOVA without interaction. Treatment effect p=0.023

The Resource Equation method of determining sample size
E = (total number of animals) - (number of groups)
10 < E < 20
[Figure: "The Resource Equation & Sample Size": Student's t, 5% critical value (2.0 to 12.0) against degrees of freedom (0 to 35).]

A factorial design incorrectly analysed as four separate experiments
E = (total number of animals) - (number of groups), 10 < E < 20
• 8 mice per group, 8 treatment groups, 64 mice total: E = 64 - 8 = 56
• Alternative: 3 mice per group, 8 groups, 24 mice total: E = 24 - 8 = 16
Saving: 40 mice

Factorial designs: can increase signal
• Factorial design (Treated and Control): E = 16 - 4 = 12
• Single factor design, one variable at a time (OVAT): three experiments, each Treated vs Control, each with E = 16 - 2 = 14

Factorial designs
(By using a factorial design) "... an experimental investigation, at the same time as it is made more comprehensive, may also be made more efficient if by more efficient we mean that more knowledge and a higher degree of precision are obtainable by the same number of observations."
R.A. Fisher, 1960

Factorial designs
• Any number of factors:
  - Drug treatments, prior treatments, sexes, strains
• Any number of levels of each factor
• Can screen many variables for effect on character of interest
• Sub-group size can be quite small

Factorial: what do we mean by group size?
Each design compares Trt. with Ctrl.
• Single factor, inbred strain: 8
• 2x2 factorial: 8 or 4?
• 2x4 factorial: 8 or 2?
• Randomised block: 8 or 1?
• Outbred stock: 8 or ??

Factorial designs for drug interactions

                  Drug A: Control  Drug A: Treated
Drug B: Control   (1)              a
Drug B: Treated   b                ab

Estimates: a, b, a x b

Comparison: single outbred stock vs factorial with inbred strains
Mice per group, by dose of chloramphenicol (mg/kg)

Stock/strain    0  500  1000  1500  2000  2500
Outbred CD-1    8  8    8     8     8     8
Inbred CBA      2  2    2     2     2     2
Inbred C3H      2  2    2     2     2     2
Inbred BALB/c   2  2    2     2     2     2
Inbred C57BL    2  2    2     2     2     2

Festing, M.F.W., et al. (2001) Strain differences in haematological response to chloramphenicol succinate in mice: implications for toxicological research. Food and Chemical Toxicology 39:375-383.
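The randomised block apoptosis experiment shown earlier reports a treatment effect of p=0.023 from a 2-way ANOVA without interaction. The following is a minimal sketch of that calculation, not the author's own analysis. It assumes the flattened figure reads as three weeks (blocks) by three treatments (Control, CGP, STAU), and the function name randomised_block_anova is hypothetical. The p-value uses a closed form for the F distribution that holds only when there are exactly three treatments (2 numerator degrees of freedom).

```python
# Apoptosis scores: rows are weeks (blocks), columns are treatments in the
# order Control, CGP, STAU. This layout is an assumption about the slide.
scores = [
    [365, 398, 421],  # week 1
    [423, 432, 459],  # week 2
    [308, 320, 329],  # week 3
]


def randomised_block_anova(data):
    """Two-way ANOVA without interaction; returns (F, p) for treatments."""
    n_blocks = len(data)
    n_treatments = len(data[0])
    if n_treatments != 3:
        raise ValueError("closed-form p-value is valid only for 3 treatments")

    values = [x for row in data for x in row]
    grand_mean = sum(values) / len(values)
    block_means = [sum(row) / n_treatments for row in data]
    treatment_means = [sum(col) / n_blocks for col in zip(*data)]

    # Partition the total sum of squares into blocks, treatments and error.
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_blocks = n_treatments * sum((m - grand_mean) ** 2 for m in block_means)
    ss_treatments = n_blocks * sum(
        (m - grand_mean) ** 2 for m in treatment_means
    )
    ss_error = ss_total - ss_blocks - ss_treatments

    df_treatments = n_treatments - 1
    df_error = (n_blocks - 1) * (n_treatments - 1)
    f_stat = (ss_treatments / df_treatments) / (ss_error / df_error)

    # Upper tail of F(2, df_error): P(F > f) = (1 + 2f/df_error)^(-df_error/2)
    p_value = (1 + 2 * f_stat / df_error) ** (-df_error / 2)
    return f_stat, p_value


f_stat, p_value = randomised_block_anova(scores)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")  # F = 11.23, p = 0.023
```

Removing the week-to-week (block) variation from the error term is what makes the treatment effect detectable here: most of the total variation in these nine scores lies between weeks.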
Red blood cell counts

Four inbred strains
Strain   Control  1500 mg/kg
CBA      10.57    8.33
CBA      9.88     8.51
C3H      8.49     7.40
C3H      7.87     7.51
BALB/c   10.10    8.95
BALB/c   10.08    9.29
C57BL    9.60     9.81
C57BL    9.56     9.83

One outbred stock
Strain   Control  1500 mg/kg
CD-1     9.10     8.90
CD-1     10.27    8.26
CD-1     9.01     7.45
CD-1     7.76     8.50
CD-1     8.42     8.71
CD-1     8.83     7.79
CD-1     10.01    8.67
CD-1     8.65     8.19

Counts following chloramphenicol at 1500 mg/kg
Red blood cell counts

Strain         N   0      1500  Signal (difference)  Noise (SD)  Signal/noise  p
CD-1           16  9.01   8.31  0.70                 0.68        1.03          0.058

BALB/c         4   10.09  9.12  0.97                 0.25        3.88
C3H            4   8.18   7.46  0.72                 0.25        2.88
C57BL          4   9.58   9.82  (0.24)               0.25        (0.96)
CBA            4   10.23  8.42  1.81                 0.25        7.24
Mean           16  9.51   8.70  0.81                 0.25        3.24          <0.001
Dose * strain                                                                  <0.001

Example of a factorial compared with a single factor design
WBC

Four inbred strains
Strain   Control  Treated
CBA      1.90     0.40
CBA      2.60     0.20
C3H      2.10     0.40
C3H      2.20     0.40
BALB/c   1.60     1.30
BALB/c   0.50     1.40
C57BL    2.30     0.80
C57BL    2.20     1.10

One outbred stock
Strain   Control  Treated
CD-1     3.00     1.90
CD-1     1.70     1.90
CD-1     1.50     3.50
CD-1     2.00     1.20
CD-1     3.80     2.30
CD-1     0.90     1.00
CD-1     2.60     1.30
CD-1     2.30     1.60

WBC counts following chloramphenicol at 2500 mg/kg
White blood cell counts

Strain         N   0     2500  Signal (difference)  Noise (SD)  Signal/noise  p
CD-1           16  2.23  1.83  0.40                 0.86        0.47          0.38

CBA            4   2.25  0.30  1.95                 0.34        5.73
C3H            4   2.15  0.40  1.85                 0.34        5.44
BALB/c         4   1.05  1.35  -0.30                0.34        (0.88)
C57BL          4   2.25  0.95  1.30                 0.34        3.82
Mean           16  1.93  1.20  0.73                 0.34        2.15          <0.001
Dose * strain                                                                 <0.001

A factorial randomised block experiment to detect the effect of BHA on liver EROD activity
Festing MF (2003) Principles: the need for better experimental design. Trends Pharmacol Sci 24:341-345.
[Diagram: Block 1 and Block 2, each with Treated and Control mice of strains A/J, 129/Ola, NIH and BALB/c.]
The two blocks were separated by approximately 3 months.

A real experiment to detect the effect of BHA on liver EROD activity
[Table: EROD activity for Treated and Control mice of strains A/J, 129/Ola, NIH and BALB/c in Block 1 and Block 2. The cell layout was lost in extraction. Values in extraction order: 18.7, 17.9, 19.2, 26.3, 7.7, 16.7, 8.4, 14.4, 12.0, 9.8, 9.7, 19.8, 6.4, 6.7, 8.1, 6.0.]
Block means: 14.7 and 11.3 (diff 3.4)
The two blocks were separated by approximately 3 months.

Effects of BHA on liver EROD activity in four mouse strains (a 2x4 factorial randomised block experiment)
[Figure: EROD activity (0 to 25) for Control and BHA in strains A/J, 129/Ola, NIH and BALB/c.]
• Treatment p<0.001
• Strain p=0.05
• Strain x Treatment p=0.03
• Std. Dev. 1.6
2 mice per mean (16 total), done as a randomised block design.

A well designed experiment
• Absence of bias
  - Correct experimental unit, randomisation, blinding
• High power
  - Low noise (uniform material, blocking, covariance)
  - High signal (sensitive subjects, high dose)
  - Large sample size
• Wide range of applicability
  - Replicate over other factors (e.g. sex, strain): factorial designs
• Simplicity
• Amenable to a statistical analysis

Conclusions
• Scope for improvement
  - Experiments often poorly designed
  - Many scientists have little training in experimental design and statistics
• Common errors:
  - Failure to identify the experimental unit
  - Failure to randomise and use blinding
  - Lack of knowledge of sample size determination
  - Poor understanding of effects of variation
  - Failure to use/understand randomised block designs
  - Failure to understand factorial designs
• Greater investment in training would save animals, money and time

Festing MF (2003) Principles: the need for better experimental design. Trends Pharmacol Sci 24:341-345.

An animal room as the experimental unit
Does the presence of rats affect breeding performance of mice?
[Figure: pups born per litter (5 to 15), with and without rats, for strains/stocks BALB/c, B6/JN, B6/N, CD1, CF1, CFW, DBA/2 and FVB.]
N=33

An animal room for a period of time: repeated measures, within-subject, crossover or randomised block design
[Diagram: animal rooms 1 to 4, each shown with rats and without rats.]
N=16

Some factors (e.g. strain, sex) cannot be randomised, so special care is needed to ensure comparability
Six cages of 7-9 mice of each strain: error bars are SEMs
"CBA mice showed greater variability in body weights than TO mice..."
• Outbred TO (8-12 weeks, commercial)
• Inbred CBA (12-16 weeks, home bred)

Body weight of mice housed 1, 2, 4 or 8 per cage
Chvedoff et al (1980) Arch. Toxicol. Suppl 4:435
[Figure: weight (35 to 55) by mice/cage: 8 per cage SD=2.9; 4 per cage SD=3.2; 2 per cage SD=3.9; 1 per cage SD=5.8.]

The consequences of variability
Specification:
• Assume a treated and a control group
• Effect size to be detected of 5 g (the signal) or more
• A 90% power
• A 5% significance level and a 2-sided t-test

Number/cage  Mean  SD   Signal/noise  Estimated group size
1            46.0  5.8  0.86          30
2            44.7  3.9  1.28          14
4            42.6  3.2  1.56          10
8            42.2  2.9  1.72          9

Chloramphenicol toxicity in mice: outbred CD1, 8 mice per level
[Figure: difference from control in standard deviations (signal/noise ratio, 0 to 8) against dose (0 to 2500 mg/kg) for HCT, HGB, LYMPH, NEUT, PLT, RBC, RETICS and WBC, with the effect size detectable with 90% power and 5% significance level, 2-sided, marked.]
Re-drawn from Festing et al (2001) Fd. Chem. Tox. 39:375

Chloramphenicol toxicity in mice: 4 strains, 8 mice per level
[Figure: difference from control in standard deviations (signal/noise ratio, 0 to 9) against dose (0 to 2500 mg/kg) for HCT, HGB, LYMPH, NEUT, PLT, RBC, RETICS and WBC, with the effect size detectable with 90% power and 5% significance level, 2-sided, marked.]

Mistakes in this experiment
(Aim: to detect strain differences in diurnal pattern of blood alcohol. Single cage of 8 mice killed at each time point, 288 mice in total.)
1. The cage is the experimental unit, so there are 36, not 288, experimental units
2. The authors looked at the results before deciding the statistical analysis
3. They should have done a pilot study and then eliminated two of the treatments
4. A t-test is not the correct method of analysis
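The group sizes quoted in the slides follow from the signal/noise ratio, the significance level and the power. The slides obtained them with nQuery Advisor. The sketch below is only an approximation under stated assumptions: it uses a normal approximation to the 2-sample, 2-sided t-test plus a commonly used small-sample correction term, and the function name group_size is hypothetical. It takes the signal/noise ratio as rounded on the slides.

```python
from math import ceil
from statistics import NormalDist


def group_size(signal_to_noise, power=0.90, alpha=0.05):
    """Approximate animals per group for a 2-sample, 2-sided t-test.

    Normal approximation with a small-sample correction (an assumption,
    not necessarily the method used by nQuery Advisor):
        n = 2 * (z_alpha + z_power)^2 / d^2 + z_alpha^2 / 4
    where d is the signal/noise ratio (standardised effect size).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 2-sided critical value
    z_power = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 / signal_to_noise ** 2 + z_alpha ** 2 / 4
    return ceil(n)


# Dog anaesthetic example, 90% power: clinic dogs vs male beagles.
print(group_size(0.56), group_size(2.22))  # 68 6

# Mice housed 1, 2, 4 or 8 per cage, 90% power.
print([group_size(r) for r in (0.86, 1.28, 1.56, 1.72)])  # [30, 14, 10, 9]

# Kidney weight in rats, 80% power: F1, F2, outbred, with Mycoplasma.
print([group_size(r, power=0.80) for r in (0.74, 0.54, 0.49, 0.23)])
# [30, 55, 67, 298]
```

Because the required group size scales with the inverse square of the signal/noise ratio, halving the standard deviation cuts the number of animals needed to roughly a quarter.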