Haplotype evidence
Train the trainers workshop, Apr 21, 2015
Mikkel Meyer Andersen (mikl@math.aau.dk)
Department of Mathematical Sciences, Aalborg University, Denmark

Themes
- Evidential weight
- Importance of explicitly stating hypotheses
- Methods for calculating match probability
- Mixture separation

Introduction

Evidential weight
- $H_p$ (prosecutor's hypothesis): 'The suspect left the Y-chromosome DNA in the crime stain.'
- $H_d$ (defence attorney's hypothesis): 'A random man left the Y-chromosome DNA in the crime stain.'
- $E$: Evidence (e.g. DNA profile from crime scene)

Likelihood ratio:
$$LR = \frac{P(E \mid H_p)}{P(E \mid H_d)}$$
- Non-match: $LR = \frac{0}{P(E \mid H_d)} = 0$
- Match: $LR = \frac{1}{P(E \mid H_d)}$
(Ideal situation, no errors, etc.)

Lineage markers (Y-STR/mtDNA): Loci are not independent ⇒ No product rule
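The likelihood ratio in the idealised setting above (no typing errors, so $P(E \mid H_p)$ is 1 for a match and 0 for a non-match) can be sketched in Python. The function name `likelihood_ratio` and the match probability 0.001 are illustrative assumptions, not values from the talk.

```python
def likelihood_ratio(match: bool, match_probability: float) -> float:
    """LR = P(E | Hp) / P(E | Hd) in the idealised no-error setting.

    match_probability plays the role of P(E | Hd); how to obtain it is
    the topic of the rest of the talk.
    """
    if not 0.0 < match_probability <= 1.0:
        raise ValueError("match_probability must be in (0, 1]")
    # Idealised assumption: the evidence is certain under Hp if the
    # profiles match and impossible under Hp if they do not.
    p_e_given_hp = 1.0 if match else 0.0
    return p_e_given_hp / match_probability


# Illustrative value only: P(E | Hd) = 0.001.
print(likelihood_ratio(True, 0.001))   # match
print(likelihood_ratio(False, 0.001))  # non-match
```

The smaller the match probability under $H_d$, the larger the LR for a match, which is why the remaining slides concentrate on $P(E \mid H_d)$.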
Match probability
$P(E \mid H_d)$?
1. Formulation of $H_d$
2. Estimating $P(E \mid H_d)$
Focus on Y-STR, but many of the same challenges with mtDNA

ISFG recommendations
ISFG recommendations of Y-STR usage from 2006 (http://www.isfg.org/Publication;Gusmao2006):

Mostly nomenclature, allele designation, locus selection.

Recommendations on the estimation of Y-STR haplotype frequencies and estimation of the weight of the evidence of Y-STR typing will be presented separately as guidelines for the interpretation of forensic genetic evidence.

- Highly wanted guidelines
- Problem 1: Singletons (haplotypes only observed once) are common (a lot of rare variants)
- Problem 2: Population substructure (some haplotypes common in local areas, but not in country as a whole)

Sparsity of Y-STRs
19,630 samples, by forensic marker set [1]:

n                 MHT (9 loci)     SWGDAM (11 loci)   PPY12 (12 loci)   Yfiler (17 loci)   PPY23 (23 loci)
1 (singletons)    6,083 (31.0%)    8,495 (43.3%)      9,092 (46.3%)     15,263 (77.8%)     18,237 (92.9%)
2 (doubletons)    1,131            1,227              1,260             1,064              531
3                 435              436                416               256                64
4                 226              199                196               94                 16
5                 114              101                106               63                 6
6                 86               85                 85                21                 2
7                 63               51                 50                12                 2
8                 43               50                 41                12                 1
9                 29               29                 34                9
10                31               21                 24                4
11                22               24                 28                5
...
(30, 40]          13               11                 7                 1
...
(100, 515]        8                4                  4

[1] Purps J, Siegert S, et al. (2014). A global analysis of Y-chromosomal haplotype diversity for 23 STR loci.
Forensic Science International: Genetics, Volume 12, 2014, p. 12-23.

Hypotheses

Match probability
$P(E \mid H_d)$, written more fully as $P(E \mid H_d, I)$
- $I$: Additional information (same for $H_p$ and $H_d$), e.g. reference database

Elaborating $H_d$:
- $H_d$: 'A random man left the Y-chromosome DNA in the crime stain.'
- $H_d$: 'A random man (from the population of which the reference database is a random sample) left the Y-chromosome DNA in the crime stain.'

Population? What is that? Does it matter? Yes.

Population substructure
Population substructure: A population is a collection of subpopulations. Haplotypes are more common in some subpopulations than in others.
[Figure: a population divided into subpopulations Subpop 1, Subpop 2, ..., Subpop r; coloured squares represent haplotypes.]
We have a sample from the population without substructure information.

Population substructure
What if the random man (the true perpetrator, under $H_d$) and the suspect are from the same subpopulation?
- $H_d$: 'A random man (from the population, with substructure, of which the reference database is a random sample) left the Y-chromosome DNA in the crime stain.'
- $H_d$: 'A random man (from the population, with substructure, of which the reference database is a random sample), originating from the same subpopulation as the suspect, left the Y-chromosome DNA in the crime stain.'

We assume that the random man and the suspect originate from the same subpopulation, but we do not know which.

Match probability
With $H_d$ stating that the random man and the suspect originate from the same subpopulation:
- In this subpopulation, the haplotype may be more frequent than in the population as a whole
- One approach (the Balding-Nichols model): $P(E \mid H_d) = \theta + (1 - \theta) p_h$
- $\theta$ (theta), $0 < \theta < 1$:
  - Population parameter (related to the variability of haplotype frequencies in different subpopulations)
  - Not haplotype specific (an average), because that is a simple model and we have a chance to estimate it
  - Needs to be estimated separately using databases from at least two subpopulations
- $p_h$: Population frequency of $h$ ($0 < p_h < 1$)

Match probability
Match probability for a population with substructure (Balding-Nichols model):
$$P(E \mid H_d) = \theta + (1 - \theta) p_h$$
Note that $\theta + (1 - \theta) p_h > \theta$ and $\theta + (1 - \theta) p_h > p_h$, as
$$\theta + (1 - \theta) p_h = \theta + p_h - \theta p_h = p_h + (1 - p_h) \theta$$
(and both $\theta$ and $p_h$ are between 0 and 1).
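The θ-corrected match probability above is simple enough to check numerically. The sketch below is a minimal Python illustration; the function name `match_probability` and the particular θ and $p_h$ values are assumptions chosen for illustration.

```python
def match_probability(p_h: float, theta: float = 0.0) -> float:
    """Balding-Nichols match probability: theta + (1 - theta) * p_h.

    theta = 0 corresponds to a population without substructure, in
    which case the match probability is just p_h.
    """
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must be in [0, 1)")
    if not 0.0 < p_h < 1.0:
        raise ValueError("p_h must be in (0, 1)")
    return theta + (1.0 - theta) * p_h


# Illustrative values only: a rare and a common haplotype.
for theta in (0.001, 0.003):
    for p_h in (1e-5, 1e-2):
        mp = match_probability(p_h, theta)
        # For 0 < theta < 1 the corrected value exceeds both theta and p_h.
        print(f"theta={theta}, p_h={p_h}: match probability={mp:.7f}")
```

For a rare haplotype the result is dominated by θ, and for a common haplotype it stays close to $p_h$.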
Population substructure
[Figure: a population divided into subpopulations Subpop 1, Subpop 2, ..., Subpop r; coloured squares represent haplotypes.]
If a random man and the suspect belong to the same subpopulation, they are expected to share a haplotype more often than a random database sample from the population would represent.

Population substructure: Examples
Example 1: Danish reference database. We assume no population substructure (haplotype distribution same in cities and small islands).
- $H_d$: 'A random Dane left the Y-chromosome DNA in the crime stain.'
- Use population frequency, $p_h$, based on Danish reference database (and no $\theta$ correction)

Example 2: Danish reference database. We assume population substructure (such that haplotype distribution may differ e.g. in cities and small islands).
- $H_d$: 'A random Dane originating from the same small island, Bornholm, as the suspect left the Y-chromosome DNA in the crime stain.'
- Use $\theta$ correction: $\theta + (1 - \theta) p_h$ with known $\theta$ and population frequency, $p_h$, based on Danish reference database

Example 3: Reference database from Bornholm (small Danish island). We assume no population substructure (haplotype distribution same in cities and small islands).
- $H_d$: 'A random man from Bornholm left the Y-chromosome DNA in the crime stain.'
- Use population frequency, $p_h$, based on the reference database from Bornholm (and no $\theta$ correction)

θ (theta) correction
θ (theta) correction is a remedy for not knowing (or having information about) the population substructure

Estimating θ (theta)
- E.g. use geographical information
- Sample what we believe to be subpopulations (populations without substructure), e.g. islands, cities (or even countries) separately (at the right level)
- θ between countries may be different from θ between cities/islands in one country

Estimating θ (theta)
Bruce Weir, personal communication. Simple estimation (a lot of assumptions, e.g. equally weighted subpopulations).
- $r$: Number of subpopulations
- $n_i$: Size of reference database from the $i$'th subpopulation ($i = 1, 2, \ldots, r$)
- $n_{ih}$: Number of times haplotype $h$ is observed in the reference database from the $i$'th subpopulation

$$m_i = \frac{1}{n_i (n_i - 1)} \sum_h n_{ih} (n_{ih} - 1) \quad\text{and}\quad m_{ij} = \frac{1}{n_i n_j} \sum_h n_{ih} n_{jh}$$
$$m_W = \frac{1}{r} \sum_{i=1}^{r} m_i \quad\text{and}\quad m_B = \frac{2}{r (r - 1)} \sum_{i=1}^{r-1} \sum_{j=i+1}^{r} m_{ij}$$
$$\hat{\theta} = \frac{\frac{r-1}{r} \cdot \frac{m_W - m_B}{1 - m_B}}{1 - \frac{1}{r} \cdot \frac{m_W - m_B}{1 - m_B}} \approx \frac{m_W - m_B}{1 - m_B} \quad\text{(large } r\text{)}$$

ISFG recommendations
ISFG recommendations of Y-STR usage from 2006 (http://www.isfg.org/Publication;Gusmao2006):

Individual laboratories must establish relevant, regional Y-STR haplotype databases.

Most of the databases provide haplotype frequency estimates for larger regions [...]. However, pooling of different regions is only valid if there is no population substructure [...].

Population substructure has been shown in a number of regional groups within the same (but not between different) major U.S. populations and also in some European groups.

Population frequency based on database sample
- Database sampling must be truly random (not convenience sampling!)
- Do not search for and exclude close relatives after randomly sampling individuals (underrepresentation of common haplotypes)

Match probability

Match probability
- $H_d$ refers to a population with no substructure:
  - Match probability is $p_h$
  - $p_h$: population frequency (normally estimated based on a sample, the reference database)
- $H_d$ refers to a population with substructure (random man and suspect assumed to originate from the same, unknown/unidentifiable, subpopulation):
  - Match probability is $\theta + (1 - \theta) p_h$
  - $\theta$: Population parameter (related to the variability of haplotype frequencies in different subpopulations)
  - $p_h$: population frequency (normally estimated based on a sample, the reference database, from a population with substructure (collection of subpopulations))
  - $\theta$ estimated with a collection of reference databases (counting pairs of matches within and between)

Match probability
Match probability for populations with substructure $= \theta + (1 - \theta) p_h$
- If $p_h$ is really small (compared to $\theta$), $\theta + (1 - \theta) p_h \approx \theta$
- If $p_h$ is really large (compared to $\theta$), $\theta + (1 - \theta) p_h \approx p_h$

                              θ = 0.001    θ = 0.003
p_h = 1/100,000 = 0.00001     0.0010099    0.0030099
p_h = 1/100 = 0.01            0.01099      0.01297

Population frequency estimators
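Before turning to frequency estimators, the idea of estimating θ by counting matching pairs within and between reference databases can be made concrete. The Python sketch below follows the simple estimator from the 'Estimating θ (theta)' slide as read from the slide: $m_W$ is the average within-database match proportion, $m_B$ the average between-database match proportion, and $\hat{\theta} = \frac{(r-1)/r \cdot \beta}{1 - \beta / r}$ with $\beta = (m_W - m_B)/(1 - m_B)$. The function name `estimate_theta` and the toy databases are hypothetical.

```python
from collections import Counter
from itertools import combinations


def estimate_theta(databases):
    """Simple theta estimate from r >= 2 reference databases.

    databases: one list of haplotypes (any hashable value, e.g. tuples
    of alleles) per assumed subpopulation, all equally weighted.
    """
    r = len(databases)
    if r < 2:
        raise ValueError("need databases from at least two subpopulations")
    counts = [Counter(db) for db in databases]
    sizes = [len(db) for db in databases]
    if min(sizes) < 2:
        raise ValueError("each database needs at least two haplotypes")

    # m_i: proportion of matching pairs within database i.
    m_within = [
        sum(c * (c - 1) for c in cnt.values()) / (n * (n - 1))
        for cnt, n in zip(counts, sizes)
    ]
    m_w = sum(m_within) / r

    # m_ij: proportion of matching pairs between databases i and j.
    m_between = []
    for i, j in combinations(range(r), 2):
        shared = counts[i].keys() & counts[j].keys()
        matches = sum(counts[i][h] * counts[j][h] for h in shared)
        m_between.append(matches / (sizes[i] * sizes[j]))
    m_b = sum(m_between) / len(m_between)  # = 2 / (r (r - 1)) * sum of m_ij
    if m_b == 1.0:
        raise ValueError("all haplotypes identical; theta is undefined")

    beta = (m_w - m_b) / (1.0 - m_b)
    return ((r - 1) / r) * beta / (1.0 - beta / r)


# Toy data (hypothetical): haplotypes abbreviated to single letters.
print(estimate_theta([["A", "A", "B"], ["A", "C", "C"]]))
```

Note that the estimate can be negative when haplotypes match more often between databases than within them, which is one sign that the assumed subpopulations are not at the right level.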
Estimators
- Precise (low prediction error): difficult (many measures; need to know true frequency, from simulated populations?)
- Does it work for all datasets, also for those only consisting of singletons?
- Statistical model: Guaranteed behaviour (e.g. probabilities sum to 1)
  - Assign probability to all possible haplotypes (e.g. for mixture LR)
  - Probability mass 1 to be distributed among all possible haplotypes
  - Difficult to avoid wasting probability mass on improbable haplotypes

Add suspect's haplotype
- Include in dataset (new observation)
  - Additional information: Under $H_d$, suspect considered as a random (wrongly accused) individual from the population; the haplotype is just another random sample
  - Not all LRs are for criminal cases (paternity, immigration, etc.)
- Old dataset: $D^-$ of size $n$
- New dataset: $D$ of size $n + 1$

Count method(s)
- Count method: $P(X = x) = (n_x + 1)/(n + 1)$
  - $n_x$: Number of times $x$ is observed in the dataset (before the suspect's haplotype is added)
  - $n_x = 0$: $P(X = x) = \frac{1}{n + 1}$
  - With counts taken in the new dataset $D$: $\sum_{x \in D} \frac{n_x}{n + 1} = \frac{n + 1}{n + 1} = 1$, hence $P(X = x) = 0$ for $x \notin D$
- Corrected count estimators:
  - Brenner's $\kappa$ (CH Brenner (2010) / HE Robbins (1968)): Deflate $\frac{1}{n + 1}$ by $1 - \kappa$, where $\kappa$ is the singleton proportion
  - Generalised Good (IJ Good (1953), G Cereda/R Gill): http://arxiv.org/abs/1502.02406 and http://arxiv.org/abs/1502.04083
- Count methods (both original and corrected): Useful for observed haplotypes, not for unobserved

The Discrete Laplace method

Motivation
- Haplotype probability distribution (statistical model)
- Enables a wide range of inferences using one model:
  - Haplotype frequency estimation (observed and unobserved)
  - Mixtures (e.g. separation and LR)
  - Cluster analysis (not shown today)
  - ...
- Not a new ad-hoc tool for each task
- A statistical model gives desirable properties:
  - $P(x)$: Probability mass function
  - Consistent: $\sum_{x \in H} P(x) = 1$
  - $P(x) > 0$ for all $x \in H$

Model
- Y-STR: Loci not statistically independent
- Our approach: Condition on central haplotypes to obtain (assumed) independence between loci (caused by 'private mutations')

Discrete Laplace distribution
Discrete Laplace distributed $X \sim DL(p, \mu)$:
- Dispersion parameter $0 < p < 1$ and
- Location parameter $\mu \in \mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$

Probability mass function:
$$f(X = x; p, \mu) = \frac{1 - p}{1 + p} \cdot p^{|x - \mu|} \quad\text{for } x \in \mathbb{Z}$$
Perfectly homogeneous population with 1-locus haplotypes:
$$P(X = x) = f(X = x; p, \mu)$$
[Figure: $f(X = x; p = 0.3, \mu = 13)$ plotted against $x$ (e.g. Y-STR allele) for $x = 8, \ldots, 18$.]

Statistical model for Y-STR haplotypes
Perfectly homogeneous population with $r$-locus haplotypes:
$$P(X = (x_1, x_2, \ldots, x_r)) = \prod_{k=1}^{r} f(x_k; p_k, \mu_k)$$
- $\vec{\mu} = (\mu_1, \mu_2, \ldots, \mu_r)$: Central haplotype
- $\vec{p} = (p_1, p_2, \ldots, p_r)$: Discrete Laplace parameters (one for each locus)
- Mutations happen independently across loci (relative to $\vec{\mu}$)

Statistical model for Y-STR haplotypes
Non-homogeneous population with $c$ subpopulations and $r$-locus haplotypes:
$$P(X = (x_1, x_2, \ldots, x_r)) = \sum_{j=1}^{c} \tau_j \prod_{k=1}^{r} f(x_k; p_{jk}, \mu_{jk})$$
- $\tau_j$: A priori probability for originating from the $j$'th subpopulation ($\sum_{j=1}^{c} \tau_j = 1$)
- $\vec{\mu}_j = (\mu_{j1}, \mu_{j2}, \ldots, \mu_{jr})$: Central haplotype for the $j$'th subpopulation
- $\vec{p}_j = (p_{j1}, p_{j2}, \ldots, p_{jr})$: Parameters for all loci at the $j$'th subpopulation
- Parameter estimation from observations using R library disclapmix

Data and fit
$c$: Number of subpopulations
$$P(X = x) = \sum_{j=1}^{c} \tau_j f(x; p_j, \mu_j)$$
[Figure: observed allele probabilities for DYS392 (alleles 6 to 16), shown with the estimated probabilities for each value of $c$.]
- Estimated ($c = 1$): $P(\text{DYS392} = x) = 1 \cdot f(x; p = 0.41, \mu = 11)$
- Estimated ($c = 2$): $P(\text{DYS392} = x) = 0.519 \cdot f(x; p = 0.004, \mu = 11) + 0.481 \cdot f(x; p = 0.179, \mu = 13)$
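The discrete Laplace probability mass function and the one-locus mixture can be evaluated directly. The Python sketch below plugs in the $c = 2$ parameters quoted for DYS392; the function names `dlaplace_pmf` and `mixture_pmf` are hypothetical, and this is a hand-rolled illustration rather than the interface of the R library disclapmix.

```python
def dlaplace_pmf(x: int, p: float, mu: int) -> float:
    """Discrete Laplace pmf: (1 - p) / (1 + p) * p^|x - mu| for integer x."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    return (1.0 - p) / (1.0 + p) * p ** abs(x - mu)


def mixture_pmf(x: int, components) -> float:
    """One-locus mixture: sum over j of tau_j * f(x; p_j, mu_j).

    components: iterable of (tau_j, p_j, mu_j) tuples with the tau_j
    summing to 1.
    """
    return sum(tau * dlaplace_pmf(x, p, mu) for tau, p, mu in components)


# Parameters of the c = 2 fit quoted for DYS392.
dys392_c2 = [(0.519, 0.004, 11), (0.481, 0.179, 13)]

for allele in range(9, 17):
    print(allele, round(mixture_pmf(allele, dys392_c2), 4))

# The pmf is positive for every integer x and sums to 1; summing over a
# wide finite range approximates the infinite sum.
total = sum(mixture_pmf(x, dys392_c2) for x in range(-200, 200))
print(total)
```

Because every integer allele receives positive probability, unobserved alleles (and hence unobserved haplotypes) get a non-zero estimate, unlike with the count method.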
Data and fit
[Figure: observed allele probabilities for DYS392 (alleles 6 to 16), shown with the estimated probabilities for $c = 3$.]
$c$: Number of subpopulations
$$P(X = x) = \sum_{j=1}^{c} \tau_j f(x; p_j, \mu_j)$$
- 3 subpopulations:

  $\hat{\mu}_j$   $\hat{\tau}_j$
  11              52%
  13              46%
  14              2%

- Observed vs expected:

  Allele   Observed   Expected
  11       0.5248     0.5248
  12       0.0567     0.0567
  13       0.3322     0.3315
  14       0.0714     0.0715
  15       0.0083     0.0089

Mixture separation

Mixture separation
Yfiler trace, 15 loci (DYS385a/b removed):

Locus        Alleles
DYS19        14, 15
DYS389I      13, 14
DYS389II'    16, 17
DYS390       24, 26
DYS391       10, 11
DYS392       11, 13
DYS393       13
DYS438       11, 12
DYS439       10, 11
DYS437       14, 15
DYS448       19, 20
DYS456       15, 16
DYS458       14, 18
DYS635       23
Y GATA H4    12, 13

$2^{13-1} = 4{,}096$ possible contributor pairs

Mixture separation

Dataset     Population   Loci   n       Singletons
DEN (21)    Danish       21     181     181 (100%)
DEN (15)    Danish       15     181     164 (90.6%)
DEN (10)    Danish       10     181     112 (61.9%)
SOM (10)    Somali       10     201     56 (27.9%)
GER (7)     German       7      3,443   662 (19.2%)

- For each dataset, 550 mixtures were simulated
- For the $i$'th contributor pair $c_i = \{h_{i,1}, h_{i,2}\}$, find $\hat{p}_i = \hat{P}(h_{i,1}) \hat{P}(h_{i,2})$
- Order all pairs according to the $\hat{p}_i$ values (highest to lowest)
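The separation procedure above (enumerate all contributor pairs consistent with the trace, score each pair by the product of the two haplotype probabilities, sort from highest to lowest) can be sketched in Python. The trace is the one from the slide; the two-cluster model in `CLUSTERS` uses made-up parameters with a single dispersion value per cluster, and all function names are hypothetical. It is a simplified stand-in for a model fitted with disclapmix.

```python
from itertools import product

# Alleles per locus in the mixed trace, as listed on the slide.
TRACE = {
    "DYS19": (14, 15), "DYS389I": (13, 14), "DYS389II'": (16, 17),
    "DYS390": (24, 26), "DYS391": (10, 11), "DYS392": (11, 13),
    "DYS393": (13,), "DYS438": (11, 12), "DYS439": (10, 11),
    "DYS437": (14, 15), "DYS448": (19, 20), "DYS456": (15, 16),
    "DYS458": (14, 18), "DYS635": (23,), "Y GATA H4": (12, 13),
}


def contributor_pairs(trace):
    """All unordered pairs {h1, h2} that together explain the trace."""
    loci = list(trace)
    two_allele = [locus for locus in loci if len(trace[locus]) == 2]
    pairs = []
    # The orientation at the first two-allele locus is fixed so that each
    # unordered pair is generated once: 2^(k - 1) pairs for k such loci.
    for flips in product((False, True), repeat=max(len(two_allele) - 1, 0)):
        flipped = dict(zip(two_allele, (False,) + flips))
        h1, h2 = [], []
        for locus in loci:
            alleles = trace[locus]
            if len(alleles) == 1:
                a = b = alleles[0]  # both contributors share the allele
            else:
                a, b = alleles[::-1] if flipped[locus] else alleles
            h1.append(a)
            h2.append(b)
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def haplotype_probability(haplotype, clusters):
    """Mixture of products of discrete Laplace pmfs, one factor per locus."""
    total = 0.0
    for tau, p, center in clusters:
        prob = tau
        for x, mu in zip(haplotype, center):
            prob *= (1.0 - p) / (1.0 + p) * p ** abs(x - mu)
        total += prob
    return total


# Hypothetical model: (tau_j, p_j, central haplotype). NOT fitted to data.
CLUSTERS = [
    (0.6, 0.2, (14, 13, 16, 24, 10, 11, 13, 11, 10, 14, 19, 15, 14, 23, 12)),
    (0.4, 0.3, (15, 14, 17, 26, 11, 13, 13, 12, 11, 15, 20, 16, 18, 23, 13)),
]

pairs = contributor_pairs(TRACE)
ranked = sorted(
    pairs,
    key=lambda pair: haplotype_probability(pair[0], CLUSTERS)
    * haplotype_probability(pair[1], CLUSTERS),
    reverse=True,
)
print(len(pairs))   # number of candidate contributor pairs
print(ranked[0])    # highest-scoring pair under the hypothetical model
```

With a single homogeneous cluster and one shared dispersion parameter, every pair would receive the same score, since swapping the two alleles at a locus leaves the product unchanged; the ranking only becomes informative with more than one cluster (or locus-specific structure).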
Mixture separation
Probability that the true contributor pair is ranked among the top candidates:

Dataset     Rank ≤ 1   Rank ≤ 5   Rank ≤ 10   Random ≤ 10
DEN (21)    13%        33%        42%         0.03%
DEN (15)    26%        55%        69%         0.78%
DEN (10)    45%        84%        93%         12.15%
SOM (10)    72%        94%        98%         26.79%
GER (7)     53%        89%        97%         53.93%

[Figure: P(True rank ≤ x) for x = 1, ..., 10, one panel per dataset (DEN (21), DEN (15), DEN (10), SOM (10), GER (7)), comparing ranking by Discrete Laplace with Random.]

Concluding remarks

The discrete Laplace method
- Sound statistical properties
- Applications
  - Estimation of Y-STR haplotype population frequencies
  - Mixture analysis
  - Cluster analysis (not shown today)
- Computationally feasible
- Open source software: R libraries disclap and disclapmix (and fwsim for simulating populations)
- Criticism
  - Intermediate alleles (e.g. 10.2)
  - Duplications (e.g. DYS385a/b), a general problem for matches: 14,15 = 14,15?
  - Copy number variation (e.g. Yfiler Plus)
  - Central haplotypes difficult to estimate (curse of dimensionality)
  - Maybe too much probability mass on unobserved haplotypes

Conclusion
- Match probability is of great interest and is difficult
- Population substructure (important but difficult to get correct value for a particular case)
- For a matching profile (e.g. Y23 or Yfiler Plus), use only a subset (e.g. 10 loci) for LR calculations?
- Easier to validate statistical models for 10-locus haplotypes than for 27-locus haplotypes