Accounting for Complex Sample Designs via Mixture Models
Michael Elliott
University of Michigan School of Public Health
August 2008

Talk Outline
1. Background
   - Design-Based Inference
   - Model-Based Inference
2. Robust Models
   - Finite Normal Mixture Models
   - Bayesian Density Estimation
3. Simulation
4. Discussion

Design-Based Inference
- Randomization or "design-based" inference is standard for sample survey data.
- Treat population values $Y = (Y_1, \ldots, Y_N)$ as fixed, and sampling indicators $I = (I_1, \ldots, I_N)$ as random.
- Goal is to make inference about a population quantity $Q(Y)$.
- Consider an estimator $\hat{q}(y, I)$, where $y = Y_{inc}$ denotes the sampled values and $E_{I|Y}(\hat{q}(y, I)) \approx Q(Y)$, and a variance estimator $\hat{v}(y, I)$ of $\hat{q}(y, I)$, where $E_{I|Y}(\hat{v}(y, I)) \approx Var_{I|Y}(\hat{q}(y, I))$ (Hansen and Hurwitz 1943; Kish 1965; Cochran 1977).

Model-Based Inference
- Model-based approach posits $p(Y \mid \theta)$.
- Superpopulation: $\theta$ fixed.
- Bayesian: $\theta \sim p(\theta)$.

Bayesian Survey Inference
Focus on inference about $Q(Y)$ based on $p(Y_{nobs} \mid y)$:
$$p(Y_{nobs} \mid y) = \frac{p(Y)}{p(y)} = \frac{\int p(Y \mid \theta)\, p(\theta)\, d\theta}{p(y)} = \frac{\int p(Y_{nobs} \mid y, \theta)\, p(y \mid \theta)\, p(\theta)\, d\theta}{p(y)} = \int p(Y_{nobs} \mid y, \theta)\, p(\theta \mid y)\, d\theta$$
(Ericson 1969; Scott 1977; Rubin 1987).

Design Inference vs. Bayesian Inference
Randomization approach has substantial advantages:
- $Y$ treated as fixed → no need for distributional assumptions.
- In scientific surveys the distribution of $I$ is (largely) under the control of the investigator.
- "Automatically" accounts for sample design in inference.

Randomization approach does not always work well:
- Inefficient (Basu 1971)
- Small-area estimation (Ghosh and Lahiri 1988)
- Non-response (Little and Rubin 2002)
- Lack of consistent reference distribution (Little 2004)

Bayesian approach:
- Avoids "inferential schizophrenia" (Little 2004)
- Doesn't rely on asymptotics
- Focuses on prediction of unsampled elements
- Does require noninformative sampling, $P(I \mid Y) = P(I)$, or, more generally, unconfounded sampling, $P(I \mid Y) = P(I \mid y)$ (Rubin 1987). Maintaining this assumption requires:
  - a probability sample;
  - a model $p(Y)$ attentive to design features and robust enough to capture all aspects of the distribution of $Y$ relevant to $Q(Y)$.
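As a hedged illustration of why the noninformative-sampling condition matters, the sketch below (not part of the talk) draws a Poisson sample whose inclusion probabilities depend on $Y$ and compares an unweighted sample mean, which implicitly assumes $P(I \mid Y) = P(I)$, with an inverse-probability-weighted mean. The function name, the exponential form of the inclusion probabilities, the population size, and the seed are all assumptions chosen for illustration.

```python
import math
import random


def informative_sampling_demo(seed=1, pop_size=20000, expected_n=500, tilt=0.5):
    """Compare unweighted and weighted means when inclusion depends on Y."""
    rng = random.Random(seed)
    # Hypothetical finite population: Y_i drawn from N(0, 1).
    y = [rng.gauss(0.0, 1.0) for _ in range(pop_size)]
    # Inclusion probability increases with Y_i, so P(I | Y) != P(I).
    raw = [math.exp(tilt * value) for value in y]
    scale = expected_n / sum(raw)
    pi = [min(1.0, scale * r) for r in raw]
    # Poisson sampling: each unit enters independently with probability pi_i.
    sample = [(value, p) for value, p in zip(y, pi) if rng.random() < p]
    pop_mean = sum(y) / pop_size
    unweighted = sum(value for value, _ in sample) / len(sample)
    # Ratio (Hajek-type) estimator using weights 1 / pi_i.
    weighted = sum(value / p for value, p in sample) / sum(1.0 / p for _, p in sample)
    return pop_mean, unweighted, weighted


pop_mean, unweighted, weighted = informative_sampling_demo()
print(f"population mean {pop_mean:.3f}, unweighted {unweighted:.3f}, weighted {weighted:.3f}")
```

Under this assumed design the unweighted mean is pulled toward large values of $Y$, while the weighted mean stays close to the population mean; a model that ignores the design inherits the same bias.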
Accommodating Survey Weights in a Model
Stratify data by probabilities of inclusion, $h = 1, \ldots, H$, and allow interaction between model quantities of interest and probabilities of inclusion:
$$y_{ih} \sim N(\mu_h, \sigma^2)$$
$$\bar{Y} \mid y \sim N\Big(N^{-1} \sum_h \{n_h \bar{y}_h + (N_h - n_h)\hat{y}_h\},\ (1 - n/N)\sigma^2/n\Big), \quad \hat{y}_h = E(\mu_h \mid y)$$
- Flat prior on $\mu_h$ → $\hat{y}_h = \bar{y}_h$ recovers the fully-weighted estimator.
- Degenerate prior on $\mu_h$ at $\mu$ → $\hat{y}_h = \bar{y}$ recovers the unweighted estimator.
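The two limiting cases can be checked numerically. The following is a minimal sketch, assuming two hypothetical weight strata with made-up sizes and sample means; the function name `predictive_population_mean` is illustrative and simply evaluates the posterior mean $N^{-1} \sum_h \{n_h \bar{y}_h + (N_h - n_h)\hat{y}_h\}$.

```python
def predictive_population_mean(N_h, n_h, ybar_h, yhat_h):
    """Observed stratum totals plus predicted totals for unsampled units, over N."""
    N = sum(N_h)
    total = sum(
        n * ybar + (N_stratum - n) * yhat
        for N_stratum, n, ybar, yhat in zip(N_h, n_h, ybar_h, yhat_h)
    )
    return total / N


# Hypothetical strata: equal sample sizes, unequal population sizes.
N_h = [1000, 3000]
n_h = [50, 50]
ybar_h = [10.0, 20.0]

# Flat prior: yhat_h equals the stratum sample mean.
fully_weighted = predictive_population_mean(N_h, n_h, ybar_h, ybar_h)

# Degenerate prior: yhat_h equals the pooled (unweighted) sample mean.
ybar = sum(n * m for n, m in zip(n_h, ybar_h)) / sum(n_h)
unweighted = predictive_population_mean(N_h, n_h, ybar_h, [ybar] * len(N_h))

print(fully_weighted)  # 17.5, the stratum-weighted mean
print(unweighted)      # 15.0, the pooled sample mean
```

With these numbers the flat prior reproduces $\sum_h N_h \bar{y}_h / N$ and the degenerate prior reproduces the simple sample mean, matching the two bullets above.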
- Assigning a proper prior $\mu_h \sim N(\mu, \tau^2)$ (Holt and Smith 1979) compromises between the fully-weighted and unweighted estimators:
$$\hat{y}_h = w_h \bar{y}_h + (1 - w_h)\tilde{y}, \quad w_h = \frac{n_h \tau^2}{n_h \tau^2 + \sigma^2}, \quad \tilde{y} = \Big(\sum_h \frac{n_h}{n_h \tau^2 + \sigma^2}\Big)^{-1} \sum_h \frac{n_h}{n_h \tau^2 + \sigma^2}\, \bar{y}_h$$

Accommodating Survey Weights in a Model
Elliott and Little (2000) extend to consider $\mu \sim N(f(h, \beta), \Sigma)$:
- Adding structure to the mean and variance increases robustness of estimation of the population mean when stratum means are strongly associated with probability of selection, though efficiency gains over the design-based estimator of $\bar{Y}$ are reduced when stratum means are weakly associated with probability of selection.
- A Bayesian smoothing spline estimator of the mean is quite robust but can still yield efficiency gains.

Accommodating Survey Weights in a Model
Developing models to accommodate survey weights for more complex population quantities, such as population regression parameters, is more challenging.
- Elliott (2007) and Huang and Elliott (2008) extend weight stratum models to linear and generalized linear regression models by allowing for interactions between weight strata and regression parameters.
- Efficiency gains are possible over design-based regression estimators.
- Proliferation of parameters can make practical implementation difficult if the number of covariates is large and the sample size is modest.

Finite Normal Mixture Models
A simple finite normal mixture model without covariates:
$$Y_i \mid C_i = c, \mu_c, \sigma_c^2 \sim N(\mu_c, \sigma_c^2), \quad c = 1, \ldots, K$$
$$C_i = c \mid \pi_1, \ldots, \pi_K \sim MULTI(1; \pi_1, \ldots, \pi_K)$$

Application to Complex Sample Data
- Maintain robustness of design-based approach? Use models that include a large number of classes to model highly non-normal data.
- Increased efficiency of model-based approach? Possible if the data suggest that a small number of classes (or a single class) of normal data is sufficient.

Normal Regression Mixture Model for Complex Sample Design Data
$$Y_i \mid x_i, C_i = c, \beta_c, \sigma^2 \sim N(x_i'\beta_c, \sigma^2), \quad c = 1, \ldots, K$$
$$C_i = c \mid \alpha, \gamma, \pi_i \sim MULTI(1; p_{i1}, \ldots, p_{iK}), \quad \eta_{ij} = \Phi(\gamma_j - f(\pi_i, \alpha)) \ \text{for} \ \eta_{ij} = \sum_{k=1}^{j} p_{ik}, \quad j = 1, \ldots, K - 1$$
where $\gamma_1 = 0$ to avoid aliasing with the $\alpha$ parameters.
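The class-membership part of this model can be sketched as follows: cumulative class probabilities come from a probit of the cutpoints $\gamma_j$ shifted by $f(\pi_i, \alpha)$, and differencing them gives $p_{i1}, \ldots, p_{iK}$. This is a minimal sketch rather than the talk's implementation; the function names are hypothetical, and the last class takes the remaining probability so that the $p_{ik}$ sum to one.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def class_probabilities(f_value, cutpoints):
    """Class probabilities p_i1..p_iK from cutpoints gamma_1..gamma_{K-1}.

    f_value is f(pi_i, alpha) for one unit; cutpoints must start at 0
    (gamma_1 = 0) and be nondecreasing.
    """
    if cutpoints[0] != 0.0 or any(b < a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must start at 0 and be nondecreasing")
    # eta_ij = Phi(gamma_j - f); the final cumulative probability is 1.
    cumulative = [std_normal_cdf(g - f_value) for g in cutpoints] + [1.0]
    return [cumulative[0]] + [
        cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))
    ]


# K = 3 classes with assumed cutpoints; compare two values of f(pi_i, alpha).
low = class_probabilities(0.0, [0.0, 1.0])
high = class_probabilities(2.0, [0.0, 1.0])
print([round(p, 4) for p in low])
print([round(p, 4) for p in high])
```

Larger values of $f(\pi_i, \alpha)$ shift probability toward the higher-numbered classes, which is how the inclusion probability can influence class membership.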
- Accounts for regression model misspecification and for skewness and overdispersion in the residual error term.
- Fits simple, highly efficient models when the data allow.
- $f(\pi_i, \alpha)$ could be a simple parametric form (e.g., linear in $\pi$) or non-parametric (e.g., linear P-spline).

Normal Mixture Model Priors
To ensure a proper posterior, we use conjugate priors of the form
$$\beta_c \sim N(\beta_0, \Sigma_0), \quad \sigma_c^2 \sim Inv\text{-}\chi^2(a, s), \quad \alpha \sim N(\alpha_0, \Omega_0), \quad \gamma_j \sim UNI(0, A)$$
- Choosing relatively non-informative values for the prior parameters should keep the priors from unduly influencing the inference.
- Draws from $p(\beta, \sigma^2, \alpha, \gamma \mid y)$ are obtained using a Gibbs sampling algorithm.

Normal Mixture Model Posterior Predictive Distribution
Using simulations $(\beta^{rep}, (\sigma^2)^{rep}, \alpha^{rep}, \gamma^{rep})$ from $p(\beta, \sigma^2, \alpha, \gamma \mid y, x)$, we obtain a simulation from $p(B \mid y, x)$, where $B = (\sum_{i=1}^{N} x_i x_i')^{-1} \sum_{i=1}^{N} x_i y_i$ is the population slope:
$$p_{ic}^{rep} = \Phi(\gamma_c^{rep} - f(\pi_i, \alpha^{rep})) - \Phi(\gamma_{c-1}^{rep} - f(\pi_i, \alpha^{rep}))$$
$$\tilde{p}_{ic}^{rep} = \frac{p_{ic}^{rep}\, \phi((y_i - x_i'\beta_c^{rep})/\sigma_c^{rep})}{\sum_{c=1}^{K} p_{ic}^{rep}\, \phi((y_i - x_i'\beta_c^{rep})/\sigma_c^{rep})}$$
$$\hat{y}_i^{rep} = \sum_{c=1}^{K} \tilde{p}_{ic}^{rep}\, x_i'\beta_c^{rep}$$
$$B^{rep} = (X'WX)^{-1} X'W\hat{y}$$

Normal Mixture Model Shortcomings
- A shortcoming of the above mixture model is that the number of classes $K$ must be known in advance.
- Could let $K$ be a random variable and either obtain $y_i \mid x_i, K$ analytically or include a draw of $K$ in the Gibbs sampler via a reversible jump algorithm (Green 1995).
- Alternatively, could use a Bayesian non-parametric approach that avoids pre-specification of the number of mixture classes.

Bayesian Density Estimation Model
We can generalize the normal mixture model as
$$f(y_i \mid x_i) = \int N(y_i \mid \phi_i)\, dG_{x_i}(\phi_i)$$
Previously $G_{x_i}$ was multinomial, but in Dunson et al. (2007) $G_x$ is an element in an uncountable collection of probability measures, $G_x \sim DP(\alpha G_{0,x})$, where DP denotes a Dirichlet process (Ferguson 1973) centered at base measure $G_0$ with precision $\alpha$.

Dirichlet process mixture models
In standard DP mixture models (MacEachern 1994), $G_x \equiv G \sim DP(\alpha G_0)$. Expressing the Dirichlet process in "stick breaking" form, we have
$$G = \sum_{h=1}^{\infty} \pi_h \delta_{\theta_h}, \quad \pi_h = V_h \prod_{l=1}^{h-1} (1 - V_l), \quad V_h \sim BETA(1, \alpha)$$
where $\delta_\theta$ is degenerate at $\theta$ and $\{\theta_h\}$ are atoms generated from $G_0$.
Use of a Polya urn scheme (Blackwell and MacQueen 1973) integrates out the infinite-dimensional $G$ to obtain
$$\phi_i \mid \phi_{(i)}, \alpha \sim \frac{\alpha}{\alpha + n - 1}\, G_0 + \frac{1}{\alpha + n - 1} \sum_{j \neq i} \delta_{\phi_j}$$
- Thus DP mixture models cluster subjects into $K \leq n$ classes for which $\theta = (\theta_1, \ldots, \theta_K)$, where the $\theta_h$ are sampled independently from $G_0$.
- The induced prior on $K$ grows with $n$ and $\alpha$.

Dirichlet process mixture models
Under the normal model, the posterior predictive distribution of $y_i \mid x_i$ at a given draw of $\phi = (\beta, \sigma^2)$, $K$, and $S$, where $S$ denotes the configuration of $\phi$ into $K$ distinct values, is given by
$$\frac{\alpha}{\alpha + n}\, N(y_i \mid x_i, \beta_0, \sigma_0^2) + \sum_{h=1}^{K} \pi_h\, N(y_i \mid x_i, \beta_h, \sigma_h^2)$$
where $\pi_h = n_h/(\alpha + n)$ and $\beta_0$ and $\sigma_0^2$ are further independent draws from $G_0$.
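The clustering behaviour of the Polya urn scheme and the stick-breaking weights can be sketched with the standard library alone. This is an illustrative sketch, not the sampler used in the talk: the function names are hypothetical, the stick-breaking sum is truncated at a fixed number of atoms, and the expected cluster count uses the fact that the $i$-th subject opens a new cluster with probability $\alpha/(\alpha + i - 1)$.

```python
import random


def stick_breaking_weights(alpha, num_atoms, rng):
    """First num_atoms stick-breaking weights pi_h = V_h * prod_{l<h} (1 - V_l)."""
    weights, remaining = [], 1.0
    for _ in range(num_atoms):
        v = rng.betavariate(1.0, alpha)  # V_h ~ BETA(1, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


def polya_urn_cluster_sizes(alpha, n, rng):
    """Assign n subjects sequentially; return the resulting cluster sizes."""
    sizes = []
    for i in range(n):  # i subjects have already been assigned
        if rng.random() < alpha / (alpha + i):
            sizes.append(1)  # new value drawn from G0 opens a new cluster
        else:
            # Copy a uniformly chosen earlier subject, i.e. join an existing
            # cluster with probability proportional to its size.
            r = rng.randrange(i)
            cumulative = 0
            for h, size in enumerate(sizes):
                cumulative += size
                if r < cumulative:
                    sizes[h] += 1
                    break
    return sizes


def expected_num_clusters(alpha, n):
    """Prior expectation of K under the Polya urn scheme."""
    return sum(alpha / (alpha + i) for i in range(n))


rng = random.Random(0)
sizes = polya_urn_cluster_sizes(alpha=1.0, n=200, rng=rng)
print(len(sizes), "clusters; prior expectation", round(expected_num_clusters(1.0, 200), 2))
```

Evaluating `expected_num_clusters` for several values of `alpha` and `n` shows the induced prior on $K$ growing in both arguments, as stated above.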
Because the mixture weights $\pi_h$ in the DP posterior predictive distribution do not depend on $x$, the conditional posterior predictive mean of $y$ is forced to be linear in $x$:
$$E(y_i^{rep} \mid x_i, \phi, K, S) = \sum_{h=0}^{K} \pi_h x_i \beta_h = x_i \bar{\beta}, \quad \bar{\beta} = \sum_{h=0}^{K} \pi_h \beta_h$$
with $\pi_0 = \alpha/(\alpha + n)$.

Dirichlet process mixture models
Dunson et al. (2007) thus extend the DP mixture model to allow the DP prior to depend on covariates:
$$G_x = \sum_{j=1}^{n} b_j(x)\, G^*_{x_j}, \quad b_j(x) = \frac{\gamma_j \exp(-\psi \lVert x - x_j \rVert)}{\sum_{l=1}^{n} \gamma_l \exp(-\psi \lVert x - x_l \rVert)}, \quad G^*_{x_j} \sim DP(\alpha G_0)$$
- The DP prior is now itself a mixture of DP-distributed random basis measures at each covariate value, with the weights given by $b(x)$.
- Subjects with $x_i$ close to $x$ will have basis distributions with a high weight in $G_x$, with $\gamma$ controlling the degree to which $G_x$ loads across multiple draws from $DP(\alpha G_0)$.

Dirichlet process mixture models
Priors on the parameters governing the DP mixture prior are as follows:
$$\gamma_j \mid \kappa \sim GAMMA(\kappa, n\kappa), \quad \log \kappa \sim N(\mu_\kappa, \sigma_\kappa^2), \quad \log \psi \sim N(\mu_\psi, \sigma_\psi^2)$$
Assuming a constant variance $\sigma^2$ across the mixture components, we have
$$G_0 \equiv N(\beta, \Sigma), \quad \beta \sim N(\beta_0, V_{\beta_0}), \quad \Sigma^{-1} \sim W(\nu_0, (\nu_0 \Sigma_0)^{-1}), \quad \sigma^{-2} \sim GAMMA(a, b)$$
An MCMC algorithm can be implemented as in Dunson et al. (2007).

Posterior Predictive Distribution of Regression Parameters
The posterior predictive distribution at $x_i$, conditional on a draw of all other model components, is given by
$$w_{i0}(x_i)\, N(y_i^{rep};\ x_i'\beta,\ \sigma^2 + x_i'\Sigma x_i) + \sum_{h=1}^{K} w_{ih}(x_i)\, N(y_i^{rep};\ x_i'\beta_h,\ \sigma^2)$$
- $w_{i0}(x_i)$ will be larger for larger $\alpha$ and when the $i$th subject is currently assigned to a cluster with relatively few members.
- $w_{ih}(x_i)$ will be larger for smaller $\alpha$ and when the $i$th subject's covariates are closer in Euclidean distance to those of other subjects currently assigned to cluster $h$.

Posterior Predictive Distribution of Regression Parameters
The posterior predictive distribution of the population regression slope is multivariate normal with mean
$$(X'W^*X)^{-1} X'W^*\tilde{y}, \quad \tilde{y}_i = w_{i0}\, x_i'\beta + \sum_{h=1}^{K} w_{ih}\, x_i'\beta_h$$
and variance
$$(X'W^*X)^{-1} \Big\{ \sum_i w_i^*\, x_i \Big( \sigma^2 \Big( \sum_{h=0}^{K} w_{ih}^2 \Big) + w_{i0}^2\, x_i'\Sigma x_i \Big) x_i' \Big\} (X'W^*X)^{-1}$$
where $w_i^*$ are the survey case weights.

Simulation Model
$$Y_i \mid X_i, \sigma^2 \sim N\Big(\alpha_0 + \sum_{h=1}^{10} \alpha_h (X_i - h)_+,\ \sigma^2\Big), \quad X_i \sim UNI(0, 10), \quad i = 1, \ldots, N = 20000$$
$$P(I_i = 1 \mid H_i) = \pi_h \propto (1 + H_i) H_i, \quad H_i = \lceil X_i \rceil$$
- Elements $(Y_i, X_i)$ had ≈1/55th the selection probability when $0 \leq X_i \leq 1$ as when $9 \leq X_i \leq 10$.
- $n = 200$ elements were sampled without replacement for each of 50 simulations.
- $\alpha_C = (0, 0, 0, 0, .5, .5, 1, 1, 2, 2, 4)$: bias is important when $\sigma^2$ is small.

Estimation procedures
- Fully-weighted, unweighted, crude trimming weight (maximum normalized value of 3).
- 2-class and 3-class mixture models:
  - $\beta_0 = \hat{\beta}$, $\Sigma_0 = n^2 \hat{V}_\beta$, where $\hat{\beta} = (X'X)^{-1}X'y$ and $\hat{V}_\beta = \hat{\sigma}^2 (X'X)^{-1}$ for $\hat{\sigma}^2 = (n - p)^{-1} \sum_i (y_i - x_i'\hat{\beta})^2$.
  - $\alpha_0 = 0$, $\Omega_0 = diag(1000)$, and $A = 10$.
  - $f(\pi_i, \alpha) = \alpha_0 + \alpha_1 w_i$, $w_i = \pi_i^{-1}$.
- Bayes density model:
  - $\alpha = .1$ (limit mixture components)
  - $\log \kappa \sim N(-2.5, 1)$, $\log \psi \sim N(\log(30), .5)$
  - $\beta \sim N(0, \hat{V}_\beta)$, $\Sigma^{-1} \sim W(2, .5I)$
  - $\sigma^{-2} \sim GAMMA(.1, .1)$

Posterior predictive median of $y_i$ under 3-class model
[Figure: four scatterplot panels of y against x (x from 0 to 10), titled Variance=10, Variance=100, Variance=1000, and Variance=10000.]

Posterior predictive median of $y_i$ under Bayesian density model
[Figure: four scatterplot panels of y against x (x from 0 to 10), titled Variance=10, Variance=100, Variance=1000, and Variance=10000.]

Simulation Results

             RMSE relative to FWT         True coverage (%)
             log10(variance)              log10(variance)
Estimator    1      2      3      4       1     2     3     4
UNWT         4.09   2.16   1.03   0.54    0     2     92    96
FWT          1      1      1      1       76    80    94    84
TWT3         1.61   1.06   0.69   0.69    40    66    96    86
MWT2         3.22   1.69   0.86   0.58    22    44    84    90
MWT3         1.25   1.31   0.99   0.73    42    50    88    86
BDWT         0.97   0.92   0.87   0.84    88    96    100   100

Simulation Results
- Despite similarity in plots of posterior medians of population regression slopes, the 2- and 3-class models are not sufficiently robust.
- Bayes density estimator is robust and has moderate efficiency gains over the design-based estimator when model misspecification is largely absent.
- Bayes density estimator has substantial coverage gains over the design-based estimator when model misspecification is present.
- DP is much slower and more difficult to implement.

Discussion
- Extensions to generalized linear models are possible either by embedding the normal model in a latent variable context (e.g., probit modeling) or via alternative base distributions.
- Quantile estimation or quantile regression may yield greater dividends, since heteroscedasticity is easily accounted for in mixture models.
- Valuable if partial covariate information is available for the entire population.
- Take-home message: Advances in statistical modeling during the past 10-15 years are beginning to allow development of models sufficiently robust to compete with design-based approaches.

References
Basu, D. (1971). An Essay on the Logical Foundations of Survey Sampling. Foundations of Statistical Inference, eds. V.P. Godambe and D.A. Sprott, Toronto: Holt, Rinehart and Winston.
Blackwell, D., and MacQueen, J.B. (1973). Ferguson Distributions via Polya Urn Schemes. The Annals of Statistics, 1, 353-355.
Cochran, W.G. (1977). Sampling Techniques, 3rd Ed., New York: Wiley.
Dunson, D.B., Pillai, N., and Park, J-H. (2007). Bayesian Density Regression. Journal of the Royal Statistical Society, B69, 163-183.
Elliott, M.R., and Little, R.J.A. (2000). Model-Based Alternatives to Trimming Survey Weights. Journal of Official Statistics, 16, 191-209.
Elliott, M.R. (2007). Bayesian Weight Trimming for Generalized Linear Regression Models. Survey Methodology, 33, 23-34.
Ericson, W.A. (1969). Subjective Bayesian Models in Sampling Finite Populations. Journal of the Royal Statistical Society, B31, 195-234.
Ferguson, T.S. (1973). A Bayesian Analysis of Some Non-parametric Problems. The Annals of Statistics, 1, 209-230.
Ghosh, M., and Lahiri, P. (1988). Bayes and Empirical Bayes Analysis in Multistage Sampling. Statistical Decision Theory and Related Topics, 1, 195-212.
Green, P.J. (1995). Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination. Biometrika, 82, 711-732.
Hansen, M.H., and Hurwitz, W.N. (1943). On the Theory of Sampling from Finite Populations. The Annals of Mathematical Statistics, 14, 333-362.
Holt, D., and Smith, T.M.F. (1979). Poststratification. Journal of the Royal Statistical Society, A142, 33-46.
Kish, L. (1965). Survey Sampling, New York: Wiley.
Little, R.J.A., and Rubin, D.B. (2002). Statistical Analysis with Missing Data, 2nd Ed., New York: Wiley.
Little, R.J.A. (2004). To Model or Not to Model? Competing Modes of Inference for Finite Population Sampling. Journal of the American Statistical Association, 99, 546-556.
MacEachern, S.N. (1994). Estimating Normal Means with a Conjugate Style Dirichlet Process Prior. Communications in Statistics: Simulation and Computation, 23, 727-741.
Rubin, D.B. (1987). Multiple Imputation for Non-response in Surveys, New York: Wiley.
Scott, A.J. (1977). Large Sample Posterior Distributions in Finite Populations. The Annals of Mathematical Statistics, 42, 1113-1117.
Zheng, H., and Little, R.J.A. (2003). Penalized Spline Model-based Estimation of the Finite Population Total from Probability-proportional-to-size Samples. Journal of Official Statistics, 19, 99-117.
Zheng, H., and Little, R.J.A. (2005). Inference for the Population Total from Probability-proportional-to-size Samples Based on Predictions from a Penalized Spline Nonparametric Model. Journal of Official Statistics, 21, 1-20.