How to find the shape of a banana?
Ivan Mizera, University of Alberta
Vancouver, June 2009
Gratefully acknowledging the support of the NSERC of Canada

Prologue: boys and girls...

... or that quantiles are important
Univariate quantiles
  standard and nonstandard applications: Parzen (2004)
  quantile regression: Koenker (2005)
Multivariate quantiles
  Serfling (2002), Koenker (2005), Wei (2008)
Depth
  Hodges (1955), Tukey (1975), ... , Zuo and Serfling (2000)
Special case (often preceding the general): “medians”
  ... , Small (1990)
Some multivariate quantile proposals:

[Figure: Multivariate normal contours]
“The Choice of a Real Statistician”

Minimization of L1-norm-type functions
[Figure: two panels]
various adjustments needed, in particular for affine equivariance

Minimization of the volume of simplices
“Oja depth”

Halfspace depth
[Figure: Envelope of directional quantile lines (p = .1); x-axis: log of weight (in kilograms); y-axis: log of height (in centimeters)]
“Directional quantile envelopes”

Directional quantile envelopes
[Figure: Directional quantile envelopes (p = 0.1 x 2^(-5:2)); x-axis: log of weight (in kilograms); y-axis: log of height (in centimeters)]
Halfspace (Tukey) depth contours

Various desiderata
Old:
- connection to univariate case
- allow for analogs of medians, L-statistics, etc.
- equivariance properties (affine, orthogonal)
- ease of computation
New:
- interpretability
  It’s always some nested contours - but what do they mean??
- introduction of covariates (multivariate quantile regression)
  Wei (2008), Kong and Mizera (2008) http://arxiv.org/abs/0805.0056v1
So, how about “curved” situations (banana)?
See 1., 2., 3., 4., 5., 6. below

1. Transform the coordinates
[Figure: Directional quantile envelopes (p = .0125, 0.025, .05, .1, .2, .4); x-axis: log of weight (in kilograms); y-axis: log of height (in centimeters)]
(depth contours fitted to logs and then backtransformed)

[Figure: Original data; x-axis: nepal$weight; y-axis: nepal$height]
“As They Were Created”

[Figure: And logged data; x-axis: log(nepal$weight); y-axis: log(nepal$height)]
“Single Most Useful Transformation”

That’s it
Tukey’s ladder of transformations, Box-Cox maybe
- probably the approach that makes the most sense of them all
- but a suitable transformation is often hard to find
- and quite limited to the two-dimensional case

2. Quantile regression in polar coordinates
Wei (2008): JASA, 103(108), Figure B.1, page 409, upper right panel
- shows (undesirable) sensitivity to the selection of the center of the polar coordinates

3. transparency intentionally left blank

4. transparency intentionally left blank

5. “Delaunay depth”:
[Figure]
Izem, Souvaine, Rafalin (2006)

- how far are we from the boundary?
[Figure]
Problems: what is “boundary”? what is “far”?

Perspective: somewhat off...
[Figure]

Trick: preliminary density estimation
[Figure]
Add “undata”, and do the weighted version of the original

Isn’t it better?
[Figure]
Really? If yes, what does it say?

New perspective
[Figure]
And doesn’t it depend too much on the triangulation and undata?

6. α-shapes: erase all empty discs
Edelsbrunner, Kirkpatrick and Seidel (1983), ...
A survey: Edelsbrunner (200?); use: “persistent homology”
A related concept: Hall, Park and Turlach (2002)

Some other Greek letter
Adapted to quantilistic needs:
(i) Erase not discs, but “semi-infinite” extensions: paraboloids, for instance
(ii) Not only empty, but containing less than a prescribed proportion k/n of data mass
(not every boundary-seeking method is suitable for this)
- “a curvilinear depth”
- recall depth contour: erase all halfspaces that contain fewer than k = 0, 1, 2, ... points
- for increasing k, we have nested contours
- in fact, it is “curved depth”, with all ensuing properties
- in particular, a well-defined population analog
- hopefully with some meaning too...
- ...and an efficient algorithm...
- however, some “bandwidth” selection is inevitable

Science:
- the exploration of impasses
- to see which are not

References
Edelsbrunner, H., Kirkpatrick, D. G. and Seidel, R. (1983). On the shape of a set of points in the plane. IEEE Trans. Inform. Theory 29 551–559.
Hall, P., Park, B. U. and Turlach, B. A. (2002). Rolling-ball method for estimating the boundary of the support of a point-process intensity. Ann. I. H. Poincaré 6 959–971.
Hodges, J. L., Jr (1955). A bivariate sign test. Ann. Math. Statist. 26 523–527.
Koenker, R. (2005). Quantile regression. Cambridge University Press, Cambridge.
Parzen, E. (2004). Quantile probability and statistical data modeling. Statist. Sci. 19 652–662.
Serfling, R. (2002). A depth function and a scale curve based on spatial quantiles. In Statistical Data Analysis Based on the L1-Norm and Related Methods (Y. Dodge, ed.) 25–38. Birkhäuser Verlag, Basel.
Small, C. G. (1990). A survey of multidimensional medians. International Statistical Review 58 263–277.
Tukey, J. W. (1975). Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians (Vancouver, B. C., 1974), Vol. 2 523–531. Canad. Math. Congress, Quebec.
Wei, Y. (2008). An approach to multivariate covariate-dependent quantile contours with application to bivariate conditional growth charts. J. Amer. Statist. Assoc. 103 397–409.
Zuo, Y. and Serfling, R. (2000). On the performance of some robust nonparametric location measures relative to a general notion of multivariate symmetry. J. Statist. Plann. Inference 84 55–79.
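
Supplementary sketch: the slides recall the depth contour as what remains after erasing all halfspaces that contain fewer than k points. Equivalently, the halfspace (Tukey) depth of a point is the smallest number of data points in any closed halfplane whose boundary passes through it. The Python sketch below illustrates that definition in the plane. It is not the algorithm used in the talk: the function names `halfspace_depth` and `depth_region`, the grid of `n_dirs` directions, the numerical tolerance, and the toy data are all assumptions made for illustration. Because only finitely many directions are tried, the result can overestimate the exact depth.

```python
import math


def halfspace_depth(point, data, n_dirs=360):
    """Approximate halfspace (Tukey) depth of `point` w.r.t. planar `data`.

    For each direction u on a grid (an illustrative simplification),
    count the data points x with <x - point, u> >= 0, i.e. those in the
    closed halfplane bounded by the line through `point` normal to u.
    The depth is the smallest such count.
    """
    px, py = point
    best = len(data)
    for j in range(n_dirs):
        angle = 2.0 * math.pi * j / n_dirs
        ux, uy = math.cos(angle), math.sin(angle)
        # Small tolerance so points on the boundary line are counted.
        count = sum(
            1 for (x, y) in data
            if (x - px) * ux + (y - py) * uy >= -1e-12
        )
        best = min(best, count)
    return best


def depth_region(candidates, data, k, n_dirs=360):
    """Candidate points that survive erasing halfplanes with fewer than k points."""
    return [c for c in candidates if halfspace_depth(c, data, n_dirs) >= k]


# Toy data (hypothetical): four corners of a square plus its center.
data = [(-1, -1), (1, -1), (1, 1), (-1, 1), (0, 0)]

print(halfspace_depth((0, 0), data))   # 3: the center plus two corners
print(halfspace_depth((1, 1), data))   # 1: a corner can be cut off alone
print(halfspace_depth((5, 5), data))   # 0: outside the convex hull
print(depth_region(data, data, k=2))   # [(0, 0)]
```

For increasing k the regions returned by `depth_region` are nested, which is the property the “curved depth” proposal aims to keep when halfspaces are replaced by other erasing sets such as paraboloids.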