Level sets estimation of random compact sets P. Heinrich, R. S. Stoica and C. V. Tran Universit´e Lille 1 - Laboratoire Paul Painlev´e Workshop in Spatial Statistics and Image Analysis in Biology Avignon, May 9-11 2012 Introduction : motivating example Level sets : a tool for compact random sets averaging Estimation of level sets Examples of application Conclusions and perspectives A practical application (1) Pattern detection in spatial data : ◮ the data d : image analysis, epidemiology, galaxy catalogues ◮ detect and characterise the pattern “hidden” in the data : objects, cluster pattern or filamentary network ◮ hypothesis : the pattern is the outcome γ of a stochastic process Γ ◮ possible solution in this context : probabilistic modelling and maximisation A practical application (2) Gibbs modelling framework ◮ Markov random fields, marked point processes, etc. ◮ general structure of the probability density : h(γ|θ) = exp [−Ud (γ|θ) − Ui (γ|θ)] α(θ) and also the necessary mathematical details so that everything is well defined ... A practical application (3) Gibbs modelling framework (continued) ◮ Ud (γ|θ) : this term is related to the objects location in the data field (inhomogeneous process) ◮ Ui (γ|θ) : this term is related to the object interaction and to the morphology of the pattern (prior model, regularisation term) ◮ α(θ) : normalisation constant (not always available analytically) ◮ pattern estimator : γ b = arg max{h(γ|θ)} = arg min{Ud (γ|θ) + Ui (γ|θ)} γ∈Ω γ∈Ω (1) A practical application (4) Some concluding remarks ◮ simulated annealing algorithm : convergence towards the uniform distribution on the solution sub-space given by (1) ◮ the model parameters are not always known ... ◮ the convergence is difficult to be stated ◮ ... or the solution is not always unique (continuous models and/or priors on the model parameters) ◮ ⇒ a real need to average the obtained solution in order to obtain a much more robust solution Idea : use level sets as a tool for averaging random patterns Level sets : basic notions and definitions (1) Random compact sets and coverage function : ◮ ◮ ◮ (Ω, A, P) : probability space (W = [0, 1]d , B, ν) : measure space (... where the data field leaves) with B the corresponding Borel σ−algebra and ν the Lebesgue measure C : the class of compact sets in W A random compact set Γ in W is a random map from Ω to C that is measurable in the sense ∀C ∈ C, {ω : Γ(ω) ∩ C 6= ∅} ∈ A The coverage function is given by : p(w ) = P(w ∈ Γ) Level sets : basic notions and definitions (2) Level or Quantile sets : for α ∈ [0, 1] the (deterministic) α−level set is Qα = {w ∈ W : p(w ) > α} or for simplicity {p > α}. Vorob’ev expectation : the Borel set EV Γ such that ν (EV Γ) = E [ν (Γ)] and {p > α∗ } ⊂ EV Γ ⊂ {p ≥ α∗ }, where α∗ = inf{α ∈ [0, 1] : ν (Qα ) ≤ E [ν (Γ)]}. The Vorob’ev expectation is the α∗ −level set that matches the mean volume of Γ. Some known results and properties (1) ν(W ) 6 F− (α0 ) ⊂ F (α0 ) • - 0 α0 α1 α2 1 Figure: Behaviour of function F (α) = ν (Qα ) Remarks : ◮ F is c`adl`ag with constant regions (plateaux) ◮ constant regions of p(w ) ⇒ discontinuities of ν (Qα ) ◮ constant regions of ν (Qα ) ⇒ discontinuities of p(w ) Some known results and properties (2) Vorob’ev expectation : ◮ it is unique provided F (α) = ν (Qα ) = ν ({p > α}) is continuous at α∗ ; then we have EV Γ = {p ≥ α∗ } ◮ it minimises B → E [ν (B△Γ)] under the constraint ν (B) = E [ν (Γ)], where △ is the symmetric difference (Molchanov, 05). More generally, on level sets : ◮ p(w ) not always available in an analytical closed form ◮ the level sets cannot be computed for all the points w ∈ W ⇒ discretisation should be considered Plug-in estimation (1) Definition ◮ consider n i.i.d. copies Γ1 , Γ2 , . . . , Γn of Γ ◮ the empirical counterpart of p(w ) n pn (w ) = 1X 1{w ∈Γi } n i =1 ◮ the plug-in estimator Qn,α = {pn > α} Plug-in estimation (2) Properties : the problem was deeply studied in the literature ◮ ◮ some references : (Molchanov, 87, 90, 98), (Cuevas, 97, 06) and many others L1 −consistency under weak assumptions → p(w ) does not need to be continuous ◮ Hausdorff distance : similar consistency results using some extra assumptions ◮ rates of convergence and asymptotic normality : regularity conditions on p(w ) Aim of our work ◮ plug-in estimator that takes into account the discretisation effects ◮ estimator for the Vorob’ev expectation → its definition contains another quantity that need approximation ... A new level-set estimator (1) Discretisation : for any Borel set B in W and r ∈ 2−N , its corresponding grid approximation is G Br = [w , w + r )d . w ∈B∩r Zd Regularity : the “upper box counting dimension” of ∂B is dimbox (∂B) = lim sup r →0 log Nr (∂B) , − log r with Nr (∂B) = Card{w ∈ r Zd : [w , w + r )d ∩ ∂B 6= ∅}. A new level-set estimator (2) Proposition Assume that dimbox (∂B) < d. For all ε > 0, there exists rε such that 0 < r < rε ⇒ ν (B r △B) ≤ r d−dimbox (∂B)−ε . Proposition Assume that dimbox (∂Γ) ≤ d − κ with probability one for some κ > 0. For all α such that ν ({p = α}) = 0, (i) with probability 1, r lim ν Qn,α △Qα = 0 r →0 n→∞ (ii) for all ε > 0, 2 r E ν Qn,α △Qα ≤ r κ + 2e−2nε + F (α − ε) − F (α + ε). The proof is an extension of the result in (Cuevas, 06). Vorob’ev expectation estimator (1) Kovyazin’s mean : the empirical counter-part of the Vorob’ev expectation. That is the Borel set Kn such that n ν (Kn ) = 1X ν (Γi ) n i =1 and {pn > α∗n } ⊂ Kn ⊂ {pn ≥ α∗n }, where α∗n = inf{α ∈ [0, 1] : ν ({pn > α}) ≤ ν (Kn }.) Theorem Assume that ν ({p = α∗ }) = 0. Then, with probability one, lim ν (Kn △EV Γ) = 0. n→∞ The proof revisits the result given by (Kovyazin, 86). Vorob’ev expectation estimator (2) Grid approximation of Kn : this is the estimator we propose. That is the Borel set Kn,r such that {pn > α∗n,r }r ⊂ Kn,r ⊂ {pn ≥ α∗n,r }r , where α∗n,r = inf{α ∈ [0, 1] : ν ({pn > α}r ) ≤ ν (Kn )}. Some remarks : ◮ quite strong assumption : ν (Kn ) is computed exactly ... ◮ an alternative idea may consider directly the discretisation of Γ or Kn , but this does not guarantee a mean volume equal to ν (Kn ) ... ◮ still, in practice ... Consistency of the Vorob’ev estimator Theorem Assume that dimbox (∂Γ) ≤ d − κ with probability one for some κ > 0, and that ν ({p = α∗ }) = 0 and ν ({p = β ∗ }) = 0 with β ∗ = sup{α ∈ [0, 1] : ν ({p > α}) ≥ E [ν (Γ)]}. Then, we have almost surely lim ν (Kn,r △EV Γ) = 0. r →0 n→∞ Proof. We write that ν (Kn,r △EV Γ) ≤ ν (Kn,r △Kn ) + ν (Kn △EV Γ) and use Theorem 1 and two lemmas to conclude. For technical details, a draft is available on demand ... Cosmic filaments : simulated annealing detection (Stoica, Martinez and Saar, 07,10) 12 10 8 6 4 2 100 90 80 70 50 60 50 40 40 30 30 20 a) 20 10 10 10 5 100 0 90 80 50 70 40 60 50 30 40 20 b) 30 10 20 0 10 Figure: a) Original data. b) Cylinder configuration detected. Cosmic filaments : level sets averaging 14000 12000 10000 8000 6000 4000 2000 a) 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 b) Figure: a) Behaviour of the level set volume. b) Estimated Vorob’ev expectation. Epidemiology (veterinary context) Disease : sub-clinical mastitis for diary herds ◮ points → farms location ◮ to each farm → disease score (continuous variable) ◮ clusters pattern detection : regions where there is a lack of hygiene or rigour in farm management 300 250 200 150 100 50 0 0 50 100 150 200 250 300 350 Figure: The spatial distribution of the farms outlines almost the entire French territory (INRA Avignon). Epidemiology : sub-clinical mastitis data (Stoica, Gay and Kretzschmar, 07) 1 300 0.9 50 0.8 250 0.7 100 200 0.6 150 0.5 150 0.4 200 100 0.3 250 0.2 50 0.1 300 a) 0 50 100 150 200 250 300 350 b) 0 50 100 150 200 250 300 Figure: Disease data scores and coordinates for the year 1996 : a) cluster pattern (disk configuration) detected ; b) Level sets. 0 Conclusion : ◮ estimator including the discretisation effects ◮ averaging the shape of the pattern ... Perspectives : ◮ ... provided the model is correct ... ◮ relax hypotheses ◮ what is the variance of the pattern ? Acknowledgements : this work was done together with wonderful co-authors and also with help of some very generous people ... Some of them are today with us :) ... GDR G´eom´etrie Stochastique Aim : ◮ network of scientists ◮ no obligations at all ... ◮ joining mathematicians sharing common research interests but not only ... also the scientists from the corresponding application domains ... Contact : ◮ Pierre.Calka@univ-rouen.fr ◮ web page : http://gdr-geostoch.math.cnrs.fr/
© Copyright 2025