Level sets estimation of random compact sets

Level sets estimation of random compact sets
P. Heinrich, R. S. Stoica and C. V. Tran
Universit´e Lille 1 - Laboratoire Paul Painlev´e
Workshop in Spatial Statistics and Image Analysis in Biology
Avignon, May 9-11 2012
Introduction : motivating example
Level sets : a tool for compact random sets averaging
Estimation of level sets
Examples of application
Conclusions and perspectives
A practical application (1)
Pattern detection in spatial data :
◮
the data d : image analysis, epidemiology, galaxy catalogues
◮
detect and characterise the pattern “hidden” in the data :
objects, cluster pattern or filamentary network
◮
hypothesis : the pattern is the outcome γ of a stochastic
process Γ
◮
possible solution in this context : probabilistic modelling and
maximisation
A practical application (2)
Gibbs modelling framework
◮
Markov random fields, marked point processes, etc.
◮
general structure of the probability density :
h(γ|θ) =
exp [−Ud (γ|θ) − Ui (γ|θ)]
α(θ)
and also the necessary mathematical details so that everything
is well defined ...
A practical application (3)
Gibbs modelling framework (continued)
◮
Ud (γ|θ) : this term is related to the objects location in the
data field (inhomogeneous process)
◮
Ui (γ|θ) : this term is related to the object interaction and to
the morphology of the pattern (prior model, regularisation
term)
◮
α(θ) : normalisation constant (not always available
analytically)
◮
pattern estimator :
γ
b = arg max{h(γ|θ)} = arg min{Ud (γ|θ) + Ui (γ|θ)}
γ∈Ω
γ∈Ω
(1)
A practical application (4)
Some concluding remarks
◮
simulated annealing algorithm : convergence towards the
uniform distribution on the solution sub-space given by (1)
◮
the model parameters are not always known ...
◮
the convergence is difficult to be stated
◮
... or the solution is not always unique (continuous models
and/or priors on the model parameters)
◮
⇒ a real need to average the obtained solution in order to
obtain a much more robust solution
Idea : use level sets as a tool for averaging random patterns
Level sets : basic notions and definitions (1)
Random compact sets and coverage function :
◮
◮
◮
(Ω, A, P) : probability space
(W = [0, 1]d , B, ν) : measure space (... where the data field
leaves) with B the corresponding Borel σ−algebra and ν the
Lebesgue measure
C : the class of compact sets in W
A random compact set Γ in W is a random map from Ω to C that
is measurable in the sense
∀C ∈ C,
{ω : Γ(ω) ∩ C 6= ∅} ∈ A
The coverage function is given by :
p(w ) = P(w ∈ Γ)
Level sets : basic notions and definitions (2)
Level or Quantile sets : for α ∈ [0, 1] the (deterministic) α−level
set is
Qα = {w ∈ W : p(w ) > α}
or for simplicity {p > α}.
Vorob’ev expectation : the Borel set EV Γ such that
ν (EV Γ) = E [ν (Γ)]
and
{p > α∗ } ⊂ EV Γ ⊂ {p ≥ α∗ },
where
α∗ = inf{α ∈ [0, 1] : ν (Qα ) ≤ E [ν (Γ)]}.
The Vorob’ev expectation is the α∗ −level set that matches the
mean volume of Γ.
Some known results and properties (1)
ν(W ) 6
F− (α0 )
⊂
F (α0 )
•
-
0
α0
α1
α2 1
Figure: Behaviour of function F (α) = ν (Qα )
Remarks :
◮
F is c`adl`ag with constant regions (plateaux)
◮
constant regions of p(w ) ⇒ discontinuities of ν (Qα )
◮
constant regions of ν (Qα ) ⇒ discontinuities of p(w )
Some known results and properties (2)
Vorob’ev expectation :
◮
it is unique provided F (α) = ν (Qα ) = ν ({p > α}) is
continuous at α∗ ; then we have
EV Γ = {p ≥ α∗ }
◮
it minimises
B → E [ν (B△Γ)]
under the constraint ν (B) = E [ν (Γ)], where △ is the
symmetric difference (Molchanov, 05).
More generally, on level sets :
◮
p(w ) not always available in an analytical closed form
◮
the level sets cannot be computed for all the points w ∈ W
⇒ discretisation should be considered
Plug-in estimation (1)
Definition
◮
consider n i.i.d. copies Γ1 , Γ2 , . . . , Γn of Γ
◮
the empirical counterpart of p(w )
n
pn (w ) =
1X
1{w ∈Γi }
n
i =1
◮
the plug-in estimator
Qn,α = {pn > α}
Plug-in estimation (2)
Properties :
the problem was deeply studied in the literature
◮
◮
some references : (Molchanov, 87, 90, 98), (Cuevas, 97, 06)
and many others
L1 −consistency under weak assumptions → p(w ) does not
need to be continuous
◮
Hausdorff distance : similar consistency results using some
extra assumptions
◮
rates of convergence and asymptotic normality : regularity
conditions on p(w )
Aim of our work
◮
plug-in estimator that takes into account the discretisation
effects
◮
estimator for the Vorob’ev expectation → its definition
contains another quantity that need approximation ...
A new level-set estimator (1)
Discretisation : for any Borel set B in W and r ∈ 2−N , its
corresponding grid approximation is
G
Br =
[w , w + r )d .
w ∈B∩r Zd
Regularity : the “upper box counting dimension” of ∂B is
dimbox (∂B) = lim sup
r →0
log Nr (∂B)
,
− log r
with
Nr (∂B) = Card{w ∈ r Zd : [w , w + r )d ∩ ∂B 6= ∅}.
A new level-set estimator (2)
Proposition
Assume that dimbox (∂B) < d. For all ε > 0, there exists rε such
that
0 < r < rε ⇒ ν (B r △B) ≤ r d−dimbox (∂B)−ε .
Proposition
Assume that dimbox (∂Γ) ≤ d − κ with probability one for some
κ > 0. For all α such that ν ({p = α}) = 0,
(i) with probability 1,
r
lim ν Qn,α
△Qα = 0
r →0
n→∞
(ii) for all ε > 0,
2
r
E ν Qn,α
△Qα ≤ r κ + 2e−2nε + F (α − ε) − F (α + ε).
The proof is an extension of the result in (Cuevas, 06).
Vorob’ev expectation estimator (1)
Kovyazin’s mean : the empirical counter-part of the Vorob’ev
expectation. That is the Borel set Kn such that
n
ν (Kn ) =
1X
ν (Γi )
n
i =1
and
{pn > α∗n } ⊂ Kn ⊂ {pn ≥ α∗n },
where
α∗n = inf{α ∈ [0, 1] : ν ({pn > α}) ≤ ν (Kn }.)
Theorem
Assume that ν ({p = α∗ }) = 0. Then, with probability one,
lim ν (Kn △EV Γ) = 0.
n→∞
The proof revisits the result given by (Kovyazin, 86).
Vorob’ev expectation estimator (2)
Grid approximation of Kn : this is the estimator we propose. That
is the Borel set Kn,r such that
{pn > α∗n,r }r ⊂ Kn,r ⊂ {pn ≥ α∗n,r }r ,
where
α∗n,r = inf{α ∈ [0, 1] : ν ({pn > α}r ) ≤ ν (Kn )}.
Some remarks :
◮
quite strong assumption : ν (Kn ) is computed exactly ...
◮
an alternative idea may consider directly the discretisation of
Γ or Kn , but this does not guarantee a mean volume equal to
ν (Kn ) ...
◮
still, in practice ...
Consistency of the Vorob’ev estimator
Theorem
Assume that dimbox (∂Γ) ≤ d − κ with probability one for some
κ > 0, and that ν ({p = α∗ }) = 0 and ν ({p = β ∗ }) = 0 with
β ∗ = sup{α ∈ [0, 1] : ν ({p > α}) ≥ E [ν (Γ)]}. Then, we have
almost surely
lim ν (Kn,r △EV Γ) = 0.
r →0
n→∞
Proof.
We write that
ν (Kn,r △EV Γ) ≤ ν (Kn,r △Kn ) + ν (Kn △EV Γ)
and use Theorem 1 and two lemmas to conclude. For technical
details, a draft is available on demand ...
Cosmic filaments : simulated annealing detection
(Stoica, Martinez and Saar, 07,10)
12
10
8
6
4
2
100
90
80
70
50
60
50
40
40
30
30
20
a)
20
10
10
10
5
100
0
90
80
50
70
40
60
50
30
40
20
b)
30
10
20
0
10
Figure: a) Original data. b) Cylinder configuration detected.
Cosmic filaments : level sets averaging
14000
12000
10000
8000
6000
4000
2000
a)
0
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
b)
Figure: a) Behaviour of the level set volume. b) Estimated Vorob’ev
expectation.
Epidemiology (veterinary context)
Disease : sub-clinical mastitis for diary herds
◮
points → farms location
◮
to each farm → disease score (continuous variable)
◮
clusters pattern detection : regions where there is a lack of
hygiene or rigour in farm management
300
250
200
150
100
50
0
0
50
100
150
200
250
300
350
Figure: The spatial distribution of the farms outlines almost the entire
French territory (INRA Avignon).
Epidemiology : sub-clinical mastitis data
(Stoica, Gay and Kretzschmar, 07)
1
300
0.9
50
0.8
250
0.7
100
200
0.6
150
0.5
150
0.4
200
100
0.3
250
0.2
50
0.1
300
a)
0
50
100
150
200
250
300
350
b)
0
50
100
150
200
250
300
Figure: Disease data scores and coordinates for the year 1996 : a) cluster
pattern (disk configuration) detected ; b) Level sets.
0
Conclusion :
◮
estimator including the discretisation effects
◮
averaging the shape of the pattern ...
Perspectives :
◮
... provided the model is correct ...
◮
relax hypotheses
◮
what is the variance of the pattern ?
Acknowledgements :
this work was done together with wonderful co-authors and also
with help of some very generous people ... Some of them are today
with us :) ...
GDR G´eom´etrie Stochastique
Aim :
◮
network of scientists
◮
no obligations at all ...
◮
joining mathematicians sharing common research interests but
not only ... also the scientists from the corresponding
application domains ...
Contact :
◮
Pierre.Calka@univ-rouen.fr
◮
web page : http://gdr-geostoch.math.cnrs.fr/