
WOMAT
Workshop On Multivariate Analysis Today
Programme and Book of abstracts
Scientific organisers: Frank Critchley (OU), Bing Li (Penn State), Hannu Oja (Turku)
Local organisers: Sara Griffin, Tracy Johns, Radka Sabolova, Germain Van Bever
Contents
Programme
Talk abstracts
Yanyuan Ma: A Validated Information Criterion (VIC) to Find the Structural Dimension
João Branco: High dimensionality: the trouble with Mahalanobis distance
Tim Cannings: Random projection ensemble classification
Kjersti Aas: Pair-copula constructions–even more flexible than copulas
Sara Fontanella: A Bayesian approach to sparse latent variables modelling: Factor Analysis and Multidimensional Item Response Theory
Shahin Tavakoli: Dynamics of DNA Minicircles in Motion via Fourier Analysis of Functional Time Series
Lutz Duembgen: New Algorithms for M-Estimation of Multivariate Scatter and Location
Jim Smith: Chain event graphs for discrete multivariate processes
John Kent: Some new perspectives on partial least squares
Poster abstracts
Comparison of statistical methods for multivariate outliers detection
On point estimation of the abnormality of a Mahalanobis distance
Sparse Linear Discriminant Analysis with Common Principal Components
Recovering Fisher linear discriminant subspace by Invariant Coordinate Selection
Hilbertian Fourth Order Blind Identification
Programme
9:00 Yanyuan Ma (University of South Carolina): A Validated Information Criterion to Find the Structural Dimension
9:30 João Branco (CEMAT, IST, Lisbon): High dimensionality: the trouble with Mahalanobis distance
10:00 Tim Cannings (Cambridge): Random projection ensemble classification
10:30 Coffee & Poster Session
11:00 Kjersti Aas (Norwegian Computing Centre): Pair-copula constructions–even more flexible than copulas
11:30 Sara Fontanella (The Open University): A Bayesian approach to sparse latent variables modelling: Factor Analysis and Multidimensional Item Response Theory
12:00 Shahin Tavakoli (Cambridge): Dynamics of DNA Minicircles in Motion via Fourier Analysis of Functional Time Series
12:30 Lutz Duembgen (Bern): New Algorithms for M-Estimation of Multivariate Scatter and Location
13:00 Lunch and Poster Session
14:30 Jim Smith (Warwick): Chain event graphs for discrete multivariate processes
15:00 John Kent (Leeds): Some new perspectives on partial least squares
15:30 Roundtable Discussion
16:00 Tea and Departures
Talk abstracts
A Validated Information Criterion to Find the Structural Dimension
Yanyuan Ma, University of South Carolina
E-mail: yanyuanma@stat.sc.edu
A crucial component of sufficient dimension reduction is determining the structural dimension of the reduction model. We propose a novel method based on an information criterion for this purpose. Its special feature is that the goodness of fit of the current model is evaluated using an enlarged candidate model. Although the procedure does not require estimation under the enlarged model of dimension k + 1, the decision on how well the current model of dimension k fits relies on the validation provided by the enlarged model. Hence the name validated information criterion, denoted VIC(k). The method differs from existing model selection methods based on information criteria: it breaks free from the dependence on the connection between dimension reduction models and their corresponding matrix eigenstructures, a connection that relies heavily on a linearity condition we no longer assume. We prove its consistency and demonstrate its finite-sample performance numerically. (Joint work with Xinyu Zhang.)
High dimensionality: the trouble with Mahalanobis distance
João Branco, Ana M. Pires, CEMAT, Instituto Superior Técnico, Lisbon
The recent massive production of high-dimensional data has brought great difficulties and concomitant challenges to statistics, since its usual methods were not designed to cope with this kind of data. High dimensionality triggers the curse of dimensionality, and the unexpected behaviour of some statistical tools may surprise even those aware of the intricacies of spaces with a large number of dimensions.
We look at the Mahalanobis distance, a tool that is crucial to the functioning of traditional multivariate statistical methods, and see how it behaves as the dimension p approaches the sample size n and when p is greater than n. Can the Mahalanobis distance keep the fundamental role in high-dimensional spaces that it has in low-dimensional spaces (p ≪ n)? And if it cannot, what are the consequences? We will attempt to answer these questions.
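One way to see the trouble numerically (our illustration, not taken from the talk): with the sample mean and the unbiased sample covariance S, the in-sample squared Mahalanobis distances always sum to (n - 1)p, and once p = n - 1 every observation lies at exactly the same squared distance (n - 1)²/n from the mean, so the distance no longer separates typical from atypical points. For p ≥ n, S is singular and the distance is undefined. The helper `sample_mahalanobis_sq` and the simulated Gaussian data are hypothetical.

```python
import numpy as np


def sample_mahalanobis_sq(X):
    """Squared Mahalanobis distances of the rows of X from the sample mean,
    using the unbiased sample covariance (divisor n - 1).
    Hypothetical helper for illustration; requires p < n so S is invertible."""
    Xc = X - X.mean(axis=0)
    S = np.cov(X, rowvar=False)          # p x p, singular when p >= n
    W = np.linalg.solve(S, Xc.T).T       # rows are S^{-1} (x_i - mean)
    return np.einsum("ij,ij->i", Xc, W)


rng = np.random.default_rng(0)
n = 20
for p in (2, 10, n - 1):
    d2 = sample_mahalanobis_sq(rng.standard_normal((n, p)))
    # The sum stays at (n - 1) p while the spread of the distances collapses.
    print(f"p={p:2d}  sum={d2.sum():7.2f}  min={d2.min():6.2f}  max={d2.max():6.2f}")
```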
Random projection ensemble classification
Timothy I. Cannings and Richard J. Samworth, Statistical Laboratory, University of Cambridge
We introduce a very general method for high-dimensional classification, based on a careful combination of the results of applying an arbitrary base classifier to random projections of the feature vectors into a lower-dimensional space. In one special case presented here, the random projections are divided into non-overlapping blocks, and within each block we select the projection yielding the smallest estimate of the test error. Our random projection ensemble classifier then aggregates the results of applying the base classifier to the selected projections, with a data-driven voting threshold to determine the final assignment. We provide a theoretical justification for the methodology, and a simulation comparison with several other popular high-dimensional classifiers reveals its excellent finite-sample performance.
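The block-and-select scheme can be sketched as follows, under assumptions of ours that are not taken from the talk: the base classifier is a nearest class centroid rule, the test error is estimated by the training (resubstitution) error, labels are 0 and 1, and the voting threshold is chosen on a grid to minimise the ensemble's training error. All function names are hypothetical.

```python
import numpy as np


def haar_projection(p, d, rng):
    """Random p x d matrix with orthonormal columns (QR of a Gaussian matrix)."""
    Q, R = np.linalg.qr(rng.standard_normal((p, d)))
    return Q * np.sign(np.diag(R))  # sign fix makes the distribution Haar


def fit_centroids(Z, y):
    # Hypothetical base classifier: nearest class centroid (labels 0 and 1).
    return np.stack([Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)])


def predict_centroids(C, Z):
    dist = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def ensemble_votes(members, X):
    """Proportion of the selected projections that vote for class 1."""
    return np.mean([predict_centroids(C, X @ A) for A, C in members], axis=0)


def rp_ensemble_fit(X, y, d=2, B1=50, B2=20, seed=0):
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(B1):                  # B1 non-overlapping blocks ...
        best_err, best = np.inf, None
        for _ in range(B2):              # ... of B2 random projections each
            A = haar_projection(X.shape[1], d, rng)
            C = fit_centroids(X @ A, y)
            # Hypothetical test-error estimate: resubstitution (training) error.
            err = np.mean(predict_centroids(C, X @ A) != y)
            if err < best_err:
                best_err, best = err, (A, C)
        members.append(best)
    # Data-driven voting threshold alpha: minimise the ensemble's training error.
    votes = ensemble_votes(members, X)
    grid = np.linspace(0.0, 1.0, 101)
    errors = [np.mean((votes > a).astype(int) != y) for a in grid]
    return members, grid[int(np.argmin(errors))]


def rp_ensemble_predict(model, X):
    members, alpha = model
    return (ensemble_votes(members, X) > alpha).astype(int)


rng = np.random.default_rng(1)
n, p = 200, 20
y = rng.integers(0, 2, n)
X = rng.standard_normal((n, p))
X[:, :3] += 3.0 * y[:, None]   # simulated data: only three features carry signal
model = rp_ensemble_fit(X, y)
print("training accuracy:", np.mean(rp_ensemble_predict(model, X) == y))
```

Any base classifier and any test-error estimator (for example leave-one-out) could be swapped in; the nearest-centroid rule was chosen only to keep the sketch dependency-free.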
Pair-copula constructions–even more flexible than copulas
Kjersti Aas, Norwegian Computing Centre
A copula is a multivariate distribution with standard uniform marginal distributions. While the literature on copulas is substantial, most of the research is still limited to the bivariate case. However, some years ago hierarchical copula-based structures were proposed as an alternative to the standard copula methodology. One of the most promising of these structures is the pair-copula construction (PCC). The PCC modeling scheme is based on a decomposition of a multivariate density into a cascade of pair copulae, applied to the original variables and to their conditional and unconditional distribution functions. Each pair copula can be chosen arbitrarily, and the resulting model can exhibit complex dependence patterns such as asymmetry and tail dependence. In this talk I will give an introduction to pair-copula constructions and apply the methodology to a 19-dimensional financial data set.
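As a small worked instance of the cascade (our illustration, not the talk's model), a three-dimensional D-vine writes the density as f1 f2 f3 · c12(F1, F2) · c23(F2, F3) · c13|2(F(x1|x2), F(x3|x2)). The sketch assumes standard normal margins and Gaussian pair copulas throughout, so it can be checked against the trivariate normal density; in practice each pair copula could be a different family. Function names are hypothetical, and everything is written on the normal-score scale to avoid needing an inverse normal CDF.

```python
import numpy as np


def gauss_copula_density_scores(x, y, rho):
    """Bivariate Gaussian copula density c(u, v; rho), written in terms of the
    normal scores x = Phi^{-1}(u), y = Phi^{-1}(v)."""
    r2 = 1.0 - rho ** 2
    return np.exp(-(rho ** 2 * (x ** 2 + y ** 2) - 2 * rho * x * y) / (2 * r2)) / np.sqrt(r2)


def h_scores(x, given, rho):
    """Gaussian h-function F(u | v) on the normal-score scale:
    Phi^{-1}(h(u | v)) = (x - rho * y) / sqrt(1 - rho^2)."""
    return (x - rho * given) / np.sqrt(1.0 - rho ** 2)


def std_normal_pdf(x):
    return np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)


def dvine3_density(x, rho12, rho23, rho13_2):
    """3-dimensional D-vine with N(0, 1) margins and Gaussian pair copulas
    (both are assumptions of this sketch)."""
    x1, x2, x3 = x
    margins = std_normal_pdf(x1) * std_normal_pdf(x2) * std_normal_pdf(x3)
    tree1 = gauss_copula_density_scores(x1, x2, rho12) * gauss_copula_density_scores(x2, x3, rho23)
    tree2 = gauss_copula_density_scores(h_scores(x1, x2, rho12), h_scores(x3, x2, rho23), rho13_2)
    return margins * tree1 * tree2


# With Gaussian pairs the vine equals a trivariate normal whose correlation rho13
# is recovered from the partial correlation rho13|2.
rho12, rho23, rho13_2 = 0.5, -0.3, 0.4
rho13 = rho13_2 * np.sqrt((1 - rho12 ** 2) * (1 - rho23 ** 2)) + rho12 * rho23
R = np.array([[1.0, rho12, rho13], [rho12, 1.0, rho23], [rho13, rho23, 1.0]])
x = np.array([0.3, -1.2, 0.8])
mvn = np.exp(-0.5 * x @ np.linalg.solve(R, x)) / np.sqrt((2 * np.pi) ** 3 * np.linalg.det(R))
print(np.isclose(dvine3_density(x, rho12, rho23, rho13_2), mvn))  # True
```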
A Bayesian approach to sparse latent variables modelling: Factor
Analysis and Multidimensional Item Response Theory
Sara Fontanella, N. Trendafilov, P. Valentini, L. Fontanella
In recent decades, sparse modeling has inspired many studies in different research fields, such as statistics, machine learning and bioinformatics. Its importance is due to three main advantages: first, it enhances the interpretability of the results; second, it reflects reality, as real-world systems are typically sparse; and third, it improves predictive performance, since sparsity helps prevent overfitting.
In this work, we consider sparse modeling in the context of two multivariate statistical techniques: Factor Analysis (FA) and Multidimensional Item Response Theory (MIRT). They are strongly related to each other in terms of modeling, despite the different types of data they are applied to.
FA is a well-known model-based multivariate technique used to describe observed continuous variables by means of a smaller set of latent factors. Item response theory (IRT) models the probability of a correct response (to a test, questionnaire, etc.) as a function of two disjoint sets of parameters, related to the person and to the item respectively. MIRT is its multidimensional extension.
Both FA and MIRT suffer from solution (factor) indeterminacy. In particular, the main issue to be addressed is the rotational invariance of the final solution: for a given set of data, any orthogonal transformation of the matrix of parameters produces the same covariance structure. In this context, we show that sparsity plays a double role: on the one hand it improves the interpretability of the results, while on the other it allows us to overcome the rotational indeterminacy.
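The indeterminacy is easy to check numerically in the FA model Σ = ΛΛᵀ + Ψ: replacing Λ by ΛQ for any orthogonal Q leaves Σ unchanged, but a sparse Λ generally becomes dense, so a preference for sparsity singles out particular rotations. The loading values below are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p, k = 6, 2
# Hypothetical sparse loading matrix: each variable loads on a single factor.
Lam = np.zeros((p, k))
Lam[:3, 0] = [0.9, 0.8, 0.7]
Lam[3:, 1] = [0.6, 0.8, 0.9]
Psi = np.diag(1.0 - (Lam ** 2).sum(axis=1))   # unique variances (unit total variance)

Q, _ = np.linalg.qr(rng.standard_normal((k, k)))  # a random orthogonal k x k matrix
Lam_rot = Lam @ Q

cov = Lam @ Lam.T + Psi
cov_rot = Lam_rot @ Lam_rot.T + Psi
print(np.allclose(cov, cov_rot))               # True: same covariance structure
print("zero loadings before:", (np.abs(Lam) < 1e-12).sum(),
      "after:", (np.abs(Lam_rot) < 1e-12).sum())
```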
To this end, we follow a Bayesian approach to sparse modeling. The prior belief in sparsity is modeled by a sparsity-inducing prior distribution on the parameters. A popular choice in this context is the spike and slab prior, which offers several computational advantages. A spike and slab prior assumes that the parameters of interest are mutually independent, each following a two-component mixture of a degenerate distribution at zero (the spike), which provides strong shrinkage towards zero, and a flat uniform distribution (the slab), which allows signals to escape strong shrinkage. The performance of the considered methods is evaluated through simulation studies.
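Drawing from the prior itself makes the two components concrete. This is a minimal sketch, assuming a point mass at zero for the spike and a Uniform(-a, a) slab; the inclusion probability `pi_slab` and half-width `slab_halfwidth` are hypothetical values, and in a full model the inclusion indicators would be updated by posterior sampling rather than fixed by the prior.

```python
import numpy as np


def sample_spike_and_slab(size, pi_slab=0.3, slab_halfwidth=2.0, rng=None):
    """Draw i.i.d. parameters from a spike-and-slab prior: with probability
    1 - pi_slab the value is exactly 0 (spike: point mass at zero); otherwise it
    is Uniform(-slab_halfwidth, slab_halfwidth) (slab).  Parameter values are
    hypothetical.  Returns the draws and the latent inclusion indicators."""
    rng = np.random.default_rng() if rng is None else rng
    in_slab = rng.random(size) < pi_slab
    slab = rng.uniform(-slab_halfwidth, slab_halfwidth, size)
    return np.where(in_slab, slab, 0.0), in_slab


theta, gamma = sample_spike_and_slab((6, 2), rng=np.random.default_rng(0))
print(theta)   # a sparse 6 x 2 "loading matrix" drawn from the prior
```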
Dynamics of DNA Minicircles in Motion via Fourier Analysis of
Functional Time Series
Shahin Tavakoli, Statslab, University of Cambridge
We consider the problem of studying the dynamics of DNA minicircles vibrating in solution. At a large scale, DNA minicircles are modelled as elastic rods, and understanding their dynamics can be recast as estimating the second-order structure of a stationary functional time series (FTS). We tackle this problem with a frequency-domain approach, estimating the spectral density operators (or spectra) of the DNA minicircle. We then carry out hypothesis tests to compare the spectra of two specific DNA minicircles. The comparison is broken down into a hierarchy of stages: at a global level, we compare the spectral density operators of the two DNA minicircles across frequencies and curvelength, based on a Hilbert-Schmidt criterion; then we localize any differences to specific frequencies; and finally we further localize any differences along the length of the DNA minicircles, i.e. in physical space. A hierarchical multiple testing approach guarantees control of the averaged false discovery rate over the selected frequencies. In this way we can attribute any differences to distinct dynamic (frequency) and spatial (curvelength) contributions.
Keywords. Functional Data Analysis; Spectral Analysis; DNA Minicircle; Molecular Dynamics;
Multiple Testing.
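A minimal sketch of the frequency-domain step, under assumptions of ours: each curve is observed on a common grid of m points, the spectral density operators are estimated by a periodogram averaged over 2L + 1 neighbouring Fourier frequencies (a Daniell smoother), and the Hilbert-Schmidt norm is approximated by the Frobenius norm on the grid without quadrature weights. The hierarchical tests and FDR control from the talk are not reproduced, and the function names, bandwidth and simulated series are hypothetical.

```python
import numpy as np


def spectral_density_operators(X, L=5):
    """Smoothed-periodogram estimates of the spectral density operators of a
    discretised functional time series.
    X : array (T, m), row t is the curve X_t on a grid of m points.
    Returns (freqs, F) with F[j] the m x m Hermitian estimate at freqs[j]."""
    T, m = X.shape
    Xc = X - X.mean(axis=0)
    D = np.fft.fft(Xc, axis=0)               # D[j] = sum_t X_t exp(-2 pi i j t / T)
    I = np.einsum("ja,jb->jab", D, D.conj()) / (2 * np.pi * T)  # periodogram operators
    # Daniell smoother: average over 2L + 1 neighbouring frequencies (circularly).
    F = sum(np.roll(I, s, axis=0) for s in range(-L, L + 1)) / (2 * L + 1)
    freqs = 2 * np.pi * np.arange(T) / T
    return freqs, F


def hs_distance(F1, F2):
    """Frobenius (grid approximation of Hilbert-Schmidt) distance at each frequency."""
    return np.sqrt((np.abs(F1 - F2) ** 2).sum(axis=(1, 2)))


rng = np.random.default_rng(0)
T, m = 1024, 20
eps = rng.standard_normal((T, m))
Y = np.empty_like(eps)
Y[0] = eps[0]
for t in range(1, T):
    Y[t] = 0.6 * Y[t - 1] + eps[t]           # hypothetical functional AR(1) on the grid
_, F_white = spectral_density_operators(eps)
freqs, F_ar = spectral_density_operators(Y)
d = hs_distance(F_white, F_ar)
print("frequency with largest difference:", freqs[np.argmax(d[: T // 2])])
```

Averaged over all Fourier frequencies, the estimate returns the lag-0 sample covariance divided by 2π, a convenient sanity check on the normalisation.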
New Algorithms for M-Estimation of Multivariate Scatter and Location
Lutz Duembgen, Bern
We present new algorithms for M-estimators of multivariate scatter and location and for symmetrized M-estimators of multivariate scatter. The new algorithms are considerably faster than the fixed-point and other algorithms currently in use. The main idea is to use local parametrizations of scatter matrices via matrix exponentials, together with a corresponding second-order Taylor expansion of the target functional, and to devise a partial Newton-Raphson procedure. For symmetrized M-estimators, we work with incomplete U-statistics to speed up the initial stages of our procedures.
This talk is based on joint work with Klaus Nordhausen (Turku) and Heike Schuhmacher (Bern).
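The new partial Newton-Raphson algorithms are not reproduced here. As a point of reference, this is a minimal sketch of the kind of classical fixed-point iteration they are compared against, for one specific M-estimator: Tyler's scatter estimator with location assumed known and equal to zero. The choice of estimator, the trace normalisation and the function name are our assumptions.

```python
import numpy as np


def tyler_scatter_fixed_point(X, tol=1e-10, max_iter=1000):
    """Classical fixed-point iteration for Tyler's M-estimator of scatter,
    S <- (p / n) * sum_i x_i x_i' / (x_i' S^{-1} x_i), with location assumed
    known (zero) and S rescaled to trace p after each step."""
    n, p = X.shape
    S = np.eye(p)
    for _ in range(max_iter):
        d2 = np.einsum("ij,ij->i", X, np.linalg.solve(S, X.T).T)  # x_i' S^{-1} x_i
        S_new = (p / n) * (X.T / d2) @ X
        S_new *= p / np.trace(S_new)
        if np.linalg.norm(S_new - S, ord="fro") < tol:
            return S_new
        S = S_new
    return S


rng = np.random.default_rng(0)
A = np.array([[2.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.0, 0.3, 0.5]])
Z = rng.standard_normal((500, 3)) @ A.T
X = Z / np.abs(rng.standard_normal((500, 1)))   # heavy-tailed (multivariate Cauchy) sample
S = tyler_scatter_fixed_point(X)
print(np.round(S, 3))
```

Each step costs one linear solve per iteration and the convergence is only linear, which is the bottleneck that second-order (Newton-type) schemes aim to remove.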
Chain event graphs for discrete multivariate processes
Jim Smith, Warwick
Statistical models of multivariate discrete processes often need to express various hypotheses about
how events might unfold and associated hypotheses about the symmetries within these unfoldings. A
natural way to express such hypotheses is via a statistical model on a finite set of atoms structured around
collections of different probability trees with different symmetries. One such family is the class of Chain
Event Graphs. This family contains the class of discrete Bayes Nets as a very special case. It can be shown
that most inferential techniques used for Bayesian Networks readily translate to this new family because of
their modular form. Furthermore, because different models in the class can be associated with families of
polynomials, the inferential implications of one hypothesis against another can be elegantly analysed. In
this talk I will present some recent results associated with CEGs and the challenges they bring to effective
model choice. This is joint work with two PhD students, Christiane Gorgen and Rodrigo Collazo.
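A minimal sketch of the staged-tree idea that underlies chain event graphs (our illustration; vertex names, stage names and probabilities are hypothetical): an event tree for two binary variables in which the two vertices branching on X2 are placed in the same stage, i.e. they share their edge probabilities. That symmetry is exactly the hypothesis that X2 is independent of X1, which is how a Bayes net hypothesis appears as a special case. The step from the staged tree to the chain event graph itself (collapsing vertices whose downstream structure is identical) is not shown.

```python
# Hypothetical staged tree for two binary variables X1, X2.
children = {
    "v0": ["v1", "v2"],                    # root branches on X1: 0 -> v1, 1 -> v2
    "v1": ["x1=0,x2=0", "x1=0,x2=1"],      # v1 and v2 branch on X2
    "v2": ["x1=1,x2=0", "x1=1,x2=1"],
}
# Vertices in the same stage share the same edge (floret) probabilities.
stage_of = {"v0": "u0", "v1": "u1", "v2": "u1"}
stage_probs = {"u0": (0.3, 0.7), "u1": (0.8, 0.2)}


def atom_probabilities(vertex="v0", prob=1.0, out=None):
    """Probability of every root-to-leaf path (atom) as a product of edge probabilities."""
    out = {} if out is None else out
    if vertex not in children:             # a leaf is an atom
        out[vertex] = prob
        return out
    for child, q in zip(children[vertex], stage_probs[stage_of[vertex]]):
        atom_probabilities(child, prob * q, out)
    return out


atoms = atom_probabilities()
print(atoms)
```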
Some new perspectives on partial least squares
John Kent, Department of Statistics, University of Leeds
Partial least squares (PLS) is a regularization technique in high-dimensional multiple regression analysis. It has
at times had a somewhat dubious reputation in mainstream statistics. Part of the reason seems to be
that the methodology was originally proposed in terms of an algorithm, and only later was it noticed that
it can be viewed as an attempt to fit a particular statistical model, the Krylov model.
In this talk we describe how the Krylov model can be formulated most simply in the setting of inverse
regression and how the PLS estimator can be viewed as an approximate MLE for this model. We then
describe some comparisons with the exact MLE under this model.
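The Krylov connection can be made concrete with a standard characterisation (not the inverse-regression MLE of the talk): the PLS1 estimator with k components is the least-squares fit restricted to the Krylov subspace span{s, Ss, ..., S^(k-1)s}, where S = XᵀX and s = Xᵀy for centred data. A minimal sketch, with a hypothetical function name:

```python
import numpy as np


def pls_krylov(X, y, k):
    """PLS1 coefficient estimate with k components, computed as least squares
    restricted to the Krylov subspace span{s, S s, ..., S^{k-1} s},
    where S = X'X and s = X'y for centred X and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    S, s = Xc.T @ Xc, Xc.T @ yc
    K = np.empty((X.shape[1], k))
    v = s / np.linalg.norm(s)
    for j in range(k):
        K[:, j] = v
        v = S @ v
        v /= np.linalg.norm(v)     # rescaling does not change the span
    Q, _ = np.linalg.qr(K)         # orthonormal basis of the Krylov subspace
    # Minimise ||yc - Xc Q c||^2 over c, then map back: beta = Q c.
    c = np.linalg.solve(Q.T @ S @ Q, Q.T @ s)
    return Q @ c


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.standard_normal(50)
for k in (1, 2, 4):
    print(k, np.round(pls_krylov(X, y, k), 3))
```

With k equal to the number of predictors the Krylov subspace is generically the whole space and the estimate coincides with ordinary least squares, while k = 1 gives a rescaled version of Xᵀy.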
Poster abstracts
Comparison of statistical methods for multivariate outliers detection
Aurore Archimbaud¹, Klaus Nordhausen² & Anne Ruiz-Gazen¹
¹ Gremaq (TSE), Université Toulouse 1 Capitole. E-mail: aurore.archimbaud@ut-capitole.fr, anne.ruiz-gazen@tse-fr.eu
² Department of Mathematics and Statistics, University of Turku. E-mail: klaus.nordhausen@utu.fi
In this poster, we are interested in detecting outliers, such as manufacturing defects, in multivariate numerical data sets. Several unsupervised methods based on robust and non-robust covariance matrix estimators exist in the statistical literature. Our first aim is to exhibit the links between three outlier detection methods: the Invariant Coordinate Selection method as proposed by Caussinus and Ruiz-Gazen (1993) and generalized by Tyler et al. (2009), the method based on the Mahalanobis distance as detailed in Rousseeuw and Van Zomeren (1990), and the robust Principal Component Analysis (PCA) method with its diagnostic plot as proposed by Hubert et al. (2005).
Caussinus and Ruiz-Gazen (1993) proposed a Generalized PCA which diagonalizes one scatter matrix relative to another, $V_1 V_2^{-1}$, where $V_2$ is a more robust covariance estimator than $V_1$, the usual empirical covariance estimator. These authors compute scores by projecting all the observations $V_2^{-1}$-orthogonally onto some of the components, and high scores are associated with potential outliers. We note that computing Euclidean distances between observations using all the components is equivalent to computing robust Mahalanobis distances with respect to $V_2$ on the initial data. Tyler et al. (2009) generalized this method and called it Invariant Coordinate Selection (ICS). In contrast to Caussinus and Ruiz-Gazen (1993), they diagonalize $V_1^{-1} V_2$, which leads to the same eigenelements but to different scores that are proportional to each other. As explained in Tyler et al. (2009), the method is equivalent to a robust PCA with scatter matrix $V_2$ after making the data spherical using $V_1$. However, the Euclidean distances between observations based on all the ICS components now correspond to Mahalanobis distances with respect to $V_1$, not $V_2$.
Note that each of the three methods yields a score for each observation, and high scores are associated with potential outliers. We compare the three methods on simulated and real data sets and show in particular that ICS is the only one of the three that allows a selection of the components relevant for detecting outliers.
Keywords. Invariant Coordinate Selection; Mahalanobis distance; robust PCA.
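A minimal ICS sketch that also checks the remark above about distances: with $V_1$ the sample covariance and ICS directions normalised so that $B^\top V_1 B = I$, the squared Euclidean norm of an observation's full ICS score vector equals its squared Mahalanobis distance with respect to $V_1$. Taking $V_2$ to be the fourth-moment scatter COV4 (the one used by FOBI) is our assumption; the poster also considers robust choices. Function names and the simulated outliers are hypothetical.

```python
import numpy as np


def cov4(X):
    """Fourth-moment scatter COV4: covariance-type matrix weighted by squared
    Mahalanobis distances (w.r.t. the sample covariance), divided by p + 2."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    r2 = np.einsum("ij,ij->i", Xc, np.linalg.solve(np.cov(X, rowvar=False), Xc.T).T)
    return (Xc.T * r2) @ Xc / (n * (p + 2))


def ics(X, scatter2=cov4):
    """ICS with V1 = sample covariance and V2 = scatter2(X) (assumed COV4).
    Returns (scores, eigenvalues, B); columns of B solve V2 b = lambda V1 b
    and satisfy B' V1 B = I."""
    V1 = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(V1)
    V1_isqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T      # symmetric V1^{-1/2}
    lam, U = np.linalg.eigh(V1_isqrt @ scatter2(X) @ V1_isqrt)
    order = np.argsort(lam)[::-1]                            # decreasing eigenvalues
    B = V1_isqrt @ U[:, order]
    scores = (X - X.mean(axis=0)) @ B
    return scores, lam[order], B


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
X[:5] += 6.0                            # five hypothetical outliers
scores, lam, B = ics(X)
Xc = X - X.mean(axis=0)
d2_mahal = np.einsum("ij,ij->i", Xc, np.linalg.solve(np.cov(X, rowvar=False), Xc.T).T)
print(np.allclose((scores ** 2).sum(axis=1), d2_mahal))   # True
print(np.argsort(-np.abs(scores[:, 0]))[:5])  # largest |first-component| scores
```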
Bibliography
[1] Caussinus, H. and Ruiz-Gazen, A. (1993), Projection pursuit and generalized principal component
analysis, In New Directions in Statistical Data Analysis and Robustness (eds S. Morgenthaler, E.
Ronchetti and W. A. Stahel), 35–46, Basel: Birkhäuser.
[2] Hubert, M., Rousseeuw, P. J. and Vanden Branden, K. (2005), ROBPCA: a new approach to robust
principal component analysis, Technometrics, 47(1), 64–79.
[3] Rousseeuw, P. J. and Van Zomeren, B. C. (1990), Unmasking multivariate outliers and leverage points,
Journal of the American Statistical Association, 85(411), 633–639.
[4] Tyler, D. E., Critchley, F., Dümbgen, L. and Oja, H. (2009), Invariant coordinate selection, Journal
of the Royal Statistical Society: Series B (Statistical Methodology), 71(3), 549–592.
On point estimation of the abnormality of a Mahalanobis distance
Fadlalla G. Elfadaly¹, Paul H. Garthwaite¹ & John R. Crawford²
¹ The Open University. Email: Fadlalla.Elfadaly@open.ac.uk
² University of Aberdeen
When a patient appears to have unusual symptoms, measurements or test scores, the degree to which this patient is unusual becomes of interest. For example, clinical neuropsychologists sometimes need to assess how a patient with a brain disorder or a head injury differs from the general population or some particular subpopulation. This is usually based on the patient’s scores in a set of tests that measure different abilities. The question then is: “What proportion of the population would give a set of test scores as extreme as that of the patient?” The abnormality of the patient’s profile of scores is expressed in terms of the Mahalanobis distance between that profile and the average profile of the normative population. The degree to which the patient’s profile is unusual can then be equated to the proportion of the population who would have a larger Mahalanobis distance than the individual. This presentation focuses on estimating this proportion from a normative sample. The estimators examined include plug-in maximum likelihood estimators, medians, the posterior mean from a Bayesian probability matching prior, an estimator derived from a Taylor expansion, and two forms of polynomial approximation, one based on Bernstein polynomials and one on a quadrature method. Simulations show that some estimators, including the commonly used plug-in maximum likelihood estimators, can have substantial bias for small or moderate sample sizes. The polynomial approximations yield estimators with low bias, with the quadrature method marginally preferable to Bernstein polynomials. Moreover, simulations show that the median estimators have nearly zero median error. The median estimator has much to recommend it when unbiasedness is not of paramount importance, while the quadrature method is recommended when bias is the dominant issue.
Keywords. Bernstein polynomials; Mahalanobis distance; median estimator; quadrature approximation; unbiased estimation.
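For orientation, this is a minimal sketch of the plug-in maximum likelihood estimator only (the baseline that the abstract reports can be substantially biased in small samples): under multivariate normality with known parameters, the squared Mahalanobis distance follows a chi-square distribution with k degrees of freedom, k being the number of tests, and the plug-in estimator substitutes the ML estimates of the mean and covariance. The chi-square tail is coded with an exact recurrence to keep the sketch dependency-free (`scipy.stats.chi2.sf` computes the same quantity). The quadrature, Bernstein and median estimators are not reproduced, and the function names and simulated data are hypothetical.

```python
import math

import numpy as np


def chi2_sf(x, k):
    """Upper tail P(chi^2_k > x) for integer k >= 1, via the regularised upper
    incomplete gamma Q(k/2, x/2) and the recurrence
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if k % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    while a < k / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def plugin_abnormality(sample, patient):
    """Plug-in ML estimate of the proportion of a normal population whose squared
    Mahalanobis distance from the mean exceeds the patient's."""
    k = sample.shape[1]
    mean = sample.mean(axis=0)
    cov_ml = np.cov(sample, rowvar=False, bias=True)   # ML covariance, divisor n
    diff = np.asarray(patient, dtype=float) - mean
    d2 = float(diff @ np.linalg.solve(cov_ml, diff))
    return chi2_sf(d2, k)


rng = np.random.default_rng(0)
normative = rng.standard_normal((40, 3))     # hypothetical normative sample, 3 tests
print(plugin_abnormality(normative, [2.0, -1.5, 1.0]))
```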
Sparse Linear Discriminant Analysis with Common Principal
Components
Tsegay G. Gebru & Nickolay T. Trendafilov
Department of Mathematics and Statistics, The Open University, UK
Linear discriminant analysis (LDA) is a commonly used method for classifying a new observation into one of g populations. However, in high-dimensional classification problems classical LDA performs poorly. When the number of variables is much larger than the number of observations, the within-group covariance matrix is singular, which leads to unstable results. In addition, the large number of input variables calls for considerable reduction, which nowadays is addressed by producing sparse discriminant functions.
Here, we propose a method to tackle the (low-sample) high-dimensional discrimination problem using common principal components (CPC). LDA based on CPC is a general approach to the problem because it does not require the assumption of equal covariance matrices across groups. We find sparse CPCs by modifying the stepwise estimation method proposed by Trendafilov (2010). Our aim is to find a few important sparse discriminant vectors that are easily interpretable. For numerical illustration, the method is applied to some well-known real data sets and compared with other methods for sparse LDA.
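The singularity is a rank argument: each group's centred data matrix has rank at most n_j - 1, so the pooled within-group scatter matrix has rank at most n - g, which is far below p when p ≫ n. A quick check on simulated data (sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, g = 30, 100, 3                       # far more variables than observations
X = rng.standard_normal((n, p))
labels = np.repeat(np.arange(g), n // g)   # three groups of 10 observations

# Pooled within-group scatter matrix: sum over groups of centred cross-products.
W = np.zeros((p, p))
for j in range(g):
    Xj = X[labels == j] - X[labels == j].mean(axis=0)
    W += Xj.T @ Xj

print(W.shape, np.linalg.matrix_rank(W))   # rank at most n - g = 27 < p, so W is singular
```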
Bibliography
[1] Trendafilov, N.T. Stepwise estimation of common principal components. Computational Statistics and
Data Analysis 54:3446-3457, 2010.
Recovering Fisher linear discriminant subspace by Invariant Coordinate Selection
Radka Sabolová¹,², H. Oja³, G. Van Bever¹ & F. Critchley¹
¹ MCT Faculty, The Open University, Milton Keynes
² Email: radka.sabolova@open.ac.uk
³ Turku University
It is a remarkable fact that, using any pair of scatter matrices, invariant coordinate selection (ICS) can recover the Fisher linear discriminant subspace without knowing group membership (see [5]). The subspace is found using two different scatter matrices S1 and S2 and the eigendecomposition of one scatter matrix relative to the other.
In this poster, we focus on the two-group normal subpopulation problem and discuss the optimal choice of such a pair of scatter matrices in terms of asymptotic accuracy of recovery. The first matrix is fixed as the covariance matrix, while the second is chosen within a one-parameter family based on powers of the squared Mahalanobis distance, indexed by α ∈ ℝ. Special cases of this approach include Fourth Order Blind Identification (FOBI, see [1]) and Principal Axis Analysis (PAA, see [4]).
The use of two scatter matrices in discrimination was studied by [2] and later elaborated in [3], who proposed generalised PCA (GPCA) based on a family of scatter matrices with decreasing weight functions of a single real parameter β > 0. They then discussed the appropriate choice of β, while concentrating on outlier detection.
Their form of weight function and the consequent restriction to β > 0 imply downweighting outliers. In our approach, on the other hand, allowing any α ∈ ℝ also lets us upweight outliers. Further, in addition to the outlier case, we may study mixtures of subpopulations.
Theoretical results are underpinned by an extensive numerical study.
The UK-based authors thank the EPSRC for their support under grant EP/L010429/1.
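A minimal sketch of the recovery, under assumptions of ours: S1 is the sample covariance, and S2(α) = (1/n) Σ r_i^(2α) z_i z_iᵀ is computed in the coordinates z made spherical by S1, with r_i² the squared Mahalanobis distance. With α = 1 this is the FOBI fourth-moment scatter up to the constant p + 2, while α = 0 returns a multiple of the identity and carries no information. Picking the eigenvector whose eigenvalue lies furthest from the median eigenvalue is our heuristic for the two-group case, not the poster's optimal choice of α; the function name and simulated data are hypothetical.

```python
import numpy as np


def ics_alpha_direction(X, alpha):
    """ICS with S1 = covariance and S2(alpha) = mean of r_i^(2 alpha) z_i z_i'
    in coordinates made spherical by S1.  Returns the unit direction (original
    coordinates) whose eigenvalue is furthest from the median eigenvalue
    (a hypothetical selection rule for the two-group case)."""
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    S1_isqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ S1_isqrt
    r2 = np.einsum("ij,ij->i", Z, Z)                 # squared Mahalanobis distances
    S2 = (Z.T * r2 ** alpha) @ Z / len(Z)
    lam, U = np.linalg.eigh(S2)
    j = np.argmax(np.abs(lam - np.median(lam)))
    b = S1_isqrt @ U[:, j]
    return b / np.linalg.norm(b)


# Hypothetical balanced two-group normal mixture with a common within-group covariance.
rng = np.random.default_rng(0)
n = 4000
Sw = np.array([[1.0, 0.5, 0.0, 0.0], [0.5, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]])
delta = np.array([4.0, 0.0, 0.0, 0.0])
groups = rng.integers(0, 2, n)
X = rng.multivariate_normal(np.zeros(4), Sw, n) + np.outer(groups, delta)

fisher = np.linalg.solve(Sw, delta)
fisher /= np.linalg.norm(fisher)                     # Fisher direction Sw^{-1} delta
b = ics_alpha_direction(X, alpha=1.0)                # group labels are not used
print("|cos(angle)| with Fisher direction:", abs(b @ fisher))
```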
Bibliography
[1] Cardoso, J.-F. (1989). Source separation using higher moments. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2109–2112.
[2] Caussinus, H. and Ruiz-Gazen, A. (1993). Projection pursuit and generalized principal component analyses. New Directions in Statistical Data Analysis and Robustness, 35–46.
[3] Caussinus, H., Fekri, M., Hakam, S. and Ruiz-Gazen, A. (2003). A monitoring display of multivariate outliers. Computational Statistics & Data Analysis, 44, 237–252.
[4] Critchley, F., Pires, A. and Amado, C. (2006). Principal Axis Analysis. Technical report, The Open University.
[5] Tyler, D., Critchley, F., Dümbgen, L. and Oja, H. (2009). Invariant co-ordinate selection. J. R. Statist. Soc. B, 71, 549–592.
Hilbertian Fourth Order Blind Identification
Germain Van Bever¹,², B. Li³, H. Oja⁴, R. Sabolová¹ & F. Critchley¹
¹ MCT Faculty, The Open University, Milton Keynes
² Email: germain.van-bever@open.ac.uk
³ Penn State University
⁴ Turku University
In the classical Independent Component (IC) model, the observations $X_1, \ldots, X_n$ are assumed to satisfy $X_i = \Omega Z_i$, $i = 1, \ldots, n$, where the $Z_i$'s are i.i.d. random vectors with independent marginals and $\Omega$ is the mixing matrix. Independent component analysis (ICA) encompasses all methods aiming at unmixing $X = (X_1, \ldots, X_n)$, that is, estimating a (non-unique) unmixing matrix $\Gamma$ such that $\Gamma X_i$, $i = 1, \ldots, n$, has independent components. Cardoso [1] introduced the celebrated Fourth Order Blind Identification (FOBI) procedure, which estimates $\Gamma$ from the regular covariance matrix and a scatter matrix based on fourth moments. Building on robustness considerations and generalizing FOBI, Invariant Coordinate Selection (ICS, [2]) was originally introduced as an exploratory tool generating an affine invariant coordinate system. The resulting coordinates, however, have been proved to be independent in most IC models.
Nowadays, functional data (FD) occur more and more often in practice, yet relatively few statistical techniques have been developed to analyze this type of data (see, for example, [3]). Functional PCA (FPCA) is one such technique; it focuses on dimension reduction, with very little theoretical consideration.
We propose an extension of the FOBI methodology to Hilbertian data, with FD as the leading example throughout. When dealing with distributions on Hilbert spaces, two major problems arise: (i) the scatter operator is, in general, non-invertible and (ii) there may not exist two different affine equivariant scatter functionals. Projections onto finite-dimensional subspaces and Karhunen-Loève expansions are used to overcome these issues and provide an alternative to FPCA. More importantly, we show that the proposed construction is Fisher consistent for the independent components of an appropriate Hilbertian IC model and enjoys the affine invariance property.
This work is supported by the EPSRC grant EP/L010429/1.
Keywords. Invariant Coordinate Selection; Functional Data; Symmetric Component Analysis; Independent Component Analysis.
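The finite-dimensional ingredients can be sketched as follows; this is not the poster's Hilbertian construction. Under assumptions of ours, curves are observed on a common grid, they are projected onto their first q empirical Karhunen-Loève (FPCA) directions with q known, and classical FOBI is then applied to the scores. The simulated sources have distinct kurtoses (Rademacher, Gaussian, Laplace), as FOBI requires. Function names, basis, mixing matrix and sample sizes are hypothetical.

```python
import numpy as np


def kl_scores(curves, q):
    """Scores on the first q empirical Karhunen-Loeve (FPCA) directions of
    curves discretised on a common grid (rows = curves)."""
    C = curves - curves.mean(axis=0)
    _, V = np.linalg.eigh(np.cov(C, rowvar=False))
    return C @ V[:, ::-1][:, :q]          # eigh sorts ascending, so reverse


def fobi(S):
    """Classical FOBI on an (n, q) score matrix: whiten with the covariance, then
    rotate by the eigenvectors of the fourth-moment scatter of the whitened data."""
    Sc = S - S.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Sc, rowvar=False))
    Z = Sc @ (evecs @ np.diag(evals ** -0.5) @ evecs.T)
    r2 = np.einsum("ij,ij->i", Z, Z)
    _, U = np.linalg.eigh((Z.T * r2) @ Z / len(Z))
    return Z @ U                          # estimated independent components


rng = np.random.default_rng(0)
n, m, q = 10000, 50, 3
t = np.linspace(0.0, 1.0, m)
sources = np.column_stack([
    rng.choice([-1.0, 1.0], n),           # kurtosis 1
    rng.standard_normal(n),               # kurtosis 3
    rng.laplace(0.0, 1 / np.sqrt(2), n),  # kurtosis 6
])
basis = np.stack([np.sqrt(2) * np.sin((k + 1) * np.pi * t) for k in range(q)])  # (q, m)
mixing = rng.standard_normal((q, q))
curves = (sources @ mixing.T) @ basis     # each curve mixes the basis functions
est = fobi(kl_scores(curves, q))
corr = np.abs(np.corrcoef(sources.T, est.T)[:q, q:])
print(np.round(corr, 2))                  # close to a permutation matrix
```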
Bibliography
[1] Cardoso, J.-F. (1989), Source Separation Using Higher Moments Proceedings of IEEE international
conference on acoustics, speech and signal processing 2109-2112.
[2] Tyler, D., Critchley, F., Dümbgen, L. and Oja, H. (2009) Invariant Co-ordinate Selection J. R. Statist.
Soc. B., 71, 549–592.
[3] Ramsay, J. and Silverman, B.W. (2006) Functional Data Analysis 2nd edn. Springer, New York.