Econometrics Journal (2003), volume 6, pp. 99–123.
Modelling sample selection using Archimedean copulas
MURRAY D. SMITH
Econometrics and Business Statistics, School of Economics and Political Science, Faculty of
Economics and Business, University of Sydney, Sydney NSW 2006, Australia
E-mail: Murray.Smith@econ.usyd.edu.au
Received: February 2002
Summary By a theorem due to Sklar, a multivariate distribution can be represented in terms
of its underlying margins by binding them together using a copula function. By exploiting
this representation, the ‘copula approach’ to modelling proceeds by specifying distributions
for each margin and a copula function. In this paper, a number of families of copula functions
are given, with attention focusing on those that fall within the Archimedean class. Members
of this class of copulas are shown to be rich in various distributional attributes that are desired
when modelling. The paper then proceeds by applying the copula approach to construct models for data that may suffer from selectivity bias. The models examined are the self-selection
model, the switching regime model and the double-selection model. It is shown that when
models are constructed using copulas from the Archimedean class, the resulting expressions
for the log-likelihood and score facilitate maximum likelihood estimation. The literature on
selectivity modelling is almost exclusively based on multivariate normal specifications. The
copula approach permits selection modelling based on multivariate non-normality. Examples
of self-selection models for labour supply and for duration of hospitalization illustrate the
application of the copula approach to modelling.
Keywords: Selectivity, Self-selection model, Switching regimes model, Double-selection
model, Copula, Sklar’s theorem, Copula representation, Copula approach, Families of copulas, Archimedean, Kendall’s τ .
1. INTRODUCTION
This article sets out to demonstrate the application of the ‘copula approach’ to model specification in the context of binary models designed to account for data selectivity, should it be
present. The binary models in question have had a long history of use in modelling selectivity in
microeconometrics. The self-selection model is discussed in Sections 3 and 4. In Section 5, attention focuses on the application of the copula approach to higher-dimensional sample selection
models, such as the switching regimes model and the double-selection model.
Over the last 30 to 40 years, a large volume of literature on each of the aforementioned sample selection models has been built up in economics and econometrics; see, for example, Vella
(1998) for a recent survey. However, the vast majority of analyses have depended on the statistical
assumption of multivariate normality. Although ubiquitous throughout all facets of econometric
modelling, the adequacy of inference based on the assumption of multivariate normality has often
been questioned, and has often been found to be wanting in the context of sample selection models.
Unfortunately, relaxing multivariate normality by replacing it with an alternative multivariate
© Royal Economic Society 2003. Published by Blackwell Publishing Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main
Street, Malden, MA, 02148, USA.
distribution has received relatively little attention. In the main, this was because of the additional
computational burdens that were expected to arise. Instead, the literature developed by focusing
on semi-parametric and non-parametric versions of these models, where modelling improvements might be brought about by the use of flexible functions of parameters and the covariates of
the random variables; see, for example, the articles in the special edition by Härdle and Manski
(1993). The aim of this article is to return to the issue of replacing multivariate normality with
an alternative multivariate distribution (or, more precisely, a class of multivariate distributions).
The adverse computational consequences are, if anything, mitigated under the proposed method
of model specification: the so-called copula approach.
The copula approach is a modelling strategy whereby a joint distribution is induced by specifying marginal distributions, and a function that binds them together: the copula. The copula
parameterizes the dependence structure of the random variables, thereby capturing all of the
joint behaviour. This then frees the location and scale structures to be parameterized through the
margins, one at a time. Most importantly, the copula approach permits specifications other than
multivariate normality, although it does retain that distribution as a special case.
The copula approach is a relatively new method to economics and econometrics, with a small,
but growing pedigree. For example, Bouyé et al. (2000) demonstrate applications of copulas to
models relevant to finance, paying particular attention to a number of estimation methods other
than maximum likelihood. In a time series context, Patton (2001) uses copulas conditioned on
past information to model exchange rates. Dardanoni and Lambert (2001) exploit the monotonicity properties (stochastic ordering) of bivariate copulas (in their case used to represent the joint
distribution of a country’s pre- and post-tax living standards) to perform paired cross-country
comparisons. The specification method suggested by Lee (1983) for modelling self-selection
provides an example of the copula approach, as will be shown in what follows.
As all multivariate distributions have a copula representation (Sklar’s theorem; see
Section 2), it might seem that the copula approach is nothing more than the reworking of an old
theme. Might the advantage derived by the copula approach simply be that econometricians are
better practiced at modelling univariate distributions than they are multivariate ones? The ideal,
of course, is to choose the right statistical model a priori, and hence the right copula. However,
when working with empirical data it is rare to have such insight. The specification problem is
further compounded in most sample selection models due to latency of the underlying utilitarian
variables, and the presence of covariates. When faced with such difficulties, it is advantageous to
have at hand a range of potential candidate models from which a preferred fit can emerge. Under
a copula approach, families of models can be constructed according to classes of copula functions: of particular interest here is the class of Archimedean copulas. Archimedean copulas can
display a range of distributional behaviour such as joint asymmetry, excess joint skewness and
joint kurtosis. When applied in the specification of selectivity models, relatively simple formulae
for likelihood and score functions result, thereby facilitating estimation by maximum likelihood
(ML hereafter). In Section 2, the basic elements of copula theory are presented, including those
for the class of Archimedean copulas.
In Section 4, two examples are presented of self-selection models. These relate to labour
supply and to duration of hospitalization, utilizing data from previous studies. The marginal
models in each example are parameterized according to the specification preferred by previous
authors, enabling attention to focus on the fit achieved by various copulas. In this article, standard information criteria (e.g. AIC and BIC) are used for copula choice as the members of the
Archimedean class are, in general, parametrically non-nested. Fortunately, due to fixity of the
margins, the number of parameters does not vary across estimated models, so that the aforementioned information criteria are equivalent to choice based on the maximized value of the
log-likelihood function. In this article, model selection is an a posteriori consideration, focusing
on selection after estimation.
2. COPULA THEORY
2.1. Sklar’s theorem
With a view to the main result that is embodied in Sklar’s theorem, the copula for an n-dimensional multivariate distribution function F with given one-dimensional marginal distribution
functions F1 , . . . , Fn , is the function that binds together the margins in such a manner as to
form precisely the joint distribution function. The action performed by the copula implies that
it serves to represent the dependence characteristics that associate each of the underlying random variables, irrespective of the form the margins take. Yet another perspective on the copula
function concerns its close links to the multivariate uniform distribution (with margins that are
standard uniform); in fact, in this case the copula is equivalent to the joint distribution. Thus, one
use of copulas is in simulation (e.g. Clemen and Reilly (1999)).
To date, most uses of copula theory have concentrated on the study of the association between
random variables and, to a slightly lesser extent, the establishment of limiting (Fréchet) bounds
on distributions. For details on the origins, evolution and properties of copula models and related
properties see Dall’Aglio (1991), Schweizer (1991) and Nelsen (1999).
The main result of interest here is a theorem due to Sklar (given in the following for the
bivariate case). Sklar’s theorem shows that there exists a copula function which acts to represent
the joint cdf of random variables in terms of its underlying one-dimensional margins. Let the
margins $F_1(x_1)$ and $F_2(x_2)$ denote, respectively, the cumulative distribution functions (cdf) of
the random variables $X_1$ and $X_2$; that is, $F_i(x_i) = \Pr(X_i \le x_i)$, where $x_i \in \bar{R}$ ($i = 1, 2$; $\bar{R}$
denotes the extended real line $R \cup \{-\infty, +\infty\}$), and let $F(x_1, x_2) = \Pr(X_1 \le x_1, X_2 \le x_2)$
denote the joint cdf. Then, for some two-place function $C$, the joint cdf has the representation
(e.g. Nelsen (1999, Theorem 2.3.3))
$$F(x_1, x_2) = C(F_1(x_1), F_2(x_2)) \tag{1}$$
where C is termed the copula function. The copula representation is a re-formulation of the joint
cdf that separates the margins F1 and F2 from their interaction. So while the copula function takes
as arguments the margins F1 and F2 in the representation (1), the function itself is independent
of those margins. The copula serves to capture the dependence characteristics that exist between
the random variables X 1 and X 2 . Nelsen (1999, Section 2.3) provides a proof of (1) that follows
the method given in Schweizer and Sklar (1983, Ch. 6) (where the multivariate version of the
theorem is proved).
If $F_1$ and $F_2$ are continuous functions, then (1) is unique for any $(x_1, x_2) \in \bar{R}^2$, where $\bar{R}$ denotes the extended real line. On the other
hand, if either or both $X_1$ and $X_2$ are discrete random variables that take values on some lattice
of points, then (1) is unique provided $(x_1, x_2)$ lies on that lattice, but not elsewhere; this does not cause any
great harm, since the region outside of the supporting lattice is rarely of interest. Implicit in (1) is
$C(u, v) = 0$ if either or both $u$ and $v$ are zero, and $C(1, v) = v$ and $C(u, 1) = u$, where the pair
$(u, v) \in I^2$ ($I$ denotes the closed interval $[0, 1]$ of the real line). Other terminology for the copula
includes ‘uniform representation’ (Kimeldorf and Sampson, 1975), and ‘dependence function’
(Galambos, 1978); in the mathematics literature, the copula is termed the ‘t-norm’.
2.2. Examples of copulas
Three bivariate copulas of some importance are
$$\Pi = uv, \tag{2}$$
$$W = \frac{u + v - 1 + |u + v - 1|}{2} = \max(u + v - 1, 0), \tag{3}$$
and
$$M = \frac{u + v - |u - v|}{2} = \min(u, v), \tag{4}$$
where $(u, v) \in I^2$. $\Pi$ is termed the Product copula, and it corresponds to stochastic independence;
that is, if two random variables are independent, then $\Pi$ is the copula of their joint distribution.
$W$ is termed the Fréchet lower bound for copulas, and $M$ the Fréchet upper bound for copulas.
The closed interval $[W, M]$ has the property of containing all bivariate copulas; namely, for all
copulas $C$ on $I^2$:
$$W \le C \le M. \tag{5}$$
These bounds, the Fréchet bounds for copulas, were obtained by Hoeffding, and they arise as a
consequence of applying the representation (1) to the Fréchet bounds for (bivariate) distributions:
$$\max(F_1(x_1) + F_2(x_2) - 1, 0) \le F(x_1, x_2) \le \min(F_1(x_1), F_2(x_2)), \qquad (x_1, x_2) \in \bar{R}^2$$
(e.g. Kwerel (1983)). For statistical modelling, one use of the Fréchet bounds for copulas is in establishing the coverage of a given family of copulas. For further discussion
see, for example, Fisher (1997).
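As a rough numerical illustration of the ordering in (5), the following Python sketch evaluates the Product copula and the two Fréchet bounds on a grid of the unit square and checks that the copula lies between the bounds. The function names, the grid size and the tolerance are illustrative assumptions and are not part of the original text.

```python
def product_copula(u, v):
    """Product copula (2): the copula of two independent random variables."""
    return u * v


def frechet_lower(u, v):
    """Frechet lower bound W in (3)."""
    return max(u + v - 1.0, 0.0)


def frechet_upper(u, v):
    """Frechet upper bound M in (4)."""
    return min(u, v)


def satisfies_bounds(copula, steps=20, tol=1e-12):
    """Check W <= C <= M on a (steps + 1) x (steps + 1) grid of the unit square."""
    for i in range(steps + 1):
        for j in range(steps + 1):
            u, v = i / steps, j / steps
            c = copula(u, v)
            if c < frechet_lower(u, v) - tol or c > frechet_upper(u, v) + tol:
                return False
    return True


print(satisfies_bounds(product_copula))
```

A grid check of this kind is only a sanity test: it can reveal that a candidate function is not a copula, but it does not prove that it is one.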
2.3. Families of copulas
For the purposes of statistical modelling it is desirable to parameterize the copula function so
that data can be used to shed light on the extent of association between the random variables
of interest. Let θ denote the association parameter of the bivariate distribution (possibly vector
valued) and write the parameterized copula as per
Cθ (u, v).
This notation denotes a family of copulas, where the members are indexed according to values
assigned to θ. Provided that the margins F1 and F2 do not depend on θ, the representation (1)
holds for all members of a given family; this assumption is imposed hereafter.
There are numerous examples of families of bivariate copulas given in Joe (1997) and Nelsen
(1999). For example, the family of Bivariate Normal copulas is given by
$$C_\theta(u, v) = \Phi_2(\Phi^{-1}(u), \Phi^{-1}(v); \theta), \qquad -1 \le \theta \le 1. \tag{6}$$
Here, $\Phi(\cdot)$ denotes the cdf of a standard normal variate, and $\Phi_2(\cdot, \cdot; \theta)$ the cdf of a bivariate standard normal variate with Pearson’s product moment correlation coefficient $\theta$. Note that setting
$u = \Phi(x_1)$ and $v = \Phi(x_2)$ in (6) recovers the bivariate standard normal cdf. This family is the
basis of Lee’s self-selection model described in Section 3.3.1. The Farlie–Gumbel–Morgenstern
family of copulas (FGM hereafter) is given by
$$C_\theta(u, v) = uv(1 + \theta(1 - u)(1 - v)), \qquad -1 \le \theta \le 1. \tag{7}$$
The FGM family can be useful in analytic work due to its mathematical simplicity; in Section 3.3.2 it is used to construct the FGM self-selection model. The Plackett family of copulas is
given by
$$C_\theta(u, v) = \begin{cases} \dfrac{1}{2(\theta - 1)}\left(s - \sqrt{s^2 - 4uv\theta(\theta - 1)}\right) & \text{where } \theta > 0,\ \theta \ne 1 \text{ and } s = 1 + (u + v)(\theta - 1), \\ uv & \text{when } \theta = 1. \end{cases} \tag{8}$$
Lee and Maddala employ the Plackett family in their discussion of joint and sequential decision
rules (see Maddala (1994, Ch. 21)).
The ability of a given family of copulas to represent differing degrees of association can be
examined in terms of the extent to which it covers the interval between the lower and upper
Fréchet bounds for copulas (5). This is generally determined at the extremes of the parameter
space for $\theta$. For example, for the Bivariate Normal family (6), $C_{-1}(u, v) = W$ and $C_1(u, v) = M$,
so that this family has full coverage: $C_\theta(u, v) \in [W, M]$. Furthermore, the family of Bivariate
Normal copulas is said to be comprehensive, where this nomenclature means that a given family
includes $W$, $M$ and the Product copula $\Pi$ amongst its members, or as limiting cases. The Plackett family is comprehensive too, for under (8), $\lim_{\theta \to 0^+} C_\theta(u, v) = W$ and $\lim_{\theta \to \infty} C_\theta(u, v) = M$. Comprehensive
families of copulas therefore parameterize the full range of association and, by (1), this property
holds irrespective of the form of the margins. However, there are typically many other features
of the data that are of interest, and these may not necessarily be well modelled if attention is
restricted to using comprehensive families of copulas.
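The limiting behaviour of the Plackett family (8) can be examined numerically. The sketch below is a minimal Python illustration, assuming a coarse interior grid and the stand-in values 1e-6 and 1e6 for the limits of θ at zero and infinity; the helper names are hypothetical.

```python
import math


def plackett_copula(u, v, theta):
    """Plackett copula (8), defined for theta > 0."""
    if theta == 1.0:
        return u * v
    s = 1.0 + (u + v) * (theta - 1.0)
    disc = s * s - 4.0 * u * v * theta * (theta - 1.0)
    # Guard against tiny negative values caused by floating-point rounding.
    return (s - math.sqrt(max(disc, 0.0))) / (2.0 * (theta - 1.0))


def max_gap(theta, target, steps=10):
    """Largest absolute gap between the Plackett copula and a target copula on an interior grid."""
    gap = 0.0
    for i in range(1, steps):
        for j in range(1, steps):
            u, v = i / steps, j / steps
            gap = max(gap, abs(plackett_copula(u, v, theta) - target(u, v)))
    return gap


def lower(u, v):
    """Frechet lower bound W."""
    return max(u + v - 1.0, 0.0)


def upper(u, v):
    """Frechet upper bound M."""
    return min(u, v)


print(max_gap(1e-6, lower))  # gap to W for theta close to zero
print(max_gap(1e6, upper))   # gap to M for large theta
```

Both printed gaps should be small relative to the unit scale of the copula, in line with the limits stated above; the rate at which they shrink depends on the grid point.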
There are many copula families that are not comprehensive; one example is the FGM
family (7): it includes the Product copula $\Pi$, but not $W$ and $M$. For such families it is desirable to assess coverage in terms of measures of association. The most familiar measure is Pearson’s product moment
correlation coefficient, but due to its lack of invariance with respect to the margins, the properties of this measure are dominated by others such as Kendall’s $\tau$ and Spearman’s $\rho$ (Joe, 1997,
Section 2.1.9). The latter two are concordance measures that are bounded on the interval $[-1, 1]$: both
are equal to $-1$ at $W$, $1$ at $M$ and $0$ for $\Pi$. Importantly, both measures are invariant to strictly
increasing transformations of the variables, implying that they depend only on the copula of the
joint distribution, and not the margins. For independent pairs $(X_{1i}, X_{2i})$, $i = 1, 2, 3$, that are
copies of $(X_1, X_2)$, $\tau$ and $\rho$ are defined as
$$\tau = \Pr((X_{11} - X_{12})(X_{21} - X_{22}) > 0) - \Pr((X_{11} - X_{12})(X_{21} - X_{22}) < 0)$$
and
$$\rho = 3\bigl(\Pr((X_{11} - X_{12})(X_{21} - X_{23}) > 0) - \Pr((X_{11} - X_{12})(X_{21} - X_{23}) < 0)\bigr).$$
Should $(X_1, X_2)$ be a pair of continuous random variables, with the copula of their joint distribution given by $C$, then $\tau$ and $\rho$ may be simplified:
$$\tau = 4 \iint_{I^2} C(u, v)\,dC(u, v) - 1 = 4E[C(U, V)] - 1$$
and
$$\rho = 12 \iint_{I^2} uv\,dC(u, v) - 3 = 12E[UV] - 3.$$
Here, $U$ and $V$ denote standard uniform random variables with joint cdf $C$. For the FGM family
of copulas $\tau = 2\theta/9$ and $\rho = \theta/3$, so that $-2/9 \le \tau \le 2/9$ and $-1/3 \le \rho \le 1/3$ for this
family. For detailed derivations of the above results see Nelsen (1999, Section 5.1).
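The FGM values of τ and ρ can be checked against the integral expressions above. The following Python sketch approximates E[C(U, V)] and E[UV] by a midpoint rule, using the FGM density obtained by differentiating (7) with respect to both arguments. The grid size and function names are assumptions made for illustration only.

```python
def fgm_copula(u, v, theta):
    """FGM copula (7)."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))


def fgm_density(u, v, theta):
    """Mixed partial derivative of (7) with respect to u and v."""
    return 1.0 + theta * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)


def kendall_spearman_fgm(theta, n=200):
    """Midpoint-rule approximations to tau = 4E[C(U,V)] - 1 and rho = 12E[UV] - 3."""
    h = 1.0 / n
    e_c = 0.0
    e_uv = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            weight = fgm_density(u, v, theta) * h * h
            e_c += fgm_copula(u, v, theta) * weight
            e_uv += u * v * weight
    return 4.0 * e_c - 1.0, 12.0 * e_uv - 3.0


tau, rho = kendall_spearman_fgm(1.0)
print(tau, 2.0 / 9.0)  # numerical value next to 2*theta/9
print(rho, 1.0 / 3.0)  # numerical value next to theta/3
```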
2.4. The Archimedean class of copulas
Of particular importance in this article is the class of Archimedean copulas. The class encompasses many families of copulas, a number of which can be of use in statistical modelling. The
mathematical properties of the Archimedean class are captured by an additive generator function
$\varphi : I \to [0, \infty]$, which is a continuous, convex, decreasing function ($\varphi'(t) < 0$ and $\varphi''(t) > 0$,
for $0 < t < 1$), with terminal $\varphi(1) = 0$. $\varphi$ may also be indexed by the association parameter $\theta$,
thus an entire family of copulas can be Archimedean. Any function ϕ that satisfies these conditions can be used to generate a valid bivariate cdf. The advantage in mathematics of working with
Archimedean copulas is the achievement of reduction in dimensionality: while the copula of an
n-variate distribution is an n-place function, the generator ϕ only ever takes a single argument.
In econometrics, this property of Archimedean copulas has the potential to be of use in models of limited dependent variables, especially those requiring some probabilistic enumeration on
high-dimensional subspaces, for evaluation then becomes essentially a univariate task.
In the bivariate case, the means by which ϕ generates the copula is according to
$$\varphi(C(u, v)) = \varphi(u) + \varphi(v). \tag{9}$$
Note that the generator is unique up to a scaling constant. Particular examples are $\varphi(t) = -\log t$
and $\varphi(t) = (t^{-\theta} - 1)/\theta$, which are, respectively, the generators of the Product copula $\Pi$ and the
Clayton family of copulas
$$C_\theta(u, v) = (u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}, \qquad \theta \ge 0. \tag{10}$$
Note that neither the Bivariate Normal family, nor the Plackett and FGM families are members
of the Archimedean class. Examples of families of Archimedean copulas are listed in Table 1.
If the terminal $\varphi(0) = \infty$, the generator is termed strict, and the inverse function $\varphi^{-1}$ exists.
The generators of the Product copula $\Pi$, the Clayton family (10) and those listed in Table 1 are strict. In this instance, from (9), the
copula is recovered by
$$C(u, v) = \varphi^{-1}(\varphi(u) + \varphi(v)).$$
Non-strict generators are those for which $\varphi(0) < \infty$; in this case, the generators are said to
have a singular component. Analysis in this instance must begin by defining a pseudo-inverse
function, $\varphi^{[-1]}$. An example is $\varphi(t) = 1 - t$, for which $\varphi^{[-1]}(t) = \max(1 - t, 0)$: note that
$\varphi^{[-1]}(\varphi(u) + \varphi(v)) = \max(u + v - 1, 0) = W$, thus the lower Fréchet bound for (bivariate)
copulas is Archimedean. In a modelling context it is not entirely clear what gains might be
made by specifying non-strict generators, so in this article attention is confined to the category
of strict generators. Nelsen (1999, Ch. 4) gives extensive details about Archimedean copulas
(strict and non-strict); see also Genest and MacKay (1986), Genest and Rivet (1993), Jouini and
Clemen (1996) and Mari and Kotz (2001, Section 4.6). A recent application of Archimedean
copulas in finance appears in Henessey and Lapan (2002), who study optimal allocation rules
for portfolios of risky assets. In actuarial science, Valdez (2001) uses Archimedean copulas to
induce dependence amongst random variables, where it is the distribution of the sum that is of
interest (in Valdez’s case the sum represents total claims made against an insurer).

Table 1. Examples of families of bivariate Archimedean copulas.
| Name | Copula $C_\theta(u, v)$ | Parameter space | Generator $\varphi(t)$ | Kendall’s $\tau$ |
|---|---|---|---|---|
| AMH | $uv/(1 - \theta(1 - u)(1 - v))$ | $-1 \le \theta < 1$ | $\log\frac{1 - \theta(1 - t)}{t}$ | $-0.1817 \le \tau < \frac{1}{3}$ |
| AP | $\frac{1}{2}\left(r + \sqrt{r^2 + 4\theta}\right)$, where $r = u + v - 1 - \theta\left(\frac{1}{u} + \frac{1}{v} - 1\right)$ | $0 < \theta < \infty$ | $(1 + \theta/t)(1 - t)$ | $-1 < \tau < \frac{1}{3}$ |
| Clayton | $(u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$ | $0 \le \theta < \infty$ | $\frac{1}{\theta}(t^{-\theta} - 1)$ | $0 \le \tau < 1$ |
| Frank | $-\theta^{-1}\log\left(1 + (e^{-\theta u} - 1)(e^{-\theta v} - 1)/(e^{-\theta} - 1)\right)$ | $-\infty < \theta < \infty$ | $-\log\frac{e^{-\theta t} - 1}{e^{-\theta} - 1}$ | $-1 < \tau < 1$ |
| Gumbel | $\exp\left(-((-\log u)^\theta + (-\log v)^\theta)^{1/\theta}\right)$ | $1 \le \theta < \infty$ | $(-\log t)^\theta$ | $0 \le \tau < 1$ |
| Joe | $1 - \left((1 - u)^\theta + (1 - v)^\theta - (1 - u)^\theta(1 - v)^\theta\right)^{1/\theta}$ | $1 \le \theta < \infty$ | $-\log(1 - (1 - t)^\theta)$ | $0 \le \tau < 1$ |

Notes: AMH denotes Ali–Mikhail–Haq.
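The construction of an Archimedean copula from its generator can be sketched in a few lines of Python. The example below builds the Clayton and Gumbel copulas from their generators and inverses, compares them with the closed forms, and reproduces the non-strict example in which the pseudo-inverse yields W. The helper names are hypothetical, and the inverse functions were derived by hand for this illustration.

```python
import math


def archimedean_copula(phi, phi_inv):
    """Return C(u, v) = phi_inv(phi(u) + phi(v)) for a generator and its (pseudo-)inverse."""
    def copula(u, v):
        return phi_inv(phi(u) + phi(v))
    return copula


def clayton_generator(theta):
    """Generator (t**(-theta) - 1) / theta and its inverse, for theta > 0."""
    phi = lambda t: (t ** (-theta) - 1.0) / theta
    phi_inv = lambda x: (1.0 + theta * x) ** (-1.0 / theta)
    return phi, phi_inv


def gumbel_generator(theta):
    """Generator (-log t)**theta and its inverse, for theta >= 1."""
    phi = lambda t: (-math.log(t)) ** theta
    phi_inv = lambda x: math.exp(-(x ** (1.0 / theta)))
    return phi, phi_inv


def clayton_closed_form(u, v, theta):
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)


def gumbel_closed_form(u, v, theta):
    total = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(total ** (1.0 / theta)))


# Non-strict generator phi(t) = 1 - t with pseudo-inverse max(1 - x, 0).
lower_bound = archimedean_copula(lambda t: 1.0 - t, lambda x: max(1.0 - x, 0.0))

clayton = archimedean_copula(*clayton_generator(2.0))
gumbel = archimedean_copula(*gumbel_generator(2.0))
print(clayton(0.3, 0.8), clayton_closed_form(0.3, 0.8, 2.0))
print(gumbel(0.3, 0.8), gumbel_closed_form(0.3, 0.8, 2.0))
print(lower_bound(0.3, 0.8))
```

Passing the generator and its inverse as a pair keeps the copula itself a function of a single-argument object, which mirrors the reduction in dimensionality noted in the text.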
To illustrate the range of bivariate behaviour that can be represented by Archimedean copulas,
consider Figure 1. Each plot shows the contours of a bivariate probability density function (pdf)
where, for reasons due only to familiarity, both margins are standard normal. The top left plot
depicts the well-known elliptical contours of the bivariate standard normal pdf, with Pearson
product moment correlation coefficient θ = 0.7. All plots (apart from the AMH) calibrate to
τ = 0.5, where, for an Archimedean copula with generator ϕ,
$$\tau = 1 + 4\int_0^1 \frac{\varphi(t)}{\varphi'(t)}\,dt$$
with notation $\varphi'(t) = \frac{\partial}{\partial t}\varphi(t)$; for a proof of this result see Nelsen (1999, p. 130). For example, for the Clayton family (10), $\tau = \theta/(\theta + 2)$, so that this family covers $0 \le \tau < 1$. The
distributions generated by Archimedean copulas evidence a wide range of behaviour including
joint asymmetry and skewness, and fat and thin tails in comparison to the bivariate normal.
Amongst the contours generated by Archimedean copulas, only the Frank shows (radial) symmetry (see Nelsen (1999, Section 2.7) for a discussion of the concepts of symmetry in bivariate
distributions), although by reducing τ towards zero in each plot, the resulting contour plots would
all start to appear increasingly circular. Relative to the bivariate normal, the contours generated
by the Clayton and Joe copulas imply fat tailed distributions, as do those of the Frank and Gumbel but to a lesser extent. Contours of thin tailed distributions can be seen in each of the Clayton,
Gumbel and Joe plots. The wide range of distributional shapes that the Archimedean copulas can
depict is an indicator that members of this class may be useful in modelling. In terms of coverage,
only the Frank family is comprehensive. The last column of Table 1 sets down the coverage of
the family in terms of the bounds applicable to their respective τ measure.

[Figure 1: six contour-plot panels, each drawn on axes running from −2 to 2. Panel titles: Bivariate Normal, θ = 0.7; AMH, θ = 0.714 (τ = 0.2); Clayton, θ = 2; Frank, θ = 5.74; Gumbel, θ = 2; Joe, θ = 2.86.]
Figure 1. Bivariate pdf contour plots induced by copula, N(0, 1) margins (τ = 0.5).
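The expression for τ in terms of the generator lends itself to a simple numerical check. The Python sketch below approximates the integral by a midpoint rule for the Clayton and Gumbel generators at θ = 2, the values used for those two panels of Figure 1. The function name, the number of grid points and the hand-derived generator derivatives are assumptions of this illustration.

```python
import math


def kendall_tau(phi, dphi, n=20000):
    """Midpoint-rule approximation to 1 + 4 * integral over (0, 1) of phi(t) / phi'(t)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += phi(t) / dphi(t)
    return 1.0 + 4.0 * total * h


theta = 2.0

# Clayton: phi(t) = (t**(-theta) - 1) / theta, phi'(t) = -t**(-theta - 1).
clayton_tau = kendall_tau(
    lambda t: (t ** (-theta) - 1.0) / theta,
    lambda t: -(t ** (-theta - 1.0)),
)

# Gumbel: phi(t) = (-log t)**theta, phi'(t) = -theta * (-log t)**(theta - 1) / t.
gumbel_tau = kendall_tau(
    lambda t: (-math.log(t)) ** theta,
    lambda t: -theta * (-math.log(t)) ** (theta - 1.0) / t,
)

print(clayton_tau, theta / (theta + 2.0))
print(gumbel_tau)
```

For the Clayton family the numerical value can be compared directly with θ/(θ + 2); both families should return values close to the calibration τ = 0.5 stated in the caption of Figure 1.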
Finally, a particular result for Archimedean copulas that is especially relevant in this article:
$$\frac{\partial}{\partial v} C_\theta(u, v) = \frac{\varphi'(v)}{\varphi'(C_\theta(u, v))}. \tag{11}$$
This result follows from (9): simply differentiate both sides of that equation with respect to $v$ and
re-arrange the result. Since $\varphi$ is convex and decreasing on $I$, and $C_\theta(u, v) < C_\theta(1, v) = v$ for
$(u, v) \in (0, 1)^2$, it follows that (11) takes values in $(0, 1)$.
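Result (11) can be verified numerically for a particular family. The following Python sketch does so for the Clayton copula (10), comparing the ratio of generator derivatives with a central finite-difference approximation to the partial derivative. The step size and function names are illustrative assumptions.

```python
def clayton(u, v, theta):
    """Clayton copula (10), for theta > 0."""
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)


def clayton_dphi(t, theta):
    """Derivative of the Clayton generator phi(t) = (t**(-theta) - 1) / theta."""
    return -(t ** (-theta - 1.0))


def partial_v_via_generator(u, v, theta):
    """Right-hand side of (11): phi'(v) / phi'(C(u, v))."""
    return clayton_dphi(v, theta) / clayton_dphi(clayton(u, v, theta), theta)


def partial_v_numeric(u, v, theta, h=1e-6):
    """Central finite-difference approximation to the partial derivative of C in v."""
    return (clayton(u, v + h, theta) - clayton(u, v - h, theta)) / (2.0 * h)


u, v, theta = 0.4, 0.7, 2.0
print(partial_v_via_generator(u, v, theta))
print(partial_v_numeric(u, v, theta))
```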
2.5. The copula approach to model construction
For the purposes of statistical modelling, it is the converse of the copula representation of the
joint cdf given by Sklar’s theorem that is relevant. In other words, given models for the margins
and a copula function that binds them together, this then has the effect of constructing a statistical
model for the random variables of interest, as a joint cdf is specified. Consider, for example, a
bivariate setting in which X 1 and X 2 denote the variables of interest. Required is a statistical
model for the true, but unknown joint distribution of X 1 and X 2 ; naturally, this distribution may
depend on parameters and covariates. Under a copula approach, models for the margins F1 (x1 )
and F2 (x2 ) are proposed, as well as a selection of a copula family Cθ . Then, by (1), these selections have the effect of specifying the joint cdf of X 1 and X 2 . Intuitively, the copula approach
determines each component of the overall model, then engineers them together using a copula
function.
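A minimal Python sketch of this construction follows. It assumes, purely for illustration, a standard normal margin for the first variable, an exponential margin with rate 0.5 for the second, and a Clayton copula with θ = 2; none of these choices comes from the original text.

```python
import math
from statistics import NormalDist


def clayton(u, v, theta):
    """Clayton copula (10); returns 0 when either argument is 0."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)


def make_joint_cdf(margin1, margin2, copula):
    """Build F(x1, x2) = C(F1(x1), F2(x2)) from two marginal cdfs and a copula."""
    def joint_cdf(x1, x2):
        return copula(margin1(x1), margin2(x2))
    return joint_cdf


normal_cdf = NormalDist(mu=0.0, sigma=1.0).cdf


def exponential_cdf(x, rate=0.5):
    """cdf of an exponential variate; the rate is an arbitrary illustrative value."""
    return 1.0 - math.exp(-rate * x) if x > 0.0 else 0.0


joint = make_joint_cdf(normal_cdf, exponential_cdf, lambda u, v: clayton(u, v, 2.0))

print(joint(0.0, 1.0))                    # joint probability at (0, 1)
print(joint(0.0, 1e6), normal_cdf(0.0))   # a large x2 recovers the first margin
```

The margins need not belong to the same distributional family, which is the freedom of specification that the copula approach is intended to provide.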
As would be expected, the copula approach does not necessarily guarantee unique identification of the parameters of the resulting model. That issue would need to be addressed on a case
by case basis. An example in which parameter identification is important appears in Section 5.1,
where the switching regimes model is discussed.
The use of the copula in statistical modelling is beginning to expand into areas such as economics and econometrics; perhaps the most accessible contribution to date being a series of five
studies reported in Joe (1997, Ch. 11) that estimate copula models for various multivariate and
longitudinal data sets. An added boon for modelling that results by adopting a copula approach
concerns the freedom to specify each margin; for example, identicality in distribution of the margins need not be imposed. Indeed, because the copula representation is unique on the domain
of support of the random variables in question, multivariate models can be constructed using a
copula approach whose margins can be either continuous or discrete, or mixtures of both.
In the following two sections, examples are given of the copula approach applied in the
context of the classic self-selection model of microeconometrics.
3. THE SELF-SELECTION MODEL
3.1. Model and likelihood
Sample stratification, or sample selection, is commonplace amongst microeconometric data,
whereby underlying individual choices can themselves influence the observations collected on
the random variables of interest. Models of increasing complexity have been constructed to
account for stratification in its various guises, should it be present, and a number of these are discussed in texts such as Amemiya (1985, Sections 10.6–10.10), Maddala (1983), Maddala (1994,
Part III), and Lee (1996, Section 5.6). In this section, attention focuses on the self-selection model
based on a binary indicator S that governs whether or not an observation is generated on a second random variable Y. In economics, one often-studied example of this type of self-selectivity is
labour force participation, where data generated on labour supply from non-participants is unable
to reflect their true market wage.
Typically, the self-selection model is embedded within a utilitarian framework according to
a pair of underlying latent random variables $Y_1^*$ and $Y_2^*$; selectivity arises if these unobservables
are mutually dependent. Here it is assumed that the cdf of $Y_i^*$ ($i = 1, 2$), denoted by $F_i(y_i^*) = \Pr(Y_i^* \le y_i^*)$,
where $y_i^* \in R$, depends on the linear function $x_i'\beta_i$ and a scaling factor $\sigma_i$, where
$X_i = x_i$ ($k_i \times 1$) is a vector of covariates of $Y_i^*$, and $\beta_i$ ($k_i \times 1$) and scalar $\sigma_i$ are unknown
parameters. The joint cdf of $(Y_1^*, Y_2^*)$ is denoted by $F(y_1^*, y_2^*) = \Pr(Y_1^* \le y_1^*, Y_2^* \le y_2^*)$, and it
depends on all covariates and parameters.
The purpose of $Y_1^*$ is to represent participation. In the examples that follow, $Y_1^*$ is assumed
to be a continuous random variable; however, this can be relaxed without loss of generality. In
the self-selection model, $Y_2^*$ is observed for participants. In this section, it is assumed that $Y_2^*$ is
a continuous random variable with pdf $f_2(y) = \frac{\partial}{\partial y} F_2(y)$, for all real $y$ in the support of $Y_2^*$.
The self-selection model arises when observations on a pair of random variables (S, Y ) are
generated according to the following observation rules:
$$S = 1\{Y_1^* > 0\} \qquad \text{and} \qquad Y = 1\{Y_1^* > 0\}\,Y_2^*$$
where 1{A} denotes the indicator function, taking value 1 if event A holds, and 0 otherwise. In
effect, Y2∗ can be observed only when Y1∗ > 0. The participation mechanism is represented by
the Bernoulli variable S, and it derives its properties from those of Y1∗ . Note that when S = 0,
Y2∗ cannot be observed, and Y is assigned a dummy value of 0.
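The observation rules can be illustrated by simulation. The Python sketch below assumes, for illustration only, that the latent variables are bivariate normal with correlation theta and that there are no covariates beyond constant means; the function name and all parameter values are hypothetical.

```python
import random


def simulate_self_selection(n, mean1, mean2, sigma, theta, seed=0):
    """Draw n pairs (S, Y) from latent normal variables with correlation theta."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        e1 = rng.gauss(0.0, 1.0)
        e2 = theta * e1 + (1.0 - theta ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        y1_star = mean1 + e1             # latent participation variable
        y2_star = mean2 + sigma * e2     # latent outcome variable
        s = 1 if y1_star > 0.0 else 0    # S = 1{Y1* > 0}
        y = y2_star if s == 1 else 0.0   # Y = 1{Y1* > 0} Y2*
        sample.append((s, y))
    return sample


sample = simulate_self_selection(20000, mean1=0.5, mean2=1.0, sigma=2.0, theta=0.6)
selected = [y for s, y in sample if s == 1]
print(len(selected) / len(sample))     # share of participants
print(sum(selected) / len(selected))   # mean outcome among participants
```

With positive dependence between the latent variables, the mean outcome among participants will typically exceed the unconditional mean of the latent outcome (1.0 in this sketch), which is the selectivity effect the model is designed to capture.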
Let $s_1, \ldots, s_n$ denote $n$ observations generated on $S$ ($s_j \in \{0, 1\}$, $j = 1, \ldots, n$), and $y_1, \ldots, y_n$
the corresponding $n$ observations generated on $Y$ ($y_j \in R$, $j = 1, \ldots, n$). For a random sample of $n$ observations, the likelihood function for the self-selection model is given by
(cf. Amemiya (1985, equation (10.7.3)))
$$L = \prod_0 \Pr(Y_{1j}^* \le 0) \prod_1 f_{2|1}(y_j \mid Y_{1j}^* > 0) \Pr(Y_{1j}^* > 0) \tag{12}$$
where $\prod_0$ indicates the product over those observations for which $s_j = 0$, and $\prod_1$ the product
over those observations for which $s_j = 1$. The function $f_{2|1}$ denotes the pdf of $Y_2^*$, given event
$Y_1^* > 0$. Its functional form can be derived as follows:
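$$\begin{aligned} f_{2|1}(y \mid Y_1^* > 0) &= \frac{1}{1 - F_1(0)} \frac{\partial}{\partial y}\bigl(F_2(y) - F(0, y)\bigr) \\ &= \frac{1}{1 - F_1(0)} \left(f_2(y) - \frac{\partial}{\partial y} F(0, y)\right) \end{aligned}$$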
where $F_1(0) = \Pr(Y_1^* \le 0) = \Pr(S = 0)$. Substitution into (12) yields
$$\begin{aligned} L &= \prod_0 F_1(0) \prod_1 \left(f_2(y) - \frac{\partial}{\partial y} F(0, y)\right) \\ &= \prod_0 F_1 \prod_1 \left(f_2 - \frac{\partial}{\partial y} F(0, y)\right) \end{aligned} \tag{13}$$
where, for convenience, the index $j$ has been dropped in the first line. Additional simplified
notation appears in the second line of (13): $F_1$ will be used from now on to denote $F_1(0) = \Pr(Y_{1j}^* \le 0) = \Pr(S_j = 0)$, as too, from now on $F_2$ denotes $F_2(y_j) = \Pr(Y_{2j}^* \le y_j)$, and $f_2$
denotes $f_2(y_j)$.
The component of (13) that is the most difficult to evaluate is $\frac{\partial}{\partial y} F(0, y)$. However, should
$Y_1^*$ and $Y_2^*$ be independent, for example, then $\frac{\partial}{\partial y} F(0, y) = F_1 f_2$, and $L$ can be separated as per
$\prod_0 F_1 \prod_1 (1 - F_1) \times \prod_1 f_2$. The likelihood (13) is the general form for the self-selection
model. Particular likelihood functions arise from specifications assumed for $F$ etc, a number of
which are examined in the following.
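The general form (13) translates directly into a log-likelihood routine. The Python sketch below is a simplified illustration without covariates, so that F1 is a single number; the function and argument names are hypothetical. It then checks the separation of the likelihood under independence described above.

```python
import math
from statistics import NormalDist


def log_likelihood(observations, F1, f2, dF0_dy):
    """Log of (13): sum over s=0 of log F1, plus sum over s=1 of log(f2(y) - dF(0, y)/dy).

    observations: iterable of (s, y) pairs
    F1: Pr(Y1* <= 0), a number in (0, 1) in this covariate-free sketch
    f2: callable giving the marginal pdf of Y2*
    dF0_dy: callable giving the derivative of F(0, y) with respect to y
    """
    total = 0.0
    for s, y in observations:
        if s == 0:
            total += math.log(F1)
        else:
            total += math.log(f2(y) - dF0_dy(y))
    return total


# Illustrative data and margins (assumed, not taken from the paper).
std_pdf = NormalDist().pdf
obs = [(0, 0.0), (1, 0.3), (1, -1.2), (0, 0.0)]
F1 = 0.4

# Under independence, dF(0, y)/dy = F1 * f2(y).
ll_indep = log_likelihood(obs, F1, std_pdf, lambda y: F1 * std_pdf(y))
ll_separated = (
    2 * math.log(F1)
    + 2 * math.log(1.0 - F1)
    + math.log(std_pdf(0.3))
    + math.log(std_pdf(-1.2))
)
print(ll_indep, ll_separated)
```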
3.2. The Normal model
By far and away the most common specification for $F$ seen in the literature is due to Heckman
(1974), in which bivariate normality, along with univariate normal margins for $Y_1^*$ and $Y_2^*$ are
modelled such that $E[Y_i^*] = x_i'\beta_i$ and $Var(Y_i^*) = \sigma_i^2$, $i = 1, 2$. That is,
$$F(y_1^*, y_2^*) = \Phi_2\left(y_1^* - x_1'\beta_1, \frac{y_2^* - x_2'\beta_2}{\sigma}; \theta\right) \tag{14}$$
where $\sigma = \sigma_2$, and $\sigma_1$ is normalized to unity as all scale information about $Y_1^*$ is lost in the
transformation to $S$. This self-selection model is termed here the Normal model. A number of
empirical applications of the Normal model are discussed in Amemiya (1985, Section 10.7). With $\phi(\cdot)$ and $\Phi(\cdot)$ denoting the pdf and cdf of a standard normal variate, the
likelihood is given by
$$L = \prod_0 \Phi(-x_1'\beta_1) \prod_1 \frac{1}{\sigma}\,\phi\left(\frac{y - x_2'\beta_2}{\sigma}\right) \Phi\left(\frac{x_1'\beta_1 + \theta(y - x_2'\beta_2)/\sigma}{\sqrt{1 - \theta^2}}\right)$$
cf. Amemiya (1985, equation 10.7.6).
3.3. Modelling using the copula approach
By using a copula approach, models are constructed that can be viewed as generalizations of the
Normal model, because despite structure being imposed on the joint cdf F through the choice of
copula Cθ , there is no behaviour assumed on the part of the margins, both of which may then be
modelled as desired, subject to parameter identification considerations. Generally, the parameters
of self-selection models that are built using a copula approach are identified through functional
form assumptions, rather than by exclusion restrictions applied to covariates.
Although there are any number of copula families that may be specified, two in particular (the
bivariate normal and the FGM) are included here for they have already appeared in the literature
on modelling self-selection. In all cases, once the margins F1 (y1∗ ) and F2 (y2∗ ) are specified, it
is straightforward to derive the score function, and then to evaluate it and the log-likelihood for
purposes of ML estimation using a quasi-Newton algorithm.
3.3.1. Lee’s model.
Lee (1983) (see also Maddala (1983, Section 9.4)) gives a bivariate normal
specification for the joint cdf F which allows the practitioner to specify non-normal margins.
Lee’s model sets
$$F(y_1^*, y_2^*) = \Phi_2(\Phi^{-1}(F_1(y_1^*)),\ \Phi^{-1}(F_2(y_2^*));\ \theta). \qquad (15)$$
In fact, the copula representation of Lee’s specification shows that the family of Bivariate Normal
copulas (6) is in use here. The sense in which Lee’s model generalizes the Normal model is that
the latter is obtained as the special case corresponding to assuming normality for both margins;
that is, setting $F_1(y_1^*) = \Phi(y_1^* - x_1'\beta_1)$ and $F_2(y_2^*) = \Phi((y_2^* - x_2'\beta_2)/\sigma)$ in (15) yields (14). The
likelihood of Lee’s model is
$$L = \prod_0 F_1 \prod_1 \left(1 - \Phi\left(\frac{\Phi^{-1}(F_1) - \theta\,\Phi^{-1}(F_2)}{\sqrt{1 - \theta^2}}\right)\right) f_2.$$
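The claim that Lee's model nests the Normal model can be checked numerically. The sketch below (standard library only; the name `lee_selected_factor` and the numerical values are illustrative assumptions) evaluates the selected-observation factor of Lee's likelihood and compares it with the corresponding Normal-model term when both margins are normal.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def lee_selected_factor(F1, F2, theta):
    """1 - Phi((Phi^{-1}(F1) - theta * Phi^{-1}(F2)) / sqrt(1 - theta^2))."""
    q1 = PHI.inv_cdf(F1)
    q2 = PHI.inv_cdf(F2)
    return 1.0 - PHI.cdf((q1 - theta * q2) / sqrt(1.0 - theta ** 2))


# With normal margins the factor should coincide with the Normal model's
# selection term Phi((x1'beta1 + theta * z) / sqrt(1 - theta^2)).
index1, z, theta = 0.4, -0.7, 0.3   # illustrative values only
F1 = PHI.cdf(-index1)               # Pr(Y1* <= 0)
F2 = PHI.cdf(z)                     # cdf of the standardized outcome
lee_term = lee_selected_factor(F1, F2, theta)
normal_term = PHI.cdf((index1 + theta * z) / sqrt(1.0 - theta ** 2))
```

Any other pair of margins can be substituted for `F1` and `F2` without changing `lee_selected_factor`, which is the sense in which the copula and the margins are specified separately.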
3.3.2. The FGM model.
Set
$$F(y_1^*, y_2^*) = F_1(y_1^*) F_2(y_2^*) \left(1 + \theta (1 - F_1(y_1^*))(1 - F_2(y_2^*))\right).$$
Clearly, the family of FGM copulas (7) is specified here. The FGM family has been used in
Smith (2002) in the context of the double-hurdle selection model (see Cragg (1971)). Prieger
(2002) advocates the FGM model when modelling self-selection. While the FGM family has the
advantage of mathematical simplicity, its usefulness in modelling data (in general) is curtailed
by its limited coverage of dependency (Joe (1997, p. 149); −2/9 ≤ τ ≤ 2/9, see Section 2.3
above). In this respect, for the purposes of modelling it may be useful to consider extensions to
the FGM family of copulas that expand its coverage; Mari and Kotz (2001, Section 5.7) describe
a number of these. The likelihood of the FGM model is
$$L = \prod_0 F_1 \prod_1 (1 - F_1)\left(1 - \theta F_1 (1 - 2F_2)\right) f_2.$$
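The factor multiplying $f_2$ in the FGM likelihood is $1 - \partial C_\theta(F_1, v)/\partial v$ evaluated at $v = F_2$. A short sketch (function names are illustrative assumptions) confirms the closed form against a central finite difference of the FGM copula.

```python
def fgm_copula(u, v, theta):
    """FGM copula: u * v * (1 + theta * (1 - u) * (1 - v))."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))


def fgm_selected_factor(F1, F2, theta):
    """Closed form of 1 - dC/dv at (F1, F2) for the FGM copula."""
    return (1.0 - F1) * (1.0 - theta * F1 * (1.0 - 2.0 * F2))


def finite_difference_factor(F1, F2, theta, h=1e-6):
    """Same quantity from a central difference in the second argument."""
    dC_dv = (fgm_copula(F1, F2 + h, theta) - fgm_copula(F1, F2 - h, theta)) / (2.0 * h)
    return 1.0 - dC_dv


closed = fgm_selected_factor(0.3, 0.6, -0.8)
numeric = finite_difference_factor(0.3, 0.6, -0.8)
```

At $\theta = 0$ the factor collapses to $1 - F_1$, the independence case.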
3.3.3. The Archimedean class of copula models.
In this subsection, attention focuses on selecting families of copulas from the Archimedean class. Due to the mathematical structure of Archimedean copulas, captured by the generator ϕ, it should not be surprising to learn that the likelihood and score can be re-expressed in terms of (derivatives of) the generator.
For Archimedean copulas, the following derivative, appearing in the general form of the
likelihood (13), simplifies to
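$$\frac{\partial}{\partial y} F(0, y) = \left.\frac{\partial}{\partial v} C_\theta(F_1, v)\right|_{v \to F_2} \times \frac{\partial F_2}{\partial y} = \frac{\varphi'(F_2)}{\varphi'(C_\theta)} \times f_2 \qquad (16)$$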
where $C_\theta$ denotes $C_\theta(F_1, F_2) = C_\theta(F_1(0), F_2(y))$, which is evaluated as $\varphi^{-1}(\varphi(F_1) + \varphi(F_2))$.
The second equality in (16) follows from (11). The likelihood function of the self-selection model
for any distribution whose copula is Archimedean is
$$L = \prod_0 F_1 \prod_1 \left(1 - \frac{\varphi'(F_2)}{\varphi'(C_\theta)}\right) f_2. \qquad (17)$$
As the functional form of $\varphi'(t)$ is generally quite easy to derive, the likelihood is relatively easy
to code. For example, under the Clayton family (10), the likelihood is
$$L = \prod_0 F_1 \prod_1 \left(1 - \left(\frac{C_\theta}{F_2}\right)^{\theta+1}\right) f_2.$$
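To illustrate how little code the generator-based form requires, the sketch below evaluates the factor $1 - \varphi'(F_2)/\varphi'(C_\theta)$ from a generator, its derivative and its inverse, and compares it with the Clayton closed form. The generator $\varphi(t) = (t^{-\theta} - 1)/\theta$ is one common parameterization and is an assumption here (the ratio is unaffected by positive rescaling of $\varphi$); all function names and the row layout are illustrative.

```python
from math import log


def clayton_phi(t, theta):
    return (t ** -theta - 1.0) / theta


def clayton_phi_prime(t, theta):
    return -t ** (-theta - 1.0)


def clayton_phi_inverse(s, theta):
    return (1.0 + theta * s) ** (-1.0 / theta)


def archimedean_factor(F1, F2, phi, phi_prime, phi_inverse):
    """1 - phi'(F2) / phi'(C), with C = phi^{-1}(phi(F1) + phi(F2))."""
    C = phi_inverse(phi(F1) + phi(F2))
    return 1.0 - phi_prime(F2) / phi_prime(C)


def clayton_factor(F1, F2, theta):
    """Closed form for the Clayton family: 1 - (C / F2)^(theta + 1)."""
    C = (F1 ** -theta + F2 ** -theta - 1.0) ** (-1.0 / theta)
    return 1.0 - (C / F2) ** (theta + 1.0)


def selection_loglik(rows, factor):
    """Log of (17). rows: (s, F1, F2, f2); F2 and f2 are unused when s == 0."""
    return sum(log(F1) if s == 0 else log(factor(F1, F2) * f2)
               for s, F1, F2, f2 in rows)


theta = 0.8  # illustrative value
generic = archimedean_factor(
    0.3, 0.6,
    lambda t: clayton_phi(t, theta),
    lambda t: clayton_phi_prime(t, theta),
    lambda s: clayton_phi_inverse(s, theta),
)
closed = clayton_factor(0.3, 0.6, theta)
```

Swapping in another Archimedean family only requires replacing the three generator functions; `selection_loglik` is unchanged.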
In Table 2, expressions for the component $(1 - \varphi'(F_2)/\varphi'(C_\theta))$ of the likelihood are given for
selected families of Archimedean copulas.
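Table 2. Expressions for $1 - \varphi'(F_2)/\varphi'(C_\theta)$.

AMH       $1 - \dfrac{(1-\theta)F_1 + \theta F_1^2}{(1 - \theta(1-F_1)(1-F_2))^2}$
Clayton   $1 - F_2^{-(\theta+1)} \left(F_1^{-\theta} + F_2^{-\theta} - 1\right)^{-(1+\theta)/\theta}$
Frank     $\dfrac{e^{\theta F_2}\left(e^{\theta F_1} - e^{\theta}\right)}{e^{\theta(F_1+F_2)} + e^{\theta}\left(1 - e^{\theta F_1} - e^{\theta F_2}\right)}$
Gumbel    $1 - C_\theta(F_1, F_2)\left((-\log F_1)^\theta + (-\log F_2)^\theta\right)^{-1+1/\theta} (-\log F_2)^{\theta-1} F_2^{-1}$
Joe       $1 - \left(1 - \bar{F}_1^{\theta}\right) \bar{F}_2^{\theta-1} \left(\bar{F}_1^{\theta} + \bar{F}_2^{\theta} - \bar{F}_1^{\theta}\bar{F}_2^{\theta}\right)^{-1+1/\theta}$

Notes: $\bar{F}_1 = 1 - F_1$ and $\bar{F}_2 = 1 - F_2$.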
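Each entry of Table 2 equals $1 - \partial C_\theta(F_1, v)/\partial v$ at $v = F_2$, so the expressions can be verified against a finite difference of the corresponding copula. The sketch below does this for all five families; the copula forms are the standard ones for these families and are assumed to match those of Table 1, and the function names and parameter values are illustrative.

```python
from math import exp, log


def amh(u, v, th):
    return u * v / (1 - th * (1 - u) * (1 - v))

def clayton(u, v, th):
    return (u ** -th + v ** -th - 1) ** (-1 / th)

def frank(u, v, th):
    return -log(1 + (exp(-th * u) - 1) * (exp(-th * v) - 1) / (exp(-th) - 1)) / th

def gumbel(u, v, th):
    return exp(-((-log(u)) ** th + (-log(v)) ** th) ** (1 / th))

def joe(u, v, th):
    a, b = (1 - u) ** th, (1 - v) ** th
    return 1 - (a + b - a * b) ** (1 / th)


def amh_factor(F1, F2, th):
    return 1 - ((1 - th) * F1 + th * F1 ** 2) / (1 - th * (1 - F1) * (1 - F2)) ** 2

def clayton_factor(F1, F2, th):
    return 1 - F2 ** (-(th + 1)) * (F1 ** -th + F2 ** -th - 1) ** (-(1 + th) / th)

def frank_factor(F1, F2, th):
    num = exp(th * F2) * (exp(th * F1) - exp(th))
    den = exp(th * (F1 + F2)) + exp(th) * (1 - exp(th * F1) - exp(th * F2))
    return num / den

def gumbel_factor(F1, F2, th):
    s = (-log(F1)) ** th + (-log(F2)) ** th
    return 1 - gumbel(F1, F2, th) * s ** (-1 + 1 / th) * (-log(F2)) ** (th - 1) / F2

def joe_factor(F1, F2, th):
    a, b = 1 - F1, 1 - F2
    s = a ** th + b ** th - a ** th * b ** th
    return 1 - (1 - a ** th) * b ** (th - 1) * s ** (-1 + 1 / th)


# family -> (copula, Table 2 expression, illustrative theta)
CASES = {
    "AMH": (amh, amh_factor, 0.5),
    "Clayton": (clayton, clayton_factor, 0.8),
    "Frank": (frank, frank_factor, 2.0),
    "Gumbel": (gumbel, gumbel_factor, 1.7),
    "Joe": (joe, joe_factor, 1.5),
}


def max_discrepancy(F1=0.3, F2=0.6, h=1e-6):
    """Largest gap between Table 2 and 1 - (central difference of C in v)."""
    worst = 0.0
    for copula, factor, th in CASES.values():
        dC_dv = (copula(F1, F2 + h, th) - copula(F1, F2 - h, th)) / (2 * h)
        worst = max(worst, abs(factor(F1, F2, th) - (1 - dC_dv)))
    return worst
```

A check of this kind is cheap insurance when coding a new family, since an error in the closed form would otherwise surface only as a poorly behaved optimizer.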
3.4. Remarks
Under the copula approach, models are constructed in a component-wise fashion: specifying F1 ,
F2 and Cθ . For the margins F1 and F2 , parametric models can be constructed using generalized
linear methods (e.g. McCullagh and Nelder (1989)). This flexibility is a distinct advantage of
the copula approach, as the margins need not be restricted to the same family of distributions.
However, other approaches are also possible; for example, using semi- and non-parametric methods to specify the margins. For the copula function, this article advocates selecting families of
copulas from the Archimedean class.
Given the relatively simple functional form for the self-selection likelihood function under
an Archimedean copula (17), ML estimation can be employed to jointly estimate all parameters.
As general analytical expressions for the score function can be derived (these involve derivatives
of the generator ϕ; see the Appendix for details), it is relatively easy to implement well-known
quasi-Newton optimization algorithms such as DFP and BFGS; the latter is used in the following. Accordingly, the use of an Archimedean copula for Cθ satisfies the need identified by Vella
(1998, p. 132) to maintain ease of implementation as the model assumption departs from bivariate
normality, while remaining in the framework of ML estimation. Unfortunately, obtaining the analytic form of the Hessian of the log-likelihood is a tedious exercise, so if implementation of the
Newton–Raphson algorithm is desired, then, when deriving the Hessian matrix, it is perhaps better to use numerical methods that can approximate derivatives. These considerations also impact
on estimation of the asymptotic variance–covariance matrix of the ML estimator. The method
adopted here is to use as the estimate the final iterate of the approximation to the inverse Hessian
that is generated at each step of the BFGS algorithm. Other variance–covariance matrix estimators include the OPG estimator, although this is known to be prone to inflate standard errors in
small samples. Estimation using the inverse Information matrix does not seem practicable here
due to the difficulties induced by non-linearity in the variables of the model.
It seems quite plausible that the usual suite of asymptotic properties of the ML estimator
will hold. However, it remains an open research question to establish the regularity conditions
under which the ML estimator is consistent, asymptotically normal and efficient for the self-selection model under an Archimedean copula (or more generally for any copula). A proof might
follow the general approach of Amemiya (1973) (he examined the properties of the ML estimator
in the Tobit model), which was extended by Newey and McFadden (1994) to a number of other
estimators in other models. It is well known that the ML estimator in the Normal model (14) is
sensitive to departures from bivariate normality. It follows then that non-normal selection models
will most likely be sensitive to distributional misspecification error too.
The use of ML contrasts with various two-step estimation methods that have been discussed in the copula literature. For example, Joe (1997, Ch. 10) proposes the IFM method (Inference Functions for Margins), and Bouyé et al. (2000) (see also Genest et al. (1995)) propose
the CML method (Canonical Maximum Likelihood). The IFM method separately maximizes the
likelihoods of the marginal models, then proceeds by combining the estimated margins into a
multivariate model in order to estimate the remaining parameters, these being the parameters
of the copula. The CML method reverses this procedure, first estimating the association parameters using the empirical distribution functions of the margins, after which the parameters of
the margins are estimated. Generally, these estimators are consistent and asymptotically normal,
although less efficient than ML (e.g. see Joe and Xu (1996)); their advantage over ML is primarily in computational ease. However, in the specific case of the self-selection model, neither
estimator is appropriate because the model fitted for F2 only uses those observations pertaining
to the self-selected sub-population, thereby inducing selectivity bias.
Given empirical data, model selection across differing self-selection specifications is an
a posteriori consideration, for it is rare that the true data generating mechanism F is known
a priori. Setting aside for now the specification of the margins F1 and F2 , differing families of
copulas Cθ are, in general, parametrically non-nested, even if the families being compared are
Archimedean. For example, even though the same symbol θ is used to denote the association
parameter in each family in Table 1, none of the families that are listed there parametrically nests
another in that list. Consequently, following the suggestion of Joe (1997, Section 10.3), information measures such as AIC and BIC applied to each fitted model can be used as the selection
criterion amongst competing models. This is the method adopted in the examples which follow. However, because in both examples the specifications of the margins F1 and F2 are fixed
across competing models (i.e. models compete only according to specifications for Cθ , and so
the number of parameters does not vary across models), model selection based on
information measures such as AIC and BIC, which penalize fit by the number of parameters
used to attain that fit, is equivalent simply to selecting the model with the largest maximized
log-likelihood.
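The equivalence noted above is easy to see in code: when every candidate has the same number of parameters k and is fitted to the same n observations, the AIC and BIC penalties are constants and the ranking is determined by the log-likelihood alone. The model labels and log-likelihood values below are hypothetical and serve only to illustrate the point.

```python
from math import log


def aic(loglik, k):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion: k log n - 2 log L."""
    return k * log(n) - 2 * loglik


# Hypothetical maximized log-likelihoods for three competing copula models.
fitted = {"model_a": -101.3, "model_b": -99.8, "model_c": -100.4}
k, n = 9, 200  # same parameter count and sample size for every model

best_by_loglik = max(fitted, key=fitted.get)
best_by_aic = min(fitted, key=lambda m: aic(fitted[m], k))
best_by_bic = min(fitted, key=lambda m: bic(fitted[m], k, n))
```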
4. APPLICATIONS
4.1. Example 1: labour supply
In this example, self-selection models for female labour supply are constructed and estimated
using the specifications discussed in Section 3. The data (n = 200) were randomly drawn from
the 1987 Michigan Panel Study of Income Dynamics (these data are tabulated in Lee (1996,
Appendix)). The variable descriptions used here correspond to his:
Symbol   Description
Y        Wife labour supply (hours per month)
inc      Household income ($000)
edu      Wife schooling (years)
pkid     Number of pre-school children, age 0 to 5 years
mort     0-1 dummy, equal to 1 if house is mortgaged
1        Constant dummy
Here, the covariates of the binary indicator of labour force participation S are specified to be
x1 = (edu, pkid, mort, 1), while the covariates of labour supply Y are specified to be x2 =
(inc, pkid, 1). Note that 44 respondents report 0 hours of labour. It is worth noting that Lee’s
analyses of these data do not find significant evidence of association between participation and
supply; in other words, for these data, he finds that selectivity bias is insignificant.

Table 3. ML estimates of θ and τ for labour supply; fixed normal–normal margins.

         Indep.    Normal    FGM       AMH       Clayton   Frank     Gumbel    Joe
         (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)
θ        0.000     0.461     1.000     0.584     0.280     2.757     1.000     2.841
                   [0.289]             [0.323]   [0.445]   [1.684]             [0.734]
τ        0.000     0.305     0.222     0.155     0.123     0.286     0.000     0.498
                   {1.47}              {1.47}    {0.72}    {1.88}              {5.19}
log L    −933.959  −933.640  −933.012  −933.310  −933.761  −932.845  −933.959  −931.841

Notes: (i) Estimated standard errors on $\hat{\theta}$ appear within square braces [ ].
(ii) $\hat{\tau} = \tau(\hat{\theta})$. Appearing in curly braces { } are associated t-statistics for the test of independence: τ = 0. These are
derived using the delta theorem.
(iii) For the bivariate normal distribution: $\tau(\theta) = 2\pi^{-1}\sin^{-1}(\theta)$.
(iv) For the FGM distribution: $\tau(\theta) = 2\theta/9$.
(v) For Archimedean copulas: $\tau(\theta) = 1 + 4\int_0^1 \frac{\varphi(t)}{\varphi'(t)}\,dt$.
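Following Lee, both margins are assumed normally distributed: $Y_1^* \sim N(x_1'\beta_1, 1)$ and $Y_2^* \sim N(x_2'\beta_2, \sigma^2)$; term this ‘normal–normal’ margins. Then,
$$F_1 = 1 - \Phi(x_1'\beta_1) \quad \text{and} \quad F_2 = \Phi\left(\frac{y - x_2'\beta_2}{\sigma}\right),$$
thus,
$$f_2 = \frac{1}{\sigma}\,\phi\left(\frac{y - x_2'\beta_2}{\sigma}\right).$$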
For various families of copulas, ML estimation results appear in Table 3. Point estimates of the
covariate parameters are suppressed for there is broad similarity in these estimates across each
of the models. This is to be expected as the margins are fixed. However, there is improvement
in efficiency (smaller estimated standard errors) of the covariate parameter estimates, especially
so for the preferred Joe model over the others. A complete table of ML estimation results is
available from the author upon request.
For the Independent model (column (1)), Y1∗ and Y2∗ are specified to be independent (i.e.
F(y1∗ , y2∗ ) = F1 (y1∗ )F2 (y2∗ )) and so the association parameter θ is fixed at 0. In column (2),
results for the Normal model appear. There is a small improvement in the value of the log-likelihood in the Normal model from that of the Independent model, but at conventional levels of
significance this improvement is insignificant, indicating that selectivity bias is not present under
this specification. For the FGM model, maximization of the likelihood requires the degenerate
setting θ = 1 at the boundary of the parameter space for these data. The results for the FGM
model (column (3)) reinforce the opinion expressed in Section 3.3.2 concerning the inadequacy
of the FGM copula to model empirical data.
Results for models that use Archimedean copulas appear in columns (4)–(8). The Gumbel
model performs the worst for these data, because the likelihood is maximized at the degenerate
estimate θ = 1, which corresponds to independence. The AMH and Clayton models do not perform well:
neither estimate of θ is significantly different from 0 (independence corresponds to θ = 0 for the AMH
family, and θ → 0+ for the Clayton family), and both maximized log-likelihoods fail to improve
on that achieved by the (degenerate) FGM model. Similar to the results for most of the other
models, Frank’s model (column (6)) yields insignificant positive association between Y1∗ and Y2∗
for these data. This is, however, not the case if Joe’s family of copulas are used, for the estimate
of θ of 2.841 in Joe’s model (column (8)) is significantly different from unity (θ = 1 yields
independence under Joe’s family), the relevant t-statistic being 2.5. Joe’s model also outperforms
the others presented here in the sense that the maximized log-likelihood of −931.841 is greatest.
As the association parameter θ is not comparable across the families of copulas appearing in
Table 3, it is re-parameterized to Kendall’s τ . The ML estimate of τ and the associated t-statistic
for the test of selectivity bias (τ = 0) appear in Table 3. Of all the models considered, the
estimate of τ of 0.498 for Joe’s model is greatest and, with an associated t-statistic of 5.19, it
is the only one that differs significantly from zero. Under the preferred Joe model,
there is significant evidence in these data for the presence of selectivity.
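The re-parameterization from θ to Kendall's τ uses the formulae in the notes to Table 3. The sketch below applies them to association estimates as read from Table 3 (0.461 for the Normal model, 1.000 for the FGM model, 0.280 for the Clayton model), and evaluates the Archimedean integral by a simple midpoint rule. The Clayton generator $\varphi(t) = (t^{-\theta} - 1)/\theta$, the function names and the quadrature scheme are assumptions made for illustration; the text does not state how the integral was computed.

```python
from math import asin, pi


def tau_normal(theta):
    """Kendall's tau for the bivariate normal copula: (2 / pi) * asin(theta)."""
    return 2.0 / pi * asin(theta)


def tau_fgm(theta):
    """Kendall's tau for the FGM copula: 2 * theta / 9."""
    return 2.0 * theta / 9.0


def tau_archimedean(phi, phi_prime, n=20000):
    """1 + 4 * integral over (0, 1) of phi(t) / phi'(t), by the midpoint rule."""
    h = 1.0 / n
    total = sum(phi((i + 0.5) * h) / phi_prime((i + 0.5) * h) for i in range(n))
    return 1.0 + 4.0 * h * total


theta_clayton = 0.280
tau_clayton = tau_archimedean(
    lambda t: (t ** -theta_clayton - 1.0) / theta_clayton,  # generator
    lambda t: -t ** (-theta_clayton - 1.0),                 # its derivative
)
```

For the Clayton family the integral has the closed form θ/(θ + 2), which provides a check on the quadrature.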
4.2. Example 2: length of time in hospital
Prieger (2002) studied the total spell of hospitalization for individuals reporting in the 1996 wave
of the US Medical Expenditure Panel Survey (n = 14,946; these data are available for download
from the Journal of Applied Econometrics Data Archive). Prieger fitted the Independent, Lee and
FGM self-selection models (see Sections 3.3.1–3.3.2), ultimately preferring the outcome of the
latter as per the maximized log-likelihood criterion. In this example, a further modelling improvement is demonstrated by using Archimedean copulas, under the same model selection criterion.
Prieger’s specification of the marginal models is termed here ‘normal-gamma’ margins, arising as follows: (i) Normal. Of the entire sample, a total of 1346 individuals reported having
been admitted to hospital. To represent this, Prieger specified normality for the propensity to
hospitalization; that is, $Y_1^* \sim N(x'\beta_1, 1)$, thus:
$$F_1 = 1 - \Phi(x'\beta_1)$$
where the covariates x are described in Prieger’s Table 4. (ii) Gamma. For all hospitalized individuals (the self-selected sub-population), Prieger assumed that the duration of time spent per visit
to hospital was exponentially distributed with mean $1/\lambda = \exp(x'\beta_2)$. For $Q = q \in \{1, 2, 3, \ldots\}$
hospitalizations (some 362 individuals reported multiple hospitalizations), the durations of which
Prieger assumed mutually independent, one finds that Y , the total spell of hospitalization, is such that
$Y \mid (Q = q \geq 1) \sim \mathrm{Gamma}(q, 1/\lambda)$ with pdf
$$f_2 = \frac{1}{\Gamma(q)}\,\lambda^q \exp(-\lambda y)\, y^{q-1}$$
for real y > 0, and cdf
$$F_2 = 1 - \frac{\Gamma(q, \lambda y)}{\Gamma(q)}$$
where $\Gamma(a, b)$ denotes the incomplete gamma function, $\int_b^\infty \exp(-t)\, t^{a-1}\,dt$.
Prieger’s preferred FGM model yields a negative estimate of dependence (see Table 4).
Accordingly, it would seem appropriate to include families of Archimedean copulas that can
accommodate negative dependence in order to model these data.¹ One family of Archimedean
copulas that can accommodate negative dependence (in fact it attains W , the lower Fréchet bound
for copulas, as θ → 0+ ) is the AP family of copulas
$$C_\theta(u, v) = \tfrac{1}{2}\left(r + \sqrt{r^2 + 4\theta}\right) \qquad (18)$$
where $r = u + v - 1 - \theta\left(\frac{1}{u} + \frac{1}{v} - 1\right)$, and θ > 0 (see Table 1). The generator is
$\varphi(t) = (1 + \theta/t)(1 - t)$, for which $\varphi^{-1}(t) = \tfrac{1}{2}\left(1 - \theta - t + \sqrt{(1 - \theta - t)^2 + 4\theta}\right)$ is convex, but not
completely monotonic. Note the similarity in appearance of the AP family to the Plackett family
of copulas (8).

¹ The existence of any such family has been questioned by Jouini and Clemen, who write: ‘. . . Archimedean copulas
can be used to model only positive dependence . . . ’ (Jouini and Clemen (1996, p. 446)). Their view is due to a theorem (see Jouini and Clemen (1996, Theorem 10)) in which they prove that an Archimedean copula (for two or more
dimensions) with strict generator ϕ, such that $\varphi^{-1}$ is completely monotonic, is bounded below by the Product copula
(complete monotonicity implies that for $t \in [0, \infty)$ and all θ, all derivatives of $\varphi^{-1}$ must exist and alternate in sign,
viz. $(-1)^k \partial^k \varphi^{-1}(t)/\partial t^k \geq 0$ for k = 0, 1, 2, 3, . . .). In the bivariate case, Jouini and Clemen’s view is incorrect. A
bivariate copula can be Archimedean under the weaker condition that $\varphi^{-1}$ is convex (Schweizer (1991, Theorem 3.2);
convexity requires $(-1)^k \partial^k \varphi^{-1}(t)/\partial t^k \geq 0$ for k = 0, 1, 2, and leaves free the sign of the higher derivatives). Indeed,
the Frank family is Archimedean (see Table 1; Genest (1985)) with $\varphi^{-1}(t) = -\theta^{-1}\log(1 + e^{-t}(e^{-\theta} - 1))$ convex, but
not completely monotonic, and this family is comprehensive.

Table 4. ML estimates for length of hospitalization; fixed normal-gamma margins.

         Indep.     Lee        FGM        AMH        Joe        AP
         (1)        (2)        (3)        (4)        (5)        (6)
θ        0.000      0.0013     −0.8735    −0.8624    1.0102     0.1228
                    [0.023]    [0.048]    [0.045]    [0.004]    [0.016]
τ        0.000      0.0008     −0.1941    −0.1604    0.0059     −0.0789
                    {0.06}     {−18.24}   {−22.49}   {2.38}     {−3.14}
log L    −7674.25   −7674.25   −7641.42   −7641.64   −7670.14   −7613.61

Notes: (i) Estimated standard errors on $\hat{\theta}$ appear within square braces [ ].
(ii) $\hat{\tau} = \tau(\hat{\theta})$. Appearing in curly braces { } are associated t-statistics for the test of independence: τ = 0. These are
derived using the delta theorem.
(iii) For the Bivariate Normal copula: $\tau(\theta) = 4\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \Phi_2(x, y; \theta)\,\phi_2(x, y; \theta)\,dx\,dy - 1$.
(iv) For the FGM copula: $\tau(\theta) = 2\theta/9$.
(v) For Archimedean copulas: $\tau(\theta) = 1 + 4\int_0^1 \frac{\varphi(t)}{\varphi'(t)}\,dt$.

For various families of copulas, ML estimation results appear in Table 4. For the same reasons
as given earlier, point estimates of the covariate parameters are suppressed (a complete table of
ML estimation results is available upon request). Fortunately, there is close agreement between
Prieger’s point estimates of (β1 , β2 , θ) and the complete set of results corresponding to columns
(1)–(3) of Table 4. However, where numerical differences arise they are in the estimated standard errors. This is because Prieger used the OPG method to estimate the asymptotic variance–
covariance matrix.
Columns (4)–(6) of Table 4 list estimation results for the association parameter θ and for
Kendall’s τ = τ (θ). The performance of the AMH model almost mimics that of the FGM model
for these data; both estimate significant negative dependence (for the AMH model $\hat{\tau} = -0.1604$,
with t-statistic −22.49 for τ = 0, and for the FGM model $\hat{\tau} = -0.1941$, with t-statistic −18.24 for
τ = 0), indicating the presence of self-selection. The Joe model, preferred in Example 1, performs poorly in this case, barely managing to improve upon the Independent and Lee models,
both of which Prieger dismissed. The explanation for the poor performance of Joe’s model in
this example lies in the inability of the Joe family to represent negative dependence ($\varphi^{-1}(t) = 1 - (1 - e^{-t})^{1/\theta}$
is completely monotonic, hence range 0 ≤ τ < 1 as per Jouini and Clemen’s
theorem). Finally, the AP model can be seen to outperform the others for these data, with the
maximized value of the log-likelihood of −7613.61 well above that obtained by Prieger’s preferred FGM model. The estimate $\hat{\tau} = -0.0789$ is significantly negative (t-statistic −3.14 for
τ = 0), indicating the presence of self-selection in these data.
To further contrast the models consider, for example, estimation of the mean duration of total
hospitalization, given that the individual is admitted. For Archimedean models
$$E[Y \mid Y_1^* > 0] = \int_0^\infty y\, f_{2|1}(y \mid Y_1^* > 0)\,dy = \frac{1}{1 - F_1}\left(\frac{q}{\lambda} - \int_0^\infty y\,\frac{\varphi'(F_2)}{\varphi'(C_\theta)}\, f_2\,dy\right).$$
For only one visit to hospital (q = 1) and at $x = \bar{x}$ ($\bar{x}$ collects the covariate averages across the
1346 hospitalized individuals), this evaluates to 3.74 days for the AP model. For other, worse-fitting models, estimates are 3.97 days for the AMH model, and 3.95 days for the FGM model. If
the selectivity in these data is ignored, the mean duration estimated by the Independence model
is considerably larger at 4.43 days.
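The conditional mean above requires only a univariate integral. The following sketch evaluates it by a midpoint rule for q = 1 (exponential duration), integrating $y\,(1 - \varphi'(F_2)/\varphi'(C_\theta))\,f_2/(1 - F_1)$ directly, which is algebraically the same quantity. The Frank factor is taken from Table 2; the values of `F1`, `lam` and θ are hypothetical and are not the estimates behind the figures quoted in the text, and the function names are illustrative.

```python
from math import exp


def frank_factor(F1, F2, theta):
    """Table 2 expression for 1 - phi'(F2) / phi'(C) under the Frank family."""
    num = exp(theta * F2) * (exp(theta * F1) - exp(theta))
    den = exp(theta * (F1 + F2)) + exp(theta) * (1 - exp(theta * F1) - exp(theta * F2))
    return num / den


def conditional_mean_exponential(F1, lam, factor, upper_mult=40.0, n=20000):
    """E[Y | Y1* > 0] when Y2* is exponential with rate lam (the q = 1 case).

    factor(F1, F2) returns 1 - phi'(F2) / phi'(C). The integral over (0, inf)
    is truncated at upper_mult / lam, where the exponential tail is negligible.
    """
    h = upper_mult / lam / n
    total = 0.0
    for i in range(n):
        y = (i + 0.5) * h
        F2 = 1.0 - exp(-lam * y)
        f2 = lam * exp(-lam * y)
        total += y * factor(F1, F2) * f2
    return h * total / (1.0 - F1)


F1, lam = 0.9, 0.25  # hypothetical: Pr(not admitted) and exponential rate
mean_independent = conditional_mean_exponential(F1, lam, lambda a, b: 1.0 - a)
mean_frank = conditional_mean_exponential(F1, lam, lambda a, b: frank_factor(a, b, -2.0))
```

Under independence the factor is $1 - F_1$ and the result is the unconditional mean $1/\lambda$; with negative dependence the conditional mean falls below it, which is the direction of the effect reported in the text.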
5. EXTENSIONS
In this section, brief explanations of the derivation of the likelihoods of the switching regimes
model and the double-selection model are given, where it is assumed that the copulas representing the joint cdfs are Archimedean. In both instances, derivation is based on the existence of a
trio of latent utilitarian variables (Y1∗ , Y2∗ , Y3∗ ), the margins of which have cdf and pdf denoted
respectively by Fi (yi∗ ) and f i (yi∗ ), for yi∗ ∈ R (i = 1, 2, 3). It is assumed that these margins
depend on covariates and parameters; however, their specification is not of concern here.
5.1. The switching regimes model
The switching regimes model (also referred to as the extended or utility-based Roy model of
selectivity) arises when observations on the trio of random variables (S, Y2 , Y3 ) are generated
according to the following observation rules:
S = 1{Y1∗ > 0},
Y2 = 1{Y1∗ > 0}Y2∗ ,
Y3 = 1{Y1∗ ≤ 0}Y3∗ .
Basically, Y2∗ is observed when Y1∗ > 0, otherwise it is Y3∗ that is observed; the switching mechanism S is binary. Note that dummy values of 0 are assigned to Y2 and Y3 as required, according
to the outcomes of the switch S. Here, Fi (yi∗ ) is assumed continuous throughout the support of
Yi∗ , for i = 2, 3. Vijverberg (1993) cites a number of applications of this model.
Suppose that data (s j , y2 j , y3 j ) denotes the jth observation on (S, Y2 , Y3 ), j = 1, . . . , n. For
a random sample of size n, the likelihood is given by
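$$\begin{aligned}
L &= \prod_0 \int_{-\infty}^{0} f_{13}(y_{1j}^*, y_{3j})\,dy_{1j}^* \prod_1 \int_{0}^{\infty} f_{12}(y_{1j}^*, y_{2j})\,dy_{1j}^* \\
  &= \prod_0 f_{3|1}(y_{3j} \mid Y_{1j}^* \leq 0)\,\Pr(Y_{1j}^* \leq 0) \prod_1 f_{2|1}(y_{2j} \mid Y_{1j}^* > 0)\,\Pr(Y_{1j}^* > 0) \\
  &= \prod_0 \frac{\partial}{\partial y_3} F_{13}(0, y_3) \prod_1 \left(f_2 - \frac{\partial}{\partial y_2} F_{12}(0, y_2)\right), \qquad (19)
\end{aligned}$$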
where F12 and F13 denote bivariate margins with respective pdfs f 12 and f 13 , and f 2|1 and f 3|1
denote univariate pdfs conditioned on the event shown involving Y1∗ . The first line of (19) is the
equivalent of Amemiya’s formula for L (see Amemiya (1985, equation (10.10.2))), requiring that
Y1∗ is continuous. The second line expresses the likelihood in terms of the binary switch, similar
to (12), and it is more general than the expression given in the previous line in that Y1∗ need not
be continuous. The third line expresses the likelihood in terms of the underlying margins, by
analogy with (13); the presence of the differentials is due to the continuity assumptions on Y2∗
and Y3∗ .
From the general form of the likelihood (19), it is clear that any association parameters that
may exist between Y2∗ and Y3∗ cannot be identified as L is not a function of these parameters
(L does not depend on the bivariate margin F23 nor on the trivariate F); for further discussion,
see Heckman and Honoré (1990). This implies that it is superfluous to specify F, the trivariate
distribution of (Y1∗ , Y2∗ , Y3∗ ).
Under the copula approach, modelling a switching regimes process proceeds by specifying
margins Fi , and the copulas that represent the bivariate margins F12 and F13 . Let ϕ and η denote
respectively the generators of the (Archimedean) copulas that represent F12 and F13 . Then, the
likelihood is given by
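$$L = \prod_0 \frac{\eta'(F_3)}{\eta'(C_\lambda^{13})}\, f_3 \prod_1 \left(1 - \frac{\varphi'(F_2)}{\varphi'(C_\theta^{12})}\right) f_2$$
where $C_\theta^{12} = \varphi^{-1}(\varphi(F_1(0)) + \varphi(F_2(y_2)))$ and $C_\lambda^{13} = \eta^{-1}(\eta(F_1(0)) + \eta(F_3(y_3)))$, and θ and λ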
collect the relevant association parameters. It is not necessary that the generators ϕ and η have
the same functional form, but this can be imposed. For given specifications of the margins Fi
and generators ϕ and η, it is easy to construct the likelihood by adapting the quantities given in
Table 2.
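As a sketch of how the switching regimes likelihood might be assembled, the code below takes both copulas to be Clayton, so that $\varphi'(F_b)/\varphi'(C) = (C/F_b)^{\theta+1}$ for each regime. This choice of family, the function names and the row layout are assumptions for illustration; any generators with known derivatives could be used instead.

```python
from math import log


def clayton_cdf(u, v, theta):
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_ratio(F_a, F_b, theta):
    """phi'(F_b) / phi'(C(F_a, F_b)) for the Clayton family: (C / F_b)^(theta + 1)."""
    return (clayton_cdf(F_a, F_b, theta) / F_b) ** (theta + 1.0)


def switching_loglik(rows, theta12, lambda13):
    """Log-likelihood of the switching regimes model with Clayton copulas.

    rows: (s, F1, F, f). F1 = F1(0); (F, f) are the cdf and pdf of Y2* at y2
    when s == 1, or of Y3* at y3 when s == 0.
    """
    total = 0.0
    for s, F1, F, f in rows:
        if s == 1:
            total += log((1.0 - clayton_ratio(F1, F, theta12)) * f)
        else:
            total += log(clayton_ratio(F1, F, lambda13) * f)
    return total


rows = [(1, 0.3, 0.6, 0.2), (0, 0.3, 0.4, 0.5)]  # illustrative values only
near_independence = switching_loglik(rows, 1e-6, 1e-6)
```

As both association parameters approach zero the Clayton copula approaches independence, so the two contributions tend to $\log((1 - F_1) f_2)$ and $\log(F_1 f_3)$ respectively.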
5.2. The double-selection model
The double-selection model arises when observations on a trio of random variables (S1 , S2 , Y )
are generated according to the following observation rules:
S1 = 1{Y1∗ > 0}
S2 = 1{Y1∗ > 0, Y2∗ > 0}
Y = 1{Y1∗ > 0, Y2∗ > 0}Y3∗ .
In this model, the two binary selectors S1 and S2 serve to partition the total sample; note that S1 is
determined in sequence prior to S2 . Figure 2 depicts the sample space of outcomes of (S1 , S2 , Y )
as the branches of a decision tree.
Tunali (1986) cites a number of applications of the double-selection model. A recent example is given by Henneberger and Sousa-Poza (1998), whose double-selection model is designed
to account for survey non-response: S1 is used to indicate labour force participation, and S2 to
indicate whether or not participants report earnings; those electing to do so report wage earnings Y .

Figure 2. Decision tree of outcomes of (S1 , S2 , Y ): the branch S1 = 0 ends at (0, 0, 0); the branch S1 = 1 splits into S2 = 0, ending at (1, 0, 0), and S2 = 1, ending at (1, 1, Y ).
Let si1 , . . . , sin denote n observations on Si (si j ∈ {0, 1}, i = 1, 2, j = 1, . . . , n), and
y1 , . . . , yn the corresponding n observations on Y (y j ∈ R, j = 1, . . . , n). For a random sample
of n observations, the likelihood function for the double-selection model is given by
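$$L = \prod_0 \Pr(Y_{1j}^* \leq 0) \times \prod_1 \Pr(Y_{2j}^* \leq 0 \mid Y_{1j}^* > 0)\,\Pr(Y_{1j}^* > 0) \times \prod_2 f_{3|12}(y_j \mid Y_{1j}^* > 0, Y_{2j}^* > 0)\,\Pr(Y_{1j}^* > 0, Y_{2j}^* > 0),$$
where $\prod_0$ indicates the product over those observations for which s1 = 0, $\prod_1$ the product over
those observations for which s1 = 1 and s2 = 0, and $\prod_2$ the product over those observations for
which s1 = s2 = 1. Here
$$\begin{aligned}
f_{3|12}(y \mid Y_1^* > 0, Y_2^* > 0) &= \frac{1}{p}\,\frac{\partial}{\partial y}\left(F_3(y) - F_{13}(0, y) - F_{23}(0, y) + F(0, 0, y)\right) \\
&= \frac{1}{p}\left(f_3(y) - \frac{\partial}{\partial y} F_{13}(0, y) - \frac{\partial}{\partial y} F_{23}(0, y) + \frac{\partial}{\partial y} F(0, 0, y)\right),
\end{aligned}$$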
where p = Pr(Y1∗ > 0, Y2∗ > 0); obviously Y3∗ is required to be a continuous random variable.
Substitution then yields
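$$L = \prod_0 F_1(0) \times \prod_1 \left(F_2(0) - F_{12}(0, 0)\right) \times \prod_2 \left(f_3(y) - \frac{\partial}{\partial y} F_{13}(0, y) - \frac{\partial}{\partial y} F_{23}(0, y) + \frac{\partial}{\partial y} F(0, 0, y)\right) \qquad (20)$$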
as the general form of the likelihood.
Let the 3-copula representation of the joint cdf of (Y1∗ , Y2∗ , Y3∗ ) be as follows:
F(y1∗ , y2∗ , y3∗ ) = Cθ (F1 (y1∗ ), F2 (y2∗ ), F3 (y3∗ ))
and assume that Cθ is a three-part family of Archimedean copulas, with additive generator ϕ; in
other words, Cθ is specified such that
ϕ(Cθ (u, v, w)) = ϕ(u) + ϕ(v) + ϕ(w)
for all real $(u, v, w) \in I^3$. Moreover, because the dimensionality in this case is greater than 2, the
inverse function $\varphi^{-1}$ must be continuous on [0, ∞) and be completely monotonic (see Nelsen
(1999, Section 4.6)). An example of a family of 3-copulas is $(u^{-\theta} + v^{-\theta} + w^{-\theta} - 2)^{-1/\theta}$, θ > 0,
termed the Clayton 3-copula. As the bivariate margins of F are themselves Archimedean with
generator ϕ, the derivatives appearing in (20) simplify to
$$\frac{\partial}{\partial y} F_{i3}(0, y) = \frac{\varphi'(F_3(y))}{\varphi'(F_{i3}(0, y))}\, f_3(y) \qquad (i = 1, 2)$$
$$\frac{\partial}{\partial y} F(0, 0, y) = \frac{\varphi'(F_3(y))}{\varphi'(F(0, 0, y))}\, f_3(y).$$
Substitution into (20) yields
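$$L = \prod_0 F_1(0) \prod_1 \left(F_2(0) - F_{12}(0, 0)\right) \times \prod_2 \left(1 - \frac{\varphi'(F_3(y))}{\varphi'(F_{13}(0, y))} - \frac{\varphi'(F_3(y))}{\varphi'(F_{23}(0, y))} + \frac{\varphi'(F_3(y))}{\varphi'(F(0, 0, y))}\right) f_3(y). \qquad (21)$$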
Observe that the bivariate and trivariate cdfs appearing as the arguments of $\varphi'$ in (21) only ever
require univariate integration in order to be evaluated, as per
$$F_{i3}(0, y) = \varphi^{-1}(\varphi(F_i(0)) + \varphi(F_3(y))) \qquad (i = 1, 2)$$
and
$$F(0, 0, y) = \varphi^{-1}(\varphi(F_1(0)) + \varphi(F_2(0)) + \varphi(F_3(y))).$$
Thus, computation of the likelihood under an Archimedean copula is straightforward. In contrast,
if F123 is assumed trivariate normal, evaluating the likelihood requires greater numerical effort:
Fi3 (0, y) requires at least one numerical integration, and for F(0, 0, y) at least two are required.
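To make the computational point concrete, the sketch below evaluates the bracketed term of (21) under the Clayton 3-copula and checks it against a finite difference of $F_3 - F_{13} - F_{23} + F$ taken with respect to $F_3$. The function names and numerical values are illustrative assumptions; the arguments `F1` and `F2` stand for $F_1(0)$ and $F_2(0)$, and `F3` for $F_3(y)$.

```python
def clayton_n(values, theta):
    """Clayton copula of any dimension d: (sum of u_i^-theta - (d - 1))^(-1/theta)."""
    d = len(values)
    return (sum(u ** -theta for u in values) - (d - 1)) ** (-1.0 / theta)


def clayton_phi_prime(t, theta):
    """Derivative of the Clayton generator (t^-theta - 1) / theta."""
    return -t ** (-theta - 1.0)


def double_selection_factor(F1, F2, F3, theta):
    """Bracketed term of (21), which multiplies f3(y) for fully selected observations."""
    d3 = clayton_phi_prime(F3, theta)
    F13 = clayton_n([F1, F3], theta)
    F23 = clayton_n([F2, F3], theta)
    F123 = clayton_n([F1, F2, F3], theta)
    return (1.0
            - d3 / clayton_phi_prime(F13, theta)
            - d3 / clayton_phi_prime(F23, theta)
            + d3 / clayton_phi_prime(F123, theta))


def survival_piece(F1, F2, w, theta):
    """w - C(F1, w) - C(F2, w) + C(F1, F2, w), viewed as a function of w = F3(y)."""
    return (w - clayton_n([F1, w], theta) - clayton_n([F2, w], theta)
            + clayton_n([F1, F2, w], theta))


F1, F2, F3, theta, h = 0.4, 0.5, 0.7, 1.5, 1e-6  # illustrative values only
analytic = double_selection_factor(F1, F2, F3, theta)
numeric = (survival_piece(F1, F2, F3 + h, theta)
           - survival_piece(F1, F2, F3 - h, theta)) / (2.0 * h)
```

No numerical integration appears anywhere in this evaluation beyond whatever the margins themselves require.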
Comparing the likelihoods of the self-selection model (17) and its extension to the double-selection model (21) suggests that a similar pattern of terms will arise if further numbers of
(sequential) binary selection mechanisms happen to be present in the data, provided, of course,
that model specification assumes the joint cdf can be represented by an Archimedean copula. The
likelihood will then have a number of terms involving a ratio of derivatives of the generator, with
all the marginal cdfs (of whatever dimension) calculable with only univariate integration. In these
higher dimensional models (the triple-selection model etc), the specification of Archimedean
copulas will still allow ML estimation to proceed using standard iterative algorithms. This then
neatly avoids the need for higher dimensional numerical integration, or for estimation based on
simulation methods.
6. CONCLUSION
In this article, a copula approach was used in the specification of binary models that are designed
to account for data selectivity. This involved specifying distributions for each of the margins, as
well as selecting a family of copulas. It was shown that previous modelling attempts that have
appeared in the selectivity literature (notably Lee (1983)) corresponded to the use of particular
families of copulas. When selecting copulas, the class of Archimedean copulas was, in particular,
shown to have a number of attractive properties. Not least among these was the simple form taken
by the likelihood, which was shown to involve a ratio of derivatives of the generator function.
ACKNOWLEDGEMENTS
This paper was written while visiting the Payments Policy Department of the Reserve Bank of
Australia and the Institute for Economics and Social Statistics of the University of Dortmund,
Germany. Access to the facilities of the Bank and the University of Dortmund are gratefully
acknowledged. In addition, financial assistance from the Alexander von Humboldt Foundation
is acknowledged gratefully. Thanks are also due to the co-Editor Pravin Trivedi as well as the
anonymous referees for a number of helpful comments. Others providing helpful comments and
suggestions include Walter Krämer, Christian Kleiber, Jerry Hausman and seminar participants
at the universities of Melbourne, Dortmund, Amsterdam, Zurich and Munich. Any remaining
errors are entirely my responsibility.
REFERENCES
Amemiya, T. (1973). Regression analysis when the dependent variable is truncated normal. Econometrica
41, 997–1016.
Amemiya, T. (1985). Advanced Econometrics. Cambridge, MA: Harvard.
Bouyé, E., V. Durrleman, A. Nikeghbali, G. Riboulet and T. Roncalli (2000). Copulas for finance: a reading
guide and some applications, All About Value at Risk Working Papers (download from http://www.
gloriamundi.org/var/wps.html).
Clemen, R. T. and T. Reilly (1999). Correlations and copulas for decision and risk analysis. Management
Science 45, 208–24.
Cragg, J. G. (1971). Some statistical models for limited dependent variables with applications to the demand
for durable goods. Econometrica 39, 829–44.
Dall’Aglio, G. (1991). Frechet classes: the beginnings. In G. Dall’Aglio, S. Kotz and G. Salinetti (eds),
Advances in Probability Distributions with Given Marginals: Beyond the Copulas, Chapter 1, pp. 13–50.
Dordrecht: Kluwer.
Dardanoni, V. and P. Lambert (2001). Horizontal inequity comparisons. Social Choice and Welfare 18,
799–816.
Fisher, N. I. (1997). Copulas. In S. Kotz, C. B. Read and D. L. Banks (eds), Encyclopedia of Statistical
Sciences, Update vol. 1, pp. 159–63. New York: Wiley.
Galambos, J. (1978). The Asymptotic Theory of Extreme Order Statistics. New York: Wiley.
Genest, C. (1985). Frank’s family of bivariate distributions. Biometrika 74, 549–55.
Genest, C., K. Ghoudi and L.-P. Rivest (1995). A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika 82, 542–52.
Genest, C. and J. MacKay (1986). The joy of copulas: bivariate distributions with uniform marginals. American Statistician 40, 280–3.
Genest, C. and L.-P. Rivest (1993). Statistical inference procedures for bivariate Archimedean copulas. Journal of the American Statistical Association 88, 1034–43.
Härdle, W. and C. F. Manski (eds) (1993). Nonparametric and semiparametric approaches to discrete response analysis. Annals of the Journal of Econometrics 58, 1–274.
Heckman, J. J. (1974). Shadow prices, market wages and labor supply. Econometrica 42, 679–94.
Heckman, J. J. and B. E. Honoré (1990). The empirical content of the Roy model. Econometrica 58,
1121–49.
Hennessy, D. A. and H. E. Lapan (2002). The use of Archimedean copulas to model portfolio allocations.
Mathematical Finance 12, 143–54.
Henneberger, F. and A. Sousa-Poza (1998). Estimating wage functions and wage discrimination using data
from the 1995 Swiss labour force survey: a double-selectivity approach. International Journal of Manpower 19, 486–506.
Joe, H. (1997). Multivariate Models and Dependence Concepts. London: Chapman and Hall.
Joe, H. and J. J. Xu (1996). The estimation method of inference functions for margins for multivariate
models, Technical Report no. 166, Department of Statistics, University of British Columbia (download
from http://hajek.stat.ubc.ca/~harry/ifm.pdf).
Jouini, M. N. and R. T. Clemen (1996). Copula models for aggregating expert opinions. Operations
Research 44, 444–57.
Kimeldorf, G. and A. R. Sampson (1975). Uniform representations of bivariate distributions. Communications in Statistics, Theory and Method 4, 617–27.
Kwerel, S. M. (1983). Fréchet bounds. In S. Kotz and N. L. Johnson (eds), Encyclopedia of Statistical
Sciences, vol. 3, pp. 202–9. New York: Wiley.
Lee, L.-F. (1983). Generalized econometric models with selectivity. Econometrica 51, 507–12.
Lee, M.-J. (1996). Methods of Moments and Semiparametric Econometrics for Limited Dependent Variable
Models. New York: Springer.
Maddala, G. S. (1983). Limited-Dependent and Qualitative Variables in Econometrics. Cambridge: Cambridge University Press.
Maddala, G. S. (ed.) (1994). Econometric Methods and Applications, vol. 2. Aldershot: Edward Elgar.
Mari, D. D. and S. Kotz (2001). Correlation and Dependence. London: Imperial College Press.
McCullagh, P. and J. A. Nelder (1989). Generalized Linear Models, 2nd edn. London: Chapman and Hall.
Nelsen, R. B. (1999). An Introduction to Copulas. New York: Springer.
Newey, W. K. and D. L. McFadden (1994). Large sample estimation and hypothesis testing. In R. F. Engle
and D. L. McFadden (eds), Handbook of Econometrics, Chapter 36, vol. 4. New York: North-Holland.
Patton, A. J. (2001). Modelling time-varying exchange rate dependence using the conditional copula,
Department of Economics Discussion Paper 2001-09, San Diego: University of California.
Prieger, J. E. (2002). A flexible parametric selection model for non-normal data with application to health
care usage. Journal of Applied Econometrics 17, 367–92.
Schweizer, B. (1991). Thirty years of copulas. In G. Dall’Aglio, S. Kotz and G. Salinetti (eds), Advances in
Probability Distributions with Given Marginals: Beyond the Copulas, Chapter 2, pp. 13–50. Dordrecht:
Kluwer.
Schweizer, B. and A. Sklar (1983). Probabilistic Metric Spaces. New York: North-Holland.
Smith, M. D. (2002). On specifying double-hurdle models. In A. Ullah, A. Wan and A. Chaturvedi (eds),
Handbook of Applied Econometrics and Statistical Inference, Chapter 25, pp. 535–52. New York:
Marcel-Dekker.
Tunali, I. (1986). A general structure for models of double-selection and an application to a joint migration/earnings process with remigration. Research in Labor Economics 8B, 235–82.
Valdez, E. A. (2001). Copula Models for Sums of Dependent Risks, School of Actuarial Studies,
University of New South Wales (download from
http://www.actuarial.unsw.edu.au/events/symposiums/2001/EValdez.pdf).
Vella, F. (1998). Estimating models with sample selection bias: a survey. The Journal of Human Resources
33, 127–43.
Vijverberg, W. P. M. (1993). Measuring the unidentified parameter of the extended Roy model of selectivity.
Journal of Econometrics 57, 69–90.
APPENDIX: DERIVATION OF THE SELF-SELECTION SCORE
In this section, the score function is derived for the self-selection model under a family of Archimedean
copulas in terms of derivatives of the generator function ϕ. In turn, the score depends on the following
derivatives:
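$$
\frac{\partial F_1}{\partial \beta_1},\quad
\frac{\partial F_2}{\partial \beta_2},\quad
\frac{\partial \log f_2}{\partial \beta_2},\quad
\frac{\partial F_2}{\partial \sigma},\quad
\frac{\partial \log f_2}{\partial \sigma},
$$
all of which can be determined once a particular functional form is assumed for the margins $F_1(y_1^*)$ and
$F_2(y_2^*)$. Moreover, as $C_\theta$ depends on all parameters, then, using (11), the following derivatives will be
required when constructing the score vector: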
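$$
\begin{aligned}
\frac{\partial C_\theta}{\partial \beta_1} &= \frac{\varphi'(F_1)}{\varphi'(C_\theta)}\,\frac{\partial F_1}{\partial \beta_1}, \\
\frac{\partial C_\theta}{\partial \beta_2} &= \frac{\varphi'(F_2)}{\varphi'(C_\theta)}\,\frac{\partial F_2}{\partial \beta_2}, \\
\frac{\partial C_\theta}{\partial \sigma} &= \frac{\varphi'(F_2)}{\varphi'(C_\theta)}\,\frac{\partial F_2}{\partial \sigma}.
\end{aligned}
$$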
From (17), the log-likelihood function for parameter $\lambda = (\beta_1, \beta_2, \sigma, \theta)$ for the self-selection model
under an Archimedean copula is given by
$$
\log L = \sum_{0} \log F_1 + \sum_{1} \left[ \log f_2 + \log\left( 1 - \frac{\varphi'(F_2)}{\varphi'(C_\theta)} \right) \right],
$$
where $\sum_0$ denotes the sum over those observations for which $s_j = 0$ and $\sum_1$ denotes the sum over those
observations for which $s_j = 1$. The component of the score due to $\beta_1$ is given by
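$$
\begin{aligned}
\frac{\partial}{\partial \beta_1} \log L
&= \sum_{0} \frac{\partial \log F_1}{\partial \beta_1}
 + \sum_{1} \left( \frac{\varphi'(C_\theta)}{\varphi'(C_\theta) - \varphi'(F_2)} \right)
   \left( \frac{\varphi'(F_2)\,\varphi''(C_\theta)}{\varphi'(C_\theta)^2} \right)
   \frac{\partial C_\theta}{\partial \beta_1} \\
&= \sum_{0} \frac{1}{F_1}\,\frac{\partial F_1}{\partial \beta_1}
 + \sum_{1} \alpha_1 \alpha_2\,\frac{\partial F_1}{\partial \beta_1},
\end{aligned}
$$
where the scalars $\alpha_1$ and $\alpha_2$ are given by
$$
\alpha_1 = \frac{\varphi'(C_\theta)}{\varphi'(C_\theta) - \varphi'(F_2)}
\quad\text{and}\quad
\alpha_2 = \frac{\varphi'(F_1)\,\varphi'(F_2)\,\varphi''(C_\theta)}{\varphi'(C_\theta)^3}.
$$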
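To make the log-likelihood above concrete, the following minimal sketch evaluates it in Python from pre-evaluated margins. It is an illustration under stated assumptions rather than the paper's own code: it assumes the Clayton generator ϕ(t) = (t^(−θ) − 1)/θ with θ > 0, and the function names and argument layout are hypothetical.

```python
import math


def clayton_phi_prime(t, theta):
    """phi'(t) for the assumed Clayton generator phi(t) = (t**-theta - 1) / theta."""
    return -t ** (-(theta + 1.0))


def clayton_copula(u, v, theta):
    """Clayton copula C_theta(u, v), valid for theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def log_likelihood(s, F1, F2, f2, theta):
    """Self-selection log-likelihood from margins evaluated per observation.

    s[j]  : selection indicator (0 or 1)
    F1[j] : selection margin F1 for observation j
    F2[j] : outcome cdf at the observed outcome (used only when s[j] == 1)
    f2[j] : outcome density at the observed outcome (used only when s[j] == 1)
    """
    total = 0.0
    for sj, F1j, F2j, f2j in zip(s, F1, F2, f2):
        if sj == 0:
            # Non-selected observation: contributes log F1.
            total += math.log(F1j)
        else:
            # Selected observation: log f2 + log(1 - phi'(F2) / phi'(C_theta)).
            C = clayton_copula(F1j, F2j, theta)
            ratio = clayton_phi_prime(F2j, theta) / clayton_phi_prime(C, theta)
            total += math.log(f2j) + math.log(1.0 - ratio)
    return total
```

As a sanity check on the sketch, when θ is close to zero the Clayton copula approaches independence, so a selected observation should contribute approximately log f2 + log(1 − F1).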
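The component of the score due to $\beta_2$ is given by
$$
\begin{aligned}
\frac{\partial}{\partial \beta_2} \log L
&= \sum_{1} \left[ \frac{\partial \log f_2}{\partial \beta_2}
 - \alpha_1 \left( \frac{\varphi'(C_\theta)\,\varphi''(F_2)\,\frac{\partial F_2}{\partial \beta_2}
 - \varphi'(F_2)\,\varphi''(C_\theta)\,\frac{\partial C_\theta}{\partial \beta_2}}{\varphi'(C_\theta)^2} \right) \right] \\
&= \sum_{1} \left[ \frac{\partial \log f_2}{\partial \beta_2}
 - \alpha_1 \left( \frac{\varphi''(F_2)}{\varphi'(C_\theta)}
 - \frac{\varphi'(F_2)^2\,\varphi''(C_\theta)}{\varphi'(C_\theta)^3} \right) \frac{\partial F_2}{\partial \beta_2} \right] \\
&= \sum_{1} \left[ \frac{\partial \log f_2}{\partial \beta_2} - \alpha_3\,\frac{\partial F_2}{\partial \beta_2} \right],
\end{aligned}
$$
where the scalar $\alpha_3$ is given by
$$
\alpha_3 = \alpha_1 \left( \frac{\varphi''(F_2)}{\varphi'(C_\theta)}
 - \frac{\varphi'(F_2)^2\,\varphi''(C_\theta)}{\varphi'(C_\theta)^3} \right).
$$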
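The roles of α1, α2 and α3 can be checked numerically. Writing g(F1, F2) = log(1 − ϕ'(F2)/ϕ'(Cθ)) for the copula term of a selected observation, the score expressions imply ∂g/∂F1 = α1 α2 and ∂g/∂F2 = −α3. The sketch below compares these with central finite differences. It assumes the Clayton generator ϕ(t) = (t^(−θ) − 1)/θ; the value θ = 2, the evaluation point and all function names are illustrative choices, not taken from the paper.

```python
import math

THETA = 2.0  # illustrative dependence parameter (assumption)


def phi1(t, theta=THETA):
    """phi'(t) for the assumed Clayton generator phi(t) = (t**-theta - 1) / theta."""
    return -t ** (-(theta + 1.0))


def phi2(t, theta=THETA):
    """phi''(t) for the same generator."""
    return (theta + 1.0) * t ** (-(theta + 2.0))


def copula(u, v, theta=THETA):
    """Clayton copula C_theta(u, v)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def g(F1, F2):
    """Copula term of a selected observation: log(1 - phi'(F2) / phi'(C_theta))."""
    return math.log(1.0 - phi1(F2) / phi1(copula(F1, F2)))


def alphas(F1, F2):
    """The scalars alpha_1, alpha_2, alpha_3 as defined in the text."""
    C = copula(F1, F2)
    a1 = phi1(C) / (phi1(C) - phi1(F2))
    a2 = phi1(F1) * phi1(F2) * phi2(C) / phi1(C) ** 3
    a3 = a1 * (phi2(F2) / phi1(C) - phi1(F2) ** 2 * phi2(C) / phi1(C) ** 3)
    return a1, a2, a3


def central_diff(f, x, h=1e-6):
    """Central finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


F1, F2 = 0.3, 0.6  # illustrative margin values (assumption)
a1, a2, a3 = alphas(F1, F2)
dg_dF1 = central_diff(lambda x: g(x, F2), F1)
dg_dF2 = central_diff(lambda x: g(F1, x), F2)
print(a1 * a2, dg_dF1)  # the two numbers should agree closely
print(-a3, dg_dF2)      # likewise
```

Multiplying these derivatives by the margin derivatives ∂F1/∂β1, ∂F2/∂β2 or ∂F2/∂σ then yields the corresponding score contributions.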
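The component of the score due to $\sigma$ is given by
$$
\begin{aligned}
\frac{\partial}{\partial \sigma} \log L
&= \sum_{1} \left[ \frac{\partial \log f_2}{\partial \sigma}
 - \alpha_1 \left( \frac{\varphi'(C_\theta)\,\varphi''(F_2)\,\frac{\partial F_2}{\partial \sigma}
 - \varphi'(F_2)\,\varphi''(C_\theta)\,\frac{\partial C_\theta}{\partial \sigma}}{\varphi'(C_\theta)^2} \right) \right] \\
&= \sum_{1} \left[ \frac{\partial \log f_2}{\partial \sigma} - \alpha_3\,\frac{\partial F_2}{\partial \sigma} \right].
\end{aligned}
$$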
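Clearly $\alpha_1$, $\alpha_2$ and $\alpha_3$ depend on the generator $\varphi$, as well as every parameter in the model. The component
of the score due to $\theta$ has a more complicated form:
$$
\begin{aligned}
\frac{\partial}{\partial \theta} \log L
&= \sum_{1} \alpha_1 \left( \frac{\varphi'(F_2)\,\frac{\partial \varphi'(C_\theta)}{\partial \theta}
 - \varphi'(C_\theta)\,\frac{\partial \varphi'(F_2)}{\partial \theta}}{\varphi'(C_\theta)^2} \right) \\
&= \sum_{1} \alpha_1 \left( \frac{\varphi'(F_2)}{\varphi'(C_\theta)^2}\,\frac{\partial \varphi'(C_\theta)}{\partial \theta}
 - \frac{\varphi'_\theta(F_2)}{\varphi'(C_\theta)} \right),
\end{aligned}
\tag{22}
$$
where
$$
\varphi'_\theta(t) = \frac{\partial^2}{\partial t\,\partial \theta}\,\varphi(t).
$$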
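In (22), the expression for $\frac{\partial}{\partial \theta}\varphi'(C_\theta)$ can be difficult to simplify. This is because
$$
\frac{\partial}{\partial \theta}\varphi'(C_\theta)
= \left. \frac{\partial}{\partial \theta}\varphi'(C_\theta(u, v)) \right|_{u \to F_1,\; v \to F_2}
$$
depends on $\theta$ through both the function $\varphi'$ and the argument supplied to it, $C_\theta$. However, for a given family
of Archimedean copulas, such as those listed in Table 1, the form of $\frac{\partial}{\partial \theta}\varphi'(C_\theta(u, v))$ can be derived. For
example, for the Clayton family (10)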
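$$
\frac{\partial}{\partial \theta}\varphi'(C)
= C^{-(\theta+1)} \left( \frac{\theta + 1}{\theta}\,
  \frac{v^\theta \log u + u^\theta \log v}{u^\theta + v^\theta - (uv)^\theta}
  - \frac{1}{\theta} \log C \right).
$$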
The task of evaluating the differential forms $\alpha_1$, $\alpha_2$, $\alpha_3$ and $\frac{\partial}{\partial \theta}\varphi'(C_\theta(u, v))$ is straightforward if a computer
algebra system such as Mathematica is employed.
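Where a computer algebra system is not to hand, the Clayton expression can still be checked numerically. The sketch below differentiates ϕ'(Cθ(u, v)) with respect to θ by central finite differences and compares the result with the closed form. It assumes the Clayton generator ϕ(t) = (t^(−θ) − 1)/θ, so that ϕ'(t) = −t^(−(θ+1)); the function names and the evaluation point are illustrative assumptions.

```python
import math


def clayton_copula(u, v, theta):
    """Clayton copula C_theta(u, v), valid for theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def phi_prime_at_copula(u, v, theta):
    """phi'(C_theta(u, v)) with phi'(t) = -t**-(theta + 1) (assumed generator)."""
    C = clayton_copula(u, v, theta)
    return -C ** (-(theta + 1.0))


def closed_form(u, v, theta):
    """Closed-form derivative of phi'(C_theta(u, v)) with respect to theta."""
    C = clayton_copula(u, v, theta)
    frac = (v ** theta * math.log(u) + u ** theta * math.log(v)) / (
        u ** theta + v ** theta - (u * v) ** theta
    )
    return C ** (-(theta + 1.0)) * (
        (theta + 1.0) / theta * frac - math.log(C) / theta
    )


def numeric(u, v, theta, h=1e-6):
    """Central finite difference in theta; theta enters via phi' and via C_theta."""
    upper = phi_prime_at_copula(u, v, theta + h)
    lower = phi_prime_at_copula(u, v, theta - h)
    return (upper - lower) / (2.0 * h)


u, v, theta = 0.3, 0.6, 2.0  # illustrative evaluation point (assumption)
print(closed_form(u, v, theta), numeric(u, v, theta))  # should agree closely
```

The finite difference perturbs θ in both the generator derivative and the copula argument at once, which is exactly the double dependence that makes the analytic expression awkward to simplify by hand.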