Sample Selection Regression Models (Ch. 17)

Until now we have always assumed that a random sample from the underlying population is available. This assumption is not always realistic, so we now cover cases where no random sample is available.
We look at two different cases:
- the sample was collected/selected according to some value of y
- the sample is selected by the behaviour of the population under consideration (self-selection)
Selected sample: a nonrandom sample. The selection mechanism may be due to the sample design or to the behaviour of the persons being sampled (including nonresponse on survey questions and attrition from social programs).
Microeconometrics
Examples:
Saving function
Estimate a saving function for all families in a given country:
saving = β0 + β1 income + β2 age + β3 married + β4 children + u,
where age is the age of the household head.
We have data only on families whose household head is older than 45.
→ This leads to a sample selection problem, because we are interested in all families but have a random sample only for a subset of the population.
→ Selection on the basis of x
Examples
Family wealth function
Effect of a pension plan on wealth accumulation: estimate the effect of worker eligibility for a pension plan on family wealth,
wealth = β0 + β1 plan + β2 educ + β3 age + β4 income + u,
where plan is an indicator for eligibility.
(17.2) y = β0 + β1 plan + β2 x + u
The sample contains only people with wealth below 100,000.
→ Selection on the basis of y (the endogenous variable)
Wage offer function
Estimate a wage offer equation for the population of working age. However, wages are observed only for people who work.
→ y is observable only for a subsample that is defined by another variable (working)
→ Self-selection: the decision to work depends on the wage
This sample selection problem is often called incidental truncation, because the wage is missing as a result of another outcome, participation in the labour force.
When can Sample Selection Be Ignored?
Conditions under which 2SLS using the selected sample is consistent.
Population represented by vector (x, y, z)
x: 1 x K
y: 1 x 1
z: 1 x L
Population model:
(17.3) y = β1 + β 2 x2 + ... + β K xK + u
(17.4) E (u | z ) = 0
This is stronger than we need for 2SLS to be consistent!
Special case: z = x → x is exogenous
General treatment → x can be endogenous
With a random sample (17.3) can be estimated consistently with 2SLS
(if rank[E(z’x)]=K)
(17.5) E ( y | x) = β1 + β 2 x2 + ... + β K xK
No random sample → the available data follow a selection rule.
s: binary selection indicator
s = 1: observation is used
s = 0: observation is not used
Key assumption
(17.6) E (u | z, s ) = 0
(17.6) can follow directly from (17.4) in two cases:
- s is a deterministic function of z → E(u | z, s) = E(u | z). In this case selection follows a fixed rule that depends only on exogenous variables.
- Selection is independent of (z, u) → E(u | z, s) = E(u | z).
In estimating (17.3) we apply 2SLS to the observations with s = 1.
The observed sample is {(x_i, y_i, z_i, s_i) : i = 1, ..., N}. Observation i is used if s_i = 1.
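The role of the key assumption can be illustrated with a small simulation. The sketch below is not from the slides: it uses OLS (the special case z = x), an assumed data-generating process y = 1 + 2x + u, and two hypothetical selection rules, one based on x and one based on y. Selection on x should leave the slope estimate close to 2, while selection on y should not.

```python
import random

def ols_slope(xs, ys):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

random.seed(0)
N = 20000
data = []
for _ in range(N):
    x = random.gauss(0.0, 1.0)
    u = random.gauss(0.0, 1.0)
    data.append((x, 1.0 + 2.0 * x + u))   # assumed population model, true slope 2

# Selection on the exogenous variable: s = 1[x > 0], so E(u | x, s) = 0 still holds.
sel_x = [(x, y) for x, y in data if x > 0]
# Selection on the response: s = 1[y < 1], which depends on u.
sel_y = [(x, y) for x, y in data if y < 1]

slope_sel_x = ols_slope([x for x, _ in sel_x], [y for _, y in sel_x])
slope_sel_y = ols_slope([x for x, _ in sel_y], [y for _, y in sel_y])
print("slope, selection on x:", round(slope_sel_x, 3))
print("slope, selection on y:", round(slope_sel_y, 3))
```

The cutoffs 0 and 1 are arbitrary choices for illustration; any rule that depends only on x would serve for the first case.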
The 2SLS estimator with the selected sample is
\[
\hat\beta = \left[ \left( N^{-1}\sum_{i=1}^{N} s_i z_i' x_i \right)' \left( N^{-1}\sum_{i=1}^{N} s_i z_i' z_i \right)^{-1} \left( N^{-1}\sum_{i=1}^{N} s_i z_i' x_i \right) \right]^{-1}
\times \left( N^{-1}\sum_{i=1}^{N} s_i z_i' x_i \right)' \left( N^{-1}\sum_{i=1}^{N} s_i z_i' z_i \right)^{-1} \left( N^{-1}\sum_{i=1}^{N} s_i z_i' y_i \right)
\]
Substituting y_i = x_i β + u_i gives
\[
\hat\beta = \beta + \left[ \left( N^{-1}\sum_{i=1}^{N} s_i z_i' x_i \right)' \left( N^{-1}\sum_{i=1}^{N} s_i z_i' z_i \right)^{-1} \left( N^{-1}\sum_{i=1}^{N} s_i z_i' x_i \right) \right]^{-1}
\times \left( N^{-1}\sum_{i=1}^{N} s_i z_i' x_i \right)' \left( N^{-1}\sum_{i=1}^{N} s_i z_i' z_i \right)^{-1} \left( N^{-1}\sum_{i=1}^{N} s_i z_i' u_i \right)
\]
By assumption E(u_i | z_i, s_i) = 0, and so E(s_i z_i' u_i) = 0 (law of iterated expectations)
→ plim β̂ = β (by the law of large numbers)
Theorem 17.1 (Consistency of 2SLS under Sample Selection)
In model (17.3) assume that
- (17.6) E(u | z, s) = 0
- (17.8) rank E(z'z | s = 1) = L
- (17.9) rank E(z'x | s = 1) = K
Then the 2SLS estimator using the selected sample is consistent for β and asymptotically normally distributed.
Under homoskedasticity, E(u² | z, s) = σ²,
\[
\mathrm{Avar}\,\sqrt{N}(\hat\beta - \beta) = \sigma^2 \left[ E(s z'x)' \{E(s z'z)\}^{-1} E(s z'x) \right]^{-1}
\]
How to estimate σ²? With
\[
\hat\sigma^2 = \left( \sum_{i=1}^{N} s_i \right)^{-1} \sum_{i=1}^{N} s_i \hat u_i^2 \xrightarrow{p} \sigma^2
\]
(the mean of the squared residuals in the selected sample).
Why is this consistent? E[su²] = E[s]·σ². Hence
\[
\sigma^2 = \frac{E[s u^2]}{E[s]} = \frac{N \cdot E[s u^2]}{N \cdot E[s]}
\]
Example 4: Nonrandomly Missing IQ Scores
log(wage) = z1δ1 + abil + v,  E(v | z1, abil, IQ) = 0
Assume that IQ is a valid proxy for abil (that is, IQ is correlated with abil and unrelated to e conditional on z1):
abil = θ1 IQ + e,
log(wage) = z1δ1 + θ1 IQ + u,  u = v + e
Under these assumptions, E(e | z1, IQ) = 0 and therefore E(u | z1, IQ) = 0.
By Theorem 17.1, if we choose the sample by excluding all people with IQs below a fixed value, then OLS estimation of the last equation is consistent (selection on an exogenous variable).
6.2.2 Nonlinear Models
1. If E(y | x, s) = E(y | x), then selection is ignorable and NLS on the selected sample,
\[
\min_{\beta} N^{-1} \sum_{i=1}^{N} s_i \left[ y_i - m(x_i, \beta) \right]^2 ,
\]
is consistent.
Why? We use that y − m(x, β) = [y − m(x, β0)] + [m(x, β0) − m(x, β)] and that E[y − m(x, β0) | x, s] = 0. With u ≡ y − m(x, β0), the cross term vanishes and
\[
E\{ s [y - m(x,\beta)]^2 \} = E\{ s\, E([y - m(x,\beta)]^2 \mid x, s) \} = E[s u^2] + E\{ s [m(x,\beta_0) - m(x,\beta)]^2 \},
\]
so β0 in E[y | x] = m(x, β0) minimizes E{s[y − m(x, β)]²}. In the linear case, β0 is estimated by OLS of y on x, using the selected sample.
2. General conditional ML setup
If the distribution satisfies D(y | x, s) = D(y | x), then selection again is ignorable. This assumption holds if s is a nonrandom function of x or if s is independent of (x, y).
In this case, MLE on the selected sample,
\[
\max_{\theta \in \Theta} N^{-1} \sum_{i=1}^{N} s_i\, l(y_i, x_i, \theta),
\]
is consistent, because for each x, θ0 maximizes E[l(y, x, θ) | x] over Θ. Now write
\[
E[s\, l(y,x,\theta)] = E\{ s\, E[l(y,x,\theta) \mid x, s] \} = E\{ s\, E[l(y,x,\theta) \mid x] \}
\]
Because, for every x, E[l(y, x, θ) | x] is maximized at θ0, it must also be the case that E{s E[l(y, x, θ) | x]} is maximized at θ0.
Truncated Regression (Selection on Response Variable)
(x_i, y_i) is a random draw from the population; we want to estimate E(y_i | x_i) in this population.
But we observe only a sample selected on the value of y.
Examples: wealth, wage in specific samples, ...
y_i is a continuous variable.
The selection rule is s_i = 1[a1 < y_i < a2], where a1 and a2 are known.
We observe (x_i, y_i) if a1 < y_i < a2; otherwise we observe neither y nor x.
In most cases we want to estimate E(y_i | x_i) = x_i β.
→ We need a specification of the full conditional distribution of y_i | x_i.
Specify the conditional density of y_i | x_i as
f(· | x_i; β, γ),
where γ are additional parameters, e.g. the variance. The cdf of y_i | x_i is F(· | x_i; β, γ).
In estimating E(y_i | x_i) we must condition on a1 < y_i < a2, i.e. s_i = 1. The selection rule says that if y_i falls in (a1, a2), then both y_i and x_i are observed; if y_i is outside this interval, we observe neither. In estimation, we therefore use the density of y_i conditional on x_i and on the fact that we observe (y_i, x_i).
The cdf of y_i | x_i, s_i = 1 is
\[
P(y_i \le c \mid x_i, s_i = 1) = \frac{P(y_i \le c,\; s_i = 1 \mid x_i)}{P(s_i = 1 \mid x_i)}
\]
The denominator is
P(s_i = 1 | x_i) = P(a1 < y_i < a2 | x_i) = F(a2 | x_i; β, γ) − F(a1 | x_i; β, γ)
If y is truncated from one side only, then either a1 = −∞ or a2 = ∞.
To obtain the numerator we write, for a1 < c < a2,
P(y_i ≤ c, s_i = 1 | x_i) = P(a1 < y_i ≤ c | x_i) = F(c | x_i; β, γ) − F(a1 | x_i; β, γ)
Plugging this into the cdf of y_i | x_i, s_i = 1 and taking the derivative with respect to c gives the density of y_i given (x_i, s_i = 1):
\[
(17.14)\qquad p(c \mid x_i, s_i = 1) = \frac{f(c \mid x_i; \beta, \gamma)}{F(a_2 \mid x_i; \beta, \gamma) - F(a_1 \mid x_i; \beta, \gamma)}, \qquad a_1 < c < a_2
\]
(17.14) is valid irrespective of the specific distributional assumption.
Usually we assume a normal distribution for f.
Assume further that E(y | x) = xβ.
Then we have
\[
f(y_i \mid x_i, s_i = 1) = \frac{\dfrac{1}{\sigma}\,\phi\!\left( \dfrac{y_i - x_i\beta}{\sigma} \right)}{\Phi\!\left( \dfrac{a_2 - x_i\beta}{\sigma} \right) - \Phi\!\left( \dfrac{a_1 - x_i\beta}{\sigma} \right)}
\]
In many cases a1 = 0 and a2 = ∞.
The CMLEs of β and γ using the selected sample are efficient in the class of estimators that do not use information about the distribution of x.
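As a minimal sketch of how the truncated normal density enters a likelihood, the functions below evaluate it and the corresponding log-likelihood using only the Python standard library. The function names and the numerical values (xβ = 1, σ = 2, truncation to (0, 5)) are illustrative assumptions, not part of the slides. The final lines check numerically that the density integrates to one over (a1, a2).

```python
import math

def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def truncnorm_density(y, xb, sigma, a1=0.0, a2=math.inf):
    """Density of y given x and a1 < y < a2 when y | x ~ N(xb, sigma^2)."""
    if not (a1 < y < a2):
        return 0.0
    num = norm_pdf((y - xb) / sigma) / sigma
    den = norm_cdf((a2 - xb) / sigma) - norm_cdf((a1 - xb) / sigma)
    return num / den

def truncnorm_loglik(ys, xbs, sigma, a1=0.0, a2=math.inf):
    """Sum of log densities over the observations in the truncated sample."""
    return sum(math.log(truncnorm_density(y, xb, sigma, a1, a2))
               for y, xb in zip(ys, xbs))

# Midpoint-rule check that the density integrates to one over (a1, a2).
a1, a2, xb, sigma = 0.0, 5.0, 1.0, 2.0
steps = 2000
width = (a2 - a1) / steps
integral = sum(truncnorm_density(a1 + (k + 0.5) * width, xb, sigma, a1, a2)
               for k in range(steps)) * width
print(round(integral, 6))
```

A truncated regression estimator would maximize `truncnorm_loglik` over β and σ; the optimization step is left out here.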
6.3 Selection on Basis of the Response Variable: Truncated Regression
In most applications with truncated samples, the population conditional distribution is assumed to be N(xβ, σ²): the truncated Tobit model or truncated normal regression model.
The truncated Tobit model is related to the censored Tobit model for data-censoring applications (see Chapter 5). The key difference between censored and truncated regression is that in censored regression we observe x for all people, even if y is not known.
Heteroskedasticity or nonnormality in truncated regression results in inconsistent estimators of β.
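To see why OLS on a truncated sample is inconsistent, it helps to write out the conditional mean in the truncated sample for the common case a1 = 0, a2 = ∞. The expression below is the standard truncated-normal result under the normality assumption above; it is added here as a sketch and is not taken from the slides.

```latex
E(y \mid x,\, y > 0) = x\beta + \sigma\,\lambda\!\left(\frac{x\beta}{\sigma}\right),
\qquad \lambda(c) \equiv \frac{\phi(c)}{\Phi(c)}
```

Since λ(·) is positive and decreasing, the mean in the truncated sample lies above xβ and responds less strongly to x than xβ does, so a linear regression of y on x in the truncated sample does not recover β.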
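Example:
* simulate y = 1 + x + u with u ~ N(0, 4) and keep only observations with y > 0
set obs 10000
g x = uniform()
g u = invnorm(uniform())
replace u = 2*u
g y = 1 + x + u
drop if y <= 0
* OLS on the truncated sample (inconsistent)
reg y x
outreg using d:\stata\micro\out\trunc_sim, replace
* truncated normal regression with lower truncation point 0
truncreg y x, ll(0)
outreg using d:\stata\micro\out\trunc_sim, append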
                 OLS          TRUNCREG
x                0.536        0.951
                 (9.11)**     (9.00)**
Constant         2.011        1.047
                 (57.84)**    (13.24)**
σ                             1.97
                              (64.5)**
Observations     7708         7708
Absolute value of t statistics in parentheses
* significant at 5%; ** significant at 1%
17.4 Probit Selection
Sample selection is not result of sample design but due to decisions made by
members of the population (self selection)
Exogenous explanatory variable
Classic example: Labour force participation and wages
We want to know: E ( wi | xi ) for a person randomly drawn from the
population (w : wage)
w is only observed for working people.
Model of labour supply:
(17.15) max over h of U_i(w_i h + a_i, h), subject to 0 ≤ h ≤ 168
h: hours of work per week
a_i: non-labour income
Let s_i(h) ≡ U_i(w_i h + a_i, h) and assume h < 168 (the solution h = 168 is ruled out).
Possible solutions: h = 0 or 0 < h < 168
If ds_i/dh ≤ 0 at h = 0 ⇒ h = 0
This implies that h = 0 if
(17.16) w_i ≤ −mu_i^h(a_i, 0) / mu_i^q(a_i, 0)
where mu^h is the marginal disutility of work and mu^q is the marginal utility of income. The right-hand side of (17.16) is called the reservation wage w_i^r.
Parametric assumptions:
(17.17) w_i = exp(x_i1 β1 + u_i1),  w_i^r = exp(x_i2 β2 + γ2 a_i + u_i2)
(u_i1, u_i2) is independent of (x_i1, x_i2, a_i). x_i1 contains productivity characteristics and x_i2 contains characteristics that determine the marginal utility of leisure and income (there may be an overlap).
(17.18) log w_i = x_i1 β1 + u_i1
But the wage is observed only if w_i > w_i^r, i.e.
log w_i − log w_i^r = x_i1 β1 − x_i2 β2 − γ2 a_i + u_i1 − u_i2 ≡ x_i δ2 + v_i2 > 0
Problem: w^r is not observed and depends on x_i2 and u_i2
→ the truncation threshold w^r is not a known constant → we need another estimation procedure
Notation: drop the subscript i; y1 ≡ log w and y2 is a binary indicator (for working)
(17.19) y1 = x1 β1 + u1
(17.20) y2 = 1[xδ2 + v2 > 0]
(17.20) is a probit if v2 is normally distributed.
Assumptions 17.1:
(a) (x, y2) are always observed, y1 is observed only if y2 = 1
(b) (u1, v2) is independent of x with zero mean
(c) v2 ~ N(0,1)
(d) E(u1 | v2) = γ1 v2
(a) describes the selection process;
(b) is a strong exogeneity assumption;
(c) is necessary to derive a conditional expectation given the selected sample;
(d) requires linearity of the regression of u1 on v2.
(d) always holds if (u1, v2) is bivariate normal (but it is not necessary to assume that u1 is normally distributed).
Estimation of Selection Model
Let (y1, y2, x, u1, v2) denote a random draw from the population. Given the selection rule, we can hope to estimate
E(y1 | x, y2 = 1) and P(y2 = 1 | x)
How does E(y1 | x, y2 = 1) depend on β1?
First, note that
(17.21) E(y1 | x, v2) = x1 β1 + E(u1 | x, v2) = x1 β1 + E(u1 | v2) = x1 β1 + γ1 v2
where the second equality follows because (u1, v2) is independent of x.
If γ1 = 0 → no selection problem!
What if γ1 ≠ 0? Using iterated expectations on (17.21) gives
E(y1 | x, y2) = x1 β1 + γ1 E(v2 | x, y2) = x1 β1 + γ1 h(x, y2)
where h(x, y2) = E(v2 | x, y2).
If we knew h(x, y2), we could estimate β1 and γ1 from the regression of y1 on x1 and h(x, y2) (in the selected sample).
In the selected sample y2 = 1 → we only have to find h(x, 1):
h(x, 1) = E(v2 | v2 > −xδ2) = λ(xδ2), where λ(·) = φ(·)/Φ(·)
This follows from a special property of the normal distribution:
If z ~ N(0,1), then E(z | z > c) = φ(c) / [1 − Φ(c)].
Setting c = −xδ2 and using the symmetry of the normal distribution, φ(−a) = φ(a) and 1 − Φ(−a) = Φ(a), gives E(v2 | v2 > −xδ2) = φ(xδ2)/Φ(xδ2).
The term λ(·) = φ(·)/Φ(·) is called the inverse Mills ratio.
This implies
(17.22) E(y1 | x, y2 = 1) = x1 β1 + γ1 λ(xδ2)
From (17.22) it is obvious that OLS of y1 on x1 in the selected sample omits the term λ(xδ2) → omitted variable bias
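The inverse Mills ratio is easy to compute directly. The sketch below (function names are illustrative, not from the slides) evaluates λ(·) with the standard library and checks the truncated-mean property E(z | z > c) = λ(−c) by numerical integration.

```python
import math

def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def inverse_mills(t):
    """lambda(t) = phi(t) / Phi(t)."""
    return norm_pdf(t) / norm_cdf(t)

def mean_above(c, upper=10.0, steps=20000):
    """E(z | z > c) for z ~ N(0,1), by midpoint-rule integration on (c, upper)."""
    width = (upper - c) / steps
    pts = [c + (k + 0.5) * width for k in range(steps)]
    num = sum(p * norm_pdf(p) for p in pts) * width
    den = sum(norm_pdf(p) for p in pts) * width
    return num / den

# E(z | z > c) should agree with phi(c) / (1 - Phi(c)) = lambda(-c).
for c in (-1.0, 0.0, 0.5):
    print(c, round(mean_above(c), 4), round(inverse_mills(-c), 4))
```

The upper integration limit of 10 is an approximation to infinity; the normal density beyond that point is negligible.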
(17.22) also shows a way to estimate β1 consistently.
Heckman (1979) showed that β1 and γ1 can be estimated consistently in the selected sample by regressing y1 on x1 and λ(xδ2).
But δ2 is unknown and must be estimated in a first step (using probit).
Heckman Estimator
Step 1: Estimate the probit model
(17.23) P(y_i2 = 1 | x_i) = Φ(x_i δ2)
using all observations. Obtain λ̂_i2 ≡ λ(x_i δ̂2).
Step 2: Estimate β1 and γ1 by OLS in the selected sample:
(17.24) y_i1 = x_i1 β1 + γ1 λ̂_i2 + u_i
This estimator is consistent and asymptotically normally distributed.
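The two steps can be sketched in plain Python. This is an illustration under assumed values, not the implementation behind Stata's heckman: the data-generating process (β1 = (1, 1), δ2 = (0, 0.5, 1), γ1 = 1), the helper names, and the use of Fisher scoring for the probit are all choices made here. Standard errors are not computed, so the correction for the estimated first step is not shown.

```python
import math
import random

def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    return solve(xtx, xty)

def probit(rows, d, iters=15):
    """Probit MLE by Fisher scoring, starting from zero."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for r, di in zip(rows, d):
            xb = sum(a * b for a, b in zip(r, beta))
            p = min(max(norm_cdf(xb), 1e-10), 1.0 - 1e-10)
            f = norm_pdf(xb)
            g = f * (di - p) / (p * (1.0 - p))   # score contribution
            w = f * f / (p * (1.0 - p))          # expected information weight
            for a in range(k):
                grad[a] += g * r[a]
                for b in range(k):
                    info[a][b] += w * r[a] * r[b]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta

random.seed(1)
N = 5000
X, Z, Y1, Y2 = [], [], [], []
for _ in range(N):
    x, z, v2 = random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)
    u1 = 1.0 * v2 + random.gauss(0, 1)               # E(u1 | v2) = gamma1 * v2, gamma1 = 1
    X.append(x)
    Z.append(z)
    Y1.append(1.0 + 1.0 * x + u1)                     # outcome equation (17.19)
    Y2.append(1 if 0.5 * x + 1.0 * z + v2 > 0 else 0) # selection equation (17.20)

# Step 1: probit of y2 on (1, x, z) using all observations.
delta_hat = probit([[1.0, x, z] for x, z in zip(X, Z)], Y2)

# Step 2: OLS of y1 on (1, x, lambda_hat) using the selected sample only.
rows_heckit, rows_ols, y_sel = [], [], []
for x, z, y1, y2 in zip(X, Z, Y1, Y2):
    if y2 == 1:
        idx = delta_hat[0] + delta_hat[1] * x + delta_hat[2] * z
        rows_heckit.append([1.0, x, norm_pdf(idx) / norm_cdf(idx)])
        rows_ols.append([1.0, x])
        y_sel.append(y1)
b_ols = ols(rows_ols, y_sel)
b_heckit = ols(rows_heckit, y_sel)
print("OLS slope on x:     ", round(b_ols[1], 3))
print("two-step slope on x:", round(b_heckit[1], 3))
print("coefficient on lambda:", round(b_heckit[2], 3))
```

Here z enters the selection equation but not the outcome equation, so the sketch includes an exclusion restriction.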
Simple test for selection bias:
Under H0 (no selection bias), γ1 = 0 in (17.24) → t test for γ1.
IMPORTANT: this test is only valid if the model is correctly specified
(distributional assumptions)
If γ1 ≠ 0 the standard errors of β1 must be corrected
- for heteroskedasticity
- because δ2 has been estimated in the first step
Stata does this for you if you use the command heckman
Theoretically, it is not necessary that x1 is a strict subset of x
→ β1 is identified even if x = x1 (because λ is a nonlinear function of x)
However, in practice λ is often almost a linear function of x
→ severe multicollinearity → very imprecise estimates
⇒ Strong recommendation: you should have at least one element in x that is not in x1 (exclusion restriction)
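How close λ is to a linear function can be checked numerically. The sketch below computes the correlation between an index value and λ(index) over a grid; the range [−1, 1] is a hypothetical choice for the values that xδ2 takes in a sample, not a figure from the slides.

```python
import math

def inverse_mills(t):
    pdf = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return pdf / cdf

def correlation(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

index = [-1.0 + 0.01 * k for k in range(201)]   # assumed range of the index x*delta2
lam = [inverse_mills(t) for t in index]
corr = correlation(index, lam)
print(round(corr, 4))
```

A correlation very close to −1 means that, without an excluded variable, λ̂ is nearly collinear with the regressors already in x1.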
[Figure: Relation between xβ and λ. Horizontal axis: xb, from -4 to 4; vertical axis: lambda, from 0 to 3.]
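#delimit ;
use d:\stata\micro\data\mroz;
* OLS wage equation on the working sample;
reg lwage educ exper expersq;
* Heckman two-step with exclusion restrictions in the selection equation;
heckman lwage educ exper expersq, select (inlf = educ exper expersq
age kidslt6 kidsge6 nwifeinc) twostep;
* Heckman two-step without exclusion restrictions;
heckman lwage educ exper expersq , select (inlf = educ exper expersq
) twostep;
#delimit cr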
Table 17.1
Wage equation
                 OLS          Heckman 2 step   Heckman 2 step, no excl. restr.
educ             0.107        0.109            0.093
                 (7.60)**     (7.03)**         (1.82)
exper            0.042        0.044            0.021
                 (3.15)**     (2.70)**         (0.28)
expersq          -0.001       -0.001           -0.000
                 (2.06)*      (1.96)           (0.27)
mills:lambda                  0.032            -0.270
                              (0.24)           (0.28)
Constant         -0.522       -0.578           -0.010
                 (2.63)**     (1.90)           (0.01)
Selection equation
                 OLS          Heckman 2 step   Heckman 2 step, no excl. restr.
inlf:educ                     0.131            0.097
                              (5.18)**         (4.38)**
inlf:exper                    0.123            0.127
                              (6.59)**         (7.12)**
inlf:expersq                  -0.002           -0.002
                              (3.15)**         (4.12)**
inlf:age                      -0.053
                              (6.23)**
inlf:kidslt6                  -0.868
                              (7.33)**
inlf:kidsge6                  0.036
                              (0.83)
inlf:nwifeinc                 -0.012
                              (2.48)*
inlf:Constant                 0.270            -1.925
                              (0.53)           (6.67)**
lambda                        .032             -.270
                              (0.24)           (-0.28)
sigma                         .663             .691
Observations     428          753              753
R-squared        0.16
Absolute value of t statistics in parentheses
* significant at 5%; ** significant at 1%
Data generation for selection problem
set obs 10000
g x = uniform()
g z = uniform()
matrix c = (4, 1 \ 1, 1)          /* covariance matrix of (u1, v2) */
drawnorm u1 v2, n(10000) cov(c)   /* correlated error terms */
g y1 = 1 + x + u1
g y2star = 0.5 + 0.5*x + 0.5*z + v2
g y2 = y2star>0.6
replace y1 = . if y2==0           /* y1 is observed only if y2 = 1 */
reg y1 x
heckman y1 x, select (y2= x z) twostep
heckman y1 x, select (y2= x z)
heckman y1 x, select (y2= x) twostep
heckman y1 x z, select (y2= x z) twostep
Simulation results
                 OLS          Heckman      Heckman      2 step, no excl. restr.   2 step, no excl. restr.
                              2 step       ML           (selection on x only)     (z in both equations)
Structural equation
x                0.758        1.206        1.115        -0.258                    -1.432
                 (9.48)**     (8.96)**     (11.94)**    (0.08)                    (0.48)
z                                                                                 -2.264
                                                                                  (0.88)
lambda                        1.611                     -3.585                    -7.603
                              (4.44)**                  (0.32)                    (0.73)
Constant         1.683        0.557        0.793        4.216                     8.219
                 (35.23)**    (2.15)*      (7.54)**     (0.53)                    (0.95)
Selection equation
x                             0.535        0.525        0.526                     0.535
                              (11.83)**    (11.65)**    (11.67)**                 (11.83)**
z                             0.457        0.474                                  0.457
                              (10.12)**    (11.23)**                              (10.12)**
Constant                      -0.093       -0.097       0.137                     -0.093
                              (2.73)**     (2.93)**     (5.36)**                  (2.73)**
rho                                        0.712
                                           (8.93)**
Absolute value of t statistics in parentheses
* significant at 5%; ** significant at 1%
Predictions after estimation of selection models
Often selection models are used to predict the dependent variable for the
observations not in the selected subsample
Example: expected wage of nonworkers
Correct prediction:
E(y_i1 | x_i) = x_i1 β̂1
and NOT
E(y_i1 | x_i, y_i2 = 1) = x_i1 β̂1 + γ̂1 λ̂_i2 ≠ E(y_i1 | x_i)
Stata
heckman lnlohn .../*selection model for ln(wage)*/
predict lnlohn_pred, e(.,.) /* prediction of ln(wage)*/
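For completeness, the conditional mean for the observations outside the selected sample can be worked out in the same way as (17.22). The expression below is a sketch that follows from Assumptions 17.1 and the truncated-normal property E(z | z ≤ c) = −φ(c)/Φ(c); it is not stated on the slides.

```latex
E(y_1 \mid x,\, y_2 = 0) = x_1\beta_1 + \gamma_1\, E(v_2 \mid v_2 \le -x\delta_2)
= x_1\beta_1 - \gamma_1\, \frac{\phi(x\delta_2)}{1 - \Phi(x\delta_2)}
```

This differs from both x1β1 and (17.22), which is why it matters whether a prediction for nonworkers is meant as the wage offer E(y1 | x) or as a mean conditional on not working.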
Joint ML estimator
If (c) and (d) in Assumption 17.1 are replaced by the stronger assumption that (u1, v2) is bivariate normal with mean 0, Var(u1) = σ1², Cov(u1, v2) = σ12, and Var(v2) = 1, then partial likelihood estimation can be used. Partial MLE is more efficient than the two-step procedure. It uses the density of y1 when y2 = 1:
\[
f(y_1 \mid y_2 = 1, x) = P(y_2 = 1 \mid y_1, x)\, f(y_1 \mid x) / P(y_2 = 1 \mid x),
\]
with
\[
P(y_2 = 1 \mid y_1, x) = \Phi\left\{ \left[ x\delta_2 + \sigma_{12}\sigma_1^{-2}(y_1 - x_1\beta_1) \right] \left( 1 - \sigma_{12}^2 \sigma_1^{-2} \right)^{-1/2} \right\}
\]
The log-likelihood for observation i is:
\[
l_i(\theta) = (1 - y_{i2}) \log\left[ 1 - \Phi(x_i\delta_2) \right]
+ y_{i2} \left( \log \Phi\left\{ \left[ x_i\delta_2 + \sigma_{12}\sigma_1^{-2}(y_{i1} - x_{i1}\beta_1) \right] \left( 1 - \sigma_{12}^2\sigma_1^{-2} \right)^{-1/2} \right\}
+ \log \phi\left[ (y_{i1} - x_{i1}\beta_1)/\sigma_1 \right] - \log(\sigma_1) \right)
\]
Endogenous Explanatory Variables
One element of x1 correlated with u1
(17.25)
y1 = z1δ 1 + α1 y2 + u1
(17.26)
y2 = zδ 2 + v2
(17.27)
y3 = 1[ zδ 3 + v3 > 0]
(17.25) is the structural equation to be estimated,
(17.26) is linear projection of the endogenous variable y2 (i.e. not structural)
(17.27) is the selection equation.
The correlations between u1, v2, v3 are unrestricted.
Three interesting cases:
- y2 is always observed, but endogenous in (17.25) (e.g. education in a wage equation)
- y2 is also observed only if y3 = 1. In this case y2 can be exogenous in the population, but it becomes endogenous due to selection.
- y1 is always observed, but y2 only sometimes
If y1 and y2 were always observed along with z, we would estimate (17.25) by 2SLS if y2 is endogenous.
Under selection, 2SLS with the inverse Mills ratio added to the regressors is consistent (using only the selected sample in the second step).
Assumptions 17.2:
(a) (z,y3) are always observed, (y1, y2) are only observed if y3 = 1
(b) (u1,v3) are independent of z with mean 0
(c) v3 ~ N (0,1)
(d) E (u1 | v3 ) = γ 1v3
(e) E(z'v2) = 0 and, writing zδ2 = z1δ21 + z2δ22, δ22 ≠ 0
Parts (b), (c), and (d) are identical to Assumptions 17.1. Assumption (e) is new; it corresponds to the usual assumptions needed for identification in 2SLS.
Derivation of estimating equation
Write
(17.28)
y1 = z1δ 1 + α1 y2 + g ( z, y3 ) + e1
with g(z, y3) ≡ E(u1 | z, y3) and e1 ≡ u1 − E(u1 | z, y3). Thus E(e1 | z, y3) = 0.
Note that cov(g, e1) = 0.
If we knew g(z, y3), we could estimate (17.28) by 2SLS in the selected sample, with instruments (z, g(z, 1)).
We know g(z, 1) up to some parameters:
E(u1 | z, y3 = 1) = γ1 λ(zδ3)
δ3 can be estimated consistently by probit → two-step procedure
Step 1:
Estimate δ3 by probit of y3 on z using all observations and calculate λ̂_i3 = λ(z_i δ̂3).
Step 2:
Estimate in the selected sample
(17.29) y_i1 = z_i1 δ1 + α1 y_i2 + γ1 λ̂_i3 + e_i
by 2SLS with instruments (z_i, λ̂_i3).
This procedure applies to any kind of endogenous variable y2, including discrete variables (because the reduced form (17.26) for y2 is a linear projection without distributional assumptions).
z2 must have predictive power in the regression of y2 on z1, z2, λ(z_i δ̂3).
Two exclusion restrictions are needed (otherwise identification rests on functional form).
The hypothesis of no selection bias can be tested with the t statistic on λ̂_i3.
Example:
wage offer equation with education being endogenous
IV for education: mother’s and father’s education
IV for selection: number and age of children, non-labour income
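How to do it with Stata
Possible endogeneity of education in the wage equation of married women.
Instruments: education of parents and husband
#delimit ;
use d:\stata\micro\data\mroz;
* Step 1: probit for labour force participation using all observations;
probit inlf exper expersq age kidslt6 kidsge6 nwifeinc motheduc
fatheduc huseduc;
predict xb, xb;
* inverse Mills ratio (older Stata releases call these functions normden and norm);
g lambda = normalden(xb)/normal(xb);
* Step 2: 2SLS on the selected sample with lambda added as a regressor;
ivreg lwage exper expersq lambda (educ=motheduc fatheduc huseduc
kidslt6 kidsge6 nwifeinc ) if inlf==1;
#delimit cr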
6.4.3 Binary Response Models with Sample Selection
Assume that the latent errors are bivariate normal and independent of the regressors:
y1 = 1[x1β1 + u1 > 0]
y2 = 1[xδ2 + v2 > 0]
y1 is observed only when y2 = 1; x is always observed.
Example: y1 is an employment indicator, and x contains a job training indicator. We can lose track of some people who are eligible to participate in the program; this is an example of sample attrition. If attrition is systematically related to u1, then estimation using the selected sample leads to an inconsistent estimator of β1.
If we assume that (u1, v2) is independent of x with a zero-mean normal distribution and unit variances, we can apply partial MLE using the density of y1 conditional on x and y2 = 1.
A two-step procedure can also be applied:
(1) estimate δ2 by probit of y2 on x;
(2) estimate β1 and ρ1 (the correlation between u1 and v2), along with P(y1 = 0 | x, y2 = 1).
6.5 A Tobit Selection Equation
6.5.1 Exogenous Explanatory Variables
The selection equation is a censored Tobit equation. The population model is:
y1 = x1β1 + u1
y2 = max(0, xδ2 + v2)
where (x, y2) is always observed, but y1 is observed only when y2 > 0.
Example: y1 = log(wage) and y2 = log(hours)
Assumption 3 (Type III Tobit model):
(a) (x, y2) is always observed, but y1 is observed only when y2 > 0
(b) (u1, v2) is independent of x
(c) v2 ~ N(0, τ2²) (here we do not have to normalize the variance)
(d) E(u1 | v2) = γ1 v2
Estimating equation: E(y1 | x, v2, s2) = x1β1 + γ1 v2, with s2 ≡ 1[y2 > 0]
If we knew v2, we could estimate this equation.
For observations with s2 = 1 we can estimate v2 = y2 − xδ2. This was not possible in the probit selection model.
Procedure 3:
(a) Obtain δ̂2, the standard Tobit estimate of the selection equation, using all N observations; then compute v̂_i2 = y_i2 − x_i δ̂2 for those observations with y_i2 > 0.
(b) Obtain the estimators β̂1 and γ̂1 from the OLS regression of y_i1 on x_i1 and v̂_i2 using the selected sample y_i2 > 0.
The estimators are consistent and √N-asymptotically normal.
No instrument (exclusion restriction) is needed, because variation in y2 produces variation in v2.
For partial likelihood estimation, assume that (u1, v2) is jointly normal such that Var(u1) = σ1², Cov(u1, v2) = σ12 and Var(v2) = τ2².
The density f(y2 | x) is used for the entire sample, and the conditional density f(y1 | x, y2) for the selected sample. The log-likelihood for observation i is:
\[
l_i(\theta) = s_{i2} \log f(y_{i1} \mid x_i, y_{i2}; \theta) + \log f(y_{i2} \mid x_i; \delta_2, \tau_2^2), \qquad s_{i2} = 1[y_{i2} > 0]
\]
where f(y_i1 | x_i, y_i2; θ) is the Normal[x_i1β1 + γ1(y_i2 − x_iδ2), η1²] density with η1² ≡ σ1² − σ12²/τ2², and f(y_i2 | x_i; δ2, τ2²) is the standard censored Tobit density (see chapter 5):
\[
f(y \mid x_i) = \left\{ 1 - \Phi\!\left( \frac{x_i\beta}{\sigma} \right) \right\}^{1[y = 0]} \left\{ \frac{1}{\sigma}\, \phi\!\left[ \frac{y - x_i\beta}{\sigma} \right] \right\}^{1[y > 0]}
\]
6.5.2 Endogenous Explanatory Variables (in Tobit model)
The model in the population is:
y1 = z1δ1 + α1 y2 + u1    (4)
y2 = zδ2 + v2    (5)
y3 = max(0, zδ3 + v3)    (6)
Assumption 4:
(a) (z, y3) is always observed, (y1, y2) is observed when y3 > 0
(b) (u1, v3) is independent of z
(c) v3 ~ N(0, τ3²)
(d) E(u1 | v3) = γ1 v3
(e) E(z'v2) = 0 and, writing zδ2 = z1δ21 + z2δ22, δ22 ≠ 0
We need only one instrument (for selection equation)
Write
y1 = z1δ1 + α1 y2 + γ1 v3 + e1,
where e1 = u1 − E[u1 | v3].
Procedure 4:
(a) Obtain δ̂3 from Tobit of y3 on z using all observations (eq. (6)). Obtain the Tobit residuals v̂_i3 = y_i3 − z_i δ̂3 for y_i3 > 0.
(b) Using the selected subsample (for which y1 and y2 are observed), estimate the equation
y_i1 = z_i1 δ1 + α1 y_i2 + γ1 v̂_i3 + error_i
by 2SLS using instruments (z_i, v̂_i3).
6.6 Estimating Structural Tobit Equations with Sample Selection
Structural labor supply model involving simultaneity and sample selection
y1 ≡ log( w0 ) = z1β1 + u1
y2 ≡ h = max ( 0, z 2 β 2 + α 2 y1 + u2 )
Reduced form: substitute equation 1 into equation 2.
What is different from the previous analysis? Now we are interested in α2.
Assumption 5:
(a) (z, y2) is always observed, y1 is observed when y2 > 0
(b) (u1, u2) is independent of z with a zero-mean bivariate normal distribution
(c) z1 contains at least one element with a nonzero coefficient that is not in z2
⇒ i.e. we need an instrument for y1 (an element of z1 that is excluded from z2)
Assumption (c) is needed to identify α2 and β2, whereas β1 is always identified.
Estimating the second equation requires new methods, whether or not y1 and u2 are uncorrelated, because y1 is not observed when y2 = 0. Estimation of (α2, β2) is easy once β1 has been estimated.
Procedure 5:
(a) Use Procedure 3 to obtain β̂1.
(b) Obtain β̂2 and α̂2 from the Tobit
y_i2 = max(0, z_i2 β2 + α2 (z_i1 β̂1) + error_i)
6.7 Sample Selection and Attrition in Linear Panel Models
Unbalanced panel: time periods are missing for some persons because of a rotating panel, attrition, or an incidental truncation problem.
6.7.1 Fixed Effects Estimation with Unbalanced Panels
Model:
y_it = x_it β + c_i + u_it, t = 1, ..., T
where x_it is 1×K and β is a K×1 vector. For a random draw i from the population, let s_i ≡ (s_i1, s_i2, ..., s_iT)' be the T×1 vector of selection indicators: s_it = 1 if (x_it, y_it) is observed. T_i ≡ Σ_t s_it is the number of periods observed for unit i.
We have a random sample from the population: {(x_i, y_i, s_i) : i = 1, 2, ..., N}.
Fixed effects estimator:
\[
\hat\beta = \beta + \left( N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} s_{it}\, \ddot{x}_{it}' \ddot{x}_{it} \right)^{-1} \left( N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} s_{it}\, \ddot{x}_{it}' u_{it} \right),
\]
where ẍ_it is x_it demeaned over the periods observed for unit i.
Assumption 6:
(a) E(u_it | x_i, s_i, c_i) = 0, t = 1, 2, ..., T
(b) Σ_{t=1}^{T} E(s_it ẍ_it' ẍ_it) is nonsingular
(c) E(u_i u_i' | s_i, x_i, c_i) = σ_u² I_T
Under Assumption 6, the FE estimator on the unbalanced panel is consistent and asymptotically normal (T fixed and large N), with
\[
\widehat{\mathrm{Avar}}(\hat\beta) = \hat\sigma_u^2 \left( \sum_{i=1}^{N} \sum_{t=1}^{T} s_{it}\, \ddot{x}_{it}' \ddot{x}_{it} \right)^{-1}
\]
and
\[
\hat\sigma_u^2 = \left[ \sum_{i=1}^{N} (T_i - 1) \right]^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} s_{it}\, \hat u_{it}^2 \xrightarrow{p} \sigma_u^2 \quad \text{as } N \to \infty
\]
6.7.2 Testing and Correcting for Sample Selection Bias
Model: y_it1 = x_it1 β1 + c_i1 + u_it1, t = 1, ..., T
Selection equation: s_it2 = 1[x_i ψ_t2 + v_it2 > 0], v_it2 | x_i ~ N(0,1), where x_i contains unity
(note: no fixed effect in the selection equation!)
Under the null of Assumption 6 (a), the inverse Mills ratio λ̂_it2 should not be significant in the equation estimated by fixed effects. A valid test of the null is therefore a t statistic on λ̂_it2 in the FE estimation on the unbalanced panel. Under Assumption 6 (c), the usual t statistic is valid.
Correcting for sample selection: adding λ̂_it2 to the equation and using FE does not produce consistent estimators (if there is a fixed effect in the selection equation).
Chamberlain's approach to panel data models works, but we need some linearity assumptions.
Assumption 7:
(a) the selection equation is as given above;
(b) E(u_it1 | x_i, v_it2) = E(u_it1 | v_it2) = ρ_t1 v_it2, t = 1, ..., T;
(c) E(c_i1 | x_i, v_it2) = x_i π1 + φ_t1 v_it2.
Under Assumption 7, E(y_it1 | x_i, s_it2 = 1) = x_it1 β1 + x_i π1 + γ_t1 λ(x_i ψ_t2)
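The step from Assumption 7 to this conditional mean can be spelled out as follows. This is a sketch that uses only parts (b) and (c) and the truncated-normal property from the probit selection section; the definition γ_t1 ≡ ρ_t1 + φ_t1 is what the two linearity assumptions imply and is not stated explicitly on the slides. Here φ_t1 is a coefficient, not the normal density.

```latex
\begin{aligned}
E(y_{it1} \mid x_i, v_{it2}) &= x_{it1}\beta_1 + E(c_{i1} \mid x_i, v_{it2}) + E(u_{it1} \mid x_i, v_{it2}) \\
&= x_{it1}\beta_1 + x_i\pi_1 + (\phi_{t1} + \rho_{t1})\, v_{it2}, \\
E(y_{it1} \mid x_i, s_{it2} = 1) &= x_{it1}\beta_1 + x_i\pi_1 + \gamma_{t1}\, E(v_{it2} \mid x_i,\, v_{it2} > -x_i\psi_{t2}) \\
&= x_{it1}\beta_1 + x_i\pi_1 + \gamma_{t1}\, \lambda(x_i\psi_{t2}), \qquad \gamma_{t1} \equiv \rho_{t1} + \phi_{t1}.
\end{aligned}
```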
We can consistently estimate β1 as follows:
1) Estimate a probit of s_it2 on x_i for each t and compute the inverse Mills ratio λ̂_it2 for all i and t.
2) Run the pooled OLS regression, using the selected sample (all observations with s_it2 = 1), of y_it1 on
x_it1, x_i, λ̂_it2, d2_t λ̂_it2, ..., dT_t λ̂_it2,
where d2_t, ..., dT_t are time dummies.
6.7.3 Attrition
We test and correct for attrition in a linear panel data model where attrition is assumed to be an absorbing state. Assume (x_it, y_it) is observed for all i when t = 1, and let
s_it = 1 if (x_it, y_it) is observed.
To remove the unobserved effect, use first differencing:
Δy_it = Δx_it β + Δu_it, t = 2, ..., T
Selection equation for t ≥ 2: s_it = 1[w_it δ_t + v_it > 0], v_it | {w_it, s_i,t−1 = 1} ~ N(0,1)
Assume that the x_it are strictly exogenous and that selection does not depend on Δx_it once w_it is controlled for:
E(Δu_it | Δx_it, w_it, v_it, s_i,t−1 = 1) = E(Δu_it | v_it) = ρ_t v_it
Then E(Δy_it | Δx_it, w_it, s_it = 1) = Δx_it β + ρ_t λ(w_it δ_t), t = 2, ..., T
The pooled OLS regression, using the selected sample, of Δy_it on Δx_it, d2_t λ̂_it, ..., dT_t λ̂_it is consistent for β and the ρ_t.
Relaxing exogeneity of the x's: let z_it be a vector of variables that are exogenous and redundant in the selection equation.
In this case, we can estimate
Δy_it = Δx_it β + ρ_2 d2_t λ̂_it + ... + ρ_T dT_t λ̂_it + error_it
by IV, using instruments (z_it, d2_t λ̂_it, ..., dT_t λ̂_it), in the selected sample.
Estimating a linear panel data model under possibly nonrandom attrition: (x_it, y_it) is observed only if s_it = 1. Under the assumption called selection on observables,
P(s_it = 1 | y_it, x_it, z_i1) = P(s_it = 1 | z_i1),
the model can be estimated by inverse probability weighting (IPW) in two steps:
1. For each t, estimate a probit or logit of s_it on z_i1 → get the fitted values p̂_it.
2. Weight the objective function by 1/p̂_it.
The argument for IPW is that the probability limit of the weighted objective function is identical to that of the unweighted function if we had no attrition problem. Using this argument, Wooldridge (2000) shows that IPW produces a consistent, √N-asymptotically normal estimator.
For the case where attrition is an absorbing state, the following probabilities can be used in the IPW procedure: p̂_it ≡ π̂_i2 π̂_i3 ··· π̂_it, where π_it ≡ P(s_it = 1 | z_it, s_i,t−1 = 1),
under the key assumption that
P(s_it = 1 | v_i1, ..., v_iT, s_i,t−1 = 1) = P(s_it = 1 | z_it, s_i,t−1 = 1), where v_it = (w_it, z_it).
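The weighting idea can be illustrated in a deliberately simplified cross-sectional setting. In the sketch below, all names and numbers are assumptions made for illustration: z is a single binary variable, the retention probabilities 0.9 and 0.3 are hypothetical, and the selection probability is estimated by cell frequencies, which is what a saturated probit or logit on a binary z would deliver. The parameter of interest is the population mean of y, i.e. the minimizer of the weighted objective Σ (s_i/p̂_i)(y_i − θ)².

```python
import random

random.seed(2)
N = 20000
keep_prob = {0: 0.9, 1: 0.3}    # hypothetical P(s = 1 | z)
sample = []
for _ in range(N):
    z = 1 if random.random() < 0.5 else 0
    y = 1.0 + 2.0 * z + random.gauss(0.0, 1.0)   # population mean of y is 2
    s = 1 if random.random() < keep_prob[z] else 0
    sample.append((z, y, s))

# Step 1: estimate P(s = 1 | z) from the full sample (z is always observed).
p_hat = {}
for cell in (0, 1):
    flags = [s for z, _, s in sample if z == cell]
    p_hat[cell] = sum(flags) / len(flags)

# Step 2: weight each retained observation by 1 / p_hat and minimize
# the weighted sum of squares, which gives a weighted mean.
kept = [(z, y) for z, y, s in sample if s == 1]
unweighted_mean = sum(y for _, y in kept) / len(kept)
ipw_mean = (sum(y / p_hat[z] for z, y in kept)
            / sum(1.0 / p_hat[z] for z, _ in kept))
print("unweighted:", round(unweighted_mean, 3))
print("IPW:       ", round(ipw_mean, 3))
```

Selection here depends only on z, so the selection-on-observables assumption holds by construction; the unweighted mean is still off because the retained sample over-represents z = 0.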
6.8 Stratified Sampling
6.8.1 Standard Stratified Sampling and Variable Probability Sampling
The two most common kinds of stratification used in the social sciences are
- standard stratified sampling (SS sampling) and
- variable probability sampling (VP sampling).
SS Sampling
The population is partitioned into J groups W_1, W_2, ..., W_J, assumed to be nonoverlapping and exhaustive. Let w be a random variable representing the population of interest. For j = 1, ..., J, draw a random sample of size N_j from stratum j. For each j, denote this random sample by {w_ij : i = 1, 2, ..., N_j}.
The strata sample sizes are nonrandom, so the total sample size N is also nonrandom. Observations within a stratum are i.i.d.; across strata they are not.
VP Sampling: repeat the following steps N times
1. Draw an observation w_i at random from the population.
2. If w_i is in stratum j, toss a coin with probability p_j of turning up heads. Let h_ij = 1 if the coin turns up heads and 0 otherwise.
3. Keep observation i if h_ij = 1; otherwise omit it from the sample.
6.8.2 Weighted Estimators to Account for Stratification
With VP sampling: define a set of binary variables r_ij that indicate whether a draw w_i is kept in the sample and, if so, which stratum it falls into.
The weighted M-estimator is
\[
\hat\theta_w = \arg\min_{\theta \in \Theta} \sum_{i=1}^{N} \sum_{j=1}^{J} p_j^{-1}\, r_{ij}\, q(w_i, \theta)
\]
Wooldridge (1999) shows that under the same assumptions as Theorem 2 in chapter 1, the weighted M-estimator is consistent. Asymptotic normality follows under the same regularity conditions as in chapter 1.
With SS sampling: weights are defined using Q_j = P(w ∈ W_j) (the population frequency for stratum j). Using a random sample obtained from each stratum, we can obtain a consistent estimator as in VP sampling with the weights
Q_{j_i} / H_{j_i}, with H_{j_i} ≡ N_{j_i} / N,
instead of p_j^{-1}, where j_i denotes the stratum of observation i.
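The VP sampling scheme and the weighted estimator can be sketched with a short simulation. Everything specific below is an assumption for illustration: a standard normal population, two strata defined by the sign of w, hypothetical retention probabilities p_1 = 0.2 and p_2 = 1, and q(w, θ) = (w − θ)², so that the weighted M-estimator is a weighted mean.

```python
import random

random.seed(3)
p = {1: 0.2, 2: 1.0}            # hypothetical retention probabilities p_j
kept = []
for _ in range(20000):
    w = random.gauss(0.0, 1.0)              # step 1: random draw, population mean 0
    j = 1 if w < 0 else 2                   # stratum of the draw
    h = 1 if random.random() < p[j] else 0  # step 2: coin toss with probability p_j
    if h == 1:                              # step 3: keep the draw if heads
        kept.append((w, j))

# Unweighted and weighted M-estimators for q(w, theta) = (w - theta)^2.
unweighted = sum(w for w, _ in kept) / len(kept)
weighted = (sum(w / p[j] for w, j in kept)
            / sum(1.0 / p[j] for _, j in kept))
print("unweighted:", round(unweighted, 3))
print("weighted:  ", round(weighted, 3))
```

Because the strata here are defined by the variable of interest itself, the unweighted estimator is inconsistent and the weights p_j⁻¹ are needed.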
6.8.3 Stratification Based on Exogenous Variables (does not matter!)
w is partitioned as (x, y), where x is exogenous in the sense that
\[
\theta_0 = \arg\min_{\theta \in \Theta} E\left[ q(w_i, \theta) \mid x \right]
\]
In VP sampling, the unweighted M-estimator on the stratified sample is
\[
\hat\theta_u = \arg\min_{\theta \in \Theta} \sum_{i=1}^{N} \sum_{j=1}^{J} h_{ij}\, s_{ij}\, q(w_i, \theta)
\]
Wooldridge (1999) shows that, when stratification is based on x, the unweighted estimator is more efficient than the weighted estimator under the key assumption
\[
E\left[ \nabla_\theta q(w_i, \theta_0)'\, \nabla_\theta q(w_i, \theta_0) \mid x \right] = \sigma_0^2\, E\left[ \nabla_\theta^2 q(w_i, \theta_0) \mid x \right]
\]
(a type of information matrix equality).
With SS sampling: similar conclusions are obtained.
One useful fact is that, when stratification is based on x, we need not compute within-strata variation in the estimated score to obtain consistent estimators for parameters that do not vary in the population.