Sample Selection Regression Models (Ch. 17)

Microeconometrics

Until now we have always assumed that a random sample is available. We now cover cases in which no random sample is available. We look at two different cases:
- the sample was collected/selected according to some value of $y$
- the sample is selected by the behaviour of the population under consideration (self-selection)

The assumption that a random sample from the underlying population is available is not always realistic.
Selected sample: a nonrandom sample. The selection mechanism is due either to the sample design or to the behaviour of the persons being sampled (including nonresponse on survey questions and attrition from social programs).

Examples

Saving function
Estimate a saving function for all families in a given country:
$\text{saving} = \beta_0 + \beta_1\,\text{income} + \beta_2\,\text{age} + \beta_3\,\text{married} + \beta_4\,\text{children} + u$,
where age is the age of the household head.
We have data only on families whose household head is older than 45
→ this leads to a sample selection problem, because we are interested in all families but have a random sample only for a subset of the population
→ selection on the basis of $x$

Family wealth function
Effect of a pension plan on wealth accumulation: estimate the effect of a worker's eligibility for a pension plan on family wealth,
$\text{wealth} = \beta_0 + \beta_1\,\text{plan} + \beta_2\,\text{educ} + \beta_3\,\text{age} + \beta_4\,\text{income} + u$,
where plan is an indicator for eligibility. In generic notation:
(17.2)  $y = \beta_0 + \beta_1\,\text{plan} + \beta_2 x + u$
The sample contains only people with wealth less than 100,000
→ selection on the basis of $y$ (the endogenous variable)

Wage offer function
Estimate a wage offer equation for the population of working age. However, wage data are available only for people who work
→ $y$ is observable only for a subsample that is defined by another variable (working)
→ self-selection: the decision to work depends on the wage
This sample selection problem is often called incidental truncation, because the wage is missing as a result of another outcome: participation in the labour force.

When Can Sample Selection Be Ignored?

Conditions under which 2SLS using the selected sample is consistent.
The population is represented by the random vector $(x, y, z)$, where $x$ is $1 \times K$, $y$ is $1 \times 1$ and $z$ is $1 \times L$.
Population model:
(17.3)  $y = \beta_1 + \beta_2 x_2 + \dots + \beta_K x_K + u$
(17.4)  $E(u \mid z) = 0$
This is stronger than we need for 2SLS to be consistent with a random sample (there $E(z'u) = 0$ suffices)!
Special case: $z = x$ → $x$ is exogenous
General treatment → $x$ can be endogenous

With a random sample, (17.3) can be estimated consistently by 2SLS (if $\operatorname{rank} E(z'x) = K$). In the special case $z = x$:
(17.5)  $E(y \mid x) = \beta_1 + \beta_2 x_2 + \dots + \beta_K x_K$

No random sample → the available data follow a selection rule.
$s$: binary selection indicator
$s = 1$: observation is used
$s = 0$: observation is not used
Key assumption:
(17.6)  $E(u \mid z, s) = 0$

(17.6) follows directly from (17.4) in two cases:
- $s$ is a deterministic function of $z$ → $E(u \mid z, s) = E(u \mid z)$. In this case selection follows a fixed rule that depends only on exogenous variables.
- Selection is independent of $(z, u)$ → $E(u \mid z, s) = E(u \mid z)$.

In estimating (17.3) we apply 2SLS to the observations with $s = 1$. The observed sample is $\{(x_i, y_i, z_i, s_i) : i = 1, \dots, N\}$. Observation $i$ is used if $s_i = 1$.
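The two sufficient conditions for (17.6) can be checked in a small simulation. The sketch below is not part of the original slides: the seed, the sample size, the cut-offs and the variable names s_x, s_rand and s_y are arbitrary assumptions. It uses the OLS special case $z = x$ with a true slope of 1 and compares a rule that depends only on $x$, a rule that is independent of $(x, u)$, and a rule that depends on $y$.

```stata
* Illustrative simulation (assumed design, not from the slides)
clear
set seed 12345
set obs 20000
generate x = rnormal()
generate u = rnormal()
generate y = 1 + x + u                 /* population model: intercept 1, slope 1 */

generate s_x    = (x > 0)              /* s is a deterministic function of x */
generate s_rand = (runiform() < 0.5)   /* s is independent of (x, u)         */
generate s_y    = (y > 1)              /* s depends on y, hence on u         */

regress y x if s_x == 1                /* E(u | x, s) = 0 holds */
regress y x if s_rand == 1             /* E(u | x, s) = 0 holds */
regress y x if s_y == 1                /* E(u | x, s) = 0 fails */
```

Under these assumptions the first two regressions should give slope estimates close to 1. The third should be clearly attenuated, because among observations with $y > 1$ a low $x$ has to be offset by a high $u$, so $u$ and $x$ are negatively correlated in the selected sample.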
The 2SLS estimator with the selected sample is
$$\begin{aligned}
\hat\beta = {} & \left[\left(N^{-1}\sum_{i=1}^{N} s_i z_i' x_i\right)' \left(N^{-1}\sum_{i=1}^{N} s_i z_i' z_i\right)^{-1} \left(N^{-1}\sum_{i=1}^{N} s_i z_i' x_i\right)\right]^{-1} \\
& \times \left(N^{-1}\sum_{i=1}^{N} s_i z_i' x_i\right)' \left(N^{-1}\sum_{i=1}^{N} s_i z_i' z_i\right)^{-1} \left(N^{-1}\sum_{i=1}^{N} s_i z_i' y_i\right)
\end{aligned}$$
Substituting $y_i = x_i\beta + u_i$ gives
$$\begin{aligned}
\hat\beta = \beta + {} & \left[\left(N^{-1}\sum_{i=1}^{N} s_i z_i' x_i\right)' \left(N^{-1}\sum_{i=1}^{N} s_i z_i' z_i\right)^{-1} \left(N^{-1}\sum_{i=1}^{N} s_i z_i' x_i\right)\right]^{-1} \\
& \times \left(N^{-1}\sum_{i=1}^{N} s_i z_i' x_i\right)' \left(N^{-1}\sum_{i=1}^{N} s_i z_i' z_i\right)^{-1} \left(N^{-1}\sum_{i=1}^{N} s_i z_i' u_i\right)
\end{aligned}$$
By assumption $E(u_i \mid z_i, s_i) = 0$, and so $E(s_i z_i' u_i) = 0$ (law of iterated expectations)
→ $\operatorname{plim} \hat\beta = \beta$ (by the law of large numbers)

Theorem 17.1 (Consistency of 2SLS under Sample Selection)
In model (17.3) assume that
- (17.6)  $E(u \mid z, s) = 0$
- (17.8)  $\operatorname{rank} E(z'z \mid s = 1) = L$
- (17.9)  $\operatorname{rank} E(z'x \mid s = 1) = K$
Then the 2SLS estimator using the selected sample is consistent for $\beta$ and asymptotically normally distributed.

Under homoskedasticity, $E(u^2 \mid z, s) = \sigma^2$,
$$\operatorname{Avar} \sqrt{N}(\hat\beta - \beta) = \sigma^2 \left\{ E(s z'x)' \left[E(s z'z)\right]^{-1} E(s z'x) \right\}^{-1}$$
How to estimate $\sigma^2$? With the mean of the squared residuals in the selected sample:
$$\left(\sum_{i=1}^{N} s_i\right)^{-1} \sum_{i=1}^{N} s_i \hat u_i^2 \xrightarrow{p} \sigma^2$$
Why is this consistent? By iterated expectations and homoskedasticity, $E[s u^2] = E[s] \cdot \sigma^2$. Hence
$$\sigma^2 = \frac{E[s u^2]}{E[s]} = \frac{N \cdot E[s u^2]}{N \cdot E[s]}$$

Example 4: Nonrandomly Missing IQ Scores
$\log(\text{wage}) = z_1\delta_1 + \text{abil} + v, \quad E(v \mid z_1, \text{abil}, IQ) = 0$
Assume that IQ is a valid proxy for abil (a good proxy: correlated with abil and unrelated to $e$ conditional on $z_1$):
$\text{abil} = \theta_1 IQ + e, \quad E(e \mid z_1, IQ) = 0$
$\log(\text{wage}) = z_1\delta_1 + \theta_1 IQ + u, \quad u = v + e$
Under these assumptions, $E(u \mid z_1, IQ) = 0$.
By Theorem 17.1, if we choose the sample by excluding all people with IQs below a fixed value, then OLS estimation of the last equation will be consistent.
(selection on an exogenous variable)

6.2.2 Nonlinear Models

1. If $E(y \mid x, s) = E(y \mid x)$, then selection is ignorable and NLS on the selected sample is consistent. The NLS problem is
$$\min_{\beta}\ N^{-1} \sum_{i=1}^{N} s_i \left[y_i - m(x_i, \beta)\right]^2$$
with $E(y \mid x) = m(x, \beta_0)$. Why is it consistent? We use that
$y - m(x, \beta) = [y - m(x, \beta_0)] + [m(x, \beta_0) - m(x, \beta)]$
and that $E[y - m(x, \beta_0) \mid x, s] = 0$. Writing $u \equiv y - m(x, \beta_0)$, the cross term has zero conditional mean, so
$$E\{s\,[y - m(x, \beta)]^2\} = E\{s\, E([y - m(x, \beta)]^2 \mid x, s)\} = E\{s\, E(u^2 \mid x, s)\} + E\{s\,[m(x, \beta) - m(x, \beta_0)]^2\}$$
The first term does not depend on $\beta$ and the second is nonnegative and equal to zero at $\beta = \beta_0$. Hence $\beta_0$ minimizes $E\{s\,[y - m(x, \beta)]^2\}$.
In the linear case, $\beta_0$ is estimated by OLS of $y$ on $x$, using the selected sample.

2. General conditional ML setup
If the distribution satisfies $D(y \mid x, s) = D(y \mid x)$, then selection is again ignorable. This assumption holds if $s$ is a nonrandom function of $x$ or if $s$ is independent of $(x, y)$.
In this case, MLE on the selected sample,
$$\max_{\theta \in \Theta}\ N^{-1} \sum_{i=1}^{N} s_i\, l(y_i, x_i, \theta),$$
is consistent, because for each $x$, $\theta_0$ maximizes $E[l(y, x, \theta) \mid x]$ over $\Theta$. Now write
$$E[s\, l(y, x, \theta)] = E\{s\, E[l(y, x, \theta) \mid x, s]\} = E\{s\, E[l(y, x, \theta) \mid x]\}$$
Because, for every $x$, $E[l(y, x, \theta) \mid x]$ is maximized at $\theta_0$, it must also be the case that $E\{s\, E[l(y, x, \theta) \mid x]\}$ is maximized at $\theta_0$.

Truncated Regression (Selection on Response Variable)

$(x_i, y_i)$ is a random draw from the population; we want to estimate $E(y_i \mid x_i)$ in this population.
But we only observe a sample selected on the value of $y$.
Examples: wealth, wage in specific samples, ...
$y_i$ is a continuous variable.
The selection rule is $s_i = 1[a_1 < y_i < a_2]$, where $a_1$ and $a_2$ are known.
We observe $(x_i, y_i)$ if $a_1 < y_i < a_2$; otherwise we observe neither $y_i$ nor $x_i$.
In most cases we want to estimate $E(y_i \mid x_i) = x_i\beta$.
→ We need a specification of the full conditional distribution of $y_i \mid x_i$.
Specify the conditional density of $y_i \mid x_i$ as $f(\cdot \mid x_i; \beta, \gamma)$, where $\gamma$ are additional parameters, e.g. the variance. The cdf of $y_i \mid x_i$ is $F(\cdot \mid x_i; \beta, \gamma)$.
In estimating $E(y_i \mid x_i)$ we must condition on $a_1 < y_i < a_2$, i.e. $s_i = 1$.
The selection rule says that if $y_i$ falls in $(a_1, a_2)$, then both $y_i$ and $x_i$ are observed; if $y_i$ is outside this interval, then we observe neither $y_i$ nor $x_i$.
In estimation, use the density of $y_i$ conditional on $x_i$ and on the fact that we observe $(y_i, x_i)$.
The cdf of $y_i \mid x_i, s_i = 1$ is
$$P(y_i \le c \mid x_i, s_i = 1) = \frac{P(y_i \le c,\ s_i = 1 \mid x_i)}{P(s_i = 1 \mid x_i)}$$
with denominator
$$P(s_i = 1 \mid x_i) = P(a_1 < y_i < a_2 \mid x_i) = F(a_2 \mid x_i; \beta, \gamma) - F(a_1 \mid x_i; \beta, \gamma)$$
If $y$ is truncated from one side only, then either $a_1 = -\infty$ or $a_2 = \infty$.
To obtain the numerator we write, for $a_1 < c < a_2$,
$$P(y_i \le c,\ s_i = 1 \mid x_i) = P(a_1 < y_i \le c \mid x_i) = F(c \mid x_i; \beta, \gamma) - F(a_1 \mid x_i; \beta, \gamma)$$
Plugging this into the ratio and taking the derivative with respect to $c$, we get the density of $y_i$ given $(x_i, s_i = 1)$:
(17.14)  $$p(c \mid x_i, s_i = 1) = \frac{f(c \mid x_i; \beta, \gamma)}{F(a_2 \mid x_i; \beta, \gamma) - F(a_1 \mid x_i; \beta, \gamma)} \quad \text{for } a_1 < c < a_2$$
(17.14) is valid irrespective of the specific distributional assumption. Usually we assume a normal distribution for $f$. Assume further that $E(y \mid x) = x\beta$. Then we have
$$f(y_i \mid x_i, s_i = 1) = \frac{\dfrac{1}{\sigma}\,\phi\!\left(\dfrac{y_i - x_i\beta}{\sigma}\right)}{\Phi\!\left(\dfrac{a_2 - x_i\beta}{\sigma}\right) - \Phi\!\left(\dfrac{a_1 - x_i\beta}{\sigma}\right)}$$
In many cases $a_1 = 0$ and $a_2 = \infty$.
The CMLEs of $\beta$ and $\gamma$ using the selected sample are efficient in the class of estimators that do not use information about the distribution of $x$.

6.3 Selection on Basis of the Response Variable: Truncated Regression

In most applications of truncated samples, the population conditional distribution is assumed to be $N(x\beta, \sigma^2)$: the truncated Tobit model or truncated normal regression model.
The truncated Tobit model is related to the censored Tobit model for data-censoring applications (see Chapter 5). The key difference between censored and truncated regression is that in censored regression we observe $x$ for all people, even if $y$ is not known.
Heteroskedasticity or nonnormality in truncated regression results in inconsistent estimators of $\beta$.

Example:

    set obs 10000
    g x = uniform()
    g u = invnorm(uniform())
    replace u = 2*u
    g y = 1 + x + u
    drop if y <= 0
    reg y x
    outreg using d:\stata\micro\out\trunc_sim, replace
    truncreg y x, ll(0)
    outreg using d:\stata\micro\out\trunc_sim, append

                    OLS          TRUNCREG
    x               0.536        0.951
                    (9.11)**     (9.00)**
    Constant        2.011        1.047
                    (57.84)**    (13.24)**
    σ                            1.97
                                 (64.5)**
    Observations    7708         7708
    Absolute value of t statistics in parentheses
    * significant at 5%; ** significant at 1%

17.4 Probit Selection

Sample selection is not the result of the sample design but is due to decisions made by members of the population (self-selection).

Exogenous explanatory variables
Classic example: labour force participation and wages.
We want to know $E(w_i \mid x_i)$ for a person randomly drawn from the population ($w$: wage).
$w$ is observed only for working people.

Model of labour supply:
(17.15)  $\max_{h}\ U_i(w_i h + a_i, h)$ subject to $0 \le h \le 168$
$h$: hours of work per week
$a_i$: non-labour income
Define $s_i(h) \equiv U_i(w_i h + a_i, h)$ and assume $h < 168$.
Possible solutions: $h = 0$ or $0 < h < 168$.
If $ds_i/dh \le 0$ at $h = 0$ ⇒ $h = 0$.
This implies that $h = 0$ if
(17.16)  $w_i \le -mu_i^h(a_i, 0) / mu_i^q(a_i, 0)$
where $mu^h$ is the marginal disutility of work and $mu^q$ is the marginal utility of income. The right-hand side of (17.16) is called the reservation wage $w^r$.
Parametric assumptions:
(17.17)  $w_i = \exp(x_{i1}\beta_1 + u_{i1}), \quad w_i^r = \exp(x_{i2}\beta_2 + \gamma_2 a_i + u_{i2})$
$(u_{i1}, u_{i2})$ is independent of $(x_{i1}, x_{i2}, a_i)$.
$x_{i1}$ contains productivity characteristics and $x_{i2}$ contains characteristics that determine the marginal utility of leisure and income (there may be an overlap).
(17.18)  $\log w_i = x_{i1}\beta_1 + u_{i1}$
But the wage is observed only if $w_i > w_i^r$, i.e.
$$\log w_i - \log w_i^r = x_{i1}\beta_1 - x_{i2}\beta_2 - \gamma_2 a_i + u_{i1} - u_{i2} \equiv x_i\delta_2 + v_{i2} > 0$$
Problem: $w^r$ is not observed and depends on $x_{i2}$ and $u_{i2}$
→ the truncation point $w^r$ is not a known constant
→ we need another estimation procedure

Notation: drop the subscript $i$; $y_1 \equiv \log w$ and $y_2$ is a binary indicator.
(17.19)  $y_1 = x_1\beta_1 + u_1$
(17.20)  $y_2 = 1[x\delta_2 + v_2 > 0]$
(17.20) is a probit if $v_2$ is normally distributed.

Assumptions 17.1:
(a) $(x, y_2)$ are always observed, $y_1$ is observed only if $y_2 = 1$
(b) $(u_1, v_2)$ is independent of $x$ with zero mean
(c) $v_2 \sim N(0, 1)$
(d) $E(u_1 \mid v_2) = \gamma_1 v_2$
(a) describes the selection process; (b) is a strong exogeneity assumption; (c) is necessary to derive a conditional expectation given the selected sample; (d) requires linearity of the regression of $u_1$ on $v_2$. (d) always holds if $(u_1, v_2)$ is bivariate normal (but it is not necessary to assume that $u_1$ is normally distributed).

Estimation of Selection Model

Let $(y_1, y_2, x, u_1, v_2)$ denote a random draw from the population.
Given the selection rule we can hope to estimate $E(y_1 \mid x, y_2 = 1)$ and $P(y_2 = 1 \mid x)$.
How does $E(y_1 \mid x, y_2 = 1)$ depend on $\beta_1$? First, note that
(17.21)  $E(y_1 \mid x, v_2) = x_1\beta_1 + E(u_1 \mid x, v_2) = x_1\beta_1 + E(u_1 \mid v_2) = x_1\beta_1 + \gamma_1 v_2$
where the second equality follows because $(u_1, v_2)$ is independent of $x$.
If $\gamma_1 = 0$ → no selection problem!

What if $\gamma_1 \ne 0$? Using iterated expectations on (17.21) gives
$$E(y_1 \mid x, y_2) = x_1\beta_1 + \gamma_1 E(v_2 \mid x, y_2) = x_1\beta_1 + \gamma_1 h(x, y_2)$$
where $h(x, y_2) = E(v_2 \mid x, y_2)$.
If we knew $h(x, y_2)$, we could estimate $\beta_1$ and $\gamma_1$ from the regression of $y_1$ on $x_1$ and $h(x, y_2)$ (in the selected sample).
In the selected sample $y_2 = 1$ → we only have to find $h(x, 1)$.
$h(x, 1) = E(v_2 \mid v_2 > -x\delta_2) = \lambda(x\delta_2)$, where $\lambda(\cdot) = \dfrac{\phi(\cdot)}{\Phi(\cdot)}$
This follows from a special property of the normal distribution: if $z \sim N(0, 1)$ then
$$E(z \mid z > c) = \frac{\phi(c)}{1 - \Phi(c)}$$
Setting $c = -x\delta_2$ and using the symmetry of the normal distribution, $\phi(-c) = \phi(c)$ and $1 - \Phi(-c) = \Phi(c)$, gives $\lambda(x\delta_2)$.
The term $\lambda(\cdot) = \phi(\cdot)/\Phi(\cdot)$ is called the inverse Mills ratio.
This implies
(17.22)  $E(y_1 \mid x, y_2 = 1) = x_1\beta_1 + \gamma_1 \lambda(x\delta_2)$
From (17.22) it is obvious that OLS of $y_1$ on $x_1$ in the selected sample omits the term $\lambda(x\delta_2)$ → omitted variable bias.

(17.22) also shows a way to estimate $\beta_1$ consistently.
Heckman (1979) has shown that $\beta_1$ and $\gamma_1$ can be estimated consistently in the selected sample by regressing $y_1$ on $x_1$ and $\lambda(x\delta_2)$.
But $\delta_2$ is unknown and must be estimated in a first step (using probit).

Heckman Estimator

Step 1: Estimate the probit model
(17.23)  $P(y_{i2} = 1 \mid x_i) = \Phi(x_i\delta_2)$
using all observations. Obtain $\hat\lambda_{i2} \equiv \lambda(x_i\hat\delta_2)$.
Step 2: Estimate $\hat\beta_1$ and $\hat\gamma_1$ by OLS in the selected sample:
(17.24)  $y_{i1} = x_{i1}\beta_1 + \gamma_1 \hat\lambda_{i2} + \text{error}_i$
This estimator is consistent and asymptotically normally distributed.

Simple test for selection bias: under $H_0$ (no selection bias), $\gamma_1 = 0$ in (17.24) → t test for $\gamma_1$.
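The two steps and the t test can be traced by hand on simulated data. This is a minimal sketch under assumed values, not the author's code: the seed, the sample size, the coefficients and the variable names are illustrative, and the errors are generated so that $E(u_1 \mid v_2) = 0.5\,v_2$, i.e. $\gamma_1 = 0.5$.

```stata
* Manual Heckman two-step on simulated data (assumed design)
clear
set seed 2024
set obs 20000
generate x  = rnormal()
generate z  = rnormal()                   /* enters the selection equation only */
generate v2 = rnormal()
generate u1 = 0.5*v2 + rnormal()          /* E(u1 | v2) = 0.5*v2                */
generate y2 = (0.5*x + 0.5*z + v2 > 0)    /* selection indicator                */
generate y1 = 1 + x + u1 if y2 == 1       /* y1 is missing when y2 = 0          */

probit y2 x z                             /* Step 1: probit on all observations */
predict xb, xb                            /* fitted index x*delta2_hat          */
generate lambda = normalden(xb)/normal(xb) /* inverse Mills ratio               */

regress y1 x lambda if y2 == 1            /* Step 2: OLS in the selected sample */
```

The t statistic on lambda in the last regression is the test of $\gamma_1 = 0$. The standard errors reported by this hand-made second step ignore that lambda is itself estimated; `heckman y1 x, select(y2 = x z) twostep` reports corrected ones.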
IMPORTANT: this test is valid only if the model is correctly specified (distributional assumptions).
If $\gamma_1 \ne 0$, the standard errors of $\hat\beta_1$ must be corrected
- for heteroskedasticity
- because $\delta_2$ has been estimated in the first step
Stata does this for you if you use the command heckman.

Theoretically, it is not necessary that $x_1$ is a strict subset of $x$
→ $\beta_1$ is identified even if $x = x_1$ (because $\lambda$ is a nonlinear function of $x$)
However, in practice $\lambda$ is often almost a linear function of $x$
→ severe multicollinearity
→ very imprecise estimates
⇒ Strong recommendation: you should have at least one element in $x$ that is not in $x_1$ (exclusion restriction)

[Figure: relation between xβ and λ; horizontal axis xb from -4 to 4, vertical axis lambda from 0 to 3]

    #delimit ;
    use d:\stata\micro\data\mroz;
    reg lwage educ exper expersq;
    heckman lwage educ exper expersq, select(inlf = educ exper expersq age kidslt6 kidsge6 nwifeinc) twostep;
    heckman lwage educ exper expersq, select(inlf = educ exper expersq) twostep;

Table 17.1
                      OLS          Heckman 2-step   Heckman 2-step,
                                                    no excl. restr.
    Wage equation
    educ              0.107        0.109            0.093
                      (7.60)**     (7.03)**         (1.82)
    exper             0.042        0.044            0.021
                      (3.15)**     (2.70)**         (0.28)
    expersq           -0.001       -0.001           -0.000
                      (2.06)*      (1.96)           (0.27)
    mills:lambda                   0.032            -0.270
                                   (0.24)           (0.28)
    Constant          -0.522       -0.578           -0.010
                      (2.63)**     (1.90)           (0.01)
    Selection equation
    inlf:educ                      0.131            0.097
                                   (5.18)**         (4.38)**
    inlf:exper                     0.123            0.127
                                   (6.59)**         (7.12)**
    inlf:expersq                   -0.002           -0.002
                                   (3.15)**         (4.12)**
    inlf:age                       -0.053
                                   (6.23)**
    inlf:kidslt6                   -0.868
                                   (7.33)**
    inlf:kidsge6                   0.036
                                   (0.83)
    inlf:nwifeinc                  -0.012
                                   (2.48)*
    inlf:Constant                  0.270            -1.925
                                   (0.53)           (6.67)**
    lambda                         .032             -.270
                                   (0.24)           (-0.28)
    sigma                          .663             .691
    Observations      428          753              753
    R-squared         0.16
    Absolute value of t statistics in parentheses
    * significant at 5%; ** significant at 1%

Data generation for selection problem

    set obs 10000
    g x = uniform()
    g z = uniform()
    matrix c = (4, 1 \ 1, 1)               /* covariance matrix of (u1, v2) */
    drawnorm u1 v2, n(10000) cov(c)        /* correlated error terms */
    g y1 = 1 + x + u1
    g y2star = 0.5 + 0.5*x + 0.5*z + v2
    g y2 = y2star > 0.6
    replace y1 = . if y2 == 0
    reg y1 x
    heckman y1 x, select(y2 = x z) twostep
    heckman y1 x, select(y2 = x z)
    heckman y1 x, select(y2 = x) twostep
    heckman y1 x z, select(y2 = x z) twostep

Simulation results
Columns (1) to (5) correspond, in order, to the five estimation commands above.
                      (1)          (2)          (3)          (4)          (5)
                      OLS          Heckman      Heckman      2-step,      2-step,
                                   2-step       ML           no excl.     no excl.
                                                             restr.       restr.
    Structural equation
    x                 0.758        1.206        1.115        -0.258       -1.432
                      (9.48)**     (8.96)**     (11.94)**    (0.08)       (0.48)
    z                                                                     -2.264
                                                                          (0.88)
    lambda                         1.611                     -3.585       -7.603
                                   (4.44)**                  (0.32)       (0.73)
    Constant          1.683        0.557        0.793        4.216        8.219
                      (35.23)**    (2.15)*      (7.54)**     (0.53)       (0.95)
    Selection equation
    x                              0.535        0.525        0.526        0.535
                                   (11.83)**    (11.65)**    (11.67)**    (11.83)**
    z                              0.457        0.474                     0.457
                                   (10.12)**    (11.23)**                 (10.12)**
    Constant                       -0.093       -0.097       0.137        -0.093
                                   (2.73)**     (2.93)**     (5.36)**     (2.73)**
    rho                                         0.712
                                                (8.93)**
    Absolute value of t statistics in parentheses
    * significant at 5%; ** significant at 1%

Predictions after estimation of selection models

Selection models are often used to predict the dependent variable for the observations that are not in the selected subsample.
Example: expected wage of nonworkers.
Correct prediction:
$E(y_{i1} \mid x_i) = x_{i1}\hat\beta_1$
and NOT
$E(y_{i1} \mid x_i, y_{i2} = 1) = x_{i1}\hat\beta_1 + \hat\gamma_1 \hat\lambda_{i2} \ne E(y_{i1} \mid x_i)$
Stata:

    heckman lnlohn ...                     /* selection model for ln(wage) */
    predict lnlohn_pred, e(.,.)            /* prediction of ln(wage) */

Joint ML estimator

If (c) and (d) in Assumptions 17.1 are replaced by the stronger assumption that $(u_1, v_2)$ is bivariate normal with mean 0, $\operatorname{Var}(u_1) = \sigma_1^2$, $\operatorname{Cov}(u_1, v_2) = \sigma_{12}$ and $\operatorname{Var}(v_2) = 1$, then partial likelihood estimation can be used. Partial MLE will be more efficient than the 2-step procedure.
(Partial MLE uses the density of $y_1$ when $y_2 = 1$.)
$$f(y_1 \mid y_2 = 1, x) = P(y_2 = 1 \mid y_1, x)\, f(y_1 \mid x) / P(y_2 = 1 \mid x),$$
with
$$P(y_2 = 1 \mid y_1, x) = \Phi\left\{\left[x\delta_2 + \sigma_{12}\sigma_1^{-2}(y_1 - x_1\beta_1)\right]\left(1 - \sigma_{12}^2\sigma_1^{-2}\right)^{-1/2}\right\}$$
The log-likelihood for observation $i$ is:
$$\begin{aligned}
l_i(\theta) = {} & (1 - y_{i2}) \log\left[1 - \Phi(x_i\delta_2)\right] \\
& + y_{i2}\left( \log \Phi\left\{\left[x_i\delta_2 + \sigma_{12}\sigma_1^{-2}(y_{i1} - x_{i1}\beta_1)\right]\left(1 - \sigma_{12}^2\sigma_1^{-2}\right)^{-1/2}\right\} + \log \phi\left[(y_{i1} - x_{i1}\beta_1)/\sigma_1\right] - \log \sigma_1 \right)
\end{aligned}$$

Endogenous Explanatory Variables

One element of $x_1$ is correlated with $u_1$:
(17.25)  $y_1 = z_1\delta_1 + \alpha_1 y_2 + u_1$
(17.26)  $y_2 = z\delta_2 + v_2$
(17.27)  $y_3 = 1[z\delta_3 + v_3 > 0]$
(17.25) is the structural equation to be estimated, (17.26) is a linear projection of the endogenous variable $y_2$ (i.e. not structural), and (17.27) is the selection equation.
The correlations between $u_1$, $v_2$, $v_3$ are unrestricted.

3 interesting cases:
- $y_2$ is always observed, but endogenous in (17.25) (e.g. education in a wage equation)
- $y_2$ is also observed only if $y_3 = 1$. In this case $y_2$ can be exogenous in the population, but due to selection it becomes endogenous.
- $y_1$ is always observed, but $y_2$ only sometimes

If $y_1$ and $y_2$ were always observed along with $z$, we would estimate (17.25) by 2SLS if $y_2$ is endogenous.
Under selection, 2SLS with the inverse Mills ratio added to the regressors is consistent (using only the selected sample in the second step).

Assumptions 17.2:
(a) $(z, y_3)$ are always observed, $(y_1, y_2)$ are observed only if $y_3 = 1$
(b) $(u_1, v_3)$ is independent of $z$ with mean 0
(c) $v_3 \sim N(0, 1)$
(d) $E(u_1 \mid v_3) = \gamma_1 v_3$
(e) $E(z'v_2) = 0$ and, writing $z\delta_2 = z_1\delta_{21} + z_2\delta_{22}$, $\delta_{22} \ne 0$
Parts (b), (c) and (d) are identical to Assumptions 17.1. Assumption (e) is new; it corresponds to the usual assumptions needed for identification in 2SLS.

Derivation of estimating equation

Write
(17.28)  $y_1 = z_1\delta_1 + \alpha_1 y_2 + g(z, y_3) + e_1$
with $g(z, y_3) \equiv E(u_1 \mid z, y_3)$ and $e_1 \equiv u_1 - E(u_1 \mid z, y_3)$. Thus $E(e_1 \mid z, y_3) = 0$.
Note that $\operatorname{cov}(g, e_1) = 0$.
If we knew $g(z, y_3)$ we could estimate (17.28) by 2SLS in the selected sample, with instruments $(z, g(z, 1))$.
We know $g(z, 1)$ up to some parameters:
$E(u_1 \mid z, y_3 = 1) = \gamma_1 \lambda(z\delta_3)$
$\delta_3$ can be estimated consistently by probit → two-step procedure

Step 1: Estimate $\hat\delta_3$ by probit of $y_3$ on $z$ using all observations and calculate $\hat\lambda_{i3} = \lambda(z_i\hat\delta_3)$.
Step 2: Estimate in the selected sample
(17.29)  $y_{i1} = z_{i1}\delta_1 + \alpha_1 y_{i2} + \gamma_1 \hat\lambda_{i3} + \text{error}_i$
by 2SLS with instruments $(z_i, \hat\lambda_{i3})$.
This procedure applies to any kind of endogenous variable $y_2$, including discrete variables (because the reduced form (17.26) for $y_2$ is a linear projection without distributional assumptions).
$z_2$ must have predictive power in the regression of $y_2$ on $z_1$, $z_2$, $\lambda(z_i\hat\delta_3)$.
Two exclusion restrictions are needed (otherwise identification rests on functional form alone).
The hypothesis of no selection bias can be tested with the t statistic on $\hat\lambda_{i3}$.
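The statement that $g(z, 1)$ is known up to parameters rests on the same steps that lead to (17.22). A short worked derivation, using only Assumptions 17.2 (b) to (d) and the truncated-normal property $E(z \mid z > c) = \phi(c)/[1 - \Phi(c)]$; the layout is an illustration, not taken from the slides:

```latex
\begin{align*}
E(u_1 \mid z, v_3) &= E(u_1 \mid v_3) = \gamma_1 v_3
  && \text{(independence of } (u_1, v_3) \text{ and } z\text{; linearity (d))} \\
E(u_1 \mid z, y_3 = 1) &= \gamma_1\, E(v_3 \mid z,\; v_3 > -z\delta_3)
  && \text{(iterated expectations; } y_3 = 1 \iff v_3 > -z\delta_3\text{)} \\
&= \gamma_1\, \frac{\phi(-z\delta_3)}{1 - \Phi(-z\delta_3)}
  && \text{(} v_3 \sim N(0,1) \text{, independent of } z\text{)} \\
&= \gamma_1\, \frac{\phi(z\delta_3)}{\Phi(z\delta_3)} = \gamma_1 \lambda(z\delta_3)
  && \text{(symmetry of the normal distribution)}
\end{align*}
```

Only $v_3$ needs to be normal for this argument; no distributional assumption is placed on $u_1$ or $v_2$.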
Example: wage offer equation with education being endogenous
IV for education: mother's and father's education
IV for selection: number and age of children, non-labour income

How to do it with Stata

Possible endogeneity of education in the wage equation of married women.
Instruments: education of parents and husband

    #delimit ;
    use d:\stata\micro\data\mroz;
    /* step 1: probit for participation on all exogenous variables */
    probit inlf exper expersq age kidslt6 kidsge6 nwifeinc motheduc fatheduc huseduc;
    predict xb, xb;
    g lambda = normden(xb)/norm(xb);
    /* step 2: 2SLS in the selected sample, lambda added as a regressor */
    ivreg lwage exper expersq lambda (educ = motheduc fatheduc huseduc kidslt6 kidsge6 nwifeinc) if inlf==1;

6.4.3 Binary Response Models with Sample Selection

Assume that the latent errors are bivariate normal and independent of the regressors:
$y_1 = 1[x_1\beta_1 + u_1 > 0]$
$y_2 = 1[x\delta_2 + v_2 > 0]$
$x$ is always observed; $y_1$ is observed only when $y_2 = 1$.
Example: $y_1$ is an employment indicator, $x$ contains a job training indicator.
We can lose track of some people who are eligible to participate in the program; this is an example of sample attrition.
If attrition is systematically related to $u_1$, then estimation using the selected sample leads to an inconsistent estimator of $\beta_1$.
If we assume that $(u_1, v_2)$ is independent of $x$ with a zero-mean normal distribution and unit variances, we can apply partial MLE using the density of $y_1$ conditional on $x$ and $y_2 = 1$.

A 2-step procedure can also be applied:
(1) estimate $\delta_2$ by probit of $y_2$ on $x$;
(2) estimate $\beta_1$ and $\rho_1$ (the correlation between $u_1$ and $v_2$) in the selected sample, using $P(y_1 = 0 \mid x, y_2 = 1)$ and its complement with $\hat\delta_2$ plugged in.

6.5 A Tobit Selection Equation

6.5.1 Exogenous Explanatory Variables

The selection equation is a censored Tobit equation.
The population model is:
$y_1 = x_1\beta_1 + u_1$
$y_2 = \max(0, x\delta_2 + v_2)$
where $(x, y_2)$ is always observed, but $y_1$ is observed only when $y_2 > 0$.
Example: $y_1 = \log(\text{wage})$ and $y_2 = \text{hours}$

Assumption 3 (Type III Tobit model):
(a) $(x, y_2)$ is always observed, but $y_1$ is observed only when $y_2 > 0$
(b) $(u_1, v_2)$ is independent of $x$
(c) $v_2 \sim N(0, \tau_2^2)$ (here we do not have to normalize the variance)
(d) $E(u_1 \mid v_2) = \gamma_1 v_2$

Estimate $E(y_1 \mid x, v_2, s_2) = x_1\beta_1 + \gamma_1 v_2$, where $s_2 = 1[y_2 > 0]$.
If we knew $v_2$ we could estimate this equation.
For observations with $s_2 = 1$ we can estimate $v_2 = y_2 - x\delta_2$.
This was not possible in the probit selection model.

Procedure 3:
(a) Obtain $\hat\delta_2$, the standard Tobit estimate from the selection equation, using all $N$ observations; then compute $\hat v_{i2} = y_{i2} - x_i\hat\delta_2$ for those observations with $y_{i2} > 0$.
(b) Obtain the estimators $\hat\beta_1$ and $\hat\gamma_1$ from the OLS regression of $y_{i1}$ on $x_{i1}$ and $\hat v_{i2}$ using the selected sample $y_{i2} > 0$.
The estimators are consistent and $\sqrt{N}$-asymptotically normal.
No instrument is needed, because variation in $y_2$ produces variation in $v_2$.

For partial likelihood estimation, assume that $(u_1, v_2)$ is jointly normal with $\operatorname{Var}(u_1) = \sigma_1^2$, $\operatorname{Cov}(u_1, v_2) = \sigma_{12}$ and $\operatorname{Var}(v_2) = \tau_2^2$.
The density $f(y_2 \mid x)$ is used for the entire sample and the conditional density $f(y_1 \mid x, y_2)$ for the selected sample.
The log-likelihood for observation $i$ is:
$$l_i(\theta) = s_{i2} \log f(y_{i1} \mid x_i, y_{i2}; \theta) + \log f(y_{i2} \mid x_i; \delta_2, \tau_2^2), \quad s_{i2} = 1[y_{i2} > 0]$$
where $f(y_{i1} \mid x_i, y_{i2}; \theta)$ is the $\text{Normal}\left[x_{i1}\beta_1 + \gamma_1(y_{i2} - x_i\delta_2),\ \eta_1^2\right]$ density with $\eta_1^2 \equiv \sigma_1^2 - \sigma_{12}^2/\tau_2^2$, and $f(y_{i2} \mid x_i; \delta_2, \tau_2^2)$ is the standard censored Tobit density (see chapter 5):
$$f(y \mid x_i) = \left\{1 - \Phi\!\left(\frac{x_i\beta}{\sigma}\right)\right\}^{1[y = 0]} \left\{\frac{1}{\sigma}\,\phi\!\left[\frac{y - x_i\beta}{\sigma}\right]\right\}^{1[y > 0]}$$

6.5.2 Endogenous Explanatory Variables (in Tobit model)

The model in the population is:
(4)  $y_1 = z_1\delta_1 + \alpha_1 y_2 + u_1$
(5)  $y_2 = z\delta_2 + v_2$
(6)  $y_3 = \max(0, z\delta_3 + v_3)$

Assumption 4:
(a) $(z, y_3)$ is always observed, $(y_1, y_2)$ is observed when $y_3 > 0$
(b) $(u_1, v_3)$ is independent of $z$
(c) $v_3 \sim N(0, \tau_3^2)$
(d) $E(u_1 \mid v_3) = \gamma_1 v_3$
(e) $E(z'v_2) = 0$ and, writing $z\delta_2 = z_1\delta_{21} + z_2\delta_{22}$, $\delta_{22} \ne 0$
We need only one instrument (for $y_2$; the Tobit selection equation itself needs no exclusion restriction).

Write
(17.3)  $y_1 = z_1\delta_1 + \alpha_1 y_2 + \gamma_1 v_3 + e_1$
where $e_1 = u_1 - E[u_1 \mid v_3]$.

Procedure 4:
(a) Obtain $\hat\delta_3$ from the Tobit of $y_3$ on $z$ using all observations (eq. (6)). Obtain the Tobit residuals $\hat v_{i3} = y_{i3} - z_i\hat\delta_3$ for $y_{i3} > 0$.
(b) Using the selected subsample (for which $y_1$ and $y_2$ are observed), estimate the equation
$y_{i1} = z_{i1}\delta_1 + \alpha_1 y_{i2} + \gamma_1 \hat v_{i3} + \text{error}_i$
by 2SLS using instruments $(z_i, \hat v_{i3})$.

6.6 Estimating Structural Tobit Equations with Sample Selection

Structural labor supply model involving simultaneity and sample selection:
$y_1 \equiv \log(w^o) = z_1\beta_1 + u_1$
$y_2 \equiv h = \max(0, z_2\beta_2 + \alpha_2 y_1 + u_2)$
Reduced form: substitute equation 1 into equation 2.
What is different from the previous analysis? Now we are interested in $\alpha_2$.

Assumption 5:
(a) $(z, y_2)$ is always observed, $y_1$ is observed when $y_2 > 0$
(b) $(u_1, u_2)$ is independent of $z$ with a zero-mean bivariate normal distribution
(c) $z_1$ contains at least one element with a non-zero coefficient that is not in $z_2$
⇒ i.e. we need an instrument for $y_1$: a variable that enters the first equation but not the second
Assumption (c) is needed to identify $\alpha_2$, $\beta_2$, whereas $\beta_1$ is always identified.

Estimating $(\alpha_2, \beta_2)$ requires new methods, whether or not $u_1$ and $u_2$ are uncorrelated, because $y_1$ is not observed when $y_2 = 0$.
Estimates of $(\alpha_2, \beta_2)$ are easy to obtain once $\beta_1$ has been estimated.

Procedure 5:
(a) Use Procedure 3 to obtain $\hat\beta_1$.
(b) Obtain $\hat\beta_2$ and $\hat\alpha_2$ from the Tobit in
$y_{i2} = \max\left(0,\ z_{i2}\beta_2 + \alpha_2 (z_{i1}\hat\beta_1) + \text{error}_i\right)$

6.7 Sample Selection and Attrition in Linear Panel Data Models

Unbalanced panel: time periods are missing for some persons because of a rotating panel, attrition, or an incidental truncation problem.

6.7.1 Fixed Effects Estimation with Unbalanced Panels

Model:
$y_{it} = x_{it}\beta + c_i + u_{it}, \quad t = 1, \dots, T$
where $x_{it}$ is a $1 \times K$ and $\beta$ a $K \times 1$ vector.
For a random draw $i$ from the population, let $s_i \equiv (s_{i1}, s_{i2}, \dots, s_{iT})'$ be the $T \times 1$ vector of selection indicators: $s_{it} = 1$ if $(x_{it}, y_{it})$ is observed.
Random sample from the population: $\{(x_i, y_i, s_i) : i = 1, 2, \dots, N\}$
Fixed effects estimator:
$$\hat\beta = \beta + \left(N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} s_{it}\, \ddot{x}_{it}' \ddot{x}_{it}\right)^{-1} \left(N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} s_{it}\, \ddot{x}_{it}' u_{it}\right),$$
where $\ddot{x}_{it} \equiv x_{it} - T_i^{-1} \sum_{r=1}^{T} s_{ir} x_{ir}$ is time-demeaned over the observed periods and $T_i \equiv \sum_{t=1}^{T} s_{it}$.

Assumption 6:
(a) $E(u_{it} \mid x_i, s_i, c_i) = 0, \quad t = 1, 2, \dots, T$
(b) $\sum_{t=1}^{T} E(s_{it}\, \ddot{x}_{it}' \ddot{x}_{it})$ is nonsingular
(c) $E(u_i u_i' \mid s_i, x_i, c_i) = \sigma_u^2 I_T$
Under Assumption 6, FE on the unbalanced panel is consistent and asymptotically normal ($T$ fixed and large $N$), with
$$\widehat{\operatorname{Avar}}(\hat\beta) = \hat\sigma_u^2 \left(\sum_{i=1}^{N} \sum_{t=1}^{T} s_{it}\, \ddot{x}_{it}' \ddot{x}_{it}\right)^{-1}, \qquad \hat\sigma_u^2 = \left[\sum_{i=1}^{N} (T_i - 1)\right]^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} s_{it}\, \hat u_{it}^2 \xrightarrow{p} \sigma_u^2 \text{ as } N \to \infty$$

6.7.2 Testing and Correcting for Sample Selection Bias

Model:
$y_{it1} = x_{it1}\beta_1 + c_{i1} + u_{it1}, \quad t = 1, \dots, T$
Selection equation:
$s_{it2} = 1[x_i\psi_{t2} + v_{it2} > 0], \quad v_{it2} \mid x_i \sim N(0, 1)$
→ $x_i$ contains 1 (note: no fixed effect in the selection equation!)
Under the null of Assumption 6 (a), the inverse Mills ratio $\hat\lambda_{it2}$ should not be significant in the equation estimated by fixed effects. A valid test of the null is then a t statistic on $\hat\lambda_{it2}$ in the FE estimation on the unbalanced panel. Under Assumption 6 (c), the usual t statistic is valid.
Correcting for sample selection: adding $\hat\lambda_{it2}$ to the equation and using FE does not produce consistent estimators (if there is a fixed effect in the selection equation).
Chamberlain's approach to panel data models works, but we need some linearity assumptions.

Assumption 7:
(a) the selection equation is as given above;
(b) $E(u_{it1} \mid x_i, v_{it2}) = E(u_{it1} \mid v_{it2}) = \rho_{t1} v_{it2}, \quad t = 1, \dots, T$; and
(c) $E(c_{i1} \mid x_i, v_{it2}) = x_i\pi_1 + \phi_{t1} v_{it2}$
Under Assumption 7,
$$E(y_{it1} \mid x_i, s_{it2} = 1) = x_{it1}\beta_1 + x_i\pi_1 + \gamma_{t1} \lambda(x_i\psi_{t2}), \quad \gamma_{t1} \equiv \rho_{t1} + \phi_{t1}$$
We can consistently estimate $\beta_1$ as follows:
1) Estimate a probit of $s_{it2}$ on $x_i$ for each $t$ and compute the inverse Mills ratio $\hat\lambda_{it2}$ for all $i$ and $t$.
2) Run the pooled OLS regression, using the selected sample, of
$y_{it1}$ on $x_{it1},\ x_i,\ \hat\lambda_{it2},\ d2_t\hat\lambda_{it2}, \dots, dT_t\hat\lambda_{it2}$ for all $s_{it2} = 1$,
where $d2_t, \dots, dT_t$ are time dummies.

6.7.3 Attrition

Test and correct for attrition in a linear panel data model where attrition is assumed to be an absorbing state.
Assume $(x_{it}, y_{it})$ is observed for all $i$ when $t = 1$.
$s_{it} = 1$ if $(x_{it}, y_{it})$ is observed.
To remove the unobserved effect, use first differencing:
$\Delta y_{it} = \Delta x_{it}\beta + \Delta u_{it}, \quad t = 2, \dots, T$
Selection equation for $t \ge 2$:
$s_{it} = 1[w_{it}\delta_t + v_{it} > 0], \quad v_{it} \mid \{w_{it}, s_{i,t-1} = 1\} \sim N(0, 1)$
Under the assumptions that the $x_{it}$ are strictly exogenous and that selection does not depend on $\Delta x_{it}$ once $w_{it}$ is controlled for,
$$E(\Delta u_{it} \mid \Delta x_{it}, w_{it}, v_{it}, s_{i,t-1} = 1) = E(\Delta u_{it} \mid v_{it}) = \rho_t v_{it}$$
Then
$$E(\Delta y_{it} \mid \Delta x_{it}, w_{it}, s_{it} = 1) = \Delta x_{it}\beta + \rho_t \lambda(w_{it}\delta_t), \quad t = 2, \dots, T$$
The pooled OLS regression, using the selected sample, of
$\Delta y_{it}$ on $\Delta x_{it},\ d2_t\hat\lambda_{it}, \dots, dT_t\hat\lambda_{it}$
is consistent for $\beta$ and the $\rho_t$.

Relaxing exogeneity of the $x$'s: let $z_{it}$ be a vector of variables that is redundant in the selection equation and exogenous. In this case, we can estimate
$$\Delta y_{it} = \Delta x_{it}\beta + \rho_2\, d2_t \hat\lambda_{it} + \dots + \rho_T\, dT_t \hat\lambda_{it} + \text{error}_{it}$$
by IV in the selected sample, using instruments $(z_{it},\ d2_t\hat\lambda_{it}, \dots, dT_t\hat\lambda_{it})$.

Estimating a linear panel data model under possibly nonrandom attrition: $(x_{it}, y_{it})$ is observed only if $s_{it} = 1$.
Under the assumption called selection on observables,
$$P(s_{it} = 1 \mid y_{it}, x_{it}, z_{i1}) = P(s_{it} = 1 \mid z_{i1})$$
Estimation method using Inverse Probability Weighting (IPW), in 2 steps:
1. For each $t$, estimate a probit or logit of $s_{it}$ on $z_{i1}$ → get the fitted values $\hat p_{it}$
2. Weight the objective function by $1/\hat p_{it}$

The argument for IPW is that the probability limit of the weighted objective function is identical to that of the unweighted function if we had no attrition problem. Under this argument, Wooldridge (2000) shows that IPW produces a consistent, $\sqrt{N}$-asymptotically normal estimator.
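The two IPW steps can be illustrated with the simplest possible objective function, the squared deviation from a mean, in a single period. The sketch below is an assumed toy design and not from the slides: the seed, the sample size, the selection probability $\Phi(0.5 + z_1)$ and the names z1, s and phat are all illustrative. Selection depends only on the always-observed z1, so selection on observables holds by construction.

```stata
* One-period IPW illustration (assumed design)
clear
set seed 99
set obs 50000
generate z1 = rnormal()                        /* always observed             */
generate y  = 1 + z1 + rnormal()               /* population mean of y is 1   */
generate s  = (runiform() < normal(0.5 + z1))  /* P(s=1 | y, z1) = P(s=1 | z1) */

probit s z1                                    /* step 1: selection probit    */
predict phat, pr                               /* fitted probabilities        */

mean y if s == 1                               /* unweighted: uses E(y | s=1) */
mean y if s == 1 [pweight = 1/phat]            /* step 2: weights 1/phat      */
```

In this design units with high z1, and therefore high y, are more likely to stay in the sample, so the unweighted mean should lie above 1 while the weighted mean should be close to 1.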
For the case where attrition is an absorbing state, the following probabilities can be used in the IPW procedure:
$$\hat p_{it} \equiv \hat\pi_{i2}\hat\pi_{i3} \cdots \hat\pi_{it}, \quad \text{where } \pi_{it} \equiv P(s_{it} = 1 \mid z_{it}, s_{i,t-1} = 1)$$
under the key assumption that
$$P(s_{it} = 1 \mid v_{i1}, \dots, v_{iT}, s_{i,t-1} = 1) = P(s_{it} = 1 \mid z_{it}, s_{i,t-1} = 1), \quad \text{where } v_{it} = (w_{it}, z_{it})$$

6.8 Stratified Sampling

6.8.1 Standard Stratified Sampling and Variable Probability Sampling

The 2 most common kinds of stratification used in the social sciences:
- standard stratified sampling (SS sampling) and
- variable probability sampling (VP sampling).

SS Sampling
The population is partitioned into $J$ groups $W_1, W_2, \dots, W_J$, assumed to be nonoverlapping and exhaustive. Let $w$ be a random vector representing the population of interest.
For $j = 1, \dots, J$, draw a random sample of size $N_j$ from stratum $j$. For each $j$, denote this random sample by $\{w_{ij} : i = 1, 2, \dots, N_j\}$.
The strata sample sizes are nonrandom, thus the total sample size $N$ is also nonrandom.
Observations within a stratum are iid; across strata they are not.

VP Sampling: repeat the following steps $N$ times
1. Draw an observation $w_i$ at random from the population.
2. If $w_i$ is in stratum $j$, toss a coin with probability $p_j$ of turning up heads. Let $h_{ij} = 1$ if the coin turns up heads and 0 otherwise.
3. Keep observation $i$ if $h_{ij} = 1$; otherwise omit it from the sample.

6.8.2 Weighted Estimators to Account for Stratification

With VP sampling: define a set of binary variables $r_{ij}$ that indicate whether a draw $w_i$ is kept in the sample and, if so, which stratum it falls into: $r_{ij} = h_{ij} s_{ij}$ with $s_{ij} = 1[w_i \in W_j]$.
The weighted M-estimator:
$$\hat\theta_w = \arg\min_{\theta \in \Theta} \sum_{i=1}^{N} \sum_{j=1}^{J} p_j^{-1} r_{ij}\, q(w_i, \theta)$$
Wooldridge (1999) shows that under the same assumptions as Theorem 2 in chapter 1, the weighted M-estimator is consistent. Asymptotic normality follows under the same regularity conditions as in chapter 1.
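For a linear regression, $q(w_i, \theta) = (y_i - x_i\theta)^2$, the weighted M-estimator under VP sampling is weighted least squares with weights $1/p_j$. The following sketch mimics the three VP sampling steps in an assumed design that is not from the slides: two strata defined by $y \le 1$ and $y > 1$, retention probabilities 0.25 and 1, and an arbitrary seed and sample size.

```stata
* VP sampling with stratification on y (assumed design)
clear
set seed 7
set obs 40000
generate x = rnormal()
generate y = 1 + x + rnormal()            /* population slope = 1             */
generate p = cond(y > 1, 1, 0.25)         /* p_j: stratum y <= 1 kept w.p. .25 */
generate h = (runiform() < p)             /* coin toss: keep draw if h = 1     */

regress y x if h == 1                     /* unweighted estimator              */
regress y x if h == 1 [pweight = 1/p]     /* weighted by 1/p_j                 */
```

Because the strata here depend on $y$, the unweighted regression should be biased for the population slope, while the weighted regression should be close to 1. With strata defined by $x$ alone the weights would not be needed for consistency.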
With SS sampling: the weights are defined by $Q_j = P(w \in W_j)$ (the population frequency for stratum $j$). Using a random sample obtained from each stratum, we can obtain a consistent estimator as in VP sampling with the weights $Q_{j_i}/H_{j_i}$, with $H_{j_i} \equiv N_{j_i}/N$, instead of $p_j^{-1}$, where $j_i$ denotes the stratum of observation $i$.

6.8.3 Stratification Based on Exogenous Variables (does not matter!)

$w$ is partitioned as $(x, y)$, where $x$ is exogenous in the sense that
$$\theta_0 = \arg\min_{\theta \in \Theta} E\left(q(w_i, \theta) \mid x\right)$$
In VP sampling, the unweighted M-estimator on the stratified sample is:
$$\hat\theta_u = \arg\min_{\theta \in \Theta} \sum_{i=1}^{N} \sum_{j=1}^{J} h_{ij} s_{ij}\, q(w_i, \theta)$$
Wooldridge (1999) shows that, when stratification is based on $x$, the unweighted estimator is more efficient than the weighted estimator under the key assumption
$$E\left(\nabla_\theta q(w_i, \theta_0)'\, \nabla_\theta q(w_i, \theta_0) \mid x\right) = \sigma_0^2\, E\left(\nabla_\theta^2 q(w_i, \theta_0) \mid x\right)$$
(a type of information matrix equality).
With SS sampling: similar conclusions are obtained.
One useful fact is that when stratification is based on $x$ we need not compute within-strata variation in the estimated score to obtain consistent estimators for parameters that do not vary in the population.