ECO 2901 EMPIRICAL INDUSTRIAL ORGANIZATION Lecture 3: Intro to Dynamic Models In Empirical IO (II)

Likelihood

Under CI-1 to CI-3, we have that:

$$\log \Pr(\text{Data}) = \sum_{i=1}^{n} \sum_{t=1}^{T} \sum_{j=0}^{J} 1\{a_{it} = j\} \log P(j \mid x_{it}) + \sum_{i=1}^{n} \sum_{t=1}^{T-1} \log f_x(x_{i,t+1} \mid x_{it}, a_{it}) + \sum_{i=1}^{n} \log p(x_{i1})$$

The probabilities $P(j \mid x_{it})$ for $j = 0, 1, ..., J$ are called Conditional Choice Probabilities (CCPs):

$$P(j \mid x) \equiv \Pr(a_{it} = j \mid x_{it} = x) = \int 1\{a(x, \varepsilon) = j\} \, f_\varepsilon(d\varepsilon)$$

Components of the Likelihood function

Each of the three components of the full likelihood function can be considered as a likelihood function for a different component of the data:

$$l(\theta) = l_{Choice}(\theta) + l_{Trans}(\theta_f) + l_{Initial}(\theta) = \sum_{i=1}^{n} \sum_{t=1}^{T} \ln P(a_{it} \mid x_{it}; \theta) + \sum_{i=1}^{n} \sum_{t=1}^{T-1} \ln f_x(x_{i,t+1} \mid x_{it}, a_{it}; \theta_f) + \sum_{i=1}^{n} \ln p(x_{i1} \mid \theta)$$

In stationary models, obtaining $p(x_{i1} \mid \theta)$ requires computing the ergodic distribution of the state variables (including the endogenous ones). Most applications do not do this, and instead consider the conditional likelihood function:

$$l(\theta \mid x_1) = l_{Choice}(\theta) + l_{Trans}(\theta_f)$$

Conditional ML Estimation

Typically, the parameters $\theta_f$ can be identified and estimated from the transition data alone, i.e., from the likelihood $l_{Trans}(\theta_f)$. A common estimation approach is then the following.

In a first step, estimate $\theta_f$ as:

$$\hat{\theta}_f = \arg\max_{\theta_f} \; l_{Trans}(\theta_f)$$

Given $\hat{\theta}_f$, in a second step, estimate $(\theta_U, \theta_\varepsilon, \beta)$ as:

$$(\hat{\theta}_U, \hat{\theta}_\varepsilon, \hat{\beta}) = \arg\max_{\theta_U, \theta_\varepsilon, \beta} \; l_{Choice}(\theta_U, \theta_\varepsilon, \beta; \hat{\theta}_f)$$

Unless stated otherwise, this is the approach that we will always consider.
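A minimal sketch of the first step, under the assumption (not made in the slides) that the transition probabilities are left unrestricted, with one free probability per cell $(x, a, x')$. In that case the maximizer of $l_{Trans}$ is the cell frequency. The function names and the toy panel are hypothetical.

```python
from collections import Counter, defaultdict
import math

def estimate_transitions(panel):
    """Frequency (unrestricted ML) estimate of f_x(x' | x, a).

    panel: list of individual histories, each a list of (x_t, a_t) pairs.
    Returns {(x, a): {x_next: probability}}.
    """
    counts = defaultdict(Counter)
    for history in panel:
        # T - 1 transitions per individual: (x_t, a_t) -> x_{t+1}
        for (x, a), (x_next, _) in zip(history, history[1:]):
            counts[(x, a)][x_next] += 1
    return {
        cell: {x_next: n / sum(c.values()) for x_next, n in c.items()}
        for cell, c in counts.items()
    }

def l_trans(panel, f_x):
    """Transition log-likelihood: sum over i and t = 1..T-1 of ln f_x(x_{t+1} | x_t, a_t)."""
    return sum(
        math.log(f_x[(x, a)][x_next])
        for history in panel
        for (x, a), (x_next, _) in zip(history, history[1:])
    )

# Hypothetical toy panel: two individuals, three periods, x in {0, 1}, a in {0, 1}.
panel = [
    [(0, 1), (1, 0), (0, 0)],
    [(0, 1), (0, 0), (1, 0)],
]
f_hat = estimate_transitions(panel)
```

With a parametric $f_x(\cdot \mid x, a; \theta_f)$, the same `l_trans` function would instead be maximized over $\theta_f$ numerically.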

Estimation and Solution of the DP Problem

We have that:

$$l_{Choice}(\theta) = \sum_{i=1}^{n} \sum_{t=1}^{T} \sum_{j=0}^{J} 1\{a_{it} = j\} \log P(j \mid x_{it}; \theta)$$

where:

$$P(j \mid x, \theta) \equiv \Pr(a_{it} = j \mid x_{it} = x, \theta) = \int 1\{a(x, \varepsilon, \theta) = j\} \, f_\varepsilon(d\varepsilon; \theta_\varepsilon)$$

In contrast to static (and single-agent) decision models, where the optimal decision rule has a known closed-form expression, in DP decision models the functional form of $a(x, \varepsilon, \theta)$ is unknown unless we solve the DP problem.

The DP problem cannot be solved generically for the infinitely many possible values of $\theta$. Solution methods provide $a(x, \varepsilon, \theta)$ for a single value of $\theta$.

Estimation and Solution of the DP Problem (2)

Solving the DP problem for each trial value of $\theta$ in our search for the MLE is computationally costly.

The literature on estimation of Dynamic Discrete Choice structural models has been partly motivated by the goal of reducing this computational cost. This typically requires additional assumptions or structure.

Additive separability of unobservables (AS)

The payoff function is:

$$U(a_t, s_t; \theta_U) = u(a_t, x_t; \theta_U) + \varepsilon_t(a_t)$$

where $\varepsilon_t = \{\varepsilon_t(0), \varepsilon_t(1), ..., \varepsilon_t(J)\}$ is a vector of continuous random variables, with support the real line and a continuously differentiable density. $\varepsilon_t$ has zero mean and is independent of $x_t$.

Discrete observables (DO)

The vector of observable state variables has a discrete and finite support:

$$x_{it} \in X = \{x^{(1)}, x^{(2)}, ..., x^{(M)}\}$$

The CI + AS + DO assumptions imply a substantial reduction in the dimension of the state space of the DP problem, and therefore in computation time.

Integrated Value function and Bellman equation

Under the AS + CI assumptions:

$$V(x_t, \varepsilon_t) = \max_{a \in A} \left[ u(a, x_t) + \varepsilon_t(a) + \beta \sum_{x_{t+1}} \int V(x_{t+1}, \varepsilon_{t+1}) \, f_\varepsilon(d\varepsilon_{t+1}) \, f_x(x_{t+1} \mid x_t, a) \right]$$

Define the integrated value function:

$$V^\sigma(x_t) \equiv \int V(x_t, \varepsilon_t) \, f_\varepsilon(d\varepsilon_t)$$

Given this definition, we have the integrated Bellman equation:

$$V^\sigma(x_t) = \int \max_{a \in A} \left\{ u(a, x_t) + \varepsilon_t(a) + \beta \sum_{x_{t+1}} V^\sigma(x_{t+1}) \, f_x(x_{t+1} \mid x_t, a) \right\} f_\varepsilon(d\varepsilon_t)$$

Victor Aguirregabiria, Empirical IO, Toronto, Winter 2015

Integrated Value function and Bellman equation (2)

The integrated value function and Bellman equation have some interesting properties.

[1] The integrated Bellman equation is a contraction mapping, and therefore $V^\sigma$ is unique and can be computed by successive iterations on the integrated Bellman equation.

[2] Given $V^\sigma(x_t)$, we can obtain the optimal decision rule (we do not need $V(x_t, \varepsilon_t)$):

$$a(x_t, \varepsilon_t) = \arg\max_{a \in A} \left[ u(a, x_t) + \varepsilon_t(a) + \beta \sum_{x_{t+1}} V^\sigma(x_{t+1}) \, f_x(x_{t+1} \mid x_t, a) \right]$$

[3] The function $V^\sigma$ can be described as a vector in the Euclidean space of dimension $M$ (instead of an element of an infinite-dimensional space of real-valued functions).

Integrated Value function and Bellman equation (3)

[4] For several well-known discrete choice models, such as Binary Probit and Logit, Multinomial Logit, or Nested Logit, the Social Surplus function (the right-hand side of the integrated Bellman equation) has a closed-form expression in terms of the choice-specific values $v(a, x_t) \equiv u(a, x_t) + \beta \sum_{x_{t+1}} V^\sigma(x_{t+1}) \, f_x(x_{t+1} \mid x_t, a)$. For instance, for the MNL:

$$V^\sigma(x_t) = \ln\left( \exp\{v(0, x_t)\} + ... + \exp\{v(J, x_t)\} \right)$$

In vector form, with $\ln$ and $\exp$ applied element by element:

$$\mathbf{V} = \ln\left( \exp\{\mathbf{u}(0) + \beta \mathbf{F}_x(0) \mathbf{V}\} + ... + \exp\{\mathbf{u}(J) + \beta \mathbf{F}_x(J) \mathbf{V}\} \right)$$

where $\mathbf{u}(0), ..., \mathbf{u}(J)$ are $M \times 1$ vectors of profits, and $\mathbf{F}_x(0), ..., \mathbf{F}_x(J)$ are $M \times M$ transition probability matrices of $x$.
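Property [1] can be checked numerically for the logit case. The sketch below applies the vector-form operator to two arbitrary vectors and compares the sup-norm distance before and after; the primitives `u`, `F`, and `beta` are hypothetical values chosen only for illustration.

```python
import math

def bellman_logit(V, u, F, beta):
    """Logit integrated Bellman operator.

    Gamma(V)[x] = ln sum_j exp{ u[j][x] + beta * sum_y F[j][x][y] * V[y] }.
    """
    M = len(V)
    out = []
    for x in range(M):
        v = [u[j][x] + beta * sum(F[j][x][y] * V[y] for y in range(M))
             for j in range(len(u))]
        m = max(v)  # subtract the max before exponentiating (numerical stability)
        out.append(m + math.log(sum(math.exp(vj - m) for vj in v)))
    return out

def sup_dist(V, W):
    """Sup-norm distance between two vectors."""
    return max(abs(a - b) for a, b in zip(V, W))

# Hypothetical primitives: M = 2 states, J + 1 = 2 choices.
beta = 0.95
u = [[0.0, 0.0], [1.0, -0.5]]          # u[j][x]
F = [[[0.9, 0.1], [0.4, 0.6]],         # F[0][x][y] = f_x(y | x, a = 0)
     [[0.2, 0.8], [0.1, 0.9]]]         # F[1][x][y] = f_x(y | x, a = 1)

V, W = [0.0, 0.0], [5.0, -3.0]
ratio = (sup_dist(bellman_logit(V, u, F, beta), bellman_logit(W, u, F, beta))
         / sup_dist(V, W))
# The contraction property implies ratio <= beta for any pair (V, W).
```

The bound holds because the log-sum-exp function is non-expansive in the sup norm and each row of a transition matrix sums to one, so the only shrinkage factor is $\beta$.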

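Value function iteration algorithm

Given this equation, the vector $\mathbf{V}$ can be obtained by successive approximations (iterations) in the Bellman equation.

Let $\mathbf{V}_0$ be an arbitrary initial value for the vector $\mathbf{V}$. For instance, $\mathbf{V}_0$ could be an $M \times 1$ vector of zeroes. Then, at iteration $k \geq 1$ we obtain:

$$\mathbf{V}_k = \Gamma(\mathbf{V}_{k-1})$$

where $\Gamma(.)$ is the function on the RHS of the Bellman equation, i.e.,

$$\mathbf{V}_k = \ln\left( \exp\{\mathbf{u}(0) + \beta \mathbf{F}_x(0) \mathbf{V}_{k-1}\} + ... + \exp\{\mathbf{u}(J) + \beta \mathbf{F}_x(J) \mathbf{V}_{k-1}\} \right)$$

Since the Bellman equation is a contraction mapping, this algorithm always converges (regardless of the initial $\mathbf{V}_0$), and it converges to the unique fixed point.

McFadden's Conditional Logit

Consider the static Random Utility Model (RUM):

$$a_i = \arg\max_{j \in \{0, 1, ..., J\}} \left[ u(j, x_i) + \varepsilon_i(j) \right]$$

where we have data on $\{a_i, x_i\}$.

McFadden's Conditional Logit is a particular type of RUM with:

$$a_i = \arg\max_{j \in \{0, 1, ..., J\}} \left[ z(j, x_i) \theta + \varepsilon_i(j) \right]$$

where $z(j, x_i)$ are functions known to the researcher, and $\{\varepsilon_i(j) : j = 0, 1, ..., J\}$ are extreme value distributed. Then,

$$P(j \mid x_i, \theta) = \frac{\exp\{z(j, x_i) \theta\}}{\sum_{k=0}^{J} \exp\{z(k, x_i) \theta\}}$$

McFadden's Conditional Logit (2)

The log-likelihood function of this CLogit is:

$$l(\theta) = \sum_{i=1}^{n} \left[ \sum_{j=0}^{J} 1\{a_i = j\} \, z(j, x_i) \theta - \ln\left( \sum_{k=0}^{J} \exp\{z(k, x_i) \theta\} \right) \right]$$

And the likelihood equations:

$$\frac{\partial l(\theta)}{\partial \theta} = \sum_{i=1}^{n} \frac{\partial l_i(\theta)}{\partial \theta}$$

with:

$$\frac{\partial l_i(\theta)}{\partial \theta} = \sum_{j=0}^{J} z(j, x_i)' \left[ 1\{a_i = j\} - P(j \mid x_i, \theta) \right]$$

A nice feature of the CLogit is that this likelihood function is globally concave, and therefore standard gradient search methods easily provide the MLE. For instance, BHHH iterations:

$$\hat{\theta}_{k+1} = \hat{\theta}_k + \left( \sum_{i=1}^{n} \frac{\partial l_i(\hat{\theta}_k)}{\partial \theta} \frac{\partial l_i(\hat{\theta}_k)}{\partial \theta'} \right)^{-1} \left( \sum_{i=1}^{n} \frac{\partial l_i(\hat{\theta}_k)}{\partial \theta} \right)$$

DP - McFadden's Conditional Logit

Consider the DP CLogit:

$$a_{it} = \arg\max_{j \in \{0, 1, ..., J\}} \left[ v(j, x_{it}, \theta) + \varepsilon_{it}(j) \right]$$

where

$$v(j, x_{it}, \theta) \equiv z(j, x_{it}) \theta + \beta \, f_x(j, x_{it}) \mathbf{V}(\theta)$$

Here $f_x(j, x_{it})$ is the $1 \times M$ vector of transition probabilities $f_x(\cdot \mid x_{it}, j)$, i.e., the row of $\mathbf{F}_x(j)$ that corresponds to state $x_{it}$. The vector $\mathbf{V}(\theta)$ solves:

$$\mathbf{V}(\theta) = \ln\left( \exp\{\mathbf{z}(0) \theta + \beta \mathbf{F}_x(0) \mathbf{V}(\theta)\} + ... + \exp\{\mathbf{z}(J) \theta + \beta \mathbf{F}_x(J) \mathbf{V}(\theta)\} \right)$$

The CCPs are:

$$P(j \mid x_{it}, \theta) = \frac{\exp\{z(j, x_{it}) \theta + \beta f_x(j, x_{it}) \mathbf{V}(\theta)\}}{\sum_{k=0}^{J} \exp\{z(k, x_{it}) \theta + \beta f_x(k, x_{it}) \mathbf{V}(\theta)\}}$$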
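The following is a minimal sketch of how $\mathbf{V}(\theta)$ and the DP-CLogit CCPs might be computed by value function iteration for a given $\theta$. The function names, the tolerance, and the two-state entry-exit example are assumptions made for illustration and do not come from the lecture.

```python
import math

def log_sum_exp(values):
    """Numerically stable ln(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def choice_values(theta, V, z, F, beta):
    """v[j][x] = z(j, x) theta + beta * f_x(j, x) V."""
    M = len(V)
    return [[sum(zk * tk for zk, tk in zip(z[j][x], theta))
             + beta * sum(F[j][x][y] * V[y] for y in range(M))
             for x in range(M)]
            for j in range(len(z))]

def solve_V(theta, z, F, beta, tol=1e-10, max_iter=10_000):
    """Iterate V_k = Gamma(V_{k-1}) from V_0 = 0 until the sup-norm change is below tol."""
    M = len(z[0])
    V = [0.0] * M
    for _ in range(max_iter):
        v = choice_values(theta, V, z, F, beta)
        V_new = [log_sum_exp([vj[x] for vj in v]) for x in range(M)]
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    raise RuntimeError("value function iteration did not converge")

def ccps(theta, z, F, beta):
    """P[j][x] = exp{v(j, x)} / sum_k exp{v(k, x)}, evaluated at V(theta)."""
    V = solve_V(theta, z, F, beta)
    v = choice_values(theta, V, z, F, beta)
    M = len(V)
    denom = [log_sum_exp([vk[x] for vk in v]) for x in range(M)]
    return [[math.exp(vj[x] - denom[x]) for x in range(M)] for vj in v]

# Hypothetical entry-exit example: x = incumbency status last period,
# a = 1 if active this period, and x_{t+1} = a_t with probability one.
# z(1, x) = [1, -(1 - x)]: theta[0] is the per-period profit, theta[1] the entry cost.
z = [[[0.0, 0.0], [0.0, 0.0]],      # z[0][x]: payoff covariates when inactive
     [[1.0, -1.0], [1.0, 0.0]]]     # z[1][x]: payoff covariates when active
F = [[[1.0, 0.0], [1.0, 0.0]],      # F[0][x][y]: inactive today -> x' = 0
     [[0.0, 1.0], [0.0, 1.0]]]      # F[1][x][y]: active today   -> x' = 1
theta, beta = [0.5, 1.0], 0.95
V = solve_V(theta, z, F, beta)
P = ccps(theta, z, F, beta)
```

In this example `P[1][1]` exceeds `P[1][0]`: an incumbent does not pay the entry cost, so it is more likely to be active than a potential entrant.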

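DP - McFadden's Conditional Logit (2)

The log-likelihood function of the DP-CLogit is:

$$l(\theta) = \sum_{i,t} \left[ \sum_{j=0}^{J} 1\{a_{it} = j\} \, v(j, x_{it}, \theta) - \ln\left( \sum_{k=0}^{J} \exp\{v(k, x_{it}, \theta)\} \right) \right]$$

And the likelihood equations:

$$\frac{\partial l(\theta)}{\partial \theta} = \sum_{i,t} \frac{\partial l_{it}(\theta)}{\partial \theta}$$

with:

$$\frac{\partial l_{it}(\theta)}{\partial \theta'} = \sum_{j=0}^{J} \left[ z(j, x_{it}) + \beta f_x(j, x_{it}) \frac{\partial \mathbf{V}(\theta)}{\partial \theta'} \right] \left[ 1\{a_{it} = j\} - P(j \mid x_{it}, \theta) \right]$$

and

$$\frac{\partial \mathbf{V}(\theta)}{\partial \theta'} = \left[ \mathbf{I} - \beta \sum_{j=0}^{J} \mathbf{P}(j, \theta) \ast \mathbf{F}_x(j) \right]^{-1} \left[ \sum_{j=0}^{J} \mathbf{P}(j, \theta) \ast \mathbf{z}(j) \right]$$

where $\mathbf{P}(j, \theta)$ is the $M \times 1$ vector of CCPs for choice $j$, and $\ast$ denotes the row-by-row product: row $x$ of $\mathbf{F}_x(j)$ or $\mathbf{z}(j)$ is multiplied by $P(j \mid x, \theta)$.

Nested Fixed Point Algorithm (NFXP)

The NFXP algorithm (Rust, 1987) is a gradient iterative search method to obtain the MLE of the structural parameters.

This algorithm nests:
(1) a BHHH method (outer algorithm), which searches for a root of the likelihood equations; with
(2) a value function iteration method (inner algorithm), which solves the DP problem for each trial value of the structural parameters.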
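The expression for $\partial \mathbf{V}(\theta) / \partial \theta'$ can be sanity-checked against a finite difference. The sketch below is only an illustration: instead of forming the matrix inverse, it solves the equivalent linear fixed point $D = \sum_j P(j) \cdot [z(j) + \beta F_x(j) D]$ by iteration, where row $x$ is weighted by $P(j \mid x)$ (this converges because $\beta < 1$ and the CCP-weighted transition matrix has rows summing to one). All names and the toy model are hypothetical.

```python
import math

def lse(vals):
    """Numerically stable ln(sum(exp(vals)))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def values(theta, V, z, F, beta):
    """v[j][x] = z(j, x) theta + beta * f_x(j, x) V."""
    M = len(V)
    return [[sum(a * b for a, b in zip(z[j][x], theta))
             + beta * sum(F[j][x][y] * V[y] for y in range(M))
             for x in range(M)] for j in range(len(z))]

def solve_V(theta, z, F, beta, tol=1e-12, max_iter=10_000):
    """Value function iteration from V_0 = 0."""
    V = [0.0] * len(z[0])
    for _ in range(max_iter):
        v = values(theta, V, z, F, beta)
        V_new = [lse([vj[x] for vj in v]) for x in range(len(V))]
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    raise RuntimeError("value function iteration did not converge")

def dV_dtheta(theta, z, F, beta, tol=1e-12, max_iter=10_000):
    """M x K matrix D solving D = sum_j P(j) * [z(j) + beta F_x(j) D], by iteration."""
    V = solve_V(theta, z, F, beta)
    v = values(theta, V, z, F, beta)
    J1, M, K = len(z), len(V), len(theta)
    P = [[math.exp(v[j][x] - lse([vk[x] for vk in v])) for x in range(M)]
         for j in range(J1)]
    D = [[0.0] * K for _ in range(M)]
    for _ in range(max_iter):
        D_new = [[sum(P[j][x] * (z[j][x][k]
                                 + beta * sum(F[j][x][y] * D[y][k] for y in range(M)))
                      for j in range(J1))
                  for k in range(K)]
                 for x in range(M)]
        gap = max(abs(D_new[x][k] - D[x][k]) for x in range(M) for k in range(K))
        D = D_new
        if gap < tol:
            return D
    raise RuntimeError("derivative iteration did not converge")

# Hypothetical entry-exit model: x' = a, theta = (profit, entry cost).
z = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, -1.0], [1.0, 0.0]]]
F = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
theta, beta = [0.5, 1.0], 0.9

D = dV_dtheta(theta, z, F, beta)
h = 1e-5
up = solve_V([theta[0] + h, theta[1]], z, F, beta)
down = solve_V([theta[0] - h, theta[1]], z, F, beta)
fd = [(a - b) / (2 * h) for a, b in zip(up, down)]  # central difference for dV/dtheta_1
```

The first column of `D` should agree with `fd` up to finite-difference error; a direct linear solve of the bracketed system would give the same matrix.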

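Nested Fixed Point Algorithm (2)

The algorithm is initialized with an arbitrary vector $\hat{\theta}_0$.

Outer Algorithm: a BHHH iteration is defined as:

$$\hat{\theta}_{k+1} = \hat{\theta}_k + \left( \sum_{i,t} \frac{\partial l_{it}(\hat{\theta}_k)}{\partial \theta} \frac{\partial l_{it}(\hat{\theta}_k)}{\partial \theta'} \right)^{-1} \left( \sum_{i,t} \frac{\partial l_{it}(\hat{\theta}_k)}{\partial \theta} \right)$$

Inner Algorithm: value function iterations to solve the DP problem given $\hat{\theta}_k$, to calculate $\mathbf{V}(\hat{\theta}_k)$ and the corresponding $\partial l_{it}(\hat{\theta}_k) / \partial \theta$.

NFXP with CLOGIT (1)

We start with an arbitrary initial guess $\hat{\theta}_0$.

Then, we obtain the vector $\mathbf{V}(\hat{\theta}_0)$ by using value function iterations:

$$\mathbf{V}_k = \ln\left( \exp\{\mathbf{z}(0) \hat{\theta}_0 + \beta \mathbf{F}_x(0) \mathbf{V}_{k-1}\} + ... + \exp\{\mathbf{z}(J) \hat{\theta}_0 + \beta \mathbf{F}_x(J) \mathbf{V}_{k-1}\} \right)$$

until convergence.

Given $\mathbf{V}(\hat{\theta}_0)$, we calculate the CCPs:

$$P(j \mid x, \hat{\theta}_0) = \frac{\exp\{z(j, x) \hat{\theta}_0 + \beta f_x(j, x) \mathbf{V}(\hat{\theta}_0)\}}{\sum_{k=0}^{J} \exp\{z(k, x) \hat{\theta}_0 + \beta f_x(k, x) \mathbf{V}(\hat{\theta}_0)\}}$$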
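The nesting of the two algorithms can be sketched as follows. This is an illustration under several assumptions that are not in the slides: $\theta$ is a scalar (so the BHHH matrix inverse reduces to a division), the data are summarized as hypothetical counts of $(x, a)$ cells, $\partial \mathbf{V} / \partial \theta$ is obtained by iterating its linear fixed point rather than by matrix inversion, and a step-halving safeguard is added to the BHHH update. All function names are made up for the example.

```python
import math

def lse(vals):
    """Numerically stable ln(sum(exp(vals)))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def inner(theta, z, F, beta, tol=1e-12, max_iter=10_000):
    """Inner algorithm: solve V(theta); return log CCPs, CCPs and dv(j, x)/dtheta."""
    J1, M = len(z), len(z[0])

    def ev(j, x, W):  # f_x(j, x) W
        return sum(F[j][x][y] * W[y] for y in range(M))

    def fixed_point(update):
        W = [0.0] * M
        for _ in range(max_iter):
            W_new = update(W)
            gap = max(abs(a - b) for a, b in zip(W_new, W))
            W = W_new
            if gap < tol:
                return W
        raise RuntimeError("inner iterations did not converge")

    def v_at(V):
        return [[z[j][x] * theta + beta * ev(j, x, V) for x in range(M)]
                for j in range(J1)]

    def bellman(V):
        v = v_at(V)
        return [lse([vj[x] for vj in v]) for x in range(M)]

    V = fixed_point(bellman)
    v = v_at(V)
    logP = [[v[j][x] - lse([vk[x] for vk in v]) for x in range(M)] for j in range(J1)]
    P = [[math.exp(lp) for lp in row] for row in logP]
    # dV/dtheta solves D = sum_j P(j) * [z(j) + beta F_x(j) D]
    D = fixed_point(lambda W: [sum(P[j][x] * (z[j][x] + beta * ev(j, x, W))
                                   for j in range(J1)) for x in range(M)])
    dv = [[z[j][x] + beta * ev(j, x, D) for x in range(M)] for j in range(J1)]
    return logP, P, dv

def likelihood(theta, counts, z, F, beta):
    """Log-likelihood, sum of scores g, and sum of squared scores h."""
    logP, P, dv = inner(theta, z, F, beta)
    ll = g = h = 0.0
    for (x, a), n in counts.items():
        # score of one observation in cell (x, a)
        s = dv[a][x] - sum(P[j][x] * dv[j][x] for j in range(len(z)))
        ll += n * logP[a][x]
        g += n * s
        h += n * s * s
    return ll, g, h

def nfxp(theta0, counts, z, F, beta, tol=1e-8, max_outer=200):
    """Outer algorithm: BHHH iterations, with step halving added as a safeguard."""
    theta = theta0
    ll, g, h = likelihood(theta, counts, z, F, beta)
    for _ in range(max_outer):
        step = g / h  # scalar BHHH step: (sum s^2)^{-1} (sum s)
        new = likelihood(theta + step, counts, z, F, beta)
        while new[0] < ll and abs(step) > tol:
            step /= 2
            new = likelihood(theta + step, counts, z, F, beta)
        theta += step
        ll, g, h = new
        if abs(step) < tol:
            break
    return theta, ll

# Hypothetical model and data: x' = a, z(0, x) = 0, z(1, x) = -1 if x = 0 and 1 if x = 1.
z = [[0.0, 0.0], [-1.0, 1.0]]
F = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
counts = {(0, 0): 60, (0, 1): 40, (1, 0): 30, (1, 1): 70}  # number of (x, a) observations
beta = 0.9

ll_0 = likelihood(0.0, counts, z, F, beta)[0]
theta_hat, ll_hat = nfxp(0.0, counts, z, F, beta)
```

Every evaluation of `likelihood` calls `inner`, so the DP problem is re-solved at each trial value of $\theta$, including the trial values visited during step halving.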

NFXP with CLOGIT (2)

And given the CCPs $P(j \mid x, \hat{\theta}_0)$, we can make a BHHH iteration to obtain a new value of $\theta$:

$$\hat{\theta}_1 = \hat{\theta}_0 + \left( \sum_{i,t} \frac{\partial l_{it}(\hat{\theta}_0)}{\partial \theta} \frac{\partial l_{it}(\hat{\theta}_0)}{\partial \theta'} \right)^{-1} \left( \sum_{i,t} \frac{\partial l_{it}(\hat{\theta}_0)}{\partial \theta} \right)$$

where

$$\frac{\partial l_{it}(\hat{\theta}_0)}{\partial \theta'} = \sum_{j=0}^{J} \left[ z(j, x_{it}) + \beta f_x(j, x_{it}) \frac{\partial \mathbf{V}(\hat{\theta}_0)}{\partial \theta'} \right] \left[ 1\{a_{it} = j\} - P(j \mid x_{it}, \hat{\theta}_0) \right]$$

and

$$\frac{\partial \mathbf{V}(\hat{\theta}_0)}{\partial \theta'} = \left[ \mathbf{I} - \beta \sum_{j=0}^{J} \mathbf{P}(j, \hat{\theta}_0) \ast \mathbf{F}_x(j) \right]^{-1} \left[ \sum_{j=0}^{J} \mathbf{P}(j, \hat{\theta}_0) \ast \mathbf{z}(j) \right]$$

with $\mathbf{P}(j, \hat{\theta}_0)$ the $M \times 1$ vector of CCPs for choice $j$ and $\ast$ the row-by-row product.

NFXP with CLOGIT (3)

If $\hat{\theta}_1 - \hat{\theta}_0$ satisfies a convergence criterion, then $\hat{\theta}_1$ is the MLE.

Otherwise, we apply the same steps as before, but now to $\hat{\theta}_1$:
- Value function iterations to obtain $\mathbf{V}(\hat{\theta}_1)$, and the corresponding $P(j \mid x, \hat{\theta}_1)$ and $\partial \mathbf{V}(\hat{\theta}_1) / \partial \theta'$;
- BHHH iteration to obtain $\hat{\theta}_2$.

And so on until convergence.

NFXP: Advantages and limitations

The main advantages of the NFXP algorithm are its conceptual simplicity and, most importantly, that it provides the MLE, which is the most efficient estimator asymptotically under the assumptions of the model.

The main limitation of this algorithm is its computational cost. In particular, the DP problem must be solved for each trial value of the structural parameters.
Note: Even for the DP-CLOGIT, the log-likelihood function is not globally concave in general.

EXERCISES

[1] In your favorite programming language, write code that implements the NFXP in the Entry-Exit model with logit errors.

[2] Given a choice for the "true" $\theta$, write code to generate a simulated sample from the "true model". Use this sample to estimate $\theta$.

[3] In the general DP-CLOGIT, show that in general the log-likelihood is not globally concave. Obtain sufficient conditions on $z(j, x)$ and/or $f_x(j, x_{it})$ that imply global concavity.

Hotz-Miller Estimator

Main idea: To estimate $\theta$ consistently, we do not have to solve a DP problem, not even once.
