Introductory Econometrics Problem set 1

Jan Zouhar
Department of Econometrics, University of Economics, Prague, zouharj@vse.cz
Due date: March 26
Problem 1.1. In an attempt to prove that a tall person earns more than a short one ceteris paribus, your friend
Peter ran a simple regression of wage.usd (hourly wage, in U.S. dollars) on height.cm (height of a person in
centimetres), using data on a random sample of 300 employees. The estimated equation is
$$\widehat{\text{wage.usd}} = 7 + 0.02\,\text{height.cm}. \tag{1}$$
Next, Peter thought it might be a good idea to estimate the log-level model, yielding him the equation
$$\widehat{\log(\text{wage.usd})} = 1.9 + 0.0025\,\text{height.cm}. \tag{2}$$
a ) Give an example of a descriptive interpretation of the estimated slope coefficients in (1) and (2).
b ) Give an example of a causal interpretation of the estimated slope coefficients in (2).
c ) What is the interpretation of the intercept in (1)?
d ) Do you think the estimated slope quantifies a causal relationship? Can you give an example of an
$x \leftarrow z \rightarrow y$ relationship that might spoil the causal interpretation?
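When working on the interpretations, it can help to evaluate both fitted equations at two concrete heights and compare the results. The following is only a minimal sketch: the coefficients are copied from (1) and (2), while the two heights (170 cm and 180 cm) and all function names are assumptions chosen for illustration.

```python
import math

# Coefficients copied from the estimated equations (1) and (2).
B0_LEVEL, B1_LEVEL = 7.0, 0.02    # wage.usd on height.cm
B0_LOG, B1_LOG = 1.9, 0.0025      # log(wage.usd) on height.cm


def fitted_wage_level(height_cm: float) -> float:
    """Fitted hourly wage in USD from the level-level equation (1)."""
    return B0_LEVEL + B1_LEVEL * height_cm


def fitted_log_wage(height_cm: float) -> float:
    """Fitted log hourly wage from the log-level equation (2)."""
    return B0_LOG + B1_LOG * height_cm


# Hypothetical heights, chosen only for illustration.
h_short, h_tall = 170.0, 180.0

# Equation (1): the fitted difference is measured in dollars per hour.
diff_usd = fitted_wage_level(h_tall) - fitted_wage_level(h_short)

# Equation (2): the fitted difference is measured in log points.
# 100 * (difference in logs) is the usual approximate percentage change;
# exponentiating gives the exact percentage change in the fitted wage.
diff_log = fitted_log_wage(h_tall) - fitted_log_wage(h_short)
approx_pct = 100 * diff_log
exact_pct = 100 * (math.exp(diff_log) - 1)

print(f"(1) fitted difference: {diff_usd:.2f} USD per hour")
print(f"(2) approximate: {approx_pct:.2f} %, exact: {exact_pct:.2f} %")
```

Whether a fitted difference like this may be read causally is a separate question that the numbers themselves cannot answer.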
Problem 1.2. Apart from wage.usd and height.cm, Peter’s dataset (from the previous problem) also contains
data on respondents’ wage and height in euros and metres, respectively, stored in variables wage.eur and
height.m, and for each observation it holds that
$$\text{wage.eur} = 0.8\,\text{wage.usd},$$
$$\text{height.m} = 0.01\,\text{height.cm}.$$
Write down the sample regression function (= estimated equation) in the regression of . . .
a ) wage.eur on height.cm.
b ) wage.eur on height.m.
c ) log(wage.eur) on height.cm.
d ) log(wage.eur) on height.m.
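Answers derived by hand can be checked numerically. The sketch below is one possible way to do so, under loud assumptions: Peter's sample is not available here, so the data are synthetic (made-up parameters, a fixed random seed), and the helper `ols` is a hypothetical name for a plain simple-regression routine. Only the two unit conversions are taken from the problem.

```python
import math
import random


def ols(x, y):
    """Return (intercept, slope) of the simple OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic data, NOT Peter's sample: all parameters below are made up.
random.seed(0)
height_cm = [random.gauss(175, 8) for _ in range(300)]
wage_usd = [math.exp(1.9 + 0.0025 * h + random.gauss(0, 0.3)) for h in height_cm]

# Unit conversions as stated in the problem.
wage_eur = [0.8 * w for w in wage_usd]
height_m = [0.01 * h for h in height_cm]

# Level-level regressions in the different units.
usd_on_cm = ols(height_cm, wage_usd)
eur_on_cm = ols(height_cm, wage_eur)
eur_on_m = ols(height_m, wage_eur)

# Log-level regressions in the different units.
log_usd_on_cm = ols(height_cm, [math.log(w) for w in wage_usd])
log_eur_on_cm = ols(height_cm, [math.log(w) for w in wage_eur])
log_eur_on_m = ols(height_m, [math.log(w) for w in wage_eur])

for label, (b0, b1) in [
    ("wage.usd on height.cm", usd_on_cm),
    ("wage.eur on height.cm", eur_on_cm),
    ("wage.eur on height.m", eur_on_m),
    ("log(wage.usd) on height.cm", log_usd_on_cm),
    ("log(wage.eur) on height.cm", log_eur_on_cm),
    ("log(wage.eur) on height.m", log_eur_on_m),
]:
    print(f"{label:28s} intercept = {b0:9.4f}, slope = {b1:9.4f}")
```

Comparing each printed line with the two baseline regressions on the original units shows how the intercept and slope respond to a change of units; the same pattern should then be applied to the coefficients in (1) and (2).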
Problem 1.3. Using the data in attend.gdt, you are supposed to study the relationship between class
attendance at a university, and the resulting test score. First, read the description of the dataset (Data →
Dataset info) and the individual variables (Descriptive label in the main window). (If needed, browse the
web for additional information about ACT scores and GPAs.)
a ) Regress final on attend. Report the estimated equation and create a scatter plot with the actual and
fitted values (i.e., the actual data points and the regression line). Looking at the plot, do you think
that the homoskedasticity assumption holds in this model? Or does the variance of the final score vary
systematically with attendance rate? Explain.
b ) Regress final on attend and priGPA. Report the estimated equation again. Explain the difference in the
estimated slope coefficient of the attend variable (compared with your previous equation). Which of
the results do you consider a more accurate estimate of the causal effect of class attendance on the final
score?
c ) Regress final on attend, priGPA, and ACT. Again, explain the differences between the attend coefficients.
d ) Interpret the R-squared in your last equation. Does its low value render the coefficient estimates unreliable?
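The mechanism behind a slope that changes once a control variable is added can be illustrated on simulated data. The sketch below does not use attend.gdt: the data-generating process, its parameters, and the variable names `gpa`, `attend`, and `final` are all assumptions made up for the illustration. It compares the slope from a simple regression with the slope obtained after partialling out the control (the Frisch-Waugh-Lovell approach), which equals the coefficient from the corresponding multiple regression.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the simple OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """Residuals from the simple OLS regression of y on x."""
    b0, b1 = ols(x, y)
    return [yi - b0 - b1 * xi for xi, yi in zip(x, y)]


# Simulated data with made-up parameters (NOT the attend.gdt dataset).
# Prior GPA raises both attendance and the final score; the assumed
# direct effect of attendance on the score is 0.1.
random.seed(1)
n = 5000
gpa = [random.gauss(2.6, 0.5) for _ in range(n)]
attend = [20 + 4 * g + random.gauss(0, 4) for g in gpa]
final = [10 + 0.1 * a + 4 * g + random.gauss(0, 4) for a, g in zip(attend, gpa)]

# Simple regression of final on attend: the slope also picks up the
# part of the GPA effect that is correlated with attendance.
_, slope_simple = ols(attend, final)

# Partial out gpa from both variables, then regress residual on residual.
attend_res = residuals(gpa, attend)
final_res = residuals(gpa, final)
_, slope_controlled = ols(attend_res, final_res)

print(f"slope without control: {slope_simple:.3f}")
print(f"slope with gpa partialled out: {slope_controlled:.3f}")
```

In this simulation the control is, by construction, the only omitted factor; in real data the direction and size of the change depend on how the added variable relates to both attendance and the score.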