{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Week 9 Lab\n",
"\n",
"Notice: This notebook mimics your assignment. Instead of the labor supply application, we are studying Card's (1993) return to schooling application.\n",
"\n",
"Since you have already worked through the assignment, you should be able to complete this notebook relatively easily."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# **Estimating the Return to Schooling (continued)**\n",
"\n",
"In week 3 we replicated Card's (1993) OLS estimation of the return to schooling, using this specification:\n",
"\n",
"* $\\qquad \\log earn = \\beta_1 + \\beta_2 educ + \\beta_3 exper + \\beta_4 expersq + \\beta_5 black + \\beta_6 south + \\beta_7 smsa + \\beta_8 smsa66 + \\beta_9 reg661 + \\cdots + \\beta_{16} reg668 + u$ \n",
"\n",
"Labor economists typically believe that `educ` in such earnings regressions is an **endogenous** variable: it is a choice variable that is likely correlated with factors that are not controlled for. Consequently, `educ` and the error term will be correlated. The canonical example is that the error term contains some measure of a person's ability, and education and ability are plausibly correlated."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 1\n",
"\n",
"Read the data into Julia. Create the vector $Y$ and matrix $X$ that you need to compute the OLS estimator.\n",
"\n",
"Make sure to put the file `card.csv` in the same directory as your Julia notebook."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"\n",
"\n",
"\n",
"\n",
"\n"
]
},
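{
"cell_type": "markdown",
"metadata": {},
"source": [
"A possible sketch using `CSV.jl` and `DataFrames.jl`. The column names below (in particular `lwage` for log earnings) are assumptions about the layout of `card.csv`; inspect `names(df)` before relying on them:\n",
"\n",
"```julia\n",
"using CSV, DataFrames\n",
"\n",
"df = CSV.read(\"card.csv\", DataFrame)\n",
"\n",
"# Dependent variable: log earnings (assumed to be stored as `lwage`)\n",
"Y = df.lwage\n",
"\n",
"# Regressors in the order of the structural equation, intercept first\n",
"regs = [\"educ\", \"exper\", \"expersq\", \"black\", \"south\", \"smsa\", \"smsa66\",\n",
"        \"reg661\", \"reg662\", \"reg663\", \"reg664\", \"reg665\", \"reg666\", \"reg667\", \"reg668\"]\n",
"X = [ones(nrow(df)) Matrix(df[:, regs])]\n",
"```\n"
]
},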
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 2\n",
"\n",
"Write a function `lm_ols` (*lm* for linear model) that takes the arguments `Y` and `X` and returns\n",
"\n",
"* a vector containing the OLS estimator;\n",
"\n",
"* a matrix `Avar_hat` corresponding to $\\widehat{\\Omega}/N$ from the lecture (the estimated asymptotic covariance matrix of $\\widehat{\\beta}^{OLS}$)\n",
"\n",
"(Throughout this entire notebook always allow for heteroskedasticity.)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"\n",
"\n",
"\n",
"\n",
"\n"
]
},
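{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way such a function could look, using the heteroskedasticity-robust (White) sandwich estimator; treat this as a sketch rather than the reference solution:\n",
"\n",
"```julia\n",
"using LinearAlgebra\n",
"\n",
"function lm_ols(Y, X)\n",
"    N = size(X, 1)\n",
"    beta_hat = (X' * X) \\ (X' * Y)        # OLS estimator\n",
"    u_hat = Y - X * beta_hat              # residuals\n",
"    bread = inv(X' * X / N)\n",
"    meat = (X' * Diagonal(u_hat .^ 2) * X) / N   # robust middle term\n",
"    Avar_hat = bread * meat * bread / N   # Omega_hat / N\n",
"    return beta_hat, Avar_hat\n",
"end\n",
"```\n"
]
},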
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 3\n",
"\n",
"Write a function `lm_inference` that takes an estimator and its covariance matrix as arguments and returns\n",
"\n",
"* a vector containing the standard errors;\n",
"\n",
"* a vector containing the t-statistics"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"\n",
"\n",
"\n",
"\n",
"\n"
]
},
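{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch, assuming the covariance matrix passed in is already scaled as $\\widehat{\\Omega}/N$ (as specified in Exercise 2):\n",
"\n",
"```julia\n",
"using LinearAlgebra\n",
"\n",
"function lm_inference(theta_hat, Avar_hat)\n",
"    se = sqrt.(diag(Avar_hat))   # standard errors\n",
"    t = theta_hat ./ se          # t-statistics for H0: coefficient = 0\n",
"    return se, t\n",
"end\n",
"```\n"
]
},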
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 4\n",
"\n",
"Report your OLS estimate, its standard error, and t-statistic in the following table:\n",
"\n",
"**TABLE 1: Estimates of $\\beta_2$ in the structural equation**\n",
"\n",
"| | OLS | IV | 2SLS |\n",
"|---------------------------|-------|-------|-------|\n",
"| Point estimate | 0.00 |0.00 |0.00 |\n",
"| Standard error | 0.00 |0.00 |0.00 |\n",
"| t-statistic | 0.00 |0.00 |0.00 |"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 5\n",
"\n",
"Many applied econometricians believe that `educ` is an endogenous variable. What could be their reasons?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 6\n",
"\n",
"(removed)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 7\n",
"\n",
"To address the endogeneity of `educ`, Card proposes the \"*presence of a nearby college* [i.e., university]\" as an instrument. For every person in the sample, he defines the dummy variable\n",
"$$\n",
"nearc4 =\n",
"\\begin{cases}\n",
" 1 & \\text{ if person lives near a 4-year college} \\\\0 & \\text{ otherwise}\n",
"\\end{cases}\n",
"$$\n",
"\n",
"The reduced form model using `nearc4` is\n",
"\n",
"* $educ = \\pi_1 + \\pi_2 exper + \\pi_3 expersq + \\pi_4 black + \\pi_5 south + \\pi_6 smsa + \\pi_7 smsa66 + \\pi_8 reg661 + \\cdots + \\pi_{15} reg668 + \\pi_{16} nearc4 + v$.\n",
"\n",
"Using your function `lm_ols`, run the reduced form regression for `educ`. \n",
"\n",
"Report your results in the following table:\n",
"\n",
"**TABLE 2: Estimates of $\\pi_{16}$ in the first stage regression**\n",
"\n",
"| | $\\widehat{\\pi}_{16}$ |\n",
"|---------------------------|--------------------|\n",
"| Point estimate | 0.00 |\n",
"| Standard error | 0.00 |\n",
"| t-statistic | 0.00 |\n",
"\n",
"Interpret your results: Is `nearc4` a good instrument?\n",
"\n",
"(In your code, use the matrix naming convention from the lecture (where you split $X$ in $X_1$ and $X_2$ and likewise with $Z$).)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"\n",
"\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 8\n",
"\n",
"Write a function `lm_iv` that takes the arguments `Y` and `X_1`, `X_2`, `Z_1`, and `Z_2` and returns\n",
"\n",
"* a vector containing the IV estimator;\n",
"\n",
"* a matrix containing the estimated asymptotic covariance of $\\widehat{\\beta}^{IV}$.\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"\n",
"\n",
"\n",
"\n",
"\n"
]
},
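{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch for the exactly identified case, where $Z = (Z_1, Z_2)$ has as many columns as $X = (X_1, X_2)$; the robust covariance mirrors the sandwich form used for OLS:\n",
"\n",
"```julia\n",
"using LinearAlgebra\n",
"\n",
"function lm_iv(Y, X_1, X_2, Z_1, Z_2)\n",
"    X = [X_1 X_2]                 # regressors of the structural equation\n",
"    Z = [Z_1 Z_2]                 # instruments\n",
"    N = size(X, 1)\n",
"    beta_hat = (Z' * X) \\ (Z' * Y)\n",
"    u_hat = Y - X * beta_hat\n",
"    A = inv(Z' * X / N)\n",
"    meat = (Z' * Diagonal(u_hat .^ 2) * Z) / N\n",
"    Avar_hat = A * meat * A' / N  # robust, scaled as Omega_hat / N\n",
"    return beta_hat, Avar_hat\n",
"end\n",
"```\n"
]
},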
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 9\n",
"\n",
"Use your function `lm_iv` to obtain $\\widehat{\\beta}^{IV}$ and the function `lm_inference` to obtain its standard error and t-statistic. Report your results in Table 1 above."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"\n",
"\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 10\n",
"\n",
"The data set contains a second possible instrumental variable, `nearc2`. Write a function `lm_2sls` to be able to run a 2SLS estimation.\n",
"\n",
"The function `lm_2sls` takes the arguments `Y` and `X_1`, `X_2`, `Z_1`, and `Z_2` and returns\n",
"\n",
"* a vector containing the 2SLS estimates;\n",
"\n",
"* a matrix containing the estimated asymptotic covariance matrix of $\\widehat{\\beta}^{2SLS}$."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"\n",
"\n",
"\n",
"\n",
"\n"
]
},
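{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of a 2SLS routine; note that the residuals used for the robust covariance are computed with the actual $X$, not the first-stage fitted values:\n",
"\n",
"```julia\n",
"using LinearAlgebra\n",
"\n",
"function lm_2sls(Y, X_1, X_2, Z_1, Z_2)\n",
"    X = [X_1 X_2]\n",
"    Z = [Z_1 Z_2]\n",
"    N = size(X, 1)\n",
"    X_hat = Z * ((Z' * Z) \\ (Z' * X))   # first stage: fitted values of X\n",
"    beta_hat = (X_hat' * X) \\ (X_hat' * Y)\n",
"    u_hat = Y - X * beta_hat            # residuals use the actual X\n",
"    A = inv(X_hat' * X / N)\n",
"    meat = (X_hat' * Diagonal(u_hat .^ 2) * X_hat) / N\n",
"    Avar_hat = A * meat * A' / N\n",
"    return beta_hat, Avar_hat\n",
"end\n",
"```\n"
]
},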
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 11\n",
"\n",
"Use the function `lm_2sls` to estimate $\\beta_2$ via 2SLS using `nearc2` and `nearc4` as instrumental variables. Report your point estimate, standard error, and t-statistic in Table 1 above."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"\n",
"\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 12\n",
"\n",
"Take a closer look at the first stage regression:\n",
"\n",
"* $educ = \\pi_1 + \\pi_2 exper + \\pi_3 expersq + \\pi_4 black + \\pi_5 south + \\pi_6 smsa + \\pi_7 smsa66 + \\pi_8 reg661 + \\cdots + \\pi_{15} reg668 + \\pi_{16} nearc4 + \\pi_{17} nearc2 + v$.\n",
"\n",
"Using your function `lm_ols`, run the reduced form regression for `educ`.\n",
"\n",
"Report your results in the following table:\n",
"\n",
"**TABLE 3: Estimates of $\\pi_{16}$ and $\\pi_{17}$ in the first stage regression**\n",
"\n",
"| | $\\widehat{\\pi}_{16}$ | $\\widehat{\\pi}_{17}$ |\n",
"|---------------------------|--------------------|--------------------|\n",
"| Point estimate | 0.00 | 0.00 |\n",
"| Standard error | 0.00 | 0.00 |\n",
"| t-statistic | 0.00 | 0.00 |\n",
"\n",
"Interpret your results: Are `nearc2` and `nearc4` good instruments?\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"\n",
"\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 13\n",
"\n",
"Write a function `lm_2sls_ftest` that takes the arguments `Y` and `X_1`, `X_2`, `Z_1`, and `Z_2` and returns\n",
"\n",
"* a scalar containing the first stage F-statistic for the restriction that the coefficients of $Z_2$ equal zero.\n",
"\n",
"Once you have calculated the F-statistic, you can use Stock and Yogo's weak IV cutoffs to make a decision with regard to the strength/weakness of your instruments. What do you find?\n",
"\n",
"Here is a brief reminder about the first stage $F$-test (it generalizes straightforwardly to any $F$-test in linear regression). In the first stage you run this regression:\n",
"\n",
"$$\n",
"X_{2i} = Z_{1i} \\pi_1 + Z_{2i} \\pi_2 + v_i\n",
"$$\n",
"\n",
"You want to test if $\\pi_2 = 0$ where $\\dim(\\pi_2) = L_2$ with $L_2$ possibly exceeding 1.\n",
"\n",
"You have estimated $\\pi$ by $\\widehat{\\pi}$ using OLS and you have estimated the covariance matrix so that\n",
"\n",
"$$\n",
"\\sqrt{N} (\\widehat{\\pi} - \\pi) \\overset{\\text{approx}}{\\sim} \\mathcal{N} (0, \\widehat{\\Omega})\n",
"$$\n",
"\n",
"Define the $L \\times L_2$ dimensional matrix $R := (0_{L_2 \\times K_1}, \\quad I_{L_2})'$ for the purpose of selecting the appropriate elements of\n",
"$\\widehat{\\pi} - \\pi$. Then $R' (\\widehat{\\pi} - \\pi) = \\widehat{\\pi}_2 - \\pi_2$. \n",
"\n",
"It follows that\n",
"\n",
"$$\n",
"\\sqrt{N} R' (\\widehat{\\pi} - \\pi) \\overset{\\text{approx}}{\\sim} \\mathcal{N} (0, R' \\widehat{\\Omega} R).\n",
"$$\n",
"\n",
"It follows, after the usual standardization, that the quadratic form satisfies\n",
"$$\n",
"N (R' (\\widehat{\\pi} - \\pi))' (R' \\widehat{\\Omega} R)^{-1} (R' (\\widehat{\\pi} - \\pi))\n",
"\\overset{\\text{approx}}{\\sim} \\chi^2_{L_2}\n",
"$$\n",
"\n",
"Our null hypothesis is that $\\pi_2 = 0$; consequently,\n",
"$$\n",
"N (R' \\widehat{\\pi})' (R' \\widehat{\\Omega} R)^{-1} (R' \\widehat{\\pi})\n",
"\\overset{\\text{approx}}{\\sim} \\chi^2_{L_2}\n",
"$$\n",
"\n",
"The F-statistic is the left hand side divided by $L_2$:\n",
"$$\n",
"F := \n",
"\\frac{N (R' \\widehat{\\pi})' (R' \\widehat{\\Omega} R)^{-1} (R' \\widehat{\\pi})}{L_2}\n",
"$$\n",
"\n",
"(Note: Hansen, in Chapters 9.10 and 9.14, explains that for the purpose of calculating $F$ one could use the $\\widehat{\\Omega}$ corresponding to the covariance estimator under *homoskedasticity*. Using the heteroskedasticity-robust version is fine here if you find it less confusing; the final test decision happens to be unaffected.)\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"\n",
"\n",
"\n",
"\n",
"\n"
]
},
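{
"cell_type": "markdown",
"metadata": {},
"source": [
"A self-contained sketch that runs the first stage by OLS and applies the $F$ formula above directly; `Y` is unused here but kept to match the requested signature:\n",
"\n",
"```julia\n",
"using LinearAlgebra\n",
"\n",
"function lm_2sls_ftest(Y, X_1, X_2, Z_1, Z_2)\n",
"    Z = [Z_1 Z_2]\n",
"    N, L = size(Z)\n",
"    L_2 = size(Z_2, 2)\n",
"    # First stage: regress the endogenous regressor on all instruments\n",
"    pi_hat = (Z' * Z) \\ (Z' * vec(X_2))\n",
"    v_hat = vec(X_2) - Z * pi_hat\n",
"    bread = inv(Z' * Z / N)\n",
"    Omega_hat = bread * ((Z' * Diagonal(v_hat .^ 2) * Z) / N) * bread\n",
"    # Select the last L_2 elements (R' pi_hat) and the matching block of Omega_hat\n",
"    pi_2 = pi_hat[L-L_2+1:end]\n",
"    W = Omega_hat[L-L_2+1:end, L-L_2+1:end]\n",
"    F = N * (pi_2' * (W \\ pi_2)) / L_2\n",
"    return F\n",
"end\n",
"```\n"
]
},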
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 14\n",
"\n",
"Calculate the F-statistic."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"\n",
"\n",
"\n",
"\n",
"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Julia 1.7.2",
"language": "julia",
"name": "julia-1.7"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "1.7.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}