{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Week 6 Lab "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# **Estimating the Return to Schooling (continued)**\n",
"\n",
"In week 3 we have replicated Card's (1993) OLS estimation of the return to schooling, using this specification:\n",
"\n",
"* $\\qquad \\log earn = \\beta_1 + \\beta_2 educ + \\beta_3 exper + \\beta_4 expersq + \\beta_5 black + \\beta_6 south + \\beta_7 smsa + \\beta_8 smsa66 + \\beta_9 reg661 + \\cdots + \\beta_{16} reg668 + u$ \n",
"\n",
"Labor economists typically believe that `educ` in such earnings regessions is an **endogenous** variable: it is a choice variable that is likely correlated with other factors that are not controlled for. Consequently, `educ` and the error term will be correlated. The canonical example is that the error term contains some measure of a person's ability, and obviously education and ability would be correlated."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Excercise 1\n",
"\n",
"Redo the OLS estimation from week 3, but this time implement **heteroskedasticity robust standard errors**. Recall from the lecture that\n",
"\n",
"$$\n",
"\\begin{align*}\n",
" \\sqrt{N} \\left( \\widehat{\\beta}^\\text{OLS} - \\beta^* \\right) \n",
" &\\overset{d}{\\to} N (0, \\Omega) \n",
"\\end{align*}\n",
"$$\n",
"where $\\Omega := E(X_i X_i')^{-1} E(u_i^2 X_i X_i') E(X_i X_i')^{-1}$.\n",
"\n",
"We say that\n",
"\n",
"* $\\Omega$ is the asymptotic variance of $\\sqrt{N} \\left( \\widehat{\\beta}^\\text{OLS} - \\beta^* \\right)$ \n",
"\n",
"* $\\Omega / N$ is the asymptotic variance of $\\widehat{\\beta}^\\text{OLS}$ \n",
"\n",
"We take this to mean that $\\widehat{\\beta}^\\text{OLS}$ has an *approximate* normal distribution with mean $\\beta^*$ and variance $\\Omega / N$.\n",
"\n",
"A consistent estimator for the covariance matrix $\\Omega$ is\n",
"$$\n",
"\\begin{align*}\n",
" \\widehat{\\Omega}\n",
" = \\left( \\tfrac{1}{N} \\sum_{i=1}^N X_i X_i' \\right)^{-1}\n",
" \\left( \\tfrac{1}{N-K} \\sum_{i=1}^N \\hat{u}_i^2 X_i X_i' \\right)\n",
" \\left( \\tfrac{1}{N} \\sum_{i=1}^N X_i X_i' \\right)^{-1} \n",
"\\end{align*}\n",
"$$\n",
"\n",
"Therefore, the variance of $\\widehat{\\beta}^\\text{OLS}$ is approximately equal to $\\widehat{\\Omega}/N$.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(3010, 16)"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# read csv-file\n",
"using DelimitedFiles\n",
"data = readdlm(\"card.csv\", ',');\n",
"\n",
"# loading data\n",
"Y = Array{Float64}(data[:, 33])\n",
"\n",
"# now create an n-by-k matrix X by grabbing the correct columns from the data matrix\n",
"X = Array{Float64}(data[:,[4, 32, 34, 22, 23, 24, 25, 12, 13, 14, 15, 16, 17, 18, 19]])\n",
"X = hcat(ones(length(Y), 1), X) # adding constant to front\n",
"\n",
"n, k = size(X)\n",
"\n",
"# implement heteroskedasticity robust estimation below\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Instrumental Variable\n",
"\n",
"To address the endogeneity issue of `educ` Card proposes the \"*presence of a nearby college* [i.e., university]\". For every person in the sample, he defines the dummy variable\n",
"$$\n",
"nearc4 =\n",
"\\begin{cases}\n",
" 1 & \\text{ if person lives near a 4-year college} \\\\0 & \\text{ otherwise}\n",
"\\end{cases}\n",
"$$\n",
"\n",
"(How does Card define \"nearness\"?)\n",
"\n",
"Check page 10 of Card's paper to read how he justifies the validity of his IV. Are you convinced?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 2\n",
"\n",
"Define the vectors and matrices $Y, X_1, X_2, X, Z_1, Z_2$, and $Z$. You can find their definition in the week 5 lecture notes."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# define vectors and matrices here\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 3\n",
"\n",
"Estimate the **reduced form** model $X_{i2} = Z_i'\\pi + v_i$, where $\\pi = (\\pi_1', \\pi_2)'$ and $\\pi_2$ is the coefficient that belongs to $Z_2$.\n",
"\n",
"Report OLS estimates for $\\pi_2$ and their standard errors (under heterskedasticity).\n",
"\n",
"Compare your estimate to Card's table 3.\n",
"\n",
"What does the estimation result tell you about **instrument relevance**? "
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# reduced form between X and Z\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 4\n",
"\n",
"Estimate the **reduced form** model $Y_i = Z_i' \\lambda + w_i$, where $\\lambda = (\\lambda_1', \\lambda_2)'$ and $\\lambda_2$ is the coefficient that belongs to $Z_2$.\n",
"\n",
"Report OLS estimates for $\\lambda_2$ and their standard errors. \n",
"\n",
"Compare your results to Card's table 3."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# reduced form between Y and Z\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Excercise 5\n",
"\n",
"### IV Estimation\n",
"\n",
"Now estimate $\\beta_2$ using three different approaches:\n",
"\n",
"1. Using the formula $\\widehat{\\beta}^{\\text{IV}} = (Z'X)^{-1} Z'Y$\n",
"\n",
"2. Using the two-step procedure:\n",
"\n",
" * regress $X_i$ on $Z_i$, obtain $\\widehat{\\pi}$, create $\\widehat{X}_i$ (the exogenous version of $X_i$)\n",
" * regress $Y_i$ on $\\widehat{X}_i$\n",
"\n",
"3. Using the two-step procedure:\n",
"\n",
" * regress $X_{i2}$ on $Z_i$, obtain residuals $\\widehat{v}_i$\n",
" * regress $Y_i$ on $X_i$ and $\\widehat{v}_i$\n",
"\n",
"Can you confirm that you obtain three (almost) identical numerical values for your estimate of $\\beta_2$?\n",
"\n",
"(No need to estimate standard errors this week!)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# first approach\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"# second approach\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"#third approach\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Julia 1.1.0",
"language": "julia",
"name": "julia-1.1"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "1.7.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}