{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Week 12 Lab "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Maximum Likelihood Estimation of a Normal Regression Model\n",
"\n",
"In assignment 11 you have studied the **normal regression model**\n",
"\n",
"\\begin{align*}\n",
" Y_i &= X_i'\\beta + e_i, \\qquad \\text{where }\n",
" e_i | X_i \\sim \\mathcal{N}(0, \\sigma_e^2).\n",
"\\end{align*}\n",
"\n",
"As you can see, the errors here are assumed to have an **exact** normal distribution. This enables us to use maximum likelihood estimation, because we know the distribution of $Y_i$ **conditional on $X_i$**.\n",
"\n",
"What are we estimating? The unknown parameters are $\\beta \\in \\mathbb{R}^K$ and $\\sigma_e^2$.\n",
"\n",
"That conditional density is\n",
"\\begin{align*}\n",
" f_{Y|X}(y | x, \\beta, \\sigma_e^2) = \\frac{1}\n",
" {\\sqrt{ 2 \\pi \\sigma_e^2}} \\exp \\left( - \\frac{1}{2 \\sigma_e^2} (y - x'\\beta)^2\n",
" \\right).\n",
"\\end{align*}\n",
"\n",
"\n"
]
},
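{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check of this formula (a sketch, not part of the assignment): the density can be written out by hand or evaluated with `Distributions.jl`, and the two should agree. The values of `x`, `beta`, `se2`, and `y` below are made up purely for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"using LinearAlgebra, Distributions\n",
"\n",
"# made-up values, for illustration only\n",
"x = [1.0, 12.0]; beta = [4.7, 0.07]; se2 = 0.175; y = 5.6\n",
"\n",
"mu = x' * beta                                        # conditional mean x'β\n",
"f_manual = 1 / sqrt(2π * se2) * exp(-(y - mu)^2 / (2 * se2))\n",
"f_pkg = pdf(Normal(mu, sqrt(se2)), y)                 # Y | X = x  ~  N(x'β, σ²)\n",
"\n",
"f_manual ≈ f_pkg                                      # true"
]
},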
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Loading packages for this notebook\n",
"\n",
"We will be needing the following packages:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"using LinearAlgebra, Distributions, Random, Optim"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 1\n",
"\n",
"Here's a **slightly modified** version of the DGP from week 7, where we simulated Card's schooling model:\n",
"\n",
"$$\n",
"\\begin{align*}\n",
" u & \\sim \\mathcal{N}(0, \\sigma_u^2) \\\\\n",
" A & \\sim \\mathcal{N}(0, \\sigma_A^2) && \\text{(ability)}\\\\\n",
" S &= \\pi + A && \\text{(schooling)}\\\\\n",
" X & \\sim \\text{Discrete uniform on }\\{0,1,\\ldots,20\\} && \\text{(experience)} \\\\\n",
" Y &= \\exp (\\beta_1 + \\beta_2 S + \\beta_3 X + u) && \\text{(earnings)}\n",
"\\end{align*}\n",
"$$\n",
"\n",
"What's different from week 7?\n",
"\n",
"* ability $A$ no longer appears on the right-hand side of the earnings equation (which effectively removes the omitted-variable bias problem)\n",
"\n",
"* work experience $X$ is added to the right-hand side of the earnings equation; the coefficient $\\beta_3$ is the return to work experience\n",
"\n",
"Why am I making these changes? I want to avoid endogeneity, and I want more than one explanatory variable (hence the introduction of $X$).\n",
"\n",
"This model is a normal regression model when you consider the logarithm of $Y$.\n",
"\n",
"Use the function `schooling_sample` (from week 7) to create a random sample of size 5,000 using these parameter values:\n",
"\n",
"| | Calibrated values |\n",
"|---------------------------|--------------------|\n",
"| $\\beta_1$ | 4.70 |\n",
"| $\\beta_2$ | 0.07 |\n",
"| $\\beta_3$ | 0.12 |\n",
"| $\\pi$ | 13.2 |\n",
"| $\\sigma_u^2$ | 0.175 |\n",
"| $\\sigma_A^2$ | 7.20 |\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"\n",
"\n",
"\n"
]
},
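{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you no longer have `schooling_sample` at hand, here is one possible sketch reconstructed from the DGP above (the week 7 original may differ in its argument names and order, so treat this as an assumption, not the canonical version):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"using Distributions, Random\n",
"\n",
"# Sketch of `schooling_sample`, reconstructed from the DGP above;\n",
"# the week 7 version may differ in details.\n",
"function schooling_sample(N; beta1 = 4.70, beta2 = 0.07, beta3 = 0.12,\n",
"                          pie = 13.2, su2 = 0.175, sA2 = 7.20, rng = Xoshiro(42))\n",
"    u = rand(rng, Normal(0, sqrt(su2)), N)\n",
"    A = rand(rng, Normal(0, sqrt(sA2)), N)           # ability\n",
"    S = pie .+ A                                     # schooling\n",
"    X = rand(rng, DiscreteUniform(0, 20), N)         # experience\n",
"    Y = exp.(beta1 .+ beta2 .* S .+ beta3 .* X .+ u) # earnings\n",
"    return Y, S, X\n",
"end\n",
"\n",
"Y, S, X = schooling_sample(5_000)"
]
},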
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 2\n",
"\n",
"As you know, the log-likelihood function for this model is\n",
"\n",
"$$\n",
"L = -\\tfrac{N}{2} \\ln \\left(2 \\pi \\sigma_e^2 \\right) \n",
"-\\frac{1}{2 \\sigma_e^2} \\sum_{i=1}^{N} (y_i - x_i' \\beta)^2\n",
"$$\n",
"\n",
"Use this function to obtain the MLE (for all unknown parameters).\n",
"\n",
"(Make sure, along the way, to implement a closure `nll` which is the negative log-likelihood function.)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"\n",
"\n",
"\n",
"\n",
"\n"
]
},
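{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to implement this (a sketch, assuming your Exercise 1 sample is stored in `Y`, `S`, `X`; all other names below are my own choices, not prescribed by the exercise):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"using Optim\n",
"\n",
"y = log.(Y)                              # the model is a normal regression in logs\n",
"Xmat = [ones(length(y)) S X]             # regressors: intercept, schooling, experience\n",
"N = length(y)\n",
"\n",
"# negative log-likelihood as a closure over (y, Xmat)\n",
"function nll(theta)\n",
"    beta = theta[1:3]\n",
"    se2  = theta[4]\n",
"    se2 <= 0 && return Inf               # keep the optimizer inside the parameter space\n",
"    r = y .- Xmat * beta\n",
"    return N / 2 * log(2π * se2) + sum(abs2, r) / (2 * se2)\n",
"end\n",
"\n",
"res = optimize(nll, [0.0, 0.0, 0.0, 1.0], NelderMead())\n",
"theta_ml = Optim.minimizer(res)          # (β̂₁, β̂₂, β̂₃, σ̂²)\n",
"beta_ml  = theta_ml[1:3]\n",
"se2_ml   = theta_ml[4]"
]
},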
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 3\n",
"\n",
"As you also know, the asymptotic result for the MLE is\n",
"\n",
"\\begin{align*}\n",
" \\sqrt{N}\n",
" \\begin{pmatrix}\n",
" \\hat{\\beta}^\\text{ML} - \\beta \\\\\n",
" \\hat{\\sigma}_e^{2,\\text{ML}} - \\sigma_e^2\n",
" \\end{pmatrix}\n",
" \\overset{d}{\\to}\n",
" \\mathcal{N}\n",
" \\begin{pmatrix}\n",
" \\begin{pmatrix}\n",
" 0\\\\0\n",
" \\end{pmatrix}\n",
" ,\n",
" \\begin{pmatrix}\n",
" \\sigma_e^2 E(X_i X_i')^{-1} &\n",
" 0 \\\\\n",
" 0' &\n",
" 2\\sigma_e^4\n",
" \\end{pmatrix}\n",
"\\end{pmatrix}.\n",
"\\end{align*}\n",
"\n",
"Obtain estimates of the standard errors of $\\widehat{\\beta}^\\text{ML}$.\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"\n",
"\n",
"\n",
"\n",
"\n"
]
},
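{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of the plug-in standard errors, assuming the design matrix `Xmat` and the ML estimate `se2_ml` of $\\sigma_e^2$ are in scope (these names are illustrative; substitute your own):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"using LinearAlgebra\n",
"\n",
"# The sample analogue of σ² E(X_i X_i')⁻¹ / N is σ̂² (X'X)⁻¹ (the N's cancel)\n",
"avar_beta = se2_ml * inv(Xmat' * Xmat)\n",
"se_beta   = sqrt.(diag(avar_beta))       # standard errors of the β̂ components"
]
},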
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 4\n",
"\n",
"Instead of using the **analytical** Hessian matrix to obtain the asymptotic variance, the `Optim` and `NLSolversBase` packages can **numerically** approximate the Hessian. \n",
"\n",
"The implementation is a bit clumsy. Using your negative log-likelihood closure `nll`, you would obtain the Hessian and its diagonal like so:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"using NLSolversBase\n",
"\n",
"td = TwiceDifferentiable(nll, ones(4))   # 4 parameters: β₁, β₂, β₃, σ²\n",
"H = hessian!(td, vcat(beta, se2))        # Hessian of nll, evaluated at the MLE (β̂, σ̂²)\n",
"diag(inv(H))                             # estimated variances; apply sqrt.() for standard errors"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Are these results based on the numerical derivatives similar to the results based on the analytical derivatives?"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Julia 1.7.2",
"language": "julia",
"name": "julia-1.7"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "1.7.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}