# Computational Assignment


### Help and Guidance

We have a strict policy of not offering any help or guidance with regard to the assignment. This includes so-called clarification questions, to which we give no answers.

Do your work based on the information provided here to the best of your abilities and understanding.

Good luck!


# **Estimating Female Labor Supply**

Angrist and Evans (1996) study female labor supply in the United States. Specifically, they ask how children affect women's labor supply, with a particular focus on Black and Hispanic women.

They use a sample of almost 32,000 married Black and Hispanic women, all of whom have at least two children.

The baseline model is:

* $hours = \beta_1 + \beta_2 kidcount + \beta_3 nonmomi + \beta_4 black + \beta_5 educ + \beta_6 age + \beta_7 age^2 + u$,

where the variable definitions are given in the file `labsup.des` and the paper.

## Exercise 1

Read the data into Julia. Create the vector $Y$ and matrix $X$ that you need to compute the OLS estimator.

Make sure to put the file `labsup.csv` in the same directory as your Julia notebook.
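One possible way to assemble $Y$ and $X$ is sketched below. It assumes the CSV file has been read with the CSV.jl and DataFrames.jl packages (for example `df = CSV.read("labsup.csv", DataFrame)`) and that the column names match the variable names in the baseline model. The helper `build_design` is hypothetical. If the file already contains a squared-age column, that column can replace the computed `age .^ 2`.

```julia
# Hypothetical helper: builds Y and X for the baseline model from column vectors.
# Assumption: the columns were read beforehand, e.g. with
#   using CSV, DataFrames; df = CSV.read("labsup.csv", DataFrame)
"""
    build_design(hours, kidcount, nonmomi, black, educ, age)

Return `(Y, X)`, where `X` has an intercept column followed by
kidcount, nonmomi, black, educ, age and age^2 (the order of β₁,…,β₇).
"""
function build_design(hours, kidcount, nonmomi, black, educ, age)
    N = length(hours)
    Y = float.(hours)
    # hcat promotes the integer columns to Float64 alongside the intercept
    X = hcat(ones(N), kidcount, nonmomi, black, educ, age, age .^ 2)
    return Y, float.(X)
end
```

With the hypothetical DataFrame above, a call would look like `Y, X = build_design(df.hours, df.kidcount, df.nonmomi, df.black, df.educ, df.age)`.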

## Exercise 2

Write a function `lm_ols` (*lm* for linear model) that takes the arguments `Y` and `X` and returns

* a vector containing the OLS estimator;

* a matrix `Avar_hat` corresponding to $\widehat{\Omega}/N$ from the lecture (the estimated asymptotic covariance matrix of $\widehat{\beta}^{OLS}$).

(Throughout this notebook, always allow for heteroskedasticity.)
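A minimal sketch of `lm_ols`, assuming the lecture's robust estimator is the White (HC0) sandwich $\widehat{\Omega}/N = (X'X)^{-1}\left(\sum_i \widehat{u}_i^2 x_i x_i'\right)(X'X)^{-1}$ with no small-sample correction such as $N/(N-K)$; if the lecture uses a correction, the result differs by that factor.

```julia
using LinearAlgebra

"""
    lm_ols(Y, X)

OLS estimate and a heteroskedasticity-robust (HC0) estimate of its
asymptotic covariance matrix, i.e. Ω̂/N (assumed to match the lecture).
"""
function lm_ols(Y::AbstractVector, X::AbstractMatrix)
    XtX_inv = inv(X' * X)
    beta_hat = XtX_inv * (X' * Y)
    u_hat = Y - X * beta_hat                # OLS residuals
    meat = X' * (X .* u_hat .^ 2)           # Σᵢ ûᵢ² xᵢ xᵢ' (row i of X scaled by ûᵢ²)
    Avar_hat = XtX_inv * meat * XtX_inv     # sandwich = Ω̂/N
    return beta_hat, Avar_hat
end
```

Usage: `beta_hat, Avar_hat = lm_ols(Y, X)`; with $X$ ordered as in the baseline model, `beta_hat[2]` is the coefficient on `kidcount`.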

## Exercise 3

Write a function `lm_inference` that takes an estimator and its covariance matrix as arguments and returns

* a vector containing the standard errors;

* a vector containing the t-statistics (for the null hypothesis that each coefficient equals zero).
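The standard errors are the square roots of the diagonal of the estimated covariance matrix, and each t-statistic divides a coefficient by its standard error. A minimal sketch, assuming the covariance argument is already $\widehat{\Omega}/N$ (so no further division by $N$ is needed):

```julia
using LinearAlgebra

"""
    lm_inference(beta_hat, Avar_hat)

Standard errors and t-statistics for H₀: βₖ = 0, one coefficient at a time.
Assumes `Avar_hat` is already Ω̂/N.
"""
function lm_inference(beta_hat::AbstractVector, Avar_hat::AbstractMatrix)
    se = sqrt.(diag(Avar_hat))     # standard errors from the diagonal
    tstat = beta_hat ./ se         # elementwise t-statistics
    return se, tstat
end
```

For example, `lm_inference([2.0, -3.0], [4.0 0.5; 0.5 9.0])` returns `([2.0, 3.0], [1.0, -1.0])`.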

## Exercise 4

Report your OLS estimate of $\beta_2$, its standard error, and its t-statistic in the following table:

**TABLE 1: Estimates of $\beta_2$ in the structural equation**

|                           | OLS   | IV    | 2SLS  |
|---------------------------|-------|-------|-------|
| Point estimate            | 0.00  |0.00   |0.00   |
| Standard error            | 0.00  |0.00   |0.00   |
| t-statistic               | 0.00  |0.00   |0.00   |

## Exercise 5

Many applied econometricians believe that `kidcount` is an endogenous variable. What could be their reasons?

## Exercise 6

Angrist and Evans propose to use the following variable as an instrument:

* `samesex`: equal to one if the first two children are of the same biological sex, zero otherwise

What is the justification behind this IV?

## Exercise 7

The reduced form model using `samesex` is

* $kidcount = \pi_1 + \pi_2 nonmomi + \pi_3 black + \pi_4 educ + \pi_5 age + \pi_6 age^2 + \pi_7 samesex + v$.

Using your function `lm_ols`, run the reduced form regression for `kidcount`. 

Report your results in the following table:

**TABLE 2: Estimates of $\pi_7$ in the first stage regression**

|                           | $\widehat{\pi}_7$  |
|---------------------------|--------------------|
| Point estimate            | 0.00               |
| Standard error            | 0.00               |
| t-statistic               | 0.00               |

Interpret your results: Is `samesex` a good instrument?

(In your code, use the matrix naming convention from the lecture, where $X$ is split into $X_1$ and $X_2$, and likewise $Z$ into $Z_1$ and $Z_2$.)

## Exercise 8

Write a function `lm_iv` that takes the arguments `Y`, `X_1`, `X_2`, `Z_1`, and `Z_2` and returns

* a vector containing the IV estimator;

* a matrix containing the estimated asymptotic covariance matrix of $\widehat{\beta}^{IV}$.
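A minimal sketch of `lm_iv`, assuming the lecture convention that $X_1$ holds the exogenous regressors (including the intercept column), $X_2$ the endogenous regressor, $Z_1 = X_1$, and $Z_2$ the excluded instrument, so that $Z$ has as many columns as $X$ (just identification). The estimator is $\widehat{\beta}^{IV} = (Z'X)^{-1}Z'Y$; the covariance is assumed to be the HC0-type sandwich $(Z'X)^{-1}\left(\sum_i \widehat{u}_i^2 z_i z_i'\right)(X'Z)^{-1}$, which may differ from the lecture's version by a degrees-of-freedom factor. Because `X = [X_1 X_2]`, the coefficient on `kidcount` is the last element of the returned vector, not the second.

```julia
using LinearAlgebra

"""
    lm_iv(Y, X_1, X_2, Z_1, Z_2)

Just-identified IV estimate and an HC0-type robust estimate of its
asymptotic covariance matrix. Coefficients are ordered as [X_1 X_2].
"""
function lm_iv(Y, X_1, X_2, Z_1, Z_2)
    X = hcat(X_1, X_2)
    Z = hcat(Z_1, Z_2)
    size(Z, 2) == size(X, 2) ||
        throw(ArgumentError("IV needs exactly as many instruments as regressors"))
    ZtX_inv = inv(Z' * X)
    beta_hat = ZtX_inv * (Z' * Y)
    u_hat = Y - X * beta_hat                # structural residuals use X, not Z
    meat = Z' * (Z .* u_hat .^ 2)           # Σᵢ ûᵢ² zᵢ zᵢ'
    Avar_hat = ZtX_inv * meat * ZtX_inv'    # (Z'X)⁻¹ S (X'Z)⁻¹
    return beta_hat, Avar_hat
end
```

If `Z_2` is set equal to `X_2`, the function reproduces OLS, which is a quick sanity check.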


## Exercise 9

Use your function `lm_iv` to obtain $\widehat{\beta}^{IV}$ and your function `lm_inference` to obtain its standard error and t-statistic. Report your results in Table 1 above.

## Exercise 10

The data set contains a second possible instrumental variable, `multi2nd`. Write a function `lm_2sls` that performs 2SLS estimation.

The function `lm_2sls` takes the arguments `Y`, `X_1`, `X_2`, `Z_1`, and `Z_2` and returns

* a vector containing the 2SLS estimates;

* a matrix containing the estimated asymptotic covariance matrix of $\widehat{\beta}^{2SLS}$.
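A minimal sketch of `lm_2sls`, assuming the lecture convention ($X_1$ exogenous regressors including the intercept, $X_2$ endogenous, $Z_1 = X_1$, $Z_2$ excluded instruments) and allowing more instruments than endogenous regressors. It computes $\widehat{\beta}^{2SLS} = (X'P_ZX)^{-1}X'P_ZY$ with $P_Z = Z(Z'Z)^{-1}Z'$, and assumes an HC0-type robust covariance $(X'P_ZX)^{-1}\left(\sum_i \widehat{u}_i^2 \widehat{x}_i \widehat{x}_i'\right)(X'P_ZX)^{-1}$, where $\widehat{X} = P_Z X$ and $\widehat{u} = Y - X\widehat{\beta}$ uses the actual, not the fitted, regressors.

```julia
using LinearAlgebra

"""
    lm_2sls(Y, X_1, X_2, Z_1, Z_2)

2SLS estimate and an HC0-type robust estimate of its asymptotic
covariance matrix. Coefficients are ordered as [X_1 X_2].
"""
function lm_2sls(Y, X_1, X_2, Z_1, Z_2)
    X = hcat(X_1, X_2)
    Z = hcat(Z_1, Z_2)
    size(Z, 2) >= size(X, 2) ||
        throw(ArgumentError("2SLS needs at least as many instruments as regressors"))
    X_hat = Z * ((Z' * Z) \ (Z' * X))      # first-stage fitted values P_Z X
    A_inv = inv(X_hat' * X)                # (X' P_Z X)⁻¹
    beta_hat = A_inv * (X_hat' * Y)
    u_hat = Y - X * beta_hat               # residuals with the actual X
    meat = X_hat' * (X_hat .* u_hat .^ 2)  # Σᵢ ûᵢ² x̂ᵢ x̂ᵢ'
    Avar_hat = A_inv * meat * A_inv
    return beta_hat, Avar_hat
end
```

In the just-identified case this returns the same point estimate as the IV formula $(Z'X)^{-1}Z'Y$, and with `Z_2 = X_2` it reduces to OLS.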

## Exercise 11

Use the function `lm_2sls` to estimate $\beta_2$ via 2SLS, using `samesex` and `multi2nd` as instrumental variables. Report the point estimate, its standard error, and its t-statistic in Table 1 above.

## Exercise 12

Take a closer look at the first stage regression:

* $kidcount = \pi_1 + \pi_2 nonmomi + \pi_3 black + \pi_4 educ + \pi_5 age + \pi_6 age^2 + \pi_7 samesex + \pi_8 multi2nd + w$.

Using your function `lm_ols`, run the reduced form regression for `kidcount`.

Report your results in the following table:

**TABLE 3: Estimates of $\pi_7$ and $\pi_8$ in the first stage regression**

|                           | $\widehat{\pi}_7$  | $\widehat{\pi}_8$  |
|---------------------------|--------------------|--------------------|
| Point estimate            | 0.00               | 0.00               |
| Standard error            | 0.00               | 0.00               |
| t-statistic               | 0.00               | 0.00               |

Interpret your results: Are `samesex` and `multi2nd` good instruments?


## Exercise 13

Write a function `lm_2sls_ftest` that takes the arguments `Y`, `X_1`, `X_2`, `Z_1`, and `Z_2` and returns

* a scalar containing the first stage F-statistic for the restriction that the coefficients of $Z_2$ equal zero.

Once you have calculated the F-statistic, you can use Stock and Yogo's weak-IV cutoffs to decide whether your instruments are strong or weak. What do you find?

Here is a brief reminder about the first-stage $F$-test (it generalizes straightforwardly to any F-test in a linear regression). In the first stage you run this regression:

$$
X_{2i} = Z_{1i} \pi_1 + Z_{2i} \pi_2 + v_i
$$

You want to test whether $\pi_2 = 0$, where $\dim(\pi_2) = L_2$ and $L_2$ may exceed 1.

You have estimated $\pi := (\pi_1', \pi_2')'$ by $\widehat{\pi}$ using OLS, and you have estimated its covariance matrix so that

$$
\sqrt{N} (\widehat{\pi} - \pi) \overset{\text{approx}}{\sim} \mathcal{N} (0, \widehat{\Omega})
$$

Define the $L \times L_2$ matrix $R := (0_{L_2 \times K_1}, \quad I_{L_2})'$, where $L = K_1 + L_2$, to select the appropriate elements of $\widehat{\pi} - \pi$. Then $R'\pi = \pi_2$ and $R' (\widehat{\pi} - \pi) = \widehat{\pi}_2 - \pi_2$.

It follows that

$$
\sqrt{N} R' (\widehat{\pi} - \pi) \overset{\text{approx}}{\sim} \mathcal{N} (0, R' \widehat{\Omega} R).
$$

Standardizing in the usual way, the quadratic form satisfies
$$
N (R' (\widehat{\pi} - \pi))' (R' \widehat{\Omega} R)^{-1} (R' (\widehat{\pi} - \pi))
\overset{\text{approx}}{\sim} \chi^2_{L_2}
$$

Under the null hypothesis $\pi_2 = 0$, we have $R'\pi = 0$, and consequently
$$
N (R' \widehat{\pi})' (R' \widehat{\Omega} R)^{-1} (R' \widehat{\pi})
\overset{\text{approx}}{\sim} \chi^2_{L_2}
$$

The F-statistic is the left-hand side divided by $L_2$:
$$
F := 
\frac{N (R' \widehat{\pi})' (R' \widehat{\Omega} R)^{-1} (R' \widehat{\pi})}{L_2}
$$

(Note: Hansen, in Sections 9.10 and 9.14, explains that for the purpose of calculating F one could use the $\widehat{\Omega}$ that corresponds to the covariance estimator under *homoskedasticity*. Using the heteroskedasticity-robust version is fine here if you find it less confusing. The final test result happens to be unaffected.)
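If the first-stage covariance is stored as $\widehat{\Omega}/N$ (as `Avar_hat` is in the OLS exercise), the factor $N$ cancels: $F = (R'\widehat{\pi})'\left(R'\,(\widehat{\Omega}/N)\,R\right)^{-1}(R'\widehat{\pi})/L_2$. The sketch below computes this with an HC0 robust first-stage covariance, assuming a single endogenous regressor `X_2` (a vector) and $Z_1 = X_1$. The arguments `Y` and `X_1` are unused; they are kept only so the signature matches the one requested in Exercise 13.

```julia
using LinearAlgebra

"""
    lm_2sls_ftest(Y, X_1, X_2, Z_1, Z_2)

First-stage F-statistic for H₀: π₂ = 0 in X_2 = Z_1 π₁ + Z_2 π₂ + v,
using an HC0 robust covariance. `Y` and `X_1` are not needed.
"""
function lm_2sls_ftest(Y, X_1, X_2::AbstractVector, Z_1, Z_2)
    Z = hcat(Z_1, Z_2)
    K_1 = size(Z_1, 2)                       # columns of Z_1 (= X_1)
    L_2 = size(Z_2, 2)                       # number of excluded instruments
    # first-stage OLS and its robust covariance, i.e. Ω̂/N
    ZtZ_inv = inv(Z' * Z)
    pi_hat = ZtZ_inv * (Z' * X_2)
    v_hat = X_2 - Z * pi_hat
    Avar_hat = ZtZ_inv * (Z' * (Z .* v_hat .^ 2)) * ZtZ_inv
    # R = (0_{L_2×K_1}, I_{L_2})' selects π₂ from π
    R = vcat(zeros(K_1, L_2), Matrix(1.0I, L_2, L_2))
    r = R' * pi_hat
    # N (R'π̂)'(R'Ω̂R)⁻¹(R'π̂) / L_2, written with Avar_hat = Ω̂/N
    return dot(r, (R' * Avar_hat * R) \ r) / L_2
end
```

With a single excluded instrument ($L_2 = 1$), this F-statistic equals the squared robust t-statistic of $\widehat{\pi}_2$ from the first-stage regression.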




## Exercise 14

Calculate the F-statistic.