Last week we learned about Stock & Yogo's idea to define weak instruments.
Brief summary:
Instrument strength is determined by the first stage F (regression of X on Z)
A larger F implies, ceteris paribus, a stronger instrument
But when exactly is F large enough for an IV to be strong?
Idea: When the empirical size of the underlying t-test isn't too bad
In a textbook setting, a well behaved t-test rejects a true null 5% of the time
In a weak IV setting, this percentage might be higher
Let's say we don't want that percentage to be more than 15%
We saw that this roughly translates to the rule of thumb that first stage sample F must be at least 10
If first stage sample F is larger than 10, then we are confident that the empirical size of the t-test is lower than 15%
Keane and Neal point out that the story gets more complicated if one also pays attention to the power function
While it's great to have a test with good size properties, we would also like the power of the test to be high
Recall: the power is defined as the probability to reject the null when it is false
Let's see how we can study power in our toy DGP
We learned a lot about true (population) F, sample F, and also the degree of endogeneity.
Keane & Neal offer a table that translates between population and sample F:
| population F | sample F |
|---|---|
| 1.82 | 8.96 |
| 2.30 | 10.00 |
| 5.78 | 16.38 |
| 10.00 | 23.10 |
| 29.44 | 50.00 |
| 73.75 | 104.70 |
(Table 1 of Keane & Neal 2024)
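A quick numerical observation about this table, offered as a sketch rather than as the paper's own formula: the sample F column is reproduced closely by $(\sqrt{F_{\text{pop}}} + 1.645)^2$. That is what one would expect if sample F behaves approximately like a noncentral $\chi^2_1$ variable with noncentrality equal to population F and the table reports its 95th percentile. Both the formula and that reading are assumptions here; the variable names below are made up for illustration.

```julia
# Population F values from the table above (Keane & Neal 2024, Table 1)
pop_F = [1.82, 2.30, 5.78, 10.00, 29.44, 73.75]
# Sample F values reported in the same table
table_sample_F = [8.96, 10.00, 16.38, 23.10, 50.00, 104.70]

# Assumption (not taken from the paper): sample F = (sqrt(popF) + 1.645)^2,
# where 1.645 is the one-sided 5% critical value of the standard normal
approx_sample_F = (sqrt.(pop_F) .+ 1.645) .^ 2

for (f, s, a) in zip(pop_F, table_sample_F, approx_sample_F)
    println("popF = ", f, "  table = ", s, "  approximation = ", round(a, digits=2))
end
```

The approximation agrees with every table entry to within a few hundredths, which makes it a handy way to translate other population F values.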
At the same time, we learned that values for $\rho$ that are of practical relevance fall somewhere between zero and 0.50. Therefore, for our simulations we will focus on
| values for $\rho$ that are of practical relevance |
|---|
| 0.00 (no endogeneity) |
| 0.10 |
| 0.30 |
| 0.50 |
We will restrict ourselves to the toy model of Keane & Neal (2024).
Their DGP is on page 193, summarized here:
$$ \begin{align*} Y_i &= \beta X_i + u_i\\ X_i &= \pi Z_i + v_i\\ v_i &= \rho u_i + \sqrt{1-\rho^2} \eta_i \end{align*} $$where
$u_i \sim N(0,1)$
$\eta_i \sim N(0,1)$
$Z_i \sim N(0,1)$
$\beta = 0$
notice: $\text{Var}(v_i) = 1$
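The variance claim can be checked in one line, using only that $u_i$ and $\eta_i$ are independent with unit variance. The same calculation shows why $\rho$ measures the degree of endogeneity: it is the correlation between the two error terms.

$$ \begin{align*} \text{Var}(v_i) &= \rho^2 \text{Var}(u_i) + (1-\rho^2) \text{Var}(\eta_i) = \rho^2 + (1-\rho^2) = 1\\ \text{Cov}(u_i, v_i) &= \rho \text{Var}(u_i) = \rho \end{align*} $$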
To be able to create data sets off the above DGP we need to know values for $\pi$ and $\rho$.
We determine $\pi$ in a roundabout way.
Recall the definition of F from lecture 6: F is $N$ times the ratio of the proportion of the variance of $X$ that is explained by $Z$ to the proportion of the variance of $X$ that is explained by $v$:
$$ F = N \cdot \frac{R^2}{1-R^2} = N \cdot \frac{\text{ESS}}{\text{RSS}} = N \cdot \frac{\text{Var}(Z\pi)}{\sigma_v^2} = N \cdot \frac{\pi^2 \text{Var}(Z)}{\sigma_v^2} = N \cdot \pi^2 $$This allows us to set $\pi$ in the above DGP by changing the value of $F$, because $\pi = \sqrt{F/N}$.
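As a small illustration of this mapping, the sketch below converts the population F values used later in these notes into $\pi$, assuming $N = 1000$ (the default sample size of the simulation code). The names `pop_F` and `pi_values` are chosen here for illustration only.

```julia
n = 1000                                   # sample size, assumed equal to the simulation default
pop_F = [1.82, 2.30, 10.00, 29.44, 73.75]  # population F values used in the simulations

# pi = sqrt(F / N), which follows from F = N * pi^2 when Var(Z) = Var(v) = 1
pi_values = sqrt.(pop_F ./ n)

for (f, p) in zip(pop_F, pi_values)
    println("popF = ", f, "  =>  pi = ", round(p, digits=4))
end
```

Even the strongest instrument considered here (popF = 73.75) corresponds to a first-stage coefficient of only about 0.27.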
I'm dumping a bunch of functions that will be needed for the computer simulations.
using Distributions, Random, Statistics
function dgp_keane_neal(; b=0, n=1000, F, rho)
"""
Generates one sample of size n following the DGP of Keane & Neal (2024) page 193
### Input
- `b` -- structural coefficient beta (scalar)
- `n` -- sample size
- `F` -- first-stage population F-stat
- `rho` -- degree of endogeneity
### Output
- `x` -- (n by 1) vector representing endogenous regressor
- `y` -- (n by 1) vector representing outcome variable
- `z` -- (n by 1) vector representing instrumental variable
"""
p = sqrt(F/n)
u = rand(Normal(0, 1), n)
eta = rand(Normal(0, 1), n)
z = rand(Normal(0, 1), n)
v = rho*u + sqrt(1-rho^2)*eta
x = p*z .+ v
y = b*x .+ u
return (; x, y, z)
end
function ols_estimator(x, y)
"""
Implements OLS estimation of linear model with one exogenous regressor.
### Input
- `x` -- (n by 1) vector representing regressor
- `y` -- (n by 1) vector representing outcome variable
### Output
- `bhat` -- OLS estimate of beta
- `se` -- standard error of OLS estimate
- `t` -- t-statistic of OLS estimator
"""
# OLS estimator
bhat= x\y
# standard error
uhat = y-x*bhat
s = uhat'uhat/length(y)
se= sqrt(s/(x'x))
# t-statistic (signed; take the absolute value when testing)
t = bhat/se
return (; bhat, se, t) # returning named tuple
end
function iv_estimator(x, y, z)
"""
Implements IV estimation of linear model with one endogenous variable, and one instrument.
### Input
- `x` -- (n by 1) vector representing endogenous regressor
- `y` -- (n by 1) vector representing outcome variable
- `z` -- (n by 1) vector representing instrumental variable
### Output
- `bhat` -- IV estimate of beta
- `se` -- standard error of IV estimate
- `t` -- t-statistic of IV estimator
### Notes
For calculation of standard error, we're using the formula on page 190 of Keane & Neal (2024)
"""
# IV estimator
bhat= (z'y)/(x'z)
# standard error (using formula in Keane & Neal (2024))
n = length(y)
pihat = z\x # reduced form coefficient estimate
TSS = n*pihat^2*var(z)
uhat = y-x*bhat
s = uhat'*uhat/n
se = sqrt(s/TSS)
# t-statistic (signed; take the absolute value when testing)
t = bhat/se
return (; bhat, se, t) # returning `named tuple`
end
function simulate_distribution(; b=0, F, rho, rep=10000)
"""
Creates finite sample distributions of
- OLS and IV estimators,
- their standard errors,
- their tstats, and the AR-statistic
How does it create finite sample distributions? It draws `rep` samples from the DGP and each time
calculates the estimators, their standard errors, and tstats.
### Input
- `b` -- structural coefficient beta (scalar)
- `F` -- first-stage population F-stat
- `rho` -- degree of endogeneity
- `rep` -- number of repetitions/simulations run
### Output
- `bols_dst` -- (rep by 1) vector collecting rep simulations of OLS estimator
- `sols_dst` -- (rep by 1) vector collecting rep simulations of standard error of OLS estimator
- `tols_dst` -- (rep by 1) vector collecting rep simulations of tstat of OLS estimator
- `biv_dst` -- (rep by 1) vector collecting rep simulations of IV estimator
- `siv_dst` -- (rep by 1) vector collecting rep simulations of standard error of IV estimator
- `tiv_dst` -- (rep by 1) vector collecting rep simulations of tstat of IV estimator
- `ar_dst` -- (rep by 1) vector collecting rep simulations of AR-statistic
"""
bols_dst = Array{Float64}(undef, rep)
sols_dst = Array{Float64}(undef, rep)
tols_dst = Array{Float64}(undef, rep)
biv_dst = Array{Float64}(undef, rep)
siv_dst = Array{Float64}(undef, rep)
tiv_dst = Array{Float64}(undef, rep)
ar_dst = Array{Float64}(undef, rep)
for i = 1:rep
x, y, z = dgp_keane_neal(b=b, F=F, rho=rho)
# calculating simulated distribution for bols, seols, and tols
bols_dst[i], sols_dst[i], tols_dst[i] = ols_estimator(x, y)
# calculating simulated distribution for biv, seiv, and tiv
biv_dst[i], siv_dst[i], tiv_dst[i] = iv_estimator(x, y, z)
# calculating AR statistic
# by regressing Y on Z
ar_dst[i] = ols_estimator(z, y).t
end
return (; bols_dst, sols_dst, tols_dst, biv_dst, siv_dst, tiv_dst, ar_dst)
end
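I'm creating two containers that store the simulated distributions for my data generating processes:
dgp_zero: contains estimates, standard errors, and t-stats from 10,000 samples generated from a DGP in which $\rho=0$ and $F=73.75$
dgps: contains the same quantities from 10,000 samples each for 15 parameter combinations of $\rho$ and $F$ (3 values of $\rho$ times 5 values of $F$)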
# no endogeneity DGP, with strong IV
dgp_zero = simulate_distribution(rho=0, F=73.75)
# all other DGPS
parms_rho = (0.10, 0.30, 0.50)
parms_F = (1.82, 2.30, 10, 29.44, 73.75)
dgps = [simulate_distribution(rho = rho, F = F) for rho in parms_rho, F in parms_F];
Let's first start with simple histograms of IV standard errors under different parameter combinations
using Plots
using Plots.PlotMeasures: mm
Plots.theme(:wong2)
plt = plot(
layout=(length(parms_rho),length(parms_F)),
size = (1800, 800),
plot_title = "Empirical Distribution of standard errors (truncated at 90th percentile) for Different DGPs")
[histogram!(plt,
dgps[i,j].siv_dst,
normalize = true,
subplot = length(parms_F)*(i-1)+j,
bins = range(0, quantile(dgps[i, j].siv_dst, 0.90), length=51),
legend=false,
title = "\\rho = $rho and popF = $F")
for (i, rho) in enumerate(parms_rho), (j, F) in enumerate(parms_F)]
display(plt)
How do we read the above picture?
Again, the top right picture offers the benchmark against which to compare the other histograms.
You can see in the weak IV case (bottom left four pictures) that the tails are quite long. In fact, they would look even longer if I hadn't restricted the histogram range to only reach the 90th percentile!
What next? Keane & Neal had the nice idea to plot $\hat{\beta}_\text{IV}$ against its standard error. Let's do that!
But before we look at IV, let's look at the best possible scenario: OLS under $\rho=0$ (no endogeneity) with large variance for X
I'm using the container dgp_zero created earlier for this exercise
using Plots
using Plots.PlotMeasures: mm
Plots.theme(:wong2)
plot(dgp_zero.bols_dst, dgp_zero.sols_dst,
size=(800,600),
seriestype=:scatter,
legend=false,
title = "OLS Estimates vs Their Standard errors \\n (with \\rho = 0 and popF = 73.75)")
xlabel!("OLS estimate")
ylabel!("Standard error")
plot!(dgp_zero.bols_dst[abs.(dgp_zero.tols_dst).>1.96], dgp_zero.sols_dst[abs.(dgp_zero.tols_dst).>1.96], seriestype=:scatter, mc=:red)
plot!([0, 4], [0, 2], seriestype=:straightline, lc=:blue, linestyle=:dash)
plot!([0, 4], [0, -2], seriestype=:straightline, lc=:blue, linestyle=:dash)
Interpretation of above picture:
These are 10,000 combinations of $\hat{\beta}_{\text{OLS}}$ vs its standard error
In all samples the true value of $\beta$ was zero
The OLS estimates range (roughly) from about -0.10 to about 0.10
At first glance, these OLS estimates appear close to zero
Of course we have to look at their standard errors to decide on precision
Their standard errors range (roughly) from about 0.028 to 0.034
The red dots indicate OLS estimates for which the OLS t-test rejects $H_0: \beta=0$
These tend to be OLS estimates that are too far away from zero
The blue lines have (absolute) slopes of 1/2, which is approximately 1/1.96, and effectively demarcate insignificant dots from significant dots
Important: There's no clear association between OLS estimates and their standard errors!
If you correlate the two, you obtain a value near zero:
using Statistics
cor(dgp_zero.bols_dst, dgp_zero.sols_dst)
0.005275048579007424
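A short derivation of where the blue demarcation lines in the scatter plot come from. The t-test rejects when the absolute t-statistic exceeds 1.96, which can be rearranged into a condition on the standard error:

$$ |t| = \frac{|\hat{\beta}|}{\text{se}(\hat{\beta})} > 1.96 \quad \Longleftrightarrow \quad \text{se}(\hat{\beta}) < \frac{|\hat{\beta}|}{1.96} \approx 0.51 \cdot |\hat{\beta}| $$

The plotting code draws lines through $(0,0)$ and $(4, \pm 2)$, that is, with slopes $\pm 0.5$, which is a rounded version of $\pm 1/1.96$. Dots below these lines are the significant (red) ones.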
Now let's do this for the IV estimator under different parameter combinations
We start with the best case: low endogeneity, and strong IV
using Plots
using Plots.PlotMeasures: mm
Plots.theme(:wong2)
plot(dgps[1,5].biv_dst, dgps[1,5].siv_dst,
size=(800,600),
seriestype=:scatter,
legend=false,
title = "IV Estimates vs Their Standard errors \\n (with \\rho = 0.1 and popF = 73.75)")
xlabel!("IV estimate")
ylabel!("Standard error")
plot!(dgps[1,5].biv_dst[abs.(dgps[1,5].tiv_dst).>1.96], dgps[1,5].siv_dst[abs.(dgps[1,5].tiv_dst).>1.96], seriestype=:scatter, mc=:red)
plot!([0, 4], [0, 2], seriestype=:straightline, lc=:blue, linestyle=:dash)
plot!([0, 4], [0, -2], seriestype=:straightline, lc=:blue, linestyle=:dash)
This case looks all right, but note that everything is spread out much more (wider ranges for IV estimates and also for standard errors)
Now the worst case: $\rho = 0.5$ and $F=1.82$
using Plots
using Plots.PlotMeasures: mm
Plots.theme(:wong2)
plot(dgps[3,1].biv_dst, dgps[3,1].siv_dst,
size=(800,600),
seriestype=:scatter,
legend=false,
title = "IV Estimates vs Their Standard errors \\n (with \\rho = 0.5 and popF = 1.82)")
xlabel!("IV estimate")
ylabel!("Standard error")
plot!(dgps[3,1].biv_dst[abs.(dgps[3,1].tiv_dst).>1.96], dgps[3,1].siv_dst[abs.(dgps[3,1].tiv_dst).>1.96], seriestype=:scatter, mc=:red)
plot!([0, 4], [0, 2], seriestype=:straightline, lc=:blue, linestyle=:dot)
plot!([0, 4], [0, -2], seriestype=:straightline, lc=:blue, linestyle=:dot)
using Plots
using Plots.PlotMeasures: mm
Plots.theme(:wong2)
plot(dgps[3,1].biv_dst, dgps[3,1].siv_dst,
size=(800,600),
xlims = (-4,4),
ylims = (0, min(4, quantile(dgps[3,1].siv_dst, 0.99))),
seriestype=:scatter,
legend=false,
title = "IV Estimates vs Their Standard errors \\n (with \\rho = 0.5 and popF = 1.82, outliers removed)")
xlabel!("IV estimate")
ylabel!("Standard error")
plot!(dgps[3,1].biv_dst[abs.(dgps[3,1].tiv_dst).>1.96], dgps[3,1].siv_dst[abs.(dgps[3,1].tiv_dst).>1.96], seriestype=:scatter, mc=:red)
plot!([0, 4], [0, 2], seriestype=:straightline, lc=:blue, linestyle=:dash)
plot!([0, 4], [0, -2], seriestype=:straightline, lc=:blue, linestyle=:dash)
vline!([0.5, 0.5], lw=3, lc=:red, linestyle=:dot)
The above is what the scatter plot looks like when the outliers are removed
Is this a good plot?
No! There's now an evident negative association between the IV estimates and their standard errors!
The correlation in this picture is about $-0.6$
Notice: the red dotted vertical line marks the OLS bias (approximately equal to $\rho=0.5$)
Again, red dots are IV estimates that lead to rejection of $H_0: \beta=0$
The dots are red only for positive IV estimates!
Keane & Neal refer to this as power asymmetry
Roughly speaking: small standard errors are associated with larger IV estimates (i.e., IV estimates that are close to the OLS bias)
This means that these values are more likely to lead to rejection of $H_0: \beta=0$
Example: an IV estimate of $+0.5$ (equal to the degree of endogeneity) is more likely to have a low standard error than an IV estimate of $-0.5$
Conversely: negative estimates are less likely to lead to rejection
In contrast, the case of OLS under zero endogeneity illustrates that, ideally, there should be symmetry
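The asymmetry described above can be put into a single number: among all samples in which the IV t-test rejects $H_0: \beta=0$, what share has a positive estimate? The following is a minimal, self-contained sketch that re-implements the toy DGP and the IV formulas of these notes with `randn` only. The function name `iv_draw`, the seed, and the number of repetitions (5,000) are choices made here for illustration.

```julia
using Random, Statistics

# One IV estimate and its standard error from the toy DGP (true beta = 0)
function iv_draw(rng; n=1000, F=1.82, rho=0.5)
    u, eta, z = randn(rng, n), randn(rng, n), randn(rng, n)
    x = sqrt(F / n) .* z .+ rho .* u .+ sqrt(1 - rho^2) .* eta
    y = u                                  # beta = 0, so Y = u
    bhat = (z'y) / (z'x)                   # IV estimate
    uhat = y .- x .* bhat                  # IV residuals
    pihat = (z'x) / (z'z)                  # first-stage coefficient
    se = sqrt((uhat'uhat / n) / (n * pihat^2 * var(z)))
    return bhat, se
end

rng = MersenneTwister(2024)                # arbitrary seed, for reproducibility only
draws = [iv_draw(rng) for _ in 1:5000]
bhat = first.(draws)
se = last.(draws)

reject = abs.(bhat ./ se) .> 1.96          # samples in which the t-test rejects
n_reject = count(reject)
share_positive = count(bhat[reject] .> 0) / max(n_reject, 1)

println("rejections: ", n_reject, " of ", length(draws))
println("share of rejections with a positive estimate: ", round(share_positive, digits=3))
```

Under symmetry this share would be close to one half; in the weak IV case with $\rho=0.5$ it should come out far above that.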
Keane & Neal explain the source for this asymmetry
Let's take a look
Recall the DGP (everything is scalar)
$$ \begin{align*} Y_i &= \beta X_i + u_i\\ X_i &= \pi Z_i + v_i\\ \end{align*} $$The IV estimator is
$$ \hat{\beta}_{\text{IV}} = \frac{s_{ZY}}{s_{ZX}} $$(sample covariance between Z and Y divided by sample covariance between Z and X)
Alternatively, substituting $Y_i = \beta X_i + u_i$ into $s_{ZY}$:
$$ \hat{\beta}_{\text{IV}} = \beta + \frac{s_{Zu}}{s_{ZX}} $$
Now, Keane & Neal argue like this:
In our DGP, $\sigma_{ZX} > 0$ (because $\pi >0$), which translates to $s_{ZX}>0$ in the vast majority of samples. Therefore, let's simply assume $s_{ZX}>0$.
Furthermore, $\rho > 0$ in our examples
Now, for which samples do we obtain $\hat{\beta}_{\text{IV}} \geq \beta$?
It must be the case that $s_{Zu} \geq 0$
At the same time, positive values for $s_{Zu}$ drive down the standard error of the IV estimate
To see this, relate the two errors via projection: $v = \rho u + \eta$ (here $\eta$ absorbs the factor $\sqrt{1-\rho^2}$ from the DGP) and obtain
$$ s_{ZX} = \pi s_Z^2 + \rho s_{Zu} + s_{Z \eta} \approx \pi s_Z^2 + \rho s_{Zu} $$
(we're chopping off $s_{Z \eta}$ to keep things simple)
From Keane & Neal
$$ \text{se}(\hat{\beta}_{\text{IV}}) = \frac{s_u}{\sqrt{ESS_{X,Z}}} $$ where $s_u$ is the estimated standard deviation of the residuals and $ESS_{X,Z} := N \cdot s_{ZX}^2/s_Z^2 \approx N \cdot \left( \pi s_Z^2 + \rho s_{Zu} \right)^2 / s_Z^2$
(they write TSS, but it's actually the ESS)
Therefore, the standard error is decreasing in $s_{Zu}$
In conclusion, two things happen simultaneously when $s_{Zu}$ is large:
the IV estimator is biased in the direction of OLS
its standard error is spuriously small
This means that the underlying t-test rejects more often
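The claim that the standard error falls as $s_{Zu}$ rises can be checked directly by simulation. The sketch below compares the median IV standard error across samples with positive and with negative $s_{Zu}$ in the worst-case design ($\rho=0.5$, popF $=1.82$). It is a stand-alone illustration: the function name `se_by_sign`, the seed, and the use of medians (chosen because the standard errors have long tails) are assumptions made here.

```julia
using Random, Statistics

# Median IV standard error, split by the sign of the sample covariance s_Zu
function se_by_sign(rng; n=1000, F=1.82, rho=0.5, reps=5000)
    s_zu = zeros(reps)                     # sample covariance between Z and u
    se = zeros(reps)                       # IV standard error
    for r in 1:reps
        u, eta, z = randn(rng, n), randn(rng, n), randn(rng, n)
        x = sqrt(F / n) .* z .+ rho .* u .+ sqrt(1 - rho^2) .* eta
        bhat = (z'u) / (z'x)               # true beta = 0, so Y = u
        uhat = u .- x .* bhat
        pihat = (z'x) / (z'z)
        s_zu[r] = cov(z, u)
        se[r] = sqrt((uhat'uhat / n) / (n * pihat^2 * var(z)))
    end
    return median(se[s_zu .> 0]), median(se[s_zu .< 0])
end

med_pos, med_neg = se_by_sign(MersenneTwister(2024))   # arbitrary seed
println("median se when s_Zu > 0: ", round(med_pos, digits=3))
println("median se when s_Zu < 0: ", round(med_neg, digits=3))
```

If the argument above is right, the first median should be clearly smaller than the second.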
Here are the scatter plots for some other parameter combinations:
using Plots
using Plots.PlotMeasures: mm
Plots.theme(:wong2)
plt = plot(layout=(length(parms_rho),length(parms_F)), size = (1800, 800),
plot_title = "Plot of IV Estimator vs its Standard Error for different DGPs")
for (i, rho) in enumerate(parms_rho)
for (j, F) in enumerate(parms_F)
k = length(parms_F)*(i-1)+j # subplot counter
plot!(plt,
dgps[i,j].biv_dst, dgps[i,j].siv_dst,
seriestype=:scatter,
subplot = k,
ylims = (0, min(4, quantile(dgps[i,j].siv_dst, 0.99))),
legend=false,
title = "\\rho = $rho and popF = $F")
plot!(dgps[i,j].biv_dst[abs.(dgps[i,j].tiv_dst).>1.96], dgps[i,j].siv_dst[abs.(dgps[i,j].tiv_dst).>1.96], seriestype=:scatter, subplot = k, mc=:red)
plot!([0, 4], [0, 2], seriestype=:straightline, subplot = k, lc=:blue, linestyle=:dash)
plot!([0, 4], [0, -2], seriestype=:straightline, subplot = k, lc=:blue, linestyle=:dash)
vline!([rho, rho], subplot = k, lw=3, lc=:red, linestyle=:dot)
xlims!(-4,4)
end
end
display(plt)
The above discussion suggests that we have low standard errors for some samples
Superficially, low standard errors sound like a good thing!
Doesn't it mean that our estimates are precise?
Keane & Neal argue that they are spuriously precise
Put differently: this spurious precision comes at the price of spurious imprecision for other samples
Let's look at power functions
Recall: the power is the probability to reject the null when it is false
Example: We keep conducting the same hypothesis test $H_0: \beta=0$
Last week, when we looked at the statistical size, we studied the probability of rejecting the null when it is correct
Suppose the data are generated with $\beta=0.3$ (instead of $\beta=0$)
We still conduct the test $H_0: \beta=0$
We would like to reject that hypothesis
We would like the probability of this event to be maximal
Luckily, we can simulate this too!
Our viewpoint changes somewhat: now we create 10,000 samples for each value of the structural coefficient $\beta$ on a grid between $-1$ and $1$
To be precise:
fix $\rho$ and $F$ at an interesting value
fix $\beta=-1$
generate 10,000 samples, obtain 10,000 estimates, standard errors, and t-stats
Study them
fix $\beta=-0.9$
generate another 10,000 samples
and so on
function power_function(; brange=-1.00:0.10:1.00, F, rho)
"""
Calculates statistical power (probability to reject null hypothesis H0: truebeta = 0)
when the underlying true beta ranges in values determined by brange.
### Input
- `brange` -- range of values for true beta used in DGP creation
- `F` -- first stage population F-stat
- `rho` -- degree of endogeneity
### Output
- `brange` -- range of values for true beta used in DGP
- `power_tols` -- power function for OLS t-test
- `power_tiv` -- power function for IV t-test
- `power_ar` -- power function for AR-test
"""
power_tols = similar(brange)
power_tiv = similar(brange)
power_ar = similar(brange)
for (i, b) in enumerate(brange)
simdst = simulate_distribution(b=b, F=F, rho=rho)
power_tols[i] = mean(abs.(simdst.tols_dst) .> 1.96)
power_tiv[i] = mean(abs.(simdst.tiv_dst) .> 1.96)
power_ar[i] = mean(abs.(simdst.ar_dst) .> 1.96)
end
return brange, power_tols, power_tiv, power_ar
end
using Plots
using Plots.PlotMeasures: mm
Plots.theme(:wong2)
brange, pow_ols, pow_iv, pow_ar = power_function(brange=-1:0.010:1,F=73.75, rho=0)
plot(brange, [pow_ols, pow_iv],
size=(800,600),
xticks=-1:0.1:1,
label=["OLS" "IV"],
linewidth=3,
linestyle=[:solid :dash],
linecolor=:black,
legend=:bottomright,
title="Empirical Power Curves Compared: \\rho=0 and popF=73.75")
hline!([0.05, 0.05], linestyle=:dash, label=false)
ylims!(0,1)
xlabel!("True \\beta")
What does this picture show?
Let's focus on the solid line (power curve for OLS)
It shows the probability to reject $H_0: \beta=0$ for various true values of $\beta$
For example, look at $\beta=0.1$
The OLS power at that value is about 90%
What does this mean? It means:
If the true structural coefficient that generated the data is $\beta=0.10$, then the probability to reject $H_0: \beta=0$ is 90%
It makes sense that the farther out your true $\beta$ is, the higher the probability to reject
Also, when true $\beta=0$, then the power coincides with the size (which here is close to 5%)
Now, look at the dashed line (power curve for IV)
That power curve lies below the OLS curve
For example, the probability to reject $H_0:\beta=0$ when $\beta=0.10$ is only about 10% (much lower than OLS)!
This demonstrates that you should always use OLS when there's no endogeneity!
Are our estimators good at detecting large effects?
Keane & Neal stipulate that a value for true $\beta$ of 0.20 is quite large ("This is a large effect in typical empirical applications")
What makes them say this?
Well, in our DGP, if X increases by one standard deviation, then $Y$ increases by $\beta$ standard deviations (ceteris paribus)
This is indeed large considering average effect sizes in the economics literature
Let's presume that the true model has such a large effect of X on Y (that is, $\beta=0.20$)
How good are OLS and IV at distinguishing this from zero?
OLS power is 100% meaning you're very likely to detect such a large effect as real
IV power is around 40% meaning you're much less likely to conclude that a meaningful effect is present
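As a rough cross-check of the power numbers read off the plot, one can use the standard large-sample normal approximation for a two-sided 5% t-test. This is an assumption brought in here, not something computed in these notes; it should work reasonably well because the instrument is strong (popF $=73.75$) and $\rho=0$:

$$ \text{power}(\beta) \approx \Phi\left(\frac{\beta}{\text{se}} - 1.96\right) + \Phi\left(-\frac{\beta}{\text{se}} - 1.96\right) $$

With $N=1000$, $\sigma_u = 1$, $\text{Var}(X) = 1 + \pi^2 = 1 + F/N$, and $N\pi^2 = F$, the approximate standard errors are

$$ \text{se}_{\text{OLS}} \approx \frac{1}{\sqrt{N + F}} = \frac{1}{\sqrt{1073.75}} \approx 0.0305, \qquad \text{se}_{\text{IV}} \approx \frac{1}{\sqrt{F}} = \frac{1}{\sqrt{73.75}} \approx 0.116 $$

so that

$$ \begin{align*} \text{OLS}, \beta = 0.10: &\quad \Phi(3.28 - 1.96) = \Phi(1.32) \approx 0.91\\ \text{IV}, \beta = 0.20: &\quad \Phi(1.72 - 1.96) = \Phi(-0.24) \approx 0.40 \end{align*} $$

(the second term of the approximation is negligible in both cases). Both values line up with the simulated power curves.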
Now back to our power curves
What do the power curves look like when there is endogeneity?
using Plots
using Plots.PlotMeasures: mm
Plots.theme(:wong2)
plt = plot(layout=(length(parms_rho),length(parms_F)), size = (1800, 800),
plot_title = "Empirical Power Curves: OLS (solid) vs IV (dashed) (empirical size based on IV)")
for (i, rho) in enumerate(parms_rho)
for (j, F) in enumerate(parms_F)
k = length(parms_F)*(i-1)+j # subplot counter
brange, pow_ols, pow_iv, pow_ar = power_function(F=F, rho=rho)
size = round(100 * pow_iv[brange.==0][], digits=3)
plot!(plt,
brange, [pow_ols, pow_iv],
label=["OLS" "IV"],
linestyle=[:solid :dash],
linecolor=:black,
legend=false,
subplot=k,
margin=5mm,
title = "\\rho = $rho and popF = $F \\n Empirical size = $size %")
hline!([0.05, 0.05], linestyle=:dot, subplot=k, legend=false)
ylims!(0,1)
xlabel!("True \\beta")
end
end
display(plt)
The power curves above only look acceptable for the largest population F value
Let's focus on the IV based t-test (dashed line):
Even population F values that are consistent with the rule of thumb (popF=2.30 or popF=10.00) produce low power
This is true even when the degree of endogeneity is low!
The empirical sizes can be poor under weak IV
The OLS based t-test is worse:
It looks like it has high power, but the whole curve is shifted left (this is a product of OLS bias)
Consider the case $\rho=0.3$ and $F=10$
The power to reject $H_0: \beta=0$ for any true value of $\beta>0$ is 100%
While this sounds good, it comes at the cost of having very low power when true $\beta$ is negative
The power to reject $H_0: \beta=0$ for any true value of $\beta$ in the interval $(-0.5,0)$ can be very low!
Keane & Neal suggest a simple remedy to the power asymmetry problem
Simply run an OLS regression of $Y$ on $Z$ and obtain the t-statistic from this exercise
When we do this, we obtain the following picture:
using Plots
using Plots.PlotMeasures: mm
Plots.theme(:wong2)
plt = plot(layout=(length(parms_rho),length(parms_F)), size = (1800, 800),
plot_title = "Empirical Power Curves for: AR (solid) vs IV (dashed) (empirical size based on AR)")
for (i, rho) in enumerate(parms_rho)
for (j, F) in enumerate(parms_F)
k = length(parms_F)*(i-1)+j # subplot counter
brange, pow_ols, pow_iv, pow_ar = power_function(F=F, rho=rho)
size = round(100 * pow_ar[brange.==0][], digits=3)
plot!(plt,
brange, [pow_ar, pow_iv],
label=["AR" "IV"],
linestyle=[:solid :dash],
linecolor=:black,
legend=false,
subplot=k,
margin=5mm,
title = "\\rho = $rho and popF = $F \\n Empirical size = $size %")
hline!([0.05, 0.05], linestyle=:dot, subplot=k, legend=false)
ylims!(0,1)
xlabel!("True \\beta")
end
end
display(plt)
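Why does the regression of $Y$ on $Z$ test $H_0: \beta=0$? Substituting the first stage into the structural equation gives $Y_i = \beta \pi Z_i + (\beta v_i + u_i)$, so under the null the slope on $Z$ is zero whatever the value of $\pi$. The first-stage estimate never enters the test statistic, which is why a weak instrument does not distort its size. The sketch below checks the size in the worst-case design; it is self-contained, and the function names `slope_tstat` and `ar_size`, the seed, and the 5,000 repetitions are choices made here for illustration.

```julia
using Random

# t-statistic from regressing y on a single regressor w (no intercept),
# mirroring the OLS formulas used in these notes
function slope_tstat(w, y)
    bhat = (w'y) / (w'w)
    uhat = y .- w .* bhat
    se = sqrt((uhat'uhat / length(y)) / (w'w))
    return bhat / se
end

# Rejection rate of the AR test of H0: beta = 0 when beta = 0 is true
function ar_size(rng; n=1000, F=1.82, rho=0.5, reps=5000)
    rejections = 0
    for _ in 1:reps
        u, eta, z = randn(rng, n), randn(rng, n), randn(rng, n)
        x = sqrt(F / n) .* z .+ rho .* u .+ sqrt(1 - rho^2) .* eta
        y = 0.0 .* x .+ u                  # true beta = 0
        rejections += abs(slope_tstat(z, y)) > 1.96
    end
    return rejections / reps
end

size_ar = ar_size(MersenneTwister(2024))   # arbitrary seed
println("empirical size of AR test: ", round(100 * size_ar, digits=2), " %")
```

The printed size should be close to the nominal 5% even though the instrument is very weak here.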
So what have we learned in the last two lectures?
IV and 2SLS estimation can be very unreliable when instruments are weak
Even at low degrees of endogeneity, the power to detect a large effect is low when population F (and thus sample F) is small
The rule of thumb that sample F must exceed 10 is NOT sufficient (just look at the above picture, recall sample F is 10 when popF=2.30)!
Keane & Neal's recommendation:
do NOT use regular t-test whatsoever!
use AR test instead
sample F should be much larger than 10!
How much larger? Have a look at the above picture, even popF of 29.44 doesn't seem enough!