Lecture 8: Deeper Look at Weak Instruments: Keane and Neal (2024)¶

Summary of Stock & Yogo (2005) (Lecture 7)¶

Last week we learned about Stock & Yogo's idea to define weak instruments.

Brief summary:

Instrument strength is determined by the first stage F (regression of X on Z)

A larger F implies, ceteris paribus, a stronger instrument

But when exactly is F large enough for an IV to be strong?

Idea: When the empirical size of the underlying t-test isn't too bad

In a textbook setting, a well behaved t-test rejects a true null 5% of the time

In a weak IV setting, this percentage might be higher

Let's say we don't want that percentage to be more than 15%

We saw that this roughly translates to the rule of thumb that first stage sample F must be at least 10

If first stage sample F is larger than 10, then we are confident that the empirical size of the t-test is lower than 15%

Keane and Neal (2024)¶

Keane and Neal point out that the story gets more complicated if one also pays attention to the power function

While it's great to have a test with good size properties, we would also like the power of the test to be high

Recall: the power is defined as the probability to reject the null when it is false

Let's see how we can study power in our toy DGP

Practical Lessons¶

We learned a lot about true (population) F, sample F, and also the degree of endogeneity.

Keane & Neal offer a table that translates between population and sample F:

| population F | sample F |
|---|---|
| 1.82 | 8.96 |
| 2.30 | 10.00 |
| 5.78 | 16.38 |
| 10.00 | 23.10 |
| 29.44 | 50.00 |
| 73.75 | 104.70 |

(Table 1 of Keane & Neal 2024)

At the same time, we learned that values for $\rho$ that are of practical relevance fall somewhere between zero and 0.50. Therefore, for our simulations we will focus on

values for $\rho$ that are of practical relevance
0.00 (no endogeneity)
0.10
0.30
0.50

Data Generating Process (DGP)¶

We will restrict ourselves to the toy model of Keane & Neal (2024).

Their DGP is on page 193, summarized here:

$$ \begin{align*} Y_i &= \beta X_i + u_i\\ X_i &= \pi Z_i + v_i\\ v_i &= \rho u_i + \sqrt{1-\rho^2} \eta_i \end{align*} $$

where

  • $u_i \sim N(0,1)$

  • $\eta_i \sim N(0,1)$

  • $Z_i \sim N(0,1)$

  • $\beta = 0$

  • notice: $\text{Var}(v_i) = 1$
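As a quick check of the last point, and assuming (as the DGP implicitly does) that $u_i$ and $\eta_i$ are independent, the variance of $v_i$ and its covariance with $u_i$ work out as follows:

$$ \begin{align*} \text{Var}(v_i) &= \rho^2 \text{Var}(u_i) + (1-\rho^2) \text{Var}(\eta_i) = \rho^2 + (1-\rho^2) = 1\\ \text{Cov}(u_i, v_i) &= \rho \text{Var}(u_i) = \rho \end{align*} $$

So $\rho$ is the correlation between the two error terms, which is why it measures the degree of endogeneity.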

To be able to create data sets off the above DGP we need to know values for $\pi$ and $\rho$.

Determination of $\pi$ will follow a roundabout way.

Recall the definition of F from lecture 6: up to the factor $N$, F is the proportion of the variance of $X$ that is explained by $Z$ divided by the proportion of the variance of $X$ that is explained by $v$:

$$ F = N \cdot \frac{R^2}{1-R^2} = N \cdot \frac{\text{ESS}}{\text{RSS}} = N \cdot \frac{\text{Var}(Z\pi)}{\sigma_v^2} = N \cdot \frac{\pi^2 \text{Var}(Z)}{\sigma_v^2} = N \cdot \pi^2 $$

This allows us to set $\pi$ in the above DGP by changing the value of $F$, because $\pi = \sqrt{F/N}$.
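A minimal sketch of this mapping, assuming $N=1000$ (the default sample size in the functions below). The names `n`, `popF`, and `pis` are only for illustration:

```julia
n = 1000                                   # sample size (assumed; matches the default used below)
popF = (1.82, 2.30, 10.0, 29.44, 73.75)    # population F values (a subset of the table above)

# pi = sqrt(F/N) for each population F
pis = [sqrt(F / n) for F in popF]

for (F, p) in zip(popF, pis)
    println("popF = ", F, "  ->  pi = ", round(p, digits=4))
end
```

Even the largest population F in this list corresponds to a first stage coefficient below 0.3.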

  • Bias of OLS $\approx \rho$ (provided $\pi$ is small)
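A short derivation of this approximation, using $\text{Var}(Z_i)=\text{Var}(u_i)=\text{Var}(v_i)=1$, $\text{Cov}(u_i,v_i)=\rho$, and assuming that $Z_i$ is independent of both error terms:

$$ \begin{align*} \text{plim}\, \hat{\beta}_{\text{OLS}} - \beta &= \frac{\text{Cov}(X_i, u_i)}{\text{Var}(X_i)} = \frac{\pi \text{Cov}(Z_i, u_i) + \text{Cov}(v_i, u_i)}{\pi^2 \text{Var}(Z_i) + \text{Var}(v_i)} = \frac{\rho}{1+\pi^2} \end{align*} $$

With $\pi = \sqrt{F/N}$ this equals $\rho/(1+F/N)$, which is close to $\rho$ whenever $F$ is small relative to $N$.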

Julia Functions¶

I'm dumping a bunch of functions that will be needed for the computer simulations.

In [1]:
using Distributions, Random 

function dgp_keane_neal(; b=0, n=1000, F, rho)

    """
    Generates one sample of size n following the DGP of Keane & Neal (2024) page 193

    ### Input

    - `b`   -- structural coefficient beta (scalar)
    - `n`   -- sample size
    - `F`   -- reduced form F-stat 
    - `rho` -- degree of endogeneity

    ### Output

    - `x`   -- (n by 1) vector representing endogenous regressor
    - `y`   -- (n by 1) vector representing outcome variable
    - `z`   -- (n by 1) vector representing instrumental variable
    """

    p = sqrt(F/n)
    u = rand(Normal(0, 1), n)
    eta = rand(Normal(0, 1), n)
    z = rand(Normal(0, 1), n)
    v = rho*u + sqrt(1-rho^2)*eta
    x = p*z .+ v
    y = b*x .+ u

    return (; x, y, z)

end
dgp_keane_neal (generic function with 1 method)
In [2]:
function ols_estimator(x, y)

    """
    Implements OLS estimation of linear model with one exogenous regressor.

    ### Input

    - `x`   -- (n by 1) vector representing regressor
    - `y`   -- (n by 1) vector representing outcome variable

    ### Output

    - `bhat` --OLS estimate of beta
    - `se`   --standard error of OLS estimate
    - `t`    --t-statistic of OLS estimator
    """

        # OLS estimator
        bhat= x\y

        # standard error
        uhat = y-x*bhat
        s = uhat'uhat/length(y)
        se= sqrt(s/(x'x))

        # t-statistic (signed)
        t = bhat/se

        return (; bhat, se, t) # returning named tuple

end
ols_estimator (generic function with 1 method)
In [3]:
function iv_estimator(x, y, z)

    """
    Implements IV estimation of linear model with one endogenous variable, and one instrument.

    ### Input

    - `x`   -- (n by 1) vector representing endogenous regressor
    - `y`   -- (n by 1) vector representing outcome variable
    - `z`   -- (n by 1) vector representing instrumental variable

    ### Output

    - `bhat` -- IV estimate of beta
    - `se`   -- standard error of IV estimate
    - `t`    -- t-statistic of IV estimator
 
    ### Notes

    For calculation of standard error, we're using the formula on page 190 of Keane & Neal (2024)
    """

    # IV estimator
    bhat= (z'y)/(x'z)

    # standard error (using formula in Keane & Neal (2024))
    n = length(y)
    pihat = z\x         # reduced form coefficient estimate
    TSS = n*pihat^2*var(z)
    uhat = y-x*bhat
    s = uhat'*uhat/n
    se = sqrt(s/TSS)

    # t-statistic (signed)
    t = bhat/se
    
    return (; bhat, se, t) # returning `named tuple`

end
iv_estimator (generic function with 1 method)
In [4]:
function simulate_distribution(; b=0, F, rho, rep=10000)

    """
    Creates finite sample distributions of 
    - IV estimator,
    - standard error of IV estimator
    - tstat of IV estimator

    How does it create finite sample distribution? It draws `rep` samples from the DGP and each time
    calculates IV estimator, its standard error, and tstat.
    
    ### Input

    - `F`       -- reduced form F-stat 
    - `rho`     -- degree of endogeneity
    - `rep`     -- number of repetitions/simulations run

    ### Output
    
    - `bols_dst`    -- (rep by 1) vector collecting rep simulations of OLS estimator
    - `sols_dst`    -- (rep by 1) vector collecting rep simulations of standard error of OLS estimator
    - `tols_dst`    -- (rep by 1) vector collecting rep simulations of tstat of OLS estimator
    - `biv_dst`     -- (rep by 1) vector collecting rep simulations of IV estimator
    - `siv_dst`     -- (rep by 1) vector collecting rep simulations of standard error of IV estimator
    - `tiv_dst`     -- (rep by 1) vector collecting rep simulations of tstat of IV estimator
    - `ar_dst`      -- (rep by 1) vector collecting rep simulations of AR-statistic
    """

    bols_dst = Array{Float64}(undef, rep)
    sols_dst = Array{Float64}(undef, rep)
    tols_dst = Array{Float64}(undef, rep)

    biv_dst = Array{Float64}(undef, rep)
    siv_dst = Array{Float64}(undef, rep)
    tiv_dst = Array{Float64}(undef, rep)

    ar_dst = Array{Float64}(undef, rep)
    
    for i = 1:rep

        x, y, z = dgp_keane_neal(b=b, F=F, rho=rho)

        # calculating simulated distribution for bols, seols, and tols
        bols_dst[i], sols_dst[i], tols_dst[i] = ols_estimator(x, y)

        # calculating simulated distribution for biv, seiv, and tiv
        biv_dst[i], siv_dst[i], tiv_dst[i] = iv_estimator(x, y, z)

        # calculating AR statistic
        # by regressing Y on Z
        ar_dst[i] = ols_estimator(z, y).t

    end
    
    return (; bols_dst, sols_dst, tols_dst, biv_dst, siv_dst, tiv_dst, ar_dst)

end
simulate_distribution (generic function with 1 method)

Creating DGPs¶

I'm creating two containers that store my data generating processes:

  • dgp_zero: contains 10,000 samples generated from a DGP in which $\rho=0$ and $F=73.75$

  • dgps: contains 10,000 samples each for 15 parameter combinations of $\rho$ and $F$ (3 values of $\rho$ times 5 values of $F$)

In [5]:
# no endogeneity DGP, with strong IV
dgp_zero = simulate_distribution(rho=0, F=73.75)

# all other DGPS
parms_rho = (0.10, 0.30, 0.50)
parms_F = (1.82, 2.30, 10, 29.44, 73.75)
dgps = [simulate_distribution(rho = rho, F = F) for rho in parms_rho, F in parms_F];

Standard Errors¶

Let's first start with simple histograms of IV standard errors under different parameter combinations

In [6]:
using Plots
using Plots.PlotMeasures: mm
Plots.theme(:wong2)

plt = plot(
    layout=(length(parms_rho),length(parms_F)), 
    size = (1800, 800), 
    plot_title = "Empirical Distribution of standard errors (truncated at 90th percentile) for Different DGPs")

[histogram!(plt, 
        dgps[i,j].siv_dst,
        normalize = true, 
        subplot = length(parms_F)*(i-1)+j,
        bins = range(0, quantile(dgps[i, j].siv_dst, 0.90), length=51),
        legend=false, 
        title = "\\rho = $rho and popF = $F")
        for (i, rho) in enumerate(parms_rho), (j, F) in enumerate(parms_F)]
display(plt)

How do we read the above picture?

Again, the top right picture offers the benchmark against which to compare the other histograms.

You can see that in the weak IV cases (the four pictures at the bottom left) the tails are quite long. In fact, they would look even longer if I hadn't truncated the histogram range at the 90th percentile!

Plotting IV Estimates vs Their Standard Errors¶

What next? Keane & Neal had the nice idea to plot $\hat{\beta}_\text{IV}$ against its standard error. Let's do that!

But before we look at IV, let's look at the best possible scenario: OLS under $\rho=0$ (no endogeneity) with large variance for X

I'm using the container dgp_zero created earlier for this exercise

In [8]:
using Plots
using Plots.PlotMeasures: mm
Plots.theme(:wong2)

plot(dgp_zero.bols_dst, dgp_zero.sols_dst, 
    size=(800,600),
    seriestype=:scatter, 
    legend=false, 
    title = "OLS Estimates vs Their Standard errors \\n (with \\rho = 0 and popF = 73.75)")
xlabel!("OLS estimate")
ylabel!("Standard error")
plot!(dgp_zero.bols_dst[abs.(dgp_zero.tols_dst).>1.96], dgp_zero.sols_dst[abs.(dgp_zero.tols_dst).>1.96], seriestype=:scatter, mc=:red)
plot!([0, 4], [0, 2], seriestype=:straightline, lc=:blue, linestyle=:dash)
plot!([0, 4], [0, -2], seriestype=:straightline, lc=:blue, linestyle=:dash)

Interpretation of above picture:

These are 10,000 combinations of $\hat{\beta}_{\text{OLS}}$ vs its standard error

In all samples the true value of $\beta$ was zero

The OLS estimates range (roughly) from about -0.10 to about 0.10

At first glance, these OLS estimates appear close to zero

Of course we have to look at their standard errors to decide on precision

Their standard errors range (roughly) from about 0.028 to 0.034

The red dots indicate OLS estimates for which the OLS t-test rejects $H_0: \beta=0$

These tend to be OLS estimates that are too far away from zero

The blue lines have (absolute) slopes of about 1/1.96 (drawn as 1/2 in the code) and effectively demarcate insignificant dots from significant dots

Important: There's no clear association between OLS estimates and their standard errors!

If you correlate the two, you obtain a value near zero:

In [8]:
using Statistics
cor(dgp_zero.bols_dst, dgp_zero.sols_dst)
0.005275048579007424

Now let's do this for the IV estimator under different parameter combinations

We start with the best case: low endogeneity, and strong IV

In [10]:
using Plots
using Plots.PlotMeasures: mm
Plots.theme(:wong2)

plot(dgps[1,5].biv_dst, dgps[1,5].siv_dst, 
    size=(800,600),
    seriestype=:scatter, 
    legend=false, 
    title = "IV Estimates vs Their Standard errors \\n (with \\rho = 0.1 and popF = 73.75)")
xlabel!("IV estimate")
ylabel!("Standard error")
plot!(dgps[1,5].biv_dst[abs.(dgps[1,5].tiv_dst).>1.96], dgps[1,5].siv_dst[abs.(dgps[1,5].tiv_dst).>1.96], seriestype=:scatter, mc=:red)
plot!([0, 4], [0, 2], seriestype=:straightline, lc=:blue, linestyle=:dash)
plot!([0, 4], [0, -2], seriestype=:straightline, lc=:blue, linestyle=:dash)

This case looks all right, but note that everything's spread out much much more (wider ranges for IV estimates and also for standard errors)

Now the worst case: $\rho = 0.5$ and $F=1.82$

In [11]:
using Plots
using Plots.PlotMeasures: mm
Plots.theme(:wong2)

plot(dgps[3,1].biv_dst, dgps[3,1].siv_dst, 
    size=(800,600),
    seriestype=:scatter, 
    legend=false, 
    title = "IV Estimates vs Their Standard errors \\n (with \\rho = 0.5 and popF = 1.82)")
xlabel!("IV estimate")
ylabel!("Standard error")
plot!(dgps[3,1].biv_dst[abs.(dgps[3,1].tiv_dst).>1.96], dgps[3,1].siv_dst[abs.(dgps[3,1].tiv_dst).>1.96], seriestype=:scatter, mc=:red)
plot!([0, 4], [0, 2], seriestype=:straightline, lc=:blue, linestyle=:dot)
plot!([0, 4], [0, -2], seriestype=:straightline, lc=:blue, linestyle=:dot)
In [12]:
using Plots
using Plots.PlotMeasures: mm
Plots.theme(:wong2)

plot(dgps[3,1].biv_dst, dgps[3,1].siv_dst, 
    size=(800,600),
    xlims = (-4,4),
    ylims = (0, min(4, quantile(dgps[3,1].siv_dst, 0.99))),
    seriestype=:scatter, 
    legend=false, 
    title = "IV Estimates vs Their Standard errors \\n (with \\rho = 0.5 and popF = 1.82, outliers removed)")
xlabel!("IV estimate")
ylabel!("Standard error")
plot!(dgps[3,1].biv_dst[abs.(dgps[3,1].tiv_dst).>1.96], dgps[3,1].siv_dst[abs.(dgps[3,1].tiv_dst).>1.96], seriestype=:scatter, mc=:red)
plot!([0, 4], [0, 2], seriestype=:straightline, lc=:blue, linestyle=:dash)
plot!([0, 4], [0, -2], seriestype=:straightline, lc=:blue, linestyle=:dash)
vline!([0.5, 0.5], lw=:3, lc=:red, linestyle=:dot)

The above is what the scatter plot looks like when the outliers are removed

Is this a good plot?

No! There's now an evident negative association between the IV estimates and their standard errors!

The correlation in this picture is about $-0.6$

Notice: the red dotted vertical line is the OLS bias (equal to $\rho=0.5$)

Again, red dots are IV estimates that lead to rejection of $H_0: \beta=0$

The dots are red only for positive IV estimates!

Keane & Neal refer to this as power asymmetry

Roughly speaking: small standard errors are associated with larger IV estimates (i.e., IV estimates that are close to the OLS bias)

This means that these values are more likely to lead to rejection of $H_0: \beta=0$

Example: an IV estimate of $+0.5$ (equal to the degree of endogeneity) is more likely to have a low standard error than an IV estimate of $-0.5$

Conversely: negative estimates are less likely to lead to rejection

In contrast, the case of OLS under zero endogeneity illustrates that, ideally, there should be symmetry

Keane & Neal explain the source for this asymmetry

Let's take a look

Recall the DGP (everything is scalar)

$$ \begin{align*} Y_i &= \beta X_i + u_i\\ X_i &= \pi Z_i + v_i\\ \end{align*} $$

The IV estimator is

$$ \hat{\beta}_{\text{IV}} = \frac{s_{ZY}}{s_{ZX}} $$

(sample covariance between Z and Y divided by sample covariance between Z and X)

Alternatively,

$$ \hat{\beta}_{\text{IV}} - \beta = \frac{s_{Zu}}{s_{ZX}} $$

Now, Keane & Neal argue like this:

In our DGP, $\sigma_{ZX} > 0$ (because $\pi >0$) which translates to $s_{ZX}>0$ in the vast majority of samples. Therefore, let's simply assume $s_{ZX}>0$.

Furthermore, $\rho > 0$ in our examples

Now, for which samples do we obtain $\hat{\beta}_{\text{IV}} \geq \beta$?

It must be the case that $s_{Zu} \geq 0$

At the same time, positive values for $s_{Zu}$ drive down the standard error of the IV estimate

To see this, relate the two errors via projection, $v = \rho u + \eta$, and obtain

$$ s_{ZX} = \pi s_Z^2 + \rho s_{Zu} + s_{Z \eta} \approx \pi s_Z^2 + \rho s_{Zu} $$

(we're chopping off $s_{Z \eta}$ to keep things simple)

From Keane & Neal

$$ \text{se}(\hat{\beta}_{\text{IV}}) = \sqrt{\frac{s_u^2}{ESS_{X,Z}}} $$

where $s_u^2$ is the residual variance and $ESS_{X,Z} := N \cdot s_{ZX}^2/s_Z^2 \approx N \cdot \left( \pi s_Z^2 + \rho s_{Zu} \right)^2 / s_Z^2$

(they write TSS, but it's actually the ESS)

Therefore, the standard error is decreasing in $s_{Zu}$

In conclusion, two things happen simultaneously when $s_{Zu}$ is large:

  • the IV estimator is biased in the direction of OLS

  • its standard error is spuriously small

This means that the underlying t-test rejects more often
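The mechanism can be checked with a small self-contained simulation. This is only a sketch under stated assumptions: it redraws the toy DGP with `randn` from Base instead of `Distributions`, uses 2,000 replications rather than 10,000, and the helper name `iv_draw` and the seed are made up for illustration. It compares the median standard error of positive and negative IV estimates and computes the share of t-test rejections that come from positive estimates.

```julia
using Random, Statistics, LinearAlgebra

# One simulated sample from the toy DGP with beta = 0. Returns the IV estimate,
# its standard error, and the t-statistic, using the same formulas as iv_estimator.
function iv_draw(rng; n=1000, F=1.82, rho=0.5)
    p   = sqrt(F / n)                    # first stage coefficient implied by population F
    u   = randn(rng, n)
    eta = randn(rng, n)
    z   = randn(rng, n)
    x   = p .* z .+ rho .* u .+ sqrt(1 - rho^2) .* eta
    y   = u                              # beta = 0, so y equals the structural error
    bhat  = dot(z, y) / dot(z, x)        # IV estimate
    pihat = dot(z, x) / dot(z, z)        # first stage estimate
    ess   = n * pihat^2 * var(z)
    uhat  = y .- x .* bhat
    se    = sqrt(dot(uhat, uhat) / n / ess)
    return bhat, se, bhat / se
end

rng   = MersenneTwister(1234)            # arbitrary seed, for reproducibility only
draws = [iv_draw(rng) for _ in 1:2000]
b  = first.(draws)
se = getindex.(draws, 2)
t  = last.(draws)

reject   = abs.(t) .> 1.96
frac_pos = mean(b[reject] .> 0)          # share of rejections with a positive estimate
med_pos  = median(se[b .> 0])            # median se among positive estimates
med_neg  = median(se[b .< 0])            # median se among negative estimates

println("rejection rate:                   ", mean(reject))
println("share of rejections with b > 0:   ", frac_pos)
println("median se (b > 0) vs (b < 0):     ", med_pos, " vs ", med_neg)
```

If the argument above is right, the median standard error should be smaller for positive estimates, and nearly all rejections should occur at positive estimates.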

Here are the scatter plots for some other parameter combinations:

In [13]:
using Plots
using Plots.PlotMeasures: mm
Plots.theme(:wong2)

plt = plot(layout=(length(parms_rho),length(parms_F)), size = (1800, 800),
        plot_title = "Plot of IV Estimator vs its Standard Error for different DGPs")

for (i, rho) in enumerate(parms_rho)
    for (j, F) in enumerate(parms_F)
        k = length(parms_F)*(i-1)+j # subplot counter
        plot!(plt, 
            dgps[i,j].biv_dst, dgps[i,j].siv_dst, 
            seriestype=:scatter, 
            subplot = k, 
            ylims = (0, min(4, quantile(dgps[i,j].siv_dst, 0.99))),
            legend=false, 
            title = "\\rho = $rho and popF = $F")
        plot!(dgps[i,j].biv_dst[abs.(dgps[i,j].tiv_dst).>1.96], dgps[i,j].siv_dst[abs.(dgps[i,j].tiv_dst).>1.96], seriestype=:scatter, subplot = k, mc=:red)
        plot!([0, 4], [0, 2], seriestype=:straightline, subplot = k, lc=:blue, linestyle=:dash)
        plot!([0, 4], [0, -2], seriestype=:straightline, subplot = k, lc=:blue, linestyle=:dash)
        vline!([rho, rho], subplot = k, lw=:3, lc=:red, linestyle=:dot)
        xlims!(-4,4)
    end
end

display(plt)

Power Functions¶

The above discussion suggests that we have low standard errors for some samples

Superficially, low standard errors sound like a good thing!

Doesn't it mean that our estimates are precise?

Keane & Neal argue that they are spuriously precise

Put differently: this spurious precision comes at the price of spurious imprecision for other samples

Let's look at power functions

Recall: the power is the probability to reject the null when it is false

Example: We keep conducting the same hypothesis test $H_0: \beta=0$

Last week, when we looked at the statistical size, we studied the probability of rejecting the null when it is correct

Suppose the data are generated with $\beta=0.3$ (instead of $\beta=0$)

We still conduct the test $H_0: \beta=0$

We would like to reject that hypothesis

We would like the probability of this event to be maximal

Luckily, we can simulate this too!

Our viewpoint changes somewhat: now we let the structural coefficient $\beta$ in the DGP range between $-1$ and $1$, and create 10,000 samples for each value

To be precise:

  • fix $\rho$ and $F$ at an interesting value

  • fix $\beta=-1$

  • generate 10,000 samples, obtain 10,000 estimates, standard errors, and t-stats

  • Study them

  • fix $\beta=-0.9$

  • generate another 10,000 samples

  • and so on

In [14]:
function power_function(; brange=-1.00:0.10:1.00, F, rho)

    """
    Calculates statistical power (probability to reject null hypothesis H0: truebeta = 0)
    when the underlying true beta ranges in values determined by brange.
    
    ### Input

    - `brange`  -- range of values for true beta used in DGP creation
    - `F`       -- first stage population F-stat
    - `rho`     -- degree of endogeneity

    ### Output
    - `brange`     -- range of values for true beta used in DGP
    - `power_tols` -- power function for OLS t-test
    - `power_tiv`  -- power function for IV t-test
    - `power_ar`   -- power function for AR-test
    """

    power_tols = similar(brange)
    power_tiv = similar(brange)
    power_ar = similar(brange)

    for (i, b) in enumerate(brange)
        simdst = simulate_distribution(b=b, F=F, rho=rho)
        power_tols[i] = mean(abs.(simdst.tols_dst) .> 1.96)
        power_tiv[i] = mean(abs.(simdst.tiv_dst) .> 1.96)
        power_ar[i] = mean(abs.(simdst.ar_dst) .> 1.96)
    end

    return brange, power_tols, power_tiv, power_ar

end
power_function (generic function with 1 method)
In [15]:
using Plots
using Plots.PlotMeasures: mm
Plots.theme(:wong2)

brange, pow_ols, pow_iv, pow_ar = power_function(brange=-1:0.010:1,F=73.75, rho=0)

plot(brange, [pow_ols, pow_iv], 
    size=(800,600),
    xticks=-1:0.1:1,
    label=["OLS" "IV"],
    linewidth=3,
    linestyle=[:solid :dash],
    linecolor=:black,
    legend=:bottomright,
    title="Empirical Power Curves Compared: \\rho=0 and popF=73.75")
hline!([0.05, 0.05], linestyle=:dash, label=false)
ylims!(0,1)
xlabel!("True \\beta")

What does this picture show?

Let's focus on the solid line (power curve for OLS)

It shows the probability to reject $H_0: \beta=0$ for various true values of $\beta$

For example, look at $\beta=0.1$

The OLS power at that value is about 90%

What does this mean? It means:

If the true structural coefficient that generated the data is $\beta=0.10$, then the probability to reject $H_0: \beta=0$ is 90%

It makes sense that the farther out your true $\beta$ is, the higher the probability to reject

Also, when true $\beta=0$, then the power coincides with the size (which here is close to 5%)

Now, look at the dashed line (power curve for IV)

That power curve lies below the OLS curve

For example, the probability to reject $H_0:\beta=0$ when $\beta=0.10$ is only about 10% (much lower than OLS)!

This demonstrates that you should always use OLS when there's no endogeneity!

Digression: Effect Size¶

Are our estimators good at detecting large effects?

Keane & Neal stipulate that a value for true $\beta$ of 0.20 is quite large ("This is a large effect in typical empirical applications")

What makes them say this?

Well, in our DGP, if X increases by one standard deviation, then $Y$ increases by $\beta$ standard deviations (ceteris paribus)

This is indeed large considering average effect sizes in the economics literature

Let's presume that the true model has such a large effect of X on Y (that is, $\beta=0.20$)

How good are OLS and IV at distinguishing this from zero?

  • OLS power is 100% meaning you're very likely to detect such a large effect as real

  • IV power is around 40% meaning you're much less likely to conclude that a meaningful effect is present
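These two numbers can be rationalized with a back-of-the-envelope calculation. As a rough sketch, assume both t-statistics are approximately normal with mean $\beta/\text{se}$ and unit variance, and use the asymptotic standard errors implied by the DGP with $\rho=0$, $N=1000$ and population $F=73.75$ (so $\pi^2 = F/N$):

$$ \begin{align*} \text{power}(\beta) &\approx \Phi\left(-1.96 + \frac{\beta}{\text{se}}\right) + \Phi\left(-1.96 - \frac{\beta}{\text{se}}\right)\\ \text{se}_{\text{OLS}} &\approx \frac{1}{\sqrt{N(1+\pi^2)}} = \frac{1}{\sqrt{1073.75}} \approx 0.031\\ \text{se}_{\text{IV}} &\approx \frac{1}{\sqrt{N \pi^2}} = \frac{1}{\sqrt{73.75}} \approx 0.116 \end{align*} $$

At $\beta=0.20$ the ratio $\beta/\text{se}$ is about 6.6 for OLS and about 1.7 for IV, which gives a power of essentially 1 for OLS and roughly $\Phi(-0.24) \approx 0.40$ for IV.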

Now back to our power curves

What do the power curves look like when there is endogeneity?

In [16]:
using Plots
using Plots.PlotMeasures: mm
Plots.theme(:wong2)


plt = plot(layout=(length(parms_rho),length(parms_F)), size = (1800, 800),
        plot_title = "Empirical Power Curves: OLS (solid) vs IV (dashed) (empirical size based on IV)")

for (i, rho) in enumerate(parms_rho)
    for (j, F) in enumerate(parms_F)
        k = length(parms_F)*(i-1)+j # subplot counter
        brange, pow_ols, pow_iv, pow_ar = power_function(F=F, rho=rho)
        size = round(100 * pow_iv[brange.==0][], digits=3)
        plot!(plt, 
            brange, [pow_ols, pow_iv], 
            label=["OLS" "IV"],
            linestyle=[:solid :dash],
            linecolor=:black,
            legend=false,
            subplot=k,
            margin=5mm,
            title = "\\rho = $rho and popF = $F \\n Empirical size = $size %")
        hline!([0.05, 0.05], linestyle=:dot, subplot=k, legend=false)
        ylims!(0,1)
        xlabel!("True \\beta")
    end
end

display(plt)

The power curves above only look acceptable for the largest population F value

Let's focus on the IV based t-test (dashed line):

  • Even population F values that are consistent with the rule of thumb (popF=2.30 or popF=10.00) produce low power

  • This is true even when the degree of endogeneity is low!

  • The empirical sizes can be poor under weak IV

The OLS based t-test is worse:

  • It looks like it has high power, but the whole curve is shifted left (this is a product of OLS bias)

  • Consider the case $\rho=0.3$ and $F=10$

  • The power to reject $H_0: \beta=0$ for any true value of $\beta>0$ is 100%

  • While this sounds good, it comes at the cost of having very low power when true $\beta$ is negative

  • The power to reject $H_0: \beta=0$ for any true value of $\beta$ in the interval $(-0.5,0)$ can be very low!

The AR Test¶

Keane & Neal suggest a simple remedy to the power asymmetry problem

Simply run an OLS regression of $Y$ on $Z$ and obtain the t-statistic from this exercise

When we do this, we obtain the following picture:

In [17]:
using Plots
using Plots.PlotMeasures: mm
Plots.theme(:wong2)


plt = plot(layout=(length(parms_rho),length(parms_F)), size = (1800, 800),
        plot_title = "Empirical Power Curves for: AR (solid) vs IV (dashed) (empirical size based on AR)")

for (i, rho) in enumerate(parms_rho)
    for (j, F) in enumerate(parms_F)
        k = length(parms_F)*(i-1)+j # subplot counter
        brange, pow_ols, pow_iv, pow_ar = power_function(F=F, rho=rho)
        size = round(100 * pow_ar[brange.==0][], digits=3)
        plot!(plt, 
            brange, [pow_ar, pow_iv], 
            label=["AR" "IV"],
            linestyle=[:solid :dash],
            linecolor=:black,
            legend=false,
            subplot=k,
            margin=5mm,
            title = "\\rho = $rho and popF = $F \\n Empirical size = $size %")
        hline!([0.05, 0.05], linestyle=:dot, subplot=k, legend=false)
        ylims!(0,1)
        xlabel!("True \\beta")
    end
end

display(plt)
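To make the AR recipe concrete, here is a minimal self-contained sketch of the AR t-statistic under the null ($\beta=0$) with the weakest instrument and the strongest endogeneity considered above. The assumptions are mine: `randn` from Base replaces `Distributions`, there are 2,000 replications, and the names `ar_tstat`, `ar_draw`, `ar_size` and the seed are illustrative. The regression of $Y$ on $Z$ has no intercept, mirroring `ols_estimator`.

```julia
using Random, Statistics, LinearAlgebra

# t-statistic from an OLS regression of y on z (no intercept)
function ar_tstat(z, y)
    bhat = dot(z, y) / dot(z, z)
    uhat = y .- z .* bhat
    s2   = dot(uhat, uhat) / length(y)
    return bhat / sqrt(s2 / dot(z, z))
end

# Draw one sample from the toy DGP and return its AR t-statistic
function ar_draw(rng; n=1000, F=1.82, rho=0.5, b=0.0)
    p   = sqrt(F / n)
    u   = randn(rng, n)
    eta = randn(rng, n)
    z   = randn(rng, n)
    x   = p .* z .+ rho .* u .+ sqrt(1 - rho^2) .* eta
    y   = b .* x .+ u
    return ar_tstat(z, y)
end

rng = MersenneTwister(2024)                  # arbitrary seed, for reproducibility only
ar  = [ar_draw(rng) for _ in 1:2000]

# share of samples in which the AR test rejects H0: beta = 0 at the 5% level
ar_size = mean(abs.(ar) .> 1.96)
println("empirical size of AR test: ", ar_size)
```

Under the null, $Y$ equals $u$, so the statistic depends only on $Z$ and $u$. That is why its empirical size should stay near 5% no matter how weak the instrument is.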

Discussion¶

So what have we learned in the last two lectures?

IV and 2SLS estimation can be very unreliable when instruments are weak

Even at low degrees of endogeneity, the power to detect a large effect is low when population F (and thus sample F) is small

The rule of thumb that sample F must exceed 10 is NOT sufficient (just look at the above picture, recall sample F is 10 when popF=2.30)!

Keane & Neal's recommendation:

  • do NOT use regular t-test whatsoever!

  • use AR test instead

  • sample F should be much larger than 10!

  • How much larger? Have a look at the above picture, even popF of 29.44 doesn't seem enough!