Assignment 1
Exercise 1
Frankel and Rose (for short: FR), in their 2005 paper “Is Trade Good or Bad for the Environment? Sorting Out the Causality”, published in The Review of Economics and Statistics (vol. 87(1), pp. 85-91), empirically address the question:
Is globalization good or bad for the environment?
In particular, they examine whether countries that are more open to international trade incur more (or less) environmental damage as a result, controlling for international variations in real growth rates and in political institutions. FR quantify environmental damage along seven dimensions: \(\text{SO}_2\) air concentrations, \(\text{NO}_2\) air concentrations, particulate matter air concentrations, \(\text{CO}_2\) air concentrations, deforestation, energy resource depletion, and rural access to clean water. Their analysis, however, focuses primarily on the \(\text{SO}_2\), \(\text{NO}_2\), and particulate matter air pollution impacts.
Overall, FR find that greater trade openness, quantified as \(100 \cdot (\text{Imports} + \text{Exports})/\text{GDP}\), is actually associated with better environmental outcomes. This result might seem surprising in view of the
''...race-to-the-bottom hypothesis, which says that open countries in general adopt looser
standards of environmental regulation, out of fear of a loss in international competitiveness.
Alternatively, poor open countries may act as pollution havens, adopting lax environmental
standards to attract multinational corporations and export pollution-intensive goods.
Less widely recognized is the possibility of an effect in the opposite direction, which we call
the gains-from-trade hypothesis. If trade raises income, it allows countries to attain more of
what they want, which includes environmental as well as more conventional output. Openness could
have a positive effect on environmental quality (even for a given level of GDP per capita) for a
number of reasons. First, trade can spur managerial and technological innovation, which can have
positive effects on both the economy and the environment. Second, multinational corporations
tend to bring clean state-of-the-art production techniques from high-standard source countries
of origin to host countries. Third is the international ratcheting up of environmental standards
through heightened public awareness. Whereas some environmental gains may tend to occur with any
increase in income, whether taking place in an open economy or not, others may be more likely
when associated with international trade and investment. Whether the race-to-the-bottom effect
in practice dominates the gains-from-trade effect is an empirical question.''
(modified quotation from FR 2005)
In this exercise you will replicate FR’s empirical results for the \(\text{SO}_2\) measure of air pollution and diagnostically check their regression model, so as to assess the credibility of their statistical inference results.
Two features of this exercise should be noted at the outset. First, the FR model is estimated using only 41 sample observations. This is an unusually small sample for applied econometric research published in such a high-quality journal. As we have learned in class, our inference rests on the estimators having an approximately normal sampling distribution (justified by the central limit theorem), and the larger the sample, the better this approximation. For the purpose of this assignment, we will not worry further about the small sample size: we keep it in the back of our minds but are otherwise happy to apply our standard econometric toolkit.
Second, we conduct our analysis here under the assumption that all key explanatory variables are exogenous. In particular, real per capita income and trade openness are considered to be determined outside of the model. (In Exercise 2 below we will deviate from that assumption.)
Use the Stata file
. This file contains data on 41 countries, collecting (among others) the following variables:

- sulfdm: mean 1990 \(\text{SO}_2\) (sulfur dioxide) concentration (in micrograms per cubic meter).
- logarithm of real per capita GDP (from the Penn World Tables 5.6; in 1990 dollars, PPP adjusted).
- squared value of the logarithm of real per capita GDP.
- pwtopen: \(100 \cdot (\text{Imports} + \text{Exports})/\text{GDP}\), from the Penn World Tables 5.6.
- polity: index of democratic (+10) versus autocratic (-10) institutions.
- lareapc: logarithm of land area per capita.
- oecd: dummy variable that equals 1 if the country is an OECD member.
- country: country name.

(a) Produce some basic descriptive statistics of the data. Use basic graphical tools to study the relationship between trade openness and environmental outcomes. Are these descriptive findings indicative of a positive or a negative association?
(b) Estimate a basic regression model with the \(\text{SO}_2\) concentration, sulfdm, as the outcome variable, using as regressors the logarithm of real per capita GDP, its square, pwtopen, polity and lareapc. Interpret your coefficient estimates. Do they all have the expected signs? Comment on the statistical significance of all coefficients. What do the results say about the impact of trade openness on \(\text{SO}_2\) concentrations, i.e., about the relative importance of the “race-to-the-bottom” versus “gains-from-trade” hypotheses discussed earlier?
(c) Explain (in words, not maths) what the \(R^2\) of a regression measures. How does the adjusted \(R^2\), denoted \(\bar{R}^2\), differ from it?
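As a concrete reference point, here is a minimal Python sketch of both statistics. It assumes a linear model with an intercept, \(n\) observations and \(k\) slope coefficients, so that \(\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k-1}\); the function names are hypothetical, and the \(R^2\) value in the usage line is made up for illustration, not an FR estimate.

```python
def r_squared(y, fitted):
    """R^2 = 1 - SSR/SST: share of the sample variation of y around its mean that the
    fitted values account for (equal to the explained share for OLS with an intercept)."""
    n = len(y)
    y_bar = sum(y) / n
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # sum of squared residuals
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
    return 1.0 - ssr / sst


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2: rescales the unexplained share by (n - 1)/(n - k - 1), penalizing
    each additional regressor for the degree of freedom it uses up."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical R^2 of 0.40 with n = 41 observations and k = 5 regressors, as in the FR model.
print(round(adjusted_r_squared(0.40, n=41, k=5), 4))  # 0.3143
```

Because the penalty grows with \(k\), \(\bar{R}^2\) can fall when a regressor that adds little explanatory power is included, whereas \(R^2\) never falls.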
(d) Using the adjusted \(R^2\) statistic, what fraction of the sample variation in sulfdm is explained by these five explanatory variables? By how much does this fraction decrease once the openness variable is dropped from the model? (Note: To have Stata report the value of the adjusted \(R^2\), use the command ereturn list after the regress command; the adjusted \(R^2\) is listed as e(r2_a).)

(e) Examine a histogram of the fitted errors for the estimated model. (Use the appropriate option of Stata's histogram command to sort the fitted errors into ten bins, but also feel free to experiment with other choices.) Does this histogram suggest that the distribution of the model errors is substantially non-normal?

(f) Produce a scatter plot of sulfdm against the crucial independent variable pwtopen. Can you spot two outliers? Which countries do they correspond to? Do you think that the apparent negative relationship between sulfdm and pwtopen might be largely driven by these two observations? To check this formally, create two dummy variables for these two countries and add them to the regression. Does this change your findings for the significance of the openness variable?

(g) Now, make four histograms: for sulfdm and pwtopen as well as for their logarithms. Then make a scatter plot of logsulfdm against logpwtopen. What re-specification of the FR model do these results suggest? Is there still any need to worry about the two outliers from before?

(h) Estimate the re-specified model, using the logs of both sulfdm and pwtopen. (Recall from your study of the log-log model in EMET2007 that the coefficient on logpwtopen can be interpreted as the elasticity of sulfdm with respect to pwtopen.) What do you conclude about the openness effect?

(i) Test whether the key coefficient in the model is different for OECD countries. Use the log-log model.
(Note: This exercise is from the book “Fundamentals of Applied Econometrics” by Richard Ashley.)
Exercise 2
In Exercise 1 you used OLS to study the relationship between trade openness and sulfur dioxide levels (as a proxy for environmental outcomes). That analysis was done under the assumption that all explanatory variables are exogenous. The actual contribution of the FR paper is to look deeper and examine the causal relationship between trade openness and environmental quality, both controlling for income and appropriately dealing with the likely endogeneity of both income and trade openness. To that end, they use several instrumental variables for these two explanatory variables. You will replicate some of these results in the current exercise.
Use the Stata file Frankel_Rose.dta. This file contains data on 41 countries, collecting (in addition to the variables mentioned in the previous exercise) the following instrumental variables:
- trade_potential (instrument for pwtopen): trade potential of a country. This variable combines information on a country’s geographical location (number of neighboring countries, access to the sea, landlocked status), population size, land area and language to construct a measure of potential trade. For example, all else equal, a country with access to the sea will have a higher trade potential than a country that is landlocked. This IV is correlated with the endogenous regressor pwtopen while plausibly affecting environmental outcomes only through trade openness.
- inc_exog (instrument for the logarithm of real per capita GDP): exogenous income of a country. While per capita income is likely endogenous, it contains some exogenous components. FR combine information on a country’s lagged income and its school attainment to construct the exogenous component of income. For example, all else equal, a country with higher average school attainment will have higher per capita income than a country with lower average school attainment. This IV is correlated with the endogenous income regressor while plausibly affecting environmental outcomes only through income.
- The square of inc_exog (instrument for the squared logarithm of real per capita GDP): since their model specification also includes the square of the logarithm of real per capita GDP, FR also define the square of inc_exog and use it as an instrument for this squared term.

(a) Re-estimate the basic model from Exercise 1, part (b), using instrumental variables estimation instead. Use all three instruments. Interpret your coefficient estimates. Do they all have the expected signs? Comment on the statistical significance of all coefficients. What do the results say about the impact of trade openness on \(\text{SO}_2\) concentrations, i.e., about the relative importance of the “race-to-the-bottom” versus “gains-from-trade” hypotheses discussed earlier?
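To see why instrumental variables can recover an effect that OLS misses, here is a minimal sketch on simulated data, not on Frankel_Rose.dta. It assumes one endogenous regressor \(x\) and one instrument \(z\), in which case 2SLS reduces to the ratio \(\widehat{\text{Cov}}(z, y)/\widehat{\text{Cov}}(z, x)\). The data-generating process and all function names are hypothetical; the FR model, with three endogenous regressors and three instruments, would instead be estimated with a command such as Stata's ivregress 2sls.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def ols_slope(y, x):
    """OLS slope from regressing y on x with an intercept."""
    return cov(x, y) / cov(x, x)


def iv_slope(y, x, z):
    """IV slope with one instrument z for one regressor x (identical to 2SLS here)."""
    return cov(z, y) / cov(z, x)


def simulate(n=50_000, beta=1.0, seed=0):
    """Hypothetical data: x is correlated with the error u; z shifts x but is independent of u."""
    rng = random.Random(seed)
    y, x, z = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)            # instrument
        ui = rng.gauss(0.0, 1.0)            # structural error
        xi = zi + ui + rng.gauss(0.0, 1.0)  # endogenous regressor: Cov(x, u) = 1, Var(x) = 3
        y.append(2.0 + beta * xi + ui)
        x.append(xi)
        z.append(zi)
    return y, x, z


y, x, z = simulate()
print(f"OLS slope: {ols_slope(y, x):.3f}  (probability limit 4/3 in this design)")
print(f"IV slope:  {iv_slope(y, x, z):.3f}  (probability limit 1, the true effect)")
```

OLS is pulled away from the true slope because \(x\) moves together with the error \(u\); the IV estimator uses only the variation in \(x\) induced by \(z\), which is unrelated to \(u\).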
(b) Formally test whether your instruments are weak (use the “rule of thumb” explained in the lecture).
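Here is a sketch of the usual weak-instrument check, assuming the lecture's rule of thumb is the common one that the first-stage F-statistic on the excluded instrument(s) should exceed 10. It covers only the simplest case of one endogenous regressor, one instrument and no other controls, where the first-stage F equals \((n-2)R^2/(1-R^2)\) from regressing the endogenous regressor on the instrument; the data and names are hypothetical. With several endogenous regressors, as in FR, individual first-stage F-statistics can be misleading, and Stata's estat firststage after ivregress reports first-stage diagnostics designed for that case.

```python
import random


def first_stage_f(x, z):
    """Homoskedastic F-statistic for H0: b = 0 in the first stage x = a + b*z + v.

    With a single instrument this equals the squared t-statistic on z.
    """
    n = len(x)
    mean_x, mean_z = sum(x) / n, sum(z) / n
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    szz = sum((zi - mean_z) ** 2 for zi in z)
    r2 = sxz ** 2 / (sxx * szz)  # R^2 of the simple first-stage regression
    return (n - 2) * r2 / (1.0 - r2)


def simulate_first_stage(strength, n=200, seed=1):
    """Hypothetical data: x = strength * z + noise."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [strength * zi + rng.gauss(0.0, 1.0) for zi in z]
    return x, z


for strength in (1.0, 0.0):
    f_stat = first_stage_f(*simulate_first_stage(strength))
    verdict = "not weak" if f_stat > 10 else "weak"
    print(f"strength {strength}: first-stage F = {f_stat:.1f} -> {verdict} by the F > 10 rule")
```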
(c) Can you test whether the instruments are exogenous?
(d) Using the insights gained from Exercise 1, re-estimate the model, replacing sulfdm and pwtopen by their logarithms, logsulfdm and logpwtopen. Produce a histogram of trade_potential and of its logarithm, logtrade_potential. Which one appears to be closer to a normal distribution? Which one should you use for 2SLS estimation? Provide the estimation results.

(e) To conclude Exercises 1 and 2: what is your answer to the question “Is globalization good or bad for the environment?” What are the strengths and weaknesses of the econometric analysis conducted here? Do you see any possible extensions that could help improve your research?
(Note: This exercise is from the book “Fundamentals of Applied Econometrics” by Richard Ashley.)
Exercise 3
In the research paper “Does Size Matter in Australia”, published in The Economic Record (Vol. 86, No. 272, March 2010, pp. 71-83), Michael Kortt and Andrew Leigh address the research question:
Do taller and slimmer workers earn more?
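To that end, they consider the following linear model:

\[ W_i = \beta_0 + \beta_1 \text{Height}_i + \beta_2 \text{BMI}_i + \beta_3 X_{i3} + \cdots + \beta_k X_{ik} + u_i \]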
(This equation is my version of equation (1) on page 73 of their paper.) Here, \(W_i\) is the log hourly wage of person \(i\), \(\text{Height}_i\) represents a person’s height and \(\text{BMI}_i\) stands for a person’s body mass index. The remaining regressors, \(X_{i3}, \ldots, X_{ik}\) capture a person’s demographic characteristics, including gender, age (linear and quadratic) and education.
Obtain a copy of the paper (available online for ANU students and faculty) and answer the following questions.
- Kortt and Leigh begin the analysis by estimating all coefficients by OLS. Summarize their OLS results regarding the two main coefficients of interest, \(\beta_1\) and \(\beta_2\) (for height and BMI).
- Would you interpret these estimates as causal? What are the main endogeneity problems in this regression?
- Explain how Kortt and Leigh attempt to address the endogeneity problem using instrumental variables. How do their findings change?
- What is the main conclusion of the paper? Do taller and slimmer workers in Australia earn more? What is the evidence from other countries?
Assignment 2
Exercise 1
Use the Stata file PNTSPRD, which contains information from the Las Vegas sports betting market. The variables included in the data set are reasonably self-explanatory. The overarching research question is whether the favorite team is more likely to win the game.
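Consider the linear probability model

\[ P(\text{favorite wins} \mid \text{spread}) = \beta_0 + \beta_1\,\text{spread}, \]

where spread, the point spread, is a proxy for how strongly a team is favored: a high point spread means that a team is a clear favorite. Here is a quick primer on point spread betting from Wikipedia: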
The general purpose of spread betting is to create an active market for both sides of a binary
wager [...]. If the wager is simply "Will the favorite win?", more bets are likely to be made for
the favorite, possibly to such an extent that there would be very few betters willing to take
the underdog.
The point spread is essentially a handicap towards the underdog. The wager becomes "Will the
favorite win by more than the point spread?" The point spread can be moved to any level to
create an equal number of participants on each side of the wager. This allows a bookmaker to act
as a market maker by accepting wagers on both sides of the spread. The bookmaker charges a
commission, or vigorish, and acts as the counterparty for each participant. As long as the total
amount wagered on each side is roughly equal, the bookmaker is unconcerned with the actual
outcome; profits instead come from the commissions.
(excerpt up-to-date as of 7 September, 2015)
To further reduce your confusion, here is an example:
Consider two teams, A and B, playing each other. Before the game, the advertised point spread may be:
* A -4
* B +4
This should be interpreted as follows: team A is the favorite to win. If you bet on team A, you
only win your bet if team A beats team B by more than 4 points. If team A beats team B by
exactly 4 points, the bet is a tie. If team A beats team B by fewer than 4 points, you
lose your bet. If team A is beaten by team B, you also lose.
Conversely, if you bet on team B, you win your bet if team B beats team A outright or if team
B is beaten by team A by no more than 3 points.
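The settlement rule in this example can be summarized in a few lines of Python. This is a hypothetical helper, not part of the data set or of any betting system; it takes the final scores and the spread quoted for the favorite as a positive number (4 for "A -4") and reports which side of the wager wins.

```python
def settle_spread_bet(favorite_score, underdog_score, spread):
    """Return which side of a point-spread wager wins.

    "favorite": the favorite wins by more than the spread (bets on the favorite win);
    "tie":      the favorite wins by exactly the spread;
    "underdog": anything else, including an outright underdog win (bets on the underdog win).
    """
    margin = favorite_score - underdog_score
    if margin > spread:
        return "favorite"
    if margin == spread:
        return "tie"
    return "underdog"


# Team A is favored by 4 points over team B.
print(settle_spread_bet(10, 4, 4))  # favorite: A wins by 6
print(settle_spread_bet(8, 4, 4))   # tie: A wins by exactly 4
print(settle_spread_bet(6, 4, 4))   # underdog: A wins by only 2
```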
- (a) Explain why, if the spread incorporates all relevant information, we expect \(\beta_0 = 0.5\).
- (b) Estimate the linear probability model. Test the hypothesis \(\beta_0 = 0.5\) against a two-sided alternative. (Make all estimations robust to heteroskedasticity throughout this entire exercise.)
- (c) Is spread statistically significant? What is the estimated probability that the favored team wins when \(\text{spread} = 10\)?
- (d) Now estimate the model by probit. Interpret the intercept and test the hypothesis that it is equal to 0. (Why do you need to test that the intercept is equal to 0 here?)
- (e) Use the probit model to estimate the probability that the favored team wins when \(\text{spread} = 10\). Compare this with the linear probability model.
- (f) Add the variables ..., and und25 to the probit model and test the joint significance of these variables.
- (g) Redo parts (d), (e), and (f) using the logit model.
- (h) Which sport is this exercise about?
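The three models differ in how they map the spread into a probability. The sketch below uses hypothetical coefficient values rather than estimates from PNTSPRD, and the function names are illustrative. It shows why the probit (and logit) counterpart of \(\beta_0 = 0.5\) is an intercept of 0: at spread = 0 the index is zero, and both the standard normal CDF and the logistic function equal 0.5 at zero.

```python
import math


def lpm_prob(b0, b1, spread):
    """Linear probability model: P = b0 + b1 * spread (not restricted to [0, 1])."""
    return b0 + b1 * spread


def probit_prob(b0, b1, spread):
    """Probit: P = Phi(b0 + b1 * spread), with the standard normal CDF Phi written via erf."""
    index = b0 + b1 * spread
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))


def logit_prob(b0, b1, spread):
    """Logit: P = 1 / (1 + exp(-(b0 + b1 * spread)))."""
    index = b0 + b1 * spread
    return 1.0 / (1.0 + math.exp(-index))


# With a zero intercept, probit and logit both give 0.5 when spread = 0.
print(probit_prob(0.0, 0.1, 0), logit_prob(0.0, 0.1, 0))  # 0.5 0.5

# Hypothetical coefficients, evaluated at spread = 10 as in the exercise.
print(f"LPM:    {lpm_prob(0.5, 0.02, 10):.3f}")
print(f"probit: {probit_prob(0.0, 0.1, 10):.3f}")
print(f"logit:  {logit_prob(0.0, 0.17, 10):.3f}")
```

Unlike the linear probability model, the probit and logit probabilities always stay between 0 and 1, however large the spread.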
(Note: This exercise is from the book “Introductory Econometrics: A Modern Approach” by Jeffrey Wooldridge.)
Exercise 2
Krueger and Maleckova, in their paper “Education, Poverty and Terrorism: Is There a Causal Connection?”, published in the Journal of Economic Perspectives (2003), attempt to estimate the causal effect of education and poverty on terrorism.
- What is the main research question of the paper?
- What econometric method do they use to estimate causal effects?
- What is the main outcome variable?
- What are the main explanatory variables?
- What other explanatory variables do they include?
- What is their main finding?
- What problems/shortcomings do you see in their research?