Before you start, and while you're going, have a look at the website for important reminders on deadlines and academic honesty!
Your job consists of

- coding in Python, and
- writing explanations and offering econometric interpretations.
For the coding you will need to use code cells and for the explanations you will need to use text (or markdown) cells. In the notebook below, we have included empty code and text cells in which you can type your solutions. Feel free to put additional code and text cells wherever you may need them!
Note: You may only use Python and you may only use packages that were used during the EMET2007 computer labs.
In this assignment you will study the association between earnings and weight.
You will use the `Earnings_and_Height` data from the U.S. This is the same data set that we used during the computer labs in weeks 4 and 5.
Weights are measured in pounds. The following table will help you translate them to the metric system:
Weight in pounds | Weight in kilograms |
---|---|
140 | 63.5 |
163 | 73.9 |
190 | 86.2 |
Answer all of the following exercises! Good luck!
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.stats.api as sms
import statsmodels.formula.api as smf
# COLAB USERS: UNCOMMENT THE FOLLOWING LINES:
# from google.colab import drive
# drive.mount('/content/drive')
# df = pd.read_csv('drive/MyDrive/EMET2007/datasets/earnings_and_height.csv')
# ANACONDA USERS: UNCOMMENT THE FOLLOWING LINE:
# df = pd.read_csv('../datasets/earnings_and_height.csv')
The original variable `weight` is reported in pounds. Add a new variable `weightkg`, which reports weight in kilograms, to the data frame. [1 mark]
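As a minimal sketch of the unit conversion, the snippet below uses a toy data frame rather than the assignment data. The name `toy` and the constant `LB_TO_KG` are illustrative assumptions. The three weights are the ones from the conversion table above, so the result can be checked against it.

```python
import pandas as pd

# One international avoirdupois pound is defined as exactly 0.45359237 kg.
LB_TO_KG = 0.45359237

# Toy stand-in for the real data frame (hypothetical; the real one is read from the csv file).
toy = pd.DataFrame({"weight": [140, 163, 190]})

# Vectorised conversion: every row is multiplied by the same constant.
toy["weightkg"] = toy["weight"] * LB_TO_KG

print(toy.round(1))
```

Rounded to one decimal, the new column reproduces the kilogram values listed in the table.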
Present a useful descriptive analysis of `weightkg` and visualize the variable in familiar ways. What do you learn about the variable `weightkg`? [2 marks]
Note: As already explained earlier, you may add as many code cells and text cells as you like or need.
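One possible shape for such a descriptive analysis is sketched below on synthetic data. The numbers are randomly generated and say nothing about the real sample. The choice of a histogram with a median marker is only one of several reasonable options.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic weights in kilograms (invented for illustration only).
rng = np.random.default_rng(0)
toy = pd.DataFrame({"weightkg": rng.normal(loc=75, scale=12, size=300)})

# Count, mean, standard deviation, minimum, quartiles and maximum in one call.
summary = toy["weightkg"].describe()
print(summary)

# Histogram with the median marked as a dashed vertical line.
fig, ax = plt.subplots()
ax.hist(toy["weightkg"], bins=20, edgecolor="black")
ax.axvline(toy["weightkg"].median(), linestyle="--", color="red", label="median")
ax.set_xlabel("Weight (kg)")
ax.set_ylabel("Number of observations")
ax.legend()
# In a notebook the figure is rendered when the cell finishes.
```

Comparing the mean with the median in the summary gives a first hint about skewness, which the histogram can then confirm visually.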
Add a new (categorical) variable `wabove` to your data frame. It is defined as follows: `wabove` equals 1 if `weightkg` exceeds 73.9 kilograms (that is, if it is above the median weight in the original sample), and 0 otherwise. [1 mark]
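A dummy variable of this kind can be built from a boolean comparison, as in the sketch below. It uses a toy data frame (the name `toy` is an assumption). Note that "exceeds" is a strict inequality, so a value of exactly 73.9 is coded as 0. Also note that 163 pounds converts to roughly 73.94 kilograms when the conversion is not rounded, so such an observation would be coded as 1.

```python
import pandas as pd

# Toy weights in kilograms, taken from the conversion table (illustration only).
toy = pd.DataFrame({"weightkg": [63.5, 73.9, 86.2]})

# The comparison yields True/False; astype(int) turns that into 1/0.
toy["wabove"] = (toy["weightkg"] > 73.9).astype(int)

print(toy)
```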
The dummy variable `wabove` effectively creates two groups (people whose weight exceeds 73.9 kilograms, and people whose weight is at most 73.9 kilograms). Is there an earnings gap between these two groups? Can you distinguish the earnings gap statistically from zero? Write your answer in a new text cell. [1 mark]
Remember to use the custom-built `t_test` function that we provided to you in week 4.
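The sketch below is not the course's `t_test` function, whose interface is not shown here. It only illustrates, on six invented observations, the arithmetic behind a difference-in-means t-statistic with unequal variances: the gap in group means divided by the square root of the sum of each group's variance over its sample size.

```python
import numpy as np
import pandas as pd

# Six invented observations, three per group (illustration only).
toy = pd.DataFrame({
    "earnings": [20.0, 30.0, 40.0, 30.0, 40.0, 50.0],
    "wabove":   [0, 0, 0, 1, 1, 1],
})

# Group means, sample variances (ddof=1) and group sizes.
stats = toy.groupby("wabove")["earnings"].agg(["mean", "var", "count"])

# Earnings gap: mean of the heavier group minus mean of the lighter group.
gap = stats.loc[1, "mean"] - stats.loc[0, "mean"]

# Standard error of the gap, allowing the two group variances to differ.
se = np.sqrt(stats.loc[1, "var"] / stats.loc[1, "count"]
             + stats.loc[0, "var"] / stats.loc[0, "count"])

t_stat = gap / se
print(f"gap = {gap:.2f}, se = {se:.3f}, t = {t_stat:.3f}")
```

In a large sample, an absolute t-statistic above roughly 1.96 would lead to rejecting a zero gap at the 5% level. With only three observations per group, as in this toy example, that large-sample rule does not apply.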
Run a regression of `earnings` on `weightkg`. Create a scatter plot and include the estimated population regression function. What is the direction of the statistical association between earnings and weight? Is it significant? [1 mark]
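A regression of this form, together with a scatter plot and fitted line, might look like the sketch below. The data are synthetic, and the intercept of 30 and slope of 0.2 are invented, so the output says nothing about the real sample. Using heteroskedasticity-robust (HC1) standard errors is an assumption here and may differ from what the labs used.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

# Synthetic data with an invented linear relationship plus noise.
rng = np.random.default_rng(42)
weightkg = rng.uniform(50, 110, size=500)
earnings = 30 + 0.2 * weightkg + rng.normal(0, 5, size=500)
toy = pd.DataFrame({"weightkg": weightkg, "earnings": earnings})

# OLS fit; cov_type="HC1" requests heteroskedasticity-robust standard errors.
fit = smf.ols("earnings ~ weightkg", data=toy).fit(cov_type="HC1")
slope = fit.params["weightkg"]
p_value = fit.pvalues["weightkg"]
print(f"slope = {slope:.3f}, p-value = {p_value:.4f}")

# Evaluate the fitted line on an evenly spaced grid of weights.
grid = pd.DataFrame({
    "weightkg": np.linspace(toy["weightkg"].min(), toy["weightkg"].max(), 100)
})
fitted = fit.predict(grid)

# Scatter plot of the data with the fitted line on top.
fig, ax = plt.subplots()
ax.scatter(toy["weightkg"], toy["earnings"], s=8, alpha=0.4)
ax.plot(grid["weightkg"], fitted, color="red", label="fitted line")
ax.set_xlabel("Weight (kg)")
ax.set_ylabel("Earnings")
ax.legend()
```

The sign of `slope` gives the direction of the association, and `p_value` is the basis for judging its significance.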
Use your estimated PRF to create predictions for earnings of a worker who weighs: 63.5 kilograms / 73.9 kilograms / 86.2 kilograms. [1 mark]
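Predictions from a fitted line amount to plugging each weight into the estimated intercept and slope. The sketch below shows this on synthetic data (all numbers are invented; `new_workers` is a hypothetical name) and cross-checks the `predict` call against the calculation done by hand.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data with an invented relationship (illustration only).
rng = np.random.default_rng(7)
toy = pd.DataFrame({"weightkg": rng.uniform(50, 110, size=200)})
toy["earnings"] = 30 + 0.2 * toy["weightkg"] + rng.normal(0, 5, size=200)

fit = smf.ols("earnings ~ weightkg", data=toy).fit()

# The three weights of interest, in a data frame with the regressor's name.
new_workers = pd.DataFrame({"weightkg": [63.5, 73.9, 86.2]})
new_workers["predicted_earnings"] = fit.predict(new_workers)

# Same predictions by hand: intercept + slope * weight.
by_hand = fit.params["Intercept"] + fit.params["weightkg"] * new_workers["weightkg"]

print(new_workers)
```

The difference between any two predictions equals the slope times the difference in weight, which is a convenient way to read the estimated association in kilogram terms.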
In summary, what have you learned about the effect of weight on earnings? Provide a brief yet thoughtful discussion! [2 marks]
Submit both an ipynb-file and an html-file on the course's Wattle page. [1 mark]
We suggest that you

- name the notebook file `assignment_1.ipynb`, and
- name the html file `assignment_1.html`.
Note: We absolutely require you to submit the ipynb-file. If you struggle with the creation of an html-file, don't worry: we will not deduct partial marks if you are unable to create and submit an html-file.
Make sure to follow our instructions asking you to double-check that your files have uploaded properly to Wattle/Turnitin. A digital receipt or confirmation email from Wattle/Turnitin is NOT sufficient! Check my GitHub site (under Assignments) for details. Double-checking your upload is part of the assignment and must occur before the deadline.
This exercise is based on Empirical Exercise 4.2 of Stock and Watson, Introduction to Econometrics, 4th global edition.