Before you start, please check the course website for important reminders on:
This assignment requires two types of work:
We have included empty code and markdown cells where you should enter your solutions. Feel free to add additional cells wherever needed.
Note: You may only use Python packages that were introduced during the EMET2007 computer labs.
In this assignment, you will investigate the statistical relationship between earnings and weight using the Earnings_and_Height dataset from the U.S. This is the same dataset used during the computer labs in weeks 4 and 5.
Since weights are recorded in pounds, here is a helpful conversion table:
| Weight (pounds) | Weight (kilograms) |
|---|---|
| 140 | 63.5 |
| 163 | 73.9 |
| 190 | 86.2 |
Good luck!
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.stats.api as sms
import statsmodels.formula.api as smf
from scipy import stats
df = pd.read_csv('https://raw.githubusercontent.com/juergenmeinecke/EMET2007/refs/heads/main/datasets/earnings_and_height.csv')
The original variable weight is reported in pounds. Create a new variable called weightkg that expresses weight in kilograms and add it to the dataframe.
Hint: 1 pound ≈ 0.4536 kilograms
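A minimal sketch of the conversion on a small made-up dataframe. The three values are the reference weights from the conversion table above, not observations from Earnings_and_Height. With the real data, the same assignment line applies to df once it has been loaded.

```python
import pandas as pd

# Hypothetical toy data standing in for df; only the 'weight' column (pounds) is used
toy = pd.DataFrame({"weight": [140, 163, 190]})

# Multiply pounds by the conversion factor from the hint to obtain kilograms
toy["weightkg"] = toy["weight"] * 0.4536

print(toy.round(1))
```

The rounded results should match the kilogram column of the conversion table, which is a quick sanity check on the factor.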
# Your code here:
Present useful descriptive analysis for weightkg and visualize the variable.
Your task: report summary statistics for weightkg (for example, with .describe()) and visualize its distribution.

Note: Add as many code cells and markdown cells as needed.
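A minimal sketch of one possible descriptive analysis, assuming a histogram is an acceptable visualization. The data are simulated (normal with mean 75 and standard deviation 15, chosen arbitrarily) and are NOT the Earnings_and_Height sample; with the real data, replace toy["weightkg"] with df["weightkg"].

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy data: simulated weights in kg (NOT the Earnings_and_Height sample)
rng = np.random.default_rng(seed=0)
toy = pd.DataFrame({"weightkg": rng.normal(loc=75, scale=15, size=500)})

# Count, mean, standard deviation, min, quartiles, and max
summary = toy["weightkg"].describe()
print(summary.round(2))

# Shape statistics that .describe() does not report
print("skewness:", round(toy["weightkg"].skew(), 3))
print("excess kurtosis:", round(toy["weightkg"].kurt(), 3))  # pandas reports excess kurtosis (normal = 0)

# Histogram to visualize the distribution
fig, ax = plt.subplots(figsize=(7, 4))
ax.hist(toy["weightkg"], bins=30, edgecolor="black")
ax.set_xlabel("Weight (kg)")
ax.set_ylabel("Number of workers")
ax.set_title("Distribution of weightkg (toy data)")
plt.show()
```

Comparing the mean with the median (the 50% row) and looking at the skewness gives a first indication of whether the distribution is symmetric or has a long right tail.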
# Your code here:
Your interpretation here:
Create a new categorical (dummy) variable called wabove and add it to your dataframe.
Definition:
- wabove = 1 if weightkg exceeds 73.9 kg (above the median weight)
- wabove = 0 otherwise

Hint: You can use a lambda function with .apply(), or a boolean comparison combined with .astype(int).
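A minimal sketch of both approaches from the hint, using a few made-up weights. The value 73.9 itself is included to show that "exceeds" means strictly greater, so it is coded 0. The column name wabove_alt is hypothetical and used only to compare the two methods.

```python
import pandas as pd

# Hypothetical toy weights in kg (not real observations)
toy = pd.DataFrame({"weightkg": [60.2, 73.9, 74.0, 90.5]})

# Option 1: lambda function applied element by element
toy["wabove"] = toy["weightkg"].apply(lambda w: 1 if w > 73.9 else 0)

# Option 2: vectorized comparison gives True/False, which .astype(int) turns into 1/0
toy["wabove_alt"] = (toy["weightkg"] > 73.9).astype(int)

print(toy)
```

Both columns should be identical; the vectorized version is usually faster on large dataframes. One detail worth checking with the real data: 163 pounds converts to 163 × 0.4536 ≈ 73.94 kg, which exceeds 73.9, so a worker recorded at exactly 163 pounds is coded wabove = 1 under this rule.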
# Your code here:
The dummy variable wabove creates two groups:
Your task:
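A minimal sketch, assuming the comparison of interest is mean earnings between the wabove = 1 and wabove = 0 groups. The eight earnings values are made up for illustration. The block computes the group means, their difference, and a Welch two-sample t-test with scipy.stats.ttest_ind (equal_var=False, so the two groups may have different variances).

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data: earnings (USD) and the wabove dummy for eight workers
toy = pd.DataFrame({
    "earnings": [38000, 42000, 35000, 47000, 44000, 39000, 51000, 40000],
    "wabove":   [0, 0, 0, 0, 1, 1, 1, 1],
})

# Mean earnings within each group (index 0 = not above, 1 = above)
group_means = toy.groupby("wabove")["earnings"].mean()
diff = group_means[1] - group_means[0]
print(group_means)
print("Difference in means (above minus not above):", diff)

# Welch's t-test for equality of the two group means
above = toy.loc[toy["wabove"] == 1, "earnings"]
below = toy.loc[toy["wabove"] == 0, "earnings"]
t_stat, p_value = stats.ttest_ind(above, below, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```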
# Your code here:
Your interpretation here:
Run a simple linear regression of earnings on weightkg.
Your task: estimate the regression with smf.ols().
# Your code here:
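A minimal sketch of the regression on simulated data (the data-generating numbers are arbitrary and are NOT estimates from the real sample). It assumes heteroskedasticity-robust standard errors are wanted, as is common in Stock and Watson; cov_type="HC1" requests them. Drop that argument for homoskedasticity-only standard errors.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data: simulated weights (kg) and earnings (USD); NOT the real sample
rng = np.random.default_rng(seed=1)
weightkg = rng.normal(loc=75, scale=15, size=200)
earnings = 40000 + 50 * weightkg + rng.normal(scale=20000, size=200)
toy = pd.DataFrame({"weightkg": weightkg, "earnings": earnings})

# Regress earnings on weightkg; the formula adds an intercept automatically
# cov_type="HC1" gives heteroskedasticity-robust standard errors (assumption)
results = smf.ols("earnings ~ weightkg", data=toy).fit(cov_type="HC1")
print(results.summary())

# Intercept and slope estimates
print(results.params)
```

The slope is the estimated change in earnings associated with one additional kilogram of weight; the robust standard errors change the reported t-statistics and confidence intervals but not the coefficient estimates.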
Your interpretation here:
Use your estimated PRF to predict the earnings for workers with the following weights:
| Weight (kg) |
|---|
| 63.5 |
| 73.9 |
| 86.2 |
Hint: Use the .predict() method from your regression results.
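A minimal sketch of .predict() with the three weights from the table. To keep the block self-contained, it refits a model on simulated toy data; in the notebook you would call .predict() on your own fitted results instead. The new dataframe's column name must match the regressor name in the formula.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data so the block runs on its own (NOT the real sample)
rng = np.random.default_rng(seed=1)
weightkg = rng.normal(loc=75, scale=15, size=200)
earnings = 40000 + 50 * weightkg + rng.normal(scale=20000, size=200)
toy = pd.DataFrame({"weightkg": weightkg, "earnings": earnings})
results = smf.ols("earnings ~ weightkg", data=toy).fit()

# New observations: column name matches the regressor in the formula
new_weights = pd.DataFrame({"weightkg": [63.5, 73.9, 86.2]})
predicted = results.predict(new_weights)

# Show each weight next to its predicted earnings
print(pd.DataFrame({"weightkg": new_weights["weightkg"],
                    "predicted_earnings": predicted.round(2)}))
```

Each prediction equals the intercept plus the slope times the weight, so you can verify one of them by hand from results.params.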
# Your code here:
In summary, what have you learned about the relationship between weight and earnings?
Your task: Provide a brief yet thoughtful discussion that goes beyond a mechanical interpretation of the regression output. Consider:
Your discussion here:
Submit both files on the course's Canvas page:
Note: The .ipynb file is required. If you have difficulty creating the HTML file, you will not lose marks for that portion.
This exercise is based on Empirical Exercise 4.2 of Stock and Watson, Introduction to Econometrics, 4th Global Edition.