EMET2007 Assignment 1¶

General Instructions¶

Before you start, please check the course website for important reminders on:

  • Deadlines for submission
  • Academic honesty requirements

What You Need to Do¶

This assignment requires two types of work:

  1. Python coding — Write your code in code cells
  2. Written explanations — Provide interpretations in markdown cells (text cells)

We have included empty code and markdown cells where you should enter your solutions. Feel free to add more cells wherever you need them.

Note: You may only use Python packages that were introduced during the EMET2007 computer labs.


Your Task: Studying the Association Between Earnings and Weight¶

In this assignment, you will investigate the statistical relationship between earnings and weight using the Earnings_and_Height dataset from the U.S. This is the same dataset used during the computer labs in weeks 4 and 5.

Unit Conversion Reference¶

Since weights are recorded in pounds, here is a helpful conversion table:

Weight (pounds)    Weight (kilograms)
140                63.5
163                73.9
190                86.2

Good luck!


Setup: Imports and Loading Data¶

Run the cell below to import the required libraries.

In [ ]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.stats.api as sms
import statsmodels.formula.api as smf
from scipy import stats

Loading the Dataset¶

In [ ]:
df = pd.read_csv('https://raw.githubusercontent.com/juergenmeinecke/EMET2007/refs/heads/main/datasets/earnings_and_height.csv')

Exercise 1 (1 mark)¶

The original variable weight is reported in pounds. Create a new variable called weightkg that expresses weight in kilograms and add it to the dataframe.

Hint: 1 pound ≈ 0.4536 kilograms
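
As an illustration of the conversion only, the sketch below applies the factor from the hint to a small toy dataframe (not the assignment data). The three weights are the ones from the conversion table above; the names toy and LB_TO_KG are made up for this example.

```python
import pandas as pd

# Toy data: the three weights (in pounds) from the conversion table
toy = pd.DataFrame({"weight": [140, 163, 190]})

# Conversion factor from the hint (1 pound is roughly 0.4536 kg)
LB_TO_KG = 0.4536

# Multiplying a column by a scalar converts every row at once
toy["weightkg"] = toy["weight"] * LB_TO_KG

print(toy.round(1))
```

The rounded values should agree with the conversion table, which is a quick way to check that the factor was applied in the right direction.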

In [ ]:
# Your code here:

Exercise 2 (2 marks)¶

Present a descriptive analysis of weightkg and visualize its distribution.

Your task:

  1. Calculate summary statistics (e.g., using .describe())
  2. Create appropriate visualizations (e.g., histogram, boxplot)
  3. Write an interpretation of what you observe about the distribution of weight

Note: Add as many code cells and markdown cells as needed.
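
The sketch below shows the general pattern of the three steps on simulated data, not on the assignment dataset. The dataframe toy, the column value, and the simulation parameters are all assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Simulated data (hypothetical): 300 draws from a normal distribution
rng = np.random.default_rng(42)
toy = pd.DataFrame({"value": rng.normal(loc=75, scale=12, size=300)})

# Summary statistics: count, mean, std, min, quartiles, max
summary = toy["value"].describe()
print(summary)

# Histogram and boxplot side by side
fig, axes = plt.subplots(1, 2, figsize=(9, 3.5))
axes[0].hist(toy["value"], bins=20, edgecolor="black")
axes[0].set_title("Histogram")
axes[0].set_xlabel("value")
axes[1].boxplot(toy["value"])
axes[1].set_title("Boxplot")
axes[1].set_ylabel("value")
fig.tight_layout()
```

In a notebook the figure is normally displayed when the cell finishes. When interpreting real data, features worth commenting on include the center, the spread, any skewness, and outliers.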

In [ ]:
# Your code here:

Your interpretation here:


Exercise 3 (1 mark)¶

Create a new categorical (dummy) variable called wabove and add it to your dataframe.

Definition:

  • wabove = 1 if weightkg exceeds 73.9 kg (above the median weight)
  • wabove = 0 otherwise

Hint: You can use a lambda function with .apply(), or use boolean indexing with .astype(int).
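
Both approaches mentioned in the hint can be sketched on a toy dataframe as follows. The four weights and the column names wabove_bool and wabove_apply are hypothetical and exist only to show that the two approaches agree.

```python
import pandas as pd

# Toy data (hypothetical weights in kilograms)
toy = pd.DataFrame({"weightkg": [60.0, 73.9, 80.5, 95.2]})
THRESHOLD_KG = 73.9

# Approach 1: the comparison yields True/False, and .astype(int) maps these to 1/0
toy["wabove_bool"] = (toy["weightkg"] > THRESHOLD_KG).astype(int)

# Approach 2: apply a lambda function to each value
toy["wabove_apply"] = toy["weightkg"].apply(lambda w: 1 if w > THRESHOLD_KG else 0)

print(toy)
```

Note that a value exactly equal to the threshold is coded as 0, because the definition uses "exceeds" (a strict inequality).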

In [ ]:
# Your code here:

Exercise 4 (1 mark)¶

The dummy variable wabove creates two groups:

  • People whose weight exceeds 73.9 kg
  • People whose weight is at most 73.9 kg
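
One common way to compare the mean of an outcome across two groups like these is a two-sample t-test. The sketch below uses simulated data and scipy's ttest_ind with equal_var=False (Welch's test, which does not assume equal variances). This is one possible approach rather than the required one; the columns group and outcome, and all simulated values, are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Simulated data (hypothetical): two groups of 100 with different means
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "group": np.repeat([0, 1], 100),
    "outcome": np.concatenate([
        rng.normal(10, 2, 100),   # group 0
        rng.normal(11, 2, 100),   # group 1
    ]),
})

# Split the outcome by group
g0 = toy.loc[toy["group"] == 0, "outcome"]
g1 = toy.loc[toy["group"] == 1, "outcome"]

# Difference in sample means (group 1 minus group 0)
gap = g1.mean() - g0.mean()

# Welch's two-sample t-test of H0: equal population means
res = stats.ttest_ind(g1, g0, equal_var=False)

print(f"gap = {gap:.3f}, t = {res.statistic:.3f}, p-value = {res.pvalue:.4f}")
```

A small p-value (for example, below 0.05) indicates that the gap can be distinguished statistically from zero at that significance level.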

Your task:

  1. Use a t-test to determine if there is an earnings gap between these two groups.
  2. In a markdown cell, explain whether you can distinguish the earnings gap statistically from zero.

In [ ]:
# Your code here:

Your interpretation here:


Exercise 5 (1 mark)¶

Run a simple linear regression of earnings on weightkg.
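
The general workflow for a regression of this kind (estimate, plot the data, overlay the fitted line) can be sketched on simulated data as below. The variables x and y, the true coefficients, and the noise level are all assumptions made for illustration; they are unrelated to the assignment dataset.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data (hypothetical): y depends linearly on x plus noise
rng = np.random.default_rng(0)
toy = pd.DataFrame({"x": rng.uniform(50, 110, size=200)})
toy["y"] = 20000 + 150 * toy["x"] + rng.normal(0, 5000, size=200)

# Estimate y = b0 + b1 * x by ordinary least squares
fit = smf.ols("y ~ x", data=toy).fit()
print(fit.params)          # estimated intercept and slope
print(fit.pvalues["x"])    # p-value for H0: slope equals zero

# Evaluate the fitted line on an evenly spaced grid of x values
grid = pd.DataFrame({"x": np.linspace(toy["x"].min(), toy["x"].max(), 100)})
line = fit.predict(grid)

# Scatter plot with the fitted line on top
fig, ax = plt.subplots()
ax.scatter(toy["x"], toy["y"], s=8, alpha=0.5, label="data")
ax.plot(grid["x"], line, color="red", label="fitted line")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
```

The sign of the slope gives the direction of the association, and its p-value speaks to statistical significance. If heteroskedasticity-robust standard errors are wanted, statsmodels accepts a cov_type argument in fit(), for example fit(cov_type="HC1").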

Your task:

  1. Estimate the regression using smf.ols()
  2. Create a scatter plot of the data
  3. Add the estimated population regression function (PRF) to the scatter plot
  4. Discuss: What is the direction of the association? Is the relationship statistically significant?

In [ ]:
# Your code here:

Your interpretation here:


Exercise 6 (1 mark)¶

Use your estimated PRF to predict the earnings for workers with the following weights:

Weight (kg)
63.5
73.9
86.2

Hint: Use the .predict() method from your regression results.
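
As a sketch of how .predict() works with a formula-based model: it takes a dataframe whose column names match the regressors in the formula. The toy data below lie exactly on the line y = 1 + 2x, so the predictions can be verified by hand; the names x, y, and new_data are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy data (hypothetical) lying exactly on the line y = 1 + 2x
toy = pd.DataFrame({"x": [50.0, 60.0, 70.0, 80.0, 90.0, 100.0]})
toy["y"] = 1 + 2 * toy["x"]

fit = smf.ols("y ~ x", data=toy).fit()

# New observations: the column name must match the regressor in the formula
new_data = pd.DataFrame({"x": [63.5, 73.9, 86.2]})
preds = fit.predict(new_data)

print(preds.round(1))
```

Each prediction is the estimated intercept plus the estimated slope times the new x value, which is 1 + 2x for these toy data.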

In [ ]:
# Your code here:

Exercise 7 (2 marks)¶

In summary, what have you learned about the relationship between weight and earnings?

Your task: Provide a brief yet thoughtful discussion that goes beyond a mechanical interpretation of the regression output. Consider:

  • The economic significance (not just statistical significance) of the findings
  • Potential limitations or concerns with the analysis
  • Any caveats about causal interpretation

Your discussion here:


Exercise 8: Submission (1 mark)¶

Submit both files on the course's Canvas page:

  1. assignment_1.ipynb — the Jupyter notebook file
  2. assignment_1.html — an HTML export of your notebook

Note: The .ipynb file is required. If you have difficulty creating the HTML file, you will not lose marks for that portion.


Attribution¶

This exercise is based on Empirical Exercise 4.2 of Stock and Watson, Introduction to Econometrics, 4th Global Edition.