EMET2007 Assignment 1

General instructions

Before you start, and as you work, check the website for important reminders on deadlines and academic honesty!

Your job consists of

  • coding in Python, and

  • writing explanations and offering econometric interpretations.

Use code cells for the coding and text (or markdown) cells for the explanations. In the notebook below, we have included empty code and text cells in which you can type your solutions. Feel free to add further code and text cells wherever you need them!

Note: You may only use Python and you may only use packages that were used during the EMET2007 computer labs.

Your task: studying the association between earnings and weight

In this assignment you will study the association between earnings and weight.

You will use the Earnings_and_Height data from the U.S. This is the same data set that we used during the computer labs in weeks 4 and 5.

Weights are measured in pounds. The following table will help you translate them to the metric system:

Weight in pounds    Weight in kilograms
140                 63.5
163                 73.9
190                 86.2
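
The table rows follow from the exact definition of the pound, 1 lb = 0.45359237 kg. As a minimal sketch in plain Python, the rows can be reproduced as below; the names LB_TO_KG and pounds_to_kg are illustrative assumptions and not part of the course materials.

```python
# Exact conversion factor: one (international avoirdupois) pound in kilograms.
LB_TO_KG = 0.45359237  # illustrative constant name (assumption)


def pounds_to_kg(pounds):
    """Convert a weight in pounds to kilograms."""
    return pounds * LB_TO_KG


# Reproduce the three table rows, rounded to one decimal place.
table = {lb: round(pounds_to_kg(lb), 1) for lb in (140, 163, 190)}
print(table)
```

Note that 163 pounds corresponds to the 73.9 kilogram threshold used in Exercise 3.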

Answer all of the following exercises! Good luck!

Imports and loading data

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.stats.api as sms
import statsmodels.formula.api as smf
In [2]:
# COLAB USERS: UNCOMMENT THE FOLLOWING LINES:

# from google.colab import drive
# drive.mount('/content/drive')
# df = pd.read_csv('drive/MyDrive/EMET2007/datasets/earnings_and_height.csv')
In [3]:
# ANACONDA USERS: UNCOMMENT THE FOLLOWING LINE:

# df = pd.read_csv('../datasets/earnings_and_height.csv')

Exercise 1

The original variable weight is reported in pounds. Add a new variable weightkg which reports weight in kilograms to the data frame. [1 mark]

In [ ]:
 

Exercise 2

Present useful descriptive analysis for weightkg and visualize the variable in familiar ways. What do you learn about the variable weightkg? [2 marks]

Note: As explained earlier, you may add as many code cells and text cells as you need.

In [ ]:
 

Exercise 3

Add a new (categorical) variable wabove to your data frame. It equals 1 if weightkg exceeds 73.9 kilograms (that is, if weight is above the median weight in the original sample), and 0 otherwise. [1 mark]

In [ ]:
 

Exercise 4

The dummy variable wabove effectively creates two groups (people whose weight exceeds 73.9 kilograms, and people whose weight is at most 73.9 kilograms). Is there an earnings gap between these two groups? Can you distinguish the earnings gap statistically from zero? Write your answer in a new text cell. [1 mark]

Remember to use the custom-built t_test function that we provided to you in week 4.

In [ ]:
 
In [ ]:
 

Exercise 5

Run a regression of earnings on weightkg. Create a scatter plot and include the estimated population regression function. What is the direction of the statistical association between earnings and weight? Is it significant? [1 mark]

In [ ]:
 

Exercise 6

Use your estimated PRF to create predictions for earnings of a worker who weighs: 63.5 kilograms / 73.9 kilograms / 86.2 kilograms. [1 mark]

In [ ]:
 

Exercise 7

In summary, what have you learned about the effect of weight on earnings? Provide a brief yet thoughtful discussion! [2 marks]

Exercise 8

Submit both an ipynb-file and an html-file on the course's Wattle page. [1 mark]

We suggest that you

  • name the notebook file assignment_1.ipynb, and

  • the html file assignment_1.html

Note: We absolutely require you to submit the ipynb-file. If you struggle with the creation of an html-file, don't worry: we will not deduct partial marks if you are unable to create and submit an html-file.

Make sure to follow our instructions asking you to double-check that your files have uploaded properly to Wattle/Turnitin. A digital receipt or confirmation email from Wattle/Turnitin is NOT sufficient! Check my GitHub site (under Assignments) for details. Double-checking your upload is part of the assignment and must occur before the deadline.

Attribution

This exercise is based on Empirical Exercise 4.2 of Stock and Watson, Introduction to Econometrics, 4th global edition.