Create data prep file

In the following example, we create a simple Python file (.py) with one of Jupyter’s magic commands. Run this notebook from inside the folder where you want to store the file, because the file is written to the notebook's current working directory.

First, create a variable with the name of the file:

_data_preparation_file = 'case_duke_data_prep.py'
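
Since a relative file name is resolved against the current working directory, it can help to check where the file will end up before writing it. The following is a minimal sketch using only the standard library; the variable names `target_dir` and `target_file` are illustrative assumptions and not part of the original notebook.

```python
from pathlib import Path

# Same file name as defined above
_data_preparation_file = 'case_duke_data_prep.py'

# Directory the notebook kernel is currently running in
target_dir = Path.cwd()

# Full path the relative file name resolves to (illustrative name)
target_file = target_dir / _data_preparation_file

print(f"The file will be written to: {target_file}")
```

If the printed path is not the folder you expected, move the notebook or change the working directory before continuing.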

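The following code block needs to start with the magic command %%writefile, which saves the rest of the cell to the named file instead of executing it. The curly braces insert the value of the Python variable: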

%%writefile {_data_preparation_file}

import pandas as pd

ROOT = "https://raw.githubusercontent.com/kirenz/modern-statistics/main/data/"
DATA = "duke-forest.csv"

df = pd.read_csv(ROOT + DATA)

# drop column with too many missing values
df = df.drop(['hoa'], axis=1)

# drop remaining row with one missing value
df = df.dropna()

# Drop irrelevant features
df = df.drop(['url', 'address'], axis=1)

# Convert data types
categorical_list = ['type', 'heating', 'cooling', 'parking']

for col in categorical_list:
    df[col] = df[col].astype("category")

# Drop irrelevant columns
df = df.drop(['type', 'heating', 'parking'], axis=1)
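
As a rough plain-Python analogue of the cell above, the sketch below writes a script to disk and then executes it to retrieve its top-level variables. It is only an illustration under stated assumptions: the script body is a small hypothetical stand-in that needs no network access (the real script reads the CSV from GitHub), and the file name `example_data_prep.py` and the variable `rows` are made up for this example.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical stand-in for the cell body (no pandas, no download)
script_source = '''\
rows = [{"price": 1, "hoa": None}, {"price": 2, "hoa": None}]

# drop the "hoa" key from every row
rows = [{k: v for k, v in row.items() if k != "hoa"} for row in rows]
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    script_path = Path(tmp_dir) / "example_data_prep.py"  # hypothetical file name

    # Roughly what %%writefile does: write the text to the file,
    # replacing any existing content
    script_path.write_text(script_source, encoding="utf-8")

    # Execute the saved script and collect its top-level variables
    script_globals = runpy.run_path(str(script_path))

prepared_rows = script_globals["rows"]
print(prepared_rows)
```

Inside a notebook, the built-in `%run` magic followed by the file name is a shorter way to execute a saved script and load its variables, such as `df`, into the current session.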