Create data prep file
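In the following example, we create a simple Python file (.py) with Jupyter's magic commands. Run this notebook inside the folder where you want to store the file: %%writefile resolves a relative file name against the notebook's current working directory, which in Jupyter is normally the folder that contains the notebook.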
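To make the folder requirement concrete, here is a rough plain-Python stand-in for %%writefile. It is a sketch, not IPython's actual implementation. The helper name write_cell and the demo file name example_prep.py are hypothetical. The point it illustrates is that a relative file name lands in the current working directory. The append flag mirrors the -a option that IPython's %%writefile also accepts.

```python
import os
import tempfile
from pathlib import Path


def write_cell(file_name: str, body: str, append: bool = False) -> Path:
    """Hypothetical stand-in for IPython's %%writefile (not the real code).

    A relative file_name is resolved against the current working directory,
    so the file ends up wherever the notebook's kernel is running.
    """
    target = Path.cwd() / file_name
    # Report what happens, roughly like %%writefile does
    if target.exists():
        print(("Appending to " if append else "Overwriting ") + file_name)
    else:
        print("Writing " + file_name)
    # "w" overwrites an existing file, "a" appends to it
    with open(target, "a" if append else "w", encoding="utf-8") as fh:
        fh.write(body)
    return target


# Demo in a throwaway folder so no real file is touched
with tempfile.TemporaryDirectory() as tmp:
    previous = os.getcwd()
    os.chdir(tmp)
    try:
        path = write_cell("example_prep.py", "import pandas as pd\n")
        # The file was created inside the current working directory
        print(path.resolve().parent == Path(tmp).resolve())
    finally:
        os.chdir(previous)
```

If the notebook runs from a different folder, the same call silently creates the file there instead, which is why the notebook should live next to the file you want to produce.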
First, create a variable with the name of the file:
_data_preparation_file = 'case_duke_data_prep.py'
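The next cell must start with the cell magic %%writefile. IPython replaces {_data_preparation_file} with the value of that variable, so the rest of the cell is written to case_duke_data_prep.py: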
%%writefile {_data_preparation_file}
import pandas as pd
ROOT = "https://raw.githubusercontent.com/kirenz/modern-statistics/main/data/"
DATA = "duke-forest.csv"
df = pd.read_csv(ROOT + DATA)
# Drop the column with too many missing values
df = df.drop(['hoa'], axis=1)
# Drop the remaining row with a missing value
df = df.dropna()
# Drop irrelevant features
df = df.drop(['url', 'address'], axis=1)
# Convert data types
categorical_list = ['type', 'heating', 'cooling', 'parking']
for i in categorical_list:
    df[i] = df[i].astype("category")
# Drop irrelevant categorical columns ('cooling' is kept)
df = df.drop(['type', 'heating', 'parking'], axis=1)
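To see what each cleaning step does without downloading the CSV, the sketch below applies the same steps to a tiny DataFrame. The rows are invented placeholders that only reuse the Duke Forest column names, and prepare is a hypothetical helper. Instead of the loop, it uses DataFrame.astype with a dict, which performs the same category conversion in one call.

```python
import pandas as pd


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: same cleaning steps as case_duke_data_prep.py."""
    df = df.drop(columns=["hoa"])             # column with too many missing values
    df = df.dropna()                          # rows with any remaining missing value
    df = df.drop(columns=["url", "address"])  # irrelevant features
    categorical_list = ["type", "heating", "cooling", "parking"]
    df = df.astype({col: "category" for col in categorical_list})
    return df.drop(columns=["type", "heating", "parking"])


# Made-up rows for illustration only (not real Duke Forest data)
raw = pd.DataFrame({
    "price": [500_000, 600_000, 700_000],
    "area": [2000, 2500, 3000],
    "type": ["Single Family", "Single Family", "Single Family"],
    "heating": ["Gas", "Electric", None],  # third row has a missing value
    "cooling": ["central", "other", "central"],
    "parking": ["Garage", "Carport", "Garage"],
    "hoa": [None, None, "$50/mo"],
    "url": ["u1", "u2", "u3"],
    "address": ["a1", "a2", "a3"],
})

clean = prepare(raw)
print(clean)
print(clean.dtypes)
```

Once case_duke_data_prep.py exists, another notebook in the same folder can reuse the cleaned data, for example with %run case_duke_data_prep.py (which leaves df in the notebook's namespace) or with from case_duke_data_prep import df.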