Pandas Introduction
Usually, we prefer to work with column names that have the following properties:
no leading or trailing whitespace ("name" instead of " name ", " name" or "name ")
all lowercase ("name" instead of "Name")
no spaces inside the name ("my_name" instead of "my name")
As an example, we rename the column "name" to " MY NEW-NAME" (note that we include a leading whitespace):

| | MY NEW-NAME | height | weight | id | city |
|---|---|---|---|---|---|
| 0 | Tom | 1.68 | 48.4 | 1 | Stuttgart |
| 1 | Lisa | 1.93 | 89.8 | 2 | Stuttgart |
| 2 | Peter | 1.72 | 84.2 | 3 | Berlin |
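The table above can be reproduced with a short sketch like the one below. The values are copied from the table; using `DataFrame.rename` for the renaming step is an assumption, since the original code cell is not shown here.

```python
import pandas as pd

# Example data taken from the table above
df = pd.DataFrame({
    "name": ["Tom", "Lisa", "Peter"],
    "height": [1.68, 1.93, 1.72],
    "weight": [48.4, 89.8, 84.2],
    "id": [1, 2, 3],
    "city": ["Stuttgart", "Stuttgart", "Berlin"],
})

# Assumption: rename "name" to a deliberately messy name
# (leading whitespace, uppercase letters, a space and a hyphen)
df = df.rename(columns={"name": " MY NEW-NAME"})

print(df.columns.tolist())
```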
We use regular expressions to deal with whitespace.
To change multiple column names in df at once, we assign the result of df.columns.str.replace() back to df.columns.
To remove the leading and trailing spaces, we call .str.replace() with regex=True.
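A minimal sketch of this step follows. The pattern `^ +| +$` is an assumption pieced together from the symbols in the regex explanation; the extra spaces around "height" and "weight" are hypothetical and only there to show trailing whitespace being removed as well.

```python
import pandas as pd

# Hypothetical column names with leading and trailing spaces
df = pd.DataFrame(columns=[" MY NEW-NAME", "height ", " weight", "id", "city"])

# Remove one or more spaces at the start (^ +) or at the end ( +$) of each name
df.columns = df.columns.str.replace(r"^ +| +$", "", regex=True)

print(df.columns.tolist())
# ['MY NEW-NAME', 'height', 'weight', 'id', 'city']
```

The space inside "MY NEW-NAME" is kept, because the pattern only matches at the start or end of a name. For this particular task, `df.columns.str.strip()` gives the same result without a regex.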
Explanation for regex (see also Stackoverflow):
r (for raw): tells Python to treat the string as raw text, so backslashes are not interpreted as escape sequences
"^": line start
"+": one or more of the preceding character (here, a space)
"|": or
"$": line end
To learn more about regular expressions ("regex"), visit the following sites:
Again, we use regular expressions to deal with special characters (like -, %, &, $ etc.).
Here, we replace "-" with "_".
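One way this replacement could look is sketched below, assuming the column names left after the whitespace has been stripped. A plain "-" has no special meaning in a regex outside a character class, so the pattern matches the literal hyphen.

```python
import pandas as pd

# Assumed state of the column names after stripping whitespace
df = pd.DataFrame(columns=["MY NEW-NAME", "height", "weight", "id", "city"])

# Replace every hyphen in the column names with an underscore
df.columns = df.columns.str.replace(r"-", "_", regex=True)

print(df.columns.tolist())
# ['MY NEW_NAME', 'height', 'weight', 'id', 'city']
```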
We can use two simple methods to convert all column names to lowercase and replace spaces with underscores ("_"):
.str.lower()
.str.replace(' ', '_')
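Chained together, the two methods above might be applied as in this sketch. The starting column names are assumed to be the ones left after the hyphen has been replaced; `regex=False` is passed explicitly so the space is treated as a literal string regardless of the pandas version.

```python
import pandas as pd

# Assumed state of the column names after the previous steps
df = pd.DataFrame(columns=["MY NEW_NAME", "height", "weight", "id", "city"])

# Lowercase every name, then replace the remaining spaces with underscores
df.columns = df.columns.str.lower().str.replace(" ", "_", regex=False)

print(df.columns.tolist())
# ['my_new_name', 'height', 'weight', 'id', 'city']
```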
Congratulations! You have completed this tutorial 👍
Jan Kirenz