Change Column Names

Pandas Introduction

Jan Kirenz

Setup

import pandas as pd

df = pd.DataFrame({
    'name': ["Tom", "Lisa", "Peter"],
    'height': [1.68, 1.93, 1.72],
    'weight': [48.4, 89.8, 84.2],
    'id': [1, 2, 3],
    'city': ['Stuttgart', 'Stuttgart', 'Berlin']
})

Basics

Column names properties

Usually, we prefer to work with columns that have the following proporties:

  • no leading or trailing whitespace ("name" instead of " name ", " name" or "name ")

  • all lowercase ("name" instead of "Name")

  • no white spaces ("my_name" instead of "my name")

Simple rename

  • First, we rename columns by simply using a mapping
  • We rename "name" to " MY NEW-NAME" (note that we include a leading whitespace)
df = df.rename(columns={"name": " MY NEW-NAME"}, errors="raise")

Simple rename

df.head()
MY NEW-NAME height weight id city
0 Tom 1.68 48.4 1 Stuttgart
1 Lisa 1.93 89.8 2 Stuttgart
2 Peter 1.72 84.2 3 Berlin
df.columns
Index([' MY NEW-NAME', 'height', 'weight', 'id', 'city'], dtype='object')

Regular expressions

Trailing and leading spaces (with regex)

  • We use regular expressions to deal with whitespaces

  • To change multiple column names in df at once, we use the method df.columns = df.columns.str.replace()

  • To replace the spaces, we use .replace() with regex=True

Regular expressions table

Explanation for regex (see also Stackoverflow):

  • we start with r (for raw) which tells Python to treat all following input as raw text (without interpreting it)
  • ^”: is line start
  • ” “: is a white space
  • +”: some following characters
  • |”: is or
  • $”: is line end
  • ““: is an empty string

Learn more about regular expressions

To learn more about regular expressions (“regex”), visit the following sites:

Trailing and leading spaces (with regex)

  • In the following function, we search for leading spaces and replace them with an empty string:
df.columns = df.columns.str.replace(r"^ ", r"", regex=True)
df.columns
Index(['MY NEW-NAME', 'height', 'weight', 'id', 'city'], dtype='object')

Replace special characters with REGEX

Replace special characters

  • Again, we use regular expressions to deal with special characters (like -, %, &, $ etc.)

  • Replace “-” with “_

df.columns = df.columns.str.replace(r"-", r"_", regex=True)
df.columns
Index(['MY NEW_NAME', 'height', 'weight', 'id', 'city'], dtype='object')

Lowercase and whitespace

Lowercase and whitespace

  • We can use two simple methods to convert all columns to lowercase and replace white spaces with underscores (“_“):

  • .str.lower()

  • .str.replace(' ', '_')

df.columns = df.columns.str.lower().str.replace(' ', '_')
df.columns
Index(['my_new_name', 'height', 'weight', 'id', 'city'], dtype='object')

What’s next?

Congratulations! You have completed this tutorial 👍

Next, you may want to go back to the lab’s website