Selection

Pandas Introduction

Jan Kirenz

Setup

import pandas as pd

df = pd.DataFrame({
    'name': ["Tom", "Lisa", "Peter"],
    'height': [1.68, 1.93, 1.72],
    'weight': [48.4, 89.8, 84.2],
    'id': [1, 2, 3],
    'city': ['Stuttgart', 'Stuttgart', 'Berlin']
})

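# Body mass index: weight divided by height squared, rounded to 2 decimals
df['bmi'] = (df['weight'] / df['height'] ** 2).round(2)
# Store name as a categorical and id as a string
df["name"] = df["name"].astype("category")
df['id'] = df['id'].astype(str)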

Select with []

Getting columns with [[]]

  • Selecting a single column with [[]] returns a DataFrame (single brackets, df["city"], would return a Series)

  • Select the column city and save it as a new pandas DataFrame df_city

df_city = df[["city"]]
df_city
        city
0  Stuttgart
1  Stuttgart
2     Berlin
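
The difference between single and double brackets can be checked directly. This is a minimal sketch on a reduced copy of the setup data; the names as_series and as_frame are introduced here only for illustration.

```python
import pandas as pd

# Reduced copy of the setup DataFrame (only the columns needed here)
df = pd.DataFrame({
    'name': ["Tom", "Lisa", "Peter"],
    'city': ['Stuttgart', 'Stuttgart', 'Berlin']
})

as_series = df["city"]     # single brackets: one column as a Series
as_frame = df[["city"]]    # double brackets: a DataFrame with one column

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (3, 1)
```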

Selecting rows with []

  • Selecting via [] slices the rows (endpoint is not included) and includes all columns:
df[0:2]
   name  height  weight id       city    bmi
0   Tom    1.68    48.4  1  Stuttgart  17.15
1  Lisa    1.93    89.8  2  Stuttgart  24.11
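
As a small sketch of this behavior (again on a reduced copy of the setup data, with illustrative variable names): a slice inside [] selects rows by position, while a bare integer is looked up as a column label and fails here because no column is named 0.

```python
import pandas as pd

# Reduced copy of the setup DataFrame
df = pd.DataFrame({
    'name': ["Tom", "Lisa", "Peter"],
    'height': [1.68, 1.93, 1.72],
})

first_two = df[0:2]              # rows at positions 0 and 1; position 2 is excluded
print(first_two.index.tolist())  # [0, 1]

# A bare integer inside [] is treated as a column label, not a row position.
try:
    df[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(lookup_failed)             # True
```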

Getting data with loc

The .loc (location) attribute is the primary access method for label-based selection: rows and columns are addressed by their index labels and column names.

Only the first row

df.loc[[0]]
  name  height  weight id       city    bmi
0  Tom    1.68    48.4  1  Stuttgart  17.15

One row and one column

  • Select the value at row label 0 in column “name”
df.loc[0, 'name']
'Tom'

Multiple rows and one column

  • Select rows 2 to 4 for column “name”. With .loc, slice endpoints are included; labels 3 and 4 do not exist in df, so only row 2 is returned
df.loc[2:4, 'name']
2    Peter
Name: name, dtype: category
Categories (3, object): ['Lisa', 'Peter', 'Tom']

Multiple rows and multiple columns

  • Select rows 2 to 4 for columns “name” and “height” (with .loc, slice endpoints are included; only label 2 exists in df)
df.loc[2:4, ['name', 'height']]
    name  height
2  Peter    1.72
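
The slice behavior of .loc can be sketched as follows, on a reduced copy of the setup data with illustrative variable names. A label slice includes its endpoint and tolerates labels past the end of the index, whereas an explicit list of labels raises a KeyError if any label is missing.

```python
import pandas as pd

# Reduced copy of the setup DataFrame
df = pd.DataFrame({
    'name': ["Tom", "Lisa", "Peter"],
    'height': [1.68, 1.93, 1.72],
})

by_label = df.loc[0:1, 'name']       # labels 0 and 1: the endpoint is included
past_the_end = df.loc[2:4, 'name']   # labels 3 and 4 do not exist: only label 2

print(by_label.tolist())             # ['Tom', 'Lisa']
print(past_the_end.tolist())         # ['Peter']

# Unlike a slice, a list containing a missing label is an error.
try:
    df.loc[[2, 4], 'name']
    list_failed = False
except KeyError:
    list_failed = True

print(list_failed)                   # True
```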

All rows and multiple columns

  • Select all rows for name and height
df.loc[:, ["name", "height"]]
    name  height
0    Tom    1.68
1   Lisa    1.93
2  Peter    1.72

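Scalar value

  • Passing the row label as a list ([0]) returns a one-element Series rather than a scalar; df.loc[0, "height"] returns the scalar 1.68 itself
df.loc[[0], "height"]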
0    1.68
Name: height, dtype: float64
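
A short sketch of the two forms side by side, using a reduced copy of the setup data (the names value and one_element are illustrative only):

```python
import pandas as pd

# Reduced copy of the setup DataFrame
df = pd.DataFrame({
    'name': ["Tom", "Lisa", "Peter"],
    'height': [1.68, 1.93, 1.72],
})

value = df.loc[0, "height"]          # plain row label: a single scalar value
one_element = df.loc[[0], "height"]  # row label in a list: a Series of length 1

print(value == 1.68)                 # True
print(type(one_element).__name__)    # Series
print(len(one_element))              # 1
```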

Integer-based indexing: .iloc

Basics

  • Pandas provides a suite of methods for purely integer-based (positional) indexing.

  • Here, the .iloc attribute is the primary access method.

df.iloc[0]
name            Tom
height         1.68
weight         48.4
id                1
city      Stuttgart
bmi           17.15
Name: 0, dtype: object
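
In df the row labels happen to equal the row positions, so .loc[0] and .iloc[0] return the same row. The sketch below uses a hypothetical variant, df_labeled, with row labels 10, 20, 30 (not part of the original setup) to show where the two differ.

```python
import pandas as pd

# Hypothetical variant of the setup data with non-default row labels
df_labeled = pd.DataFrame(
    {'name': ["Tom", "Lisa", "Peter"], 'height': [1.68, 1.93, 1.72]},
    index=[10, 20, 30],
)

first_by_position = df_labeled.iloc[0]   # first row, whatever its label is
first_by_label = df_labeled.loc[10]      # the row whose label is 10

print(first_by_position['name'])                  # Tom
print(first_by_position.equals(first_by_label))   # True

# Label 0 does not exist in this index, so .loc[0] fails.
try:
    df_labeled.loc[0]
    label_missing = False
except KeyError:
    label_missing = True

print(label_missing)                              # True
```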

Multiple rows and columns

  • When using .iloc, endpoints are not included.
df.iloc[0:2, 0:2]
   name  height
0   Tom    1.68
1  Lisa    1.93
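
To contrast the two endpoint conventions, here is a minimal sketch on a reduced copy of the setup data (variable names are illustrative). It also shows that an .iloc slice reaching past the last row is truncated, while a single out-of-range position raises an IndexError.

```python
import pandas as pd

# Reduced copy of the setup DataFrame
df = pd.DataFrame({
    'name': ["Tom", "Lisa", "Peter"],
    'height': [1.68, 1.93, 1.72],
})

rows_loc = df.loc[0:2]     # labels 0, 1, 2: the endpoint is included
rows_iloc = df.iloc[0:2]   # positions 0, 1: the endpoint is excluded
beyond = df.iloc[0:10]     # slice past the end is truncated to the existing rows

print(len(rows_loc))       # 3
print(len(rows_iloc))      # 2
print(len(beyond))         # 3

# A single position outside the valid range is an error.
try:
    df.iloc[10]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True

print(out_of_bounds)       # True
```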

Lists of integer position locations

df.iloc[[0, 2], [0, 1]]
    name  height
0    Tom    1.68
2  Peter    1.72

Slicing rows explicitly

df.iloc[1:3, :]
    name  height  weight id       city    bmi
1   Lisa    1.93    89.8  2  Stuttgart  24.11
2  Peter    1.72    84.2  3     Berlin  28.46

Slicing columns explicitly

df.iloc[:, 1:3]
   height  weight
0    1.68    48.4
1    1.93    89.8
2    1.72    84.2

Getting a value explicitly

df.iloc[0, 0]
'Tom'
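
The same cell can be reached by position or by label. The sketch below, on a reduced copy of the setup data, also shows one possible way to combine a column label with .iloc by translating it to a position with Index.get_loc; the variable names are illustrative only.

```python
import pandas as pd

# Reduced copy of the setup DataFrame
df = pd.DataFrame({
    'name': ["Tom", "Lisa", "Peter"],
    'height': [1.68, 1.93, 1.72],
})

by_position = df.iloc[0, 0]      # row position 0, column position 0
by_label = df.loc[0, 'name']     # row label 0, column label 'name'

# Index.get_loc translates a column label into its integer position,
# so a label can be used together with .iloc.
height_pos = df.columns.get_loc('height')
mixed = df.iloc[0, height_pos]

print(by_position == by_label)   # True
print(height_pos)                # 1
print(mixed)                     # 1.68
```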

What’s next?

Congratulations! You have completed this tutorial 👍

Next, you may want to go back to the lab’s website