Import and Store data

Pandas Introduction

Jan Kirenz

Import pandas

To load the pandas package and start working with it, import the package.
The community-agreed alias for pandas is pd.

import pandas as pd

Create Data

Create a DataFrame

To manually store data in a table, create a DataFrame:

df = pd.DataFrame({
    'name': ["Tom", "Lisa", "Peter"],
    'height': [1.68, 1.93, 1.72],
    'weight': [48.4, 89.8, 84.2],
    'id': [1, 2, 3],
    'city': ['Stuttgart', 'Stuttgart', 'Berlin']
})

Show data with head()

df.head()

	name	height	weight	id	city
0	Tom	1.68	48.4	1	Stuttgart
1	Lisa	1.93	89.8	2	Stuttgart
2	Peter	1.72	84.2	3	Berlin

Import data with read_*

pandas provides reader functions that share the prefix read_, such as pd.read_csv for CSV files.

Import data from GitHub

Import a CSV file from a GitHub repository:

URL = "https://raw.githubusercontent.com/kirenz/datasets/master/campaign.csv"

df_github = pd.read_csv(URL, sep=",", decimal='.')

df_github.head()

	age	city	income	membership_days	campaign_engagement	target
0	56	Berlin	136748	837	3	1
1	46	Stuttgart	25287	615	8	0
2	32	Berlin	146593	2100	3	0
3	60	Berlin	54387	2544	0	0
4	25	Berlin	28512	138	6	0
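The sep and decimal arguments matter when a file departs from the comma-separated, dot-decimal layout. The following is a minimal sketch, assuming a small made-up table held in memory (the names raw_text and df_sample and the values are illustrative and not taken from campaign.csv). It reads the text through io.StringIO so that it runs without network access.

```python
import io

import pandas as pd

# Hypothetical sample text: semicolon-separated, comma as decimal mark
raw_text = "name;height\nTom;1,68\nLisa;1,93\n"

# sep sets the field delimiter, decimal sets the decimal mark
df_sample = pd.read_csv(io.StringIO(raw_text), sep=";", decimal=",")

print(df_sample)
print(df_sample.dtypes)
```

With decimal="," the height column is parsed as floating-point numbers; with the default decimal="." the same values would be read in as text.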

Store data with to_*

A DataFrame is written out with methods that share the prefix to_, such as to_csv.

df_github.to_csv("data.csv", index=False)

Setting index=False keeps the row index labels out of the saved CSV file.
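To see what the index argument changes, the sketch below compares both settings on a small illustrative DataFrame (df_small is a made-up name and not part of the tutorial data). It relies on to_csv returning the CSV text as a string when no file path is given, so nothing is written to disk.

```python
import pandas as pd

df_small = pd.DataFrame({"name": ["Tom", "Lisa"], "id": [1, 2]})

# Without a path, to_csv returns the CSV text instead of writing a file
with_index = df_small.to_csv()               # default: index=True
without_index = df_small.to_csv(index=False)

print(with_index)
print(without_index)
```

With the default setting, every line starts with an extra field for the row label, and the header line begins with an empty column name.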

Viewing data

Data overview

df

	name	height	weight	id	city
0	Tom	1.68	48.4	1	Stuttgart
1	Lisa	1.93	89.8	2	Stuttgart
2	Peter	1.72	84.2	3	Berlin

Head and tail

# show first 2 rows
df.head(2)

	name	height	weight	id	city
0	Tom	1.68	48.4	1	Stuttgart
1	Lisa	1.93	89.8	2	Stuttgart

# show last 2 rows
df.tail(2)

	name	height	weight	id	city
1	Lisa	1.93	89.8	2	Stuttgart
2	Peter	1.72	84.2	3	Berlin
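When called without an argument, head() and tail() return five rows. The three-row DataFrame above is too short to show this, so the following sketch assumes a hypothetical ten-row DataFrame named df_long.

```python
import pandas as pd

# Hypothetical DataFrame with ten rows, values 0 to 9
df_long = pd.DataFrame({"value": range(10)})

first_default = df_long.head()   # no argument: first 5 rows
last_three = df_long.tail(3)     # explicit argument: last 3 rows

print(first_default)
print(last_three)
```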

Info

The info() method prints a concise summary of a DataFrame: the index range, each column's name, non-null count and data type, and the memory usage.

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   name    3 non-null      object 
 1   height  3 non-null      float64
 2   weight  3 non-null      float64
 3   id      3 non-null      int64  
 4   city    3 non-null      object 
dtypes: float64(2), int64(1), object(2)
memory usage: 252.0+ bytes

Show column names

df.columns

Index(['name', 'height', 'weight', 'id', 'city'], dtype='object')

Show data types

The dtypes attribute lists the data type of each column.

df.dtypes

name       object
height    float64
weight    float64
id          int64
city       object
dtype: object

The data types in this DataFrame are integers (int64), floats (float64) and strings (object).
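The same classification can be done in code. The sketch below is one possible approach, assuming a reduced, illustrative copy of the table (df_types and kinds are made-up names). It uses the helper functions in pandas.api.types rather than comparing dtype names directly, because the exact dtype reported for text columns can differ between pandas versions.

```python
import pandas as pd
from pandas.api import types as ptypes

df_types = pd.DataFrame({
    "name": ["Tom", "Lisa"],
    "height": [1.68, 1.93],
    "id": [1, 2],
})

# Map each column name to a coarse kind based on its dtype
kinds = {}
for column in df_types.columns:
    series = df_types[column]
    if ptypes.is_integer_dtype(series):
        kinds[column] = "integer"
    elif ptypes.is_float_dtype(series):
        kinds[column] = "float"
    elif ptypes.is_string_dtype(series):
        kinds[column] = "string"
    else:
        kinds[column] = "other"

print(kinds)
```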

Show index

df.index

RangeIndex(start=0, stop=3, step=1)
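As with Python's range, the stop value of a RangeIndex is exclusive, so the index above holds the labels 0, 1 and 2. A minimal sketch on an illustrative one-column DataFrame (df_idx is a made-up name) makes the labels visible and uses one of them to select a row.

```python
import pandas as pd

df_idx = pd.DataFrame({"name": ["Tom", "Lisa", "Peter"]})

# Expand the RangeIndex into its individual labels
labels = list(df_idx.index)

# .loc selects by index label; label 1 is the second row
second_row = df_idx.loc[1]

print(labels)
print(second_row["name"])
```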

What’s next?

Congratulations! You have completed this tutorial 👍

Next, you may want to go back to the lab’s website.