Welcome

This book contains a short introduction to pandas, a fast, powerful, flexible, and easy-to-use open source tool for data analysis and manipulation.

pandas offers data structures and operations for manipulating tables and time series. Its central data structure is the so-called DataFrame, which is similar to an in-memory spreadsheet. Like a spreadsheet:

  • A DataFrame stores data in cells.

  • A DataFrame has named columns (usually) and numbered rows.

Note

A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns. It is similar to a spreadsheet, a SQL table or the data.frame in R.
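
As a minimal sketch of the description above, the following builds a small DataFrame with one column per data type. The column names and values are made up for illustration and do not come from the book.

```python
import pandas as pd

# Hypothetical example data: each key becomes a named column,
# and each column holds values of a single type.
df = pd.DataFrame(
    {
        "name": ["Ada", "Grace", "Linus"],               # text
        "age": [36, 45, 28],                             # integers
        "height_m": [1.65, 1.70, 1.80],                  # floating point values
        "team": pd.Categorical(["red", "blue", "red"]),  # categorical data
    }
)

# Rows are numbered 0, 1, 2 by default; columns are addressed by name.
print(df)
print(df.dtypes)

# Like a spreadsheet cell, a single value is located by row label and column name.
print(df.loc[1, "name"])
```

Here `df.dtypes` reports the type pandas chose for each column, and `df.loc[1, "name"]` reads the cell in the row labeled 1 of the `name` column.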


To learn more about pandas, visit the getting started tutorials to see:

  • What kind of data does pandas handle?

  • How to create new columns derived from existing columns?

  • How to reshape the layout of tables?

  • How to combine data from multiple tables?

  • How to handle time series data with ease?

  • How to manipulate textual data?

Furthermore, you may want to review Python for Data Analysis, 3rd edition.

The Pandas Tutor tool lets you write pandas code in your browser and see how it transforms your data step by step.