Pandas Introduction
name object
height float64
weight float64
id int64
city object
dtype: object
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 name 3 non-null object
1 height 3 non-null float64
2 weight 3 non-null float64
3 id 3 non-null int64
4 city 3 non-null object
dtypes: float64(2), int64(1), object(2)
memory usage: 252.0+ bytes
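The two listings above look like the output of `df.dtypes` and `df.info()`. A minimal sketch that produces output of the same shape is shown here; the row values are invented for illustration and are not the tutorial's data. Depending on the pandas version, text columns may be reported as `object` or as a dedicated string type.

```python
import pandas as pd

# Hypothetical data: only the column names come from the listings above,
# the values are made up for illustration.
df = pd.DataFrame(
    {
        "name": ["Tom", "Lisa", "Peter"],
        "height": [1.68, 1.93, 1.72],
        "weight": [48.4, 89.8, 84.2],
        "id": [1, 2, 3],
        "city": ["Stuttgart", "Stuttgart", "Berlin"],
    }
)

# Data type of each column
print(df.dtypes)

# Index range, non-null counts, dtypes and memory usage
df.info()
```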
There are several ways to change data types in pandas.
The most common one is the .astype() method:
.astype(): Convert to a specific type (like “int32”, “float” or “category”)
.astype(str): Convert to string
to_datetime: Convert argument to datetime.
to_timedelta: Convert argument to timedelta.
to_numeric: Convert argument to a numeric type.
Categoricals are a pandas data type corresponding to categorical variables in statistics.
A categorical variable takes on a limited, and usually fixed, number of possible values (categories).
Examples are gender, social class, blood type, country affiliation, observation time or rating via Likert scales.
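As a sketch of the conversion to a categorical type, the snippet below applies `.astype("category")` to the `name` column, which matches the `category` dtype reported in the listing that follows. The small DataFrame is a hypothetical stand-in with invented values.

```python
import pandas as pd

# Hypothetical stand-in for the tutorial's df (values are invented)
df = pd.DataFrame(
    {
        "name": ["Tom", "Lisa", "Peter"],
        "city": ["Stuttgart", "Stuttgart", "Berlin"],
    }
)

# Convert the column to the categorical dtype
df["name"] = df["name"].astype("category")

# The distinct values are stored once as categories
print(df["name"].dtype)
print(df["name"].cat.categories)
```

A categorical column stores each distinct value once and refers to it by an integer code, so it tends to pay off when a column has few distinct values relative to its length.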
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 name 3 non-null category
1 height 3 non-null float64
2 weight 3 non-null float64
3 id 3 non-null int64
4 city 3 non-null object
dtypes: category(1), float64(2), int64(1), object(1)
memory usage: 363.0+ bytes
In our example, id is not a quantity: calculations such as a sum or a mean of ids would be meaningless.
It is just a unique identifier, so we should convert it to a simple string (object).
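A minimal sketch of that conversion, using `.astype(str)` on a hypothetical `id` column with invented values:

```python
import pandas as pd

# Hypothetical stand-in for the tutorial's df (values are invented)
df = pd.DataFrame({"id": [1, 2, 3]})

# Convert the integer identifiers to strings
df["id"] = df["id"].astype(str)

print(df["id"].tolist())
print(df["id"].dtype)
```

After the conversion, numeric summaries such as `df.describe()` no longer treat the identifier as a measurement.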
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 name 3 non-null category
1 height 3 non-null float64
2 weight 3 non-null float64
3 id 3 non-null object
4 city 3 non-null object
dtypes: category(1), float64(2), object(2)
memory usage: 363.0+ bytes
Add a new variable called “number” to df
The new variable should have the number 42 in all rows
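One possible solution, sketched on a hypothetical DataFrame with invented values: assigning a scalar to a new column name creates the column and repeats the scalar in every row.

```python
import pandas as pd

# Hypothetical stand-in for the tutorial's df (values are invented)
df = pd.DataFrame({"name": ["Tom", "Lisa", "Peter"]})

# A scalar on the right-hand side is broadcast to all rows
df["number"] = 42

print(df)
```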
| Code | Example | Description |
|---|---|---|
| %a | Sun | Weekday as locale’s abbreviated name. |
| %A | Sunday | Weekday as locale’s full name. |
| %w | 0 | Weekday as a decimal number, where 0 is Sunday and 6 is Saturday. |
| %d | 08 | Day of the month as a zero-padded decimal number. |
| %-d | 8 | Day of the month as a decimal number. (Platform specific) |
| %b | Sep | Month as locale’s abbreviated name. |
| %B | September | Month as locale’s full name. |
| %m | 09 | Month as a zero-padded decimal number. |
| %-m | 9 | Month as a decimal number. (Platform specific) |
| %y | 13 | Year without century as a zero-padded decimal number. |
| %Y | 2013 | Year with century as a decimal number. |
| %H | 07 | Hour (24-hour clock) as a zero-padded decimal number. |
| %-H | 7 | Hour (24-hour clock) as a decimal number. (Platform specific) |
| %I | 07 | Hour (12-hour clock) as a zero-padded decimal number. |
| %-I | 7 | Hour (12-hour clock) as a decimal number. (Platform specific) |
| %p | AM | Locale’s equivalent of either AM or PM. |
| %M | 06 | Minute as a zero-padded decimal number. |
| %-M | 6 | Minute as a decimal number. (Platform specific) |
| %S | 05 | Second as a zero-padded decimal number. |
| %-S | 5 | Second as a decimal number. (Platform specific) |
| %f | 000000 | Microsecond as a decimal number, zero-padded on the left. |
| %z | +0000 | UTC offset in the form ±HHMM[SS[.ffffff]] (empty string if the object is naive). |
| %Z | UTC | Time zone name (empty string if the object is naive). |
| %j | 251 | Day of the year as a zero-padded decimal number. |
| %-j | 251 | Day of the year as a decimal number. (Platform specific) |
| %U | 36 | Week number of the year (Sunday as the first day of the week) as a zero-padded decimal number. All days in a new year preceding the first Sunday are considered to be in week 0. |
| %W | 35 | Week number of the year (Monday as the first day of the week) as a zero-padded decimal number. All days in a new year preceding the first Monday are considered to be in week 0. |
| %c | Sun Sep 8 07:06:05 2013 | Locale’s appropriate date and time representation. |
| %x | 09.08.13 | Locale’s appropriate date representation. |
| %X | 07:06:05 | Locale’s appropriate time representation. |
| %% | % | A literal ‘%’ character. |
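
The format codes in the table above can be passed to `pd.to_datetime` to parse strings and to `.dt.strftime` to write dates back out as text. The sketch below assumes hypothetical timestamp strings in a day.month.year layout. Codes such as `%A`, `%B` and `%p` depend on the locale, and the `%-d` style codes do not work on every platform, so only the portable codes are used here.

```python
import pandas as pd

# Hypothetical timestamp strings (day.month.year hour:minute:second)
raw = pd.Series(["08.09.2013 07:06:05", "24.12.2013 18:30:00"])

# Parse the strings with an explicit format built from the codes above
dates = pd.to_datetime(raw, format="%d.%m.%Y %H:%M:%S")

# Write the dates back out in other layouts
iso = dates.dt.strftime("%Y-%m-%d")            # year-month-day
day_of_year = dates.dt.strftime("%j")          # zero-padded day of the year
long_form = dates.dt.strftime("%A, %d %B %Y")  # locale-dependent names

print(iso.tolist())
print(day_of_year.tolist())
print(long_form.tolist())
```

Giving an explicit `format` avoids ambiguity between day-first and month-first strings such as 08.09.2013.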
Congratulations! You have completed this tutorial 👍
Jan Kirenz