20 SQL & Python
This tutorial will guide you through the process of transferring data from a pandas DataFrame to a PostgreSQL database. It’s a valuable skill for handling data-driven projects in Python, allowing you to store and manage your data effectively.
Before you start, ensure you have PostgreSQL installed and running on your machine.
20.1 Python Environment
First, set up an isolated Python environment using Anaconda:
- Create a new environment (if you need more detailed explanations, review this chapter about the creation of environments):
conda create -n postgres python=3.11 pip
- Activate the environment:
conda activate postgres
- Install necessary packages: We will need
psycopg2-binary
for PostgreSQL connectivity,sqlalchemy
for SQL operations, andpandas
for data manipulation.
pip install psycopg2-binary sqlalchemy ipykernel jupyter pandas
20.2 Database Connection
To interact with your PostgreSQL database from Python, set up a connection using SQLAlchemy:
Start PostgreSQL: Ensure your PostgreSQL instance is running.
Import libraries and create an engine: The engine is your gateway to the database. Replace placeholders with your actual database credentials.
import sqlalchemy as sa
# Replace placeholders with your database credentials
= sa.create_engine('postgresql://your_username:your_password@localhost:your_port/your_database') engine
Here’s what you need to replace:
your_username
: Typicallypostgres
your_password
: Your database passwordyour_port
: Commonly5432
your_database
: Default ispostgres
(you can also use db_data)For more details on configuring the engine, see the SQLAlchemy documentation.
20.3 Writing Data to PostgreSQL
With the connection set up, let’s transfer data from a pandas DataFrame to PostgreSQL:
- Load data into DataFrame:
import pandas as pd
# Load data from a CSV file
= pd.read_csv("https://raw.githubusercontent.com/kirenz/datasets/master/gapminder.csv") df
- Write DataFrame to PostgreSQL: Using the
to_sql
method, write the data to a new table namedgapminder
.
'gapminder', engine, if_exists='replace') df.to_sql(
This should output: 704.
- Verify the table in the database: Query the database to confirm that the table exists.
with engine.connect() as connection:
= connection.execute(sa.text("SELECT table_name FROM information_schema.tables WHERE table_schema='public'"))
tables for table in tables:
print(table[0])
This should output: gapminder
- Optional: Remove the table: If you need to delete the table from the database:
= sa.text("DROP TABLE IF EXISTS gapminder")
sql
with engine.connect() as connection:
connection.execute(sql)
Next, take a look at the table using PgAdmin4.