20  SQL & Python

This tutorial will guide you through the process of transferring data from a pandas DataFrame to a PostgreSQL database. It’s a valuable skill for handling data-driven projects in Python, allowing you to store and manage your data effectively.

Before you start, ensure you have PostgreSQL installed and running on your machine.

20.1 Python Environment

First, set up an isolated Python environment using Anaconda:

  1. Create a new environment (if you need more detailed explanations, review this chapter about the creation of environments):
conda create -n postgres python=3.11 pip
  1. Activate the environment:
conda activate postgres
  1. Install necessary packages: We will need psycopg2-binary for PostgreSQL connectivity, sqlalchemy for SQL operations, and pandas for data manipulation.
pip install psycopg2-binary sqlalchemy ipykernel jupyter pandas

20.2 Database Connection

To interact with your PostgreSQL database from Python, set up a connection using SQLAlchemy:

  1. Start PostgreSQL: Ensure your PostgreSQL instance is running.

  2. Import libraries and create an engine: The engine is your gateway to the database. Replace placeholders with your actual database credentials.

import sqlalchemy as sa

# Replace placeholders with your database credentials
engine = sa.create_engine('postgresql://your_username:your_password@localhost:your_port/your_database')

Here’s what you need to replace:

  • your_username: Typically postgres

  • your_password: Your database password

  • your_port: Commonly 5432

  • your_database: Default is postgres (you can also use db_data)

    For more details on configuring the engine, see the SQLAlchemy documentation.

20.3 Writing Data to PostgreSQL

With the connection set up, let’s transfer data from a pandas DataFrame to PostgreSQL:

  1. Load data into DataFrame:
import pandas as pd

# Load data from a CSV file
df = pd.read_csv("https://raw.githubusercontent.com/kirenz/datasets/master/gapminder.csv")
  1. Write DataFrame to PostgreSQL: Using the to_sql method, write the data to a new table named gapminder.
df.to_sql('gapminder', engine, if_exists='replace')

This should output: 704.

  1. Verify the table in the database: Query the database to confirm that the table exists.
with engine.connect() as connection:
      tables = connection.execute(sa.text("SELECT table_name FROM information_schema.tables WHERE table_schema='public'"))
      for table in tables:
           print(table[0])

This should output: gapminder

  1. Optional: Remove the table: If you need to delete the table from the database:
sql = sa.text("DROP TABLE IF EXISTS gapminder")

with engine.connect() as connection:
      connection.execute(sql)

Next, take a look at the table using PgAdmin4.