class: center, middle, title-slide # Introduction to SQL ## Joins ### Prof. Dr. Jan Kirenz ### HdM Stuttgart --- layout: true <div class="my-footer"><span> Introduction to SQL | Prof. Dr. Jan Kirenz </span></div> ??? --- # PostgreSQL Setup ``` pw = "your_password" ``` ```r library(DBI) library(RPostgres) con <- dbConnect(RPostgres::Postgres(), dbname = "postgres", host = "localhost", port = 5432, user = "postgres", password = pw) ``` --- class: inverse, center, middle ### The examples in this presentation are based on the excellent book "A Beginner's Guide to Storytelling with Data" from Anthony DeBarros (2018). --- # JOIN example: Departments & Employees ## CREATE the departments table: ```sql CREATE TABLE departments ( dept_id bigserial, dept varchar(100), city varchar(100), CONSTRAINT dept_key PRIMARY KEY (dept_id), CONSTRAINT dept_city_unique UNIQUE (dept, city) ); ``` -- * `dept_id` column is the table's primary key * Table includes a **UNIQUE** constraint * Guarantees that values in a column are unique. * It requires that each row have a unique pair of values for *dept* and *city* * We add these constraints to avoid duplicate data. --- class: inverse, center, middle ## PRIMARY KEY ### A primary key is a column whose values uniquely identify each row in a table. ### This column has certain contstraints: it must have a unique value for each row and it can't have missing values. --- ## CREATE employees table ```sql CREATE TABLE employees ( emp_id bigserial, first_name varchar(100), last_name varchar(100), salary integer, dept_id integer REFERENCES departments (dept_id), CONSTRAINT emp_key PRIMARY KEY (emp_id), CONSTRAINT emp_dept_unique UNIQUE (emp_id, dept_id) ); ``` --- ## CREATE employees table - `emp_id` is the table's **primary key**: it uniquely identifies each row in the employees table - `dept_id` (we added it as a constraint when creating the table) is called a **foreign key** - It requires a value entered in a column to already exist in the primary key of the table it **references**. - Values in `dept_id` in the employees table must exist in `dept_id` in the departments table - UNIQUE: each row must have a **unique** pair of emp_id and dept_id --- class: inverse, center, middle ## FOREIGN KEY ### Unlike a primary key, a **foreign key** column can be empty or it can contain duplicate values --- ## INSERT values ```sql INSERT INTO departments (dept, city) VALUES ('Tax', 'Atlanta'), ('IT', 'Boston'); ``` -- ```sql INSERT INTO employees (first_name, last_name, salary, dept_id) VALUES ('Nancy', 'Jones', 62500, 1), ('Lee', 'Smith', 59300, 1), ('Soo', 'Nguyen', 83000, 2), ('Janet', 'King', 95000, 2); ``` --- ## SELECT values ```sql SELECT * FROM departments; ``` -- ```{=html} <div id="htmlwidget-bfd044120f6fc72aa1b3" style="width:100%;height:auto;" class="datatables html-widget"></div> <script type="application/json" data-for="htmlwidget-bfd044120f6fc72aa1b3">{"x":{"filter":"none","fillContainer":false,"data":[["1","2"],[1,2],["Tax","IT"],["Atlanta","Boston"]],"container":"<table class=\"display\">\n <thead>\n <tr>\n <th> <\/th>\n <th>dept_id<\/th>\n <th>dept<\/th>\n <th>city<\/th>\n <\/tr>\n <\/thead>\n<\/table>","options":{"pageLength":4,"columnDefs":[{"className":"dt-right","targets":1},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false,"lengthMenu":[4,10,25,50,100]}},"evals":[],"jsHooks":[]}</script> ``` --- ## SELECT values ```sql SELECT * FROM employees; ``` -- ```{=html} <div id="htmlwidget-27abbbd53045262ed4db" style="width:100%;height:auto;" class="datatables html-widget"></div> <script type="application/json" data-for="htmlwidget-27abbbd53045262ed4db">{"x":{"filter":"none","fillContainer":false,"data":[["1","2"],[1,2],["Tax","IT"],["Atlanta","Boston"]],"container":"<table class=\"display\">\n <thead>\n <tr>\n <th> <\/th>\n <th>dept_id<\/th>\n <th>dept<\/th>\n <th>city<\/th>\n <\/tr>\n <\/thead>\n<\/table>","options":{"pageLength":4,"columnDefs":[{"className":"dt-right","targets":1},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false,"lengthMenu":[4,10,25,50,100]}},"evals":[],"jsHooks":[]}</script> ``` --- ## JOIN the tables ```sql SELECT * FROM employees JOIN departments ON employees.dept_id = departments.dept_id; ``` -- ```{=html} <div id="htmlwidget-2f72ceffe6bb27a7bb32" style="width:100%;height:auto;" class="datatables html-widget"></div> <script type="application/json" data-for="htmlwidget-2f72ceffe6bb27a7bb32">{"x":{"filter":"none","fillContainer":false,"data":[["1","2","3","4"],[1,2,3,4],["Nancy","Lee","Soo","Janet"],["Jones","Smith","Nguyen","King"],[62500,59300,83000,95000],[1,1,2,2],[1,1,2,2],["Tax","Tax","IT","IT"],["Atlanta","Atlanta","Boston","Boston"]],"container":"<table class=\"display\">\n <thead>\n <tr>\n <th> <\/th>\n <th>emp_id<\/th>\n <th>first_name<\/th>\n <th>last_name<\/th>\n <th>salary<\/th>\n <th>dept_id<\/th>\n <th>dept_id..6<\/th>\n <th>dept<\/th>\n <th>city<\/th>\n <\/tr>\n <\/thead>\n<\/table>","options":{"pageLength":4,"columnDefs":[{"className":"dt-right","targets":[1,4,5,6]},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false,"lengthMenu":[4,10,25,50,100]}},"evals":[],"jsHooks":[]}</script> ``` --- # JOIN Types * Creating two tables to explore JOIN types ```sql CREATE TABLE schools_left ( id integer CONSTRAINT left_id_key PRIMARY KEY, left_school varchar(30) ); ``` -- ```sql CREATE TABLE schools_right ( id integer CONSTRAINT right_id_key PRIMARY KEY, right_school varchar(30) ); ``` --- ### Insert values ```sql INSERT INTO schools_left (id, left_school) VALUES (1, 'Oak Street School'), (2, 'Roosevelt High School'), (5, 'Washington Middle School'), (6, 'Jefferson High School'); ``` -- ```sql INSERT INTO schools_right (id, right_school) VALUES (1, 'Oak Street School'), (2, 'Roosevelt High School'), (3, 'Morrison Elementary'), (4, 'Chase Magnet Academy'), (6, 'Jefferson High School'); ``` --- # JOIN - we use JOIN or INNER JOIN, when we want to return rows that have a match in the columns we used for the join -- ```sql SELECT * FROM schools_left JOIN schools_right ON schools_left.id = schools_right.id; ``` -- <table> <thead> <tr> <th style="text-align:right;"> id </th> <th style="text-align:left;"> left_school </th> <th style="text-align:right;"> id..3 </th> <th style="text-align:left;"> right_school </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Oak Street School </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Oak Street School </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Roosevelt High School </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Roosevelt High School </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Jefferson High School </td> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Jefferson High School </td> </tr> </tbody> </table> --- # INNER JOIN ```sql SELECT * FROM schools_left INNER JOIN schools_right ON schools_left.id = schools_right.id; ``` -- <table> <thead> <tr> <th style="text-align:right;"> id </th> <th style="text-align:left;"> left_school </th> <th style="text-align:right;"> id..3 </th> <th style="text-align:left;"> right_school </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Oak Street School </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Oak Street School </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Roosevelt High School </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Roosevelt High School </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Jefferson High School </td> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Jefferson High School </td> </tr> </tbody> </table> --- # JOIN or INNER JOIN - Here the join includes all columns in both tables (*). - Then we specify the two tables to join around the JOIN keyword. - At last we specify which columns we're joining on, here the id columns of both tables. - Three school IDs match in both tables, JOIN or INNER JOIN returns only the three rows of those IDs that match. - Values (schools) that exist only in one of the two tables don't appear in the result. - Notice that the columns from the left table display on the left of the result table. - Use JOIN or INNER JOIN when you're working with well-structured, well-maintained data sets and only need to find rows that exist in all the tables you're joining. --- # INNER JOIN with USING ```sql SELECT * FROM schools_left INNER JOIN schools_right USING (id); ``` -- <table> <thead> <tr> <th style="text-align:right;"> id </th> <th style="text-align:left;"> left_school </th> <th style="text-align:left;"> right_school </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Oak Street School </td> <td style="text-align:left;"> Oak Street School </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Roosevelt High School </td> <td style="text-align:left;"> Roosevelt High School </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Jefferson High School </td> <td style="text-align:left;"> Jefferson High School </td> </tr> </tbody> </table> --- # LEFT JOIN - LEFT JOIN returns all rows from the left table and displays blank rows from the right table if no matching values are found in the joined columns. -- ```sql SELECT * FROM schools_left LEFT JOIN schools_right ON schools_left.id = schools_right.id; ``` -- <table> <thead> <tr> <th style="text-align:right;"> id </th> <th style="text-align:left;"> left_school </th> <th style="text-align:right;"> id..3 </th> <th style="text-align:left;"> right_school </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Oak Street School </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Oak Street School </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Roosevelt High School </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Roosevelt High School </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> Washington Middle School </td> <td style="text-align:right;"> NA </td> <td style="text-align:left;"> NA </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Jefferson High School </td> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Jefferson High School </td> </tr> </tbody> </table> --- class: inverse, center, middle ## RIGHT JOIN & LEFT JOIN ### Use either of these join types when you want your query results to contain **all** the rows from one of the tables... ### ... or when you want to look for **missing values** in one of the tables. --- # RIGHT JOIN - RIGHT JOIN returns all rows from the right table and displays blank rows from the left table if no matching values are found in the joined columns -- ```sql SELECT * FROM schools_left RIGHT JOIN schools_right ON schools_left.id = schools_right.id; ``` -- <table> <thead> <tr> <th style="text-align:right;"> id </th> <th style="text-align:left;"> left_school </th> <th style="text-align:right;"> id..3 </th> <th style="text-align:left;"> right_school </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Oak Street School </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Oak Street School </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Roosevelt High School </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Roosevelt High School </td> </tr> <tr> <td style="text-align:right;"> NA </td> <td style="text-align:left;"> NA </td> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> Morrison Elementary </td> </tr> <tr> <td style="text-align:right;"> NA </td> <td style="text-align:left;"> NA </td> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> Chase Magnet Academy </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Jefferson High School </td> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Jefferson High School </td> </tr> </tbody> </table> --- # Aliasing (AS) * Hint: Instead of writing the full table name, you can use table aliasing as a shortcut: ```sql SELECT e.salary, d.city FROM employees AS e INNER JOIN departments AS d ON e.dept_id = d.dept_id; ``` -- <table> <thead> <tr> <th style="text-align:right;"> salary </th> <th style="text-align:left;"> city </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 62500 </td> <td style="text-align:left;"> Atlanta </td> </tr> <tr> <td style="text-align:right;"> 59300 </td> <td style="text-align:left;"> Atlanta </td> </tr> <tr> <td style="text-align:right;"> 83000 </td> <td style="text-align:left;"> Boston </td> </tr> <tr> <td style="text-align:right;"> 95000 </td> <td style="text-align:left;"> Boston </td> </tr> </tbody> </table> --- # FULL OUTER JOIN - Shows **all** rows from both tables in a join, regardless of whether any match: ```sql SELECT * FROM schools_left FULL OUTER JOIN schools_right ON schools_left.id = schools_right.id; ``` -- <table> <thead> <tr> <th style="text-align:right;"> id </th> <th style="text-align:left;"> left_school </th> <th style="text-align:right;"> id..3 </th> <th style="text-align:left;"> right_school </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Oak Street School </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Oak Street School </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Roosevelt High School </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Roosevelt High School </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> Washington Middle School </td> <td style="text-align:right;"> NA </td> <td style="text-align:left;"> NA </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Jefferson High School </td> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Jefferson High School </td> </tr> <tr> <td style="text-align:right;"> NA </td> <td style="text-align:left;"> NA </td> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> Chase Magnet Academy </td> </tr> <tr> <td style="text-align:right;"> NA </td> <td style="text-align:left;"> NA </td> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> Morrison Elementary </td> </tr> </tbody> </table> --- ## FULL OUTER JOIN ```sql SELECT * FROM schools_right FULL OUTER JOIN schools_left ON schools_right.id = schools_left.id; ``` -- <table> <thead> <tr> <th style="text-align:right;"> id </th> <th style="text-align:left;"> right_school </th> <th style="text-align:right;"> id..3 </th> <th style="text-align:left;"> left_school </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Oak Street School </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Oak Street School </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Roosevelt High School </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Roosevelt High School </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> Morrison Elementary </td> <td style="text-align:right;"> NA </td> <td style="text-align:left;"> NA </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> Chase Magnet Academy </td> <td style="text-align:right;"> NA </td> <td style="text-align:left;"> NA </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Jefferson High School </td> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Jefferson High School </td> </tr> <tr> <td style="text-align:right;"> NA </td> <td style="text-align:left;"> NA </td> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> Washington Middle School </td> </tr> </tbody> </table> --- # FULL OUTER JOIN with USING ```sql SELECT * FROM schools_left FULL OUTER JOIN schools_right USING (id); ``` -- <table> <thead> <tr> <th style="text-align:right;"> id </th> <th style="text-align:left;"> left_school </th> <th style="text-align:left;"> right_school </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Oak Street School </td> <td style="text-align:left;"> Oak Street School </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Roosevelt High School </td> <td style="text-align:left;"> Roosevelt High School </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> Washington Middle School </td> <td style="text-align:left;"> NA </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Jefferson High School </td> <td style="text-align:left;"> Jefferson High School </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> NA </td> <td style="text-align:left;"> Chase Magnet Academy </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> NA </td> <td style="text-align:left;"> Morrison Elementary </td> </tr> </tbody> </table> --- # CROSS JOIN ```sql SELECT * FROM schools_left CROSS JOIN schools_right; ``` * Lines up each row in the left table with each row in the right table to present all possible combinations of row. * Because the join doesn't need to find matches between key fields, there is no need to provide the clause using ON or USING. --- class: center, middle <table> <thead> <tr> <th style="text-align:right;"> id </th> <th style="text-align:left;"> left_school </th> <th style="text-align:right;"> id..3 </th> <th style="text-align:left;"> right_school </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Oak Street School </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Oak Street School </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Oak Street School </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Roosevelt High School </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Oak Street School </td> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> Morrison Elementary </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Oak Street School </td> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> Chase Magnet Academy </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Oak Street School </td> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Jefferson High School </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Roosevelt High School </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Oak Street School </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Roosevelt High School </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Roosevelt High School </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Roosevelt High School </td> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> Morrison Elementary </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Roosevelt High School </td> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> Chase Magnet Academy </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Roosevelt High School </td> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Jefferson High School </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> Washington Middle School </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Oak Street School </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> Washington Middle School </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Roosevelt High School </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> Washington Middle School </td> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> Morrison Elementary </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> Washington Middle School </td> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> Chase Magnet Academy </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> Washington Middle School </td> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Jefferson High School </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Jefferson High School </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Oak Street School </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Jefferson High School </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Roosevelt High School </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Jefferson High School </td> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> Morrison Elementary </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Jefferson High School </td> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> Chase Magnet Academy </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Jefferson High School </td> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Jefferson High School </td> </tr> </tbody> </table> --- # Using NULL to find missing values ```sql SELECT * FROM schools_left LEFT JOIN schools_right ON schools_left.id = schools_right.id WHERE schools_right.id IS NULL; ``` -- <table> <thead> <tr> <th style="text-align:right;"> id </th> <th style="text-align:left;"> left_school </th> <th style="text-align:right;"> id..3 </th> <th style="text-align:left;"> right_school </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> Washington Middle School </td> <td style="text-align:right;"> NA </td> <td style="text-align:left;"> NA </td> </tr> </tbody> </table> - The result shows only the one row from the left table that didn't have a match on the right side. --- class: center, middle ## Thank you! [Prof. Dr. Jan Kirenz](https://www.kirenz.com) HdM Stuttgart Nobelstraße 10 70569 Stuttgart ![](https://upload.wikimedia.org/wikipedia/commons/5/5e/HdM_Logo.svg)