Introduction to SQL

# Introduction to SQL
## Grouping and Summarizing
### Prof. Dr. Jan Kirenz
### HdM Stuttgart

---

layout: true
  
<div class="my-footer"><span> Introduction to SQL | Prof. Dr. Jan Kirenz </span></div>

---

# PostgreSQL Setup

```
pw = "your_password"
```

```r
library(DBI)
library(RPostgres)

con <- dbConnect(RPostgres::Postgres(),
                 dbname = "postgres",
                  host = "localhost", 
                  port = 5432,
                  user = "postgres", 
                  password = pw)
```

---
class: center, middle

### The examples in this presentation are based on the excellent book "A Beginner's Guide to Storytelling with Data" from Anthony DeBarros (2018).

---

# CREATE TABLES and INDEX

* Creating and filling the 2014 Public Libraries Survey table.

* We only take a look at the first few lines:

```sql
CREATE TABLE pls_fy2014_pupld14a (
    stabr varchar(2) NOT NULL,
    fscskey varchar(6) CONSTRAINT fscskey2014_key PRIMARY KEY,
    libid varchar(20) NOT NULL,
    libname varchar(100) NOT NULL,
    obereg varchar(2) NOT NULL,
    rstatus integer NOT NULL,
    statstru varchar(2) NOT NULL,
    statname varchar(2) NOT NULL,
    stataddr varchar(2) NOT NULL,
    longitud numeric(10,7) NOT NULL,
    latitude numeric(10,7) NOT NULL,
    fipsst varchar(2) NOT NULL,
    fipsco varchar(3) NOT NULL,
    address varchar(35) NOT NULL,
    city varchar(20) NOT NULL,
    zip varchar(5) NOT NULL,
    zip4 varchar(4) NOT NULL,
    cnty varchar(20) NOT NULL,
    phone varchar(10) NOT NULL,
    c_relatn varchar(2) NOT NULL,
    c_legbas varchar(2) NOT NULL,
    c_admin varchar(2) NOT NULL,
    geocode varchar(3) NOT NULL,
    lsabound varchar(1) NOT NULL,
    startdat varchar(10),
    enddate varchar(10),
    popu_lsa integer NOT NULL,
    centlib integer NOT NULL,
    branlib integer NOT NULL,
    bkmob integer NOT NULL,
    master numeric(8,2) NOT NULL,
    libraria numeric(8,2) NOT NULL,
    totstaff numeric(8,2) NOT NULL,
    locgvt integer NOT NULL,
    stgvt integer NOT NULL,
    fedgvt integer NOT NULL,
    totincm integer NOT NULL,
    salaries integer,
    benefit integer,
    staffexp integer,
    prmatexp integer NOT NULL,
    elmatexp integer NOT NULL,
    totexpco integer NOT NULL,
    totopexp integer NOT NULL,
    lcap_rev integer NOT NULL,
    scap_rev integer NOT NULL,
    fcap_rev integer NOT NULL,
    cap_rev integer NOT NULL,
    capital integer NOT NULL,
    bkvol integer NOT NULL,
    ebook integer NOT NULL,
    audio_ph integer NOT NULL,
    audio_dl float NOT NULL,
    video_ph integer NOT NULL,
    video_dl float NOT NULL,
    databases integer NOT NULL,
    subscrip integer NOT NULL,
    hrs_open integer NOT NULL,
    visits integer NOT NULL,
    referenc integer NOT NULL,
    regbor integer NOT NULL,
    totcir integer NOT NULL,
    kidcircl integer NOT NULL,
    elmatcir integer NOT NULL,
    loanto integer NOT NULL,
    loanfm integer NOT NULL,
    totpro integer NOT NULL,
    totatten integer NOT NULL,
    gpterms integer NOT NULL,
    pitusr integer NOT NULL,
    wifisess integer NOT NULL,
    yr_sub integer NOT NULL
);
```

---

## CREATE INDEX

```sql
CREATE INDEX libname2014_idx ON pls_fy2014_pupld14a (libname);
```

```sql
CREATE INDEX stabr2014_idx ON pls_fy2014_pupld14a (stabr);
```

```sql
CREATE INDEX city2014_idx ON pls_fy2014_pupld14a (city);
```

```sql
CREATE INDEX visits2014_idx ON pls_fy2014_pupld14a (visits);
```

---

## COPY FROM

* Data: [pls_fy2014_pupld14a.csv](https://github.com/kirenz/datasets/blob/master/pls_fy2014_pupld14a.csv)

```sql
COPY pls_fy2014_pupld14a
FROM '/tmp/pls_fy2014_pupld14a.csv'
WITH (FORMAT CSV, HEADER);
```

---

## CREATE TABLE

* Creating and filling the 2009 Public Libraries Survey table.

* We only take a look at the first few lines:

```sql
CREATE TABLE pls_fy2009_pupld09a (
    stabr varchar(2) NOT NULL,
    fscskey varchar(6) CONSTRAINT fscskey2009_key PRIMARY KEY,
    libid varchar(20) NOT NULL,
    libname varchar(100) NOT NULL,
    address varchar(35) NOT NULL,
    city varchar(20) NOT NULL,
    zip varchar(5) NOT NULL,
    zip4 varchar(4) NOT NULL,
    cnty varchar(20) NOT NULL,
    phone varchar(10) NOT NULL,
    c_relatn varchar(2) NOT NULL,
    c_legbas varchar(2) NOT NULL,
    c_admin varchar(2) NOT NULL,
    geocode varchar(3) NOT NULL,
    lsabound varchar(1) NOT NULL,
    startdat varchar(10),
    enddate varchar(10),
    popu_lsa integer NOT NULL,
    centlib integer NOT NULL,
    branlib integer NOT NULL,
    bkmob integer NOT NULL,
    master numeric(8,2) NOT NULL,
    libraria numeric(8,2) NOT NULL,
    totstaff numeric(8,2) NOT NULL,
    locgvt integer NOT NULL,
    stgvt integer NOT NULL,
    fedgvt integer NOT NULL,
    totincm integer NOT NULL,
    salaries integer,
    benefit integer,
    staffexp integer,
    prmatexp integer NOT NULL,
    elmatexp integer NOT NULL,
    totexpco integer NOT NULL,
    totopexp integer NOT NULL,
    lcap_rev integer NOT NULL,
    scap_rev integer NOT NULL,
    fcap_rev integer NOT NULL,
    cap_rev integer NOT NULL,
    capital integer NOT NULL,
    bkvol integer NOT NULL,
    ebook integer NOT NULL,
    audio integer NOT NULL,
    video integer NOT NULL,
    databases integer NOT NULL,
    subscrip integer NOT NULL,
    hrs_open integer NOT NULL,
    visits integer NOT NULL,
    referenc integer NOT NULL,
    regbor integer NOT NULL,
    totcir integer NOT NULL,
    kidcircl integer NOT NULL,
    loanto integer NOT NULL,
    loanfm integer NOT NULL,
    totpro integer NOT NULL,
    totatten integer NOT NULL,
    gpterms integer NOT NULL,
    pitusr integer NOT NULL,
    yr_sub integer NOT NULL,
    obereg varchar(2) NOT NULL,
    rstatus integer NOT NULL,
    statstru varchar(2) NOT NULL,
    statname varchar(2) NOT NULL,
    stataddr varchar(2) NOT NULL,
    longitud numeric(10,7) NOT NULL,
    latitude numeric(10,7) NOT NULL,
    fipsst varchar(2) NOT NULL,
    fipsco varchar(3) NOT NULL
);
```

---

## CREATE INDEX

```sql
CREATE INDEX libname2009_idx ON pls_fy2009_pupld09a (libname);
```

```sql
CREATE INDEX stabr2009_idx ON pls_fy2009_pupld09a (stabr);
```

```sql
CREATE INDEX city2009_idx ON pls_fy2009_pupld09a (city);
```

```sql
CREATE INDEX visits2009_idx ON pls_fy2009_pupld09a (visits);
```

---

## COPY

* Data: [pls_fy2009_pupld09a.csv](https://github.com/kirenz/datasets/blob/master/pls_fy2009_pupld09a.csv)

```sql
COPY pls_fy2009_pupld09a
FROM '/tmp/pls_fy2009_pupld09a.csv'
WITH (FORMAT CSV, HEADER);
```

---
class: center, middle

# Exploring Data with Aggregate Functions

### Aggregate functions combine values from multiple rows and return a single result based on an operation on those values.

---

# Counting Rows and Values Using COUNT

- The `COUNT` aggregate function makes it easy to check the number of rows and peform other counting tasks.

- If we supply an **asterisk** as an input, such as `COUNT(*)`, the asterisk acts as a wildcard, so the function returns the number of table rows regardless of whether they include NULL (missing) values.

- Check number of rows for `pls_fy2014_pupld14a`:

```sql
SELECT COUNT(*)
FROM pls_fy2014_pupld14a;
```

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> count </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 9305 </td>
  </tr>
</tbody>
</table>

---

## Counting Rows and Values Using COUNT

* Check number of rows for `pls_fy2009_pupld09a`:

```sql
SELECT COUNT(*)
FROM pls_fy2009_pupld09a;
```

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> count </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 9299 </td>
  </tr>
</tbody>
</table>

---

## Counting Rows and Values Using COUNT

- Counting the number of `salaries` column 2014:

```sql
SELECT COUNT(salaries)
FROM pls_fy2014_pupld14a;
```

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> count </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 5983 </td>
  </tr>
</tbody>
</table>

- This number is far lower than the number of rows that exist in the table

---

# Counting Distinct Values in a Table

- When added after `SELECT`, `DISTINCT` returns a list of unique values

- When added to the `COUNT` function, `DISTINCT` causes the function to return a count of distinct values from a column

- Query counts all values in the 2014 table's `libname` column:

```sql
SELECT COUNT(libname)
FROM pls_fy2014_pupld14a;
```

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> count </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 9305 </td>
  </tr>
</tbody>
</table>

---

## Counting Distinct Values in a Table

- Counts all values in the 2014 table's libname column but includes `DISTINCT` in front of the column name:

```sql
SELECT COUNT(DISTINCT libname)
FROM pls_fy2014_pupld14a;
```

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> count </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 8515 </td>
  </tr>
</tbody>
</table>

---

# Finding Maximum and Minimum Values

### MAX

- Use a `SELECT` statement followed by the function `MAX` with the name of a column `visits` supplied:

```sql
SELECT MAX(visits)
FROM pls_fy2014_pupld14a;
```

---
## Finding Maximum and Minimum Values

### MIN

- Use a `SELECT` statement followed by the function `MIN` with the name of a column `visits` supplied:

```sql
SELECT MIN(visits)
FROM pls_fy2014_pupld14a;
```

* Note: `-3` is used to indicate "not applicable" and is used when a library agency has closed either temporarily or permanently.

---

## Aggregating Data Using GROUP BY

- When you use the `GROUP BY` clause with aggregate functions, you can group results according to the values in one or more columns.

- `GROUP BY` statement follows the `FROM` clause and includes the column name to group.

- The `stabr` (state abbreviation) grouped results:

```sql
SELECT stabr
FROM pls_fy2014_pupld14a
GROUP BY stabr;
```

---

class:center, middle

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> stabr </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> NV </td>
  </tr>
  <tr>
   <td style="text-align:left;"> OH </td>
  </tr>
  <tr>
   <td style="text-align:left;"> NY </td>
  </tr>
  <tr>
   <td style="text-align:left;"> WV </td>
  </tr>
  <tr>
   <td style="text-align:left;"> AR </td>
  </tr>
  <tr>
   <td style="text-align:left;"> CT </td>
  </tr>
</tbody>
</table>

---

# ORDER BY

- The `stabr` grouped results are in alphabetical order:

```sql
SELECT stabr
FROM pls_fy2014_pupld14a
GROUP BY stabr
ORDER BY stabr;
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> stabr </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> AK </td>
  </tr>
  <tr>
   <td style="text-align:left;"> AL </td>
  </tr>
  <tr>
   <td style="text-align:left;"> AR </td>
  </tr>
  <tr>
   <td style="text-align:left;"> AS </td>
  </tr>
  <tr>
   <td style="text-align:left;"> AZ </td>
  </tr>
  <tr>
   <td style="text-align:left;"> CA </td>
  </tr>
</tbody>
</table>

---

## ORDER BY

- Results get sorted by `city` and then by `stabr`:

```sql
SELECT city, stabr
FROM pls_fy2014_pupld14a
GROUP BY city, stabr
ORDER BY city, stabr;
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> city </th>
   <th style="text-align:left;"> stabr </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> ABBEVILLE </td>
   <td style="text-align:left;"> AL </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ABBEVILLE </td>
   <td style="text-align:left;"> LA </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ABBEVILLE </td>
   <td style="text-align:left;"> SC </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ABBOTSFORD </td>
   <td style="text-align:left;"> WI </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ABERDEEN </td>
   <td style="text-align:left;"> ID </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ABERDEEN </td>
   <td style="text-align:left;"> SD </td>
  </tr>
</tbody>
</table>

---

# Combining GROUP BY with COUNT

- We can get a count of agencies by state and sort them to see which states have the most.

- To sort the results and have the state with the largest number of agencies at the top, we can ORDER BY the COUNT function in descending order using DESC:

```sql
SELECT stabr, COUNT(*)
FROM pls_fy2014_pupld14a
GROUP BY stabr
ORDER BY COUNT(*) DESC;
```

---

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> stabr </th>
   <th style="text-align:right;"> count </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> NY </td>
   <td style="text-align:right;"> 756 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> IL </td>
   <td style="text-align:right;"> 625 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> TX </td>
   <td style="text-align:right;"> 556 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> IA </td>
   <td style="text-align:right;"> 543 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> PA </td>
   <td style="text-align:right;"> 455 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> MI </td>
   <td style="text-align:right;"> 389 </td>
  </tr>
</tbody>
</table>

---

# GROUP BY on Multiple Columns with COUNT

- The `stataddr` column contains a code indicating whether the agency's address changed in the last year.

- The values in stataddr are 00 (= no change from last year), 07 (=moved to a new location), 15 (=minor address change).

---

## GROUP BY on Multiple Columns with COUNT

- This code is counting the number of agencies in each state that moved, had a minor address change, or had no change using the `GROUP BY` with stabr and stataddr and adding `COUNT`:

```sql
SELECT stabr, stataddr, COUNT(*)
FROM pls_fy2014_pupld14a
GROUP BY stabr, stataddr
ORDER BY stabr, COUNT(*) DESC;
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> stabr </th>
   <th style="text-align:left;"> stataddr </th>
   <th style="text-align:right;"> count </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> AK </td>
   <td style="text-align:left;"> 00 </td>
   <td style="text-align:right;"> 70 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> AK </td>
   <td style="text-align:left;"> 15 </td>
   <td style="text-align:right;"> 10 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> AK </td>
   <td style="text-align:left;"> 07 </td>
   <td style="text-align:right;"> 5 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> AL </td>
   <td style="text-align:left;"> 00 </td>
   <td style="text-align:right;"> 221 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> AL </td>
   <td style="text-align:left;"> 07 </td>
   <td style="text-align:right;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> AR </td>
   <td style="text-align:left;"> 00 </td>
   <td style="text-align:right;"> 58 </td>
  </tr>
</tbody>
</table>

---

## Using GROUP BY on Multiple Columns with COUNT

- The effect of grouping by *two columns* is that `COUNT` will show the number of unique combinations of `stabr` and `stataddr`.

- The first few rows of the results show that the code "00" is the most common value for each state.

- This makes sense because it's likely there are more agencies that haven't changed address than those that have.

---

# Use SUM to aggregate values

- Query total visits to libraries in 20014:

- Use `SUM` to view total visits to libraries in 2014 (AS visits_2014).

- Use a `WHERE` clause to specify that the result should include only those rows where visits are greater than or equal to "0".

```sql
SELECT SUM(visits) AS visits_2014
FROM pls_fy2014_pupld14a
WHERE visits >= 0;
```

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> visits_2014 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 1425930900 </td>
  </tr>
</tbody>
</table>

---

## Use SUM to aggregate values

* Query total visits to libraries in 2009:

* Use `SUM` to view total visits to libraries in 2009 (AS visits_2009).

* Use a `WHERE` clause to specify that the result should include only those rows where visits are greater than or equal to "0".

```sql
SELECT SUM(visits) AS visits_2009
FROM pls_fy2009_pupld09a
WHERE visits >= 0;
```

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> visits_2009 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 1591799201 </td>
  </tr>
</tbody>
</table>

---

# Use SUM and JOIN to aggregate values

* Using SUM to query total visits on __joined__ 2014 and 2009 library tables:

* Use ALIAS: "pls_fy2014_pupld14a AS pls14" and "pls_fy2009_pupld09a AS pls09"
 * Use a `WHERE` clause to specify that the result should include only those rows where visits are greater than or equal to 0.
 * `JOIN` the two tables `ON` the primary key `fscskey` 
 
--

```sql
SELECT SUM(pls14.visits) AS visits_2014,
        SUM(pls09.visits) AS visits_2009        
FROM pls_fy2014_pupld14a AS pls14 JOIN pls_fy2009_pupld09a AS pls09
ON pls14.fscskey = pls09.fscskey
WHERE pls14.visits >= 0 AND pls09.visits >= 0;
```

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> visits_2014 </th>
   <th style="text-align:right;"> visits_2009 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 1417299241 </td>
   <td style="text-align:right;"> 1585455205 </td>
  </tr>
</tbody>
</table>

---

## Use SUM and JOIN to aggregate values

- In our next `SELECT` statement, we use `SUM` to total visits columns from 2014 and 2009 tables.

- We declare `pls14` as the __alias__ for the 2014 table and `pls09` as the __alias__ for the 2009 table to avoid having to write the full table names throughout the query.

- We use a standard `JOIN` or `INNER JOIN`, so the query results will only include rows where the primary key values of both tables match (`fscskey`).

- In the WHERE clause, we specify that the result should include only those rows where visits are greater than or equal to 0.

---

## Use SUM to aggregate values

* Query:

```sql
SELECT (SUM(pls14.visits) +
        SUM(pls09.visits)) AS total_visits    
FROM pls_fy2014_pupld14a AS pls14 JOIN pls_fy2009_pupld09a AS pls09
ON pls14.fscskey = pls09.fscskey
WHERE pls14.visits >= 0 AND pls09.visits >= 0;
```

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> total_visits </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 3002754446 </td>
  </tr>
</tbody>
</table>

---

# Using GROUP BY to track percent changes

- Now that we know library visits for the United States dropped as a whole between 2009 and 2014, we might want to know, whether every part of the country saw a decrease?

- Using `GROUP BY` to track percent change in library visits by state.

---

```sql
SELECT pls14.stabr,
       SUM(pls14.visits) AS visits_2014,
       SUM(pls09.visits) AS visits_2009,
       ROUND( (CAST(SUM(pls14.visits) AS decimal(10,1)) - SUM(pls09.visits)) /
                    SUM(pls09.visits) * 100, 2 ) AS pct_change
FROM pls_fy2014_pupld14a AS pls14 JOIN pls_fy2009_pupld09a AS pls09
ON pls14.fscskey = pls09.fscskey
WHERE pls14.visits >= 0 AND pls09.visits >= 0
GROUP BY pls14.stabr
ORDER BY pct_change DESC;
```

```{=html}
<div id="htmlwidget-cc0ae150bc759e0338a9" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-cc0ae150bc759e0338a9">{"x":{"filter":"none","fillContainer":false,"data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53"],["GU","DC","LA","MT","AL","AR","KY","CO","SC","SD","MA","AK","ID","NH","WY","MO","MS","VT","ME","WA","NM","AZ","IA","UT","TX","PA","NV","NE","ND","NJ","TN","CT","WI","CA","OH","KS","NY","VA","WV","IL","DE","OR","MI","IN","FL","MN","HI","MD","RI","NC","PR","GA","OK"],[103593,4230790,17242110,4582604,17113602,10762521,19256394,32978245,18178677,3899554,42011647,3486955,8730670,7508751,3666825,29527698,9155488,3779545,6909414,41151390,7340034,27552597,18042685,17524026,72876601,44791865,10525170,8333692,2181930,45848755,19457120,21972583,32661683,162787836,82495138,13931862,106453546,37405643,5367596,72598213,3834672,21472271,49775454,35441059,73165352,24541150,4875365,27793912,5259143,33952977,193279,28891017,13678542],[60763,2944774,15591805,4386504,16933967,10660058,19113478,32782247,18105931,3890392,42237888,3525093,8847034,7675823,3756294,30330912,9456124,3919746,7190141,43451221,7800479,29381343,19466908,18931103,78838400,49015763,11574976,9179047,2409123,50701852,21544350,24454917,36467048,182181408,92402369,15673977,119810969,42300780,6081791,82438755,4357495,24511629,59037718,42199118,87730886,29534616,6019676,34474171,6612167,43111094,257032,40922598,21171452],[70.49,43.67,10.58,4.47,1.06,0.96,0.75,0.6,0.4,0.24,-0.54,-1.08,-1.32,-2.18,-2.38,-2.65,-3.18,-3.58,-3.9,-5.29,-5.9,-6.22,-7.32,-7.43,-7.56,-8.62,-9.07,-9.21,-9.43,-9.57,-9.69,-10.15,-10.44,-10.65,-10.72,-11.11,-11.15,-11.57,-11.74,-11.94,-12,-12.4,-15.69,-16.01,-16.6,-16.91,-19.01,-19.38,-20.46,-21.24,-24.8,-29.4,-35.39]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>stabr<\/th>\n      <th>visits_2014<\/th>\n      <th>visits_2009<\/th>\n      <th>pct_change<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"pageLength":4,"columnDefs":[{"className":"dt-right","targets":[2,3,4]},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false,"lengthMenu":[4,10,25,50,100]}},"evals":[],"jsHooks":[]}</script>
```

???

- The top of the results shows 10 states with an increase in visits from 2009 to 2014.
- The rest of the results show a decline.

---

# Using HAVING to filter the results of an aggregate query

- We can refine our analysis by examining a subset of states and territories that share similar characteristics.

- In a small state, one library closure could have a significant effect, whereas a single closure in a large state might be scarcely noticed in a statewide count.

- To look at states with similar volume in visits, we could sort the results by either of the visits column, but it is cleaner to get a smaller result set in our query.

- We are familiar with using `WHERE` for filtering, but aggregate functions, such as `SUM`, can't be used within a `WHERE` clause because `WHERE` operates at a row level, and aggregate functions work across rows.

- `HAVING` places conditions on groups created by aggregating.

---

## Using HAVING to filter the results of an aggregate query

* - We set our query to show only the very larges states (visits greater than 50 million)

---

## Using HAVING to filter the results of an aggregate query

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> stabr </th>
   <th style="text-align:right;"> visits_2014 </th>
   <th style="text-align:right;"> visits_2009 </th>
   <th style="text-align:right;"> pct_change </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> TX </td>
   <td style="text-align:right;"> 72876601 </td>
   <td style="text-align:right;"> 78838400 </td>
   <td style="text-align:right;"> -7.56 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> CA </td>
   <td style="text-align:right;"> 162787836 </td>
   <td style="text-align:right;"> 182181408 </td>
   <td style="text-align:right;"> -10.65 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> OH </td>
   <td style="text-align:right;"> 82495138 </td>
   <td style="text-align:right;"> 92402369 </td>
   <td style="text-align:right;"> -10.72 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> NY </td>
   <td style="text-align:right;"> 106453546 </td>
   <td style="text-align:right;"> 119810969 </td>
   <td style="text-align:right;"> -11.15 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> IL </td>
   <td style="text-align:right;"> 72598213 </td>
   <td style="text-align:right;"> 82438755 </td>
   <td style="text-align:right;"> -11.94 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> FL </td>
   <td style="text-align:right;"> 73165352 </td>
   <td style="text-align:right;"> 87730886 </td>
   <td style="text-align:right;"> -16.60 </td>
  </tr>
</tbody>
</table>

- `HAVING` reduces the number of rows in the output to just six

- Each of the six states has experienced a decline in vists, but the percentage variation isn't as wide as in the full set of states.

---

## Thank you!

[Prof. Dr. Jan Kirenz](https://www.kirenz.com)

HdM Stuttgart  
Nobelstraße 10  
70569 Stuttgart

![](https://upload.wikimedia.org/wikipedia/commons/5/5e/HdM_Logo.svg)