Data Exploration in R
2020-12-16
Welcome
This book provides an introduction to data exploration in R. To use the code in this book, activate the following packages:
library(tidyverse)
library(gt)
To illustrate the different data exploration methods, we use the dataset wage from James et al. (2000), which contains wage and other data for a group of 3000 male workers in the Mid-Atlantic region.
library(tidyverse)
# Read the wage data from GitHub into a data frame
wage_df <- read_csv("https://raw.githubusercontent.com/kirenz/datasets/master/wage.csv")
The data frame includes 3000 observations on the following 12 variables (an ID column plus 11 variables describing each worker):
X1: An ID variable
year: Year that wage information was recorded
age: Age of worker
maritl: A factor with levels 1. Never Married, 2. Married, 3. Widowed, 4. Divorced, and 5. Separated, indicating marital status
race: A factor with levels 1. White, 2. Black, 3. Asian, and 4. Other, indicating race
education: A factor with levels 1. < HS Grad, 2. HS Grad, 3. Some College, 4. College Grad, and 5. Advanced Degree, indicating education level
region: Region of the country (mid-atlantic only)
jobclass: A factor with levels 1. Industrial and 2. Information, indicating type of job
health: A factor with levels 1. <=Good and 2. >=Very Good, indicating health level of worker
health_ins: A factor with levels 1. Yes and 2. No, indicating whether worker has health insurance
logwage: Log of worker's wage
wage: Worker's raw wage
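Note that read_csv() imports text columns as character vectors, so the categorical variables described above are not factors right after import. The following is a minimal sketch of one way to convert them. It uses a small hypothetical stand-in called wage_sample, with invented values, rather than the full wage_df. The same mutate() call should work on wage_df if its text columns are the categorical ones.

```r
library(tidyverse)

# Hypothetical stand-in for a few rows of wage_df (values invented for illustration)
wage_sample <- tibble(
  X1 = c(1, 2, 3, 4),
  age = c(25, 40, 52, 33),
  education = c("1. < HS Grad", "4. College Grad", "2. HS Grad", "4. College Grad"),
  jobclass = c("1. Industrial", "2. Information", "1. Industrial", "2. Information"),
  wage = c(75.0, 130.5, 98.2, 120.0)
)

# Convert every character column to a factor; numeric columns are left untouched
wage_sample <- wage_sample %>%
  mutate(across(where(is.character), as.factor))

# Inspect column types and the resulting factor levels
glimpse(wage_sample)
levels(wage_sample$education)
```

Because the level labels start with a number, the default alphabetical ordering of the factor levels matches the intended order of the categories.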
Note that this book mainly covers the tidyverse, a collection of R packages designed for data science. An R package is simply a bundle of functions, documentation, and data sets. The tidyverse comprises about 25 packages that share common APIs, an underlying design philosophy, a grammar, and data structures.
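To give a sense of what this shared grammar looks like in practice, here is a short sketch that chains dplyr verbs with the pipe operator. The tibble wage_sample and its values are hypothetical and only illustrate the pattern. They are not taken from the wage data.

```r
library(tidyverse)

# Hypothetical example data (values invented for illustration)
wage_sample <- tibble(
  jobclass = c("1. Industrial", "2. Information", "1. Industrial", "2. Information"),
  wage = c(80, 120, 100, 140)
)

# Each verb takes a data frame and returns a data frame,
# so the steps can be chained with the pipe operator %>%
wage_summary <- wage_sample %>%
  group_by(jobclass) %>%
  summarise(
    n = n(),                 # number of workers per job class
    mean_wage = mean(wage),  # average wage per job class
    .groups = "drop"
  )

wage_summary
```

Every step takes and returns the same data structure, a tibble, which is what allows functions from different tidyverse packages to be combined in a single pipeline.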
This online book is licensed under the Creative Commons Attribution-NonCommercial 2.0 Generic (CC BY-NC 2.0) License.