Introduction to R

Make your first steps in R.

Jan Kirenz https://www.kirenz.com (HdM Stuttgart, University of Applied Sciences)

Welcome

The R programming language is a powerful open-source tool for statistics, machine learning and data science. Learning to program in R is much like learning a foreign language: you need lots of practice to improve your skills. The purpose of this site is to provide you with helpful resources to get started learning R.

Note that this overview mainly covers the tidyverse, a collection of R packages designed for data science. An R package is simply a bundle of functions, documentation, and data sets. The roughly 25 packages in the tidyverse share common APIs and an underlying design philosophy, grammar, and data structures.


Getting Started

Before you can start analyzing data in R, you need to understand some key concepts (Ismay and Kim 2022):

R & RStudio

What are R and RStudio and how to install them?

RStudio Cloud

If you have trouble installing R or RStudio on your machine, you can also use RStudio Cloud at https://rstudio.cloud/.

RStudio Cloud is a lightweight, cloud-based solution that allows you to use R and RStudio online. Choose the option Get Started for Free and create an account.

Coding

How do I code in R?
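
As a first taste of what R code looks like, here is a minimal sketch using only base R. The names x and rescale are hypothetical and chosen purely for illustration.

```r
# Assign a numeric vector to a name with the assignment operator <-.
x <- c(2, 4, 6)

# Functions are called with parentheses; arithmetic is applied element-wise.
x_mean <- mean(x)
x_times_ten <- x * 10

# Define a small function of your own (hypothetical helper):
# it rescales a numeric vector to the range from 0 to 1.
rescale <- function(v) {
  (v - min(v)) / (max(v) - min(v))
}

x_scaled <- rescale(x)
print(x_scaled)
```

Typing the name of an object at the console and pressing Enter prints it, so in interactive use the explicit print() call is optional.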

Packages

What are R packages?
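
A package is usually installed once per machine and then loaded in every session in which you want to use it. The following sketch assumes a configured CRAN mirror and an internet connection for the install step, which is left commented out; dplyr is used only as an example package.

```r
# Install the tidyverse once per machine (uncomment to run; needs internet access).
# install.packages("tidyverse")

# Check whether a package is installed without attaching it.
has_dplyr <- requireNamespace("dplyr", quietly = TRUE)

if (has_dplyr) {
  # Attach the package so its functions are available by name in this session.
  library(dplyr)

  # A function can also be called with an explicit package prefix.
  print(dplyr::n_distinct(c("a", "b", "a")))
}
```

The package::function() form is handy when two loaded packages export a function with the same name.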

Inspect a dataset

Inspect a dataset
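
As a rough illustration of a first look at a data set, the sketch below uses mtcars, a data frame that ships with R, together with base R functions only. The name cars is an arbitrary choice for this example.

```r
# mtcars is a built-in data frame from the datasets package.
cars <- mtcars

print(dim(cars))          # number of rows and columns
print(names(cars))        # column names
print(head(cars, 5))      # first five rows
str(cars)                 # compact overview of every column and its type
print(summary(cars$mpg))  # minimum, quartiles, mean, and maximum of one column
```

In RStudio, View(cars) opens the same data in a spreadsheet-like viewer.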

Tidyverse

Import data

Before you can manipulate data with R, you need to import the data into R’s memory, or build a connection to the data that R can use to access the data remotely (Grolemund 2020):

Import data
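
A minimal sketch of importing a CSV file with the readr package from the tidyverse could look like this. It assumes readr version 2.0 or later is installed; the file contents and the names csv_path and scores are made up so that the example is self-contained.

```r
library(readr)

# Write a tiny CSV to a temporary file so the example does not depend on external data.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Ada,91", "Grace,88"), csv_path)

# Read the file into a tibble; show_col_types = FALSE suppresses the column report.
scores <- read_csv(csv_path, show_col_types = FALSE)
print(scores)
```

With your own data you would pass the path or URL of the file instead of csv_path.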

Programming

Program

Visualize data

Visualize data
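
To give an idea of what plotting code in the tidyverse looks like, here is a short sketch with ggplot2. It assumes the package is installed and uses mpg, an example data set that ships with ggplot2; the names p_scatter and p_hist are arbitrary.

```r
library(ggplot2)

# Scatter plot: map engine displacement to x and highway mileage to y.
p_scatter <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(x = "Engine displacement (litres)", y = "Highway miles per gallon")

# Histogram: distribution of a single variable.
p_hist <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)

# Printing a plot object draws it on the active graphics device.
print(p_scatter)
```

A plot is built in layers that are combined with +, so the same data and mapping can be reused with a different geom.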

Exploratory data analysis

This book chapter will show you how to use visualisation and transformation to explore your data in a systematic way, a task that statisticians call exploratory data analysis, or EDA for short (Wickham, Çetinkaya-Rundel, and Grolemund 2023):

Exploratory data analysis

Transform data

The dplyr package is a part of the tidyverse and provides simple “verbs”, functions that correspond to the most common data manipulation tasks, to help you translate your thoughts into code:

Transform data
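
The sketch below chains several dplyr verbs on the built-in mtcars data. It assumes dplyr is installed and R 4.1 or later for the native pipe |>; the result name efficient_by_cyl is hypothetical.

```r
library(dplyr)

efficient_by_cyl <- mtcars |>
  filter(mpg > 20) |>                        # keep rows that satisfy a condition
  select(mpg, cyl, wt) |>                    # keep only some columns
  group_by(cyl) |>                           # define groups for the summary
  summarise(mean_mpg = mean(mpg), n = n()) |> # one row per group
  arrange(desc(mean_mpg))                    # sort by mean mileage, highest first

print(efficient_by_cyl)
```

Each verb takes a data frame as its first argument and returns a data frame, which is what makes them easy to combine with the pipe.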

Tidy data

The linked book chapter teaches a consistent way to organise your data in R, an organisation called tidy data (Wickham, Çetinkaya-Rundel, and Grolemund 2023):

Tidy data
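
As a small illustration, the sketch below reshapes a made-up table in which each year has its own column into a tidy layout in which every row is one observation. It assumes the tidyr package is installed; the data values and the names wide and long are invented for this example.

```r
library(tidyr)

# A "wide" table: the year, which is really a variable, is spread across column names.
wide <- data.frame(
  country = c("A", "B"),
  `2019` = c(10, 20),
  `2020` = c(12, 25),
  check.names = FALSE  # keep the non-syntactic column names as they are
)

# Gather the year columns into two columns: one for the year, one for the value.
long <- pivot_longer(
  wide,
  cols = c(`2019`, `2020`),
  names_to = "year",
  values_to = "cases"
)

print(long)
```

In the tidy version each variable (country, year, cases) has its own column, which is the layout most tidyverse functions expect.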

Quarto

Quarto is an open-source scientific and technical publishing system which you can use to create dynamic content with Python, R, Julia, and Observable:

Learn how to set up and use Quarto

Acknowledgments

This is a website made with the distill package.

References

Grolemund, Garrett. 2020. The Tidyverse Cookbook. https://rstudio-education.github.io/tidyverse-cookbook/.
Ismay, Chester, and Albert Y. Kim. 2022. Statistical Inference via Data Science: A ModernDive into R and the Tidyverse. CRC Press. https://moderndive.netlify.app.
Wickham, Hadley, Mine Çetinkaya-Rundel, and Garrett Grolemund. 2023. R for Data Science (2nd ed.). O’Reilly Media, Inc. https://r4ds.hadley.nz/.

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. Source code is available at https://github.com/kirenz/introduction-to-r, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".