Take your first steps in R.
The R programming language is a powerful open-source tool for statistics, machine learning, and data science. Learning to program in R is much like learning a foreign language: you need lots of practice to improve your skills. The purpose of this site is to provide helpful resources to get you started with R.
Note that this overview mainly covers the tidyverse, a collection of R packages designed specifically for data science. An R package is simply a bundle of functions, documentation, and data sets. There are about 25 packages in the tidyverse, and they share common APIs as well as an underlying design philosophy, grammar, and data structures.
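As a minimal sketch of what "a bundle of functions, documentation, and data sets" looks like in practice, the snippet below loads the tidyverse and lists its member packages. It assumes the tidyverse has already been installed from CRAN; the installation line is left commented out because it needs an internet connection.

```r
# One-time installation from CRAN (uncomment if the package is missing):
# install.packages("tidyverse")

# Attach the core tidyverse packages for this session
library(tidyverse)

# List every package that belongs to the tidyverse
pkgs <- tidyverse_packages()
print(pkgs)

# Each package ships its own documentation, for example:
# help(package = "dplyr")
```

The exact list printed depends on the installed tidyverse version.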
Before we can start analyzing data in R, there are some key concepts you need to understand first (Ismay and Kim 2022):
What are R and RStudio, and how do you install them?
If you have trouble installing R or RStudio on your machine, you can use RStudio Cloud at https://rstudio.cloud/ instead.
RStudio Cloud is a lightweight, cloud-based solution that allows you to use R and RStudio online. Choose the option Get Started for Free and create an account.
Before you can manipulate data with R, you need to import the data into R’s memory, or build a connection to the data that R can use to access the data remotely (Grolemund 2020):
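A minimal sketch of importing a file into R's memory with the readr package (part of the tidyverse). The file contents, the column names `name` and `score`, and the object name `scores` are made up for illustration; a temporary file stands in for a real CSV on disk.

```r
library(readr)

# Write a small, hypothetical CSV file to a temporary location
path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Ada,90", "Lin,85"), path)

# Read the file into a tibble, stating the column types explicitly
scores <- read_csv(
  path,
  col_types = cols(
    name  = col_character(),
    score = col_double()
  )
)

print(scores)
```

Declaring `col_types` is optional, since readr can guess the types, but spelling them out makes the import reproducible when the data changes.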
This book chapter will show you how to use visualisation and transformation to explore your data in a systematic way, a task that statisticians call exploratory data analysis, or EDA for short (Wickham, Çetinkaya-Rundel, and Grolemund 2023):
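A short sketch of what combining visualisation and transformation can look like. It uses the `mpg` data set that ships with ggplot2, which is my choice for illustration and not necessarily the data used in the chapter, and it assumes R 4.1 or later for the native pipe `|>`.

```r
library(ggplot2)
library(dplyr)

# Visualisation: how is highway fuel economy (hwy) distributed?
hwy_hist <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)

# Transformation: does the typical value differ between vehicle classes?
hwy_by_class <- mpg |>
  group_by(class) |>
  summarise(n = n(), median_hwy = median(hwy)) |>
  arrange(desc(median_hwy))

print(hwy_hist)      # draws the histogram
print(hwy_by_class)  # one row per vehicle class
```

A typical exploratory loop alternates between the two: a plot raises a question, and a summary table helps answer it.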
The dplyr package is a part of the tidyverse and provides simple “verbs”, functions that correspond to the most common data manipulation tasks, to help you translate your thoughts into code:
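The following sketch chains several dplyr verbs on the built-in `mtcars` data set. The data set and the derived column `wt_kg` are chosen for illustration only, and the native pipe `|>` assumes R 4.1 or later.

```r
library(dplyr)

cars_summary <- mtcars |>
  filter(cyl == 4) |>                       # keep rows: four-cylinder cars only
  mutate(wt_kg = wt * 1000 * 0.4536) |>     # add a column: wt is in 1000 lbs
  select(mpg, wt_kg, gear) |>               # keep columns
  group_by(gear) |>                         # define groups
  summarise(
    n = n(),                                # cars per group
    mean_mpg = mean(mpg),                   # average fuel economy
    mean_wt_kg = mean(wt_kg)                # average weight in kilograms
  ) |>
  arrange(desc(mean_mpg))                   # sort rows

print(cars_summary)
```

Each verb takes a data frame and returns a data frame, which is what makes it possible to read the pipeline top to bottom as a sequence of steps.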
In this chapter, you will learn a consistent way to organise your data in R, an organisation called tidy data (Wickham, Çetinkaya-Rundel, and Grolemund 2023):
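In tidy data, each variable is a column, each observation is a row, and each value is a cell. The sketch below reshapes a small, made-up table into that form with `pivot_longer()` from the tidyr package; the country names and case counts are invented for illustration.

```r
library(tidyr)
library(tibble)

# Untidy: the year variable is spread across the column names
wide <- tibble(
  country = c("A", "B"),
  `2019`  = c(10, 20),
  `2020`  = c(12, 25)
)

# Tidy: one row per country and year, one column per variable
long <- wide |>
  pivot_longer(
    cols      = c(`2019`, `2020`),
    names_to  = "year",
    values_to = "cases"
  )

print(long)
```

Note that `year` comes out as a character column because it was built from column names; convert it with `as.integer()` if you need numbers.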
Quarto is an open-source scientific and technical publishing system that you can use to create dynamic content with Python, R, Julia, and Observable:
Learn how to set up and use Quarto
This is a website made with the distill package.
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. Source code is available at https://github.com/kirenz/introduction-to-r, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".