01| Setup for R

Miles Robertson, 10.25.23 (edited 01.10.24)

Introduction

A little setup is required to get R up and running on your computer. In addition, a few other programs are important for collaborating on code with other people. This tutorial walks you through setting up R, RStudio, and Git, and explains the role that each of these programs plays.

The R Programming Language

R is a programming language with a focus on statistical applications. Generally speaking, a programming language is a defined syntax that, if followed, can be translated into instructions that a computer can carry out. In other words, the "R language" is a set of notation rules for us as coders to follow, and if we follow them, our typed-out code is ultimately turned into ones and zeros (i.e., machine code) that the computer can execute. R has been around since 1993 (compare to C in 1972, Python in 1991, Java in 1995, and Julia in 2012). It was designed by Ross Ihaka and Robert Gentleman, with heavy inspiration from the S programming language (started 1975). As of October 2023, R is ranked as the 17th most popular programming language by the TIOBE index, with Python, C, and C++ leading the pack.

R is a popular language because it is free, open-source (which means that anyone can legally change the "background" code that makes R work), and well established among data scientists. It has myriad packages relating to all realms of biology, which have been developed by academics throughout the world. Despite its positive qualities, R has some shortcomings that make it a difficult first language to learn. For example, R has inconsistent naming conventions, which means there are several different patterns used for naming variables and functions. That makes variable and function names hard to remember and distinguish. Contrary to common belief, the messy nature of R is not because it is an open-source language (e.g., Python is an open-source language and has very consistent naming conventions), but instead because R conventions are poorly defined and poorly enforced. Additionally, those who design R packages often try to make sure that complete statistical analyses can be computed with just a few lines of code. This is good in that you'll almost always get the output you need, but it might be buried in a lot of results that you don't need. However, the fact that R is so common in data science means that there are many resources available to help overcome problems.
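To make the naming inconsistency concrete, here is a short sketch using four functions that ship with R. The choice of these particular functions is ours, purely for illustration; each one follows a different naming pattern. You can come back and run it once R is installed.

```r
# Four built-in R functions, four different naming styles

# Words separated by a period
is.na(NA)                        # returns TRUE

# Words separated by an underscore
seq_len(3)                       # returns 1 2 3

# camelCase (second word capitalized, no separator)
rowSums(matrix(1:4, nrow = 2))   # returns 4 6

# Capitalized prefix followed by a period
Sys.Date()                       # returns today's date
```

None of these styles is wrong on its own. The difficulty is that you have to remember which style each function happens to use.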

"Downloading R" has the effect of telling your computer how to interpret and run .R files. This is different from the purpose of RStudio, which is discussed in the next section. In the box below, instructions are provided on how to get R onto your computer.

TODO: Install R

  • Go to https://cran.r-project.org/
  • Click on the link for your operating system (e.g., Windows, Mac, Linux).
    • Windows users: click the base link, and then click the top link that looks like Download R-#.#.# for Windows, whatever the version number may be.
    • Mac users: Look for the title that says "Latest release". Depending on which type of Mac you have, you'll click on either the link under "For Apple silicon (M1/M2) Macs" or the link under "For older Intel Macs".
  • Follow the instructions given as you go through those steps.
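Once the installer finishes, one way to confirm that R is working is to open R and run the short check below. This is a minimal sketch rather than an official verification step. The 4.0.0 threshold is an arbitrary value chosen for illustration, not a requirement of this tutorial.

```r
# R.version.string is a built-in text description of the installed R version
print(R.version.string)

# getRversion() returns the version in a form that can be compared
if (getRversion() >= "4.0.0") {
    message("R version 4.0.0 or newer is installed.")
} else {
    message("An older version of R is installed; consider updating.")
}
```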

RStudio

RStudio is distinct from R: R is the language itself, whereas RStudio is an integrated development environment (IDE). An IDE is a graphical program that allows you to write code, run code, and see the output of your code all in one place. RStudio is a popular IDE for R, and it is free to download and use. It is not necessary to use RStudio to write R code, but it is used by most R users. RStudio has also recently pivoted to support editing of Python code.

Instructions to install RStudio are provided in the box below.

TODO: Install RStudio

  • Go to https://posit.co/download/rstudio-desktop/
  • Click on the link to download RStudio.
    • Mac users: At some point, you'll see a window pop up that shows the Applications folder and the RStudio app icon. Make sure to drag the RStudio icon into the Applications folder.
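Because R and RStudio are separate programs, the same R code can run inside or outside of RStudio. The sketch below shows one way to tell the two apart. It assumes that RStudio sets an environment variable named RSTUDIO to "1" for the R sessions it launches, which is a commonly used check, so treat the result as a hint rather than a guarantee.

```r
# Assumption: RStudio sets RSTUDIO to "1" for the R sessions it starts
running_in_rstudio <- identical(Sys.getenv("RSTUDIO"), "1")

if (running_in_rstudio) {
    message("This R session is running inside RStudio.")
} else {
    message("This R session is running outside of RStudio.")
}
```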

Git

Git is a version control system (VCS). A VCS is a program that allows you to track changes to files, revert to previous versions of those files, and collaborate with others on code. It is most commonly used for code, but it can be used for any type of file. Git is a popular VCS, and it is free to download and use. It is not necessary to use Git to write code, but it is used by most programmers. Git is a command-line program, which means that it is run from the terminal (more info on that later). It is important to note that Git is not the same as GitHub, which will be discussed later.

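Luckily for Mac users, Git usually requires little or no setup. The first time you run a Git command in the terminal, macOS may prompt you to install Apple's command line developer tools, which include Git; if so, accept the prompt. Windows users will have to follow the instructions in the box below to install Git.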

TODO: Install Git

  • Windows users: Go to https://git-scm.com/download/win/ and click the link to download Git. Doing so will also install Git Bash for Windows, which is a terminal we will use later.
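If you would like to confirm that Git is available before the terminal is introduced, the following sketch checks for it from within R. It assumes that Git was added to your system's PATH during installation, which may not hold for every setup, so a "not found" message does not necessarily mean the installation failed.

```r
# Sys.which() returns the full path to a program, or "" if it is not found
git_path <- Sys.which("git")

if (nzchar(git_path)) {
    # Ask Git to report its own version and print the reply
    print(system2("git", "--version", stdout = TRUE))
} else {
    message("Git was not found on the PATH.")
}
```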

Conclusion

Now we've gone over the purpose of R, RStudio, and Git. Below, you'll be able to practice using R and RStudio. Using Git and the terminal will be handled in future sections.


Practice

Setting up RStudio

When you open RStudio for the first time, you'll see three windows. The left window is the console, which is where you can type code and see the output of that code. When you create or open a file, this left side is then broken into two windows, one for the file and one for the console. The top right window is the environment, which is where you can see all of the variables that are currently defined. The bottom right window is used for many purposes, but is frequently used to look at help documents, plot outputs and files.
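To see the environment window in action, try typing the lines below into the console one at a time. The variable names are made up for this illustration. After each assignment, a new entry should appear in the top right window.

```r
# Each assignment creates a variable that shows up in the environment window
language_name <- "R"
first_release_year <- 1993
years_since_release <- 2023 - first_release_year

# ls() lists the names of the variables that are currently defined
ls()
```

The names printed by ls() should match the entries shown in the environment window.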

There are a few settings to be aware of. Go to Tools > Global Options in the top ribbon. In the window that pops up, click on the Code tab on the left. It should automatically have the Editing tab open. Change the "Tab width" setting from 2 to 4. This setting controls how many spaces are inserted into your code when you press the Tab key. The default of 2 is far too small for code readability. Then, click on the Display tab at the top. Make sure that the "Highlight R function calls" and "Use rainbow parentheses" boxes are checked. Both options will help you see what is happening in your code more clearly. Click Apply to save your changes.
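As a small illustration of why these settings help, the sketch below uses a tab width of 4 and several layers of nested parentheses and braces. The variable name is hypothetical, and you do not need to understand the code yet. Just notice how the indentation shows which lines belong inside which block.

```r
# Collect a label for each odd number from 1 to 3
odd_labels <- character(0)
for (i in 1:3) {
    if (i %% 2 == 1) {
        odd_labels <- c(odd_labels, paste("Odd number:", i))
    }
}
print(odd_labels)   # prints "Odd number: 1" "Odd number: 3"
```

With rainbow parentheses turned on, each matching pair of brackets in code like this is drawn in its own color, which makes a missing bracket easier to spot.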

If desired, you can also change the theme of RStudio. This can be done by clicking Appearance in the same Global Options page, and then selecting a theme from the dropdown menu. Click Apply to save your changes.

Creating your first R file

As stated above, the console window in RStudio can run R code. However, it does not save the code that you type. In order to save code, you need to create an R script. To do this, click File > New File > R Script in the top ribbon. This will create a new, unsaved, untitled file, which opens above the console in a new fourth window. In this window, type the following code:

print("Hello world!")

Place your cursor on that line of code and press Ctrl + Enter (or Cmd + Return for Mac). You'll see your line appear in the console window, followed by the output [1] "Hello world!". Now, press Ctrl + S (or Cmd + S for Mac) to save your new file under a name and folder of your choice.
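One benefit of saving code in a file is that the whole file can be run again later. The sketch below demonstrates this in a self-contained way: it writes a one-line script to a temporary folder and then runs it with source(). The file name hello.R is only an example. With your own script, you would pass the location where you saved it.

```r
# Build a path to an example script inside R's temporary folder
script_path <- file.path(tempdir(), "hello.R")

# Write the same line of code used above into that file
writeLines('print("Hello world!")', script_path)

# source() runs every line of a saved R script
source(script_path)   # prints [1] "Hello world!"
```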

Keyboard Shortcuts

Return to the previous chapter and try out all 14 keyboard shortcuts introduced there. Pick two or three to practice until you have them memorized.