Chapter 4

Chapter 4 is an introduction to the RStudio environment and scripting. The authors assume that to this point you have been typing all of your code into the console to run it. As the code that we need to run gets lengthier, the use of R scripts becomes more advantageous.

New scripts (.R files) can be created by clicking on File -> New File -> R Script. The keyboard shortcuts are CMD-Shift-N or Ctrl-Shift-N for Macs and Windows machines respectively.

Code can be typed into the script window (upper left) and then submitted (see the next section). Alternatively, you can run code in the console and copy whatever works into the script window. The former approach is preferable because it encourages saving working code regularly.

Running Code

The script editor is a good place to write and run code. It is difficult to see the flow of code in the console, and output gets in the way of the input. To run an individual command (either single or multiple lines of code), just place the cursor somewhere in the command code and hit either CMD-Enter (Mac) or Ctrl-Enter (Windows). You can also run the entire script by hitting CMD-Shift-S (Mac) or Ctrl-Shift-S (Windows).
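Running a whole script also has a console counterpart in base R's source() function. The sketch below is an illustration rather than something taken from the text: it writes a two-line script to a temporary file (the contents and the name script_path are made up for this example) and then runs it.

```r
# Create a throwaway script so the example is self-contained.
# The contents and the name `script_path` are hypothetical.
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "x <- 1:10",
  "total <- sum(x)"
), script_path)

# Run every line of the script in order.
# echo = TRUE prints each command before evaluating it, which helps
# when tracing where a long script goes wrong.
source(script_path, echo = TRUE)

# Objects created by the script are now available in the workspace
total
```

With a real script you would pass its path directly, for example source("analysis.R"), where analysis.R is whatever file you saved from the editor.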

The text does not mention it, but the History tab is a great place to get code for a script. You can view the history in the second tab of the upper-right RStudio window. The “To Console” button sends highlighted text to the console. Likewise, the “To Source” button sends highlighted text to the script at the current cursor location.

Generally, I prefer to use R Markdown files for code and text. You have seen R Markdown already, and the book covers it in Chapters 21, 23, and 24. R Markdown lends itself to reproducibility, something that is big in the statistics community.

Scripts are very useful when we wish to run code on a regular basis, often unattended. We can use R’s command line interface to run scripts in batch mode. This is helpful if, for example, we wish to do nightly updates of the PITCHf/x baseball data to get the latest MLB game information (see the pitchRx package).
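As a rough sketch of what a batch-friendly script might look like, the example below reads an optional date from the command line with commandArgs() and falls back to yesterday's date. The file name nightly_update.R, the function update_data, and the date argument are all assumptions made for illustration. No pitchRx calls are shown because the notes do not describe that package's interface.

```r
# nightly_update.R (hypothetical file name)
# Intended to be run unattended from a terminal or scheduler, e.g.
#   Rscript nightly_update.R 2016-04-01

# Arguments given after the script name; empty when there are none
args <- commandArgs(trailingOnly = TRUE)

# Default to yesterday's date when no date is supplied
target_date <- if (length(args) >= 1) as.Date(args[1]) else Sys.Date() - 1

# Placeholder for the real work (e.g. downloading and storing new data)
update_data <- function(date) {
  message("Updating data for ", format(date))
  invisible(date)
}

update_data(target_date)
```

Because the script takes its input from the command line instead of prompting for it, a scheduler can run it without anyone at the keyboard.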

Conventions and Style

Most R programmers prefer to load libraries at the beginning of a script. This allows people who borrow your code to see what they need to have installed before attempting to run the script. At the same time, installing packages and changing system settings from within a shared script is frowned upon. Comment out lines that do such things (for example, calls to install.packages() or setwd()) so that your suggestions for installation methods and settings remain visible but are not run.
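A script header that follows these conventions might look something like the sketch below. The base packages stats and utils stand in for whatever the script really needs, so the example runs on any installation. The package name in the commented install line and the path in the commented setwd() line are placeholders.

```r
# Packages this script needs. Base packages are used here only so the
# example runs anywhere; a real script would list its own packages.
library(stats)
library(utils)

# Left commented out on purpose: the reader decides whether to run these.
# install.packages("tidyverse")
# setwd("~/projects/analysis")   # hypothetical path

# The analysis starts here
heights <- c(150, 160, 170)
mean_height <- mean(heights)
mean_height
```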

RStudio Diagnostics

R, and by extension RStudio, diagnostics suck. Ask Rick Cornez how much he hates them. If you are used to programming in a “real” language, you will find the minimal information provided by R to be frustrating. Remember that R is interpreted and not compiled. R is always open for input and this means that it has a harder time guessing what you are thinking. It does try to provide you some clues as to why it is upset. After some practice, you will learn to spot the more common mistakes (e.g. missing closing parentheses or quotation marks) fairly easily.
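To see what one of these terse messages looks like without breaking a session, you can hand the faulty code to parse() as text and capture the error. This is only a sketch: the names bad_code and msg are made up here, and the exact wording of the message may differ between R versions.

```r
# Code with a missing closing parenthesis, held as text so that this
# example itself still runs
bad_code <- "mean(c(1, 2, 3)"

# parse() checks syntax without evaluating; tryCatch() captures the message
msg <- tryCatch(
  {
    parse(text = bad_code)
    "no syntax error"
  },
  error = function(e) conditionMessage(e)
)

# R typically complains about an unexpected end of input here
msg
```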

The RStudio environment adds to R’s diagnostics. It often flags code that would cause R to fail before you even run it. Keep an eye on the left-hand side of the editor window for red indicators of problems. You might also get a squiggly line in the code at a point where RStudio thinks R will be confused.

In the RStudio editor we can hover over the indicated error to get help. This code is so bad that it will not run.

  i j <- 1:10  # Hover over the code to see "unexpected token..."
## Error: <text>:1:5: unexpected symbol
## 1:   i j
##         ^
  i <- j <- 1:10    # This, while hard to read, is correct 
  i == j       # Confirming that i is a copy of j
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
  i == NaN     # Hover over the code to see "use is.nan..."
##  [1] NA NA NA NA NA NA NA NA NA NA
  is.nan(i)    # Place the cursor between the n and the ( to see a prompt 
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Note that i is equal to j at every position. Any comparison against NaN with == returns missing values (NA) rather than TRUE or FALSE, which is why RStudio suggests is.nan instead. The call to is.nan succeeds, and every result is FALSE: none of the elements of i is NaN.
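The difference is easier to see on a vector that really does contain NaN. The vector x below is made up for illustration and mixes an ordinary number, NaN, NA, and the result of 0 / 0.

```r
# A vector mixing a number, NaN, NA, and a computed NaN
x <- c(1, NaN, NA, 0 / 0)

x == NaN     # NA NA NA NA: == cannot be used to test for NaN
is.nan(x)    # FALSE TRUE FALSE TRUE: only the true NaN values
is.na(x)     # FALSE TRUE TRUE TRUE: NaN also counts as missing
```

So is.nan picks out only NaN, while is.na treats both NA and NaN as missing.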