Rstudio is a terrific "development environment" for R. It includes:
- An editor that is aware of how R works. It colorizes different bits of code; indents in an informative way; completes your thought when you hit TAB; and deals with several varieties of R code files.
- Separate panels for keeping track of your objects, displaying graphics and showing the help screens.
- Built-in templates for complicated things like Shiny projects.
- A searchable help facility.
- State persistence across network failures.
It makes R a lot easier to use and, like R itself, it has a free version that you can install on almost any platform. (Unlike R, there is also a paid version -- but never mind that.)
If you are unfamiliar with R or Rstudio you can learn about them at https://www.rstudio.com/resources/webinars/ -- and of course wherever Google takes you. But if you're an experienced R user, you'll figure out Rstudio pretty fast.
One thing to keep in mind as you muck around in Rstudio is that the R process that runs inside Rstudio is independent of Rstudio. This can be a bit confusing in that bad things can happen to either R or Rstudio separately. When things go wrong, it is often because you have asked R to do the impossible -- but Rstudio chugs on regardless.
Projects, home directories and reproducible results
By default Rstudio opens in your home directory. This is not a crime, but it should offend your sense of order and impel you to store your work in "projects" which are essentially directories which keep all the related bits of your scientific project together. Project directories are particularly helpful for those of you who would like reproducible results.
Creating a new project in Rstudio
- Step by step:
You can choose to put the new project in an existing directory -- one that might already contain some R and data files. Or you can create a new empty folder (directory) to hold your new project.
- When you're finished, you'll have before you a pristine workspace -- not contaminated by variables from other R sessions
- To access the various projects that you are working on, look in the upper right corner of the Rstudio screen:
Deleting an obsolete project
There is no official built-in way in Rstudio to delete a project that is no longer relevant. But since projects are just directories with a special Rstudio-created file in them, it's pretty easy to just delete the project directory. Once the project directory is removed or renamed, Rstudio will detect its absence the next time you try to open the project. When that happens you'll get a warning message about the project's absence. Simply hit the "OK" button and the project will be removed from the selection list.
Best practices and Gotchas
One of Rstudio's essential features is that after 30 idle minutes, it goes to "sleep" by writing its state to a big file and then giving back all the RAM and other resources that your session was consuming. Note that this is different from R's habit of storing your workspace in a big file when you quit R. In this case you aren't quitting R -- Rstudio (including R) is just going into hibernation -- in the form of a (potentially very) large file stored in your home directory.
For moderate sized projects, Rstudio wakes up nearly instantly and one might never even be aware of this feature. However, when your R environment is large, Rstudio can take minutes to wake up. In fact, if your environment is large enough, the browser might time out before Rstudio wakes up.
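To get a rough idea of whether your environment counts as "large", you can look at how much memory each object occupies. The following is only a sketch: workspace_sizes() is a hypothetical helper name, object.size() gives an estimate, and the size of the hibernation file will not necessarily match these numbers.

```r
# Hypothetical helper: estimated in-memory size (bytes) of each object
# in an environment, largest first.
workspace_sizes <- function(env = globalenv()) {
  obj_names <- ls(envir = env)
  sizes <- vapply(
    obj_names,
    function(nm) as.numeric(utils::object.size(get(nm, envir = env))),
    numeric(1)
  )
  sort(sizes, decreasing = TRUE)
}

# Illustration using a scratch environment with two made-up objects
demo_env <- new.env()
assign("big", numeric(1e6), envir = demo_env)
assign("small", 1:10, envir = demo_env)

sizes <- workspace_sizes(demo_env)
print(round(sizes / 1024^2, 2))  # sizes in MiB
```

Calling workspace_sizes() with no arguments reports on your global workspace, which is probably the part that matters most when Rstudio hibernates.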
The best way to cope with this when working on big projects is to explicitly quit Rstudio without saving your R environment whenever you are done computing. This of course is hard to do if your project is long-running and it's past your nap time. In that case you'll want to ultimately run that project in BATCH mode. To do so, write your R program such that everything is either a comment or else it's code that needs to run. (Why wouldn't you do this always anyway?). Then from a Linux shell:
nohup R CMD BATCH --no-save commandfile.R commandfile.log &
This is best done from a terminal session, such as one opened over ssh. Note that "nohup" tells the computer to keep running the program even after you log off, '--no-save' tells R not to save its workspace on exiting, and '&' tells the computer to run the command in the "background" -- that is, without talking to the terminal. NB: unless you want to annoy other users, please do not launch intensive computational tasks on the NoMachine gateway machine (quigley). It is really not designed to handle CPU intensive work, nor does it have heaps of memory. Instead, ssh in your terminal window to server keyfitz or another compute server for such tasks. A further detail is that the nohup prefix is not really necessary if you wish to stay connected to the compute server in the terminal window, which is mostly costless in NoMachine.
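For illustration, here is a sketch of what a batch-ready commandfile.R might look like. The simulation is a made-up stand-in for your real computation, and the output file name commandfile_results.RData is an assumption, not a convention. The point is that every line is either a comment or code that must run, and that results are saved explicitly because --no-save throws the workspace away.

```r
# commandfile.R: a sketch of a script suited to R CMD BATCH.

set.seed(42)  # fix the seed so a rerun reproduces the same numbers

# Stand-in for a long-running computation (illustrative only)
n_sims <- 1000
sim_means <- replicate(n_sims, mean(rnorm(100)))

results <- list(
  n_sims     = n_sims,
  grand_mean = mean(sim_means),
  finished   = Sys.time()
)

# Save explicitly: with --no-save nothing else survives the session.
# The file name is hypothetical; it lands in the working directory.
out_file <- "commandfile_results.RData"
save(results, file = out_file)

# Printed output ends up in commandfile.log
print(results$grand_mean)
```

Later, in a fresh session started from the same directory, load("commandfile_results.RData") should bring results back.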
Saving and loading files in R
When working with large projects and trying to avoid Rstudio hibernation, it is often useful to write your data objects to disk using R's "save()" function so that they can be read back into R using the "load()" function. There are of course other ways to read and write data, but save() and load() are fast and can deal with complex objects. Speed is what matters if one must quit and restart Rstudio repeatedly. (For an example, see https://www.r-bloggers.com/load-save-and-rda-files/)
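A minimal sketch of the save()/load() round trip follows. The objects and the file name model_objects.RData are made up for illustration, and tempdir() is used only so the example cleans up after itself. For real work you would want a path inside your project directory, since R removes the temporary directory when the session ends.

```r
# Two illustrative objects: a small data frame and a fitted model
fit_data <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1))
model <- lm(y ~ x, data = fit_data)

# Write both objects to a single file (hypothetical file name)
rda_file <- file.path(tempdir(), "model_objects.RData")
save(fit_data, model, file = rda_file)

# Simulate quitting and restarting: drop the objects from the session
rm(fit_data, model)

# load() restores the objects under their original names and
# (invisibly) returns those names as a character vector
loaded_names <- load(rda_file)
print(loaded_names)
print(coef(model))
```

Because load() recreates objects under the names they were saved with, it can silently overwrite something already in your session with the same name, so it helps to keep track of what each file contains.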
Don't let R save your workspace on q()
By default, R wants to save all the objects in your workspace to a file called .RData and reads those objects back in whenever you start R from that same directory. This is an anachronism from a time when scientists cared much less about being able to reproduce their results. (Here's a rant if you need convincing). The problem is that when you are not sure what objects are in your workspace, it's very easy to accidentally misspell something and thereby use something that you had not intended to in a computation. It's even more fraught with peril if you intentionally store objects in .RData that you intend to re-use. If you do that, then you can never be sure that you didn't change the object in a way that is not reflected in your stored R code. Once again, not a great thing if you want to be able to reproduce your results.
To prevent R/Rstudio from storing and re-reading your workspace when you quit R:
- In Rstudio, go to Tools -> Global Options, uncheck "Restore data from .RData into workspace on startup", and set "Save workspace to .RData on exit" to "Never".
That will help you develop the good, clean and efficient habits of:
- Always writing code that works in batch mode
- Explicitly saving what should be saved and not what shouldn't
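If you want to check whether an old workspace file is still lurking in a project, something like the following sketch may help. has_saved_workspace() is a hypothetical helper name. It only looks for a file called .RData in the given directory and does nothing about Rstudio's own settings.

```r
# Hypothetical helper: TRUE if a saved workspace file exists in `dir`
has_saved_workspace <- function(dir = getwd()) {
  file.exists(file.path(dir, ".RData"))
}

if (has_saved_workspace()) {
  message("Found .RData in ", getwd(),
          "; R may restore old objects from it at startup.")
}

# Objects currently in the global workspace
print(ls(envir = globalenv()))
```

If the helper reports a stale .RData that you no longer need, deleting that file is one way to make sure the next session starts clean.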