Rstudio

Rstudio is a terrific "development environment" for R. It includes:

  • An editor that is aware of how R works. It colorizes different bits of code; indents in an informative way; completes your thought when you hit TAB; and deals with several varieties of R code files.
  • Separate panels for keeping track of your objects, displaying graphics and showing the help screens.
  • Built-in templates for complicated things like Shiny projects.
  • A searchable help facility.
  • State persistence across network failures.

It makes R a lot easier to use and, like R itself, it has a free version that you can install on almost any platform. (Unlike R, there is also a paid version -- but never mind that.)

An interesting feature of Rstudio is that it is mostly JavaScript and its standalone application is essentially a browser. As a consequence, running Rstudio on the Demography Lab cloud via your browser is remarkably fast and efficient. Which is why we do it.

If you are unfamiliar with R or Rstudio, you can learn about them at https://www.rstudio.com/resources/webinars/ -- and of course wherever Google takes you. But if you're an experienced R user, you'll figure out Rstudio pretty fast.

Visit https://rstudio.demog.berkeley.edu and give your Demography Lab username and password. Don't have those?

One thing to keep in mind as you muck around in Rstudio is that the R process that runs inside Rstudio is independent of Rstudio. This can be a bit confusing in that bad things can happen to either R or Rstudio separately. When things go wrong, it is often because you have asked R to do the impossible -- but Rstudio chugs on regardless.

Projects, home directories and reproducible results

By default, Rstudio opens in your home directory. This is not a crime, but it should offend your sense of order and impel you to store your work in "projects", which are essentially directories that keep all the related bits of your scientific project together. Project directories are particularly helpful for those of you who would like reproducible results.

Creating a new project in Rstudio

  • Step by step:

    You can choose to put the new project in an existing directory -- one that might already contain some R and data files. Or you can create a new empty folder (directory) to hold your new project.
  • When you're finished, you'll have before you a pristine workspace -- not contaminated by variables from other R sessions.
  • To access the various projects that you are working on, look at the project menu in the upper right corner of the Rstudio screen.

Deleting an obsolete project

Rstudio has no built-in way to delete a project that is no longer relevant. But since projects are just directories with a special Rstudio-created file in them, it's pretty easy to just delete the project directory. Once the project directory is removed or renamed, Rstudio will detect its absence the next time you try to open the project. When that happens, you'll get a warning message about the project's absence. Simply hit the "OK" button and the project will be removed from the selection list.
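As a minimal sketch of deleting a project from a Linux shell: the helper below removes a directory only if it contains a file ending in .Rproj, which is the marker file Rstudio creates for a project. The function name remove_rstudio_project is made up for this illustration; it is not an Rstudio command.

```shell
#!/bin/sh
# Sketch: delete a directory only if it looks like an Rstudio project.
# remove_rstudio_project is a hypothetical helper name, not part of Rstudio.
remove_rstudio_project() {
    project_dir="$1"
    if [ ! -d "$project_dir" ]; then
        echo "Not a directory: $project_dir" >&2
        return 1
    fi
    # An Rstudio project directory contains a file ending in .Rproj
    if ! ls "$project_dir"/*.Rproj >/dev/null 2>&1; then
        echo "No .Rproj file in $project_dir; leaving it alone" >&2
        return 1
    fi
    # Remove the project directory and everything in it
    rm -r -- "$project_dir"
}
```

The check is only a guard against typing the wrong path. The deletion is permanent, so copy anything you still need out of the directory first.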

Best practices and Gotchas

One of Rstudio's essential features is that after 30 idle minutes, it goes to "sleep" by writing its state to a big file and then giving back all the RAM and other resources that your session was consuming. Note that this is different from R's habit of storing your workspace in a big file when you quit R. In this case you aren't quitting R; Rstudio (including R) is just going into hibernation, in the form of a (potentially very) large file stored in your home directory.

For moderate-sized projects, Rstudio wakes up nearly instantly and one might never even be aware of this feature. However, when your R environment is large, Rstudio can take minutes to wake up. In fact, if your environment is large enough, the browser might time out before Rstudio wakes up.

The best way to cope with this when working on big projects is to explicitly quit Rstudio without saving your R environment whenever you are done computing. This of course is hard to do if your project is long running and it's past your nap time. In that case you'll ultimately want to run that project in BATCH mode. To do so, write your R program such that everything is either a comment or code that needs to run. (Why wouldn't you always do this anyway?) Then from a Linux shell:

nohup R CMD BATCH --no-save commandfile.R commandfile.log &

This is best done from a terminal session, such as one opened with ssh. Note that "nohup" tells the computer to keep running the program even after you log off, "--no-save" tells R not to save its workspace on exiting, and "&" tells the computer to run the command in the "background", that is, without talking to the terminal. NB: unless you want to annoy other users, please do not launch intensive computational tasks on the NoMachine gateway machine (quigley). It is not designed to handle CPU-intensive work, nor does it have heaps of memory. Instead, ssh from your terminal window to keyfitz or another compute server for such tasks. A further detail: the nohup prefix is not really necessary if you plan to stay connected to the compute server in the terminal window, which is mostly costless in NoMachine.
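If you launch batch jobs often, the command can be wrapped in a small shell function. The following is only a sketch: the function name run_r_batch and the R_BIN variable are invented for this example (R_BIN just lets you point at a different R executable, and defaults to plain "R"). The log file is named after the script, as in the command shown earlier.

```shell
#!/bin/sh
# Sketch: run an R script in batch mode in the background.
# run_r_batch and R_BIN are hypothetical names used only in this example.
run_r_batch() {
    script="$1"
    if [ ! -f "$script" ]; then
        echo "No such R script: $script" >&2
        return 1
    fi
    # commandfile.R -> commandfile.log
    log="${script%.R}.log"
    # nohup: keep running after logoff; --no-save: do not write the workspace;
    # &: run in the background
    nohup "${R_BIN:-R}" CMD BATCH --no-save "$script" "$log" &
    echo "Started $script; output will be written to $log"
}
```

After calling run_r_batch commandfile.R you can watch progress with tail -f commandfile.log. As with the plain command, run this on a compute server rather than on the gateway machine.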

Saving and loading files in R

When working with large projects and trying to avoid Rstudio hibernation, it is often useful to write your data objects to disk using R's "save()" function so that they can be read back into R using the "load()" function. There are, of course, other ways to read and write data, but save() and load() are fast and can deal with complex objects. Speed is what matters if one must quit and restart Rstudio repeatedly. (For example: https://www.r-bloggers.com/load-save-and-rda-files/)

Don't let R save your workspace on q()

By default, R wants to save all the objects in your workspace to a file called .RData and reads those objects back in whenever you start R from that same directory. This is an anachronism from a time when scientists cared much less about being able to reproduce their results. (Here's a rant if you need convincing.) The problem is that when you are not sure what objects are in your workspace, it's very easy to accidentally misspell something and thereby use an object in a computation that you had not intended to use. It's even more fraught with peril if you intentionally store objects in .RData that you intend to re-use. If you do that, you can never be sure that you didn't change the object in a way that is not reflected in your stored R code. Once again, not a great thing if you want to be able to reproduce your results.

To prevent R/Rstudio from storing and re-reading your workspace when you quit R:

  • In Rstudio, go to Tools -> Global Options, then
    uncheck "Restore data from .RData into workspace on startup" and set "Save workspace to .RData on exit" to "Never".

That will help you develop the good, clean and efficient habits of:

  • Always writing code that works in batch mode
  • Explicitly saving what should be saved and not what shouldn't

When things go wrong...

On occasion, Rstudio will stop working for you, e.g. you get 402 or timeout messages in your browser, or a spinning icon never gives way to your session. Resetting the Rstudio session assigned to you should get you back on track. Here are some steps that may allow you to do this yourself:

  1. Use a different server. Our backup Rstudio server is rstudio-coale.demog.berkeley.edu, which for all intents and purposes should run identically to rstudio.demog.berkeley.edu. This does not resolve the underlying problem, but it does give you a quick workaround to get critical work finished. The steps below apply to stuck sessions on rstudio-coale as well.
  2. Try killing the session. This requires console or shell access to the machine that the Rstudio server is running on. E.g., for rstudio.demog.berkeley.edu this might be keyfitz (subject to change). From NoMachine / nmx.demog.berkeley.edu (aka quigley), you would ssh keyfitz.demog.berkeley.edu and from there find out whether you have an active session running:
    rstudio-server active-sessions
    and see if your user ID shows up in the list. If so, close the browser window or tab that is trying to talk to Rstudio, and then run the following on the keyfitz console:
    rstudio-server kill-session PID
    where PID is the number in the first column of the active-sessions list, next to your session. Then retry access via your browser to rstudio.demog.berkeley.edu.
  3. Remove the Rstudio session data cache. If the above step does not resolve things, the problem is likely owing to damaged metadata for your Rstudio session. Removing this damaged data should allow you to reestablish an Rstudio session.
    • Navigate to where the session data is stored. This will be either ~/.local/share/rstudio/sessions/active/ or /var/tmp/XDG_DATA_HOME/${USER}/rstudio/sessions/active/ on coale. The session ID folders will look like session-f016540d, and you would just delete any / all of these folders to force Rstudio to reestablish the session info for your project. After removing session data, repeat the above step, and you should be good to go.
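The cache removal in step 3 can be sketched as a small shell function. The name clear_rstudio_sessions is made up for this example. With no argument it uses the first location named above; on coale you would pass the /var/tmp/XDG_DATA_HOME/${USER}/rstudio/sessions/active path instead. Close the browser tab that is talking to Rstudio before running it.

```shell
#!/bin/sh
# Sketch: remove cached Rstudio session folders (session-XXXXXXXX).
# clear_rstudio_sessions is a hypothetical helper name, not an Rstudio command.
clear_rstudio_sessions() {
    # Default to the per-user location; pass another path as the first argument
    session_dir="${1:-$HOME/.local/share/rstudio/sessions/active}"
    if [ ! -d "$session_dir" ]; then
        echo "No session directory at $session_dir" >&2
        return 1
    fi
    for d in "$session_dir"/session-*; do
        # Skip the unexpanded pattern when nothing matches
        [ -d "$d" ] || continue
        echo "Removing $d"
        rm -rf -- "$d"
    done
    return 0
}
```

Only folders whose names start with session- are touched; anything else in the directory is left in place.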