R and RStudio
Updated: October 6, 2023
R is a common statistical and computing language used in a variety of biomedical data analyses, visualizations and computing settings. You can download R for your local computer from the Comprehensive R Archive Network (CRAN), or install it via the FH Center IT’s Self Service Tools (on Macs or on PCs). Call the IT Helpdesk if you do not have permission to install or update R on your local computer.
RStudio
The RStudio IDE is a free, open source, and popular interface to R. It adds a number of features, including a GUI that can be customized to show the aspects of your work that matter most to you (such as plots, files, environment variables, scripts, and workspaces). RStudio can be downloaded here and requires that R itself be installed first. Keep in mind that updating R/RStudio is a two-step process: one step to update R itself, and another to update RStudio, the interface to R.
RStudio has a few particularly useful features:
- Support for R Markdowns/Notebooks
- Integration with git or SVN
- Concurrent views of code, plots, files, and environment variables with customized panels.
- Direct deployment of Shiny apps via Shinyapps.io
- R package management and direct installation capabilities
R Packages and Extensions
A number of free and open source resources built on R can greatly expand the utility of R and RStudio for research purposes. There are currently three main sources of R packages of interest to most of the research community.
Bioconductor
Bioconductor is a public repository of R bioinformatics packages. Bioconductor packages are curated for intercompatibility and grouped into workflows (e.g., CyTOF, ChIP-seq, eQTL). New bioinformatics tools are often released as packages submitted to Bioconductor. These are reliable, well-vetted packages that undergo a rigorous submission process.
CRAN
CRAN (the Comprehensive R Archive Network) is a public repository of R packages, along with R itself. Many packages are available, but they are not vetted as heavily as those on Bioconductor: packages are generally required to build successfully, but they may not always perform reliably or be fully documented.
GitHub
GitHub hosts many open source R packages. As they are not vetted or peer-reviewed, these packages can be more experimental than those on CRAN or Bioconductor and thus you will want to proceed with caution. Some basic instructions on how to install packages into your local R/RStudio are included in this vignette.
Local (Desktop) Use
When using R/RStudio locally, you can install packages from multiple sources. Depending on the source of a package, you may download and install it slightly differently, but in every case you are responsible for managing the installed packages, their versions, and the version of R you use them with.
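As a rough illustration of how installation differs by source, the sketch below wraps the three common install paths in one helper. The helper name `install_from` is hypothetical (it is not part of R or of any package), and the package names in the commented-out calls are only examples. `install.packages()`, `BiocManager::install()` and `remotes::install_github()` are the usual entry points, but check each package's own installation notes before relying on this.

```r
# Hypothetical helper: install a package from one of three common sources.
# Nothing is installed until the function is called.
install_from <- function(pkg, source = c("cran", "bioconductor", "github")) {
  source <- match.arg(source)  # errors on an unknown source

  # Install a helper package from CRAN only if it is not already available.
  ensure <- function(helper) {
    if (!requireNamespace(helper, quietly = TRUE)) install.packages(helper)
  }

  switch(source,
    cran = install.packages(pkg),
    bioconductor = {
      ensure("BiocManager")
      BiocManager::install(pkg)
    },
    github = {
      ensure("remotes")
      remotes::install_github(pkg)  # pkg is "owner/repo" for GitHub
    }
  )
}

# Example calls (commented out so that sourcing this file installs nothing):
# install_from("ggplot2", "cran")
# install_from("limma", "bioconductor")
# install_from("tidyverse/dplyr", "github")

# Record what you are running, which helps when reproducing an analysis later.
r_version <- R.version.string
pkg_versions <- installed.packages()[, c("Package", "Version"), drop = FALSE]
```

Keeping a record such as `r_version` and `pkg_versions` alongside your results is one simple way to track which versions an analysis was run with.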
Remote (Rhino and Gizmo) Use
If you need computing resources beyond what is available on your desktop, you may consider running R scripts on the rhinos or gizmo. SciComp makes pre-built R modules available to facilitate more reproducible and reliable use of software on the local cluster.
Current R Modules on Rhino/Gizmo
SciComp maintains a range of R builds on Rhino and Gizmo for use by researchers. Each build has a different version of R and a different set of installed packages, so the first step to using R on Rhino or Gizmo is to identify whether an existing build matches your needs. Specific information about which R modules are available, including the packages installed in each, can be found on our dedicated R Module page. If you do not see the software you are looking for, email scicomp to request it or add your own GitHub issue in the easybuild-life-sciences repo. Either way, please be specific about the source and version of the software you are interested in.
Rhino
You can connect to a rhino machine either with ssh (use PuTTY on Windows) or NoMachine. Once on a rhino machine, choose a version of R. You can see a list of R versions available with the command
module spider R
Choose one and invoke it with ml. For example:
ml R/3.6.2-foss-2016b-fh1
Note that you can use tab-completion with the above command. For example, you can type ml R/ and press the tab key, and you’ll see a list of options that begin with R/. You can narrow this down further by typing more characters, so typing ml R/3.6 and then pressing tab will show you all versions of R whose version number starts with 3.6.
Once you’ve loaded a version of R, you can start it just by typing:
R
Gizmo
To run R on a gizmo node, you can follow the same instructions as for rhino above. If you want to run RStudio, see the next section.
Run RStudio Server on an HPC machine
To run RStudio Server on the gizmo compute cluster, simply open a browser and go to https://rstudio-launcher.fredhutch.org. You will be prompted to log in with your Fred Hutch HutchNet ID and password. This requires that you be on campus or using VPN.
This site will help you launch, manage and kill RStudio sessions on gizmo without having to do so manually via a terminal on rhino. When you create a new RStudio session via the application, the site manages the launch process using the parameters you specify, and then lists the information you need to access and manage your sessions in a table on the site. You can have multiple RStudio sessions running simultaneously, and each session has its own URL where you can use RStudio through your browser.
When starting a new RStudio session, you can choose which version of R to run (beginning in April 2022 all new versions of R will be supported, but the only older version that will work is R-4.0.2). If the defaults do not meet your needs, you can also specify how many CPU cores and how many GB of memory you want, whether you need a GPU, and how long you want the server to run. These parameters can be different for each RStudio session you create. Keep in mind that the more resources you request, the longer it will take for your server to start up.
If you have issues or questions in using this application, please email helpdesk and describe the issues you’re having.
Plotting in RStudio
You may discover that plots (specifically axis labels) look very low-resolution.
The solution to that is to go to the Tools menu in RStudio Server, choose Global Options..., and under General, click on the Graphics tab.
In the Backend dropdown, choose AGG. Now click Apply and then OK.
The improved resolution will be visible with any new plot you create (it won’t apply to existing plots in the Plots tab).
To make the same change within RMarkdown documents created with the knitr package, put the following line at the beginning of your first code chunk. This should cause plots and other graphics to render with a higher resolution.
knitr::opts_chunk$set(dev="CairoPNG")
Run a Jupyter Notebook or Lab on a cluster node
You can run a Jupyter Lab on a cluster node, with the R language (go here if you want to use Jupyter with Python).
This requires that you be on campus or connected to VPN.
Install a Jupyter Kernel in your desired version of R
You’ll need to install and activate the IRkernel package in the minor version of R you want to use with Jupyter. A minor version covers all versions of R that share the same x.y in the x.y.z version number. For example, R-4.2.1 and R-4.2.2 share the same minor version; R-4.1.1 and R-4.2.2 do not. You only need to do this once for each minor version.
To do this, connect to a rhino node as described above, load the desired R module, and start R. For example:
ml fhR/4.3.1-foss-2022b
R
In R, run these commands:
install.packages("IRkernel", repos="https://cran.r-project.org")
IRkernel::installspec()
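If you are unsure whether two R modules count as the same minor version for this purpose, a small base-R check can help. This is only a sketch: the function names `minor_series` and `same_minor` are made up for illustration, and the version strings are assumed to be plain `x.y.z` values without the `R-` or module-name prefix.

```r
# Return the "x.y" part of an "x.y.z" version string.
minor_series <- function(version) {
  parts <- strsplit(version, ".", fixed = TRUE)[[1]]
  paste(parts[1:2], collapse = ".")
}

# TRUE when both versions belong to the same x.y series.
same_minor <- function(a, b) {
  minor_series(a) == minor_series(b)
}

same_minor("4.2.1", "4.2.2")  # TRUE: one IRkernel install covers both
same_minor("4.1.1", "4.2.2")  # FALSE: install IRkernel separately for each

# The series of the R session you are currently in:
current_series <- minor_series(as.character(getRversion()))
```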
Set up your .Rprofile file
By default, R expects an X11 server to be running, and you need to tell it you will not be using one. To do that, edit the .Rprofile file in your home directory. Add these contents:
if (!is.na(Sys.getenv("JPY_PARENT_PID", unset = NA))) {
  options(bitmapType = 'cairo')
}
This says that if you are running inside Jupyter Lab or a Jupyter notebook, R should use the cairo bitmap backend instead of X11.
If you are not running in Jupyter, no change is made.
You only need to do this once.
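To confirm that the setting took effect, you could run something like the following in a notebook cell. It is a sketch only: `in_jupyter` is a hypothetical helper name that repeats the same environment-variable test used in the .Rprofile snippet, and `capabilities("cairo")` reports whether this build of R was compiled with cairo support.

```r
# Same test as in .Rprofile: Jupyter sets JPY_PARENT_PID for its kernels.
in_jupyter <- function() {
  !is.na(Sys.getenv("JPY_PARENT_PID", unset = NA))
}

running_in_jupyter <- in_jupyter()          # TRUE when launched by Jupyter
bitmap_type <- getOption("bitmapType")      # "cairo" if .Rprofile was applied
has_cairo <- capabilities("cairo")          # TRUE if R was built with cairo
```

If `has_cairo` is FALSE, the `bitmapType` option will not help, and that R build may need a different graphics device.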
Grab a node
You will need to start a Jupyter server on a gizmo node.
Run the grabnode command to get a node, and specify how many CPU cores you want, how much memory, and how long you will want your session to run.
At the node’s command prompt, load the R module you loaded in a previous step, and then the most recent (default) version of the JupyterLab module, for example:
ml fhR/4.3.1-foss-2022b
ml JupyterLab
You will be using Jupyter Lab, not Jupyter Notebook. This should not be a problem as Lab is newer and is the recommended and supported choice going forward.
To start Jupyter Lab, run this command:
jupyter lab --ip=$(hostname) --port=$(fhfreeport) --no-browser
This command will print several URLs.
One of the URLs will contain the machine name (starting with gizmo). Copy that URL to your local computer’s clipboard.
Now open a new browser window, paste the URL into the address bar, and press Enter.
Click R under Notebooks to create a new notebook, or use the File / Open menu item to open an existing one.
Running a Jupyter Lab in Visual Studio Code
Click the View menu and choose Command Palette. Start typing jupyter and then choose Create: New Jupyter Notebook. Now click Select Kernel, then Existing Jupyter Server... and paste in the URL from the previous step.
The Tidyverse
The Tidyverse is a group of R packages that are designed to work together and are commonly used for manipulating and visualizing data in data science applications. Several Tidyverse packages are useful for research, and it’s worth the time to learn about them and see how you might use them to clean, analyze and convey data and results. DataCamp has an online Introduction to the Tidyverse that can be useful when first evaluating whether these packages fit your work.
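To give a flavor of the style, here is a minimal sketch using dplyr, one of the Tidyverse packages. It assumes dplyr is already installed, and it uses the `mtcars` dataset that ships with R; the column names `n_cars` and `mean_mpg` are chosen only for this example.

```r
library(dplyr)

# Filter, group and summarise a built-in dataset in one readable pipeline.
summary_by_cyl <- mtcars %>%
  filter(hp > 100) %>%                 # keep cars with more than 100 horsepower
  group_by(cyl) %>%                    # one group per cylinder count
  summarise(
    n_cars = n(),                      # number of cars in the group
    mean_mpg = mean(mpg)               # average fuel economy in the group
  ) %>%
  arrange(cyl)

print(summary_by_cyl)
```

Each step takes a data frame and returns a data frame, which is what lets the verbs chain together.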
Shiny
Shiny is an R package, developed by the team behind RStudio, that enables the creation of interactive applications powered by R code. These apps can be run locally from R or RStudio, or they can be hosted on a server. SciComp provides instructions for hosting Shiny apps here.
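For orientation, a Shiny app pairs a user interface definition with a server function. The sketch below assumes the shiny package is installed; the input and output IDs (`n`, `hist`) and the slider range are arbitrary choices for illustration.

```r
library(shiny)

# User interface: one slider and one plot.
ui <- fluidPage(
  sliderInput("n", "Number of points", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

# Server logic: redraw the histogram whenever the slider changes.
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = "Random normal values")
  })
}

# Build the app object; nothing is launched yet.
app <- shinyApp(ui = ui, server = server)

# To launch it locally in an interactive session:
# runApp(app)
```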
Local resources
- Seattle useR group
- R-Ladies Seattle
- Cascadia RConf, a local yearly conference