Getting Started with Common Languages and Tools


Training Resources

Fred Hutch offers training resources for many of the common languages and tools listed below. Course descriptions can be found at fredhutch.io, and course registration is tracked in Hutch Learning (only accessible to Fred Hutch employees through CenterNet; search for “fredhutch.io” to see all relevant courses). New class offerings will also be posted in the Coop Newsletter. Email thecoop to join the mailing list.

R, RStudio

R is a programming language and also a software development environment. It is widely used among statisticians and has strong capabilities for statistical modeling and data analysis. While R’s core functionality is fairly small, there is a robust community of user-contributed R ‘packages’ (e.g., Bioconductor, a collection of packages for bioinformatics and genomic data analysis). You can download R for your computer, install it from Center IT’s Self Service (on Macs or on PCs), or run R on SciComp’s computing clusters using environment modules.

RStudio is a graphical frontend to R that also improves upon a basic R installation, providing syntax highlighting and code completion, static or dynamic reports (via RMarkdown documents), and easier creation of R packages, among other features. It is an integrated development environment (IDE): it wraps R itself in a graphical user interface and gives easy access to tools and functions that improve the experience of working in R.

Python

Python is another language used extensively within the bioinformatics community. Compared at a very high level with other commonly used languages, Python is generally easier to learn and understand, but it gives you less fine-grained control over the details of computation than a language like C. In other words, it is easier to use but usually not as performant. A major strength of Python in recent years has been its large community of developers contributing efficient code as installable modules, which makes the whole ecosystem more valuable for the average user.
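As a rough illustration of the "easier to use" point, the sketch below implements two common sequence tasks using only the Python standard library. The function names `gc_content` and `reverse_complement` and the example sequence are hypothetical, chosen for illustration; in practice, community packages such as Biopython provide tested versions of utilities like these.

```python
from collections import Counter


def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (case-insensitive)."""
    if not sequence:
        return 0.0  # avoid dividing by zero on an empty sequence
    counts = Counter(sequence.upper())  # tally each base
    return (counts["G"] + counts["C"]) / len(sequence)


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence, preserving letter case."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")  # base-pairing lookup table
    return sequence.translate(complement)[::-1]  # complement, then reverse


if __name__ == "__main__":
    seq = "ATGCGCGATTACA"  # hypothetical example sequence
    print(f"GC content: {gc_content(seq):.2f}")  # prints "GC content: 0.46"
    print(f"Reverse complement: {reverse_complement(seq)}")  # prints "Reverse complement: TGTAATCGCGCAT"
```

Each task takes only a few readable lines, which is the trade-off described above: quick to write and understand, at the cost of the low-level control a language like C would offer.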

Managing and Sharing Code

Version control software has evolved over time, and a more recent shift is that a wider group of researchers now need to actively use version control to manage their code and to document the ongoing processes in their research. For reproducibility, shareability and interoperability, a sharing platform that integrates version control with collaboration is becoming a critical part of a researcher’s toolkit. Thus, regardless of how directly code figures in a research project, it is increasingly important to have at least a cursory understanding of what GitHub is and how it can be used in scientific research.

Linux, Unix and Shell Resources

Linux and macOS are both Unix-like operating systems, and this family of operating systems is the most common environment for developing and executing bioinformatic software tools. To navigate a Unix-like operating system and execute commands, it is extremely useful to learn the command-line interface, or shell; one of the most widely used shells is Bash (Bourne-Again SHell).

Other Languages

Julia

The Julia language aims to combine the accessible syntax of R or Python with the speed of C/C++ programs. While Julia is not yet as well supported as R or Python for bioinformatic tasks, there is a growing collection of bioinformatics resources for it.

Go

Go (also called Golang) also has some support for data science.

Perl

Perl is a programming language often found in bioinformatic analyses. It was originally released in 1987. perl.org has numerous tutorials and modules for learning the language.

