Data Visualization
Updated: November 11, 2022
A growing area of large-scale data analysis is the visualization and sharing of results of analyses. Data scientists need to communicate complex data and results in concise and clear ways, leading to an explosion of platforms, tools, software and approaches for data visualization.
On this page, we provide an overview of resources for learning how to visualize data, software for data visualization, and tools developed at Fred Hutch. While this is not an exhaustive list, we have highlighted what tends to be the most commonly employed or easiest to access resources.
Code-based data visualizations
Plotting in R
While it is possible to plot using base R, there are many packages available to make plotting easier and more visually appealing. Data visualization in R has been dominated by the {ggplot2} package and a wealth of add-on packages that allow for further customization (such as {RColorBrewer} for color palettes). Interactive web apps built with Shiny are also R based and lend themselves well to displaying {ggplot2} and {plotly} visualizations.
Packages for plotting
Packages that extend {ggplot2} capabilities
- ggbeeswarm
- ggtext
- read more about ggplot extensions here
Packages for arranging plots
Packages for coloring plots
Plotting in Python
Historically, Matplotlib was the go-to library for scientific data visualization in Python. Matplotlib is still a powerful plotting tool, but its syntax is complex and its default graphics can look outdated when compared to R’s {ggplot2}. It is still often used over other Python data visualization libraries (particularly in machine learning workflows), but this is due more to tradition in the software development community than to better features.
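As a rough illustration of what Matplotlib code looks like, here is a minimal sketch of a two-group line plot using its object-oriented interface. The data values, labels, and output file name are made up for this example and are not taken from any Fred Hutch dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical example values (assumption: invented for illustration only)
dose = [1, 2, 4, 8, 16]
response = {
    "control": [0.9, 1.1, 1.0, 1.2, 1.1],
    "treated": [1.0, 1.6, 2.3, 3.1, 3.8],
}

# Create a figure and a single set of axes
fig, ax = plt.subplots(figsize=(5, 3.5))

# Each group is drawn with its own call; labels feed the legend
for group, values in response.items():
    ax.plot(dose, values, marker="o", label=group)

# Axis labels, title, and legend are all set explicitly
ax.set_xlabel("Dose")
ax.set_ylabel("Response")
ax.set_title("Hypothetical dose response")
ax.legend(title="Group")

fig.tight_layout()
fig.savefig("dose_response_matplotlib.png", dpi=150)  # hypothetical file name
```

Note how each element of the plot (lines, labels, legend) is added with a separate call. That explicitness gives fine-grained control, but it is also part of why the syntax can feel verbose.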
The seaborn library was developed as an easier-to-use, more modern interface built on top of Matplotlib, and the plotnine library was developed to mimic {ggplot2}’s grammar-of-graphics plotting syntax. Still, some Python users choose to do their data processing in Python and switch to R for visualization.
The plotly and Altair libraries are two Python options for interactive visualizations.
- Seaborn
- Matplotlib
- Plotnine
- Fredhutch.io’s Introduction to Python course covers plotnine in class 4
- Plotly
- Altair
Desktop software for data visualization
Fred Hutch’s Center IT (CIT) supports a wide range of commonly used software at little to no cost to you! We’ve pulled out a shortlist of software relevant to data visualization, but you can view the entire software catalog here. Tableau, MATLAB, and Microsoft Excel are all great options for users who prefer point-and-click data visualization tools.
Community resources
The FH-Data Slack, and more specifically the #data-viz channel, is always available as a space for researchers to ask questions and share resources about data visualization.
- Join the FH-Data Slack and follow the #data-viz channel. This channel is a space to share visualization ideas, ask questions, and troubleshoot code!
- The Data Science Learning Community helps data professionals learn together. Post questions in their help channels or join a book club.
- The Data Visualization Center is co-sponsored by Fred Hutch and Brotman Baty. They develop infrastructure and technology for visualization and analysis of data including scRNA-Seq, ATAC-Seq, and CyTOF. They focus on grant-funded collaborations involving data integration, analytical pipelines, and publishing interactive visualization websites for large-scale data.
Learning resources
Books that cover data visualization
Books can be a great way to dive deeper into a specific coding subject, and fortunately many of these books are available online for free! The Fundamentals of Data Visualization by Claus Wilke is a great reference for code-agnostic data visualization concepts. For language-specific data visualization references, books and documentation that cover a specific language (like Python or R) will often also cover the basics of plotting in that language.
General
R
- R for Data Science, Chapter 3: Data visualisation - Garrett Grolemund and Hadley Wickham
- ggplot2: Elegant Graphics for Data Analysis - Hadley Wickham, Danielle Navarro, and Thomas Lin Pedersen
- Modern Data Visualization with R - Robert Kabacoff
- R Graphics Cookbook - Winston Chang
Python
Other data visualization resources!
Data visualization focused blogs and screencasts can be a great way to find inspiration and think outside the box.