The Hutch Data Core is a Shared Resource at the Fred Hutchinson Cancer Research Center which supports researchers who work with large amounts of data. We have a particular focus on the bioinformatic analysis of large-scale datasets generated by modern instrumentation such as genome sequencing, mass spectrometry, high-throughput imaging, electron microscopy, etc. While the needs of researchers vary widely by technical domain, scientific goals, and computational complexity, the Data Core works to provide support via:
- Data portals for automated analysis and visualization
- Automated bioinformatics workflows for common tasks in genomics, imaging, proteomics, etc.
- Project-oriented consultation (via the Bioinformatics Core)
- Instructor-led trainings & self-paced learning
With recent advancements in modern technology for the analysis and visualization of complex datasets, it has become possible to connect researchers directly with their data using interactive webpages referred to as Data Portals.
The Data Core is actively developing a set of data portals designed to provide scientific insight across a variety of research areas. These include:
- Data Atlases for the high-performance visualization of large-scale datasets (such as single-cell sequencing)
- PubWeb: an interactive platform for executing bioinformatics workflows in the cloud
- Carousel: a flexible system for rendering and sharing interactive datasets
For more information, please view the Data Portal Resources.
The bioinformatic process of analyzing large datasets often requires a series of computational steps, each of which may require a different set of software dependencies. To help coordinate and streamline the execution of these workflows, researchers around the world have started to adopt a set of software tools for workflow management, such as Nextflow, Cromwell, and Snakemake.
One way the Data Core supports bioinformatic analysis is by putting these workflow management tools directly into the hands of Fred Hutch researchers.
This includes assistance with running computational workflows on different computational resources (individual computers, the on-premise HPC cluster, or in the “cloud”), curation of pre-existing workflows for commonly used tasks, and assistance with the development of novel workflows to implement new scientific ideas.
Our Workflow Resources include:
- Guidance for running automated workflows on Fred Hutch HPC resources (SLURM and AWS)
- A catalog of curated bioinformatics workflows (e.g. RNAseq, pan-genome analysis)
- Building your own automated workflows (e.g., from existing BASH scripts)
If you have any questions about using automated workflows for your research, please don’t hesitate to get in touch.
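The core task these workflow managers automate is running each step only after the steps it depends on have finished. As a rough sketch of that idea (the step names below are hypothetical placeholders, not an actual pipeline), the execution order can be derived from a dependency graph:

```python
# Sketch of the ordering problem a workflow manager (Nextflow, Cromwell,
# Snakemake) solves automatically: each analysis step runs only after the
# steps it depends on. Step names here are hypothetical, not a real pipeline.

from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Map each step to the steps whose outputs it requires.
steps = {
    "qc": [],                  # quality control on the raw data
    "align": ["qc"],           # alignment uses the QC-filtered reads
    "count": ["align"],        # feature counting uses the alignments
    "report": ["qc", "count"], # the final report needs QC metrics and counts
}

def run_order(dependency_graph):
    """Return one valid execution order for the steps."""
    return list(TopologicalSorter(dependency_graph).static_order())

print(run_order(steps))  # qc runs first, report runs last
```

Real workflow managers add much more on top of this (per-step software environments, resumption after failure, and dispatch to SLURM or cloud resources), but the dependency graph is the common core.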
The process of analyzing datasets generated for a particular experiment or project can be complex, often requiring deep expertise in the technology used to generate the raw data as well as the computational tools needed to process them. The Bioinformatics Core provides researchers with support for this analysis, engaging on the basis of specific projects.
Bioinformatics staff are available by appointment for one-on-one consultation. We are happy to discuss experimental design, choice of data analysis strategies and software tools, or to help with advice and troubleshooting as you conduct your own analyses.
We strongly encourage researchers to consult with a bioinformatics specialist at the earliest stages of a project to ensure an appropriate experimental design is in place prior to seeking data analysis support.
While there are many resources available online for building skills in computational analysis of complex datasets, it can often be difficult for researchers to know where to start or which approaches will be the most useful. To help provide some structure for researcher-driven skills development, we work to provide a combination of self-directed as well as in-person learning opportunities.
Depending on your background and interest, you may find it helpful to sign up for an instructor-led training; to work through a self-directed learning module; to attend one of the office hours held by experts in a particular domain; or to explore a wider range of resources available on the internet.
For more information, browse our information resources for learning opportunities.
The Data Core and Bioinformatics Core work to maintain a core set of data resources available to the entire Fred Hutch community. The primary data files currently available are a collection of frequently used reference genomes, provided on the shared filesystem for high-performance computing. Please contact the Data Core if there are additional data resources which could be added to provide value to researchers across multiple research groups.
For more information, browse our documentation of the iGenomes reference genomes hosted on the shared filesystem.
Updated: August 13, 2021