Using Computational Workflows
When performing bioinformatics or data science analyses, researchers often need to apply a series of interconnected computational transformations to raw input data. While it’s possible to coordinate multiple tasks with Bash scripts or direct batch submission to a SLURM cluster, workflow management systems provide a more robust, reproducible, and maintainable approach.
Workflow management systems are specialized software tools designed to (see the sketch after this list):
- Orchestrate multi-step analyses - Automatically manage dependencies between tasks
- Handle parallelization - Run independent tasks simultaneously to save time
- Manage data flow - Track inputs, outputs, and intermediate files
- Ensure reproducibility - Define exact software versions and execution environments
- Enable portability - Run the same workflow on different computing infrastructures
- Provide resilience - Automatically retry failed tasks and resume interrupted workflows
- Track provenance - Record what was run, when, and with what parameters
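To make these ideas concrete, here is a minimal sketch in WDL (one of the two systems supported at Fred Hutch, described below). The task names, commands, and container image are hypothetical placeholders; the point is that each task declares its inputs and outputs, the engine infers the dependency graph from those declarations, and a scatter block runs independent samples in parallel.

```wdl
version 1.0

task count_lines {
  input {
    File sample
  }
  command <<<
    wc -l < ~{sample} > line_count.txt
  >>>
  output {
    File counts = "line_count.txt"
  }
  runtime {
    docker: "ubuntu:22.04"  # pinned image so the environment is reproducible
  }
}

task merge_counts {
  input {
    Array[File] all_counts
  }
  command <<<
    cat ~{sep=" " all_counts} > merged.txt
  >>>
  output {
    File merged = "merged.txt"
  }
  runtime {
    docker: "ubuntu:22.04"
  }
}

workflow demo {
  input {
    Array[File] samples
  }
  # Independent iterations run in parallel; the engine handles scheduling
  scatter (s in samples) {
    call count_lines { input: sample = s }
  }
  # merge_counts waits automatically for every count_lines shard to finish
  call merge_counts { input: all_counts = count_lines.counts }
  output {
    File merged_counts = merge_counts.merged
  }
}
```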
Workflow Systems at Fred Hutch
At Fred Hutch, we support two primary workflow systems:
WDL (Workflow Description Language)
Best for:
- Users who want a simple, human-readable workflow language
- Integration with Fred Hutch infrastructure via PROOF
- Projects that would benefit from collaborative Fred Hutch workflow development
Key features:
- Open-source language originally developed by the Broad Institute, now maintained by OpenWDL
- WILDS WDL Library provides tested, reusable WDLs built by and for Fred Hutch scientists
- PROOF platform for easy execution on Fred Hutch cluster
Learn more:
- WDL Workflows Guide - Language fundamentals
- WDL Execution Engines - How to run WDL workflows
- WILDS WDL Library - Ready-to-use modules and workflows
- GATK Workflows - WDL pipelines from the Broad Institute
- PROOF How To - Step-by-step guide to using PROOF
- PROOF Troubleshooting - Common issues and solutions when using PROOF
Nextflow
Best for:
- Users comfortable with Groovy/Java-like syntax
- Integration with nf-core community workflows
- Projects needing fine-grained control over execution
Key features:
- Mature ecosystem with extensive community support
- Large collection of pre-built workflows via nf-core
- Active Fred Hutch community
Learn more:
- Nextflow at Fred Hutch - Getting started guide
- Nextflow Catalog - Fred Hutch curated workflows
- nf-core - Community workflow catalog
Choosing a Workflow System
Both WDL and Nextflow are excellent choices. Here’s guidance on which might be better for your needs:
| Consideration | WDL | Nextflow |
|---|---|---|
| Learning curve | Gentle - simple, declarative syntax | Moderate - Groovy/Java-like syntax |
| Fred Hutch integration | Excellent via PROOF and Cirro | Excellent via direct execution and Cirro |
| Community resources | Growing (WILDS, Broad, Terra) | Extensive (nf-core, global community) |
| Execution options | Multiple engines (Sprocket, miniWDL, Cromwell) | Nextflow runtime |
| Local testing | Easy with Sprocket/miniWDL | Easy with Nextflow |
| Pre-built workflows | WILDS WDL Library, GATK workflows, BioWDL | nf-core community pipelines |
| Best for | All skill levels | Intermediate+ users |
Decision Guide
Choose WDL if you:
- Are new to workflow systems
- Want to use PROOF for easy cluster submission
- Prefer straightforward, readable workflow definitions
- Want to leverage WILDS WDL Library components
Choose Nextflow if you:
- Have programming experience (especially Java/Groovy)
- Want access to the extensive nf-core workflow catalog
- Need complex workflow logic or custom operations
- Prefer a single, unified execution runtime
Can’t decide? Both are great choices! Consider:
- Starting with WDL via PROOF to learn workflow concepts with minimal setup
- Exploring the WILDS WDL Library or nf-core catalog to see which has workflows closer to your needs
- Joining the #workflow-managers Slack channel to ask the community
Getting Started
For WDL Users
- Learn the basics - Read the WDL Workflows guide
- Explore examples - Browse the WILDS WDL Library
- Test locally - Install Sprocket and try a vignette (see the example after this list)
- Run on the cluster - Use PROOF to submit workflows
- Get help - Email wilds@fredhutch.org or join #workflow-managers Slack
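As a concrete illustration of steps 1–4, here is a hypothetical hello-world WDL you could save as hello.wdl and run locally before moving to PROOF. The file name and input values are placeholders, and exact command-line flags vary by engine version, so check each tool’s help output.

```wdl
version 1.0

# Save as hello.wdl, then run locally with either engine, for example:
#   sprocket run hello.wdl name=World
#   miniwdl run hello.wdl name=World
# (Exact flags vary by version; check each tool's --help.)

task say_hello {
  input {
    String name
  }
  command <<<
    echo "Hello, ~{name}!"
  >>>
  output {
    String greeting = read_string(stdout())
  }
}

workflow hello {
  input {
    String name = "Fred Hutch"  # default used if no input is supplied
  }
  call say_hello { input: name = name }
  output {
    String greeting = say_hello.greeting
  }
}
```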
For Nextflow Users
- Learn the basics - Read Nextflow at Fred Hutch
- Explore workflows - Browse the Nextflow Catalog or nf-core
- Install Nextflow - Follow the official installation guide
- Run your first workflow - Try an nf-core pipeline or Fred Hutch workflow
- Get help - Join #workflow-managers Slack
Fred Hutch Resources
Getting Help
- #workflow-managers Slack - FH-Data Slack channel for community support
- Data House Calls - Schedule a consultation with the WILDS team
- Email support - wilds@fredhutch.org for WDL questions
Workflow Libraries
- WILDS WDL Library - Tested WDL modules and workflows for bioinformatics
- Nextflow Catalog - Fred Hutch curated Nextflow workflows
- Fred Hutch Nextflow repos - Community workflows
Execution Platforms
- PROOF - User-friendly interface for WDL workflows on Fred Hutch cluster
- Gizmo - Fred Hutch SLURM cluster for direct Nextflow execution
- Local execution - Sprocket, miniWDL, or Nextflow on your workstation
- Cirro - Managed cloud environment for storing large datasets and running workflows
- Cloud platforms - AWS, Terra, GCP (via WDL or Nextflow)
Training and Learning Resources
WDL
- Data Science Lab: Developing WDL Workflows - Comprehensive course
- OpenWDL Learn WDL - Official tutorials
- WDL Documentation - Language reference
Nextflow
- Nextflow Training - Official Nextflow training
- nf-core Usage Docs - Running nf-core pipelines
- Nextflow Patterns - Common workflow patterns
Best Practices
Regardless of which workflow system you choose:
- Start small - Test with small datasets before scaling up
- Use version control - Track your workflows in Git
- Containerize - Use Docker/Apptainer for reproducibility (see the sketch after this list)
- Document - Include README files and example inputs
- Test - Validate outputs before running large analyses
- Version workflows - Tag releases when publishing results
- Share - Contribute back to community libraries if your workflow is public
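As one illustration of the containerization point above, a WDL task can pin an exact container tag and explicit resource requests in its runtime section (Nextflow processes have an analogous container directive). The tool invocation and image tag below are illustrative examples, not recommendations.

```wdl
version 1.0

task align {
  input {
    File reads
    File reference
  }
  command <<<
    # Illustrative command; substitute your actual tool invocation
    bwa mem ~{reference} ~{reads} > aligned.sam
  >>>
  output {
    File sam = "aligned.sam"
  }
  runtime {
    # Pin an exact version (or better, a sha256 digest) rather than "latest"
    docker: "biocontainers/bwa:v0.7.17_cv1"
    cpu: 4
    memory: "8 GB"
  }
}
```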
Common Questions
Q: Which is faster? A: Performance depends more on your workflow design and resource allocation than on the workflow system itself; both systems are mature and performant.
Q: Can I run WDL workflows without PROOF? A: Yes! You can use Sprocket, miniWDL, or Cromwell directly. See the WDL Execution Engines guide.
Q: Do I need to learn Docker? A: Basic Docker knowledge is very helpful. If you start writing workflows, you will likely need to install and run Docker at some point. Many pre-built containers exist for both WDL and Nextflow (e.g., the WILDS Docker Library), so you may not need to create new containers yourself.
Q: Where should I store my workflow data? A: See our data storage guide for recommendations. Use /fh/scratch for intermediate files and long-term storage for inputs/outputs.
Q: Can workflows use Fred Hutch environment modules? A: Yes for WDL (see software environments), but containers are preferred for portability.
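For example, the Fred Hutch Cromwell/PROOF setup documents a custom modules runtime attribute that loads environment modules before a task’s command runs. The sketch below assumes that setup; the module name is only an example, and availability changes over time, so check module avail on the cluster.

```wdl
version 1.0

task flagstat {
  input {
    File bam
  }
  command <<<
    samtools flagstat ~{bam} > flagstat.txt
  >>>
  output {
    File stats = "flagstat.txt"
  }
  runtime {
    # Fred Hutch PROOF/Cromwell custom attribute: space-separated list of
    # environment modules to load before the command runs. The module name
    # here is an example; verify versions with `module avail` on the cluster.
    modules: "SAMtools/1.19.2-GCC-13.2.0"
  }
}
```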
Getting Involved
The Fred Hutch workflow community is active and collaborative:
- Join #workflow-managers - Share tips and get help
- Contribute workflows - Add to WILDS WDL Library or create Fred Hutch repos
- Attend trainings - Watch for Data Science Lab courses
- Schedule consultations - Get personalized help via Data House Calls
Whether you’re just starting with workflows or looking to optimize existing pipelines, the Fred Hutch community is here to help you succeed!