When performing bioinformatic or data science analyses, researchers often need to apply a series of interconnected computational transformations to raw input data. While it’s possible to coordinate multiple tasks with Bash scripts or direct batch submission to a SLURM cluster, workflow management systems provide a more robust, reproducible, and maintainable approach.

Workflow management systems are specialized software tools designed to:

  • Orchestrate multi-step analyses - Automatically manage dependencies between tasks
  • Handle parallelization - Run independent tasks simultaneously to save time
  • Manage data flow - Track inputs, outputs, and intermediate files
  • Ensure reproducibility - Define exact software versions and execution environments
  • Enable portability - Run the same workflow on different computing infrastructures
  • Provide resilience - Automatically retry failed tasks and resume interrupted workflows
  • Track provenance - Record what was run, when, and with what parameters

Workflow Systems at Fred Hutch

At Fred Hutch, we support two primary workflow systems:

WDL (Workflow Description Language)

Best for:

  • Users who want a simple, human-readable workflow language
  • Integration with Fred Hutch infrastructure via PROOF
  • Projects that would benefit from collaborative Fred Hutch workflow development

Key features:

  • Open-source language originally developed by the Broad Institute, now maintained by OpenWDL
  • WILDS WDL Library provides tested, reusable WDLs built by and for Fred Hutch scientists
  • PROOF platform for easy execution on Fred Hutch cluster
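
To give a feel for the language, here is a minimal sketch of a WDL workflow; the task, workflow, and file names are hypothetical and purely illustrative. It defines one containerized task that counts the lines in an input file:

```wdl
version 1.0

# A minimal, hypothetical example: one task wired into one workflow.
task count_lines {
  input {
    File infile
  }
  command <<<
    wc -l < ~{infile}
  >>>
  output {
    Int n_lines = read_int(stdout())
  }
  runtime {
    docker: "ubuntu:22.04"  # pin an exact image tag for reproducibility
  }
}

workflow line_count {
  input {
    File infile
  }
  call count_lines {
    input:
      infile = infile
  }
  output {
    Int n_lines = count_lines.n_lines
  }
}
```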

Nextflow

Best for:

  • Users comfortable with Groovy/Java-like syntax
  • Integration with nf-core community workflows
  • Projects needing fine-grained control over execution

Key features:

  • Mature ecosystem with extensive community support
  • Large collection of pre-built workflows via nf-core
  • Active Fred Hutch community
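
For comparison, here is a minimal sketch of the same idea in Nextflow DSL2; the process name and parameters are hypothetical. Note how the channel fans input files out into independent, parallel tasks:

```nextflow
// Minimal, hypothetical example; run with:
//   nextflow run main.nf --input 'data/*.txt'
nextflow.enable.dsl = 2

process COUNT_LINES {
    container 'ubuntu:22.04'  // pin an exact image tag for reproducibility

    input:
    path infile

    output:
    stdout

    script:
    """
    wc -l < ${infile}
    """
}

workflow {
    // One parallel task per file matched by --input
    Channel.fromPath(params.input) | COUNT_LINES | view
}
```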

Choosing a Workflow System

Both WDL and Nextflow are excellent choices. Here’s guidance on which might be better for your needs:

| Consideration          | WDL                                            | Nextflow                                  |
|------------------------|------------------------------------------------|-------------------------------------------|
| Learning curve         | Gentle - simple, declarative syntax            | Moderate - Groovy/Java-like syntax        |
| Fred Hutch integration | Excellent via PROOF and Cirro                  | Excellent via direct execution and Cirro  |
| Community resources    | Growing (WILDS, Broad, Terra)                  | Extensive (nf-core, global community)     |
| Execution options      | Multiple engines (Sprocket, miniWDL, Cromwell) | Nextflow runtime                          |
| Local testing          | Easy with Sprocket/miniWDL                     | Easy with Nextflow                        |
| Pre-built workflows    | WILDS WDL Library, GATK workflows, BioWDL      | nf-core (500+ workflows)                  |
| Best for               | All skill levels                               | Intermediate+ users                       |

Decision Guide

Choose WDL if you:

  • Are new to workflow systems
  • Want to use PROOF for easy cluster submission
  • Prefer straightforward, readable workflow definitions
  • Want to leverage WILDS WDL Library components

Choose Nextflow if you:

  • Have programming experience (especially Java/Groovy)
  • Want access to the extensive nf-core workflow catalog
  • Need complex workflow logic or custom operations
  • Prefer a single, unified execution runtime

Can’t decide? Both are great choices. If you’re new to workflow systems, WDL with PROOF offers the gentlest on-ramp; if you expect to lean heavily on nf-core pipelines or need complex workflow logic, start with Nextflow.

Getting Started

For WDL Users

  1. Learn the basics - Read the WDL Workflows guide
  2. Explore examples - Browse the WILDS WDL Library
  3. Test locally - Install Sprocket and try a vignette (example commands follow this list)
  4. Run on the cluster - Use PROOF to submit workflows
  5. Get help - Email wilds@fredhutch.org or join the #workflow-managers Slack channel
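
As a rough sketch, local testing (step 3) might look like the following once the tools are installed; the file names are hypothetical, and each tool’s --help lists the exact subcommands and flags:

```bash
# Static checks, then a local run with miniWDL (file names hypothetical)
miniwdl check line_count.wdl
miniwdl run line_count.wdl infile=data/sample.txt

# The equivalent with Sprocket
sprocket check line_count.wdl
sprocket run line_count.wdl inputs.json
```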

For Nextflow Users

  1. Learn the basics - Read Nextflow at Fred Hutch
  2. Explore workflows - Browse the Nextflow Catalog or nf-core
  3. Install Nextflow - Follow the official installation guide
  4. Run your first workflow - Try an nf-core pipeline or Fred Hutch workflow (see the sketch after this list)
  5. Get help - Join the #workflow-managers Slack channel
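
As a sketch, a first Nextflow session might look like this; nf-core/rnaseq is just one example pipeline, and its test profile runs on small bundled data:

```bash
# Install Nextflow into the current directory (requires Java)
curl -s https://get.nextflow.io | bash
./nextflow -version

# Smoke-test the installation with the built-in hello pipeline
./nextflow run hello

# Run an nf-core pipeline against its bundled test data
./nextflow run nf-core/rnaseq -profile test,docker --outdir results
```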

Fred Hutch Resources

Execution Platforms

  • PROOF - User-friendly interface for WDL workflows on Fred Hutch cluster
  • Gizmo - Fred Hutch SLURM cluster for direct Nextflow execution
  • Local execution - Sprocket, miniWDL, or Nextflow on your workstation
  • Cirro - Managed cloud environment for storing large datasets and running workflows
  • Cloud platforms - AWS, Terra, GCP (via WDL or Nextflow)

Best Practices

Regardless of which workflow system you choose:

  1. Start small - Test with small datasets before scaling up
  2. Use version control - Track your workflows in Git
  3. Containerize - Use Docker/Apptainer for reproducibility (see the config sketch after this list)
  4. Document - Include README files and example inputs
  5. Test - Validate outputs before running large analyses
  6. Version workflows - Tag releases when publishing results
  7. Share - Contribute back to community libraries if your workflow is public
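
As a small example of the containerization point, a Nextflow configuration file can pin every process to an exact image; this is a sketch, and the image tag is illustrative (WDL tasks can do the same via their runtime section):

```groovy
// nextflow.config: a sketch of project-wide container pinning
docker.enabled = true

process {
    // Pin an exact, versioned image rather than a floating "latest" tag
    container = 'ubuntu:22.04'
}
```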

Common Questions

Q: Which is faster? A: Performance depends more on your workflow design and resource allocation than on the workflow system itself. Both are highly optimized.

Q: Can I run WDL workflows without PROOF? A: Yes! You can use Sprocket, miniWDL, or Cromwell directly. See the WDL Execution Engines guide.

Q: Do I need to learn Docker? A: Basic Docker knowledge is very helpful. If you start writing workflows, you will likely need to install and run Docker at some point. There are many pre-built Docker containers for both WDL and Nextflow (e.g. the WILDS Docker Library), so you may not need to create new containers yourself.

Q: Where should I store my workflow data? A: See our data storage guide for recommendations. Use /fh/scratch for intermediate files and long-term storage for inputs/outputs.

Q: Can workflows use Fred Hutch environment modules? A: Yes for WDL (see software environments), but containers are preferred for portability.

Getting Involved

The Fred Hutch workflow community is active and collaborative.

Whether you’re just starting with workflows or looking to optimize existing pipelines, the Fred Hutch community is here to help you succeed!
