Docker containers package software with all dependencies into standardized units that run consistently across different computing environments. This guide covers using and creating Docker containers at Fred Hutch.

What is Docker?

Think of a Docker container as a completely separate, self-contained computer running inside your computer. When you run a Docker container, it has:

  • Its own file system: The container has its own directory structure that’s completely separate from your computer’s file system
  • Its own software installations: All programs, libraries, and dependencies are pre-installed inside the container and isolated from your computer
  • Its own environment: Environment variables, system settings, and configurations are specific to the container
  • Isolation from your system: Changes made inside the container don’t affect your computer, and your computer’s software doesn’t interfere with the container

For example, you might have Python 3.9 installed on your laptop, but a Docker container could run Python 3.11 without any conflict. The container operates in its own isolated environment, accessing your data only when you explicitly grant permission by mounting folders.
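You can see this isolation in action once Docker is installed (installation is covered below). A minimal check, comparing your machine's Python to the container's, using the official python image from Docker Hub:

# Check the Python version on your own machine
python3 --version

# Check the Python version inside a container (pulls python:3.11 on first use)
docker run --rm python:3.11 python --version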

This isolation is what makes Docker so powerful for research: you can package an entire computational environment (specific tool versions, dependencies, configurations) into a container that runs identically on any computer, anywhere.

Docker, Apptainer, and Singularity: What’s the Difference?

Before diving in, it’s important to understand these related technologies:

  • Docker: The original container platform. Requires administrator privileges to run, which is why it works great on your laptop but not on shared computing clusters.

  • Apptainer/Singularity: A container platform designed for scientific computing and HPC environments. Apptainer is the new name for Singularity (they’re the same thing). It can run Docker containers but doesn’t need administrator privileges, making it perfect for shared computing environments like Fred Hutch’s cluster.

In practice: You’ll typically build containers with Docker (its tooling is easier to use), then run them with Apptainer on the cluster. Apptainer can pull and run Docker images directly from Docker Hub.

What Do You Want to Do?

Using Existing Docker Images

Many bioinformatics tools are already available as pre-built containers, so you often don’t need to create your own.

WILDS Docker Library

The WILDS Docker Library provides 30+ pre-built, security-scanned images for popular bioinformatics tools (STAR, GATK, BWA, Samtools, Cell Ranger, and more). Images are versioned, tested, and designed to work seamlessly with WDL and other workflow systems.

Other Sources

  • Docker Hub: Largest public registry. Check image popularity, update frequency, and documentation before using.
  • BioContainers: Community-driven containers automatically built from Bioconda recipes.
  • Fred Hutch on Docker Hub: Additional images maintained by Fred Hutch SciComp.

Running Docker on Your Local Computer

Installing Docker Desktop

To use Docker containers on your laptop or desktop, you’ll need Docker Desktop, which you can download from docker.com for Windows, Mac, or Linux.

Note that after installing, you need to start Docker Desktop. It’s not enough to just install it; the application must be running for Docker to work. You’ll know Docker is running when you see the Docker icon in your system tray (Windows) or menu bar (Mac).
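You can also verify from a terminal that both the Docker command-line tools and the Docker daemon are working:

# Check that the Docker CLI is installed
docker --version

# Check that the Docker daemon is running (this errors if Docker Desktop isn't started)
docker info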

Basic Docker Commands

Once Docker Desktop is running, these commands will get you started:

# List your local images
docker images

# Pull an image from Docker Hub to your local computer
docker pull getwilds/samtools:1.19

# Run a container with a specific command: samtools --version
docker run getwilds/samtools:1.19 samtools --version

# Automatically remove the container (--rm) when it exits
docker run --rm getwilds/samtools:1.19 samtools --version

# List running containers
docker ps

# List all containers including stopped ones
docker ps -a

Running a Container Interactively

Sometimes you want to “look around” inside a container to check what’s installed.

Use the -it flag (interactive mode with a terminal).

# Start an interactive session (-it = interactive terminal)
docker run -it getwilds/samtools:1.19

# Now you're inside the container
# Your terminal prompt will now be something like root@a1b2c3d4e5f6:/

# You can run bash commands such as:
ls
which samtools
samtools --version

# Type exit when you're done to go back to your terminal
exit

Mounting a Folder (accessing your data from the container)

To give the container access to files on your computer, you “mount” a folder.

Use the -v flag.

Example: You have data files in /Users/yourname/data that you want to process:

# Run STAR aligner on your data
docker run -v /Users/yourname/data:/data getwilds/star:2.7.6a \
  STAR --runMode genomeGenerate \
       --genomeDir /data/genome_index \
       --genomeFastaFiles /data/genome.fa

In this example, genome.fa lives in /Users/yourname/data on your computer, and STAR writes the resulting index to genome_index in that same folder. Because we mounted that folder to /data inside the container, we reference these paths as /data/genome.fa and /data/genome_index in the command.
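Note that -v expects an absolute path on the host side. A convenient pattern is mounting your current working directory, as in this sketch (assuming the image includes standard shell utilities):

# Mount your current directory as /data and list its contents from inside the container
docker run --rm -v "$(pwd)":/data getwilds/samtools:1.19 ls /data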

Using Docker on the Cluster

On the Fred Hutch cluster (Gizmo/Rhino), you’ll use Apptainer instead of Docker to run containers. Apptainer is pre-installed on the cluster and can pull and run any Docker image directly from Docker Hub.

Basic Apptainer Commands

Once you’re logged into the cluster, the commands below will get you started.

# Load the Apptainer module first
ml Apptainer

# Run a Docker image and execute a command: samtools --version
apptainer exec docker://getwilds/samtools:1.19 samtools --version

# Pull and convert the Docker image (creates samtools_1.19.sif)
apptainer pull docker://getwilds/samtools:1.19

# Run a .sif with a specific command
apptainer exec samtools_1.19.sif samtools --version

As with Docker, you can run containers interactively (shell) and mount folders (--bind).

# Run interactively
apptainer shell samtools_1.19.sif

# Mount local folder to container
apptainer exec --bind /fh/fast/mylab/data:/data \
  docker://getwilds/samtools:1.19 \
  samtools --version

These commands are explained in detail in the container section of our Bash for Bioinformatics course.

Note: Apptainer builds a container in its own format (.sif) from a Docker image when you run exec or pull. Building containers is resource-intensive, so it is best practice to do this on a gizmo compute node instead of the shared rhino node. You can access gizmo via an interactive session or an sbatch job, as sketched below.
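For example, a minimal sbatch script for pulling an image on a compute node might look like this (the resource requests and script name are placeholders to adjust for your own job):

#!/bin/bash
#SBATCH --job-name=pull-container
#SBATCH --cpus-per-task=2
#SBATCH --mem=8G
#SBATCH --time=00:30:00

# Load Apptainer and build the .sif file on the compute node
ml Apptainer
apptainer pull docker://getwilds/samtools:1.19

Save it as, say, pull_container.sh and submit it with sbatch pull_container.sh.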

Managing Apptainer’s Cache

Apptainer caches downloaded images in ~/.apptainer/cache to speed up future runs. Over time, this cache can grow quite large (several GB), especially if you use many different containers.

To check your cache size:

du -sh ~/.apptainer/cache

To clean the cache and free up space:

# Load the Apptainer module first
ml Apptainer

# Remove all cached images
apptainer cache clean

# Remove items older than 30 days
apptainer cache clean --days 30    

Using Docker with Workflows

Docker containers integrate seamlessly with workflow systems like WDL, Nextflow, and Snakemake. The workflow engine automatically pulls and runs containers as needed, ensuring every step of your pipeline uses the correct software versions. Here, we’ll focus on WDL as an example.

WDL Workflows

In a WDL workflow, each task can specify which Docker container to use in the runtime section:

version 1.0

task runSTAR {
  input {
    File genome_fasta
  }

  command {
    STAR --runMode genomeGenerate \
         --genomeDir genome_index \
         --genomeFastaFiles ${genome_fasta}
  }

  runtime {
    docker: "getwilds/star:2.7.6a"
  }

  output {
    Array[File] index = glob("genome_index/*")
  }
}

Container images are referenced using the pattern registry/namespace/repository:tag.

So for getwilds/star:2.7.6a:

  • registry (optional, defaults to Docker Hub) = Not used
  • namespace (organization/user name) = getwilds
  • repository (image name) = star
  • tag (version) = 2.7.6a

Note: The WILDS Docker Library provides pre-built, tested Docker images for the WILDS WDL Library, and these images can also be used outside of WDL workflows.
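To try out a task like this on your own machine, one option is the miniwdl engine, which pulls the container for you. A sketch, assuming miniwdl is installed, the task above is saved as star.wdl, and genome.fa is a placeholder input file:

# Run the task locally with miniwdl (Docker must be running)
miniwdl run star.wdl --task runSTAR genome_fasta=genome.fa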

Creating Your Own Docker Images

If pre-made images don’t meet your needs, you can create custom Docker images using a Dockerfile.

When to Build Your Own Container

  • You need specific versions of tools that aren’t available in existing containers
  • You want to include custom scripts or configuration files in the container
  • You need to ensure your exact analysis environment is reproducible by others

Basic Dockerfile Structure

A Dockerfile is a recipe that tells Docker how to build your container. It’s usually a text file named Dockerfile (no extension), but it can technically have any name (you’ll then point to it with the -f flag when building).

Simple example:

# Start from a base image
FROM ubuntu:22.04

# Add metadata about the image
LABEL maintainer="your.email@fredhutch.org"
LABEL description="Custom analysis environment"
LABEL version="1.0"

# Install software dependencies
RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    wget \
    && rm -rf /var/lib/apt/lists/*

# Install additional packages
RUN pip3 install numpy pandas scipy

# Copy specific files from your computer
COPY analysis_pipeline.py /usr/local/bin/
COPY utils.py /usr/local/bin/

# Set working directory
WORKDIR /data

Tip: Only include small files in your container (scripts, configs, small reference files). Large datasets should be mounted when you run the container, not baked into the image. This keeps your image size manageable and makes it more reusable.

Common Dockerfile Instructions:

  • FROM: Specifies the base image to build from
  • LABEL: Adds metadata (author, description, version, etc.)
  • RUN: Executes commands during image build (install software, etc.)
  • COPY: Copies files from your computer into the image
  • WORKDIR: Sets the working directory for subsequent instructions

Building Your Image

Save your Dockerfile and build the image:

# Build an image from your Dockerfile and tag it with a name and version
docker build -f Dockerfile -t myanalysis:1.0 .

The -f flag points to the Dockerfile to use (only needed if your file isn’t named Dockerfile), the -t flag tags your image with a name and optional version, and the . at the end specifies the build context (current directory).
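After building, you can run your image like any other. A sketch using the example Dockerfile above (the host data path is a placeholder):

# Mount a data folder and run the script copied into the image
docker run --rm -v /Users/yourname/data:/data myanalysis:1.0 \
  python3 /usr/local/bin/analysis_pipeline.py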

Sharing Docker Images

WILDS Docker Library

You can contribute your container to the WILDS Docker Library, helping the entire Fred Hutch community.

The WILDS team provides:

  • Comprehensive Dockerfile templates with detailed guidance
  • Automated testing and security scanning
  • Version management and Docker Hub publishing
  • Documentation and usage examples

See the contributing sections of the library’s GitHub repository and SciWiki page for details.

Docker Hub

Docker Hub is the standard registry for sharing Docker images publicly. It’s a good choice for fully open-source projects. To push your own image to Docker Hub:

# 1. Create a Docker Hub account at hub.docker.com

# 2. Tag your image with your Docker Hub username
docker tag myanalysis:1.0 yourusername/myanalysis:1.0

# 3. Log in to Docker Hub
docker login

# 4. Push the image
docker push yourusername/myanalysis:1.0

Now others can pull your image with:

docker pull yourusername/myanalysis:1.0

Deploying Containerized Applications with Docker

SciComp maintains a Docker Swarm for hosting containerized applications like Shiny apps, web services, and dashboards. Applications can be configured to be accessible publicly or only within the Fred Hutch network.

For details, see the Shiny deployment page. That page uses Shiny-specific language, but SciComp can deploy other types of containerized applications as well.

Troubleshooting

docker: command not found

Solution: Docker is not installed or not in your system PATH. Install Docker Desktop and restart your terminal.

Failed to initialize docker backend or Cannot connect to the Docker daemon

Solution: Docker Desktop is installed but not running. Start the Docker Desktop application. You should then see the Docker icon in your system tray (Windows) or menu bar (Mac).

Learning More About Docker

We recommend this free course from the ITCR Training Network: “Containers for Scientists”

Best Practices:

  • Use specific versions: Use getwilds/star:2.7.6a rather than getwilds/star:latest for reproducibility
  • Keep images small: Only install necessary dependencies to reduce build time and storage
  • Document your images: Include clear README files and Dockerfile LABEL information
