Computing with GPUs

Updated: March 31, 2023


There are currently two classes of GPU nodes available in the gizmo cluster. The J and K class nodes have consumer-grade GTX and RTX cards. We recently (fall 2024) added the harmony nodes with NVIDIA L40S cards, which are significantly more capable systems.

GPU Nodes

| Location | Partition | Node Name | GPU |
|----------|-----------|-----------|-----|
| FHCRC | campus, short, new | j | NVIDIA GTX 1080 Ti |
| FHCRC | campus, short, new | k | NVIDIA RTX 2080 Ti |
| FHCRC | chorus | harmony | NVIDIA L40S |
| FHCRC | none (interactive use) | rhino | NVIDIA GTX 1080 Ti |
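
If you want to verify which GPU types Slurm currently advertises on a set of nodes, you can query the configured GRES with sinfo. This is a quick sanity check rather than an official reference, and the format string shown is just one option:

sinfo -p chorus -o "%N %G"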

Accessing GPUs

GPUs are available in multiple partitions; the newest GPUs are in the chorus partition. Refer to this page for limits and other details about the partitions.
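
You can also inspect a partition's current limits and configuration directly from the command line with scontrol, for example:

scontrol show partition chorus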

In campus, short, and restart-new partitions

GPUs are requested via the --gpus option:

sbatch --gpus=1 ...

Specific GPU models can be requested by indicating model and count:

sbatch --gpus=rtx2080ti:1 ...

In the chorus partition

The new GPU systems have a different processor and a newer version of the operating system, which requires a set of environment modules specific to this new architecture. Because of this, these nodes are separated into a new partition, chorus:

sbatch --partition=chorus --gpus=1 ...

There are currently only L40S GPUs in this partition, but for the sake of precision and future-proofing you can specify the GPU model as well:

sbatch --partition=chorus --gpus=l40s:1 ...

Since these nodes have more GPUs available per node, you can request more just by increasing the count:

sbatch --partition=chorus --gpus=3 ...
# or:
sbatch --partition=chorus --gpus=l40s:3 ...

Please make sure your code is capable of using multiple GPUs before requesting more than one.

When submitting jobs, make sure that your current environment does not have modules loaded (i.e. run module purge) and that you load the new modules in your script. Otherwise you may run into conflicts with modules built for the rhino/gizmo compute platforms.
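
As a minimal sketch, a chorus batch script might look like the following. The module name and script name are placeholders, not specific recommendations; substitute whatever GPU-enabled module and workload you actually need:

#!/bin/bash
#SBATCH --partition=chorus
#SBATCH --gpus=l40s:1
#SBATCH --time=01:00:00

# Start from a clean module environment so modules built for the
# rhino/gizmo architecture are not carried into this job.
module purge

# Load the module(s) built for the chorus architecture.
# "SomeGPUTool" is a placeholder module name.
module load SomeGPUTool

# Run the workload ("my_gpu_script.py" is a placeholder).
python my_gpu_script.py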

Using GPUs

When your job is assigned a GPU, Slurm sets the environment variable CUDA_VISIBLE_DEVICES. This variable indicates the assigned GPU; most CUDA-aware tools (e.g. TensorFlow) use it to restrict execution to that device.
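
You normally do not need to set this variable yourself. If you want to confirm which device a job was given, you can print it and list the visible GPUs from inside the job, for example:

srun --partition=chorus --gpus=1 bash -c 'echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"; nvidia-smi -L'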
