Computing with GPUs
There are currently two classes of GPU nodes available in gizmo. The J and K class nodes have consumer-grade GTX and RTX cards. In fall 2024 we added harmony nodes with NVIDIA L40S cards; these are significantly more capable systems.
GPU Nodes
Location | Partition | Node Name | GPU |
---|---|---|---|
FHCRC | campus, short, new | j | NVIDIA GTX 1080ti |
FHCRC | campus, short, new | k | NVIDIA RTX 2080ti |
FHCRC | chorus | harmony | NVIDIA L40S |
FHCRC | none (interactive use) | rhino | NVIDIA GTX 1080ti |
Accessing GPUs
GPUs are available in multiple partitions; the newest GPUs are in the chorus partition. Refer to this page for limits and other details about the partitions.
In campus, short, and restart-new partitions
GPUs are requested via the --gpus option:
sbatch --gpus=1 ...
A specific GPU model can be requested by indicating the model and count:
sbatch --gpus=rtx2080ti:1 ...
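The same options can be given as #SBATCH directives in a submission script. Below is a minimal sketch using standard Slurm syntax; the module name and program are placeholders for your own software.

```bash
#!/bin/bash
#SBATCH --partition=campus       # or short / restart-new
#SBATCH --gpus=1                 # any available GPU
##SBATCH --gpus=rtx2080ti:1      # uncomment to request a specific model instead

# Placeholder steps: load your software and run your GPU workload
module load CUDA                 # hypothetical module name; adjust to your environment
./my_gpu_program                 # placeholder executable
```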
In the chorus partition
The new GPU systems have a different processor and a newer version of the operating system, which requires a set of environment modules built specifically for this architecture. Because of this, these nodes are separated into a new partition, chorus:
sbatch --partition=chorus --gpus=1 ...
This partition currently has only L40S GPUs, but for the sake of precision and future-proofing, you can specify the GPU model as well:
sbatch --partition=chorus --gpus=l40s:1 ...
These nodes have more GPUs per node, so you can request more than one simply by increasing the count:
sbatch --partition=chorus --gpus=3 ...
# or:
sbatch --partition=chorus --gpus=l40s:3 ...
Please make sure your code is capable of using multiple GPUs before requesting more than one.
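If you want to test a workload interactively before writing a batch script, a shell on a chorus node can be requested with standard Slurm srun. This is a generic sketch; the time limit is only an example.

```bash
# Request an interactive shell with one L40S GPU on a chorus node
srun --partition=chorus --gpus=l40s:1 --time=1:00:00 --pty /bin/bash
```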
When submitting jobs, make sure that your current environment does not have modules loaded (i.e. run module purge) and that you load the new modules in your script; otherwise you may run into conflicts with modules built for the rhino/gizmo compute platforms.
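Putting this together, a chorus submission script might look like the sketch below. The module name and program are placeholders; check which modules are available for the chorus architecture before relying on them.

```bash
#!/bin/bash
#SBATCH --partition=chorus
#SBATCH --gpus=l40s:1

# Clear any modules inherited from the rhino/gizmo environment
module purge

# Load modules built for the chorus architecture
# (hypothetical module name -- check what is available on these nodes)
module load CUDA

./my_gpu_program    # placeholder executable
```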
Using GPUs
When your job is assigned a GPU, Slurm sets the environment variable CUDA_VISIBLE_DEVICES. This variable indicates the assigned GPU; most CUDA-based tools (e.g. TensorFlow) use it to restrict execution to that device.
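To see what a job was assigned, you can print the variable and query the device from inside the job; the sketch below uses standard commands. Note that nvidia-smi reports the GPUs visible to the driver, which, depending on cluster configuration, may not be limited by CUDA_VISIBLE_DEVICES.

```bash
# Show which GPU index (or indices) Slurm assigned to this job
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"

# List the GPUs visible on the node; CUDA applications will generally be
# restricted to the device(s) named in CUDA_VISIBLE_DEVICES
nvidia-smi --query-gpu=index,name,memory.total --format=csv
```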