New Harmony GPU Nodes
The gizmo cluster has been augmented with more capable GPU nodes, which we are calling the “harmony” class. These systems have four NVIDIA L40S GPUs and provide significantly more memory (44GB) for GPU processes. They should be helpful for many workloads that were constrained on the older gizmo J and K class nodes. Additional information is available on the wiki (here and here).
These new systems are part of the gizmo cluster, but there are a number of important differences:
Hardware
These nodes use the AMD EPYC 9354P processor, with 32 cores running at up to 3.8GHz. These processors are largely compatible with Intel, but there are some differences that require rebuilding some tools (see “Environment Modules” below).
OS
These new hosts are also running the latest LTS release of Ubuntu (Noble Numbat, 24.04). While there is some compatibility with the OS on the J and K class gizmo nodes (Bionic Beaver, 18.04), there are many cases where software built on the old gizmo nodes will not run on these.
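If you are unsure which environment a shell is running in, a quick check of the OS release and CPU model will tell you whether you are on a harmony class host. This is only a sketch using standard Linux commands; the exact output format will vary:

```bash
# Show the OS release (Noble Numbat, 24.04 on harmony hosts; Bionic Beaver, 18.04 on J/K class)
cat /etc/os-release

# Show the processor model (AMD EPYC 9354P on harmony hosts)
lscpu | grep "Model name"
```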
Environment Modules
Because of the above differences we have rebuilt the entire Lmod environment module tree for these new hosts. Not all of the modules available on the J and K class nodes are available on the harmony class nodes: use ml spider to list available modules, and ml spider <modulename> to show available versions. If you need a specific version or toolset built, please email SciComp.
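For example, a session on a harmony host might look like the following sketch (the module name and version shown are purely illustrative, not a guarantee of what is installed):

```bash
# List every module available in the harmony module tree
ml spider

# Show which versions of a particular module are available
# (CUDA is used here only as an illustrative module name)
ml spider CUDA

# Load a module once you have confirmed that the version exists
ml CUDA/12.4.0
```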
Please note that we are reserving these hosts for GPU workloads. We will be adding non-GPU hosts in the future, but for now we will be prioritizing modules needing GPU capabilities.
Partition & Limits
These harmony nodes have been configured into the chorus partition. There are a limited number of nodes with this capability, so jobs running in this partition are more restricted: a job must request a GPU, and a maximum of four GPUs can be allocated to an account. These nodes have 32 cores, but any one job is limited to 8 cores; CPU-intensive work should be targeted at the J and K class nodes.
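As a sketch of what a submission under these limits might look like, assuming standard Slurm options (the script name and memory value are illustrative, and depending on the Slurm configuration the GPU request may instead be expressed as --gres=gpu:1):

```bash
# Request one GPU and the maximum of 8 cores in the chorus partition
sbatch --partition=chorus --gpus=1 --cpus-per-task=8 --mem=64G my_gpu_job.sh
```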
Access
Requesting the chorus partition and a GPU is all that’s required to allocate harmony resources to your job. Because of the differences between the J/K class and harmony nodes, we have configured a login/dev node called maestro for interactive use. This node is something like a rhino login node, but configured with the new hardware, OS, and modules.
maestro can be accessed via SSH. At this time we do not have NoMachine available for this node.
This is a shared host, so please watch your utilization and limit use of its resources to the minimum necessary to prove out and develop your work. Once you have confidence that a job will work, submit it to the chorus partition.
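A typical development cycle might look like the following sketch (the bare hostname maestro is shown here; your SSH configuration or the full domain name may differ, and the script name is illustrative):

```bash
# Connect to the harmony login/dev node (substitute your own username)
ssh <username>@maestro

# On maestro: check that the tools you need exist in the new module tree,
# then run a small-scale test of your workflow
ml spider <modulename>

# Once the test works, submit the full job to the chorus partition
sbatch --partition=chorus --gpus=1 my_gpu_job.sh
```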
Last Word
Eight nodes have been added, but currently (18 November) only six are available: one is having hardware problems and will be put into service as soon as that has been resolved; the other is currently serving as maestro, providing login and development services for harmony nodes until its replacement arrives. This replacement should be available in early December.