On-Premise High Performance Computing at Fred Hutch
The Fred Hutch provides researchers on campus access to high performance computing using on-premise resources. The various services are outlined here, along with the basic information researchers need to identify which FH resource is best suited to their particular computing needs.
The systems listed here, provided by the Fred Hutch, serve needs that rise above those that can be met by your desktop computer or web-based services. Common reasons to move to these high performance computing (HPC) resources include the need for version-controlled, specialized package/module/tool configurations, higher compute resource requirements, or rapid access to large data sets stored in locations that desktop or web-based services cannot reach with the security required for the data type. In the table below, gizmo is the actual compute resource; it can be accessed via multiple tools, which are also listed.
Overview of On-Premise Resources
Compute Resource | Access Interface | Resource Admin | Connection to FH Data Storage |
---|---|---|---|
Gizmo | Via Rhino or NoMachine hosts (CLI, FH credentials on campus/VPN off campus) | Scientific Computing | Direct to all local storage types |
Beagle | Via Rhino or NoMachine hosts (CLI, FH credentials on campus/VPN off campus) | Center IT | home, fast, economy, AWS-S3, and Beagle-specific scratch |
Rhino | CLI, FH credentials on campus/VPN off campus | Scientific Computing | Direct to all local storage types |
NoMachine | NX Client, FH credentials on campus/VPN off campus | Scientific Computing | Direct to all local storage types |
Python/Jupyter Notebooks | Via Rhino (CLI, FH credentials on campus/VPN off campus) | Scientific Computing | Direct to all local storage types |
R/R Studio | Via Rhino (CLI, FH credentials on campus/VPN off campus) | Scientific Computing | Direct to all local storage types |
Rhino
Rhino, or more specifically the rhinos, are locally managed HPC resources: three different servers all accessed via the name rhino. They function as a data and compute hub for a variety of data storage resources and HPC tasks.
These are large, shared Linux systems accessed via SSH. Because they are shared, take care not to overload these hosts; as a rule, use the rhinos for cluster tasks, development, and prototyping.
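As a minimal sketch, connecting to a rhino from a terminal might look like the following (substitute your own HutchNet username; the bare host name rhino is assumed to resolve to one of the three servers from the campus network or VPN):

```
# Connect to one of the rhino hosts from a terminal on campus or over VPN.
# Replace "username" with your HutchNet ID; the bare name "rhino" is assumed
# to resolve to one of the three rhino servers from the campus network.
ssh username@rhino
```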
NoMachine
The NoMachine (NX) servers provide a Linux desktop environment. These systems are useful if you use tools that require an X Windows display and you don't wish to install an X11 server on your personal computer. Another benefit of these systems is that the desktop environment and any running processes are preserved if you disconnect, which is particularly handy for laptop users.
There are three NX servers: lynx, manx, and sphinx. lynx runs the Unity desktop environment; the other two run MATE.
NoMachine requires that you install the NX client on your computer. Clients are available for macOS and Windows. Contact the helpdesk if you need assistance with installation.
Gizmo and Beagle Cluster
While we generally don't recommend interactive computing on the HPC clusters (interactive use can limit the amount of work you can do and introduce "fragility" into your computing), there are many scenarios where interactively using cluster nodes is a valid approach. For example, if you have a single task that is too demanding for a rhino, opening a session on a cluster node is the way to go.
If you need an interactive session with dedicated resources, you can start a job on the cluster using the command grabnode. The grabnode command will start an interactive login session on a cluster node, prompting you for how many cores, how much memory, and how much time you require. This command can be run from any NoMachine or rhino host.
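A minimal sketch of the workflow, assuming the interactive prompts described above:

```
# From a rhino or NoMachine host, request an interactive session on a
# gizmo node. grabnode prompts for cores, memory, and wall time, then
# opens a shell on the allocated node.
grabnode

# ...run your work in the session that opens...

# When finished, exit the shell so the node's resources are released.
exit
```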
NOTE: at this time we aren’t running interactive jobs on Beagle nodes. If you have a need for this, please contact scicomp.
Batch Computing
Batch computing allows you to queue up jobs and have them executed by the batch system, rather than starting an interactive session on a high-performance system. The batch system lets you queue thousands of jobs, something that is impractical or impossible with an interactive session. There are benefits even with a smaller volume of jobs: interactive jobs depend on the shell from which they are launched, so if your laptop disconnects for any reason the job will be terminated.
The batch system used at the Hutch is Slurm. Slurm provides a set of commands for submitting and managing jobs on the gizmo and beagle clusters, as well as for reporting the state (success or failure) and metrics (memory and compute usage) of completed jobs. For more detailed information about Slurm on our systems, see our Using Slurm page, which also links to a variety of detailed how-tos and examples to get you started using the on-premise HPC resources available.
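As a minimal sketch (the job name, resource values, and script contents below are placeholders, not recommended settings), a batch job is described in a shell script with #SBATCH directives and handed to Slurm with sbatch:

```
#!/bin/bash
#SBATCH --job-name=example-job     # name shown in the queue
#SBATCH --cpus-per-task=4          # cores requested (placeholder value)
#SBATCH --mem=8G                   # memory requested (placeholder value)
#SBATCH --time=02:00:00            # wall-time limit (placeholder value)

# Replace this with the command(s) your analysis actually runs.
echo "Running on $(hostname)"
```

Submit the script with `sbatch example-job.sh`, check queued and running jobs with `squeue -u <username>`, and review the metrics of completed jobs with `sacct -j <jobid>`.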
Parallel Computing
There are many approaches to parallel computing (doing many jobs simultaneously rather than in series). We have begun a Resource Library entry on Parallel Computing with Slurm and created the FredHutch/slurm-examples repository, which contains community-curated examples with additional documentation to help you get started.
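One common pattern, sketched below with a purely illustrative input-file naming scheme, is a Slurm job array, which runs many copies of the same script in parallel; the FredHutch/slurm-examples repository covers this and other approaches in more depth.

```
#!/bin/bash
#SBATCH --job-name=array-example   # name shown in the queue
#SBATCH --array=1-10               # run tasks 1 through 10 in parallel
#SBATCH --cpus-per-task=1          # cores per task (placeholder value)
#SBATCH --time=01:00:00            # wall-time limit (placeholder value)

# Each task receives a distinct SLURM_ARRAY_TASK_ID and can use it to pick
# its own input; the file naming here is purely illustrative.
INPUT="input_${SLURM_ARRAY_TASK_ID}.txt"
echo "Task ${SLURM_ARRAY_TASK_ID} processing ${INPUT} on $(hostname)"
```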
External Slurm and HPC Reference and Learning Resources
For more information and education on how to use HPC resources from external sources, see the following sites:
- Princeton’s Introduction to HPC systems and Bash.
- Harvard’s Wiki site Slurm page.
- The Carpentries lesson on HPC and job scheduling.