Gizmo is a capability cluster maintained by Center IT’s Scientific Computing group for use in Hutch research. Presented here is some basic information about the cluster and how to use it.
Gizmo is the compute cluster that you should use for most “production” computational tasks. Information about our cluster is provided elsewhere in the Wiki, but you can try it by using one of the
grabnode commands on either the NX servers or one of the
gizmo nodes requires the use of Slurm commands - Slurm is the software suite that queues and dispatches jobs to the cluster.
Batch Computing with
Batch computing describes a method of computing where a job is designed to run without interaction with the user. Batch jobs are submitted to the resource manager which manages execution of the job on the node and notifies the user of the job’s state as well as capturing any input or output.
Batch jobs are written as shell scripts.
There are 3 types of nodes in the current configuration- F, G and H class nodes. The class of node is reflected in the node’s host name: gizmof100 is an F class, gizmog10 is a G class. Without specifying other contstraints, you will be assigned cores from any available node (though F nodes are assigned preferentially if there are nodes of both class available).
These resources are dedicated to PIs and/or projects but have been made available for use by the general community when workload can be preempted. Access to these nodes is automatic- if you are a member of the group, your jobs can use those nodes without special commands.
gizmo nodes are available for all Fred Hutch users and as it is a community resource, Fred Hutch IT creates and manages a set of usage limits that aim to manage all the needs on campus. These limits focus on walltime of jobs, the number of cores and nodes per PI group, the amount of memory and the total number of running jobs at a time.
The campus partition is the default- jobs submitted to this will have shared, non-exclusive, access to assigned nodes.
The largenode partition must be specifically requested by adding “–partition=largenode” to your job submission.
|running jobs||5 per account|
In its most simple form your script will use a single CPU core, for example start a simple Python script using sbatch:
jdoe@rhino:~$ sbatch myscript.py Submitted batch job 11111
Now you want to see if the job is running for user jdoe and using the squeue command we see it is pending / waiting (ST=PD)
jdoe@rhino:~$ squeue -u jdoe JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 11111 campus wrap jdoe PD 0:00 1 (Resources)
When the job starts running (wait for a few minutes) squeue will show ST=R and you can see if the job produces any output or errors to STDOUT/STDERR by checking file slurm-
jdoe@rhino:~$ tail -f slurm-11111.out reading in file bla bla....
Then you submit more and more jobs and after a while you decide you want to cancel some of them, let’s say 11112 and 11113:
jdoe@rhino:~$ scancel 11112 11113
or you just want to cancel all your jobs (all the jobs submitted by user jdoe):
jdoe@rhino:~$ scancel -u jdoe
Tasks are the basic unit of work- a “task” requires cores and memory all on the same node. A job may have multiple tasks- in this case, a job may be assinged multiple nodes.
Use the “-n” or “–ntasks” option to specify the number of tasks in the job. This example shows a job that requests 6 tasks (which may be assigned up to 6 different nodes).
sbatch -n 6 bin/job.sh
Note that your job must handle communication between and startup of multiple tasks.
CPUs per Task
Indicates the number of CPUs required for each task. Defaults to one, so one task (above) would get one CPU. When this is specified, Slurm will assign each of the tasks CPUs from the same host- so indicating 1 task with 3 CPUs per task will result in three CPUs on the same node allocated to the task.
Use -c/–cpus-per-task to change this.
sbatch -n 6 bin/job.sh # <- requests six tasks, may run on up to 6 different nodes sbatch -n 1 -c 6 bin/job.sh # <- requests one task that requires 6 cores, will run only on one node sbatch -n 6 -c 2 bin/job.sh # <- requests 6 tasks of two cores each (12 total). May run on up to six nodes
Wall time indicates the length of time you expect your job to run. If no wall time is specified, a default value (three days) will be used. After a grace period of 72 hours (during which time you can ask SciComp to increase your wall time) your job will be killed
To add this information to your job use the argument -t/–time. Different time formats are supported:
sbatch -n 1 -c 12 -t 1-12 # 12 cores on one node for 1 day and 12 hours sbatch -n 1 -c 4 -t 8:00:00 # 4 cores for 8 hours
All systems have local scratch. Each node you are assigned will have a temporary directory created in this scratch space- the path to this is available via the “TMPDIR” environment variable in the job.
If you require large volumes of local scratch, use “–tmp” to specify (in megabytes) the volume of disk you think you’ll require to make sure you are assigned a node with sufficient space:
sbatch --tmp=300000 # requires node configured with at least 300GB disk
Note that this only ensures that the node is configured with that amount of disk- that file system may not have that space available due to use by others.
If you require a significant amount of RAM (over 1-2GB per core requested), you’ll want to specify this to ensure there’s enough available. Similar to requesting local scratch, specify the megabytes required using the argument “–mem”
sbatch --mem=8096 # Requests 8GB available RAM
Note that this also places an upper limit on the RAM you can use- if your job uses more than requested, it will be killed by the resource manager.
Submit a job with wall time using the sbatch command:
gizmo: sbatch --ntasks=1 --time=1-0 --output=out.txt --wrap="echo Hello World" Submitted batch job 1254 gizmo: cat out.txt Hello World
or you can use the short form for arguments:
gizmo: sbatch -n 1 -t 24:00:00 -o out.txt --wrap="echo Hello World" Submitted batch job 1255 gizmo: cat out.txt Hello World
In this case, one task is requested for one day. As we don’t specify the number of CPUs required per task, we are given one CPU for the task. The other arguments handle where output is saved and how the command is run (the –wrap argument takes the value and “wraps” it in a bash wrapper for execution on the node.
The term “partition” refers to a logical grouping of resources within the cluster. Jobs queued in a partition may have different nodes, services, restrictions, and defaults.
The “restart” partition is for jobs that can be preempted by other workload, meaning that a job running in this partition can be killed by jobs that are queued in the other partitions. These jobs are not requeued by default- they are removed from the system, unless you use the –restart flag. In that case they would move to the end of the waiting line and be restarted when there are enough resources.
The advantage to using this partition is that you can use more nodes than in the default partition. The jobs that have the highest degree of success running in this partition are jobs that use a single core and run quickly- in the scope of hours.
You will likely need to “clean up” after jobs that are killed- partial output files, temporary files in network locations, etc. Checkpointing is one strategy for effective use of the restart queue- there are many different methods available to checkpoint an application- contact us to determine a) if checkpointing is an appropriate strategy and b) what the most effective checkpointing technology would be.
gizmo: sbatch -n 1 -p restart -t 1-0 bin/job.sh # <--- this job will not restart gizmo: sbatch -n 1 -p restart -t 1-0 --restart bin/job.sh # <--- this will automatically re-enter the queue