The Fred Hutch provides researchers on campus access to high performance computing using on-premises resources. The various technologies provided are outlined on our Technologies page along with the basic information required for researchers to identify which FH resource might be best suited to their particular computing needs.
The Scientific Computing group supports additional software used in scientific research beyond what is available on local workstations. A large number of pre-compiled packages are already available on our high performance computing (HPC) cluster and Linux systems. Individual user installation of packages and language modules is also supported.
Reasons to use scientific software maintained by SciComp include:
- packages are often faster due to compiler optimizations
- packages are reproducible in or outside Fred Hutch
- rapid access to many software packages and package versions
EasyBuild Life Sciences
The full list of available software can be found on the EasyBuild site.
On the command line and in scripts, we use the Environment Module system to make software versions available in a modular and malleable way. Environment Modules provide modular access to one version of one or more software packages. We use a system called EasyBuild to create modules for everyone to use - there are over a thousand modules already available.
A Note About Environment Module Use
As you will learn below, Environment Modules can be referred to in two ways: generic and specific. Often the generic method is fastest, and this is an acceptable way to load Environment Modules when using a shell interactively. When using the generic method, you refer simply to the software package name you want to load (ex: Python). This is fast, but circumvents the reproducible aspect of Environment Modules. The version of Python loaded using the generic reference will change as the Python package versions are updated. For scripts, we recommend always using a specific Environment Module reference.
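One way to catch generic references before a script is shared is a small check like the sketch below. It is an illustration only, not a SciComp tool: the function name find_generic_loads is hypothetical, and it assumes each module load call sits on its own line and names a single module.

```shell
#!/bin/bash
# Hypothetical helper: print the lines of a script that load a module
# without an explicit version (that is, with no "/" in the module name).
find_generic_loads() {
    local script="$1"
    # Match "module load NAME" where NAME contains no slash.
    grep -nE '^[[:space:]]*module[[:space:]]+load[[:space:]]+[^/[:space:]]+[[:space:]]*$' "$script"
}
```

Calling find_generic_loads myscript.sh prints each offending line with its line number and returns a non-zero status when nothing is found, so it can also be used in a conditional.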
How to Use Environment Modules
When you log in to a SciComp server your terminal session has Lmod pre-loaded. Commonly used shell commands around Environment Modules include:
|module avail|Output a list of available Environment Modules|
|module avail <pattern>|Output a filtered list of modules based on pattern|
|module load <pkg>/<version>|Load a specific version of a module into your environment|
|module load <pkg>|Load a generic Environment Module|
|module list|Output a list of Environment Modules loaded in your current shell|
|module unload <pkg>|Unload an Environment Module|
|module purge|Unload all currently loaded Environment Modules|
There is also a short version of the module command: ml. You can substitute ml for module in any of the commands above (ml <pkg> = module load <pkg>).
$ which python
/usr/bin/python
$ module avail Python/2.7.15
-------------------------- /app/easybuild/modules/all --------------------------
Python/2.7.15-foss-2016b-fh1    Python/2.7.15-foss-2016b
Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
$ module load Python/2.7.15-foss-2016b-fh1
$ which python
/app/easybuild/software/Python/2.7.15-foss-2016b-fh1/bin/python
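The session above covers searching and loading. The remaining commands (list, unload, purge) can be sketched the same way. This is a minimal illustration that reuses the module name from the session above; the function name module_cleanup_demo is hypothetical, and the guard is only there so the snippet fails cleanly on a machine without Lmod.

```shell
#!/bin/bash
# Sketch: inspect and clean up the modules loaded in the current shell.
module_cleanup_demo() {
    if ! type module >/dev/null 2>&1; then
        echo "module command not found; run this on a system with Lmod" >&2
        return 1
    fi
    module load Python/2.7.15-foss-2016b-fh1
    module list      # show what is loaded in this shell
    module unload Python/2.7.15-foss-2016b-fh1
    module purge     # unload anything that is still loaded
    module list      # confirm the environment is clean
}
```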
Scripting with Environment Modules
To use Environment Modules in a bash script, there are two best practices.
- Interactive shell sessions have the required module commands activated, but scripts are often run in non-interactive shells, so it is best to explicitly activate the module command. Add the following lines to the top of your script:
#!/bin/bash
source /app/Lmod/lmod/lmod/init/bash
module use /app/easybuild/modules/all
This activates the module command and points it to our list of installed modules.
- Scripts are expected to be reproducible, so using a specific Environment Module reference is recommended:
module load Python/3.5.1-foss-2016b-fh1
In contrast, avoid the generic reference in scripts:
module load Python
The above line will load a different version of the software package over time as the “pointer” to a specific version is changed.
Note: This does mean that your script will only work in environments with the specific Environment Module version you are loading. That Environment Module may not be initially available on systems outside Fred Hutch or on internal systems following upgrades. You can either request that the specific version be added, or edit your script to load an available package version.
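Because a pinned module may be missing on another system, it can help to fail early with a clear message. The following is a minimal sketch, not an official template: load_pinned_module, LMOD_INIT and MODULE_DIR are hypothetical names, the two default paths are the ones given in this section, and the sketch assumes module load returns a non-zero status when the requested module cannot be loaded.

```shell
#!/bin/bash
# Sketch of a script preamble: activate Lmod if needed, then load a pinned
# module and stop with a clear message if that exact version is missing.
# The defaults are the paths from this page; override them on other systems.
LMOD_INIT="${LMOD_INIT:-/app/Lmod/lmod/lmod/init/bash}"
MODULE_DIR="${MODULE_DIR:-/app/easybuild/modules/all}"

load_pinned_module() {
    local ref="$1"
    # Non-interactive shells may not have the module command yet.
    if ! type module >/dev/null 2>&1; then
        if [ -r "$LMOD_INIT" ]; then
            source "$LMOD_INIT"
            module use "$MODULE_DIR"
        else
            echo "Lmod init script not found at $LMOD_INIT" >&2
            return 1
        fi
    fi
    # Assumption: a failed load is reported through the exit status.
    if ! module load "$ref"; then
        echo "Module $ref is not available here; request it or edit this script" >&2
        return 1
    fi
}
```

A script would then start with load_pinned_module Python/3.5.1-foss-2016b-fh1 || exit 1 before running any analysis.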
Installing Custom Software Packages
If you do not find the software you need, a support package or library, or the specific version you need, you have two options:
Request the software be built: file an issue in our software repo and we will work with you to build a module for any software or version. This Environment Module will then be available to all.
If you cannot wait for the software to be built, you may be able to install it yourself. This is primarily supported for language (Python/R) packages.
Packages/Modules for Python and R
Normal install methods will work after loading an Environment Module:
- Python: you can use pip install --user <newpkg>
Any package you install this way will be installed into your home directory.
Remember that the environment module you have loaded will be used to install the package/module. For example, if you load Python/3.6.9 and use pip install --user <newpkg> then you will need to load Python/3.6.9 every time you wish to use newpkg. Using a different version of the language module may or may not work.
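Part of the reason the version matters is that, by default on Linux, pip install --user places packages in a directory named after the interpreter's major.minor version under ~/.local. The sketch below derives that directory from a module reference. The function name user_site_for is hypothetical, and it assumes the default location (PYTHONUSERBASE not set); on a real system, python -m site --user-site prints the actual path.

```shell
#!/bin/bash
# Hypothetical helper: derive the default per-user package directory for the
# Python version named in a module reference such as Python/3.6.9.
user_site_for() {
    local ref="$1"
    local version="${ref#*/}"      # drop the "Python/" prefix
    local major minor
    IFS=. read -r major minor _ <<< "$version"
    echo "$HOME/.local/lib/python${major}.${minor}/site-packages"
}

user_site_for Python/3.6.9
user_site_for Python/2.7.15-foss-2016b-fh1
```

The two calls print different directories, so a package installed under one of these Python modules is not visible to the other.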
Other software installs and builds
If you want to install or build a standalone software package, you are also welcome to install into your home directory, with the following caveats:
- We cannot install OS package dependencies. If your software has many dependencies, please file an issue here and we will be happy to work with you to offer a package build with all dependencies.
- Ubuntu compilers are not optimized. We recommend loading a ‘toolchain’ module:
module load foss-2016b
This will get you GCC 5.4.0, binutils 2.26, OpenMPI 1.10.3, OpenBLAS 0.2.18, FFTW 3.3.4, ScaLAPACK 2.0.2 (most of our software on Ubuntu 14.04 is built against this toolchain).
- If you loaded a toolchain module when installing or building new software, you must load that toolchain module before running that software, or you will get library errors.
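One way to avoid forgetting the toolchain at run time is to launch home-directory software through a small wrapper. This is a sketch under stated assumptions: run_with_toolchain is a hypothetical function name, the module command is assumed to be active already, and $HOME/bin/mytool in the usage line is a placeholder for your own build.

```shell
#!/bin/bash
# Hypothetical wrapper: load the toolchain a program was built with, then
# run the program with the remaining arguments.
run_with_toolchain() {
    local toolchain="$1"
    shift
    if ! module load "$toolchain"; then
        echo "could not load toolchain $toolchain" >&2
        return 1
    fi
    "$@"
}
```

For example, run_with_toolchain foss-2016b "$HOME/bin/mytool" input.txt loads foss-2016b and then starts the tool in the same shell environment.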
Frequently Asked Questions
Note: For announcements from Scientific Computing, please see the Announcements page, and for assistance email scicomp. Also, see the Events page in CenterNet for current Office Hours.
- Something weird is going on with my shell and/or job!?!
- “Reset” your shell by logging out and back in. This will clear your environment. Users using screen or tmux will need to exit their session to clear their environment.
- Why am I getting errors when running software from a module?
- Unload all modules with module purge and re-load only the module(s) you need
- Reset your shell - see above
- Remove and reinstall software in your home directory not installed with the module you are using (~/.local) - this is key with toolchain modules and packages/libraries that use compiled code
- Only bash?
- Our recommendation is to use bash as your shell. If you wish to use a different shell, please contact SciComp.
- Is there a faster way?
- The command ml is a shortcut for module load but will work with other module commands (ex: ml avail Python/3.5)
- What is this “foss-2016b” stuff?
- The EasyBuild project supports many different toolchains. The toolchain defines a compiler and library set, and also has a number of common support libraries (things like libTIFF) associated with it.
- Should I load default modules?
- It is faster and easier to type ml R than specifying the full package and version. However, the default version loaded by a generic module load <pkg> command will change over time. If maintaining a specific version of a package is important to you, always specify the version.
- Is there a list of included language libraries/modules/packages?
- Yes! For R, Python, and some additional packages, look here.
- What about Bioconductor for R?
- Starting with R/3.4.3-foss-2016b-fh2 we include Bioconductor and many Bioc packages with the standard R module.
- What are Best Practices with Environment Modules?
- Specify the full Module name when loading, especially in scripts (see above for scripting information).
- Avoid mixing Modules from different toolchains at the same time (unloading one and loading another mid-script works well if you need to).
- If you can’t find a package you want, email us or file an issue requesting a new or updated package.
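The advice about not mixing toolchains can be checked mechanically, because the toolchain appears in the module name (for example, foss-2016b in Python/2.7.15-foss-2016b-fh1). The sketch below is illustrative only: toolchain_of and check_toolchains are hypothetical names, and it assumes toolchain tags of the form foss-YYYYa or foss-YYYYb; other toolchain families would need their own pattern.

```shell
#!/bin/bash
# Hypothetical helper: print the foss toolchain tag found in a module
# reference, or nothing if the reference does not contain one.
toolchain_of() {
    local re='foss-[0-9]{4}[ab]'
    if [[ "$1" =~ $re ]]; then
        echo "${BASH_REMATCH[0]}"
    fi
}

# Return non-zero if the given module references use different toolchains.
check_toolchains() {
    local ref tc first=""
    for ref in "$@"; do
        tc=$(toolchain_of "$ref")
        if [ -z "$tc" ]; then
            continue                      # no toolchain tag in this name
        fi
        if [ -z "$first" ]; then
            first="$tc"
        elif [ "$tc" != "$first" ]; then
            echo "mixed toolchains: $first and $tc" >&2
            return 1
        fi
    done
}
```

Running check_toolchains on the list of modules a script loads, before loading them, gives an early warning instead of a library error later.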
Batch computing allows you to queue up jobs and have them executed by the batch system, rather than having to start an interactive session on a high-performance system. Using the batch system allows you to queue up thousands of jobs, something impractical or impossible in an interactive session. There are benefits when you have a smaller volume of jobs as well: interactive jobs depend on the shell from which they are launched, so if your laptop is disconnected for any reason the job will be terminated.
The batch system used at the Hutch is Slurm. Slurm provides a set of commands for submitting and managing jobs on the gizmo and beagle clusters, as well as information on the state (success or failure) and metrics (memory and compute usage) of completed jobs. For more detailed information about Slurm on our systems see our Using Slurm page, which also links to a variety of detailed how-tos and examples to get you started using the on-premises HPC resources available.
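As a rough illustration of what a batch submission looks like, the sketch below writes a small job script and submits it with sbatch. The job name, resource values, and analysis.py are hypothetical placeholders, and the module lines reuse the scripting advice from earlier on this page; see the Using Slurm page for the settings that apply to our clusters.

```shell
#!/bin/bash
# Sketch: write a minimal Slurm job script, then submit it with sbatch.
cat > example_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00
#SBATCH --output=example-%j.out

# Activate Environment Modules and load a pinned version.
source /app/Lmod/lmod/lmod/init/bash
module use /app/easybuild/modules/all
module load Python/3.5.1-foss-2016b-fh1

python analysis.py
EOF

# Submit only where Slurm is installed.
if command -v sbatch >/dev/null 2>&1; then
    sbatch example_job.sh
else
    echo "sbatch not found; run this on a system with Slurm" >&2
fi
```

After submission, squeue shows queued and running jobs, and sacct reports the state and resource usage of completed ones.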
There are many approaches to parallel computing (doing many jobs simultaneously rather than in series). We have begun a Resource Library entry on Parallel Computing with Slurm, as well as created the FredHutch/slurm-examples repository containing community curated examples with additional documentation that can help you get started.
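A common pattern for running many similar jobs is a Slurm job array, where each task receives its own index in the SLURM_ARRAY_TASK_ID environment variable. The sketch below shows one way to map that index to an input file. The function name input_for_task and the file inputs.txt (one path per line) are hypothetical; the slurm-examples repository has fuller, community-maintained examples.

```shell
#!/bin/bash
# Hypothetical helper: print line number TASK_ID of a list file, so that
# array task 1 gets the first input, task 2 the second, and so on.
input_for_task() {
    local list="$1" task_id="$2"
    sed -n "${task_id}p" "$list"
}
```

Inside a job script submitted with sbatch --array=1-100, a line such as input=$(input_for_task inputs.txt "$SLURM_ARRAY_TASK_ID") selects the input for the current task.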
External Slurm and HPC Reference and Learning Resources
For more information and education on how to use HPC resources from external sources see the following sites: