Running Tensorflow (and Keras) on GPUs
Updated: April 21, 2023
Running Tensorflow with GPUs has become easier as of 2019, as the latest Gizmo J and K class nodes are equipped with GPUs.
GPU Tensorflow in a Python Environment
GPU Tensorflow with the standard Python
Load a current version of Python 3.6 on Rhino using the ml shortcut for module load, then use pip3 to install Tensorflow in your user home directory:
~$ ml Python/3.6.7-foss-2016b-fh2
~$ pip3 install --user --upgrade tensorflow-gpu
Then create a small Python test script:
echo "#! /usr/bin/env python3" > ~/tf-test.py
echo "import tensorflow" >> ~/tf-test.py
echo "from tensorflow.python.client import device_lib" >> ~/tf-test.py
echo "print(tensorflow.__version__)" >> ~/tf-test.py
echo "print(tensorflow.__path__)" >> ~/tf-test.py
echo "print(device_lib.list_local_devices())" >> ~/tf-test.py
chmod +x ~/tf-test.py
Run it on Gizmo with --gpus=1 to select a node with a GPU:
~$ sbatch -o out.txt --gpus=1 ~/tf-test.py
~$ tail -f out.txt
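The same check can also be kept as a regular Python file instead of being assembled with echo. The following is a minimal sketch, not part of the original page: the function name describe_tensorflow is hypothetical, and the script assumes only that the device_lib helper used above is available when Tensorflow is installed. It reports a missing install instead of failing, which makes it easier to tell "no Tensorflow" apart from "no GPU".

```python
#!/usr/bin/env python3
"""Report whether TensorFlow is importable and which devices it can see."""


def describe_tensorflow():
    """Return a dict describing the TensorFlow install, or its absence."""
    try:
        import tensorflow
        # Same helper the echo-built test script uses.
        from tensorflow.python.client import device_lib
    except ImportError:
        return {"installed": False, "version": None, "devices": []}

    # Each entry carries a device name and a type such as "CPU" or "GPU".
    devices = [(d.name, d.device_type) for d in device_lib.list_local_devices()]
    return {
        "installed": True,
        "version": tensorflow.__version__,
        "devices": devices,
    }


if __name__ == "__main__":
    info = describe_tensorflow()
    if not info["installed"]:
        print("tensorflow is not installed in this Python environment")
    else:
        print("tensorflow", info["version"])
        for name, device_type in info["devices"]:
            print("  {}: {}".format(device_type, name))
        if not any(device_type == "GPU" for _, device_type in info["devices"]):
            print("no GPU visible; was the job submitted with --gpus=1?")
```

Saved as an executable file, it can be submitted with sbatch in the same way as ~/tf-test.py.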
If you want to switch back to the non-GPU version of Tensorflow, just uninstall the GPU version you installed under ~/.local:
~$ pip3 uninstall tensorflow-gpu
Uninstalling tensorflow-gpu-1.13.1:
Would remove:
/home/petersen/.local/bin/freeze_graph
/home/petersen/.local/bin/saved_model_cli
/home/petersen/.local/bin/tensorboard
/home/petersen/.local/bin/tf_upgrade_v2
/home/petersen/.local/bin/tflite_convert
/home/petersen/.local/bin/toco
/home/petersen/.local/bin/toco_from_protos
/home/petersen/.local/lib/python3.6/site-packages/tensorflow/*
/home/petersen/.local/lib/python3.6/site-packages/tensorflow_gpu-1.13.1.dist-info/*
Proceed (y/n)?
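Before or after uninstalling, it can help to confirm which copy of a package Python would actually import. The sketch below is illustrative only; the helper names package_origin and is_user_site_install are hypothetical. It uses the standard importlib and site modules to check whether a package resolves to the per-user site-packages directory under ~/.local.

```python
import importlib.util
import site


def package_origin(name):
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


def is_user_site_install(name):
    """True if the package resolves to the per-user site-packages directory."""
    origin = package_origin(name)
    user_site = site.getusersitepackages()
    return origin is not None and origin.startswith(user_site)


if __name__ == "__main__":
    origin = package_origin("tensorflow")
    if origin is None:
        print("tensorflow is not importable from this Python")
    else:
        print("tensorflow would be imported from:", origin)
        print("installed under the user site:", is_user_site_install("tensorflow"))
```

If the reported location is not under ~/.local after the uninstall, Python is picking up the copy provided by the loaded module.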
GPU Tensorflow in a virtual environment
Python virtual environments are useful for advanced users who would like to work with multiple versions of Python packages. It is important to understand that the virtual environment is tied to the Python environment you previously loaded using the ml command. Let's load a recent Python and create a virtual environment called mypy:
~$ ml Python/3.6.7-foss-2016b-fh2
~$ python3 -m venv mypy
~$ source ./mypy/bin/activate
(mypy) petersen@rhino3:~$ which pip3
/home/petersen/mypy/bin/pip3
Now that you have your own environment, you can install packages with pip3. Leave out the --user option in this case, because you want to install the package under the virtual environment and not under ~/.local:
(mypy) petersen@rhino3:~$ pip3 install --upgrade tensorflow-gpu
Now you can continue with the example from GPU Tensorflow with the standard Python. When you are done with your virtual environment, just run the deactivate command. There is no need to uninstall the tensorflow package:
(mypy) petersen@rhino3:~$ deactivate
~$
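The rule about --user can be expressed as a small check. This is a sketch under stated assumptions: it detects environments created with python3 -m venv by comparing sys.prefix with sys.base_prefix, and the function names in_virtualenv and pip_install_args are hypothetical. It only builds the pip3 argument list and does not run it.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs from an environment created by venv."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


def pip_install_args(package, in_venv=None):
    """Build the pip3 command line, adding --user only outside a venv."""
    if in_venv is None:
        in_venv = in_virtualenv()
    args = ["pip3", "install", "--upgrade", package]
    if not in_venv:
        # Outside a virtual environment, install under ~/.local instead.
        args.insert(2, "--user")
    return args


if __name__ == "__main__":
    print("inside a virtual environment:", in_virtualenv())
    print(" ".join(pip_install_args("tensorflow-gpu")))
```

Run inside the activated mypy environment, this prints the command without --user; run outside, it prints the command from the standard Python section.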
GPU Tensorflow from an Apptainer container
To run in an Apptainer container, you need to start with a Docker image that contains a modern Python with the tensorflow-gpu package installed. The Tensorflow Docker images are all set up and ready.
Next, load Apptainer:
ml Apptainer
After that, the only change is to enable NVIDIA support by adding the --nv flag to apptainer exec:
apptainer exec --nv docker://tensorflow/tensorflow:latest-gpu-py3 python3
Sample code is available in the slurm-examples repository.
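For scripted use, the apptainer exec call can be assembled in Python. This is a minimal sketch and not taken from the slurm-examples repository: the helper names apptainer_command and run_in_container are hypothetical, and only the exec subcommand, the --nv flag and the image name shown above are assumed.

```python
import shutil
import subprocess

# Image name taken from the apptainer exec example above.
IMAGE = "docker://tensorflow/tensorflow:latest-gpu-py3"


def apptainer_command(image=IMAGE, program=("python3",), use_gpu=True):
    """Build the argument list for 'apptainer exec'."""
    cmd = ["apptainer", "exec"]
    if use_gpu:
        cmd.append("--nv")  # enable NVIDIA support inside the container
    cmd.append(image)
    cmd.extend(program)
    return cmd


def run_in_container(program, image=IMAGE):
    """Run a program inside the container; requires apptainer on PATH."""
    if shutil.which("apptainer") is None:
        raise RuntimeError("apptainer not found; run 'ml Apptainer' first")
    return subprocess.run(apptainer_command(image, program), check=True)


if __name__ == "__main__":
    # Only print the command here; call run_in_container() to execute it.
    print(" ".join(apptainer_command(program=("python3", "tf-test.py"))))
```

Passing the command as a list avoids shell quoting problems when the script path contains spaces.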
Tensorflow from R
Scientific Computing maintains custom builds of R and Python. Python modules with an fh suffix have included Tensorflow since version 3.6.1. Only Python 3 releases have the Tensorflow package. To use Tensorflow from R, use the FredHutch modules for R and Python.
Example setup:
ml R
ml Python/3.6.7-foss-2016b-fh2
# Start R
R
# R commands
pyroot <- Sys.getenv("EBROOTPYTHON")
pypath <- file.path(pyroot, "bin", "python")
Sys.setenv(TENSORFLOW_PYTHON=pypath)
library(tensorflow)
tf_version()
# tf$Session() is the TensorFlow 1.x API
sess <- tf$Session()
hello <- tf$constant('Hello, TensorFlow!')
sess$run(hello)
# Sample output: b'Hello, TensorFlow!'
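If library(tensorflow) cannot find Tensorflow, one thing to check is whether the interpreter that TENSORFLOW_PYTHON points to can import it. The sketch below mirrors the path construction from the R commands above; the function names module_python and can_import_tensorflow are hypothetical, and it assumes the Python module has been loaded so that EBROOTPYTHON is set.

```python
import os
import subprocess


def module_python(environ=None):
    """Return <EBROOTPYTHON>/bin/python, or None if the variable is unset."""
    environ = os.environ if environ is None else environ
    root = environ.get("EBROOTPYTHON")
    if not root:
        return None
    return os.path.join(root, "bin", "python")


def can_import_tensorflow(python):
    """True if the given interpreter imports tensorflow without error."""
    result = subprocess.run(
        [python, "-c", "import tensorflow"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    python = module_python()
    if python is None:
        print("EBROOTPYTHON is not set; load a Python module with ml first")
    elif not os.path.exists(python):
        print("no interpreter found at", python)
    else:
        print(python, "imports tensorflow:", can_import_tensorflow(python))
```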
Troubleshooting
First verify that you have an active GPU (e.g. Tesla V100) and CUDA version 10.0 or newer:
gizmok12[~]: nvidia-smi
Thu Sep 17 14:54:48 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100 Driver Version: 440.100 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:5E:00.0 Off | N/A |
| 38% 42C P0 39W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
gizmok12[~]:
If you see any issues here, contact SciComp to have the error corrected. You can also email SciComp to request further assistance.
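The nvidia-smi check can also be run from a job script so that the result ends up in the job output. This is a sketch, not an official SciComp tool: the function names parse_gpu_lines and query_gpus are hypothetical, and it assumes the --query-gpu and --format=csv,noheader options of nvidia-smi. It returns an empty list when nvidia-smi is missing or fails, which usually means the job did not land on a GPU node.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,memory.total",
    "--format=csv,noheader",
]


def parse_gpu_lines(text):
    """Parse 'name, driver, memory' CSV lines into a list of dicts."""
    gpus = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) == 3:
            gpus.append(
                {"name": fields[0], "driver": fields[1], "memory": fields[2]}
            )
    return gpus


def query_gpus():
    """Return the GPUs reported by nvidia-smi, or [] if none are available."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            QUERY,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return parse_gpu_lines(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("no GPU reported; check that the job requested --gpus=1")
    for gpu in gpus:
        print("{name} (driver {driver}, {memory})".format(**gpu))
```

Including this output when contacting SciComp shows which node and driver the job actually saw.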