High Performance Computing Reference Stack

This guide gives examples for converting Docker* containers, such as those provided by the Deep Learning Reference Stack, into Singularity* containers suited for HPC, and then walks through a multi-node benchmarking example with TensorFlow*.

Overview

The High Performance Computing Reference Stack (HPCRS) meets the needs of deploying HPC and AI workloads on the same system. This software solution reduces the complexities associated with integrating software components for High Performance Computing (HPC) Platforms. Singularity is an open source container platform to package entire scientific workflows, software and libraries, and even data.

Installing Singularity

The installation instructions are for Linux* systems and have been adapted for installation on Clear Linux* OS.

  1. Install Go*.

    This guide requires version 1.13 of Go for compatibility with Singularity v3.0.x. Use these steps to ensure the correct version of Go is installed:

    $ export VERSION=1.13 OS=linux ARCH=amd64 && \
    wget https://dl.google.com/go/go$VERSION.$OS-$ARCH.tar.gz && \
    sudo tar -C /usr/local -xzvf go$VERSION.$OS-$ARCH.tar.gz && \
    rm go$VERSION.$OS-$ARCH.tar.gz
    
  2. Set up the environment for Go.

    echo 'export GOPATH=${HOME}/go' >> ~/.bashrc && \
    echo 'export PATH=/usr/local/go/bin:${PATH}:${GOPATH}/bin' >> ~/.bashrc && \
    source ~/.bashrc
    
  3. Install dep for dependency resolution with Singularity v3.0.x.

    go get -u github.com/golang/dep/cmd/dep
    
  4. Download Singularity.

    go get -d github.com/sylabs/singularity
    

    Note

    Go will complain that there are no Go files, but it will still download the Singularity source code to the appropriate directory within the $GOPATH.
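
    To confirm the download, you can list the directory that the next step changes into (assuming the $GOPATH set up in step 2):

    ls $GOPATH/src/github.com/sylabs/singularity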

  5. Check out version 3.0.3 of Singularity.

    export VERSION=v3.0.3  # or another tag or branch if you like
    cd $GOPATH/src/github.com/sylabs/singularity && \
    git fetch && \
    git checkout $VERSION  # omit this command to install the latest bleeding edge code from master
    
  6. Build Singularity.

    Singularity uses a custom build system called makeit. mconfig is called to generate a Makefile, and then make is used to compile and install. The devpkg-openssl and devpkg-util-linux packages may be required; they can be installed with sudo swupd bundle-add <pkg-name>, as shown below.
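
    For example, both bundles mentioned above can be added in a single command (adjust this to whichever bundles your build actually reports as missing):

    sudo swupd bundle-add devpkg-openssl devpkg-util-linux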

    ./mconfig && \
    make -C ./builddir && \
    sudo make -C ./builddir install
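
    After the install completes, you can confirm that the singularity binary is available and check its version:

    singularity version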
    
  7. Configure bash completion (optional).

    To enable bash completion with Singularity commands and options, source the bash completion file. Add this command to your ~/.bashrc file so that bash completion continues to work in new shells:

    . /usr/local/etc/bash_completion.d/singularity
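
    For example, to append that line to your ~/.bashrc:

    echo '. /usr/local/etc/bash_completion.d/singularity' >> ~/.bashrc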
    

Converting Docker images to Singularity images

  1. Download d2s.

    d2s is an open source tool to convert Docker images to Singularity images. You can run the script from the location where it is downloaded, or install it using the included setup.py file with python setup.py install.

    git clone https://github.com/intel/stacks.git
    cd stacks/hpcrs/d2s
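
    If you prefer to install d2s rather than run the script in place, run the included setup file from the same directory:

    python setup.py install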
    
  2. List local Docker images.

    python d2s.py --list_docker_images
    

    Your output will look similar to this:

    ==============================
    Docker images present locally
    ==============================
    ID         NAME
    0: clearlinux/stacks-dlrs-mkl
    1: clearlinux/stacks-dlrs_2-mkl
    ==============================
    
  3. Convert to Singularity images.

    To convert the Docker images to Singularity images, use the d2s script with the ID numbers of the images you wish to convert. We strongly recommend using one of the clearlinux/stacks-dlrs-mkl or sysstacks/stacks-dlrs-mkl based images for this guide. Other images may be incompatible with the configuration or filesystem options this guide expects.

    python d2s.py --convert_docker_images <ID_1> <ID_2>
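
    For example, using the listing shown above, converting only the clearlinux/stacks-dlrs-mkl image (ID 0) would be:

    python d2s.py --convert_docker_images 0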
    
  4. Use the Singularity image.

    To use the container shell to run workloads, launch the image and you will be dropped into the shell. The Singularity image name will be the same as the name of the Docker image, with slashes converted to underscores.

    singularity shell <singularity image>
    

    Using the example output above, after conversion you could launch the clearlinux/stacks-dlrs-mkl Singularity image with singularity shell clearlinux_stacks-dlrs-mkl.

Execute a multi-node benchmark on an HPC cluster

The following example was executed on an Intel® Xeon® Processor-based HPC infrastructure. These steps may need to be adjusted for different environments. See this Intel Whitepaper for more information.

Running a ResNet50 workload multi-node

  1. Download the TensorFlow benchmark.

    git clone https://github.com/tensorflow/benchmarks -b cnn_tf_v1.13_compatible
    
  2. Copy the Singularity image and the benchmark files to the HPC cluster environment.
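
    A minimal sketch using scp, with placeholder user, host, and destination paths (any copy mechanism your site supports works equally well):

    scp <singularity image> <user>@<cluster-login-node>:~/
    scp -r benchmarks <user>@<cluster-login-node>:~/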

  3. Install OpenMPI* if needed.

    Note

    If the HPC host does not have OpenMPI installed, install a custom local version in the user’s home directory. This version must be the same as the version installed in the DLRS container. Follow the steps for building OpenMPI from their documentation.
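
    A minimal sketch of such a local build, assuming a release tarball whose version matches the OpenMPI version inside the container (X.Y.Z is a placeholder); the official OpenMPI documentation remains the authoritative reference:

    wget https://download.open-mpi.org/release/open-mpi/vX.Y/openmpi-X.Y.Z.tar.gz
    tar xzf openmpi-X.Y.Z.tar.gz && cd openmpi-X.Y.Z
    ./configure --prefix=$HOME/openmpi    # install into the user's home directory
    make -j && make install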

  4. Adjust PATH variables.

    Include the OpenMPI install locations in the PATH and LD_LIBRARY_PATH environment variables.

    export PATH="$PATH:<openmpi install path>/bin"
    export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:<openmpi install path>/lib/"
    
  5. Execute the TF benchmark script on single or multiple nodes using OpenMPI through the mpirun command. Replace variables in {} braces to reflect your environment.

    mpirun --np ${NUM_COPIES}  \
    -bind-to none \
    -map-by slot \
    --display-map \
    -host ${HOSTNAMES} \
    --report-bindings \
    --oversubscribe \
    -x LD_LIBRARY_PATH \
    -x PATH \
    -x HOROVOD_FUSION_THRESHOLD \
    -x OMP_NUM_THREADS=${OMP_NUM_THREADS} \
    singularity exec ${PATH_TO_SING_IMAGE} \
    python ${PATH_TO_TF_BENCH}/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
    --batch_size=128 \
    --model=resnet50 \
    --num_intra_threads=${NUM_INTRA_THREADS} \
    --num_inter_threads=${NUM_INTER_THREADS} \
    --data_format=NHWC \
    --device=cpu \
    --variable_update=horovod \
    --horovod_device=cpu
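
    For reference, these are the placeholders the command expects. The values are site-specific, so treat the assignments below as a sketch and take recommended settings from the DLRS script mentioned in the note that follows:

    export NUM_COPIES=<total number of MPI ranks to launch>
    export HOSTNAMES=<comma-separated list of node names>
    export OMP_NUM_THREADS=<OpenMP threads per rank>
    export NUM_INTRA_THREADS=<TensorFlow intra-op parallelism threads>
    export NUM_INTER_THREADS=<TensorFlow inter-op parallelism threads>
    export HOROVOD_FUSION_THRESHOLD=<Horovod tensor fusion buffer size in bytes>
    export PATH_TO_SING_IMAGE=<path to the converted Singularity image>
    export PATH_TO_TF_BENCH=<path to the cloned benchmarks repository>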
    

    Note

    Refer to the DLRS script for recommended values for the environment variables used in the mpirun command.

    Note

    You may see an error regarding a missing library while executing the DLRS container. “tensorflow.python.framework.errors_impl.NotFoundError: libnuma.so.1: cannot open shared object file: No such file or directory”

    A workaround for this error is to bind the path to the library from the host by adding the following option to the singularity exec command.

    --bind /usr/lib64/libnuma.so.1:/usr/lib64/libnuma.so.1
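
    For example, applied to the singularity invocation inside the mpirun command above, the option would look like this:

    singularity exec --bind /usr/lib64/libnuma.so.1:/usr/lib64/libnuma.so.1 ${PATH_TO_SING_IMAGE} \
    python ${PATH_TO_TF_BENCH}/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py ...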