Machine Learning System NVIDIA DGX-1 and OpenStack GPU VMs
The Machine Learning System DGX-1 is a “supercomputer in a box” with a single-precision peak performance of 80 TFlop/s. It contains eight high-end NVIDIA Tesla P100 GPGPUs, each with 16 GB of RAM, which together provide 28,672 CUDA cores. The GPUs are connected to each other by an NVLink interconnect and are attached to an x86-compatible host system with 40 Intel Xeon cores. Users can reserve the whole DGX-1 exclusively and run complex machine learning tasks, which are available via Docker images.
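The per-GPU and aggregate figures follow from simple arithmetic. The short sketch below works them out, assuming that the 28,672 CUDA cores are the total across all eight GPUs and that the 80 TFlop/s peak is spread evenly over them. The variable names are illustrative only.

```python
# Figures quoted in the description above.
NUM_GPUS = 8
RAM_PER_GPU_GB = 16
TOTAL_CUDA_CORES = 28672   # assumed to be the sum over all eight GPUs
PEAK_SP_TFLOPS = 80        # single-precision peak of the whole system

# Derived values.
total_gpu_ram_gb = NUM_GPUS * RAM_PER_GPU_GB
cores_per_gpu = TOTAL_CUDA_CORES // NUM_GPUS
peak_sp_per_gpu = PEAK_SP_TFLOPS / NUM_GPUS

print(f"GPU memory in total: {total_gpu_ram_gb} GB")
print(f"CUDA cores per GPU:  {cores_per_gpu}")
print(f"Peak per GPU:        {peak_sp_per_gpu} TFlop/s (single precision)")
```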
A set of preinstalled images covers deep learning toolkits such as TensorFlow, Theano, CNTK, Torch, DIGITS, Caffe and others.
The system runs Ubuntu 14.04 LTS in the version supported by NVIDIA.
Below is a schematic drawing of the internals of the system. The 8 GPUs are connected via the NVLink high-speed interconnect, and the x86 cores are connected to the GPUs via PCIe switches. For detailed documentation of the hardware, please see the NVIDIA website.
Also available are 4 single-node systems, each with one NVIDIA P100 GPU, for development purposes. On these systems only a general-purpose image is available, which provides the PGI Compiler Suite and CUDA.
Access and Login
Linux Cluster users can get access to the system by submitting an incident ticket to the LRZ service desk (firstname.lastname@example.org). The user account will then be added to the list of allowed users of the DGX-1.
The system can be reserved via an online calendar system (https://datalab.srv.lrz.de), which is currently only reachable from within the MWN. If you want to use the reservation system and log in to the compute system from outside the MWN, you first have to connect to the LRZ VPN (see VPN documentation).
In the online calendar, the user can see the available time slots and book the complete system for a maximum of 8 hours per day.
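When planning a session it can help to check a slot against the 8-hour limit before opening the calendar. The sketch below is only a planning aid, not part of the reservation system. The function name is hypothetical, and it assumes that "per day" means a calendar day and that a slot does not cross midnight.

```python
from datetime import datetime, timedelta

MAX_HOURS_PER_DAY = 8  # limit stated in the text above


def fits_daily_limit(start, end, already_booked=timedelta(0)):
    """Return True if a slot within one calendar day respects the daily limit.

    already_booked is the time already reserved on that day (assumption:
    several slots on one day count towards the same 8-hour budget).
    """
    if end <= start or start.date() != end.date():
        return False
    return already_booked + (end - start) <= timedelta(hours=MAX_HOURS_PER_DAY)


# Example: a slot from 09:00 to 17:00 uses the full daily budget.
print(fits_daily_limit(datetime(2017, 5, 2, 9), datetime(2017, 5, 2, 17)))
```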
Please remember that on the DGX-1 only /home/ is kept between sessions. On VMs **everything** is lost. We are working on that, but in the meantime always copy your data.
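One way to make copying easier is to pack the results into a single archive before the session ends. The following is a minimal sketch using only the Python standard library. The function name and the directory names in the commented example are hypothetical, and transferring the archive off the machine (for example with scp) is left to the user.

```python
import shutil
from pathlib import Path


def archive_results(results_dir, archive_dir):
    """Pack results_dir into <archive_dir>/<name>.tar.gz and return its path."""
    results_dir = Path(results_dir).resolve()
    archive_dir = Path(archive_dir).resolve()
    archive_dir.mkdir(parents=True, exist_ok=True)
    base_name = archive_dir / results_dir.name
    # root_dir/base_dir keep the top-level folder name inside the archive.
    return shutil.make_archive(
        str(base_name),
        "gztar",
        root_dir=str(results_dir.parent),
        base_dir=results_dir.name,
    )


# Example (hypothetical paths):
# archive_results("/workspace/run01", Path.home() / "backups")
```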
The user then has to upload an SSH key to the online calendar, which will be used for authentication on the system.
When the date of the reservation approaches, the user will receive an email with further instructions on how to connect to the system via SSH and via an HTTP link. Please be aware that the user's computer has to be in the Munich Scientific Network (MWN) or connected to the LRZ VPN in order to reach the system.
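The key handling described above can be sketched as two command lines: one to generate a key pair whose public half is uploaded to the calendar, and one to log in with the private half. The helper below only builds and prints these commands and does not run them. The user name, host name and key file name are placeholders. The real host name comes from the instruction email.

```python
import shlex
from pathlib import Path


def build_keygen_command(key_path):
    """Command to create an RSA key pair; key_path + '.pub' is the public key."""
    return ["ssh-keygen", "-t", "rsa", "-b", "4096", "-f", str(key_path)]


def build_ssh_command(user, host, key_path):
    """Command to log in with the private key that matches the uploaded one."""
    return ["ssh", "-i", str(key_path), f"{user}@{host}"]


# Placeholders only: replace with your own account and the host from the email.
key = Path.home() / ".ssh" / "id_dgx1"
for cmd in (
    build_keygen_command(key),
    build_ssh_command("your_lrz_id", "dgx1.example.org", key),
):
    print(" ".join(shlex.quote(part) for part in cmd))
```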
NVIDIA GPU Optimized Deep Learning Frameworks
The NVIDIA Deep Learning SDK accelerates widely-used deep learning frameworks.
This release provides containerized versions of those frameworks optimized for the NVIDIA DGX-1, pre-built, tested, and ready to run, including all necessary dependencies.
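Containers of this kind are commonly started through the `nvidia-docker` wrapper around `docker run`. The sketch below assembles such a command without executing it. The image reference and the `/workspace` mount point are placeholders, and this page does not state whether users on this system start containers themselves or through the reservation system, so treat it as an illustration only.

```python
import shlex


def build_container_command(image, host_dir, launcher="nvidia-docker"):
    """Build an interactive 'run' command that mounts host_dir into the container."""
    return [
        launcher, "run",
        "--rm",                          # remove the container on exit
        "-it",                           # interactive terminal
        "-v", f"{host_dir}:/workspace",  # hypothetical mount point
        image,
    ]


# Placeholder image reference; use the name given for your reservation.
cmd = build_container_command("tensorflow:17.04", "/home/your_lrz_id")
print(" ".join(shlex.quote(part) for part in cmd))
```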
|Framework||Base Version||Container Name||Description||Release Notes|
|Caffe||NVIDIA Caffe 0.16||Caffe 17.04||
Caffe was originally developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors. It is a deep learning framework made with expression, speed, and modularity in mind.
NVIDIA Caffe is an NVIDIA-maintained fork of BVLC Caffe tuned for NVIDIA GPUs, particularly in multi-GPU configurations, accelerated by the NVIDIA Deep Learning SDK. It includes multi-precision support as well as other NVIDIA-enhanced features and offers performance specially tuned for the NVIDIA DGX-1.
|Caffe2||Caffe2 0.0.5+||Caffe2 17.04||
Caffe2 is a deep learning framework designed to easily express all model types, for example, CNN, RNN, and more, in a friendly Python-based API, and execute them using a highly efficient C++ and CUDA back-end.
It allows a large amount of flexibility for the user to assemble their model, whether for inference or training, using combinations of high-level and expressive operations, before running through the same Python interface allowing for easy visualization, or serializing the created model and directly using the underlying C++ implementation.
Caffe2 supports single and multi-GPU execution, along with support for multi-node execution.
|CNTK||CNTK 2.0.beta15.0||CNTK 17.04||Microsoft Cognitive Toolkit (CNTK) empowers you to harness the intelligence within massive datasets through deep learning by providing uncompromised scaling, speed and accuracy with commercial-grade quality and compatibility with the programming languages and algorithms you already use.||Link|
|DIGITS||DIGITS 5.0||DIGITS 17.04||DIGITS can be used to rapidly train highly accurate deep neural networks (DNNs) for image classification, segmentation and object detection tasks. DIGITS simplifies common deep learning tasks such as managing data, designing and training neural networks on multi-GPU systems, monitoring performance in real time with advanced visualizations, and selecting the best performing model from the results browser for deployment. DIGITS is completely interactive so that data scientists can focus on designing and training networks rather than programming and debugging.||Link|
|MXNet||MXNet 0.9.3a+||MXNet 17.04||
MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix the flavors of symbolic programming and imperative programming to maximize efficiency and productivity.
At its core is a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient. The library is portable and lightweight, and it scales to multiple GPUs and multiple machines.
MXNet is also more than a deep learning project. It is a collection of blueprints and guidelines for building deep learning systems, with interesting insights into DL systems for hackers.
|PyTorch||PyTorch v0.1.10||PyTorch 17.04||PyTorch is a python package that provides two high-level features:
|TensorFlow||TensorFlow 1.0.1||TensorFlow 17.04||
TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them. This flexible architecture lets you deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device without rewriting code.
TensorFlow was originally developed by researchers and engineers working on the Google Brain team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research. The system is general enough to be applicable in a wide variety of other domains, as well.
|Theano||Theano 0.9.0||Theano 17.04||Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.||Link|
Torch is a scientific computing framework with wide support for deep learning algorithms. Thanks to an easy and fast scripting language, Lua, and an underlying C/CUDA implementation, Torch is easy to use and is efficient.
Torch offers popular neural network and optimization libraries that are easy to use yet provide maximum flexibility to build complex neural network topologies.
General purpose container
CUDA and OpenACC compiler ("CUDA 8 and PGI 17.4")
This is a basic container for software development containing the following components:
- Ubuntu 16.04
- NVIDIA CUDA® 8.0.61
- NVIDIA cuDNN 6.0.20
- NVIDIA NCCL 1.6.1 (optimized for NVLink)
- PGI C++ and Fortran Compiler 17.4
- OpenMPI with GPUDirect
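After starting the general purpose container, a quick check of which toolchain commands are on the search path can save time. The sketch below looks up the usual command names for the components listed above (`nvcc` for CUDA, `pgcc`, `pgc++` and `pgfortran` for the PGI compilers, `mpirun` for OpenMPI). That these commands are on `PATH` inside the container is an assumption, not something this page confirms.

```python
import shutil

# Usual command names for the listed components (assumed to be on PATH).
TOOLS = ["nvcc", "pgcc", "pgc++", "pgfortran", "mpirun"]


def find_tools(names=TOOLS):
    """Map each command name to its full path, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools().items():
    print(f"{name:10s} {path or 'not found'}")
```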