High Performance Computing for AI


Friday, Oct. 5, 2018, 9:00-17:00

Location: LRZ Building, Garching/Munich, Boltzmannstr. 1, LRZ Hörsaal H.E.009


The Leibniz Supercomputing Center (LRZ) is the computer center for Munich's universities and for the Bavarian Academy of Sciences and Humanities. It is also a national center for High Performance Computing. LRZ, Nvidia and IBM are excited to announce the one-day workshop "High Performance Computing for AI" at LRZ on Friday, Oct. 5, 2018.

In this full-day workshop, you will learn advanced techniques for accelerating AI workloads. Topics include:

  • Introduction to LRZ AI computational infrastructure
  • Mixed Precision Arithmetic for deep learning (Nvidia)
  • MultiGPU training for deep learning (Nvidia)
  • PowerAI, large model support and distributed deep learning (IBM)
  • SnapML (IBM)
  • Carme and HP-DLF for Deep Learning on HPC systems (ITWM)

Workshop Agenda:

9:00 – 9:10 Welcome (Prof. Helmut Reiser, LRZ)
9:10 – 9:30 Introduction to LRZ AI computational infrastructure (Yu Wang, Stephan Peinkofer, LRZ)
9:30 – 10:00 CharlieCloud for Deep Learning at LRZ (Juan J. Durillo, LRZ)
10:00 – 10:30 HP-DLF - High Performance Deep Learning Framework (Peter Labus, Fraunhofer ITWM)
10:30 – 11:00 Coffee Break
11:00 – 12:00 Accelerate Deep Learning – unleash the power of multiple NVIDIA GPUs and mixed precision arithmetic (Part One, Adolf Hohl, Nvidia)
12:00 – 13:00 Lunch
13:00 – 14:00 Accelerate Deep Learning – unleash the power of multiple NVIDIA GPUs and mixed precision arithmetic (Part Two, Adolf Hohl, Nvidia)
14:00 – 14:30 Carme - An Open Source Framework for Multi-User, Interactive Machine Learning on Distributed GPU-Systems (Dominik Straßel, Fraunhofer ITWM)
14:30 – 14:45 Coffee Break
14:45 – 15:15 PowerAI (Thomas Parnell, IBM)
15:15 – 16:15 SnapML (Thomas Parnell, IBM)
16:15 – 16:30 Concluding Remarks and General Q&A


"Carme - An Open Source Framework for Multi-User, Interactive Machine Learning on Distributed GPU-Systems"

Dominik Straßel, Fraunhofer ITWM

To make high performance clusters attractive and effectively usable for machine learning and data science users, we provide an open-source framework that manages resources for multiple users. To this end, we combine open-source solutions, such as container images and Jupyter notebooks, with HPC back ends such as Slurm and BeeGFS. A scheduler makes it possible to reserve resources and to use different queues according to the needs of the users. With reservations in place, the framework scales easily: GPUs can be added to a running job, and deep learning training can be scaled strongly or weakly. Most of the back end remains invisible, since users log in and submit jobs through a web interface.

"Accelerate Deep Learning – unleash the power of multiple NVIDIA GPUs and mixed precision arithmetic"

Adolf Hohl, Nvidia

Deep Learning is a versatile tool for solving problems of increasing complexity and size. To keep up a high pace of innovation with growing datasets, the training process has to be accelerated. This session is about leveraging MultiGPU and Mixed Precision Arithmetic to increase the rate at which you can innovate on DL challenges using the NVIDIA DGX family and Tesla V100.

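As a rough illustration of why mixed precision training needs some care, the sketch below shows loss scaling, a technique commonly used with FP16 arithmetic. It is not taken from the session itself: the gradient value, the scale factor and the helper `to_fp16` are assumptions chosen for illustration, and Python floats stand in for the higher-precision copy that would normally be FP32.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (hypothetical helper)."""
    return struct.unpack("e", struct.pack("e", x))[0]

true_grad = 1e-8      # illustrative gradient, smaller than FP16 can represent
loss_scale = 1024.0   # assumed static loss scale (a power of two)

# Stored directly in FP16, the gradient underflows to zero.
naive = to_fp16(true_grad)

# Scaling the loss scales every gradient by the same factor,
# which moves small values back into the representable FP16 range.
scaled = to_fp16(true_grad * loss_scale)

# The scale is divided out again in higher precision before the weight update.
recovered = scaled / loss_scale

print(naive, recovered)
```

Here the unscaled gradient is lost entirely, while the scaled one survives the round trip with only a small rounding error.
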
"HP-DLF: High Performance Deep Learning Framework"

Peter Labus, Fraunhofer ITWM

Deep Neural Networks (DNNs) have proven to be highly successful in many domains, such as computer vision, speech recognition and natural language processing. Especially for tasks for which large data sets are available, DNNs often outperform classical machine learning algorithms significantly. Since this is achieved by constructing models with a large number of parameters, the GPU memory available on a single device quickly becomes the limiting factor for the size of the DNN.

In this talk I introduce the High Performance Deep Learning Framework (HP-DLF). HP-DLF is a model-parallel back end for deep learning in high performance environments. It aims to provide facilities to train and execute large DNNs on multi-node clusters in an auto-parallel, weakly and strongly scalable fashion. The user simply provides a DNN in the ONNX standard, which the HP-DLF compiler then translates into a Petri net. This Petri net is then executed in a distributed, fault-tolerant and hardware-aware way using the parallel programming development software GPI-Space.

In this way, HP-DLF enables the use of scalable deep learning on HPC systems, especially in domains for which very large data sets are available, e.g. biology/neuroscience, astrophysics, medicine, and computer vision.

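To give a feel for what executing a network as a Petri net means, here is a toy sketch. It is not HP-DLF or GPI-Space code and makes no claim about how the HP-DLF compiler works: the function `run_petri_net`, the helper `dense` and the two-layer network are all hypothetical. Places hold data tokens, and a transition (here, one layer) fires as soon as all of its input places hold a token.

```python
def run_petri_net(transitions, marking):
    """Fire enabled transitions until none is enabled; return the final marking."""
    marking = dict(marking)
    fired = True
    while fired:
        fired = False
        for inputs, output, fn in transitions.values():
            # A transition is enabled when every input place holds a token.
            if all(place in marking for place in inputs):
                args = [marking.pop(place) for place in inputs]  # consume tokens
                marking[output] = fn(*args)                      # produce a token
                fired = True
    return marking

def dense(weights, bias):
    """Hypothetical dense layer with ReLU, operating on plain Python lists."""
    def apply(x):
        return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(weights, bias)]
    return apply

# Two layers chained through the places "x" -> "h" -> "y".
transitions = {
    "layer1": (["x"], "h", dense([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])),
    "layer2": (["h"], "y", dense([[2.0, 1.0]], [0.5])),
}

result = run_petri_net(transitions, {"x": [3.0, 1.0]})
print(result["y"])
```

Because a transition depends only on its own input places, transitions that do not share tokens could in principle fire concurrently on different devices, which is the property a distributed executor can exploit.
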
"Power AI and SnapML"

Thomas Parnell, IBM

PowerAI is an enterprise software distribution of popular open-source deep learning frameworks, pre-packaged for easier use. It has been specially compiled and optimized for the IBM Power platform. PowerAI greatly reduces the time, effort, and difficulty associated with getting a deep learning environment operational and performing optimally. IBM Caffe with Large Model Support (LMS) loads the neural model and data set in system memory and caches activity to GPU memory, allowing models and training batch sizes to scale significantly beyond what was previously possible. IBM PowerAI Distributed Deep Learning (DDL) is an MPI-based communication library that is specifically optimized for deep learning training. An application integrated with DDL becomes an MPI application, which allows the ddlrun command to be used to invoke the job in parallel across a cluster of systems. DDL is aware of multi-tier network environments and uses different libraries (e.g. NCCL) and algorithms to get the best performance in multi-node, multi-GPU environments.

Snap Machine Learning (Snap ML) combines recent advances in machine-learning systems and algorithms in a nested manner to reflect the hierarchical architecture of modern distributed systems. This allows us to leverage available network, memory and heterogeneous compute resources effectively. On a terabyte-scale publicly available dataset for click-through-rate prediction in computational advertising, we demonstrate the training of a logistic regression classifier in 1.53 min.

IMPORTANT: To reserve your seat, you MUST register via the LRZ registration form with a valid email address. Please choose course HDLW1W18.

This workshop is brought to you by Nvidia, IBM and LRZ.

Content Level: Advanced
Prerequisites: sound deep learning / AI technical background
Language: English
Presenters: Dominik Straßel, Peter Labus, Fraunhofer ITWM; Adolf Hohl, Nvidia; Thomas Parnell, IBM; Juan J. Durillo, Yu Wang, Stephan Peinkofer, LRZ
Registration: Via the LRZ registration form. Please choose course HDLW1W18.