ONLINE: PRACE Workshop: HPC code optimisation workshop

Date:

Monday, June 8 - Wednesday, June 10, 2020, 10:00-16:00 CEST

Location:

ONLINE

Contents:

This course will be delivered as an ONLINE COURSE for remote participation because of the COVID-19 measures enforced by most European governments.

REGISTRATION is strictly NECESSARY since the details to access the online course will be provided to the registered and accepted attendees only.

Please note that the time has been changed to 10:00-12:00 & 13:00-16:00 CEST.

Please use your own laptop or PC (with X11 support and an ssh client installed) for the hands-on sessions. For GUI applications we recommend the NoMachine Enterprise Client, available for Windows, Linux and macOS. Further details can be found below.

As computer architectures grow ever more complex, code optimization has become the main route to keeping pace with hardware advancements and making effective use of current and upcoming High Performance Computing systems.

Have you ever asked yourself:

Where does the performance of my application lie?
What is the maximum speed-up achievable on the architecture I am using?
Does my implementation meet the HPC objectives?

In this workshop, we will answer these questions and provide a unique opportunity to learn techniques, methods and solutions on how to improve code, how to enable the new hardware features and how to use the roofline model to visualize the potential benefits of an optimization process.
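As a brief orientation, the basic roofline bound can be written as below. This is the standard textbook form of the model, not material taken from the course slides; the symbols $P_{\text{peak}}$ (peak floating-point rate), $B_{\text{mem}}$ (peak memory bandwidth) and $I$ (arithmetic intensity of the kernel) are notation chosen here, and the numbers in the example are purely illustrative assumptions, not measurements of any particular machine.

```latex
% Attainable performance of a kernel under the basic roofline model
P_{\text{attainable}} = \min\left( P_{\text{peak}},\; I \cdot B_{\text{mem}} \right),
\qquad
I = \frac{\text{floating-point operations}}{\text{bytes moved to and from memory}}

% Illustrative example (assumed values):
% P_peak = 3000 GFLOP/s, B_mem = 200 GB/s, I = 0.5 FLOP/byte
P_{\text{attainable}} = \min\left( 3000,\; 0.5 \cdot 200 \right)\ \text{GFLOP/s} = 100\ \text{GFLOP/s}
```

In this assumed case the kernel is memory-bound: raising its arithmetic intensity, for example through better data reuse, moves it up the sloped part of the roofline, whereas improving vectorization alone would not lift the bound.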

We will begin with a description of the latest microprocessor architectures and of how developers can use modern HPC hardware efficiently, in particular the memory hierarchy and the vector units, via SIMD programming and AVX-512 optimization.

The attendees are then guided through the optimization process by means of hands-on exercises and learn how to enable vectorization using simple pragmas and more effective techniques, such as changing data layout and alignment.
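To give a flavour of the two techniques named above, here is a minimal sketch in C of a data-layout change from array of structures (AoS) to structure of arrays (SoA), combined with a vectorization pragma. It is not taken from the course material: the type and function names, the particle count N and the choice of the OpenMP `simd` pragma are all assumptions made for illustration.

```c
#include <stddef.h>

#define N 1024  /* illustrative particle count (assumption) */

/* Array of Structures (AoS): the fields of one particle sit next to each
   other, so a loop touching only x walks memory with a stride of 3 floats. */
struct ParticleAoS { float x, y, z; };

/* Structure of Arrays (SoA): each field is stored contiguously, so a loop
   over x alone gets unit-stride access, which suits SIMD loads and stores. */
struct ParticlesSoA { float x[N], y[N], z[N]; };

/* Shift the x coordinate of the first n particles by dx (AoS layout). */
void shift_x_aos(struct ParticleAoS *p, size_t n, float dx)
{
    for (size_t i = 0; i < n; ++i)
        p[i].x += dx;
}

/* Same operation on the SoA layout (hypothetical helper). */
void shift_x_soa(struct ParticlesSoA *p, size_t n, float dx)
{
    float *x = p->x;
    /* Ask the compiler to vectorize this loop. The pragma only takes
       effect when OpenMP SIMD support is enabled at compile time;
       otherwise it is ignored and the loop stays correct. */
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        x[i] += dx;
}
```

Both functions compute the same result; only the memory layout differs. With GCC or Clang the pragma can be enabled with `-fopenmp-simd`, and the compiler's vectorization report is the place to check whether the loop was actually vectorized.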

The work is guided by the hints from the Intel® compiler reports and from Intel® Advisor. Besides Intel® Advisor, the participants will also be introduced to Intel® VTune™ Amplifier, Intel® Application Performance Snapshot and LIKWID as tools for investigating and improving the performance of an HPC application. We further cover the Intel® Math Kernel Library (MKL), in order to show how to gain performance through the use of libraries.

We provide an N-body code to support the described optimization solutions with practical hands-on exercises.

You can ask Intel in the Q&A session about how to optimise your code. Please provide a description of your code in the registration form.

Learning Goals

Through a sequence of simple, guided examples of code modernization, the attendees will develop an awareness of the features of multi- and many-core architectures that are crucial for writing modern, portable and efficient applications.

A special focus will be dedicated to scalar and vector optimizations for the latest Intel® Xeon® Scalable processor, code-named Skylake, utilized in the SuperMUC-NG machine at LRZ.

The workshop alternates between lectures and practical sessions. Here is a preliminary outline:

Day 1

Introduction to LRZ systems and software
Code modernization approach
Scalar optimization
Compiler autovectorization
Data layout from AoS to SoA
Memory access optimization
SDLT (Intel® SIMD Layout Templates) / Explicit vectorization / Skylake optimization

Day 2

Introduction to roofline model
Intel® Advisor analysis
Intel® Math Kernel Library (MKL) and other libraries

Day 3

Introduction to Intel® VTune™ Amplifier
Introduction to Intel® Application Performance Snapshot (APS)
LIKWID (“Like I Knew What I’m Doing”) Performance Tools
Q&A Session

Recommended Access Tools

Exercises will be done on the RRZE Meggie cluster, see: https://www.anleitungen.rrze.fau.de/hpc/meggie-cluster/
Please use your own laptop or PC with X11 support and an ssh client installed for the hands-on sessions.
- Under Windows
  - We recommend installing MobaXterm (https://mobaxterm.mobatek.net/download-home-edition.html), a convenient tool which also includes an X11 server.
  - Alternatively, install and run the Xming X11 Server for Windows: https://sourceforge.net/projects/xming/ and then install and run the terminal software PuTTY: https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html
- Under macOS
  - Install XQuartz, which provides X11 support for macOS: https://www.xquartz.org/
- Under Linux
  - ssh and X11 support come with all distributions
For GUI applications we recommend the NoMachine Enterprise Client, available for Windows, Linux and macOS. It can be downloaded for free from https://www.nomachine.com/download-enterprise#NoMachine-Enterprise-Client. See https://www.nomachine.com/getting-started-with-nomachine and https://www.anleitungen.rrze.fau.de/hpc/dialogserver/ for further details on how to connect.

The workshop is a PRACE training event organized by LRZ in cooperation with Intel and RRZE.

About the Lecturers

Fabio Baruffa is a senior software application engineer at Intel. He provides customer support in the high-performance computing (HPC) area and for artificial intelligence software solutions at large scale. He collaborates with several research institutes in Europe to develop prototypes of quantum computing algorithm simulations running on current HPC systems. Prior to joining Intel, he worked as an HPC application specialist and developer at the largest supercomputing centers in Europe, mainly the Leibniz Supercomputing Center and the Max Planck Computing and Data Facility in Munich, as well as Cineca in Italy. He has been involved in software development, analysis of scientific code and optimization for HPC systems. He holds a PhD in Physics from the University of Regensburg for his research in spintronics devices and quantum computing.

Gennady Fedorov is a Technical Consulting Engineer supporting technical and Intel Performance Libraries (IPP, MKL and DAAL) within the Intel Architecture, Graphics and Software Group at Intel in Russia. His focus areas are Image Processing, Crypto, Compression Techniques, High Performance Computing and Artificial Intelligence.

Thomas Gruber (né Röhl) gained experience with all kinds of clustering approaches during his apprenticeship at the Erlangen Regional Computing Center (RRZE), the IT service provider for the Friedrich-Alexander-University Erlangen-Nuernberg (FAU). Afterwards, he studied Computer Science at RWTH Aachen University with an emphasis on parallel programming and operating system kernel development. At the same time, he worked as a research assistant for the HPC group of the RWTH IT center. After receiving his M.Sc. degree, he went back to RRZE to work for the HPC group. Thomas Gruber leads the development of the performance tool suite LIKWID, which comprises easy-to-use tools for hardware performance monitoring, affinity control and micro-benchmarking. He also works on projects involving monitoring and analysis of hardware performance data.

Carla Guillen works as a researcher in the application support group at the LRZ. She obtained her PhD in computer science at the Technische Universitaet Muenchen in 2015. She joined the LRZ in 2009 and has been working in the fields of system-wide performance monitoring and energy optimization of large-scale clusters.

Gerald Mathias has worked in application support for the HPC systems at LRZ since 2015 and leads the Biolab@LRZ. After his PhD in Computational Biophysics at the LMU Munich, he joined the chair of Theoretical Chemistry at the RUB in Bochum as a postdoc. He is experienced in the development and optimization of highly parallel ab initio and force-field-based molecular dynamics codes, both in Fortran and C.

Michael Steyer is a Technical Consulting Engineer supporting technical and High Performance Computing segments within the Intel Architecture, Graphics and Software Group at Intel in Germany. His focus areas are High Performance Computing and Artificial Intelligence.

Igor Vorobtsov has more than 11 years of experience in the areas of C/C++ and Fortran compilers, application tuning and developer support. Igor holds a Master of Science degree in Applied Mathematics. Since joining Intel in 2008, Igor has worked as a Technical Consulting Engineer supporting software developers throughout the EMEA region. Igor has a broad array of application experience, including enterprise applications and high performance computing environments.

Prerequisites:

Attendees should be comfortable with either the C/C++ or the Fortran programming language and with basic Linux commands, such as make and ssh. No previous experience with vectorization, parallelization or profiling tools is required.

Content Level:

The content level of the course is broken down as:

Beginner's content:	3.9 h	20%
Intermediate content:	7.8 h	40%
Advanced content:	7.8 h	40%
Community-targeted content:	0.0 h	0%

Language:

English

Teachers:

Fabio Baruffa (Intel), Gennady Fedorov (Intel), Gerald Mathias (LRZ), Thomas Gruber (RRZE), Carla Guillen (LRZ), Michael Steyer (Intel), Igor Vorobtsov (Intel)

Assistant:

Momme Allalen (LRZ)

PRACE-PAGE:

https://events.prace-ri.eu/event/1003/

Registration:

https://events.prace-ri.eu/event/1003/registrations/730/

Fee:

This course is a PRACE Advanced Training Center event. Therefore, the course is free of charge for all students and researchers from the EU or from PRACE member countries.

Contact:

Dr. Volker Weinberg (LRZ)
