Most HPC systems are clusters of shared memory nodes. These SMP nodes range from small multi-core CPUs to large many-core CPUs. Parallel programming may combine distributed memory parallelization on the node interconnect (e.g., with MPI) with shared memory parallelization inside each node (e.g., with OpenMP or MPI-3.0 shared memory). This course analyzes the strengths and weaknesses of several parallel programming models on clusters of SMP nodes. Multi-socket, multi-core systems in highly parallel environments are given special consideration. MPI-3.0 introduced a new shared memory programming interface, which can be combined with inter-node MPI communication. It can be used for direct neighbor accesses similar to OpenMP or for direct halo copies, and it enables new hybrid programming models. These models are compared with various hybrid MPI+OpenMP approaches and with pure MPI. Numerous case studies and micro-benchmarks demonstrate the performance-related aspects of hybrid programming.

Tools for hybrid programming, such as thread/process placement support and performance analysis, are presented in a "how-to" section. Hands-on exercises give attendees the opportunity to try the new MPI shared memory interface and explore some pitfalls of hybrid MPI+OpenMP programming. This course provides scientific training in Computational Science and, in addition, fosters scientific exchange among the participants.

Dr. Georg Hager (RRZE)
Georg Hager holds a Ph.D. in Computational Physics from the University of Greifswald. He is a senior researcher in the HPC Services group at Erlangen Regional Computing Center (RRZE) at the University of Erlangen-Nuremberg. Recent research includes architecture-specific optimization strategies for current microprocessors, performance engineering of scientific codes on chip and system levels, and special topics in shared memory and hybrid programming. His daily work encompasses all aspects of user support in High Performance Computing, such as tutorials and training, code parallelization, profiling and optimization, and the assessment of novel computer architectures and tools. His textbook “Introduction to High Performance Computing for Scientists and Engineers” is recommended or required reading in many HPC-related lectures and courses worldwide. In his teaching activities he puts a strong focus on performance modeling techniques that lead to a better understanding of the interaction of program code with the hardware.


Dr. Rolf Rabenseifner (HLRS)
Rolf Rabenseifner is head of Parallel Computing - Training and Application Services at HLRS. In workshops and summer schools he teaches parallel programming models at many universities and labs. Since 1996, he has been a member of the MPI-2 Forum, and since Dec. 2007 he has been on the steering committee of the MPI-3 Forum. He was responsible for the new MPI-2.1 standard and in charge of the development of the new MPI-3 Fortran interface. In January 2012, the Gauss Centre for Supercomputing (GCS), with HLRS, LRZ in Garching and the Jülich Supercomputing Centre as members, was selected as one of six PRACE Advanced Training Centers (PATCs), and he was appointed as GCS' PATC director.


10:00 Welcome
10:05 Motivation
10:15 Introduction
10:45 Programming Models: Pure MPI

11:15 Coffee Break

11:35 MPI + MPI-3.0 Shared Memory (Talk + 2 Practicals) 

13:10 Lunch 

14:10 MPI + OpenMP

15:10 Coffee Break

15:30 MPI + OpenMP continued (Talk + Practical)
16:00 MPI + Accelerators
16:15 Tools
16:25 Conclusions
16:45 Q&A
17:00 End

MPI+X - Hybrid Programming on Modern Compute Clusters with Multicore Processors and Accelerators

Exercise Sheet

The exercises on MPI shared memory can be found in MPI.tar.gz as described in

 i.e., in the file

 and there in the subdirectories MPI/course/*/1sided with

 * = C    for the C interface,

 * = F_20 for the old Fortran mpi module, and

 * = F_30 for the new Fortran mpi_f08 module.

The second exercise block, on hybrid MPI+OpenMP, can be found in

with pure MPI code in the C and Fortran directories and the hybrid MPI+OpenMP version in the solution subdirectories.


