PGAS Parallel Languages
Introduction
PGAS or Partitioned Global Address Space is a parallel programming model based on the following concepts:
- multiple execution contexts with separate address spaces, thereby allowing any given execution instance to exploit memory affinity properties of the underlying architecture for good application performance
- access to memory locations of one execution instance by other execution instances
The following programming language extensions implement this concept:
- Coarray Fortran
- Unified Parallel C (UPC)
- Chapel, the Cascade HPCS language under development by Cray
- Titanium, an explicitly parallel dialect of Java
- X10, an experimental HPCS language under development at IBM
- Fortress, originally developed under the guidance of Sun Microsystems, which also targets implicit parallelism
The target is to achieve the following advantages over other parallel paradigms:
- Integration of the type system. This greatly reduces the development and debugging effort for object-based parallel codes compared with, for example, the MPI library approach.
- One-sided communication semantics. An efficient implementation can fully exploit the hardware properties (latency and bandwidth) of the underlying system interconnect. Overlap of computation and communication is fully under the programmer's control.
- Optionally, control over memory affinity, which improves scalability at large scale compared with, for example, OpenMP shared data.
This document describes the PGAS facilities available on LRZ HPC systems. At present, these are partial and not fully optimized implementations; they are suitable for getting started with code development and for research on parallel algorithms.
Coarray Fortran
Please consult the coarray subdocument for more information.
Unified Parallel C
UPC is a PGAS parallel extension to the C standard.
Basic Usage on LRZ HPC systems
- The GCC-based version of UPC is available and supports parallelism within a shared memory system. Please load the environment module
module load upc/gcc
(Beware: this module replaces the default gcc with a version of its own; in particular, it cannot be loaded together with any gcc module.)
- The Intel C based version of the Berkeley UPC compiler can be used on SGI Altix systems by loading the module
module load upc/bupc_icc
A simple example program
#include <upc_relaxed.h>
#include <stdio.h>

int main(void) {
    /* every thread prints its own id and the total number of threads */
    printf("Hello World from THREAD %d (of %d THREADS)\n",
           MYTHREAD, THREADS);
    return 0;
}
stored in hello.upc can then be compiled under the dynamic translation environment by GCC UPC with the command
upc -o hello.exe hello.upc
The resulting executable can then be run with, for example, 4 threads via the command
./hello.exe -n 4 -sched-policy auto
producing its output lines in random order (since no explicit synchronization measures are provided to ensure a particular ordering).
For Berkeley UPC, the compilation is done with
upcc -o hello.exe hello.upc
and the executable is then run under the control of upcrun:
upcrun -n 4 ./hello.exe
Under the static translation environment, a fixed number of threads is specified at compile time. This allows more optimization, and the THREADS and MYTHREAD macros can then also be used to define other static quantities via the preprocessor. However, the resulting executable will then run only with that fixed number of threads. For GCC UPC, compilation is done via
upc -fupc-threads-4 -o hello.exe hello.upc
The resulting executable will then always run with 4 threads:
./hello.exe -sched-policy auto
For Berkeley UPC, compilation is done with
upcc -T=4 -o hello.exe hello.upc
Execution still requires the use of upcrun:
upcrun ./hello.exe
UPC Documentation
- The upc manual page (man upc) for GCC UPC
- The upcc manual page (man upcc) for Berkeley UPC
- HTML documentation for upcc (Berkeley UPC) is available in a subdirectory of $UPC_DOC, or on the Berkeley UPC web site
- UPC 1.2 specification
- UPC manual
- UPC extensions, in particular collectives
Chapel
The 1.01 preview of the reference implementation is available on some of the LRZ HPC systems. Please load the module
module load chapel
Then, the program
config const message = "Hello, world!",
             printLocaleName = true;

coforall loc in Locales {
  on loc {
    var myMessage = message + " (from locale " + here.id + " of " + numLocales;
    if (printLocaleName) then myMessage += " named " + loc.name;
    myMessage += ")";
    writeln(myMessage);
  }
}
stored in hello-multiloc.chpl can then be compiled with
chpl -o hello.exe hello-multiloc.chpl
and then executed with
mpirun -np 4 ./hello.exe -nl 4
Notes:
- Multi-locale execution is presently limited to one node. Please contact LRZ HPC support if you wish to do more than evaluation.
- On Altix systems, multi-locale execution emits a warning about unused SHMEM. Unfortunately, SHMEM support could not be built with the present release.
- During compilation, a warning is issued that speculation is not enabled; it can be ignored.