Development Environment for the SuperMUC HPC system

This document gives an overview of the most important program development tools available on SuperMUC.

Programming Environment

Operating environment

All nodes of SuperMUC run a GNU/Linux operating environment based on SUSE Linux Enterprise Server.

All compute nodes are diskless, i.e. the OS image files are held in memory.

Important: Compilers cannot be used within batch jobs, even if the compiler modules are loaded and point to locations in a file system. Providing compilers and other development tools on the compute nodes would take up too much memory.

Login Shells

The following shells are supported as login shells, both on the login nodes and on the compute nodes:

  • bash, ksh, sh
  • csh, tcsh

Other shells such as zsh can be used on the login nodes but are not available on the compute nodes. Users of such a shell must select a supported shell for their batch jobs (via #@shell).

Using Modules to handle environment settings

LRZ uses the modules approach to manage the user environment for different software, library and compiler versions. Its main advantage is that users no longer have to specify paths to different executable and library versions, or the locations of other components of the execution environment, by hand.


Development Tools and Libraries

Overview

Development tools available on SuperMUC, by activity (tool type in parentheses):

  • Source code development (editors): vi, vim, emacs, etc.
  • Executable creation (compilers): icc, icpc, ifort, gcc, gfortran, g95, pgf90, pgcc
  • Parallel executable creation (compilers): mpif90, mpicc, mpiCC, provided by the modules for the IBM MPI and Intel MPI environments
  • Archiving (library archiver): ar
  • Object and executable file inspection (object tools): objdump, ldd
  • Debugging (debuggers): gdb, idb, Allinea DDT, Intel Inspector
  • Performance analysis (profilers, advisors): see Optimization and Tuning Tools
  • Automation (make): make, gmake
  • Environment configuration (modules, embedded Tcl): module

For details, see the SuperMUC Software page, which lists the available software packages.

Eclipse (integrated development environment)

Eclipse is an integrated development environment (IDE). The Eclipse CDT installed at LRZ provides an IDE for C/C++ development while keeping the generic support for developing Java applications and Eclipse plug-ins (Eclipse itself is written in Java).

Main features of the Eclipse CDT include:

  • C/C++ Editor (basic functionality, syntax highlighting, code completion etc.)
  • C/C++ Debugger (using GDB)
  • C/C++ Launcher (APIs and default implementation; launches an external application)
  • Search Engine
  • Content Assist Provider
  • Makefile generator
  • Graphical CVS management

Photran, an IDE for Fortran 90 and 95 (possibly also later standards), has been integrated with CDT. VTune Amplifier XE (formerly known as VTune) is also available with full Eclipse support.



Compilers and Parallel Programming

A complete list of all compilers and parallel programming libraries available on SuperMUC can be obtained with the following module commands:

module avail -c compilers
module avail -c parallel

These commands list all packages installed in LRZ's module classes "compilers" and "parallel".

Intel Compilers and Performance Libraries

Since SuperMUC is based on Intel's Sandy Bridge/Westmere technology, LRZ recommends the Intel compilers and performance libraries as the first choice. Licensing and support agreements with Intel ensure that bug fixes should become available within a reasonably short time.

GNU Compilers

We recommend using the Intel compilers on SuperMUC. Use the GCC compilers only if strict compatibility with gcc/gfortran is required.

PGI Compilers

Some commercial packages still require the Portland Group (PGI) compiler suite, and the PGI Fortran compiler also supports High Performance Fortran. For these reasons the PGI suite is available on SuperMUC. To use the PGI compilers on the SuperMUC login nodes, load the compiler modules ccomp/pgi and fortran/pgi. Note, however, that support for the PGI compilers is limited: the LRZ HPC team will report bugs to PGI, but only at low priority.

Documentation for the PGI compilers is available from the PGI web site.

MPI

Programs parallelized with MPI can of course run on SuperMUC. Please refer to the section Parallelization for general information about MPI on SuperMUC, and to the sections IBM MPI and POE and Intel MPI for specific information about the MPI flavors on SuperMUC. The MPI introductory document gives further details on MPI and on the other available MPI flavors and how to use them.

OpenMP

The multithreading capabilities of the Intel SMP nodes can be used via the OpenMP implementations of the Intel and PGI compilers. Examples of OpenMP programming in Fortran, as well as general information about OpenMP, can be found in the section "Parallelization: MPI, OpenMP and POE" and in the LRZ OpenMP introduction.
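The LRZ examples referenced above are written in Fortran. As a minimal C sketch of the same idea, the loop below distributes its iterations over threads and combines the per-thread partial sums with an OpenMP reduction clause. It assumes compilation with -openmp for the Intel compiler (or -fopenmp for GCC); without such a flag the pragma is ignored and the loop runs serially with the same result. The function names are hypothetical.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sum of i*i for i = 1..n. With OpenMP enabled, the iterations are split
   across threads and the partial sums are combined by the reduction clause.
   Without OpenMP the pragma is ignored and the loop runs serially. */
long long sum_of_squares(int n) {
    long long sum = 0;
#pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= n; ++i)
        sum += (long long)i * i;
    return sum;
}

/* Number of threads OpenMP would use for a parallel region; 1 in a serial build. */
int max_threads(void) {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

The number of threads is normally set at run time via the OMP_NUM_THREADS environment variable, so the same binary can be tested with different thread counts.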



An Overview of Compiler Functionality

The following sections discuss the most commonly used compiler switches and extensions implemented by the Intel, PGI and GNU compilers, including an overview of the available optimization switches. If you run into difficulties, you may have to switch some of these optimizations off again, one at a time.

Please also consult the tuning document for further details and tools for optimization.

Optimization Options for x86_64 processors

Each entry lists the Intel option first, followed by the PGI and gcc/gfortran equivalents where known.

  • Intel: -O[0-3] | PGI: t.b.d. | gcc, gfortran: t.b.d.
    Specifies the code optimization level: -O0 disables optimization, -O3 selects the highest optimization level (see the compiler documentation for details).
  • Intel: -fast | PGI: t.b.d. | gcc, gfortran: t.b.d.
    Maximizes speed across the entire program. Sets the options -ipo, -O3, -no-prec-div, -static and -xHost.
  • Intel: -xHost | PGI: t.b.d. | gcc, gfortran: t.b.d.
    Tells the compiler to generate instructions for the highest instruction set available on the processor of the compilation host. Host may be replaced by a specific instruction set such as AVX, AVX2, SSE4.2, SSE4.1, SSSE3, SSE3 or SSE2 (see the compiler documentation for the exact names and further details).
  • Intel: -opt-streaming-stores [always|never|auto], or the source code directive !DEC$ VECTOR NONTEMPORAL | PGI: -Mnontemporal
    Controls the use of streaming (nontemporal) stores. Some programs may slow down with the PGI option -fastsse because of the prefetches it uses; adding -Mnontemporal selects a different data movement scheme that may improve performance. Worth a try during code tuning, especially for memory-bound code, since it lets streaming writes bypass the cache.
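To illustrate where streaming stores can pay off, the C sketch below shows a STREAM-style triad loop whose output array is only written, never read back. It assumes the Intel C compiler's #pragma vector nontemporal as the C counterpart of the Fortran directive above; the pragma is guarded by the __INTEL_COMPILER macro so that other compilers simply build the plain loop. The function name triad is hypothetical.

```c
#include <stddef.h>

/* STREAM-style triad: a[i] = b[i] + s * c[i].
   Array a is write-only here, so for large, memory-bound arrays its stores
   can bypass the cache instead of evicting useful data. */
void triad(double *a, const double *b, const double *c, double s, size_t n) {
#ifdef __INTEL_COMPILER
#pragma vector nontemporal
#endif
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i] + s * c[i];
}
```

Instead of the pragma, the source can be left unchanged and the whole file compiled with -opt-streaming-stores always; measuring both variants on representative array sizes is the safest way to decide.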

Options for Code Transformations, Aliasing and Interprocedural Optimization

  • Intel: -fno-alias (-fno-fnalias) | PGI: n/a | gcc, gfortran: n/a
    Assume no aliasing in the program (-fno-fnalias: assume no aliasing within functions). This may give a considerable performance increase. Beware: check your code yourself for pointer aliasing!
  • Intel: -unroll[<number>] | PGI: -Munroll[=n:<number>] | gcc, gfortran: -funroll-loops, -funroll-all-loops
    Unroll loops. The optional <number> gives the maximum number of times a loop is unrolled; 0 disables unrolling, and omitting it lets the compiler heuristics decide. For the Intel compiler you can instead place a source code directive in front of the loop, which may be more useful:

      !DEC$ UNROLL(<number>)
      do i=1,imax
        ...
      end do

  • Intel: -ip | PGI: -Minline[=option[,option,...]] | gcc, gfortran: -finline-functions
    Enables interprocedural optimization for single-file compilation, including inline expansion of calls to functions defined in the current source file. For the Intel compilers, the full or partial inlining enabled by this option can be disabled by additionally specifying -ip_no_inlining or -ip_no_pinlining. For the PGI compiler, please see the man page and the user's guide for more information on inlining.
  • Intel: -ipo | PGI: -Minline and -Mextract with suboptions | gcc, gfortran: n/a
    Enables multifile interprocedural (IP) optimization between files, including inline expansion of calls to functions defined in separate files. For the Intel compiler, the set of source files must be specified as an argument; for the PGI compiler, an inline library must be created explicitly.
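The aliasing caveat for -fno-alias can be seen in a small C sketch (function names are hypothetical). In C99 the restrict qualifier makes the same no-aliasing promise for individual pointer arguments instead of the whole compilation. The second function makes no such promise, and the test calls it with overlapping arrays: each iteration then reads a value written by the previous one, which is exactly the kind of dependency a compiler assuming no aliasing is allowed to ignore.

```c
#include <stddef.h>

/* dst[i] += src[i]. The restrict qualifiers promise that dst and src never
   overlap, so the compiler may reorder and vectorize freely: a per-function
   version of what -fno-alias asserts globally. Calling this with overlapping
   arrays is undefined behaviour. */
void add_noalias(double *restrict dst, const double *restrict src, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] += src[i];
}

/* The same loop without restrict: the compiler must assume that dst and src
   may overlap and preserve the sequential order of reads and writes. */
void add_mayalias(double *dst, const double *src, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] += src[i];
}
```

With x = {1, 1, 1, 1}, the call add_mayalias(x + 1, x, 3) yields {1, 2, 3, 4}. Code that assumed no aliasing could instead load all source values before storing and produce {1, 2, 2, 2}, which is why overlapping arguments must be ruled out before using -fno-alias.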

Linkage Options

  • Intel: -static | PGI: -Bstatic | gcc, gfortran: -Wl,-Bstatic; -nonshared
    Force static linkage. Recommended if the binary is to be run on a machine where the compiler is not installed. Considerably increases the executable size!
  • Intel: -[no-]heap-arrays
    Allocate automatic arrays on the heap (Fortran). The default is to allocate them on the stack, which may lead to trouble with low stack limits.
  • Intel: -auto
    Make all local variables automatic (Fortran).
  • All compilers: -c
    Compile only, do not link. This follows conventional usage.
  • Intel: n/a | PGI: -g77libs | gcc, gfortran: n/a
    Add the GNU Fortran libraries. Needed if objects built with g77 are to be linked correctly. The Intel compiler does not support this.
  • All compilers: -Ldir
    Also search directory dir for libraries. This follows conventional usage.
  • All compilers: -lmylib
    Link with library libmylib.{a|so}. This follows conventional usage.
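The stack-versus-heap issue behind -heap-arrays has a direct C analogue, sketched below under the assumption that a function needs a large temporary work array (the function is hypothetical). A C99 variable-length array such as double work[n] lives on the stack like a Fortran automatic array and can crash the program once n exceeds the stack limit; allocating it with malloc places it on the heap instead, at the cost of an explicit free.

```c
#include <stdlib.h>

/* Mean of the squares 1^2 .. n^2, computed via a temporary work array.
   The work array is heap-allocated, so its size is limited by available
   memory rather than by the (often small) stack limit. */
double mean_of_squares(size_t n) {
    if (n == 0)
        return 0.0;
    double *work = malloc(n * sizeof *work);
    if (work == NULL)
        return -1.0;              /* allocation failed; results are never negative */
    for (size_t i = 0; i < n; ++i)
        work[i] = (double)(i + 1) * (double)(i + 1);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += work[i];
    free(work);
    return sum / (double)n;
}
```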

Source format and Preprocessing

  • Intel: -FI or -fixed [-72|-80|-132] | PGI: -Mfixed
    Fixed-format source code, optionally with extended line width. Source files with extension .f (Intel: also .ftn and .for) are automatically treated as fixed form.
  • Intel: -FR or -free | PGI: -Mfree
    Free-format source code. Source files with extension .f90 are automatically treated as free form.
  • Intel: -fpp [-P] | PGI: -F
    Invoke the preprocessor (C-style includes).
    Intel compiler: the optional -P switch writes the preprocessing result to an output file instead of compiling it.
    Open64 compiler: the -o switch is required to write the preprocessing result to an output file.
    PGI compiler: the source file must have the extension .F; the output is written to a file of the same name with extension .f.
  • All compilers: -Dname[=value]
    Define a preprocessor macro. This follows conventional usage.
  • All compilers: -Idir
    Also search directory dir for include files. This follows conventional usage.
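The effect of -Dname[=value] can be illustrated in C, where the preprocessor always runs (for Fortran sources, see the -fpp and -F entries above). In the sketch below, USE_SINGLE and NDIM are hypothetical macro names, not macros predefined by any compiler; compiling with, for example, icc -DUSE_SINGLE -DNDIM=2 -c prog.c would switch the floating-point type and the vector length without editing the source.

```c
/* Hypothetical configuration macros, set on the command line via -D.
   Without -DUSE_SINGLE the code uses double precision; without -DNDIM=...
   vectors have 3 components. */
#ifdef USE_SINGLE
typedef float real_t;
#else
typedef double real_t;
#endif

#ifndef NDIM
#define NDIM 3
#endif

/* Squared Euclidean norm of an NDIM-component vector. */
real_t norm2(const real_t x[NDIM]) {
    real_t s = 0;
    for (int d = 0; d < NDIM; ++d)
        s += x[d] * x[d];
    return s;
}
```

This is also a common C idiom for the precision switch that Fortran codes get from -r8 in the next section.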

Options for Data and I/O

  • -i{2|4|8}
    INTEGER and LOGICAL types of unspecified KIND use the indicated number of bytes. The default is 4; -i2 is not available for Open64.
  • Intel: -r{4|8|16} | PGI: -Mr8 | gcc, gfortran: -r{4|8}
    REAL types of unspecified KIND use the indicated number of bytes. The default is 4; a value of 8 changes all default REAL variables to DOUBLE PRECISION. The PGI compilers only support promotion from 4-byte to 8-byte REAL.
  • Intel: controlled via a run-time environment setting (see the section on Big Endian I/O in the Troubleshooting document) | PGI: -Mbyteswapio, -byteswapio | gcc, gfortran: -fconvert=big-endian (gfortran)
    Perform unformatted I/O in big endian instead of little endian byte order. PGI compiler: this should enable you to read and write data compatible with Sun and SGI platforms.
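Where byte order cannot be handled by a compiler switch, the conversion can also be done by hand. The C sketch below (function names hypothetical) shows what big-endian I/O amounts to for a 32-bit integer: either reverse the bytes of a value, or assemble the value from raw bytes in big-endian order. The second approach works regardless of the byte order of the host; SuperMUC's x86_64 processors are little endian.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value (little endian <-> big endian). */
uint32_t bswap32(uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Decode a big-endian 32-bit integer from 4 raw bytes, e.g. taken from a
   record written on a big-endian machine. Independent of host byte order. */
uint32_t read_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```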

Diagnostics, Runtime Checking and Debugging

  • All compilers: -g
    Include symbols for debugging. Use idb or Totalview to debug, or pgdbg for PGI-compiled binaries.
  • Intel: -check all | PGI: -C | gcc, gfortran: (g77 had -ffortran-bounds-check)
    Run-time checking. The Intel option applies to the Fortran compiler only. The argument "all" switches on all available checks; it can be replaced by:
      • arg_temp_created: check for copy-in/copy-out of procedure arguments
      • bounds: run-time checks on array subscripts and substring references
      • format, output_conversion: run-time checks on formatted I/O
      • pointers: run-time checks on pointers and allocatables
      • uninit: run-time checks on uninitialized variables (except module globals)
    Full checking may incur a large performance penalty.
  • Intel: -opt-report -opt-report-level[min|max] | gcc, gfortran: n/a
    Generate an optimization report. The Intel compiler writes the report to stderr.
  • Intel: -list | PGI: -Mlist | gcc, gfortran: n/a
    Provide a source listing. The Intel compiler writes the listing to stdout, while the PGI compiler produces a file myprog.lst from myprog.f.
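The table lists no C counterpart to -check bounds. As a rough, hand-written analogue, the sketch below checks every array index before use and aborts with a message on a violation, similar in spirit to the Fortran run-time check; the names checked_get, AT and sum_checked are hypothetical. It also shows where the cost of full checking comes from: an extra comparison and branch on every access.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Return a[i] after verifying 0 <= i < n; abort with a diagnostic otherwise. */
static double checked_get(const double *a, size_t n, size_t i,
                          const char *file, int line) {
    if (i >= n) {
        fprintf(stderr, "%s:%d: index %zu out of bounds [0, %zu)\n",
                file, line, i, n);
        abort();
    }
    return a[i];
}

/* Checked element access that records the call site in the diagnostic. */
#define AT(a, n, i) checked_get((a), (n), (i), __FILE__, __LINE__)

/* Sum an array with every access checked: one extra compare per element. */
double sum_checked(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += AT(a, n, i);
    return s;
}
```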

Parallelization and Vectorization Options

  • Intel: -openmp | PGI: -mp
    Generate multithreaded code from OpenMP directives in the source code. If used, this option must also be specified for linking.
  • Intel: -openmp-stubs | PGI: n/a
    Compile OpenMP programs in serial mode: the directives are ignored and a stub library for the OpenMP function calls is linked. If used, this option must also be specified for linking.
  • Intel: -openmp-report[0|1|2] | PGI: n/a
    Diagnostic level for OpenMP parallelization.
  • Intel: -parallel | PGI: -Mconcur[=option[,option]]
    Perform (shared-memory) auto-parallelization. If used, this option must also be specified for linking. For the PGI compiler, the -Mconcur suboptions allow finer control of autoparallelization; see the PGI User's Guide, Section 3.1.2.
  • Intel: -par-report[0|1|2] | PGI: n/a
    Diagnostic level for automatic parallelization.
  • Intel: -par-threshold[n] | PGI: n/a
    Set the threshold for autoparallelization of loops:
      -par-threshold0: always parallelize
      -par-threshold25: parallelize if the probability of a performance gain is 25%
      -par-threshold75: parallelize if the probability of a performance gain is 75% (default)
      -par-threshold100: only parallelize if a performance gain is certain
  • Intel: -vec | PGI: t.b.d.
    Enables or disables vectorization.
  • Intel: -simd | PGI: t.b.d.
    Enables or disables the SIMD vectorization feature of the compiler.
  • Intel: -vec-report[0-5] | PGI: t.b.d.
    Controls the diagnostic information reported by the vectorizer: 0 reports no diagnostic information; for the other levels, please consult the compiler documentation.
  • Intel: -vec-threshold[n] | PGI: t.b.d.
    Sets a threshold for the vectorization of loops:
      -vec-threshold0: always vectorize
      -vec-threshold75: vectorize if the probability of a performance gain is 75%
      -vec-threshold100: only vectorize if a performance gain is certain (default)
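Whether -parallel or the vectorizer can transform a loop depends on the dependencies between its iterations, which is what the -par-report and -vec-report diagnostics describe. The hypothetical C functions below contrast two cases: a loop with independent iterations, a candidate for both transformations provided c does not overlap a or b, and a running sum, where each iteration needs the result of the previous one and which compilers generally cannot parallelize or vectorize as written.

```c
#include <stddef.h>

/* Independent iterations: c[i] depends only on a[i] and b[i], so the loop
   can be split across threads or SIMD lanes (assuming no overlap of c
   with a or b). */
void vadd(double *c, const double *a, const double *b, size_t n) {
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

/* Loop-carried dependency: x[i] needs the already updated x[i-1],
   so the iterations must run in order. */
void prefix_sum(double *x, size_t n) {
    for (size_t i = 1; i < n; ++i)
        x[i] += x[i - 1];
}
```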

Compiler Directives for the Intel compiler

The following list shows the source code directives supported by the Intel Fortran compiler to help with tuning or debugging applications. Note that in fixed source format the "!" comment symbol in the first column must be replaced by a "c".

  • !DEC$ ivdep: ignore vector dependencies
  • !DEC$ loop count N: software pipelining hint (expected iteration count N)
  • !DEC$ distribute point: split a large loop
  • !DEC$ unroll(N): unroll the inner loop N times; compiler heuristics are used if N is omitted
  • !DEC$ nounroll: do not unroll the loop
  • !DEC$ prefetch A: prefetch array A
  • !DEC$ noprefetch A: do not prefetch array A
  • !DEC$ vector [CLAUSE]: vectorize the loop, where
      CLAUSE = { ALWAYS [ASSERT] | ALIGNED | UNALIGNED | TEMPORAL | NONTEMPORAL [(var1 [, var2]...)] }
    For further details, please see the compiler documentation.
  • !DEC$ novector: do not vectorize the loop
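The Intel C and C++ compilers accept a corresponding #pragma ivdep. The C sketch below (function name hypothetical) shows a typical use case: a scatter through an index array, where the compiler must assume that two iterations might update the same element. The pragma, guarded by __INTEL_COMPILER so that other compilers skip it, tells the compiler to ignore that assumed dependency; this is only correct if the caller guarantees that all indices are distinct.

```c
#include <stddef.h>

/* y[idx[i]] += x[i] for i = 0..n-1.
   Precondition for the ivdep hint: all idx[i] are distinct and in range.
   With duplicate indices, ignoring the dependency could give wrong results. */
void scatter_add(double *y, const int *idx, const double *x, size_t n) {
#ifdef __INTEL_COMPILER
#pragma ivdep
#endif
    for (size_t i = 0; i < n; ++i)
        y[idx[i]] += x[i];
}
```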



Debuggers

Debuggers with graphical Interface (GUI)

  • DDT: Distributed Debugging Tool, a commercial product by Allinea Software.
  • Totalview: a commercial product by Etnus.

Because the GUI-driven debuggers offer a graphical user interface, simple debugging sessions can be handled without extensive prior study of man pages and manuals.

  • DDT and Totalview are advanced tools for more complex debugging, especially of parallel codes (MPI, OpenMP). They allow you to inspect data structures in the different threads of a parallel program, set global breakpoints, set breakpoints in individual threads, etc.
  • DDT is the preferred debugger on SuperMUC, and the largest number of licences is available for it.
  • Totalview can also be used in CLI mode, whereas DDT is a pure GUI tool.

Available Debuggers and Documentation

Note that the environment variables given under Documentation (e.g. $TOTALVIEW_DOC) are set by the module command on the LRZ HPC systems.

Recommended debuggers:

  • DDT
    Interface: GUI | Compilers: g77, g95, icc, ifort | Programming model: serial, parallel (MPI, OpenMP) | LRZ module: module load ddt | Documentation: PDF ($DDT_DOC)
  • Totalview
    Interface: GUI, CLI | Compilers: g77, icc, ifort | Programming model: serial, parallel (MPI, OpenMP) | LRZ module: module load totalview | Documentation: PDF ($TOTALVIEW_DOC), HTML

Other debuggers such as gdb, idb or DDD are available, but they are hardly usable on the compute nodes.



Troubleshooting with the Intel, PGI and GNU Compilers

The troubleshooting section is being extended and has moved to its own web page.
