Development Environment for Linux-based HPC systems at LRZ

This document gives an overview of the most important development tools and points to a troubleshooting guide for frequently encountered problems.

Programming Environment

Using Modules to handle environment settings

LRZ uses the modules approach to manage the user environment for different software, library or compiler versions. The distinct advantage of the modules approach is that the user is no longer required to explicitly specify paths for different executable versions, library versions and locations of other entities needed for the execution environment.

Operating environment

All nodes in the cluster run a GNU/Linux based operating environment. Depending on which segment is used, a different release of the operating system may be deployed.

Software | Parallel segments (UV), EM64T and Opteron | Altix, ICE
Distribution | SUSE Linux Enterprise Server SLES 11 SP1 | SLES 10 SP3
Kernel | 2.6.32 | 2.6.16
libc | glibc 2.11.1 | glibc 2.4
gcc release | See the LRZ gcc document for details
X Window System (Xorg-X11) | 7.4 | 6.9
GNU | all available packages
TSM Backup Client | 6.2 (not available on all systems)
Batch queuing | SGE (Sun Grid Engine)

Development Tools and Libraries


Activity | Tools (Linux versions)
Source code development | Editors: vi, emacs, etc.
Executable creation | icc, icpc, ifort, gcc, gfortran, g95, pgf90, pgcc
Library | Archiver
Object file inspection | Object tools
Debugging | gdb, idb, totalview, ddd, DDT, Intel Inspector
Performance analysis | perfmon, histx, VTune, Intel Tracing Tools (aka VAMPIR)
Environment configuration | modules, embedded Tcl

For a listing of available software packages, see the Linux Cluster Software page (under the headings "development tools", "mathematical libraries", and "parallelization").

Eclipse (integrated development environment)

Eclipse was designed as an integrated development environment (IDE). The Eclipse CDT installed at LRZ provides an IDE for C/C++ development while retaining the generic support for developing Java applications and Eclipse plug-ins (Eclipse itself is written in Java).

Main features of the Eclipse CDT include:

  • C/C++ Editor (basic functionality, syntax highlighting, code completion etc.)
  • C/C++ Debugger (using GDB)
  • C/C++ Launcher (APIs & default implementation, launches an external application)
  • Search Engine
  • Content Assist Provider
  • Makefile generator
  • Graphical CVS management

There are plans at LRZ to install and support additional Eclipse toolkits as these become available and/or stable enough to be recommended for general use. Among these are:

  • Photran, which provides an IDE for Fortran 90, 95 and possibly later standards. It is currently in beta state, was found to be somewhat unstable on the LRZ 32-bit Linux cluster, and is not yet available for IA64 platforms. It will eventually be integrated with CDT.
  • VTune 8.1+ with full Eclipse support on IA64 platforms, hopefully including the threading tools.

Compilers and Parallel Programming

Intel Compilers and Performance Libraries

LRZ recommends the Intel Compilers and Performance Libraries as the first choice; licensing and support agreements with Intel ensure that bug fixes should become available within a reasonably short time.

PGI Compilers

Some commercial packages still require availability of the Portland Group compiler suite; furthermore, for Opteron-based (AMD x86_64) systems this compiler may well still be the best choice. Finally, High Performance Fortran is also supported by the Fortran compiler. Hence, this package is still available on the Linux Cluster, and is also licensable in the Munich Campus area.

In order to use the PGI compiler on the interactive nodes of the Linux Cluster, please load the environment module pgi; also note that this product is not available on Itanium-based nodes. Support for the PGI compiler is limited: the LRZ HPC team will report bugs to PGI, but only with low priority.

Documentation for the PGI compilers is available from the PGI web site.


Programs parallelized with MPI can of course run on the Linux cluster. Please refer to the introductory document for further details on available MPI flavours and their handling.


The multi-threading capabilities of Intel SMP systems and SGI's ccNUMA systems can be used via the OpenMP implementations of the PGI and Intel compilers. Examples of OpenMP programming in Fortran, as well as general information about OpenMP, can be found in the LRZ OpenMP introduction.
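The following is a minimal C sketch of an OpenMP work-sharing loop with a reduction. It assumes an OpenMP-capable compiler switch (for example -openmp for Intel, -mp for PGI, or -fopenmp for GCC). The function names are hypothetical. Without OpenMP the pragma is ignored and the same code runs serially, so the result does not depend on how it was built.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical example: sum of x[0..n-1]. With OpenMP enabled, the loop
   iterations are shared among threads and the private partial sums are
   combined by the reduction clause; otherwise the loop runs serially. */
double parallel_sum(const double *x, long n) {
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (long i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Number of threads OpenMP would use (1 when built without OpenMP). */
int max_threads(void) {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```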

An Overview of Compiler Functionality

The following sections summarize the most commonly used compiler switches and extensions of the PGI and Intel compilers, starting with the available optimization switches. If you experience difficulties, you may have to switch some of them off again, one at a time.

Please also consult the tuning document for further details and tools for optimization.

Optimization Options for x86_64 compatible processors

Intel | PGI | Meaning | Comments
-O3 -xP -align -scalar_rep -prefetch | -fastsse -tp p7-64 | maximum optimization for the Pentium 4 Nocona (and later) processor with SSE3 and 64-bit extensions | Note that SSE* will only work for properly aligned data. On an AMD Opteron, Intel's -xP may not work; specify -xW instead.
-opt-streaming-stores [always|never|auto] (Intel version 10 and higher; otherwise use the source code directive) | -Mnontemporal | | Some programs may slow down with -fastsse due to the prefetches used. Adding -Mnontemporal offers a different data movement scheme which may improve performance; worth a try during code tuning. It may be especially useful for memory-bound code, since it supports cache bypass for streaming writes.
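SSE code generated by these switches depends on properly aligned data, so a C program can request aligned storage explicitly. The sketch below uses the C11 function aligned_alloc. The helper names and the example alignments (16 bytes is the width of an SSE register) are assumptions for illustration, not LRZ recommendations.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n doubles on an `alignment`-byte boundary
   (alignment must be a power of two). C11 aligned_alloc expects the size
   to be a multiple of the alignment, so the byte count is rounded up.
   Release the memory with free(). */
double *alloc_aligned_doubles(size_t n, size_t alignment) {
    size_t bytes = n * sizeof(double);
    bytes = (bytes + alignment - 1) / alignment * alignment;
    return aligned_alloc(alignment, bytes);
}

/* Hypothetical helper: nonzero if p lies on an `alignment`-byte boundary. */
int is_aligned(const void *p, size_t alignment) {
    return ((uintptr_t)p % alignment) == 0;
}
```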

Optimization Options for Intel Itanium processors

Intel | Open64 (IA64) | Meaning | Comments
-O3 -tpp2 | -O3 | Enables -O2 optimizations plus more aggressive optimizations (including -ftz) | -tpp2 is presently the default setting, i.e., tuning is performed for the Itanium2. Note that not all codes will show increased performance using -O3, so we recommend checking against -O2.
-fast | not available | Maximizes speed across the entire program by combining -O3 -ipo -static | Do not use for parallel programs or with SCSL on Altix.
-ftz | (default) | Denormal results are set to zero (switch off: -ftz-); no IEEE exception is raised | This gives better performance at the cost of IEEE conformity by suppressing the FP assist. Recommended for HPC production code after checking numerical stability. See also the -fpe0, -fpe1 and -fpe3 options in the ifort man page for the handling of other exceptions, especially underflow.
-mp, -mp1 | n/a | Maintain floating-point precision | These options restrict optimization to maintain declared precision and to ensure that floating-point arithmetic conforms more closely to the ANSI and IEEE standards. They adversely affect performance, the impact being less for -mp1 than for -mp.
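To see what -ftz changes, the following C sketch produces a denormal (subnormal) number by halving DBL_MIN, the smallest normal double. Under strict IEEE semantics the result is a tiny nonzero subnormal value. With flush-to-zero in effect it would become 0.0 instead. The function names are hypothetical, and the sketch assumes it is compiled without any flush-to-zero switch.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: nonzero if x is a subnormal (denormal) double. */
int is_subnormal(double x) {
    return fpclassify(x) == FP_SUBNORMAL;
}

/* Hypothetical helper: a result just below the normal range. The volatile
   qualifier keeps the compiler from folding the division at compile time,
   so the division is performed at run time under the current FP mode. */
double below_normal_range(void) {
    volatile double tiny = DBL_MIN;
    return tiny / 2.0;
}
```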

Options for Code Transformations, Aliasing and Interprocedural Optimization

Intel | PGI | Open64 (IA64) | Meaning | Comments
-fno-alias (-fno-fnalias) | n/a | n/a | Assume no aliasing (within functions) | This may give a considerable performance increase. Beware: check your code yourself for pointer aliasing!
-unroll[<number>] | -Munroll[=n:<number>] | -funroll-loops, -funroll-all-loops | Unroll loops | <number> (optional) gives the maximum number of times for unrolling; 0 disables unrolling, and omitting it enables compiler heuristics for unrolling. For the Intel compiler you can instead use a source code directive in your code, which might be more useful (see the example below this table).
-ip | -Minline[=option[,option,...]] | -finline-functions | Enables interprocedural optimizations for single-file compilation and performs inline expansion of calls to functions defined within the current source file | For Intel compilers, you can disable the full/partial inlining enabled by this option by also specifying -ip_no_inlining/-ip_no_pinlining. For the PGI compiler, please consult the man page and user's guide for more information on inlining.
-ipo | -Minline and -Mextract with suboptions | n/a | Enables multifile interprocedural (IP) optimizations (between files) and performs inline expansion of calls to functions defined in separate files | For the Intel compiler, a set of source files must be specified as an argument. For the PGI compiler, an inline library must be explicitly created.

The Intel unroll directive is placed immediately before the loop it applies to:

    !DEC$ UNROLL(<number>)
          do i = 1, imax
             ...
          end do
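A similar promise to the one -fno-alias makes for a whole compilation can be expressed per pointer in C99 with the restrict qualifier. In the sketch below (hypothetical function names), add_one must give correct results even for overlapping arrays, so the compiler has to preserve the sequential order of loads and stores. add_one_noalias promises that dst and src never overlap. That permits more aggressive optimization, but an overlapping call becomes undefined behaviour, which is exactly the risk the table warns about.

```c
#include <stddef.h>

/* Hypothetical example: dst[i] = src[i] + 1.0. dst and src may overlap,
   so every store must be visible to the loads of later iterations. */
void add_one(double *dst, const double *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + 1.0;
}

/* Same loop, but restrict promises that dst and src never overlap;
   calling it with overlapping arrays is undefined behaviour. */
void add_one_noalias(double *restrict dst, const double *restrict src,
                     size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + 1.0;
}
```

For example, with double a[4] = {0, 0, 0, 0}, the overlapping call add_one(a + 1, a, 3) leaves a as {0, 1, 2, 3}, because each iteration reads the value stored by the previous one.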

Linkage Options

Intel | PGI | Open64 (IA64) | Meaning | Comments
-static | -Bstatic | -Wl,-Bstatic | force static linkage | Recommended if the binary is to be run on a machine where the compiler is not installed. Considerably increases executable size!
-[no-]heap-arrays | | | Allocate automatic arrays on the heap (Fortran) | The default is to allocate them on the stack, which may lead to trouble with low stack limits.
-auto | | | Direct all local variables to be automatic (Fortran) |
 | | | compile only, do not link | This follows conventional usage.
n/a | -g77libs | n/a | add GNU Fortran libraries | Needed if g77-built objects are to be linked correctly. The Intel compiler does not support this.
 | | | look for libraries in dir as well | This follows conventional usage.
 | | | link with library libmylib.{a|so} | This follows conventional usage.
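The stack-versus-heap issue behind -heap-arrays also exists in C. Large automatic arrays live on the stack and can exceed the stack limit, while memory obtained with malloc does not depend on it. The sketch below (hypothetical helper names) reads the soft stack limit with the POSIX getrlimit call and keeps a large work array on the heap instead.

```c
#include <stdlib.h>
#include <sys/resource.h>

/* Hypothetical helper: soft stack limit in bytes, or 0 if unlimited or
   unknown. Automatic arrays larger than this cannot fit on the stack. */
unsigned long long stack_soft_limit(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return 0;
    return (unsigned long long)rl.rlim_cur;
}

/* Hypothetical example: a large work array on the heap, independent of
   the stack limit. Returns the sum of n ones, or -1.0 if malloc fails. */
double heap_work_sum(size_t n) {
    double *work = malloc(n * sizeof *work);
    if (work == NULL)
        return -1.0;
    for (size_t i = 0; i < n; i++)
        work[i] = 1.0;
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += work[i];
    free(work);
    return s;
}
```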

Source format and Preprocessing

Intel | PGI | Open64 (IA64) | Meaning | Comments
-FI or -fixed [-132] | -Mfixed | -fixedform | fixed-format source code [with possibly extended width] | The source file extension .f (Intel: also .ftn, .for) automatically implies fixed form.
-FR or -free | -Mfree | -freeform | free-format source code | The source file extension .f90 automatically implies free form.
-fpp [-F output_file] | -F | -E -o output_file | invoke the preprocessor (C-style includes) | Intel compiler: the optional -F switch puts the preprocessing results into output_file. Open64 compiler: the -o switch is required to write the preprocessed source to output_file. PGI compiler: the source file must have the extension .F; the output is put into a matching file with the extension .f.
-Dname[=value] (all compilers) | | | define a preprocessor macro | This follows conventional usage.
-Idir (all compilers) | | | look for include files in dir as well | This follows conventional usage.
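The -Dname[=value] switch works the same way in C as in preprocessed Fortran. In this sketch, VECTOR_LEN and DEBUG are hypothetical macro names. Compiling with -DVECTOR_LEN=16 would override the default of 8, and -DDEBUG would turn on the diagnostic output.

```c
#include <stdio.h>

/* Default used when the build does not pass -DVECTOR_LEN=<value>
   (VECTOR_LEN is a hypothetical name chosen for this example). */
#ifndef VECTOR_LEN
#define VECTOR_LEN 8
#endif

/* With -DDEBUG, DEBUG_PRINT writes to stderr; otherwise it does nothing. */
#ifdef DEBUG
#define DEBUG_PRINT(msg) fprintf(stderr, "debug: %s\n", (msg))
#else
#define DEBUG_PRINT(msg) ((void)0)
#endif

/* Hypothetical function reporting the compile-time setting. */
int vector_len(void) {
    DEBUG_PRINT("vector_len called");
    return VECTOR_LEN;
}
```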

Options for Data and I/O

Intel | PGI | Open64 (IA64) | Meaning | Comments
-i{2|4|8} (all compilers) | | | INTEGER and LOGICAL types of unspecified KIND use the indicated number of bytes | Default value is 4; -i2 is not available for Open64.
-r{4|8|16} | -Mr8 | -r{4|8} | REAL types of unspecified KIND use the indicated number of bytes | Default value is 4. A value of 8 changes all REAL variables to DOUBLE PRECISION. For the PGI compilers, only promotion from 4-byte to 8-byte REAL is available.
(default setting) | -Mlfs | | Enable I/O on files larger than 2 gigabytes | The Intel compilers automatically support large files.
Controlled via a run-time environment option; see the section on Big Endian I/O in the Troubleshooting document | -Mbyteswapio | (probably not available; for Itanium, try using -mbig-endian, default is -mlittle-endian on Linux) | Do unformatted I/O in big endian instead of little endian | PGI compiler: should enable you to read and write data compatible with Sun and SGI platforms.
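The -Mbyteswapio switch and the Intel run-time option perform a byte-order conversion on unformatted I/O. The C sketch below shows that conversion for a single 32-bit integer. The function names are hypothetical, and the sketch does not model the record markers that Fortran adds to sequential unformatted files.

```c
#include <stdint.h>

/* Hypothetical helper: store v in big-endian order (most significant byte
   first), independent of the host's native byte order. */
void put_be32(unsigned char out[4], uint32_t v) {
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Hypothetical helper: read a big-endian 32-bit value back. */
uint32_t get_be32(const unsigned char in[4]) {
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```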

Diagnostics, Runtime Checking and Debugging

Option | Meaning | Comments
-g (all compilers) | Include symbols for debugging | Use idb or Totalview to debug, or pgdbg for PGI-compiled binaries.
-check all (Intel) | run-time checks | This option applies to Fortran compilers only. The argument "all" switches on all available checks; it can be replaced by one of the keywords listed below this table.
-C (g77 had -ffortran-bounds-check) | run-time checking | Full checking may incur a large performance penalty.
-opt-report -opt-report-level[min|max] (Intel; n/a for Open64) | generate optimization report | The Intel compiler writes the report to stderr.
-list (Intel), -Mlist (PGI); n/a for Open64 | provide source listing | The Intel compiler writes the source listing to stdout, while the PGI compiler produces a file myprog.lst from myprog.f.

Keywords accepted by -check instead of "all":

  • arg_temp_created: check for copy-in/copy-out of procedure arguments
  • bounds: run-time checks on array subscripts and substring references
  • format, output_conversion: run-time checks on formatted I/O
  • pointers: run-time checks on pointers and allocatables
  • uninit: run-time checks on uninitialized variables (except module globals)

Parallelization Options

Intel | PGI | Open64 (IA64) | Meaning | Comments
-openmp | -mp | n/a | generate multithreaded code from OpenMP directives in the source code | If used, this option must also be specified for linkage.
-openmp-stubs | n/a | n/a | compile OpenMP programs for serial mode; directives are ignored and a stub library for the OpenMP function calls is linked | If used, this option must also be specified for linkage.
-openmp-report[0|1|2] | n/a | n/a | diagnostic level for OpenMP parallelization |
-parallel | -Mconcur | n/a | perform (shared-memory) auto-parallelization | If used, this option must also be specified for linkage. Please refer to the PGI User's Guide, Section 3.1.2, for information on the -Mconcur suboptions.
-par-report[0|1|2] | n/a | n/a | diagnostic level for automatic parallelization |
-par-threshold{n} | n/a | n/a | set the threshold for auto-parallelization of loops | -par-threshold0: always parallelize; -par-threshold25: parallelize if the chance of a performance increase is 25%; -par-threshold75: parallelize if the chance of a performance increase is 75% (default); -par-threshold100: only parallelize if absolutely sure.

For the PGI compiler, the -Mconcur suboptions allow finer control of auto-parallelization.
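Auto-parallelization (-parallel, -Mconcur) can only split loops whose iterations are independent. The C sketch below (hypothetical function names) contrasts such a loop with one that carries a dependency from iteration to iteration. Whether a given compiler actually parallelizes the first loop depends on its own analysis, including whether it can rule out overlap between a and b; a report such as -par-report shows the compiler's decision.

```c
#include <stddef.h>

/* Independent iterations: a[i] depends only on b[i], so this loop is a
   candidate for auto-parallelization. */
void scale(double *a, const double *b, size_t n, double f) {
    for (size_t i = 0; i < n; i++)
        a[i] = f * b[i];
}

/* Loop-carried dependency: a[i] needs the a[i-1] computed in the previous
   iteration, so the loop cannot be parallelized as written. */
void prefix_sum(double *a, size_t n) {
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```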

Compiler Directives for the Intel compiler

The following table shows the source code directives supported by the Intel Fortran compiler to help with tuning or debugging applications. In fixed source form, the directive prefix can also be written with a "c" instead of the "!" in the first column.

Directive | Meaning
!dir$ ivdep | Ignore vector dependencies
!dir$ swp | Try to software-pipeline an inner loop
!dir$ noswp | Disable software pipelining
!dir$ loop count N | Software pipelining hint (expected trip count N)
!dir$ distribute point | Split a large loop
!dir$ unroll [(N)] | Unroll the inner loop N times; compiler heuristics are used if N is omitted
!dir$ nounroll | Do not unroll the loop
!dir$ prefetch A | Prefetch array A
!dir$ noprefetch A | Do not prefetch array A
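The unroll directive (and the -unroll switch) asks for the transformation that this C sketch performs by hand. The function is hypothetical. In practice the compiler performs the transformation, and the directive only fixes the unroll factor. A single accumulator is used so that the order of the floating-point additions stays identical to the plain loop.

```c
#include <stddef.h>

/* Hypothetical example: dot product with the loop body replicated four
   times, roughly what an unroll factor of 4 requests. The second loop
   handles the remaining n % 4 iterations. */
double dot_unrolled4(const double *x, const double *y, size_t n) {
    double s = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += x[i] * y[i];
        s += x[i + 1] * y[i + 1];
        s += x[i + 2] * y[i + 2];
        s += x[i + 3] * y[i + 3];
    }
    for (; i < n; i++)  /* remainder loop */
        s += x[i] * y[i];
    return s;
}
```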


Command Line Oriented Debuggers (CLI)

gdb (GCC) and idb (Intel compiler) are available on the LRZ Linux clusters.

These debuggers are command-line driven (CLI) and require some knowledge of their commands. Their functionality is powerful and they supply a wealth of commands; the learning curve for a successful debugging session is therefore steep.
The Intel debugger idb can be used in "gdb mode", which means that a user acquainted with the gdb commands can also use them with idb.
These debuggers are mainly used for debugging serial programs.

To lower the learning curve for the "casual user", idb also supplies a graphical interface, available via the command "idb -gui".

Debuggers with graphical interface (GUI)

  • DDD: Data Display Debugger, an open source project by A. Zeller (Uni Saarland)
  • DDT: Distributed Debugging Tool, a commercial product by Allinea Software
  • Totalview: a commercial product by Etnus

The GUI-driven debuggers offer a graphical user interface, so simple debugging sessions can be handled without extensive prior study of man pages and manuals.

  • DDD is a graphical frontend for gdb and makes its capabilities available through an intuitive graphical interface.
  • DDT and Totalview are advanced tools for more complex debugging, especially of parallel codes (MPI, OpenMP). They allow you to inspect data structures in the different threads of a parallel program, set global breakpoints, set breakpoints in individual threads, etc.
  • Totalview can also be used in CLI mode, whereas DDT is a pure GUI tool.

For such complex sessions, studying the respective documentation is unavoidable and is recommended before contacting the LRZ support staff.

Table of available debuggers and documentation

Debugger | Interface | Programming environment | Program type | Availability | Invocation | LRZ module | Documentation
gdb | CLI | c, c++, fortran, g77, gfortran (gcc 4.0) | serial | all Linux based machines | command line "gdb" | n/a | man page
idb | CLI, (GUI) | icc, ifort | serial | all LRZ managed Linux cluster machines | command line "idb", when module is loaded | module load intel_idb | man page
DDD | GUI | c, c++, g77 | serial | all LRZ managed Linux cluster machines | command line "ddd" | n/a | PDF (link: .....)
DDT | GUI | c, c++, g77, g95, icc, ifort | serial, parallel (MPICH, MPT) | all LRZ managed Linux cluster machines | command line "ddt", when module is loaded | module load ddt | PDF (link: .....)
Totalview | GUI, CLI | c, c++, g77, icc, ifort | serial, MPI (MPICH, MPT), OpenMP | all LRZ managed Linux cluster machines | command line "totalview" | module load totalview (loaded by default) | PDF (link: .....), HTML

Threading Tools

Threading tools allow you to perform correctness and performance checking on multi-threaded applications (running in shared memory). The parallelization may be based on POSIX or Linux threads, or on OpenMP. For OpenMP applications, the analysis requires the Intel compilers in combination with suitably chosen compiler switches. The following tools are available:

Intel Inspector
Inspector performs correctness checking on multi-threaded applications (running in shared memory).
Intel Amplifier
Intel Amplifier XE (formerly VTune) performs performance analysis on multi-threaded applications (running in shared memory). It also collects, analyzes, and displays hardware performance data, from the system-wide view down to a specific function, module, or instruction.
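As an illustration of the kind of defect a correctness checker such as Inspector looks for, the C sketch below increments a shared counter from an OpenMP loop. Without the atomic directive, the unsynchronized read-modify-write of counter would be a data race. The function name is hypothetical. When built without OpenMP, both pragmas are ignored and the loop runs serially.

```c
/* Hypothetical example: count loop iterations in a shared counter.
   "omp atomic" makes each increment indivisible; removing it would turn
   the update into a data race between threads. */
long count_iterations(long n) {
    long counter = 0;
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        #pragma omp atomic
        counter++;
    }
    return counter;
}
```

For a plain counter, a reduction(+:counter) clause on the loop would avoid the per-iteration synchronization cost; the atomic form is shown because it maps directly onto the race being discussed.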

Troubleshooting with the Intel, PGI and GNU Compilers

The troubleshooting section is being extended and has moved to its own web page.