Intel Tracing Tools: Profiling and Correctness Checking of MPI Programs
Short introduction to the Tracing Tools, used for performance analysis, tuning and debugging of MPI-parallel programs.
Table of contents
- Installation on LRZ HPC platforms
- MPI performance snapshots
The Intel Tracing Tools (ITAC), including Trace Collector, Trace Analyzer, and Message Checker, support the development and tuning of programs parallelized using the Message Passing Interface (MPI). These tools enable you to investigate the communication structure of your parallel program, and hence to isolate incorrect or inefficient MPI programming.
- Trace Collector provides an MPI tracing library which produces tracing data collected during a typical program run; these tracing data are written to disk in an efficient storage format for subsequent analysis.
- Trace Analyzer provides a GUI for the analysis of the tracing data.
- Message Checker allows you to identify certain classes of bugs in your MPI-parallel algorithm.
- MPI performance snapshots (MPS) provide a high-level overview of MPI execution via light-weight statistics.
Installation on LRZ HPC platforms
The software is available on all HPC systems operated by LRZ.
| Version | Preparation |
|---------|-------------|
| 2017 | load module mpi.intel/2017 first |
| 2018 | load module mpi.intel/2018 first |
Tracing is only supported for Intel MPI.
Before using either Trace Collector or Trace Analyzer, it is necessary to load the environment modules for a supported MPI implementation first, and then for the tracing software itself (itac):
module unload mpi.<whatever you're using>
module load mpi.intel
module load itac
Compiling for tracefile generation
As long as no changes are introduced to the program itself (for example, to explicitly call ITA routines), it is sufficient to relink the executable. In all other cases it is also necessary to recompile the sources. In either case, you should use the MPI compiler wrapper scripts to perform compilation.
mpif77 -g -trace -c <further options> myprog.f
for compilation of Fortran 77 programs
mpif90 -g -trace -c <further options> myprog.f90
for compilation of Fortran 90/95 programs
mpicc -g -trace -c <further options> myprog.c
for compilation of C programs
mpiCC -g -trace -c <further options> myprog.C
for compilation of C++ programs
Linkage is performed similarly; do not forget to add the -trace option there too. Also note that -g usually implies that optimization is turned off, so you should explicitly restore any desired optimization flags (e.g., -O2) on the command line.
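Putting the above together, a complete traced build of a Fortran program might look as follows. This is only a sketch: the file names and the -O2 level are placeholders, and the commands require the module environment described earlier.

```shell
# Compile with debug symbols, tracing hooks and explicit optimization
mpif90 -g -O2 -trace -c myprog.f90
# Link: the -trace switch is needed here as well
mpif90 -g -O2 -trace -o myprog.exe myprog.o
```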
Running the program
Startup of the program uses the standard mechanisms described in the MPI documentation; this usually requires (implicit) startup of a batch job. Take care to load the same sequence of environment modules as described above.
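As a sketch, the relevant part of a batch script could look like this (module names follow the examples above; the task count is a placeholder, and the mpiexec invocation may differ on your system):

```shell
module load mpi.intel itac
mpiexec -n 4 ./myprog.exe    # produces myprog.stf* trace files on completion
```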
Automatic subroutine tracing
By default, only the MPI part of the program can be resolved ("ungrouped") into the various API calls. If you also wish to resolve subroutine calls, you either need to make use of explicit API calls, or perform automatic subroutine instrumentation. Please note that the latter method may involve a much larger overhead compared to explicit API calls.
By compiler switch
Simply specify the -tcollect switch in addition to -trace, and recompile as well as relink your application.
By binary instrumentation
Perform the module loads as described above and build your MPI application as usual (i.e., without extra switches for tracing).
Run your application with the command line (you may need to suitably replace mpiexec if a batch script is used)
mpiexec -n <No. of MPI tasks> itcpin --profile --run -- ./myprog.exe <application flags>
Note that the --profile switch will perform instrumentation not only of MPI, but also of your own subroutine calls, libc calls etc., and may considerably increase the size of trace files unless you take steps to filter excess information. See section 3.5 of the ITC User's Reference (linked in the documentation subsection below) for further switches usable with itcpin. Please also note that itcpin support has been dropped for releases 9.0 and later.
An arbitrarily named configuration file may contain a large number of entries which control tracing execution. Set the environment variable VT_CONFIG to the full path name of this file. Here is an example of the kinds of entries it could contain:
# select the log file format
LOGFILE-FORMAT STF
# disable all MPI activity
ACTIVITY MPI OFF
# re-enable selected MPI calls
SYMBOL MPI_WAITALL ON
SYMBOL MPI_IRECV ON
SYMBOL MPI_ISEND ON
SYMBOL MPI_BARRIER ON
SYMBOL MPI_ALLREDUCE ON
# enable all activities in the Application class
ACTIVITY Application ON
Please check the user's guide for further settings, e.g. those that help limit the amount of generated trace data.
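For example, such a configuration file can be created and activated as follows (the file name vt.conf is arbitrary; the entries match the example above):

```shell
# Create an arbitrarily named configuration file for the Trace Collector
cat > vt.conf <<'EOF'
LOGFILE-FORMAT STF
ACTIVITY MPI OFF
SYMBOL MPI_ISEND ON
SYMBOL MPI_IRECV ON
SYMBOL MPI_WAITALL ON
ACTIVITY Application ON
EOF
# Point the Trace Collector at it; VT_CONFIG must hold the full path
export VT_CONFIG="$PWD/vt.conf"
echo "$VT_CONFIG"
```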
LRZ specific configurations
The VT_FLUSH_PREFIX environment variable, which denotes the path for the intermediate traces, is set by the mpi_tracing environment module to point at the high-bandwidth scratch file system. The rationale for this is to prevent the /tmp file system from overflowing when large traces are generated.
Using Control Calls
In order to obtain more fine-grained control over the tracing procedure, it is possible to insert suitable subroutine calls into the program source code. For example, a call of VT_traceoff() will switch off tracing for the subsequent program execution flow, and VT_traceon() will switch tracing back on again. With VT_begin() and VT_end() you can mark certain program regions. Since tracing will usually involve a performance overhead, it is recommended to use preprocessor macros to enable tracing only during the optimization phase; thus, for a C/C++ program:
#ifdef USE_VT
#include <VT.h>
  VT_traceoff();
#endif
Note that the additional include file VT.h must be included inside the guarded region. Compile with
mpicc -g -trace -DUSE_VT -o myfoo.o -c myfoo.c
(where the macro USE_VT is arbitrarily named). This method is also applicable to Fortran programs if the file name extension .F is used, which automatically applies the C preprocessor before the actual compilation process.
For details of the ITA API please consult the documentation.
Analyzing the Trace File
After execution of your tracing run, assuming your program's name is myprog, you will find a number of files myprog.stf*. These can be analyzed in summary by executing
traceanalyzer myprog.stf
Correctness checking with the Message Checker
Please perform the following steps:
Load the module stack described above.
Completely recompile your application with debug symbols switched on:
mpif90 -g -O2 -c foo.f90
mpif90 -g -O2 -o myprog.exe myprog.f90 foo.o ...
The program must be dynamically linked; this is necessary for the LD_PRELOAD mechanism described below to work.
Run the program as follows (you may need to suitably replace mpiexec if you use a batch script):
mpiexec -genv LD_PRELOAD libVTmc.so -n [# of MPI tasks] ./myprog.exe
The report of Message Checker is written to standard error. Please check all lines marked ERROR or WARNING. Due to compiling with debug symbols, line information will also be displayed, pinpointing the location of your bug.
Further environment variables can be specified with additional -genv clauses on the mpiexec line:
| Variable | Default | Meaning |
|----------|---------|---------|
| VT_DEADLOCK_TIMEOUT | 60 | maximum interval to wait (in seconds) for deadlock detection |
| VT_DEADLOCK_WARNING | 300 | maximum interval to wait (in seconds) for deadlock warning |
| VT_CHECK_MAX_ERRORS | 1 | maximum number of errors before aborting |
| VT_CHECK_MAX_REPORTS | 0 (unlimited) | maximum number of reports before aborting |
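For instance, to make deadlock detection trigger more quickly, the timeout can be lowered via an additional -genv clause. This is a sketch only; the task count and program name are placeholders, and the command requires the cluster environment described above:

```shell
mpiexec -genv LD_PRELOAD libVTmc.so -genv VT_DEADLOCK_TIMEOUT 20 \
        -n 4 ./myprog.exe
```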
This list is not complete; at run time, further settings are indicated in the output lines marked INFO, as well as in the file <program name>.prot. The latter also indicates which variables have been modified from their defaults.
MPI performance snapshots
This feature does not require recompilation of your application. In order to use it, you only need to take your executable built against Intel MPI and run it as follows (e.g., with 32 tasks):
module load mps
mpiexec -mps -n 32 ./myapplication.exe
As a result, you will find files named app_stat_<date>-<time>.txt and stats_<date>-<time>.txt in your current directory, which can be evaluated using the mps tool. For example,
mps ./app_stat_20160301-131433.txt ./stats_20160301-131433.txt
will write the statistics overview for your program run to standard output. The -h switch of mps provides information about additional options you can specify.
Manuals and Weblinks
- Trace Collector Reference Guide (PDF, 1.0 MB)
- Trace Analyzer Reference Guide (PDF, 2.0 MB)
- Frequently asked Questions for Trace Analyzer and Collector (PDF, 0.1 MB)
- More information on ITC and ITA may be found on the Intel Web Site.
- For use of MPS, have a look at the introductory tutorial and the MPI Performance Snapshot User's Guide.
Within LRZ's HPC training courses, an ITC/ITA tutorial is usually provided. This includes information on
- setting up tracing runs
- programming the API
- hints on tracing configuration
- usage of the GUI