Intel Tracing Tools: Profiling of MPI programs
Short introduction to the Tracing Tools, used for performance analysis, tuning and debugging of parallel programs.
Introduction
The Intel Tracing Tools, comprising Trace Collector, Trace Analyzer and Message Checker, support the development and tuning of programs parallelized with the MPI message passing interface. These tools let you investigate the communication structure of your parallel program, and hence locate incorrect or inefficient MPI usage.
- Trace Collector provides an MPI tracing library which records tracing data during a typical program run; these data are written to disk in an efficient storage format for subsequent analysis.
- Trace Analyzer provides a GUI for analysis of the tracing data.
- Message Checker allows you to identify certain classes of bugs in your MPI-parallel algorithm.
Installation on LRZ HPC platforms
Installed versions
| Platform | Trace Collector | Trace Analyzer | Remarks |
|---|---|---|---|
| IA64 Linux cluster and Altix superclusters | 7.1, 7.2 | 7.1, 7.2 | available for SGI's MPT, MPICH (e.g., Parastation MPI), and Intel MPI |
| x86_64 Intel-based systems | 7.1 | 7.1 | available for Intel MPI |
Remarks on the various MPI flavors supported
- The SGI MPT version also supports tracing of SHMEM calls. These are grouped into a separate class.
- There are linkage problems when using tracing with MPICH installations. These appear to be due to changes in linker behavior introduced in SLES10. As a workaround, please add the linker switch --allow-multiple-definition (passed through the compiler wrapper as -Wl,--allow-multiple-definition). The bug has been reported and will hopefully be fixed in a future update.
- The Trace Collector (ITC) will not trace applications on non-Intel processors. However, the Trace Analyzer (ITA) can also be used on non-Intel systems to analyze existing trace files.
Usage
Initialization
Before using either Trace Collector or Trace Analyzer, load the appropriate environment module:
module load mpi_tracing
Note that this module sets up the tracing environment according to the currently loaded MPI module; not all available MPI environments are supported (see the table above for details). In particular, if you switch to an MPI environment other than the default, you must unload mpi_tracing and then reload it after the new MPI environment has been configured.
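The unload/reload order described above can be wrapped in a small shell function. This is a minimal sketch, assuming a shell in which the module command is available; the function name switch_mpi_with_tracing is hypothetical, and the module names are the ones used elsewhere on this page (which of them exist depends on the platform).

```shell
# Hypothetical helper: switch to another MPI flavor and re-initialize tracing.
# Module names are taken from this page; availability depends on the platform.
switch_mpi_with_tracing() {
    local new_mpi="$1"                        # e.g. mpi.intel
    module unload mpi_tracing                 # tracing setup depends on the MPI module, so drop it first
    module unload mpi.parastation mpi.altix   # remove the currently loaded MPI flavor
    module load "$new_mpi"                    # configure the new MPI environment ...
    module load mpi_tracing                   # ... and only then reload the tracing module
}
```

For example, switch_mpi_with_tracing mpi.intel replaces the default MPI module by Intel MPI and then re-initializes tracing for it.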
Tracefile Generation
As long as the program source is not changed (for example, to add explicit calls to the Trace Collector API), it is sufficient to relink the executable with tracing enabled; otherwise the sources must also be recompiled. In every case, however, you should use the MPI wrapper scripts; on all LRZ HPC platforms the following are supported:
- mpif77 -g -vtrace -c <further options> myprog.f for compilation of Fortran 77 programs
- mpif90 -g -vtrace -c <further options> myprog.f90 for compilation of Fortran 90/95 programs
- mpicc -g -vtrace -c <further options> myprog.c for compilation of C programs
- mpiCC -g -vtrace -c <further options> myprog.C for compilation of C++ programs
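A complete compile-and-relink for a Fortran 90 program might look as follows. This is a sketch only: the function name build_traced is hypothetical, and passing -vtrace again on the link line is an assumption based on the statement above that relinking is what brings in tracing.

```shell
# Hypothetical helper: compile and link one Fortran 90 source with tracing.
# Assumes the mpi_tracing module is loaded so that the wrapper accepts -vtrace.
build_traced() {
    local src="$1"              # e.g. myprog.f90
    local exe="$2"              # e.g. myprog.exe
    local obj="${src%.*}.o"     # myprog.f90 -> myprog.o
    # compile with debug information and tracing enabled
    mpif90 -g -vtrace -c "$src" || return 1
    # assumption: -vtrace is also needed at link time to pull in the tracing library
    mpif90 -g -vtrace -o "$exe" "$obj"
}
```

Usage: build_traced myprog.f90 myprog.exe.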
Automatic subroutine tracing
By default, only the MPI part of the program is resolved ("ungrouped") into the individual API calls. If you also wish to resolve subroutine calls, you need either to insert explicit API calls or to perform automatic subroutine instrumentation. Please note that the latter method may incur a much larger overhead than explicit API calls.
By compiler switch
Simply specify the -tcollect switch in addition to -vtrace, then recompile and relink your application.
By binary instrumentation
This is available for EM64T and Itanium-based applications, but is not supported for the SGI MPT used on Altix systems. It is recommended to use this functionality with Intel MPI.
- Perform
module unload mpi_tracing mpi.altix mpi.parastation
module load mpi.intel
module load mpi_tracing
and build your MPI application as usual (i.e., without extra switches for tracing).
- Run your application with the command line
mpiexec -n <No. of MPI tasks> itcpin --profile \
    --run -- ./myprog.exe <application-specific switches>
Note that the --profile switch instruments not only MPI calls but also your own subroutine calls, libc calls etc., and may considerably increase the size of trace files unless you take steps to filter out excess information. See section 3.5 of the ITC User's Reference (linked in the Documentation section below) for further switches usable with itcpin.
Configuration File
An arbitrarily named configuration file may contain a large number of entries which control tracing. Set the environment variable VT_CONFIG to the full path name of this file. Here is an example of the kinds of entries it may contain:
# Log file
LOGFILE-NAME myprog.stf
LOGFILE-FORMAT STF
# disable all MPI activity
ACTIVITY MPI OFF
# re-enable selected MPI calls (nonblocking sends/receives, waits and collectives)
SYMBOL MPI_WAITALL ON
SYMBOL MPI_IRECV ON
SYMBOL MPI_ISEND ON
SYMBOL MPI_BARRIER ON
SYMBOL MPI_ALLREDUCE ON
# enable all activities in the Application class
ACTIVITY Application ON
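One way to create such a file and set VT_CONFIG in a job script is sketched below. The function name write_vt_config and the file location passed to it are hypothetical choices; the entries are the example ones shown above.

```shell
# Hypothetical helper: write the example tracing configuration and export VT_CONFIG.
write_vt_config() {
    local conf="$1"                       # e.g. "$HOME/vt/myprog.conf"
    mkdir -p "$(dirname "$conf")"
    cat > "$conf" <<'EOF'
# Log file
LOGFILE-NAME myprog.stf
LOGFILE-FORMAT STF
# disable all MPI activity
ACTIVITY MPI OFF
# re-enable selected MPI calls
SYMBOL MPI_WAITALL ON
SYMBOL MPI_IRECV ON
SYMBOL MPI_ISEND ON
SYMBOL MPI_BARRIER ON
SYMBOL MPI_ALLREDUCE ON
# enable all activities in the Application class
ACTIVITY Application ON
EOF
    # VT_CONFIG must hold the full path name, so resolve the directory to an absolute path
    VT_CONFIG="$(cd "$(dirname "$conf")" && pwd)/$(basename "$conf")"
    export VT_CONFIG
}
```

Call it (e.g. write_vt_config "$HOME/vt/myprog.conf") before the mpiexec line so that the traced program picks up the configuration.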
LRZ specific configurations
The VT_FLUSH_PREFIX environment variable, which specifies the directory for intermediate trace data, is set by the mpi_tracing environment module to point to the high-bandwidth scratch file system. This prevents the /tmp file system from overflowing when large traces are recorded.
Using Control Calls
For finer-grained control over the tracing procedure, you can insert suitable subroutine calls into the program source code. For example, a call to
VT_traceoff()
will switch off tracing for the subsequent program execution flow, and
VT_traceon()
will switch tracing back on again. With
VT_begin(mark), VT_end(mark)
you can mark certain program regions. Since tracing usually incurs a performance overhead, it is recommended to use preprocessor macros so that these calls are only compiled in during the optimization phase; for a C/C++ program, for example:
#ifdef USE_VT
#include "VT.h"
#endif
/* ... */
#ifdef USE_VT
VT_traceoff();   /* stop tracing from this point on */
#endif
Note that the additional include file VT.h is required for C programs. For the above example, compile with the command
mpicc -vtrace -DUSE_VT -c myfoo.c -o myfoo.o
(the macro name USE_VT is arbitrary). This method is also applicable to Fortran programs if the file name extension .F is used, since the C preprocessor is then applied automatically before the actual compilation.
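If both a tracing build and a production build are needed, the choice of compile line can be made in a script. In this sketch the variable TRACE and the function name compile_foo are hypothetical; myfoo.c and USE_VT are the names from the example above.

```shell
# Hypothetical helper: compile myfoo.c with or without the VT control calls.
# TRACE=1 selects the tracing build; anything else selects the production build.
compile_foo() {
    if [ "${TRACE:-0}" = 1 ]; then
        # USE_VT defined: VT.h is included and the VT_* calls are compiled in
        mpicc -vtrace -DUSE_VT -c myfoo.c -o myfoo.o
    else
        # USE_VT undefined: the #ifdef USE_VT sections are skipped by the preprocessor
        mpicc -c myfoo.c -o myfoo.o
    fi
}
```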
For details of the Trace Collector API, please consult the documentation.
After your tracing run has completed, and assuming your program is named myprog, you will find a number of files myprog.stf*. These are analyzed together by issuing the command
traceanalyzer myprog.stf
For the time being, the previous command name vampir is also still available.
Message Checking
Error detection for MPI code is only supported with Intel MPI. Please perform the following steps:
- Load the module stack supporting MPI checking:
module unload mpi_tracing
module unload mpi.parastation mpi.altix # may need to unload further modules
module load mpi.intel
module load mpi_tracing
- Completely recompile your application with debug symbols switched on:
mpif90 -g -O2 -c foo.f90 ...
mpif90 -g -O2 -o myprog.exe myprog.f90 foo.o ...
The executable must be linked dynamically; this is necessary for the LD_PRELOAD mechanism described below to work.
- Run the program as follows:
mpiexec -genv LD_PRELOAD libVTmc.so -n [# of MPI tasks] ./myprog.exe
- The report of Message Checker is written to standard error. Please check all lines marked ERROR or WARNING. Since the program was compiled with debug symbols, source line information is also displayed, pinpointing the location of your bug.
Further environment variables can be specified with additional -genv clauses on the mpiexec line:
| Variable | Default Value | Meaning |
|---|---|---|
| VT_DEADLOCK_TIMEOUT | 60 | maximum interval to wait (in seconds) for deadlock detection |
| VT_DEADLOCK_WARNING | 300 | maximum interval to wait (in seconds) for deadlock warning |
| VT_CHECK_MAX_ERRORS | 1 | maximum number of errors before aborting |
| VT_CHECK_MAX_REPORTS | 0 (unlimited) | maximum number of reports before aborting |
This list is not complete; at run time, further settings are reported in the output lines marked INFO as well as in the file <program name>.prot. The latter also indicates which variables have been changed from their defaults.
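Combining the run command with some of the variables from the table, a Message Checker run might be scripted as below. This is a sketch: the function name run_msgcheck, the log file name msgcheck.log and the values 120 and 10 are arbitrary illustrative choices, and the program is assumed to be ./myprog.exe.

```shell
# Hypothetical helper: run Message Checker with a longer deadlock timeout and
# a higher error limit, then list the reported errors and warnings.
run_msgcheck() {
    local ntasks="$1"
    mpiexec -genv LD_PRELOAD libVTmc.so \
            -genv VT_DEADLOCK_TIMEOUT 120 \
            -genv VT_CHECK_MAX_ERRORS 10 \
            -n "$ntasks" ./myprog.exe 2> msgcheck.log
    # Message Checker writes its report to standard error, captured above
    grep -E 'ERROR|WARNING' msgcheck.log
}
```

Usage: run_msgcheck 4. The full report, including INFO lines, remains in msgcheck.log.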
Documentation
Manuals and Weblinks
- Trace Analyzer Reference Guide (PDF, 1.9 MB)
- Trace Collector Reference Guide (PDF, 0.9 MB)
- Frequently asked Questions for Trace Analyzer and Collector (PDF, 0.1 MB)
- More information on ITC and ITA may be found on the Intel Web Site.
Course Material
Within LRZ's HPC training courses, an ITA/ITC tutorial is usually provided. This includes information on
- setting up tracing runs
- programming the API
- hints on tracing configuration
- usage of the GUI
The lecture notes of the most recently held training course are available; note, however, that they may not always be fully up to date with the latest software release.
Troubleshooting
Please consult the appropriate sections in the Troubleshooting Document.