ALIs

kommt noch

gprof

Profiling allows you to learn where your program spent its time and which functions called which other functions while it was executing. This information helps you to see in which funcions most of the time is spent and thus are candidates for optimization efforts.

Profiling involves three steps:

  • You must compile and link your program with profiling enabled (-p, and if you want line information -g)
  • Execution of the program to generate a profile data file
  • Execution of a profiling program (like gprof) to analyze the profile data

Compilation

The first step in generating profile information is to compile and link the program with profiling enabled. For the Intel Compilers this is done by setting the options -g and -p.

Note: It is important that -p option is used not only for the compilation but also to link seperate precompiled object files:

ifort -pg -c x1.f
ifort -pg -c x2.f
ifort -o x.out -pg x1.o x2.o

Note: For the Intel compiler, the default optimization is -O2. However if -g is specified, the default optimization is -O0. The get the former default, -O2 has to be specified explicitely.
Execution

The execution of the program doesn't differ from the normal execution. The executable is invoked with the same parameters and filenames and should produce the same output. In addtition it generates an output file with profiling information with the name gmon.out in the current working directory.

Profile data analysis

The generated profile data in gmon.out are analysed with the gprof command. gprof is executed with the following parameters:

gprof options [executable-file [profile-data-files...]] [> outfile]

For a detailed discussion of all options please see the gprof manpage. In this documentation we only give some short examples using the most common options.
The default output consists of three parts: A flat profile, the call graph, and and alphabetical index of the routines.

The Flat Profile

Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 66.43      7.96     7.96        1     7.96     9.64  hamilt_
 11.15      9.30     1.34        1     1.34     1.34  outwin_
  8.14     10.28     0.98        1     0.98     1.07  hns_
  2.86     10.62     0.34  5266813     0.00     0.00  zheevx2_
  1.20     10.76     0.14                             __libm_error_support
  1.04     10.89     0.12                             zher2m_
  0.90     11.00     0.11                             cvtas_s_to_a
  0.75     11.09     0.09                             select_
  0.38     11.13     0.05                             cvt_ieee_t_to_text_ex
  0.37     11.18     0.04   445351     0.00     0.00  seclit_
...

The Flat profile contains the list of subroutines in the executable, sorted by the time used. It also gives the number of calls to the subroutine and the time spent in the routine itsself and the total time (including the subroutines of the given routine). This information is missing for routines not compiled with the -p option.

The Call Graph

the call graph shows how much time was spent in each function and its children. From this information, you can find functions that, while they themselves may not have used much time, called other functions that did use unusual amounts of time.

Here is a sample call from a small program. This call came from the same gprof run as the flat profile example in the previous chapter.

granularity: each sample hit covers 2 byte(s) for 20.00% of 0.05 seconds
index % time    self  children    called     name
                                                 <spontaneous>
[1]    100.0    0.00    0.05                 start [1]
                0.00    0.05       1/1           main [2]
                0.00    0.00       1/2           on_exit [28]
                0.00    0.00       1/1           exit [59]
-----------------------------------------------
                0.00    0.05       1/1           start [1]
[2]    100.0    0.00    0.05       1         main [2]
                0.00    0.05       1/1           report [3]
-----------------------------------------------
                0.00    0.05       1/1           main [2]
[3]    100.0    0.00    0.05       1         report [3]
                0.00    0.03       8/8           timelocal [6]
                0.00    0.01       1/1           print [9]
                0.00    0.01       9/9           fgets [12]
                0.00    0.00      12/34          strncmp <cycle 1> [40]
                0.00    0.00       8/8           lookup [20]
                0.00    0.00       1/1           fopen [21]
                0.00    0.00       8/8           chewtime [24]
                0.00    0.00       8/16          skipspace [44]

Limitations for MPI

You cannot use gprof  for MPI applications without special care. If no other means are provided, compile one task with -pg and the others wihout and run the application in MPMD mode, which results in just getting back one profile. in the next run examine another task.

IBM MPI:

The variable MP_PROFDIR allows you to specify the directory name into which POE/mpiexec stores the gmon.out file. Default: profdir.taskid

References and further information