Troubleshooting: Tips and Tricks for using the LRZ HPC Systems

This document is a collection of hints which may help you to solve problems encountered when running programs on LRZ's HPC systems.

Questions on accessing the HPC systems

Am I allowed to use the LRZ cluster systems for my simulation work?

If you are involved in scientific work at an institute that is connected to a Munich University or a University in Bavaria, then in principle you can obtain an account. To this end, please follow the instructions given in the introduction document for the cluster usage.

Am I allowed to use SuperMUC for my simulation work?

If you are a scientist at a German Research Institution, you can apply for a project on the big system. Please follow the instructions given in the introduction document for supercomputing resources. Note that via this route it is also possible to obtain testing resources in an uncomplicated manner.

Do I need to pay for usage of the HPC systems?

No. All HPC systems in public use are jointly funded by the Federal Republic of Germany and the Free State of Bavaria. However, if you have special needs not covered by LRZ's basic HPC services (e.g., very large disk storage quotas), these are subject to a fee; please contact us for details and a quote.

Which of the two systems (Cluster or SuperMUC) should I apply for?

Assuming you are formally eligible for both systems: this depends on the CPU, memory and disk resources you need for your computation. Please consult the technical description of the supercomputer as well as the technical description of our clusters to see what fits your needs. If you do not know your simulation's requirements, it may be a good idea to start off with a cluster account. If you are not eligible for cluster usage, you can apply for a test project (see above). For usage of the big system, a typical requirement on your simulation will be scalability, i.e. the ability to use many (if possible more than a few thousand) processor cores efficiently.

How do I log in to the HPC systems?

If you have a valid account, you should be able to access the systems as described in the SuperMUC login document and the cluster login document, respectively. If you have trouble, please try logging into the ID portal to check the status of your account, or to do necessary password changes etc. Only if this does not work, contact the Service Desk.


Common Problems with the Compilers

This section covers the Intel, PGI and GNU compilers; the compiler concerned is pointed out in each subsection.

Why does my IA32 executable fail when static arrays are very large?

Note: This has become (mostly) obsolete with the migration to 64 bit.

Programs with more than roughly 1 GByte of static arrays tend to fail with segmentation faults at startup. On IA32 it should, however, be possible to use data up to the process memory limit of 2 GByte. At present this is only a problem with the g95 compiler on 32 bit systems, but some alternatives for other compilers remain documented below.

Workaround:

  1. Use static linkage: This is the recommended workaround. The static linkage flag is -Bstatic for the PGI compiler, and -static for the GNU and Intel compilers.

  2. Use dynamic allocation: Replace

    	DOUBLE PRECISION A(150000000)

    by

    	DOUBLE PRECISION, ALLOCATABLE :: A(:)
    	INTEGER :: ISTAT
    	... (further declarations)
    	ALLOCATE(A(150000000), STAT=ISTAT)
    	IF (ISTAT /= 0) THEN
    	   STOP 'ALLOCATION FAILURE'
    	END IF
    	... (further code, until A is not used any more)
    	DEALLOCATE(A, STAT=ISTAT)
    	IF (ISTAT /= 0) THEN
    	   STOP 'DEALLOCATION FAILURE'
    	END IF


    This is the right way to do things in Fortran 90, at least for newly developed code; you can now adjust A to the size really needed.

  3. Intel Compiler: Add a common statement to your array declaration as shown here

    	DOUBLE PRECISION A(150000000)
    	COMMON /COM_INTEL/ A
    	

    (more than one array may of course appear in the same common block) and specify the -Qdyncom"COM_INTEL" option to the Intel Compiler. Please note that this will not work together with the -g debug option. This is a known bug.

  4. pgf77 Compiler: Add a dynamic common statement and allocate that

    	DOUBLE PRECISION A(150000000)
    	COMMON, ALLOCATABLE /COM_PGI/ A
    	... (further declarations)
    	ALLOCATE(/COM_PGI/, STAT=ISTAT)
    	IF (ISTAT .NE. 0) THEN
    	   STOP 'ALLOCATION FAILURE'
    	END IF
    	... (etc. as above)


    This uses a Fortran extension available only for pgf77, not for pgf90.

  5. g77 Compiler: This compilation problem has been fixed since release 3.1 of gcc.

Code fails to link ("Relocation truncated to fit")

This may happen on x86_64 based systems if your data segment becomes larger than 2 GBytes. For the Intel compiler, please use the compiler options -mcmodel=medium -shared-intel to build and link your code; the -fpic option should be avoided. For other compilers (GCC, PGI), -mcmodel=medium alone should suffice. For NAG Fortran, try -Wc,-mcmodel=medium. Note that this problem does not arise if you manage memory on the heap, so we recommend converting static arrays to allocatable ones.
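
As an illustration, a build sequence might look like this (file names are placeholders):

    # Intel compiler: build and link with the medium memory model
    ifort -c -mcmodel=medium -shared-intel bigdata.f90
    ifort -o bigdata -mcmodel=medium -shared-intel bigdata.o

    # GCC or PGI: the memory model switch alone is sufficient
    gcc -c -mcmodel=medium bigdata.c
    gcc -o bigdata -mcmodel=medium bigdata.o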

Using large temporary arrays in subroutines fails

For the Intel compiler, using large automatic arrays as in

		subroutine foo(n, u)
		integer :: n
		real(rk) :: u(n)
		...
		real(rk) :: temp(n)
		...
		end subroutine

leads to segmentation faults and/or signal 11 crashes of the generated executables. The reason is that automatic arrays are placed on the stack, and the stack limit may be too low.

Workarounds:

  1. Use the -heap-arrays compiler switch to move allocation to the heap. You can also specify a size threshold so that only arrays above that size are treated this way (see the sketch after this list).

  2. Increase the stack limit via the command ulimit -s unlimited. Note that special measures might be needed for MPI parallel programs to propagate this setting across nodes.

  3. Change over to use dynamic allocation:

     subroutine foo(n, u)
     integer :: n
     real(rk) :: u(n)
     ...
     real(rk), allocatable :: temp(:)
     ...
     allocate(temp(n), ...) ! allocation status query omitted here, please check to be safe
     ...
     deallocate(temp)
     end subroutine
    

    This will use the heap for the required storage.
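
    A sketch of workarounds 1 and 2 from the shell (the size threshold and the file names are only examples):

        # Workaround 1: put automatic arrays larger than 10 kB on the heap
        ifort -heap-arrays 10 -c foo.f90

        # Workaround 2: raise the stack limit in the shell before running
        ulimit -s unlimited
        ./a.out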

icc and icpc fail to compile my (assembler) code

icc and icpc do their best to behave like the GNU compilers. However, they do not support inline assembler. Please use the -no-gcc compiler switch: this disables the gcc predefined macros and hence suppresses the assembler statements, which are (usually) shielded by macro invocations.
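
For example (the source file name is a placeholder):

    # disable the gcc predefined macros so that macro-shielded
    # inline assembler sections are not activated
    icc -no-gcc -c mycode.c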

My program stops in I/O when writing or reading large files

Your file may be larger than 2 GBytes and hence beyond the 32 bits supported by the traditional open() system call. Linux does support file sizes larger than 2 GBytes nowadays; however, you may need to recompile your program to use this feature.

  1. GNU C compiler (gcc): Please recompile all sources containing I/O calls using the preprocessor macro _FILE_OFFSET_BITS=64, i. e.

    gcc -c -D_FILE_OFFSET_BITS=64 (... other options) foo.c
    

    See this page for further details (some of which may be outdated).

  2. PGI Fortran compiler: Please use the -Mlfs compiler switch when linking.

  3. Intel Fortran compiler: Automatically supports large files. However, there are limits on the record sizes.

  4. On 64 bit systems in 64 bit mode no problems should occur since large files should be supported by default. Note however that there still may be limits for accessing large files via NFS.

I've got a lot of Fortran unformatted files from big-endian systems (old vector or IBM Power machines). Can I use them?

Yes. There are two variants of this situation:

  1. Portability of unformatted data. In this situation you want to use both Intel (little endian) and other (big endian) platforms concurrently.

     PGI: Use the compilation switch -Mbyteswapio

     Intel: Set the following environment variable (under sh, ksh, bash) before running your executable:

        export F_UFMTENDIAN="big"

     In this case all unformatted files are operated on in big endian mode.

  2. Migration from one platform to the other. Here you need to write a program to convert your data from (or to) big endian once and for all. In the following we assume that conversion happens from big endian to little endian, and that unit 22 is used to read in the big endian unformatted data.

     PGI: Use the OPEN statement specifier CONVERT in your source:

        OPEN(22, FILE='mysundata', FORM='UNFORMATTED', CONVERT='BIG_ENDIAN')

     Intel: Set the following environment variable (under sh, ksh, bash) before running your executable:

        export F_UFMTENDIAN="little;big:22"

     This switches I/O to big endian on unit 22 only.

Please note that you should test data files from more exotic big endian platforms, because assumptions are still made about IEEE conformance and the Fortran record layout.

Generally, the Intel compiler gives you more flexible handling, since the functionality is supported by the run time environment and no recompilation is required. You can also specify more than one unit via a comma-separated list, or a range of units, e.g. 10-20.
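
For example, to treat units 10 through 20 and unit 33 as big endian while all other units remain little endian (unit numbers and program name chosen arbitrarily):

    export F_UFMTENDIAN="little;big:10-20,33"
    ./myprog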

Please also refer to a section below on how to use this functionality in conjunction with MPI.

Reading unformatted direct access files generated on other HPC platforms

While the above mentioned method works fine for unformatted sequential files, care must be taken when reading unformatted direct-access files generated on other platforms. When a direct access file is opened, the specifiers ...,access='DIRECT', recl=irecl,... are required, giving the record length. The unit that irecl refers to is implementation-dependent: e.g., 4 byte words on Itanium using the Intel compiler, but 8 byte words on the VPP. It is therefore good practice to set irecl before the OPEN call via an INQUIRE statement.
Assume the largest record one wants to write is an array rval, which was declared as

         real,dimension(n) :: rval

Then one should add the following line before the open call:

         inquire(iolength=irecl) rval

and use irecl in the following open statement. Thus the assigned record length for the direct-access file becomes independent of the implementation. 

Maximum record length for unformatted direct access I/O for Intel ifort

Up to compiler release 10.1, Intel's documentation did not provide any information on this. The maximum value is 2 GBytes (2^31 bytes) per record; note that the storage unit used for RECL= is 4 bytes unless the switch -assume byterecl is specified, in which case the storage unit is 1 byte.
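
For example, to have RECL= values interpreted in bytes rather than in 4-byte units (the file name is illustrative):

    # record lengths in OPEN(..., RECL=...) are then counted in bytes
    ifort -assume byterecl -o mydirectio mydirectio.f90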

Compiler does not optimize as specified

The Intel compiler may occasionally emit the warning "fortcom: Warning: Optimization suppressed due to excessive resource requirements; contact Intel Premier Support". In this case, please try the -override-limits switch. However, this may lead to very long compilation times and/or considerable memory usage; if system resources are exhausted, the compilation may still fail, and even if it completes, the generated code may be incorrect. In the latter two cases please send your source file(s) to the LRZ support team.
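
A possible invocation (source file name and optimization level are placeholders):

    # allow the optimizer to exceed its internal resource limits
    ifort -O3 -override-limits -c huge_routine.f90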

Gradual underflow optimization: -ftz compiler option may improve performance

This applies to usage of the Intel compilers; the material is drawn from the SGI document "Linux Application Tuning Guide", Chapter 2, The SGI Compiling Environment. Many processors do not handle denormalized arithmetic (for gradual underflow) in hardware. The support of gradual underflow is implementation-dependent. Use the -ftz option with the Intel compilers to force the flushing of denormalized results to zero.

Note that frequent gradual underflow arithmetic in a program causes the program to run very slowly, consuming large amounts of system time (this can be determined with the 'time' command). In this case, it is best to trace the source of the underflows and fix the code. Gradual underflow is often a source of reduced accuracy anyway.
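
For example (the compile lines are only a sketch):

    # flush denormalized results to zero to avoid slow gradual-underflow handling
    ifort -O2 -ftz -c kernel.f90
    icc -O2 -ftz -c kernel.c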

When starting my binary, it complains about missing libpgc.so (IA32/Opteron/EM64T)

This is a PGI compiler issue and hence only relevant for x86 and possibly x86_64 systems. Please load the environment module fortran/pgi/x.y or ccomp/pgi/x.y, using the version number x.y with which you compiled the application. This should correctly set the LD_LIBRARY_PATH variable.

Intel C, C++ or Fortran compilers: Linkage fails

This typically happens if you need to link against system libraries (e.g., libX11, libpthread, ...). There are several possible reasons:

  1. Check whether you have specified all needed libraries

  2. Check whether you are trying to link 32 bit objects into a 64 bit executable. This is not possible.

  3. If you use the -static option of the compiler in your linkage command, please remove it or replace it by -i-static (for 9.0 and newer Intel compilers) to only link the Intel libraries statically.
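
    A sketch of item 3 (object and library names are only examples):

        # instead of:  ifort -static -o myprog myprog.o -lpthread
        # link only the Intel runtime libraries statically:
        ifort -i-static -o myprog myprog.o -lpthread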

See also the linkage problems with MPI below for further information

When starting my binary, it complains about missing symbol (e.g., 0_memcopyA)

This can be a problem when using non-default versions of the Intel Compilers, or mixing different versions of the C and Fortran compilers. When doing e.g., a

module switch fortran fortran/<non-default version>

for compilation, this setting must also be performed before execution of the program. Otherwise the wrong base library may be bound at run time; in fact if the order of library entries in $LD_LIBRARY_PATH is wrong it may happen that the wrong library is bound from the C installation for a Fortran program (or vice versa). There are a number of possibilities to deal with this problem:

  1. If you use e.g., Fortran only, always perform the command sequence

    module unload ccomp
    module switch fortran fortran/<non-default version>
    

    before either compiling or running your executable.

  2. ifort supports the -static link time switch which statically links in the Intel libraries. However, static memory is then limited on a 64 bit system.

  3. Use the -Xlinker -rpath [path_to_libraries] switch at link time to fix the path used for resolving the shared libraries. We are considering making this the default setting in the compiler configuration file.

How to get an error traceback for your code

If you are using Intel version 8.1 (and higher) compilers, the -traceback option should get you a traceback if your code fails. Adding the -g option may provide source line information as well. You can also add -fpe0 if you suspect that your code fails due to floating point exception error. Note that all of the above can (and perhaps should) be specified in addition to any options used for the production code. Example:

  program sample
    real a(10), b(10)
    do i=1,10
      b(i) = 0.0
      a(i) = a(i)/b(i)
    end do
    stop
  end

$ ifort -fpe0 -traceback -g sample.f90
$ ulimit -c unlimited
$ a.out

   forrtl: error (65): floating invalid
   Image PC Routine Line Source
   a.out 4000000000002D11 MAIN__ 6 sample.f90
   a.out 4000000000002A80 Unknown Unknown Unknown
   libc.so.6.1 2000000000435C50 Unknown Unknown Unknown
   a.out 40000000000027C0 Unknown Unknown Unknown

Note that the ulimit -c setting is necessary if you want to investigate a core dump.


Common Problems using OpenMP

My OpenMP program (Fortran, C or C++) segfaults at or shortly after startup

When -openmp is used, arrays and variables that would otherwise be static are placed on the stack (this ensures thread-safety). Hence one of the following needs to be done:

  • Increase the stack limit via e.g., ulimit -s 1000000 (for 1 GB of stack).

  • If you use the Intel compilers and the stack size is already adjusted, perform export KMP_STACKSIZE=32m to adjust the thread individual stack size to a higher value than the default 4 MB (here 32 MB).

  • Convert large static arrays to allocatable and allocate storage dynamically. For private entities this may not always be feasible, though.

Note that specifying -save (or the SAVE attribute for large arrays) may also be possible in some cases, but you need to check that no thread-safety problems ensue, since SAVEd storage is shared by default. Also, there may be additional limits (e.g. within the kernel): do not count on being able to use more than 2 GByte of stack even if the limits are set appropriately on a 64 bit system.
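
A typical preparation sequence before launching a threaded run might look like this (the values are only examples and should be adapted to your code):

    # allow about 1 GB of stack for the master thread
    ulimit -s 1000000
    # Intel OpenMP: 32 MB private stack for each worker thread
    export KMP_STACKSIZE=32m
    export OMP_NUM_THREADS=4
    ./my_openmp_prog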


Common Problems using MPI

My MPI program fails to compile

If your C++ MPI compilation produces error messages like "SEEK_SET is #defined but must not be for the C++ binding of MPI.", sometimes also "Include mpi.h before stdio.h", please consider reworking the header ordering in your source code. As a workaround, it is also possible to define the macro MPICH_IGNORE_CXX_SEEK via -DMPICH_IGNORE_CXX_SEEK.
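
If reordering the headers is not feasible, the macro can be defined on the command line (the compiler wrapper and file name may differ on your system):

    mpiCC -DMPICH_IGNORE_CXX_SEEK -c mysolver.cpp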

SGI MPT consumes too much memory

See the specific page about SGI's MPI implementation and how to control it.

SGI MPT does not allow static linking

The Message Passing Toolkit from SGI is delivered with shared libraries only. Hence it is not possible to perform static linking (-static or -fast switch) of MPI programs in the mpi.mpt environment.

My MPI program crashes. What do I do?

The symptom will look somewhat like this (sgi MPT):

MPI: MPI_COMM_WORLD rank 0 has terminated without calling MPI_Finalize()
MPI: aborting job
MPI: Received signal x (x may e.g. be 11)

Even if your program appeared to run correctly on another machine or with a different number of CPUs, there may still be bugs in the program. There may also be bugs in the MPI implementation, but that is less probable. To find out where things go wrong, please perform a traceback procedure as described below.

MPI crash due to incorrect header information (any MPI)

If debugging shows that MPI calls very obviously deliver incorrect results (especially administrative calls), please check whether you've got a file called mpi.h, mpif.h or mpi.mod somewhere in your private include path which interferes with the corresponding files in the system include path. This may lead to errors since different MPI implementations are not binary or even source compatible. Please either remove the spurious files or change your include path so these files are not referenced.

MPI crash due to exceeding internal limits (sgi MPT)

Note that if the crash is initiated by a message similar to

*** MPI has run out of unexpected request entries.
*** The current allocation level is:
***     MPI_REQUEST_MAX = 16384

this is typically due to exceeding an MPT-internal limit. In this case, you simply need to set the referenced environment variable (in the above case, MPI_REQUEST_MAX) to a value sufficient to cover your application's needs. Some experimentation may be necessary; also consult the mpi(1) manual page for the functionality and possible side effects of the referenced variable.
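
For example, in your batch script before the MPI start command (the value is only a starting point and may need tuning):

    # enlarge MPT's table of request entries
    export MPI_REQUEST_MAX=65536
    # then start the application with the launcher appropriate for your system
    mpiexec -n 128 ./myparprog.exe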

Traceback for parallel codes

The following recipe works for SGI's MPI implementation (MPT).

First, build your application as described in the section about obtaining an error traceback (in the serial case), except that you should use mpif90, mpicc etc. Then, execute the following command inside a SLURM batch script or inside a salloc shell:


$ srun_ps -n 32 ./myparprog.exe

to trace back to the point in the code where the crash happens.

Master-Slave Codes

Some user applications run in master-slave mode. This may be a configuration where e.g. the process with MPI rank 0 does not actually do any computational work but is only responsible for administrative stuff. Here are some hints on how to deal with this situation.

Master consumes CPU Resources

On some MPI variants, the master will consume CPU resources even while it is only waiting, e.g., for an incoming message (busy waiting). Depending on your code, this may lead to a performance and/or scalability problem if you configure for resource sharing as described in the previous subsection. Here are some suggestions on how to deal with this situation:

  • Parastation MPI: Switch off shared-memory communication via
    export PSP_SHAREDMEM=0
    However depending on your communication patterns this measure in itself may degrade performance.

  • SGI MPT: A setting of e.g., export MPI_NAP=100 will put idling processes to sleep after 100 milliseconds.

  • Self-regulatory: Teach the master to do a renice on itself. But note that this is not reversible.

If nothing helps, you will need to return to allotting the master its own CPU. In any case, please check your performance with suitable test scenarios before burning lots of low-quality cycles.

-fast compiler switch prevents linking on ICE or UV systems

You cannot use the -fast compiler switch at the linking stage with SGI MPT, because -fast implies static linking, which MPT does not support (see "SGI MPT does not allow static linking" above).


Hybrid MPI and OpenMP/threaded programs

Performance of Hybrid Code

Before going into production with a code which supports hybrid mode, either via OpenMP or via automatic parallelization, please check whether performance is not in fact better when running with one thread per MPI process.
Please note: Removing -openmp altogether may improve the performance of hybrid MPI+OpenMP codes (which then run as pure MPI codes). If you run with OMP_NUM_THREADS set to 1 because you want the "pure" MPI case, your code may perform better if you compile and link it without the -openmp flag. If you compile and link with the flag, the code may pay the OpenMP overhead even though you do not use OpenMP, since the compiler may produce less optimized code due to the OpenMP-induced code transformations.

For codes that contain explicit calls to OpenMP functions, either shield the calls with !$ conditional-compilation sentinels, or compile them for the "pure" MPI case using the -openmp_stubs option instead of -openmp. A code compiled with -openmp_stubs will not work if OMP_NUM_THREADS is set to a value greater than 1.
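
For instance, one might maintain two build targets (compiler wrapper and file names are illustrative):

    # hybrid build: OpenMP enabled
    mpif90 -openmp -o prog_hybrid prog.f90
    # "pure" MPI build: OpenMP calls resolved by the stubs library
    mpif90 -openmp_stubs -o prog_pure prog.f90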

Note that there may well be cases in which retaining hybrid functionality may give a performance advantage e.g. if your code becomes cache-bound and little shared-memory synchronization is required. But you need to check this, and optimize the number of threads used if you decide in favour of hybrid mode.


Problems with parallel tracing (Intel Tracing Tools / Vampirtrace)

Why can I only resolve MPI calls, but not my own subroutine calls?

Automatic subroutine tracing is supported in the newest releases of tracing tools (7.1) and compilers (10.0). Use the -tcollect compiler switch in addition to -vtrace after loading the appropriate modules. To reduce overhead, you can also use the VT API to manually insert instrumentation into your source.
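
A build line could then look like this (the wrapper name and module setup depend on the installation):

    # instrument routine entries/exits in addition to the MPI calls
    mpif90 -vtrace -tcollect -g -c mycode.f90
    mpif90 -vtrace -tcollect -g -o mycode.exe mycode.o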

My program runs, but crashes when trying to write the trace file

A typical error message might look like

    [0] Intel Trace Collector INFO: Writing tracefile a.out.stf in
    /home/cluster/a2832ba
    PSIlogger: Child with rank 1 exited on signal 15.
    PSIlogger: Child with rank 0 exited on signal 11.

(for Parastation MPI; SGI MPT produces a stack traceback instead). The trace data is mostly corrupt. The reason for this behaviour may be that you have a global symbol which clashes with a system call used by the tracing library, e.g., a variable declared at file scope:

    double *time;

Please rename your symbol so as to not clash, or convert it to local scope.


Problems with using existing binaries

Failing to link with an existing object file

There may be various issues here:

  1. First run the file command on the object(s):

    file foo.o
    

    The result must be consistent with the platform you are working on (e.g. ELF 64-bit LSB relocatable, IA-64, version 1 (GNU/Linux), not stripped on an Itanium-based system).

  2. If you get an error message like

    undefined reference to `__ctype_b'
    

    you need to change the source code of your application and recompile. Symbols beginning with __ should not be used at all if possible since they are only meant for internal glibc usage and are not exported any more in newer glibc releases.

    A workaround in the above case may be to replace __ctype_b by *__ctype_b_loc() in your code.

Failing to start up with message "version `GLIBC_2.xx' not found"

The problem here is that the binary was built for a different Linux distribution (with a different - often newer - GLIBC version) than the one deployed at LRZ. The solution is to rebuild the program on the same version of the same distribution. Service Requests to update GLIBC on the LRZ systems will usually be denied, because essentially all other programs depend on GLIBC, and many would stop working if such an update were performed. Therefore, GLIBC upgrades are tied to major operating system updates, which happen about twice per decade.


Issues with batch queuing and batch jobs

Jobs using shared memory parallelism

Note that OpenMP parallel programs may need a suitable setting for the environment variable OMP_NUM_THREADS. The default is usually 1 thread.
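
For example, in a batch script (the thread count is only an example):

    export OMP_NUM_THREADS=8
    ./my_openmp_prog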

For programs (e.g., Gaussian) which are multithreaded not via OpenMP but via shared memory between processes or explicit pthread programming, you need to consult the program documentation for hints on how to configure a parallel run. Setting OMP_NUM_THREADS will usually not have any effect for these programs.

For MPI programs running via shared memory, setting OMP_NUM_THREADS will also not have any effect.

Names of Job Scripts

Batch scripts must not have a digit as the first character of their name. E.g., a script named 01ismyjob will not be started correctly. Please use one of the characters a-z, A-Z as the first character of your job script name.

Batch Scripts in DOS/Unicode format / Unprintable Characters in Scripts

Scripts which have been edited under DOS/Windows may contain carriage-return/line-feed line endings; these will not work under SLURM or LoadLeveler. The same applies to scripts which have been written in Unicode (e.g., UTF format) by modern editors. Furthermore, spurious whitespace, for example in the

#! /bin/sh

specification, can lead to problems. Scripts like these will fail to execute and may even block a queue altogether! Please remove such special whitespace characters.

Determination of file format and fixing of format problems

  1. Run the file command on the script:      
       file my_script
    

    The result should be something like my_script: Bourne-Again shell script text. If this is not the case, but instead a format like UTF-8 is reported, then please run the iconv command:

      iconv -c -f UTF-8 -t ASCII my_script > fixed_script
    

    (the result is written to standard output, which is redirected in this example - so fixed_script should now be ASCII, while my_script is unchanged).

  2. Edit the script with vi (= vim). In the status line you will see the string [dos] if the file happens to contain carriage returns/linefeeds.

    For conversion from DOS to UNIX format the tool recode may be used:

    recode ibmpc..lat1 my_script
    

    Alternatively, the dos2unix command can also be used:

    dos2unix my_script
    

    These commands perform the necessary changes in-place (i.e., the file is modified).

  3. If none of the above two items help, you can also perform an octal dump of your script. You should see the following

    $ od -c myscript | less
    0000000 # ! / b i n / b a s h \n # $ - o
     etc. etc. ...
    

    If any strange numbers or "\r \n" sequences occur, the format is incorrect and must be fixed (e.g. via multi-lingual emacs editing).

My jobs fail with strange error messages involving I/O (SLURM and LoadLeveler)

The error messages typically are "Can't open output file" and/or "file too large".

The reasons for this may be

  • you have exceeded a file system quota
  • you have exceeded the per-directory limit for the number of files

Please consult the file systems document appropriate for the system you are using (Cluster or SuperMUC). The remedy usually consists of removing files and/or restructuring your directory hierarchy.

I've deleted my job, but it is still listed in squeue (SLURM) or llq (LoadLeveler)

With SLURM, you might typically see entries like

  JOBID PARTITION     NAME    USER      STATE       TIME  TIMELIMIT NODES NODELIST(REASON)
 278820    serial multiflo abcd12d  CANCELLED   17:42:32 4-08:00:00     1 lx64a295
 278824    serial  process abcd12d COMPLETING   11:01:43 1-01:00:00     1 lx64a322

when issuing squeue -M serial -u $USER some time after having deleted your job with scancel. With LoadLeveler, the llq output has a different format, but the same problem may surface. The trouble is that the batch system master cannot always distinguish a node crash from a temporary network outage, and therefore cannot immediately remove the job from its internal tables; it may still need to do some cleanup work on the client node.

There is of course a catch here: if you resubmit the job (operating on the same data), there is a chance that processes from the deleted job still running on the client node will overwrite newly generated data. The chance of this is not large, since in most cases we observe node crashes rather than long-term network outages, but it is not zero. There is no sure way to avoid this other than running the new job on a separate data set.


Common Problems with Scripting Languages (perl, python, R, ruby)

This section covers the scripting facilities for SuperMUC and the Linux Cluster systems.

I need additional modules for perl. How can I install them for my user account?

To give our users maximum freedom, we do not install Perl modules system-wide; modules have to be installed on a per-user basis, unless a module is of such importance and wide use among our user base that a system-wide installation is justified. We are always happy to help in case of any problems.

You can install all the modules that you require easily in the following way:

  • create local perl module directory:
     > mkdir ~/myperl
    
  • start cpan
     > cpan
    
    You will be taken to the initial configuration dialog. The defaults are ok, just press enter.
    When the installation dialog asks for the download sites, select: 11 9 4
    This might take a while (several minutes).
    Then set the following options in cpan:
      cpan> o conf makepl_arg "LIB=~/myperl/lib \
      INSTALLMAN1DIR=~/myperl/man/man1 \
      INSTALLMAN3DIR=~/myperl/man/man3 \
      INSTALLSCRIPT=~/myperl/bin \
      INSTALLBIN=~/myperl/bin"
     cpan> o conf mbuildpl_arg "--lib=~/myperl/lib \
      --installman1dir=~/myperl/man/man1 \
      --installman3dir=~/myperl/man/man3 \
      --installscript=~/myperl/bin \
      --installbin=~/myperl/bin"
     cpan> o conf mbuild_install_arg "--install_path lib=~/myperl"
     cpan> o conf prerequisites_policy automatically
     cpan> o conf commit
     cpan> quit
    
  • Now you can install whatever modules you like in your local directory ~/myperl.

For example, for the bioperl module, type in the following commands while being in the cpan shell:

    cpan> d /bioperl/
	CPAN: Storable loaded ok
	Going to read /home/bosborne/.cpan/Metadata
	Database was generated on Mon, 20 Nov 2006 05:24:36 GMT

	....

	Distribution B/BI/BIRNEY/bioperl-1.2.tar.gz
	Distribution B/BI/BIRNEY/bioperl-1.4.tar.gz
	Distribution C/CJ/CJFIELDS/BioPerl-1.6.0.tar.gz
	Now install:
	 cpan> force install C/CJ/CJFIELDS/BioPerl-1.6.0.tar.gz

Some additional tricks:

In case the download is slow, edit the file ~/.cpan/CPAN/MyConfig.pm and insert the following line into $CPAN::Config:

'dontload_hash' => { "Net::FTP" => 1, "LWP" =>1 },

In case something goes wrong, you can delete ~/.cpan and start over again.

The Perl shell is also very helpful; you can obtain it easily by installing:

cpan> install Psh
cpan> install IO::String

It is then available under ~/myperl/bin/psh. For example, try it out:

psh% use Bio::Perl;
psh% $seq_object = get_sequence('genbank',"ROA1_HUMAN");
psh% write_sequence(">roa1.fasta",'fasta',$seq_object);

I need additional modules for R. How can I install them for my user account?

You can install R packages in your home directory. R automatically offers to put the libraries into a directory in your home when it cannot write to the system-wide installation directory. To install additional packages, start up R and run install.packages:

$ R
R version 2.10.1 (2009-12-14)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> install.packages("XML")

R then asks you for a download site; use the local (München) download site. In case you want special compilers or want to install the package on SuperMUC, where no direct internet connection is possible, you have to download the tar.gz file first (e.g. from CRAN) and then install it via the following command:

> install.packages(
    c("XML_0.99-5.tar.gz",
      "../../Interfaces/Perl/RSPerl_0.8-0.tar.gz"),
    repos = NULL,
    configure.args = c(XML = '--with-xml-config=xml-config',
                       RSPerl = "--with-modules='IO Fcntl'"))

Please also have a look at the man pages in R.

I need additional modules for python. How can I install them for my user account?

To give our users maximum freedom, we do not install Python modules system-wide; modules have to be installed on a per-user basis, unless a module is of such importance and wide use among our user base that a system-wide installation is justified. We are always happy to help in case of any problems.

The idea behind the 'home scheme' is that you build and maintain a personal stash of Python modules. The scheme's name is derived from the idea of a home directory on Unix, since it is not unusual for a Unix user to make their home directory have a layout similar to /usr/ or /usr/local/. This scheme can be used by anyone, regardless of the operating system they are installing for.

You can install all the modules that you require easily in the following way:

  • Download the tar.gz file to your home directory, unpack it and cd to the installation directory. You should find a setup.py file in this directory.
  • Create a library directory for your library files
    	$ mkdir ~/mypython
    
  • Install the library files in your own library directory.
    	$ python setup.py install --home=~/mypython
    

While in most cases setting the --home option will do what you want, in some cases you may want to install modules for another Python interpreter. For example, many Linux distributions put Python in /usr rather than the more traditional /usr/local. This is entirely appropriate, since in those cases Python is part of 'the system' rather than a local add-on. However, if you are installing Python modules from source, you probably want them to go into /usr/local/lib/python2.X rather than /usr/lib/python2.X. This can be done with

	$ /usr/bin/python setup.py install --prefix=/usr/local

The last case is when you want to be totally free to set the options of your installation procedure. You can also override other options for the installation procedure:

	$ python setup.py install --home=~/mypython \
        --install-purelib=~/mypython/lib \
        --install-platlib='lib.$PLAT' \
        --install-scripts=~/mypython/scripts \
        --install-data=~/mypython/data

Another way to install packages under Python is by using the easy_install command. When you want to install packages only for your own user account, please use the following options:

easy_install --install-dir ~/mypython/lib/ --script-dir ~/mypython/scripts

and don't forget to set the python path accordingly:

export PYTHONPATH=~/mypython/lib

Installing your own program packages on SuperMUC

Sometimes you want to install your own software packages from the internet, using svn or an installation script that fetches files via http or ftp. This cannot be done easily due to the restrictions the SuperMUC firewall places on external connections. We propose the following solutions:

Copy all needed installation files to SuperMUC

The easiest way to install a software package is to copy all needed files into a directory on SuperMUC, unpack them and run the configure script. Make sure you have resolved all dependencies; then you can compile the software and install it into a directory in your home. Most of the time this will involve something like:

$ ./configure --prefix=/home/<group>/<account>/mydir

$ make

$ make install

This should do the job in most cases.

Mount a directory of SuperMUC on your local machine

Sometimes you need internet access in order to install software. You can mirror a directory on SuperMUC to your local machine (if you are running SUSE SLES11 or another compatible Linux) and install the software on your local machine.

This can be done by creating a directory in your HOME on SuperMUC, e.g. $HOME/mydir, and a directory $HOME/mydir on your local machine, and then mounting the former onto the latter. You need sshfs on your local machine in order to do the mapping.

$ sshfs <account>@supzero.lrz.de:/home/<group>/<account>/mydir ./mydir

Then install the software on your local machine into the directory $HOME/mydir. It is automatically mirrored to the SuperMUC directory and can then be used there.

Create a tunnel for internet access

In case the previous method fails, you can create a tunnel via ssh from your local machine to SuperMUC and use a proxy on your local machine to redirect internet connections from SuperMUC to the internet.

Get a proxy software for http connections (e.g. Tiny HTTP Proxy from http://www.voidtrance.net/2010/01/simple-python-http-proxy/) and unpack it on your local machine.

Then start the local proxy using:

$ python TinyHTTPProxy.py -p 1234

Then create a tunnel to supermuc via:

$ ssh -l <account> -R 1234:<your hostname>:1234 login02.sm-gw.lrz.de

On SuperMUC you can then set http_proxy to localhost:1234 and have full access to the HTTP protocol (e.g. for firefox etc.).
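
On the SuperMUC side, the proxy setting could then look like this (port as in the example above, the URL is a placeholder):

    export http_proxy=http://localhost:1234
    # tools that honour http_proxy (e.g. wget) can now reach the internet
    wget http://www.example.org/some_package.tar.gz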

Remote visualisation

What happens to my data when the reservation ends?

Once a week, on Sunday night at 04:00 am, all VNC sessions on the remote visualisation servers are killed. Your visualisation application will stop taking any input and will stop responding. This means that you will lose any unsaved data!
If you want to continue working, you have to submit a new reservation, restart your visualisation application and reload your last saved data.

I cannot log on to gvs1.cos.lrz.de

The server gvs1 can only be accessed from the Linux-Cluster login nodes (lxlogin2.lrz.de or lxlogin3.lrz.de). Only Linux-Cluster users have access to these login nodes. To get an account to the Linux-Cluster, follow this link.

I cannot log on to rvs1.cos.lrz.de

The server rvs1 can only be accessed from the SuperMUC login nodes (supermuc.lrz.de). To get an account to the SuperMUC, follow this link.

I forgot my VNC password

Don't worry - you can simply set a new password for VNC. Log on to the remote visualisation server and type

vncpasswd

Type a new password and verify it by entering it a second time. You will then be asked

Would you like to enter a view-only password (y/n)?

If you answer yes ("y"), you will be asked for another VNC password. You can give this password to co-workers, so that they can attach their PC to your VNC session, e.g. to discuss new results interactively. Your co-workers will see your desktop, but their inputs will be ignored.

I would like to have an environment for collaborative work - is that possible?

Yes, and it's simple, too. When you set a password for TurboVNC (vncpasswd), you are asked whether you want to set a view-only password. In that case, there are two passwords you can enter when connecting with the TurboVNC viewer. You should always(!) use a VNC password that is completely different from your login password. If you do so, you can give either of the two TurboVNC passwords to your colleagues: give them your regular TurboVNC password if you want a colleague to be able to move the mouse cursor or make keyboard inputs; if you give a colleague the view-only password, they can merely watch. In any case, anyone connected to your session will see the same remote desktop.

Now, if you combine this with an instant messaging client, you have a complete environment for collaborative work readily available!

(Keep in mind that you can always change your TurboVNC password again, so, basically, you can create per-session passwords.)

I would really like to use software xyz on the remote visualisation server - is that possible?

All LRZ users can compile and install their own software applications in their respective $HOME directory. If the software is used by many users, the LRZ may provide a module for it. Please send us an incident through the Service Desk.

I would like to use CUDA

This is currently not possible on the LRZ HPC systems. LRZ also operates visualisation systems with GPUs, but use of CUDA on these is rather limited.

I want to use vglconnect

Only VNC connections via the login nodes are supported by the LRZ. If you want to use vglconnect, you have to establish an ssh-tunnel between your desktop and the remote visualisation server via one of the login nodes.
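
A sketch of such a tunnel, assuming VNC display :1 (TCP port 5901) on the SuperMUC visualisation server; adapt host names, account and port to your session:

    # forward local port 5901 through the login node to the visualisation server
    ssh -L 5901:rvs1.cos.lrz.de:5901 <account>@supermuc.lrz.de
    # then point your VNC viewer at localhost:5901 on your desktop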

I'm sorry, but none of these questions describes my problem. How can I get help?

If you have questions or problems that are not answered on this page, please submit an Incident Report through the Service Desk.