ALIs

(Coming soon.)

SLURM example job scripts

Introductory remarks

The job scripts for the SLURM partitions are provided as templates that you can adapt to your own settings. In particular, please take the following points into account:

  • Some entries are placeholders (shown in angle brackets, e.g. <group>, <user>, <email_address>), which you must replace with correct, user-specific settings. In particular, path specifications and e-mail addresses must be adapted.

  • For recommendations on large-scale I/O, please refer to the description of the file systems available on the cluster. Executables should be kept in your HOME file system, in particular for parallel jobs. The example jobs reflect this: they change into a data directory and start the program via its path in HOME, assuming that the program opens its files with relative path names.

  • If you need the environment modules package in your batch script, you must also source the file /etc/profile.d/modules.sh.

Serial and archiving jobs

Serial and archiving jobs are not supported in the SLURM partitions. To perform serial processing or to run the TSM client in batch mode, please submit an SGE script from the login node lx64ia3.

Shared Memory jobs

This job type uses a single shared memory node of the designated SLURM partition. Parallelization can be achieved either via (POSIX) thread programming or directive-based OpenMP programming.

Here are example scripts for starting an OpenMP program: the first is for the MPP cluster and the ICE, the second for the UV and the Myrinet cluster.

#!/bin/bash
#SBATCH -o /home/hpc/<group>/<user>/myjob.%j.%N.out
#SBATCH -D  /home/hpc/<group>/<user>/mydir
#SBATCH -J <job_name>

#SBATCH --get-user-env
#SBATCH --clusters=mpp1
# in the line above:
#  replace mpp1 by ice1 to use the ICE
#SBATCH --nodes=1-1
#SBATCH --cpus-per-task=8
#SBATCH --mail-type=end
#SBATCH --mail-user=<email_address>@<domain>
#SBATCH --export=NONE
#SBATCH --time=08:00:00

source /etc/profile.d/modules.sh
# no cd needed: -D above already sets the working directory to mydir

export OMP_NUM_THREADS=16

# 16 is the maximum reasonable value for MPP and ICE
# (though on the ICE only 8 physical cores exist)


./myprog.exe

#!/bin/bash
#SBATCH -o /home/hpc/<group>/<user>/myjob.%j.%N.out
#SBATCH -D  /home/hpc/<group>/<user>/mydir
#SBATCH -J <job_name>

#SBATCH --get-user-env
#SBATCH --clusters=uv2
# replace uv2 by uv3 to use the other UV partition
# replace uv2 by myri to use a Myrinet node
#SBATCH --nodes=1-1
#SBATCH --cpus-per-task=64
#SBATCH --mail-type=end
#SBATCH --mail-user=<email_address>@<domain>
#SBATCH --export=NONE
#SBATCH --time=08:00:00

source /etc/profile.d/modules.sh
# no cd needed: -D above already sets the working directory to mydir

export OMP_NUM_THREADS=64
# up to 960 threads can be configured on uv2
# up to 1120 threads can be configured on uv3
# up to 8 or 32 threads can be configured on the Myrinet nodes (see below)
# the value used here should be consistent
# with --cpus-per-task above


./myprog.exe

 

For each job, the maximum reasonable number of threads is set inside the script. On the UV or the Myrinet segment, please also specify this value via --cpus-per-task. Furthermore, to select between the 8-way and 32-way Myrinet nodes, it may be necessary to specify the partition to be used:

  • 8-way node of the Myrinet cluster: #SBATCH --partition=myri_std
  • 32-way node of the Myrinet cluster: #SBATCH --partition=myri_large
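To keep OMP_NUM_THREADS and --cpus-per-task from drifting apart on the UV or Myrinet segment, the thread count can be read from the environment variable SLURM_CPUS_PER_TASK, which SLURM sets inside a job that requested --cpus-per-task. This is a sketch, not part of the LRZ templates; the helper name omp_threads_from_slurm is hypothetical. It is less suited to the MPP/ICE template, which deliberately uses more threads than --cpus-per-task.

```shell
#!/bin/bash
# Sketch: derive the OpenMP thread count from --cpus-per-task.
# omp_threads_from_slurm is a hypothetical helper name.
omp_threads_from_slurm() {
  # SLURM_CPUS_PER_TASK is only set inside a job that requested --cpus-per-task;
  # fall back to a single thread otherwise
  echo "${SLURM_CPUS_PER_TASK:-1}"
}

# would replace a hard-coded line such as "export OMP_NUM_THREADS=64"
export OMP_NUM_THREADS="$(omp_threads_from_slurm)"
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"
```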

 

MPI jobs

For MPI documentation, please consult the MPI page on the LRZ web server. Most of the following examples configure a job with 64 MPI tasks; the Myrinet examples use 16 and 32 tasks, respectively.

On the MPP cluster

The first script below is a standard job on the MPP Infiniband cluster, the second a large memory job (note: this leaves compute cores unused!).

#!/bin/bash
#SBATCH -o /home/hpc/<group>/<user>/myjob.%j.%N.out
#SBATCH -D /home/hpc/<group>/<user>/mydir
#SBATCH -J <job_name>
#SBATCH --get-user-env
#SBATCH --clusters=mpp1
#SBATCH --ntasks=64
#SBATCH --mail-type=end
#SBATCH --mail-user=<email_address>@<domain>
#SBATCH --export=NONE
#SBATCH --time=08:00:00

source /etc/profile.d/modules.sh


cd $OPT_TMP/mydata
srun_ps $HOME/exedir/myprog.exe

#!/bin/bash
#SBATCH -o /home/hpc/<group>/<user>/myjob.%j.%N.out
#SBATCH -D /home/hpc/<group>/<user>/mydir
#SBATCH -J <job_name>
#SBATCH --get-user-env
#SBATCH --clusters=mpp1
#SBATCH --ntasks=64
#SBATCH --cpus-per-task=2
# only half the cores on each node are used,
# but 1.8 GB per MPI task available
#SBATCH --mail-type=end
#SBATCH --mail-user=<email_address>@<domain>
#SBATCH --export=NONE
#SBATCH --time=08:00:00
 

source /etc/profile.d/modules.sh

 

cd $OPT_TMP/mydata
srun_ps $HOME/exedir/myprog.exe

 

On the Myrinet cluster

The first script below is for the Myrinet 10 GE 8-way systems, the second for the Myrinet 10 GE 32-way systems.

#!/bin/bash
#SBATCH -o /home/hpc/<group>/<user>/myjob.%j.%N.out
#SBATCH -D /home/hpc/<group>/<user>/mydir
#SBATCH -J <job_name>
#SBATCH --get-user-env
#SBATCH --clusters=myri
#SBATCH --partition=myri_std
#SBATCH --ntasks=16
#SBATCH --mail-type=end
#SBATCH --mail-user=<email_address>@<domain>
#SBATCH --export=NONE
#SBATCH --time=72:00:00

source /etc/profile.d/modules.sh


cd $OPT_TMP/mydata
srun_ps $HOME/exedir/myprog.exe

# at most 32 cores can be used

#!/bin/bash
#SBATCH -o /home/hpc/<group>/<user>/myjob.%j.%N.out
#SBATCH -D /home/hpc/<group>/<user>/mydir
#SBATCH -J <job_name>
#SBATCH --get-user-env
#SBATCH --clusters=myri
#SBATCH --partition=myri_large
#SBATCH --ntasks=32
#SBATCH --mail-type=end
#SBATCH --mail-user=<email_address>@<domain>
#SBATCH --export=NONE
#SBATCH --time=36:00:00
 

source /etc/profile.d/modules.sh

 

cd $OPT_TMP/mydata
srun_ps $HOME/exedir/myprog.exe

# at most 32 cores can be used

 

On SGI systems

On the ICE, a suitable number of 8-way nodes is assigned exclusively to your job. On the Ultraviolet, SLURM provides a cpuset of the required size to which your parallel program is confined; you need to select manually on which of the two UV systems (uv2 or uv3) your job runs.

The first script below is for the SGI ICE, the second for the SGI Ultraviolet.
#!/bin/bash
#SBATCH -o /home/hpc/<group>/<user>/myjob.%j.%N.out
#SBATCH -D /home/hpc/<group>/<user>/mydir
#SBATCH -J <job_name>
#SBATCH --get-user-env
#SBATCH --clusters=ice1
#SBATCH --ntasks=64
#SBATCH --mail-type=end
#SBATCH --mail-user=<email_address>@<domain>
#SBATCH --export=NONE
#SBATCH --time=08:00:00


source /etc/profile.d/modules.sh


cd $OPT_TMP/mydata
srun_ps $HOME/exedir/myprog.exe

#!/bin/bash
#SBATCH -o /home/hpc/<group>/<user>/myjob.%j.%N.out
#SBATCH -D /home/hpc/<group>/<user>/mydir
#SBATCH -J <job_name>
#SBATCH --get-user-env
#SBATCH --clusters=uv2
# replace uv2 by uv3 to use the other UV system
#SBATCH --ntasks=64
#SBATCH --mail-type=end
#SBATCH --mail-user=<email_address>@<domain>
#SBATCH --export=NONE
#SBATCH --time=08:00:00

 

source /etc/profile.d/modules.sh

 

cd $OPT_TMP/mydata
srun_ps $HOME/exedir/myprog.exe
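Since the UV system is chosen via --clusters, the same script can also be directed to the other system at submission time, because options given to sbatch on the command line take precedence over #SBATCH lines in the script. A minimal sketch; the script name myjob.sh is hypothetical:

```shell
# submit the UV template unchanged; the command-line option overrides
# the "#SBATCH --clusters=uv2" line inside myjob.sh (hypothetical name)
sbatch --clusters=uv3 myjob.sh

# list your own jobs on both UV systems
squeue --clusters=uv2,uv3 --user="$USER"
```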

Please note:

Please do not use mpirun or mpiexec. Use the LRZ-provided srun_ps command, which can start

  • programs compiled with Parastation MPI (mpi.parastation module) on the MPP and Myrinet clusters
  • programs compiled with Intel MPI (mpi.intel module) on any of the clusters
  • programs compiled with SGI MPT (mpi.mpt module) on the SGI systems

For some software packages, it is also possible to use SLURM's own srun command; however, this does not work for programs compiled against Parastation MPI.

It is also possible to use the --nodes keyword in combination with --tasks-per-node (instead of --ntasks) to configure parallel jobs.
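For example, the 64-task MPP job could be requested as 4 nodes with 16 tasks each. This fragment assumes 16 cores per MPP node, which is what the thread and node counts elsewhere on this page suggest; please verify the node size before relying on it.

```shell
#SBATCH --clusters=mpp1
#SBATCH --nodes=4
#SBATCH --tasks-per-node=16
# replaces "#SBATCH --ntasks=64": 4 nodes x 16 tasks = 64 MPI tasks
```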

To use hyperthreaded cores on the ICE or UV, add the --ntasks-per-core=2 setting.
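A sketch of the resource request for the 64-task ICE job with hyperthreading enabled. The option permits up to two tasks per physical core; how SLURM then distributes the tasks over nodes depends on the site configuration and is not guaranteed by this fragment.

```shell
#SBATCH --clusters=ice1
#SBATCH --ntasks=64
#SBATCH --ntasks-per-core=2
# up to two MPI tasks may share one physical core (hyperthreads)
```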

Special job configurations

Hybrid jobs

Programs that use MPI and OpenMP jointly fall into this category. The first script below is for the MPP Infiniband cluster, the second for the SGI ICE. Not all combinations are shown here; other parts of the cluster may require some modification.

#!/bin/bash
#SBATCH -o /home/hpc/<group>/<user>/myjob.%j.%N.out
#SBATCH -D /home/hpc/<group>/<user>/mydir
#SBATCH -J <job_name>
#SBATCH --get-user-env
#SBATCH --clusters=mpp1
#SBATCH --ntasks=64
#SBATCH --cpus-per-task=8
#SBATCH --mail-type=end
#SBATCH --mail-user=<email_address>@<domain>
#SBATCH --export=NONE
#SBATCH --time=08:00:00
 
source /etc/profile.d/modules.sh

 

cd $OPT_TMP/mydata

 

srun_ps -t 8 $HOME/exedir/myprog.exe
# the above command runs with OMP_NUM_THREADS=8
# and 64 MPI tasks, using 32 nodes

#!/bin/bash
#SBATCH -o /home/hpc/<group>/<user>/myjob.%j.%N.out
#SBATCH -D /home/hpc/<group>/<user>/mydir
#SBATCH -J <job_name>
#SBATCH --get-user-env
#SBATCH --clusters=ice1
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=4
#SBATCH --mail-type=end
#SBATCH --mail-user=<email_address>@<domain>
#SBATCH --export=NONE
#SBATCH --time=08:00:00
 
source /etc/profile.d/modules.sh

 

cd $OPT_TMP/mydata
 

srun_ps -t 4 $HOME/exedir/myprog.exe
# the above command runs with OMP_NUM_THREADS=4
# and 32 MPI tasks, using 16 nodes
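The node counts quoted in the two hybrid scripts follow from simple arithmetic: all threads of one MPI task must share a node, so each node holds cores_per_node / cpus_per_task tasks. The sketch below assumes nodes are assigned whole, with 16 cores per MPP node and 8 physical cores per ICE node as the examples suggest; hybrid_nodes is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: nodes occupied by a hybrid MPI/OpenMP job (assumptions above).
hybrid_nodes() {
  local ntasks=$1 cpus_per_task=$2 cores_per_node=$3
  # all threads of one MPI task run on the same node
  local tasks_per_node=$(( cores_per_node / cpus_per_task ))
  if (( tasks_per_node == 0 )); then
    echo "error: --cpus-per-task exceeds the cores of one node" >&2
    return 1
  fi
  # round up: a partially filled node still occupies a whole node
  echo $(( (ntasks + tasks_per_node - 1) / tasks_per_node ))
}

hybrid_nodes 64 8 16   # MPP example: prints 32
hybrid_nodes 32 4 8    # ICE example: prints 16
```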

 


Job Farming (starting multiple serial jobs on a shared memory system)

Please use this with care! If the serial jobs are imbalanced with respect to run time, this usage pattern can waste CPU resources. At LRZ's discretion, unbalanced jobs may be removed forcibly. The example job script illustrates how to start up multiple serial MATLAB jobs within a shared memory parallel SLURM script. Note that the various subdirectories subdir_1, ..., subdir_8 must exist and contain the needed input data.

Multi-Serial Example

#!/bin/bash
#SBATCH -o /home/hpc/<group>/<user>/myjob.%j.%N.out
#SBATCH -D  /home/hpc/<group>/<user>/mydir
#SBATCH -J <job_name>

#SBATCH --get-user-env
#SBATCH --clusters=myri
#SBATCH --partition=myri_std
#SBATCH --nodes=1-1
#SBATCH --mail-type=end
#SBATCH --mail-user=<email_address>@<domain>
#SBATCH --export=NONE
#SBATCH --time=08:00:00

source /etc/profile.d/modules.sh

module load matlab


# Prevent matlab-internal multithreading
export OMP_NUM_THREADS=1

# Start as many background serial jobs as there are cores available on the node;
# each runs in its own subshell, so the cd does not affect the loop
for i in $(seq 1 8) ; do
  ( cd subdir_${i} && matlab -nodesktop -nosplash < input.m > output.res ) &
done

# wait for completion of all background tasks
wait
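The farming loop can be written so that the number of tasks is a parameter rather than a fixed 8. This is a sketch under stated assumptions: run_one and farm are hypothetical names, run_one stands in for the real serial command (the MATLAB call above), and SLURM_CPUS_ON_NODE is the SLURM variable for the CPUs allocated on the node, which may be fewer than the physical cores if the node is not allocated in full.

```shell
#!/bin/bash
# Sketch: job farming over subdirectories subdir_1 ... subdir_N.
# run_one is a hypothetical placeholder for the real serial command.
run_one() {
  echo "processed $(basename "$PWD")" > output.res
}

farm() {
  local n=$1 i
  for i in $(seq 1 "$n"); do
    # one subshell per task: the cd cannot affect the loop, and a missing
    # directory skips that task instead of running it in the wrong place
    ( cd "subdir_${i}" && run_one ) &
  done
  wait   # return only after every background task has finished
}

# in a job script, e.g.: farm "${SLURM_CPUS_ON_NODE:-8}"
```

As stressed above, the subdirectories must exist and the individual tasks should have similar run times, otherwise cores sit idle until the slowest task finishes.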