Teramem System for Applications with Extreme Memory Requirements
The node teramem1 is a single node with 6 TB of main memory. It is part of the normal Linuxcluster infrastructure at LRZ, which means that users can access their $HOME and $PROJECT directories as on every other node in the cluster. However, its mode of operation differs slightly from that of the remaining cluster nodes, which can only be used in batch mode. Because teramem1 is currently the only system at LRZ that can satisfy memory requirements beyond 1 TB in a single node, users can choose between batch and interactive mode, depending on their specific needs. Both options are described below.
Interactive SLURM shell
An interactive SLURM shell can be started to execute tasks on the new multi-terabyte HP DL580 system "teramem". Use the following procedure on one of the CooLMUC2 login nodes:
module load salloc_conf/teramem
salloc --cpus-per-task=32 --mem=2000000
srun ./my_shared_memory_program.exe
The commands above run the binary "my_shared_memory_program.exe" with 32 threads and up to 2 TB of memory (the value passed to --mem is given in MB). Additional tuning and resource settings (e.g. OpenMP environment variables) can be set explicitly before executing the srun command. Note that the target system currently still uses the NAS-based SCRATCH area, as opposed to the GPFS-based area available on CooLMUC2. The DL580 can also be used by script-driven jobs (see the examples document linked below).
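As an illustration of the tuning step mentioned above, the following is a minimal sketch of OpenMP settings that could be exported inside the shell opened by salloc, before srun is called. It assumes the program is an OpenMP code; the fallback value of 32 and the OMP_PROC_BIND choice are illustrative assumptions rather than LRZ recommendations.

```shell
# Sketch only: run inside the interactive shell opened by salloc, before srun.
# SLURM_CPUS_PER_TASK is normally set by SLURM when --cpus-per-task is given;
# the fallback of 32 (an assumption) mirrors the salloc example above.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-32}"

# Standard OpenMP variable; "close" is an illustrative choice, not a site default.
export OMP_PROC_BIND=close

echo "OMP_NUM_THREADS=${OMP_NUM_THREADS} OMP_PROC_BIND=${OMP_PROC_BIND}"

# The program would then be started as in the procedure above:
# srun ./my_shared_memory_program.exe
```

Deriving the thread count from SLURM_CPUS_PER_TASK keeps it consistent with the allocation if the --cpus-per-task value is changed later.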
Batch SLURM script
Shared memory job on HP DL580 "teramem1"
(using 32 logical cores; note that this system is intended for high memory usage rather than for best performance)
#!/bin/bash
#SBATCH -o /home/hpc/.../.../myjob.%j.%N.out
#SBATCH -D /home/hpc/.../.../mydir
#SBATCH -J Jobname