Computer components in the endurance test
This strategy has worked: in 2020, the Leibniz Supercomputing Centre (LRZ) set up a test environment to explore the latest computer and IT technology, the "Bavarian Energy, Architecture, and Software Testbed", or BEAST for short. Since then, the newest components such as processors, accelerators and storage solutions from a wide range of manufacturers have been put through their paces. They have been evaluated not only by LRZ experts, but also in collaboration with research teams and students from Ludwig-Maximilians-Universität (LMU) and the Technical University of Munich (TUM). Both universities offer computer science students an internship in which they evaluate BEAST components and solve tasks on BEAST hardware. The test environment is regularly expanded, most recently, for example, with hardware to support DAOS, a storage solution from Intel. This technology promises advantages especially for artificial intelligence (AI) applications. "Through the BEAST programme, the LRZ can validate manufacturers' statements," says Josef Weidendorfer, who holds a doctorate in computer science. In an interview and in a presentation at the supercomputing conference ISC2022 in Hamburg, the head of the LRZ's Future Computing programme explains how the data centre benefits from BEAST. Meanwhile, BEAST's operating model is also proving useful in research on integrating quantum processors into HPC and in testing efficient hardware accelerators.
What has changed at BEAST recently, and how has the test environment been expanded? Dr. Josef Weidendorfer: In the last six months, for example, BEAST has been expanded to include systems from Intel, more precisely two nodes for evaluating the storage solution DAOS, or Distributed Asynchronous Object Storage, which will soon be used in phase 2 of SuperMUC-NG. It is essentially based on the use of so-called Non-Volatile Random Access Memory, or NVRAM. This is memory that can be used like regular main memory with fine-grained access, but holds the data even when no current is flowing and consequently requires little energy during operation. DAOS is supposed to improve access to data for AI applications and data analytics in particular. We are currently evaluating this. A 4-socket Intel Cooper Lake system has also arrived in BEAST. This is a relatively large computing node that delivers a lot of performance and whose processors support the 16-bit floating-point format Brain Floating Point, which in turn can accelerate AI processes. We are currently using it to test which applications particularly benefit from this node size and functionality. Last but not least, two Ice Lake systems expand our test field, with which we evaluate the latest Intel Xeon CPUs. We are using one of these systems in the BEAST Lab as an incentive for students. Hopefully, we will soon also have graphics processing units (GPUs) from Nvidia and AMD available.
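The 16-bit Brain Floating Point format mentioned in this answer keeps the sign bit and all 8 exponent bits of a 32-bit float but only 7 of its 23 mantissa bits, so it covers the same value range at much lower precision. The following is a minimal sketch of that conversion in plain Python, not a description of how the Cooper Lake processors or any LRZ software implement it. The function names `float_to_bf16_bits` and `bf16_bits_to_float` are hypothetical and chosen only for this illustration.

```python
import math
import struct


def float_to_bf16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x (hypothetical helper).

    Assumes x fits into a 32-bit float; struct raises OverflowError otherwise.
    """
    if math.isnan(x):
        return 0x7FC0  # a quiet NaN pattern; NaN payloads are not preserved
    # Reinterpret the value as the 32 bits of an IEEE 754 single-precision float.
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 mantissa bits that are dropped.
    rounding_bias = 0x7FFF + ((bits32 >> 16) & 1)
    return ((bits32 + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(bits16: int) -> float:
    """Expand a bfloat16 pattern back to a float by zero-filling the low bits."""
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    for value in (1.0, 3.14159, 1.0e30):
        bits = float_to_bf16_bits(value)
        print(f"{value!r} -> 0x{bits:04X} -> {bf16_bits_to_float(bits)!r}")
```

The round trip shows the trade-off: a very large value such as 1.0e30 remains representable because the exponent width is unchanged, while 3.14159 comes back with only about two to three significant decimal digits.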
BEAST is not part of the LRZ's regular services, but students in Munich can get to know the latest technology within the BEAST Lab. Weidendorfer: Yes, part of the BEAST programme is an internship for students from LMU and TUM, the BEAST Lab. Here the LRZ works particularly closely with the universities so that the internship is recognised as part of their computer science degree. Students can count on intensive supervision. Through the BEAST Lab, the LRZ has been able to initiate a number of research projects on innovative computer technologies. Among other things, the practical course deals with GPU programming and with offloading code to accelerators using the OpenMP programming interface and its target offloading feature. In the course of the semester, the students developed a lot of code. This was not planned, but we can use it to put together a test suite for this programming model. In collaboration with Intel and in preparation for SuperMUC-NG Phase 2, we have already been able to use this to improve the quality of the Intel compiler and its coverage of the OpenMP standard. Since the start of BEAST, we have organised four BEAST Labs, and currently more than 20 Master's students are participating. In total, around 30 Bachelor's and 40 Master's students have completed the labs. This has resulted in two Bachelor's theses and one Master's thesis, and one graduate is continuing to work with BEAST for his doctorate.
How does BEAST support the work and planning for the LRZ's HPC resources? Weidendorfer: Through the BEAST programme, the LRZ can itself validate the statements made by manufacturers. For example, in preparation for phase 2 of SuperMUC-NG, we were able to use the DAOS test nodes to analyse new storage solutions and their advantages for AI processes. Of course, the experience and findings gained with BEAST technologies also help in the design of successor systems.
The LRZ will integrate quantum processors into its HPC systems and also offers researchers support with AI methods. Will BEAST also be used for experiments in these areas? Weidendorfer: BEAST has become well established at the LRZ, so we are planning to evaluate systems in BEAST soon that are specifically intended for optimising AI applications. The "Quantum-Computing Integration Cluster", or QICC for short, will also be operated as part of the BEAST environment in a few months, behind the BEAST gateway as an isolation layer. This is how we ensure that such research systems and the work on them have no impact on the LRZ's service infrastructure.
Current technology in BEAST
- Systems based on x86 CPUs by Intel (Cascade Lake, Cooper Lake, Ice Lake) with NVRAM (Optane) as well as by AMD (Rome) with AMD GPUs (MI100)
- Systems based on ARM CPUs by Marvell (ThunderX2) with Nvidia GPUs (V100) and by Fujitsu (A64FX), delivered in an HPE CS500 system
Dr. Josef Weidendorfer, head of Future Computing at LRZ