
Warming up for Blue Lion

Technology: Supercomputing | Research area: Future Computing

As part of their joint preparations for the Blue Lion system, LRZ and vendor HPE have installed a testing and optimization platform to get users and staff alike prepared for a new, powerful architecture.

Blue Lion sends in the young ones: For testing and training purposes, Hewlett Packard Enterprise (HPE) has installed Blue Cubs at the LRZ, a first test installation to prepare for the launch of Blue Lion, the Leibniz Supercomputing Centre’s (LRZ’s) next supercomputer. This will allow experimentation, initial code porting, and the setup of new workloads. The test system initiates the warm-up phase for the next-generation LRZ supercomputer. LRZ’s purchase explicitly includes the deployment of a test system and the build-out of a training phase. The reason for this is the growing complexity of supercomputers, as well as their rapidly increasing costs. Initial investment and operation for Blue Lion will amount to 250 million euros, co-financed by the German Ministry of Research, Technology and Space (BMFTR) as well as the Bavarian Ministry for Science and the Arts (StMWK).

Since April 2025, LRZ’s specialists have been meeting regularly with the HPE team. “Three topics are the focus of the preparations: code porting, energy efficiency, and workloads for AI,” explained Gerald Mathias, head of LRZ’s computational support team (CXS). With the help of the Blue Cubs, the planned system will be adapted for use and made fit for operations. “The test system will allow us to familiarize ourselves with how it works and set up administrative processes,” said Mathias.

The test installation Blue Cubs

Blue Cubs, meaning young lions, is the name given to the test installation by the HPC specialists at LRZ: it contains eight Grace Hopper superchips (pictured) from NVIDIA in two nodes, each with four graphics processing units (GPUs). This makes the architecture of Blue Cubs very similar to that of the next supercomputer, Blue Lion, which will be equipped with HPE's Cray Supercomputing GX5000 platform and Vera Rubin processors.

Grace Hopper Superchip. Photo: NVIDIA

During ISC25, NVIDIA announced that Blue Lion will feature Vera Rubin technology alongside HPE's next-gen supercomputing technology. These next-gen NVIDIA chips are specialized for HPC and artificial intelligence (AI) applications but will not be available until 2026. For this reason, the Blue Cubs are equipped with similar, already available Grace Hopper chips. The test system consists of eight Grace Hopper superchips in two nodes with four GPUs each. “The test system is the best equivalent to Blue Lion to date and gives at least an idea of what the next supercomputer will bring,” said Utz-Uwe Haus, head of HPE’s Europe, Middle East, and Africa (EMEA) Research Labs, where systems for high-performance computing (HPC) in EMEA are planned. “It is intended to prepare administrators and support staff for what will be technically possible.”

While the x86 architecture from Intel and AMD has dominated processors to date, chips from ARM and NVIDIA offer more possibilities: GPUs are being better matched to CPUs, or even integrated into them. Blue Lion is designed to enable the combination of physical-mathematical and statistical models for simulations. Technology and usage are changing programming environments and workloads, and researchers must modify parts of their code so that they are directed more specifically to the CPU, GPU, or storage. As a result, many algorithms and applications must be rewritten. To learn about implementation steps and functionalities or to clarify optimization requirements, the HPE and LRZ teams are experimenting with scientific programs on the test system. “It takes time for ported code to run efficiently on an HPC system, and intervention is often necessary,” said Haus. “In addition, GPUs are energy-hungry, so if researchers don’t use these processors efficiently, it substantially raises electricity costs.”

The focus in HPC is no longer on runtime alone but on efficiency and, in turn, hardware control. System administrators use the Blue Cubs to determine how and when they will later reduce a processor’s clock rate, for example, to reduce cooling requirements. DCDB, the monitoring system LRZ uses to keep track of the work of its computers and their environment, provides guidance in this regard. “With this data, we can better adjust Blue Lion and observe which jobs require more cooling,” said Haus. This experience will also help determine how many nodes should later be reserved for management. With the help of Blue Cubs, the teams are already developing procedures for system security, task allocation, and user identification: “We need to be able to rule out the possibility of Blue Lion being misused, for example for mining cryptocurrencies,” said Haus.
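The trade-off behind lowering a processor's clock rate can be illustrated with a small back-of-the-envelope model. The Python sketch below is not the method used by LRZ or HPE and does not involve DCDB. It rests on a common textbook simplification: dynamic power grows roughly with the cube of the clock frequency, static power stays constant, and only the compute-bound share of a job slows down when the clock drops. The function name `estimate_run` and every number in it are hypothetical and chosen purely for illustration.

```python
def estimate_run(freq_ghz, base_freq_ghz=2.0, base_runtime_s=3600.0,
                 static_power_w=100.0, dynamic_power_w=400.0,
                 compute_fraction=1.0):
    """Return (runtime in s, average power in W, energy in J) at a given clock.

    All defaults are hypothetical illustration values, not measurements
    from Blue Cubs or any other real system.
    """
    if freq_ghz <= 0 or base_freq_ghz <= 0:
        raise ValueError("frequencies must be positive")
    if not 0.0 <= compute_fraction <= 1.0:
        raise ValueError("compute_fraction must be between 0 and 1")

    scale = freq_ghz / base_freq_ghz
    # Only the compute-bound share of the runtime stretches at a lower clock;
    # the remainder (e.g. waiting on memory or I/O) is assumed unchanged.
    runtime_s = base_runtime_s * (compute_fraction / scale
                                  + (1.0 - compute_fraction))
    # Simplified power model: constant static part plus a dynamic part
    # that scales with the cube of the frequency.
    power_w = static_power_w + dynamic_power_w * scale ** 3
    energy_j = power_w * runtime_s
    return runtime_s, power_w, energy_j


if __name__ == "__main__":
    # Compare a hypothetical job at full clock and at 80 % clock.
    for freq in (2.0, 1.6):
        runtime_s, power_w, energy_j = estimate_run(freq, compute_fraction=0.5)
        print(f"{freq:.1f} GHz: {runtime_s:.0f} s, "
              f"{power_w:.1f} W, {energy_j / 3.6e6:.3f} kWh")
```

In this simplified model, a job that is only half compute-bound runs somewhat longer at the reduced clock but draws noticeably less power, which is the quantity that drives cooling demand. Whether the trade-off pays off for a real job depends on measured data of the kind a monitoring system provides.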

The findings from the test phase will be incorporated into the future training program. “We can observe how much effort code porting requires and where support will be needed,” explained Mathias. (vs | LRZ)