"The hardware is becoming more specialised, the architecture of a supercomputer more complex”


Computer board: The architecture of high-performance computers is becoming more diverse
and complex thanks to new processors and accelerators. Photo: Adobe

At the Leibniz Supercomputing Centre (LRZ), SuperMUC-NG Phase 2 is currently in operation and the next high-performance computing (HPC) system is already being planned. Graphics processing units (GPUs) and other accelerators are now joining central processing units (CPUs), and the first quantum systems are being tested. The LRZ has been running the BEAST testbed for some time now to evaluate new technologies, assess their usefulness for HPC, and answer questions about energy consumption, power requirements, and how different processors and components work together. It's time to ask Dr. Josef Weidendorfer, who heads the Future Computing team at the LRZ, Dr. Michael Ott, who is using BEAST to explore the possibilities of energy efficiency in supercomputing, and Dr. Juan Durillo Barrionuevo, who is investigating the possibilities of GPUs, which not only process data faster but also bring artificial intelligence (AI) to supercomputing.

HPC technology is becoming more complex, more diverse and more versatile: What technological trends do you currently see in supercomputing? Dr. Josef Weidendorfer: Moore's Law will probably continue for a few more years. We could put more transistors on a chip in the same amount of space, but this would increase power consumption and therefore require even more heat to be dissipated during computing. So manufacturers are trying to ensure that putting more transistors on the chip doesn't lead to significantly more heat being generated, for example by making only one part of the chip active per task. They are also working to squeeze more computing power out of the transistors. This works by tailoring the hardware to specific tasks. As a result, the hardware becomes more specialised and diverse, and the architecture of a supercomputer becomes more complex. An example of this specialisation is GPUs, which were originally developed for the gaming market and the requirements of image processing, have since been adapted for the HPC community, and are also well suited to AI methods.
Dr. Juan Durillo Barrionuevo: There is also a need for specialised hardware from an AI perspective, especially for deep neural networks and large language models. With its CS-2 system, Cerebras has delivered a dedicated chip for AI workloads, and Graphcore and SambaNova are moving in the same direction. The driver behind this is the ever-increasing need for computing power.
Dr. Michael Ott: Power density continues to increase. The trend towards GPU-based systems means that much more computing power can be packed into a single rack. This increases the power requirements per rack and therefore the cooling requirements. From an energy-efficiency perspective, the good news is that the heat generated can no longer be removed with air cooling, so hot-water cooling, which is much more energy efficient, will continue to gain ground. Unfortunately, there is also a trend towards lower water temperatures, which has a negative impact on energy efficiency and makes it much more difficult to reuse waste heat. Hardware specialisation, however, also opens up new opportunities for more energy-efficient HPC systems. At the same time, responsibility shifts from the data centre operator to the users, as they need to modify their applications to work with the new hardware. Data centres can help them to do this, for example by providing the necessary tools or by using system software that supports them.

More processors used to equal more performance and compute power, but this equation no longer works because of power requirements. How do you achieve more performance today? Weidendorfer: The number of processors can certainly continue to grow - if new, specialised processors work in an energy-efficient way. But there is another problem: in order to solve computing tasks, data has to flow through HPC systems. The more processors involved, the longer the transport routes, which also require a lot of energy. In this sense, the number of processors is limited by an optimum, beyond which each additional chip degrades performance. As Michael already explained, one solution is to develop algorithms that require less data transfer, for example by carrying out some calculations redundantly.
Durillo Barrionuevo: More efficient algorithms lead to the same solution with less effort. In the context of machine learning (ML), this would mean saving computational effort and energy through fewer iterations during training. That is why I find two methods exciting, for both of which initial research work already exists: in short, the Hessian matrix could be used to reduce the number of training points needed, and with the help of distillation, large trained models can be broken down into smaller ones for specific tasks.
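To make the distillation idea a little more concrete: in a commonly used formulation (going back to Hinton et al.; the weighting factor alpha and the temperature T below are generic tuning parameters, not details from the interview), a small "student" model is trained to reproduce both the true labels and the softened outputs of a large "teacher" model:

    L = \alpha \cdot \mathrm{CE}\big(y, \sigma(z_s)\big) + (1 - \alpha) \cdot T^2 \cdot \mathrm{KL}\big(\sigma(z_t / T) \,\|\, \sigma(z_s / T)\big)

where z_s and z_t are the student and teacher outputs (logits) and sigma is the softmax. Because the student is much smaller, deploying or fine-tuning it for a specific task afterwards takes far less compute and energy than running the full model.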

What technologies are the first exascale computers based on? Durillo Barrionuevo: According to the current Top500 list, the exascale systems get their computing power from GPUs from a variety of vendors.
Weidendorfer: The Japanese supercomputer Fugaku, which is capable of slightly less than one exaflop, is interesting: it was developed in a co-design process and is intended to meet the requirements of HPC codes well without relying on GPUs. Programmers don't have to rewrite their codes for GPUs, but they still have to invest time in adapting their codes to the hardware.

Would it be possible to have a supercomputer made up entirely of GPUs? Durillo Barrionuevo: No, that wouldn’t work: central processing units are responsible for running the operating system, managing data movement, accessing hard drives, and other tasks. Although GPUs are becoming more and more capable in terms of direct access to memory and networking, CPUs are still needed in a (super)computer. What the ratio of GPUs to CPUs will be is another matter. It is possible that CPUs will become simpler and more similar to GPU chips in the future.

BEAST is the LRZ’s testing ground – which new technologies are currently being tested there? Weidendorfer: We look at different architectures with CPUs and GPUs and test their suitability using benchmarks that reflect the requirements of our users’ codes. On the CPU side, for example, we are working with the latest x86 architectures from Intel and AMD, as well as Arm-based chips from Marvell and Fujitsu. BEAST also features Nvidia’s A100 GPUs and AMD’s MI210; Intel’s new Ponte Vecchio GPUs are available in Phase 2 of the SuperMUC-NG system.


BEAST: The LRZ test environment is used to explore new technologies
and their energy efficiency.

Do you test the interaction between CPUs and GPUs? What have you learnt from that? Weidendorfer: The interaction between CPUs and GPUs is rather difficult with current technologies because there are bottlenecks in data transfer. Data is transferred via a connection called PCIe (generation 4 or 5), which is significantly slower than accessing memory that is directly attached to the CPU or GPU. For now, data should therefore be left on the chip that does the calculations. Manufacturers are already working on tighter integration. How much the slowed data flow affects HPC depends on the application. Testing this more intensively is part of the BEAST internship for students at Munich universities. They first optimise code for the CPU and then, where possible, port it so that it also runs on the GPU or on both processor types. Even the task of using multiple GPUs within one compute node can be difficult.
Ott: An interesting question is what the best ratio of CPUs to GPUs in a compute node is. The idea is to pack as much processing power into a node as possible, as this makes the system more energy efficient. But this is only true if the GPUs can actually be used efficiently. If the CPU cannot deliver the data fast enough, the GPU wastes energy while it waits.
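To illustrate the point made in both answers - leave the data on the chip that does the calculations and keep the GPU busy - here is a minimal sketch using OpenMP offloading (assuming a compiler with offload support such as a recent clang, gcc or nvc++; the arrays and kernels are invented for illustration). The data is mapped to the GPU once, several kernels work on it there, and only the final result crosses the PCIe link back to the host:

    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 1 << 24;
        std::vector<double> a(n, 1.0), b(n, 2.0), c(n, 0.0);
        double *pa = a.data(), *pb = b.data(), *pc = c.data();

        // Map the arrays to the GPU once; they stay resident on the device
        // for the whole region instead of crossing PCIe before every kernel.
        #pragma omp target data map(to: pa[0:n], pb[0:n]) map(tofrom: pc[0:n])
        {
            // First kernel: the result stays in GPU memory.
            #pragma omp target teams distribute parallel for
            for (int i = 0; i < n; ++i)
                pc[i] = pa[i] * pb[i];

            // Second kernel: reuses pc on the device, no intermediate copy back.
            #pragma omp target teams distribute parallel for
            for (int i = 0; i < n; ++i)
                pc[i] += pa[i];
        }
        // c (via pc) is copied back to the host only here, at the end of the data region.
        std::printf("c[0] = %.1f\n", pc[0]);
        return 0;
    }

Without the enclosing target data region, each kernel would trigger its own host-to-device and device-to-host copies, and the GPU would spend a large share of its time, and energy, waiting on the PCIe link.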

GPUs accelerate supercomputing and expand the possibilities of data analysis and simulation: is the collaboration working well? Durillo Barrionuevo: Yes - it has been shown that simulations can be improved and accelerated through data analysis. AI can also help to add content and details that cannot yet be resolved with classical simulation calculations. This also improves the quality of the simulation. We are looking forward to new opportunities and research projects that combine AI and classical simulation.

SuperMUC-NG Phase 2 includes both CPUs and GPUs - what does this architecture mean in terms of energy and for the users? Durillo Barrionuevo: SuperMUC-NG Phase 2 is the first LRZ supercomputer suitable both for classical modelling and simulation and for AI workloads. We are excited to see what the machine can do in operation.
Ott: GPUs are much more energy efficient. This is also reflected in the Top500 list and the related Green500 list, which focuses on energy efficiency: almost all of the well-placed systems get their processing power from GPUs. For the time being, we will not see a purely CPU-based system at the top of the Top500 again.
Weidendorfer: GPUs show their potential when you can break jobs down into many, many small, preferably recurring tasks, so that all parallel computations are handled in the same way and there are no data jams. Where possible, users should therefore rewrite code to use algorithms with these properties. Based on our initial observations, this may be difficult for some applications, but this is exactly where we want to support users and learn with them.

What do users need to be aware of? What are the challenges? Weidendorfer: Unfortunately, the trend towards using GPUs in HPC for better energy efficiency has a downside: code has to be rewritten, and the GPU vendors each offer different programming interfaces. Portable code that runs on all architectures is urgently needed; from the LRZ's point of view, this is currently the biggest challenge. There are approaches to standardised interfaces, but vendors often undermine them to maintain competitive advantages. As a lowest common denominator, OpenMP offloading can be used for GPU programming, but this often does not allow the full performance of the hardware to be exploited. An alternative is to use abstraction layers such as Kokkos or RAJA, which internally map to the interfaces of the different GPU manufacturers. At the LRZ we are currently developing our own tool that can automatically translate between GPU interfaces. There is a lot to do.
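As a rough sketch of the abstraction-layer approach mentioned above (assuming a Kokkos installation built for the desired backend; the array names and the small kernel are invented for illustration), the same parallel loop compiles unchanged for CUDA, HIP, SYCL or plain host threads, with Kokkos mapping it onto the respective vendor's native programming model:

    #include <Kokkos_Core.hpp>

    int main(int argc, char* argv[]) {
        Kokkos::initialize(argc, argv);
        {
            const int n = 1 << 20;
            // Views are allocated in the memory space of whichever backend
            // Kokkos was built for (CUDA, HIP, SYCL or host memory).
            Kokkos::View<double*> a("a", n), b("b", n), c("c", n);

            Kokkos::parallel_for("init", n, KOKKOS_LAMBDA(const int i) {
                a(i) = 1.0;
                b(i) = 2.0;
            });

            // The same loop body runs on NVIDIA, AMD or Intel GPUs, or on
            // the CPU, without any vendor-specific code in the application.
            Kokkos::parallel_for("axpy", n, KOKKOS_LAMBDA(const int i) {
                c(i) = a(i) + 2.0 * b(i);
            });

            Kokkos::fence();
        }
        Kokkos::finalize();
        return 0;
    }

The trade-off is that performance tuning may still be backend-specific, which matches the observation above that portable interfaces do not always extract the full performance of the hardware.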

Do GPUs really help reduce the energy consumption of supercomputers? After all, training with data requires a lot of power. Durillo Barrionuevo: This is a tricky question, and I think two separate arguments get mixed up here. Do GPUs reduce energy consumption? Yes: their performance per watt is much higher than that of CPUs. Does ML require a lot of power? Yes. But if you were to train ML models on CPUs, you would need much more power. The real issue is that the current way of training ML models relies on gradient descent algorithms that need many iterations to converge - and that is not a GPU-specific problem. Let me put it this way: my car may produce lower emissions per 100 km than your car. But if I drive 500 km for every kilometre you drive, I still pollute more than you do. The real problem would be if your car were the only way to cover those 500 km.
Ott: Actually, GPUs do not help to reduce the energy requirements of supercomputers, but only to slow down their growth. More computing power is still needed, which is why classic rebound effects occur: Efficiency increases, but the increased computing power is immediately absorbed by the users. If a new supercomputer has 10 times the computing power of its predecessor, that doesn't mean that 9 older computers can now be switched off. Instead, the new one is used to solve significantly more computationally intensive problems.

Quantum processors could become another accelerator for supercomputers – are you already testing quantum technologies in BEAST? Which ones specifically? What are you testing? What are your first experiences? Weidendorfer: Quantum computers, which are currently under development, could in the future help users solve problems that cannot be solved with current computer technology. And they could help speed up supercomputing. Hopes are high. We are already preparing for deployment and exploring the first systems. Colleagues are working on how to integrate quantum computers into existing computing infrastructures and supercomputers. This requires an understanding of the physical constraints and requirements, as well as the functionality of a software stack. However, this is still a plan for the future, and the necessary exploration of potential components and interfaces is different from the testing of current technologies in BEAST, which aims to see if they can be used for the next HPC system.
Ott: Like GPUs, quantum processors have the potential to solve certain problems more efficiently and thus reduce energy consumption. However, they are likely to help solve only a subset of the classical problems addressed by high-performance scientific computing. For the rest, we will continue to rely on classical computing architectures implemented with CPUs and GPUs. (Interview: vs/ssc)