Bringing AI and classical simulation closer together

Phase 2 of the SuperMUC-NG brings supercomputing and artificial intelligence even closer together at the LRZ: The graphics processing units (GPUs) integrated into the new system favour artificial intelligence methods that enrich, accelerate or supplement conventional simulations. Initial experiences have already been gained in the natural and environmental sciences.

Simulation

Research and simulation: Artificial intelligence expands the set of methods in science. Photo: Adobe


Today, clear, high-resolution images do not always come from cameras, microscopes and other recording devices; they can also be the product of data processing. This is illustrated by the World Settlement Footprint (WSF), a series of global datasets produced by the German Aerospace Centre (DLR) at 10 and 30 metre spatial resolution from radar and multispectral satellite imagery. Calculated from freely available satellite data, the datasets show how human settlements around the world have changed since 1985 and how they continue to evolve. Advanced artificial intelligence (AI) methods can now push the resolution below 10 metres at large scale, using only a limited number of very high resolution reference samples, at 1 metre resolution or finer, available from commercial providers. "To further enrich and improve the WSF, super-resolution will be mega-interesting, but it is still difficult to achieve," explains DLR's Dr Mattia Marconcini. Sharper, more detailed images would make it possible to distinguish between commercial and residential buildings, and perhaps even to trace the development of individual neighbourhoods. This is still a vision for the future, but one that AI may one day effectively support.

WSF

The World Settlement Footprint shows the development of towns and settlements since 1985. Here
you see the different phases of Shanghai. Photo: DLR


GPUs give new impetus to simulation

Traditional simulation methods and AI models are converging, in both research and engineering. SuperMUC-NG Phase 2 (SNG-2), which will soon be operational, is equipped with 960 graphics processing units (GPUs) from Intel ("Ponte Vecchio"). GPUs specialise in processing large amounts of image and graphics data, accelerate high-performance computing (HPC) calculations and are also well suited to AI workloads: "HPC and AI are driven by accelerators, especially GPUs," says Dieter Kranzlmüller, director of the Leibniz Supercomputing Centre (LRZ) and professor of computer science at LMU. "We are adding AI capabilities to SuperMUC-NG Phase 2 to enable new insights and discoveries. The new knowledge lies in the interaction between simulations and AI." Until now, classical simulations have typically been run on supercomputers and the results evaluated on dedicated AI clusters, with the findings fed back to optimise the modelling. Classical modelling and AI are now possible on the same computer, but new processes and workflows are needed to let both methods interoperate smoothly; the LRZ will develop these on SNG-2 together with researchers.

SuperMUC-NG-2

SuperMUC-NG, Phase 2: The system integrates 960 GPUs. Photo: LRZ

"The combination of physics-based simulations and AI is currently one of the hottest topics at congresses and conferences," says Prof. Ralf Ludwig from the Department of Geography and Complex Environmental Systems at Ludwig-Maximilians-Universität München (LMU). Projects using AI methods are not only of great interest in the environmental and life sciences; all disciplines are already experimenting with AI methods to analyse data and to integrate them into existing modelling processes. "The environmental sciences have enormous data streams, for example from remote sensing," says Ludwig, citing a reason that applies to more than just his discipline. "Scientific working groups are now reaching the limits of what they can analyse." Of particular interest in this combination of methods are surrogate models (also called substitute or meta-models) and emulators: approximations of mathematically and physically calculated models that are created using methods such as pattern recognition and machine learning and that process measured values and other digital information from reality. A surrogate model can

  • be trained with the values and results of classical simulations and then help to vary their parameters and calculate more scenarios in less time,
  • replace computationally intensive parts of a simulation,
  • supplement the classic simulation with data on scientific phenomena that are difficult to calculate using formulae and equations or, if at all, can only be calculated approximately and with great effort.
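The first use in this list, training a surrogate on the results of an expensive simulation and then querying it cheaply, can be sketched in a few lines. The "simulation" below (an explicit-Euler decay solver) and the piecewise-linear surrogate are deliberately simple stand-ins invented for illustration, not the methods used by the groups quoted in this article:

```python
import math

def simulate(k, steps=10_000):
    """'Expensive' reference model: integrate dy/dt = -k*y, y(0) = 1,
    over t in [0, 1] with explicit Euler. Analytic answer: exp(-k)."""
    y, dt = 1.0, 1.0 / steps
    for _ in range(steps):
        y += -k * y * dt
    return y

# Run the costly solver only at a handful of training points ...
train_k = [0.25 * i for i in range(9)]          # parameter k = 0.0 .. 2.0
train_y = [simulate(k) for k in train_k]

def surrogate(k):
    """Cheap stand-in: piecewise-linear interpolation between the
    precomputed simulation results."""
    i = min(int(k / 0.25), len(train_k) - 2)
    k0, k1 = train_k[i], train_k[i + 1]
    w = (k - k0) / (k1 - k0)
    return (1 - w) * train_y[i] + w * train_y[i + 1]

# ... and answer new parameter queries without re-running the solver.
print(abs(surrogate(0.85) - math.exp(-0.85)))   # small approximation error
```

Once trained, the surrogate answers each new parameter query in constant time, which is what makes sweeping many scenarios affordable; the price is a small, controllable approximation error between the training points.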

Surrogate model and other AI methods

"The surrogate model is a substitute for a physical or mathematical model, although the training data do not necessarily have to come from a simulation," explains mathematician Dr Felix Dietrich from the Department of Scientific Computing at the Technical University of Munich (TUM). As head of an Emmy Noether research group, he is investigating data-driven surrogate models, specifying their structure and training them with data. "Surrogate models are not developed to advance physics research, but to calculate faster and more accurately; they make it possible to work with models that have many more parameters." Scepticism about their use in research nevertheless remains high: AI produces results that are consistent but often hard to interpret. AI models are also seen as a kind of black box, into which Dietrich and his team are trying to shed some light.

KI

How AI models and procedures work needs to be researched so that their results are
predictable, reliable and, above all, free from bias. Photo: Adobe

"Because of the efficiency of AI-based models, there is a danger that researchers will throw physically based simulations overboard too quickly," says environmental scientist Ludwig. In this respect, the term "surrogate model" invites misunderstanding; these models cannot and should not replace the physical-mathematical calculation of natural phenomena or systems. "The surrogate model becomes an end in itself if the modelling is lost in the process," clarifies model researcher Dietrich. "As a result, its main purpose, namely understanding, is lost. The goal of science is not to build models but to understand the world, and it is of no use for knowledge if the machine or computer has learned something." The expectation, however, is that surrogate models and other AI methods will not replace classical simulation but, on the contrary, enrich, extend, differentiate and even refine it. Hybrid forms of modelling will emerge, and with them new methods. The following scenarios are already being explored:

Scenario 1: Analysing and differentiating simulation results with AI

Classical simulations are based on laws of nature that can be formulated mathematically, usually as differential equations. The validity of their results is established by observations in nature or by experiments. Simulations are precise, but analysing their results also requires a great deal of time and computing power. For the German-Canadian project "Climate Change and Extreme Events" (ClimEx), Ludwig's research group simulated in 2018 the climate development of Bavaria over a period of 150 years. For each year, 50 scenarios were calculated from climate and water-management data in order to assess the consequences of extreme weather events and derive forecasts for the future. To calculate the 7,500 model years, the LRZ supercomputer needed a total of almost 90 million core hours, producing a data set of more than 500 terabytes. By way of comparison, one terabyte corresponds roughly to the capacity of almost 1,500 CD-ROMs.

Researchers can still gain new insights from this wealth of data, which the ClimEx team has also analysed using AI methods: "For example, we used pattern recognition to classify patterns in air pressure constellations that could favour extreme weather, such as heavy rain or drought," says Ludwig. This in turn helped to narrow down the parameters of the simulations and make the forecasts more accurate. The second phase of the ClimEx project is now underway, and the models will be expanded to include land-use change criteria to explain the occurrence of drought. "Using AI methods, we are now identifying spatial patterns that we can use to calculate the likelihood of droughts and heat waves occurring and derive predictions from them," says Ludwig. "We have a good physical understanding of the underlying processes and can use AI to capture their spatial and temporal patterns in a statistically robust and probabilistic way, or to uncover new patterns."
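The kind of pattern classification Ludwig describes can be illustrated with a deliberately tiny example: clustering synthetic "daily pressure-anomaly" vectors into two recurring regimes. The data, the regime centres and the minimal k-means routine are all invented for this sketch; the actual ClimEx analyses are far more sophisticated:

```python
import random

random.seed(0)

# Toy stand-in for daily pressure-anomaly fields: each "day" is a short
# vector of gridded anomalies drawn from one of two synthetic regimes.
# The regimes and values are purely illustrative, not real climate data.
def make_day(centre):
    return [c + random.gauss(0, 0.3) for c in centre]

days = ([make_day([1.0, -1.0, 0.5]) for _ in range(20)] +
        [make_day([-1.0, 1.0, -0.5]) for _ in range(20)])

def kmeans(data, iters=20):
    """Minimal two-cluster k-means: assign each sample to the nearest
    centroid, then move each centroid to the mean of its members.
    Seeded with the first and last sample (one from each toy regime);
    real code would use a proper initialisation such as k-means++."""
    centroids = [data[0], data[-1]]
    for _ in range(iters):
        groups = [[], []]
        for x in data:
            d = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]
            groups[d.index(min(d))].append(x)
        centroids = [[sum(col) / len(g) for col in zip(*g)] if g else c
                     for g, c in zip(groups, centroids)]
    return centroids

regimes = kmeans(days)
print(regimes)  # two recovered mean patterns, one per synthetic regime
```

Each recovered centroid is the mean pattern of one regime; in the real setting, such recurring patterns are what the researchers then relate to heavy rain or drought.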

Climex

ClimEx I and II: The projects track climate change, show the consequences of heavy rainfall
in Bavaria and Ontario, and are now also investigating the causes of droughts. Photo: ClimEx

Scenario 2: Obtaining training data for AI from simulations

AI not only supports evaluation; it can also be trained with the output of classical simulations. This is how surrogate models are created: "Simulations are precise, but also extremely expensive and slow," explains Prof. Volker Springel, Director of the Max Planck Institute for Astrophysics. "AI learns, reproduces and generalises what it knows from training data." With these skills, it should now be able to enrich physical or mathematical simulations: "AI methods can get the maximum out of the data; that alone is an important help," says Springel. "Another lies in what is known as inference, i.e. how to filter the fundamental variables of interest out of the observational data." He cites the density of the universe, the Hubble constant and other cosmological parameters as examples. Together with an international working group, the astrophysicist has just presented MillenniumTNG, a highly acclaimed series of large-scale simulations of the universe. It covers a region of ten billion light years, calculating the formation of galaxies and stars as well as processes such as supernova explosions and the growth of black holes; massive neutrinos have also been included for the first time. The data sets contain three petabytes, or 3,000 terabytes, of information.
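Springel's notion of inference, filtering a fundamental parameter out of noisy observations, can be sketched with a toy grid-based likelihood scan. The "Hubble-like" numbers, the Gaussian noise model and the grid bounds below are assumptions made purely for this example:

```python
import math
import random

random.seed(1)

# Hypothetical toy inference: recover a single parameter (here simply the
# mean of a noisy observable) from mock data, by scanning a grid of
# candidate values and weighting each candidate by its likelihood.
TRUE_VALUE, SIGMA = 70.0, 5.0               # e.g. a Hubble-like constant
data = [random.gauss(TRUE_VALUE, SIGMA) for _ in range(200)]

def log_likelihood(theta):
    """Gaussian log-likelihood of the mock data, up to a constant."""
    return sum(-0.5 * ((y - theta) / SIGMA) ** 2 for y in data)

grid = [60 + 0.1 * i for i in range(201)]   # candidate values 60 .. 80
logl = [log_likelihood(t) for t in grid]
m = max(logl)
weights = [math.exp(l - m) for l in logl]   # shift by max for stability
estimate = sum(t * w for t, w in zip(grid, weights)) / sum(weights)
print(round(estimate, 1))                   # close to the true value 70
```

Real cosmological inference works in many dimensions and uses samplers or emulators rather than a brute-force grid, but the principle is the same: the observations constrain which parameter values remain plausible.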

Millennium

MillenniumTNG: The picture shows images from the simulation. The zoom levels illustrate its
scale, from 2,400 million light years (top centre) down to a single galaxy (bottom right).
Photo: MPG


These data will keep astrophysicists, and artificial intelligence, busy for a long time to come. In addition to pattern recognition, Springel and his colleagues rely on training for evaluation: "AI systems can speed up predictions enormously by training so-called emulators on individual points that have been precisely simulated, and interpolating between them." These emulators replicate the behaviour of the simulation, replacing some of the calculations with observations and measurements, and thereby enable particularly fast conclusions or help to confirm hypotheses. They thus take over part of the computationally intensive modelling: "a methodological breakthrough," says Springel. This speeds up computer simulations and increases the scientific value of individual models, as initial work has already shown.

Scenario 3: Varying simulations with surrogate models

While astrophysics leaves the computationally intensive parts of large-scale simulations to AI systems, other disciplines are developing specific AI models that are trained on simulation data to generate variants with less effort and time. "AI helps us model multiple scenarios more quickly. We can use it to derive further scenarios with modified parameters from existing WSF simulations in a short time," says DLR researcher Mattia Marconcini. "This makes the results more reliable. If the modelling produces comparable results despite changed attributes, this supports the plausibility of the original simulation."

But that's not all: to date, changes in settlement areas have been documented on an annual basis, and the WSF has been re-modelled every two years or so, with fundamental improvements to the simulation. As an open-source tool, the WSF also supports other projects. However, the DLR team wants to use surrogate models to represent short-term changes in settlements and megacities, and to recalculate the WSF every six months or even every quarter from updated satellite data. "This would not be possible without AI," says Marconcini. With the help of surrogate models, new data can be integrated and processed more quickly. As a result, the WSF can not only be produced at shorter intervals, its content can also be expanded with additional information on the development of cities: the WSF 3D, for example, can already be used to estimate the height of buildings. Yet although AI enables the representation of short-term development steps, the AI system trained on data cannot optimise the underlying model.

WSF3D

The World Settlement Footprint now exists in a 3D version. AI supplements the data and enables
the estimation of building heights. Photo: DLR

Scenario 4: Complementing and refining classical computations with AI models

Finally, there are still phenomena and systems in nature that cannot be adequately described by classical computational methods. Environmental researcher Ralf Ludwig points to coupled systems, in which natural processes are influenced by human actions without clear rules being apparent. Examples include water temperature and quality near industrial sites and dams, or temperature differences between agricultural and forest soils. Standard simulations cannot take such aspects into account; they can only be captured with a large number of measurements and data. "In these cases, it is more efficient to supplement the process-based simulation with AI and to derive correlations from observations and measurements."

With the knowledge gained from simulations and with the help of measurements, weather and climate data, surrogate models are trained to fill gaps, extend model calculations or refine them for greater accuracy. In this context, Ludwig speaks of an artificial intelligence that is "explainable" or "physically aware" and whose results can be checked against classical methods. It is an approach that could also improve the understanding of interrelationships in chemistry and physics, for example in molecular science and particle physics: "In principle, the model that I develop from huge amounts of data is nothing more than a summary of data, and I can use it as a replacement model in a simulation," says the mathematician Dietrich. "AI is not nebulous; its algorithms work with data across a system. That is the difference from mathematical models: if we collect more data, nothing changes in a mathematical model, but more training can change the AI model." This in turn can lead to uncertainty and therefore needs to be explained. But such models complement simulation by describing phenomena for which only empirical data exist.
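Dietrich's point that a data-driven model is a summary of its training data, and therefore shifts when new data arrive, can be shown with the simplest possible fitted model. All numbers below are made up for the illustration:

```python
# A fitted coefficient changes when new measurements arrive, whereas a law
# written down as a formula does not. The "force/stretch" data are invented.
def fit_slope(xs, ys):
    """Least-squares slope through the origin: k = sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

forces    = [1.0, 2.0, 3.0]       # inputs (e.g. applied force)
stretches = [2.1, 3.9, 6.2]       # noisy observed responses
k_before = fit_slope(forces, stretches)

forces    += [4.0, 5.0]           # more measurements arrive ...
stretches += [8.4, 9.8]
k_after = fit_slope(forces, stretches)

# ... and the learned coefficient moves, because the "model" is nothing
# more than a summary of whatever data it has seen so far.
print(round(k_before, 3), round(k_after, 3))
```

This is exactly the source of the uncertainty mentioned above: the behaviour of a trained model is tied to its data, which is why its results need to be explained and checked against classical methods.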

New workloads for HPC and simulation

Surrogate models and AI methods save computing power, augment physical-mathematical simulations with observational data, and bring greater accuracy to classical modelling. Where their use makes sense and how AI models should be built is now the subject of intensive research. Ludwig favours the coexistence of different methods: "Where we need a good understanding of physics and plausibility, we will use process-based approaches and simulations; in areas where we spend a lot of time calculating or do not yet know the data well enough, we will incorporate approaches from AI in order to generate a robust overall model result."

This coexistence will change research teams in the medium term: where previously the specialist disciplines of numerics, mathematics and computer science were involved in building models, implementing them on HPC systems and running them, the expertise of data scientists will now be added. It is no longer the computer technology or the necessary formulas and data sets, but mutual understanding that becomes the biggest challenge in simulation. At the LMU Department of Physical Geography and Complex Environmental Systems, things are already in motion: the group is looking for science communicators who can translate technical language and mediate between different fields and requirements. "AI," says Ludwig, "should not remain a black box for researchers. If we cannot look at the models in detail ourselves, then we need to work with reliable, competent partners who know exactly how to operate these systems." (S. Vieser/S. Schlechtweg/LRZ)