Proposing Institution

Institut für Informatik, Lehrstuhl für Rechnertechnik und Rechnerorganisation, TUM Garching
Project Manager

Prof. Dr.-Ing. Carsten Trinitis
Boltzmannstr. 3
85748 Garching
Abstract
In the BMBF-funded project ENVELOPE, we investigate the possibility of self-organization in HPC systems. Due to steadily increasing heterogeneity and a growing number of components in HPC environments, these systems have become harder to use and more vulnerable to faults. Therefore, a way must be found to hide the complexity from the application programmer and to increase system reliability on future Exascale HPC systems.

There are three major goals to be reached in this project: i) to hide the complexity of heterogeneous HPC systems from application programmers; ii) to enable efficient usage with respect to application runtime and energy consumption on a given HPC platform; iii) to improve the system's fault tolerance.

In this project, new approaches for system monitoring and system state identification, using e.g. machine learning, will be developed. Using this information, both a proactive and a reactive approach for application-integrated fault tolerance are to be developed. In addition, an application-transparent, container-based migration method is to be developed. Furthermore, several example applications will be ported using these approaches and verified on real-world HPC systems.

To understand the behaviour of future Exascale systems, it is necessary to run our tests on large-scale HPC systems using up-to-date technology. SuperMUC is the right platform to achieve the project goals, whereas smaller systems cannot give sufficiently detailed insights with regard to future Exascale systems.
