Analysis and Exchange of Big Data in Europe

Clearly archived: If research data can be searched and is available, nothing stands in the way of data exchange, and expensive, time-consuming experiments, surveys or calculations can be avoided. Photo: Jan A. Kolar/Unsplash

Measurements, images, statistics, analyses: The data of a study usually answer more questions than those originally asked, and elaborate, time-consuming experiments are too expensive to repeat. The analysis of research data on various supercomputers and the exchange of data are therefore becoming more important in Europe. Researchers should be able to do their computing where the systems fit their requirements and are quickly available. "They often want to analyse and model their data on different supercomputers, as automatically as possible, easily and independently of location," says Stephan Hachinger, a physicist with a PhD and head of the Research Data Management team at the Leibniz Supercomputing Centre (LRZ), outlining the problem. What is part of everyday life for users, the exchange of data, is a highly complex task in High Performance Computing (HPC). Terabytes to petabytes are often processed here, which "despite fast networks, cannot be easily moved back and forth between strongly secured, heterogeneous systems", says Hachinger. User-friendly workflows for processing, analysing and exchanging data between locations are therefore required.

Simplifying Complex HPC and Cloud Workflows

Big Data analysis, simulations and workflows on different supercomputers, as well as data exchange between European HPC centres, have been the focus of the EU project "Large-Scale Execution for Industry and Society", or LEXIS for short (H2020 GA No. 825532). Coordinated by the Czech national supercomputing centre IT4Innovations, 17 institutes, companies and data centres, including the LRZ, have developed workflows and HPC and cloud technology. The result is the LEXIS platform, which builds on existing cloud systems, networks and supercomputers in Europe and organises the data flow. At https://portal.lexis.tech, users from research and industry can find tools that start analysis and simulation processes efficiently and simplify them. Data is prepared in the background and made available for the next work step. The portal also offers practical tools for managing data based on EUDAT services. Companies and researchers can process Big Data in the Czech Republic even though it is stored in Germany or Italy, and vice versa, and collaborations can work together on data projects across Europe.

The FAIR principles play a major role in data management in the LEXIS system. According to these, research data should be findable, accessible, interoperable and reusable. Standardised metadata therefore indicate what individual data sets contain, how they were created and with which programmes: "Within LEXIS, data is easily findable and it is immediately apparent how and for what purpose it can be further used," explains Hachinger. Although the LEXIS project will end in 2021, the portal and platform will continue to exist. They were developed and optimised with partner organisations from the fields of meteorology, geophysics, polar and marine research and aircraft technology, and the first companies and research groups are now testing the platform and portal. One example is CompBioMed, an international project and centre of excellence for computational biomedicine; others are the specialised software manufacturer Pharmacelera from Spain and OpenEngineering from Belgium.

New services for the LRZ?

The LRZ draws a positive conclusion from its work on LEXIS. In addition to simplified processes and tools, several publications were produced. "We spent a long time looking for solutions for workflow control, for example in the processing of weather and climate data," Hachinger adds. "EUDAT's combination of workflows and data management is exciting because it enables European computing from different locations." LEXIS will be used and further developed in the coming years. The LRZ also hopes for further collaborations with LEXIS partners, especially with IT4I and the Irish supercomputing centre ICHEC. The platform and the experience gained should flow into European HPC projects such as EuroCC or the work with the Open Search Foundation (OSF). In the long term, the LRZ itself could also benefit from the technology and thus expand its services. These possibilities are currently being explored. "This project has also brought us forward as a team," says Hachinger. "We were able to learn a lot about cloud and HPC technologies and project work, and saw how far we can get when the goal is clear and the team spirit is right." (vs)

The LEXIS team at the LRZ: Jirathana Dittrich, Dr. Stephan Hachinger, Dr. Rubén Garcia Hernandez, Elham Shojaei, Mohamad Hayek (from left). Johannes Munke is missing from the picture.