Improving Research through Data Management

TRR365 PlantMicrobe is a research consortium consisting of 20 research groups focused on studying the interactions between plants and microbes. Among the researchers’ key tools are the VERDA platform and practical solutions for managing research data in accordance with international FAIR principles.

Nature observations, photos, experimental measurements, and genome sequences: The Transregional Collaborative Research Center 365 “Genetic Diversity Shaping Biotic Interactions of Plants”, or TRR365 PlantMicrobe for short, investigates the reciprocal relationships between microbes and plants. To ensure that the information generated during the research is traceable, accessible, and reusable for future projects, research data management is an integral part of the project’s work packages. To support this, the Leibniz Supercomputing Centre (LRZ), in collaboration with the University of Tübingen’s computing centre, is developing the Virtual Environment for Research Data and Analysis (VERDA), a platform equipped with tools for data analysis, processing, management, and documentation. Biologist Dr. Dagmar Hann from Ludwig-Maximilians-Universität München (LMU) leads public outreach efforts within TRR365, together with Professor Dr. Gudrun Kadereit from the LMU Department of Biology. Among other activities, Dr. Hann produces the podcast “Plants and Their Microcosm”. For Episode 4, she spoke with Dr. Stephan Hachinger and Dr. Alexander Wellmann from LRZ about research data management and the IT systems required to support it. This interview was compiled from that conversation.

Before we dive into your work, a personal question: Stephan, you hold a PhD in astrophysics but are now involved in a biology project. How did that come about?
Dr. Stephan Hachinger: Until 2014, I did a lot of computational work for astrophysics simulations. That’s how I got into high-performance computing, which eventually led me—through my position at the LRZ—into research data management. In this field, the scientific disciplines are actually quite comparable. It’s mostly natural sciences where data and findings need to be described, made accessible, and shared. That’s also what we’re doing in the TRR365 project. The fact that I was—and still am—a scientist helps me understand researchers and biologists, even if I’m not deeply familiar with their specific topics.

What’s your background, Alexander?
Dr. Alexander Wellmann: I earned my PhD in chemistry. During my doctoral work, I did a lot of programming and developed a strong interest in IT. I knew I wanted to move in that direction. Since the LRZ combines research with programming and IT, I joined as a staff member. Chemistry is, of course, a bit closer to biology, but thanks to my lab experience, I can transfer data management concepts to other disciplines as well.

Plants and Their Microcosm: The Podcast

In the fourth episode of her podcast, biologist Dagmar Hann discussed with Stephan Hachinger and Alexander Wellmann from the LRZ the advantages of research data management according to the international FAIR principles, and how technical platforms and tools support these tasks. (The podcast is in German.)

Dagmar Hann with Stephan Hachinger (left) and Alexander Wellmann. © TRR365

What role does the LRZ play in science in Bavaria, and how is data coordinated across different research locations?
Hachinger: First and foremost, we’re an IT provider for the universities in Munich, offering basic services like email, networks, internet access, storage, and file sharing to students and researchers. We’re also one of the largest academic computing centres in Europe. Some of our services are provided in cooperation with universities and other computing centres throughout Bavaria. This is partly because we operate one of Germany’s three national supercomputers for research. In addition, the LRZ is involved in evaluating new technologies across many projects—this includes exploring new computing architectures and quantum computing. Artificial intelligence is another major area for us. For the transregional project TRR365, we’re working closely with the University of Tübingen’s computing centre—a collaboration that we find very rewarding.

Is AI an opportunity or a risk?
Wellmann: I try to stay open-minded and tend to see it more as an opportunity. However, the current hype around AI often leads to overestimations of what models and tools can actually do. Many things appear more capable than they really are. AI is definitely a very useful tool, and it’s here to stay—so we should focus on how best to integrate it into our work.

In the research consortium, you're responsible for VERDA.
Hachinger: VERDA is the IT platform for TRR365. More specifically, it’s a web-based platform where all participants can store, share, and prepare their research data for publication. The platform also includes collaborative services such as the Matrix chat system with the Element client. A key feature of VERDA is that it provides access to computing infrastructure, allowing researchers to process and analyse the data they’ve stored there. The Technical University of Munich (TUM) supports the project with a data steward, who guides researchers on how to store and publish their data according to the FAIR principles. FAIR stands for Findable, Accessible, Interoperable, and Reusable. In practice, this means datasets are enriched with standardised metadata, making them easier to access and reuse.

FAIR sounds great in theory. But thinking about my daily work, I see several challenges: when I'm analysing data in the lab, I often don't have my laptop with me. That means I have to store my data in a FAIR way later. Are there tools to make this easier?
Hachinger: At the moment, FAIR data storage is still mostly a manual process, which is why we, along with our colleagues at TUM, provide support for it. It’s not just about extra work—no one wants to push their research data into the public before it’s been properly analysed and published in a paper. FAIR is primarily about preparation. I describe the data and assign it unique identifiers—usually a number. That way, I can publish the data when I'm ready.
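To illustrate what "describing the data and assigning it a unique identifier" can look like in practice, here is a minimal Python sketch. It is not VERDA's actual workflow: the `DatasetRecord` class, its field names, and the example values are hypothetical, and a random UUID stands in for whatever persistent identifier a real repository would assign.

```python
import hashlib
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; the field names are illustrative only."""
    title: str
    creator: str
    description: str
    keywords: List[str]
    checksum_sha256: str  # lets others verify they received the same bytes
    # Assumption: a random UUID stands in for a real persistent identifier.
    identifier: str = field(default_factory=lambda: str(uuid.uuid4()))
    published: bool = False  # described now, released only when ready


def describe(data: bytes, title: str, creator: str,
             description: str, keywords: List[str]) -> DatasetRecord:
    """Build a metadata record for a blob of research data."""
    digest = hashlib.sha256(data).hexdigest()
    return DatasetRecord(title, creator, description, list(keywords), digest)


# Made-up example data and names, for illustration only.
record = describe(
    b"ACGTACGT\n",
    title="Example sequencing run",
    creator="Example Researcher",
    description="Toy sequence used to demonstrate a metadata record.",
    keywords=["sequencing", "example"],
)
print(json.dumps(asdict(record), indent=2))
```

The record exists from the moment the data is described, while the `published` flag stays `False` until the researcher decides to release it, which mirrors the point that FAIR is primarily about preparation.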

What kind of data is generated in TRR365?
Wellmann: Fundamentally, it includes data generated by researchers, as well as metadata that describes this information. A concrete example from the TRR365 project would be genome sequencing data and lab results, as well as lab notebooks that already contain basic metadata—like descriptions of experiments and their conditions. In the long term, metadata will also include publication-related information, such as diagrams, tables, and other elements derived from measurements and data analysis.

What special requirements does the TRR365 network place on data management?
Hachinger: Above all, it has to handle huge datasets: genetic sequencing in particular generates an enormous amount of data. Researchers also want to access VERDA and their data from the computers they prefer to use. The biology department at Ludwig-Maximilians-Universität (LMU) in Munich, for example, has its own computing cluster, and the VERDA concept must be compatible with it. There are similar examples for sequencing tasks and for AI models and programs. On the administrative side, VERDA should also meet the standard requirements for creating metadata. This is all quite complex.

In the short term, I’ll probably have to invest some time to get familiar with VERDA. But what tools does VERDA offer, and how will the platform make my work easier in the long run?
Wellmann: VERDA includes, for example, the Matrix Element chat—a secure, internal communication tool for the project. This means chats and files don’t end up on WhatsApp servers. For data management, we use the GitLab platform. Like GitHub, it allows you to store, share, collaboratively edit, and version code—but not just code. You can also upload and organise lab and other biological information. Every version is saved; nothing can be overwritten. This is crucial for documenting who changed what, when, and how in a dataset.
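The idea that every version is kept and that each change records who made it, when, and why can be sketched with a toy append-only history in Python. This is not how GitLab or Git is implemented; the `Version` and `History` classes, the author names, and the sample data are hypothetical and only model the principle.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Version:
    """One immutable entry in the history (hypothetical structure)."""
    author: str
    message: str
    timestamp: str
    content: bytes
    parent: Optional[str]  # digest of the previous version, None for the first

    @property
    def digest(self) -> str:
        # The digest covers the content and its provenance, so a change
        # to either one yields a different identifier.
        h = hashlib.sha256()
        for part in (self.parent or "", self.author, self.message, self.timestamp):
            h.update(part.encode("utf-8"))
            h.update(b"\0")
        h.update(self.content)
        return h.hexdigest()


class History:
    """Append-only list of versions: entries are added, never replaced."""

    def __init__(self) -> None:
        self._versions: List[Version] = []

    def commit(self, author: str, message: str, content: bytes) -> str:
        parent = self._versions[-1].digest if self._versions else None
        timestamp = datetime.now(timezone.utc).isoformat()
        version = Version(author, message, timestamp, content, parent)
        self._versions.append(version)
        return version.digest

    def log(self) -> List[Tuple[str, str, str]]:
        """Who changed something, when, and with what stated reason."""
        return [(v.author, v.timestamp, v.message) for v in self._versions]

    def checkout(self, digest: str) -> bytes:
        """Return the content exactly as it was at the given version."""
        for v in self._versions:
            if v.digest == digest:
                return v.content
        raise KeyError(digest)


# Made-up authors and measurements, for illustration only.
history = History()
first = history.commit("alice", "Add raw measurements", b"leaf_area_cm2\n12.4\n")
second = history.commit("bob", "Correct a transcription error", b"leaf_area_cm2\n12.5\n")

for author, timestamp, message in history.log():
    print(author, timestamp, message)
```

Because the second commit is stored alongside the first rather than in its place, `history.checkout(first)` still returns the original measurements after the correction.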

I’ve used GitHub before. I liked how you can build a kind of tree for your project and create branches for side projects or new ideas.
Wellmann: Exactly. As is common in IT, both GitHub and GitLab allow you to work with branches, which you can even separate from the main project tree to start a new trunk. It’s very clear and helpful. Or you can stay on the main branch and track the development of your code or data, pulling out specific points to work on.

TRR365 involves 20 research groups. Does each team lead have a GitLab account, or does every researcher get access?
Hachinger: GitLab is hosted in Tübingen, but we’ve expanded it for federated use, so it’s available across all project sites. Team leads assign access rights and invite members. Each person gets a personalised login to ensure transparency about who is working on what. It’s not about control—it’s about traceability. Every user can open their own projects and create branches to manage their code and data.

Profile: TRR365 PlantMicrobe

Name: TRR365 “Genetic diversity shaping biotic interactions of plants” PlantMicrobe
Runtime: 2023 – 2026
Topic: Biological diversity, the interaction between plants and microbes
Participants: Eberhard Karls Universität Tübingen, Ludwig-Maximilians-Universität München, Technische Universität München; 20 working groups, approx. 50 researchers
Funding: Deutsche Forschungsgemeinschaft (DFG), approx. 10 million euros

Where do you see the biggest gap between technical feasibility and researchers’ needs?
Hachinger: Researchers often imagine a data centre with a clean, intuitive interface and a wide range of tools to help them better analyse their data. They want to create advanced visualisations and solve complex analytical tasks—which is totally understandable. But scientific problems vary widely, and building tools that are broadly useful while addressing specific research needs is technically very challenging. One possible solution could be virtual Jupyter Notebooks, which allow users to program tools or analysis workflows directly through the web in a simplified way. We're exploring that idea, but it’s still a big challenge—maybe something for future funding phases of TRR365. For now, we have to work with the manpower and resources we have to build a solid foundation that enables everyone to meet FAIR standards. That’s one of the core goals set by the German Research Foundation, which funds the project.

Did you experience any “aha” moments during the development of VERDA?
Wellmann: Since I’m involved in the technical implementation of services, I’d say the integration of the Tübingen and Munich sites was a big one. It was quite a complex task, and I learned a lot through that process.
Hachinger: What really stood out to me is how the National Research Data Infrastructure (NFDI), which is currently being built in Germany, is already influencing TRR365. The NFDI is made up of 30 consortia, each working on data management methods and tools for specific fields. Tübingen is involved in the DataPLANT consortium, and I was surprised by how concrete some of their implementations already are—like the use of GitLab and directory structures such as the Annotated Research Context (ARC). These are excellent standards, and we're trying to build on them with VERDA.

Is there a particular success story from the work on VERDA?
Wellmann: There isn't one single success story, but there have been many small successes. For two or three workshops, for example, we made the LRZ's cloud infrastructure available so that scientists could try out computationally intensive tools during the courses. I also count the GitLab platform as a success: projects are being uploaded there, which shows that the platform is well used.

If students and researchers want to qualify in the field of data management, what skills and qualities do they need?
Wellmann: An interest in FAIR data and open science, that is, in publishing data so that it can be reused, is certainly a good start. Experience with scientific work is definitely helpful, and researchers already have that. In my experience, IT knowledge is useful but not essential; at the very least, there should be an interest in learning new techniques and skills.

What attracted you to working on VERDA, and why did you want to become part of TRR365?
Hachinger: TRR365 was an excellent opportunity to put into practice what the NFDI consortia are planning. It is exciting to apply the requirements in a subject area that is rather unfamiliar to me, and to try to reconcile the needs of the scientists with my own ideas on research data management and those of the FAIR pioneers.

What role does VERDA play for Open Science?
Hachinger: Open science is not possible without FAIR data. Not everything that is FAIR has to be open, but everything that is open has to be described in some way in order to be accessible, interoperable and reusable. Data management is a step towards making data accessible, and that is also the core of open science. Tools such as GitLab also make it possible to trace how research results or code were created. We are trying to implement this and promote it among researchers.

“Gitlab in Research Data Management”, Krüger et al., 2024
“Collaboration Across Boundaries: A Science Gateway with Federated Backend”, Saker et al., 2024
“5 Fragen” (Five Questions), an interview about data management as a part of research with Dr. St. Hachinger (in German)

It would be great if the DFG funded TRR365 beyond 2026. The DFG is an institution that serves society: why should it be interested in data projects like VERDA?
Wellmann: The question has two levels. On the one hand, the quality of results improves when many researchers have access to the data. The more researchers can look at and check the data, the better the quality. Similar results from other projects or from different approaches, in turn, support the findings. On the other hand, society also benefits when more high-quality data is openly accessible. Society finances research, and the results can then be used for social or economic purposes. And everyone can access the data themselves and form their own opinion. That is very valuable.

If continued funding is approved for TRR365, where do you see VERDA in five or ten years' time?
Hachinger: In the first four years, we will certainly manage to establish the system; researchers will overcome their reservations and learn how datasets are published. After that, we could go one step further. Work processes should be simplified and automated, for example publishing a paper through the university library. We could also offer more practical user interfaces for displaying or processing research data, or link electronic lab notebooks with VERDA. We can certainly become even more versatile.

Is there a long-term goal for VERDA that goes beyond the TRR365 research network?
Hachinger: We are already technically implementing the rules of DataPLANT, the NFDI landscape and the Research Data Alliance, so we hope that VERDA will find its place in the research landscape and become part of a collection of file-sharing and publication platforms. Perhaps VERDA will also be coupled with other systems that support this. In addition, we are making our installation scripts openly available so that anyone can set up a copy for similar purposes.

To conclude: what have you learned about science, technology and collaboration through your work on VERDA?
Wellmann: Despite the differences between disciplines, I have found that researchers often face similar problems in their day-to-day work, and that these can be solved technically with comparable means. My own experiences from chemistry came back to me in this project.
Hachinger: When we were still active in science, IT systems were not yet as advanced. We often found that services were not user-friendly. In this respect, we hope to have done better now. It's also interesting to see that every science has its own style. (Dr. Dagmar Hann, Biologist | LMU)