More than nine billion web pages in 185 languages were crawled, amounting to around 1,330 tebibytes (approximately 1,462 terabytes) of content. This work forms the basis of the Open Web Index (OWI), which the EU-funded OpenWebSearch.EU (OWS.EU) project released in May 2025. “This index can be used by individuals or organizations to develop their own search engines,” says Prof. Michael Granitzer of the University of Passau, project lead of OWS.EU.
When OWS.EU launched in autumn 2022, its vision was to create the technical foundations for new, specialised search services that can reflect Europe’s linguistic and socio-cultural diversity. Fourteen organisations from seven European countries worked together to develop an infrastructure for collecting and cataloguing internet data. The partners included universities, research institutes and companies, as well as supercomputing centres such as the Leibniz Supercomputing Centre (LRZ). Without such infrastructure, neither search engines nor search services based on large language models (LLMs) would be possible. OWI is the central result of these efforts. Unlike its commercial counterparts, the web index is publicly accessible and freely available, much like open-source software. OWI is also designed to be transparent, making it easy to trace and verify the sources of search results.
According to Granitzer, new services are primarily a question of “resources and costs.” For OWS.EU, servers at several supercomputing centres were in constant use, crawling around 100 million web addresses per day. Even so, this is only a fraction of the effort that commercial search engine operators such as Google, Bing, or Baidu invest. “To keep up, we would have to increase our efforts by a factor of 20 to 30,” says Granitzer. That would be feasible, “but we would have to hire staff to maintain the service and buy more storage.” Operating a web index around the clock, all year round, is costly and naturally beyond the scope of a publicly funded research project. Nevertheless, the hope is that entrepreneurs and investors will come forward and use OWI to build innovative businesses and services.
OWI has already shown that it provides a solid foundation. Seven community projects funded by OWS.EU used the web index to develop business ideas, search services, and tools for companies or organisations. Examples include a fact-checking service for current topics and a tool for building online shops from data in enterprise resource planning (ERP) systems. The Know Research Center in Graz, which turns research results into solutions for business and society, built Tilde, a health-focused search engine that combines 200,000 websites indexed by OWI with AI models. Tilde does not rank results solely by popularity or reach, but by reliability: “For each search result, Tilde lists the sources and assesses trustworthiness,” explains project lead Dr. Michael Jantscher. “Users can also decide for themselves which information they consider more important: scientific studies, specialist articles, blogs, or social media.” Built on OWI, the service has become a blueprint for further thematic search engines and follow-up projects: “We can reuse the experience and strategies from the OWS project for other tasks,” says Jantscher.
Like Tilde, the other use cases and technical solutions for indexing web content are intended to inspire founders and companies. That can pay off for businesses as well as for Europe and society. According to a study by the Munich-based consultancy Mücke, Roth & Company, investments based on OWI could become profitable after around four years of operation. The total benefit to the EU is estimated at around €4.5 billion. This figure covers profits from online search services, the resulting economic and social improvements, and greater technological competitiveness.
Research also continues. The project “Scalable, Open, and Comprehensive Recognition of Disinformation Campaigns on the Web” (SOURCE) will build on OWS.EU. It will develop a Europe-wide, open research and analysis infrastructure for identifying and investigating disinformation campaigns. Among other things, it aims to use OWI to continuously collect large volumes of web and social media content and analyse them with the help of AI. The result will be a freely available database of disinformation content that can be used to verify online texts and images or to train AI-based fact-checking tools. (vs | LRZ)