Finnish LUMI AI Factory Launches Dataset-as-a-Service, Streamlining Data Access for AI Development
2026-04-02 09:40

en.Wedoany.com Reported - Finland's LUMI AI Factory has recently launched Dataset-as-a-Service (DaaS), which targets the time and resources that traditional data processing spends on moving data. The service makes data visible where the computing power resides, shortening the path from data to results and making experiments and research more efficient. By combining metadata, access permissions, and data location, Dataset-as-a-Service makes datasets immediately usable on the LUMI supercomputer. This matters for AI development, because the proximity of data to computation significantly affects performance.

Dataset-as-a-Service provides users with a data catalog interface. Data producers can publish datasets in a controlled manner, while data users can discover them without manual searching. The service simplifies access to AI-ready datasets, avoids the bottlenecks that arise when large datasets are replicated, and gives data providers a standardized publishing pathway, which increases data visibility and utilization. Unlike traditional data repositories, Dataset-as-a-Service focuses on usage rather than long-term preservation: it orchestrates data access so that users can work with datasets without moving the data itself.
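The publish, discover, and access-in-place flow described above can be illustrated with a minimal Python sketch. It is not the service's actual interface: the `Catalog` class, its method names, and the example storage location are hypothetical, chosen only to show how metadata, permissions, and data location can be combined so that a user receives a pointer to the data rather than a copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical catalog entry: metadata plus where the data already lives."""
    dataset_id: str
    title: str
    location: str  # assumed storage address; the data itself is never copied


class Catalog:
    """Toy in-memory catalog combining metadata, permissions, and location."""

    def __init__(self) -> None:
        self._records: dict[str, DatasetRecord] = {}
        self._entitlements: set[tuple[str, str]] = set()  # (user, dataset_id)

    def publish(self, record: DatasetRecord) -> None:
        # A data producer registers a dataset in a controlled way.
        self._records[record.dataset_id] = record

    def grant(self, user: str, dataset_id: str) -> None:
        # Stand-in for an approved access application.
        self._entitlements.add((user, dataset_id))

    def search(self, keyword: str) -> list[DatasetRecord]:
        # Discovery works on metadata only.
        needle = keyword.lower()
        return [r for r in self._records.values() if needle in r.title.lower()]

    def resolve(self, user: str, dataset_id: str) -> str:
        # Return the in-place location if the user is entitled to the dataset.
        if (user, dataset_id) not in self._entitlements:
            raise PermissionError(f"{user} has no access to {dataset_id}")
        return self._records[dataset_id].location


catalog = Catalog()
catalog.publish(DatasetRecord("ds-001", "Open web search index", "s3://example-bucket/web-index/"))
catalog.grant("alice", "ds-001")
print([r.dataset_id for r in catalog.search("web")])
print(catalog.resolve("alice", "ds-001"))
```

In this sketch, `resolve` hands back only a location string, which mirrors the idea that the service orchestrates access while the dataset stays where it is.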

Dataset-as-a-Service is built on existing components, including CSC's Fairdata-Metax metadata repository and Fairdata-Etsin search tool, as well as the LUMI-O object storage and the REMS authorization system. This modular architecture keeps the service cost-effective and scalable. A pre-production version of the service is currently available. Its data catalog contains ten dataset collections, among them an open web search index. Together the collections comprise over 1,000 datasets with a combined size of more than one petabyte, and they support search engine development and large language model training. As Dataset-as-a-Service matures towards full production, it is expected to accelerate AI development and promote the immediate availability of data where value is created.
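To put the one-petabyte figure in perspective, the following Python sketch estimates how long copying that much data would take. The sustained link rates of 10 and 100 Gbit/s are illustrative assumptions, not figures from LUMI, and the calculation ignores protocol overhead and storage limits.

```python
def transfer_days(size_bytes: float, link_gbit_per_s: float) -> float:
    """Idealized time in days to copy size_bytes over a link of the given rate."""
    bits = size_bytes * 8
    seconds = bits / (link_gbit_per_s * 1e9)
    return seconds / 86_400  # seconds per day


PETABYTE = 10**15  # decimal petabyte, in bytes

for rate in (10, 100):  # assumed sustained rates in Gbit/s
    print(f"{rate} Gbit/s: {transfer_days(PETABYTE, rate):.1f} days")
# 10 Gbit/s: 9.3 days
# 100 Gbit/s: 0.9 days
```

Even under these idealized assumptions a full copy takes from most of a day to more than a week, which is the cost that working with data in place avoids.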

This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issues, please notify us promptly, and we will modify or delete it accordingly. Email: news@wedoany.com