Spain's BSC-CNS Releases dislib 1.0.0, Simplifying Distributed Machine Learning and Big Data Analytics
2026-05-16 15:52

en.Wedoany.com Reported - The Barcelona Supercomputing Center (BSC-CNS) in Spain officially released version 1.0.0 of its distributed computing library dislib on May 15. The release provides a mature, stable toolset for running big data analytics and machine learning tasks on distributed platforms such as clusters, clouds, and supercomputers. By offering a stable API, it improves compatibility and ease of use for research in very large distributed environments, and it marks dislib's evolution from a research prototype into a production-grade codebase suitable for critical applications.

dislib 1.0.0 is a Python library built on top of the PyCOMPSs parallel framework. Its core design philosophy is to let users write large-scale distributed computations in a sequential programming style, through a simple interface similar to scikit-learn. This release integrates PyTorch and PyEDDL and, for the first time, supports distributed neural network training, so researchers can carry out complete workflows ranging from traditional machine learning to deep learning directly within the library.

The update rests on a fundamental technical restructuring. At the library's core is a distributed multi-dimensional array structure called ds-array, which partitions massive datasets and stores the pieces on remote nodes. All algorithms built on top of it, such as clustering, classification, regression, and recommendation systems, are defined as tasks that can run in parallel and are scheduled automatically in the background by the PyCOMPSs runtime. As a result, dislib can efficiently process datasets too large to fit into a single machine's memory, and it is compatible with the current scientific computing ecosystem, including COMPSs 3.4 and NumPy 2.x.
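The block-and-task idea described above can be illustrated with a minimal sketch. It is not dislib's or PyCOMPSs's actual API: it uses only the Python standard library, a local thread pool stands in for the distributed runtime, and the function names (split_into_blocks, partial_column_sums, column_means) are hypothetical names chosen for illustration. The sketch splits a dataset into row blocks, runs one task per block, and combines the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(rows, block_rows):
    """Partition a list of rows into consecutive row blocks (hypothetical helper)."""
    return [rows[i:i + block_rows] for i in range(0, len(rows), block_rows)]


def partial_column_sums(block):
    """Task run once per block: return (per-column sums, number of rows)."""
    n_cols = len(block[0])
    sums = [0.0] * n_cols
    for row in block:
        for j, value in enumerate(row):
            sums[j] += value
    return sums, len(block)


def column_means(blocks, max_workers=4):
    """Run one task per block, then reduce the partial results into column means.

    Assumption: a local thread pool stands in for a distributed task runtime.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_column_sums, blocks))
    total_rows = sum(count for _, count in partials)
    n_cols = len(partials[0][0])
    totals = [sum(sums[j] for sums, _ in partials) for j in range(n_cols)]
    return [total / total_rows for total in totals]


# Toy dataset: 10 rows and 2 columns, split into blocks of 4 rows each.
data = [[float(i), float(2 * i)] for i in range(10)]
blocks = split_into_blocks(data, block_rows=4)
means = column_means(blocks)
```

The calling code reads as ordinary sequential Python, while the per-block work is free to run concurrently. No single task ever needs the whole dataset, only its own block, which is the property that lets a block-partitioned design scale past one machine's memory.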

BSC researcher Eduardo Iraola commented that dislib 1.0.0 is no longer just a research prototype but a mature codebase, and that seeing it already support real-world applications such as seismic impact assessment, personalized medicine, and digital twins is the best proof that the project is moving in the right direction.

The maturity of dislib has been validated in numerous scientific research projects. In astrophysics, dislib was used to run the DBSCAN clustering algorithm on data from the European Space Agency's Gaia mission, revealing open star clusters in the Milky Way. In healthcare, the AI-SPRINT project used dislib's random forest model for atrial fibrillation detection, advancing the development of personalized medicine. In European high-performance computing projects, dislib has also been widely used in areas such as natural disaster early warning, manufacturing digital twins, aerospace composite material design, and extreme climate impact assessment, demonstrating its stability and its applicability across disciplines.

dislib is open source under the permissive Apache 2.0 license. Researchers and developers can install it locally with the pip install dislib command or load it directly on top-tier supercomputers such as MareNostrum. BSC-CNS has also updated the project documentation and Docker images alongside the release. The images provide two independent environments, base and PyTorch, lowering the technical barrier for both beginners and professionals working at the convergence of HPC and AI.
