Wedoany.com Report on Mar 13th, U.S. AI data platform Protege recently launched the DataLab research initiative, which aims to turn AI data into a more rigorous scientific discipline and to address the increasingly prominent data bottleneck in artificial intelligence development. As AI systems move toward complex real-world applications, data quality, selection, and evaluation have become key factors constraining progress.

DataLab is a dedicated research institution committed to helping researchers tackle core challenges in data science. The team consists of internal experts and has already secured preliminary collaborative support from several tech giants, including Amazon, Apple, Alphabet, Microsoft, Nvidia, Meta, and Tesla. A recent Snowflake survey found that while generative AI projects deliver significant returns, data preparation and quality issues remain widespread obstacles, further underscoring the importance of optimizing the AI data layer.
Protege CEO Bobby Samuels pointed out: "We understand the three core pillars driving AI: models, chips, and data. We believe that with the right datasets—the third, underdeveloped pillar—we can push the entire frontier forward." He emphasized that the company "created DataLab to treat data as infrastructure, not waste," advocating for improved system reliability through the establishment of better standards, reproducibility, and scientific norms.
DataLab will focus on three core areas: fostering scientific collaboration, building high-value datasets and data products, and leading AI data research. The work will balance academic exploration with commercial application, and the team plans to release benchmarks and technical research findings. Protege co-founder Engy Ziedan stated: "The strength of DataLab lies in its ability to integrate often siloed perspectives." Ziedan further explained that this "requires thinking at the margin, where we weigh the marginal value of a data point in learning against the opportunity cost of choosing the wrong dataset," so that dataset design is disciplined and grounded in a deep understanding of real-world complexity.

As AI technology moves deeper into scientific and other critical application fields, the demand for data precision has increased significantly. Researchers are increasingly focusing on the marginal value of data, that is, how a single data point influences model behavior. Protege stated that DataLab will work at this level, bringing scientific methods to decisions about data selection, structure, and impact assessment, with the goal of ensuring that AI systems operate reliably in real-world environments and of establishing AI data as a scientific discipline.