en.Wedoany.com Reported - Yongsheng Intelligence, a subsidiary of MGI Tech, and the Shanghai Artificial Intelligence Laboratory have jointly released two new results in the AI for Bio field: the multi-agent system ProtoPilot and the full-process agent evaluation system BioLab Bench. Together they close the loop from experimental intent to AI-driven physical execution in the wet lab.
The AI for Bio track has attracted many technology companies, with entrants including OpenAI's GPT-Rosalind, Google's Co-Scientist and ERA, and Anthropic's Claude Science Workbench. These players aim to have large models generate experimental protocols and execute them in the lab, but the industry has generally remained at the stage of "being able to produce protocols but not results." The reason is that an experimental intent must pass through five layers of transformation: scientific intent, protocol design, standard operating procedure (SOP), device code, and physical execution. An error at any layer can cause the experiment to fail.

The newly released ProtoPilot adopts a multi-agent collaborative architecture, comprising the Orchestrator Agent, Protocol Expert Agent, and Coding Agent. The Orchestrator Agent is responsible for coordinating the workflow and decomposing tasks, the Protocol Expert Agent generates experimental protocols and SOPs, and the Coding Agent converts protocols into device-executable code. The system has a built-in validator that checks code safety and executability line by line, and feeds back failure reasons, expert judgments, and experimental results, forming a closed-loop learning capability.
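The division of labor described above can be illustrated with a minimal Python sketch. It is not ProtoPilot's implementation: every function name, the command allow-list, the 200 uL volume limit, and the toy "device code" format are assumptions made for illustration only. The sketch shows the general shape of the loop: a protocol step, a code-generation step, a line-by-line validator, and failure reasons fed back into the next round.

```python
from dataclasses import dataclass, field

# Hypothetical allow-list and limit; not taken from ProtoPilot or any real device.
ALLOWED_COMMANDS = {"aspirate", "dispense", "mix"}
MAX_VOLUME_UL = 200.0


@dataclass
class ValidationReport:
    passed: bool
    failures: list[str] = field(default_factory=list)


def protocol_expert(intent: str, feedback: list[str]) -> list[dict]:
    """Stand-in for the Protocol Expert Agent: turns an intent into SOP steps.

    The first draft deliberately exceeds the volume limit so that the
    feedback path is exercised; any feedback triggers a smaller volume.
    """
    volume = 150.0 if feedback else 250.0
    return [
        {"action": "aspirate", "volume_ul": volume},
        {"action": "dispense", "volume_ul": volume},
    ]


def coding_agent(sop: list[dict]) -> list[str]:
    """Stand-in for the Coding Agent: turns SOP steps into toy device code."""
    return [f"{step['action']} {step['volume_ul']}" for step in sop]


def validate(code: list[str]) -> ValidationReport:
    """Check each line for an allowed command and a volume within the limit."""
    failures = []
    for lineno, line in enumerate(code, start=1):
        command, _, arg = line.partition(" ")
        if command not in ALLOWED_COMMANDS:
            failures.append(f"line {lineno}: unknown command {command!r}")
        elif float(arg) > MAX_VOLUME_UL:
            failures.append(f"line {lineno}: volume {arg} uL exceeds limit")
    return ValidationReport(passed=not failures, failures=failures)


def orchestrate(intent: str, max_rounds: int = 3) -> tuple[list[str], int]:
    """Stand-in for the Orchestrator Agent: loop until the validator passes."""
    feedback: list[str] = []
    for round_no in range(1, max_rounds + 1):
        code = coding_agent(protocol_expert(intent, feedback))
        report = validate(code)
        if report.passed:
            return code, round_no
        feedback = report.failures  # failure reasons drive the next round
    raise RuntimeError(f"no valid code after {max_rounds} rounds: {feedback}")


if __name__ == "__main__":
    code, rounds = orchestrate("transfer sample between wells")
    print(rounds, code)
```

In this toy setup the first draft fails validation and the second passes, which mirrors in miniature the feedback loop the article attributes to the system's validator.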
On the industry-recognized third-party benchmark ProtocolQA, ProtoPilot scored 52.38% on open-ended questions, approaching the level of human experts (54%), and 85.18% on non-open-ended questions, surpassing expert level. For comparison, OpenAI's current flagship model GPT-5.6 Sol scored 43.5% on open-ended questions. In the Protocol task evaluation, ProtoPilot achieved a comprehensive score of 94.7 (out of 100), with parameter rationality at 98.9, methodological consistency at 97.7, and content completeness at 98.4. In a blind evaluation, three independent wet-lab scientists, unaware of which system had produced each output, ranked ProtoPilot first in 70.6% of cases and in the top three in 90.2% of cases. On the highest-complexity L3 tasks, ProtoPilot achieved a pass rate of 60%, while the industry baseline OpenTrons-AI had a pass rate of zero.

In the code conversion and device execution phase, the median code quality score of ProtoPilot's Protocol2Code stage reached 95.5, with a Gate Pass Rate of 96.6%. For comparison, LabScript-AI had a pass rate of 64.6%, Grok-4.3 had 35%, and GPT-5.5 had 17.7%. In cross-device migration tests, the system's Gate Pass Rate varied by only 5.9 percentage points across four mainstream platforms: MGI AlphaTool, Hamilton STAR, OpenTrons OT-2, and Tecan EVO. On the OpenTrons OT-2, ProtoPilot achieved a pass rate of 88.24%, while OpenTrons' official AI reached only 32.35%.

BioLab Bench is the first full-process Agent evaluation system in the life sciences field, covering the entire pipeline from user requirements to device executability. Task scopes are stratified by difficulty from L1 to L3, encompassing experimental intent understanding, Design2Protocol, Protocol2SOP, SOP2Code, device code, and real experimental execution, with support for cross-platform verification.

In real wet-lab validation, ProtoPilot completed four sets of experiments with increasing difficulty. The first set involved inoculation and culture in a 96-well plate, with all 96 wells showing growth and stable OD600 readings. The second set consisted of 24 colony PCRs, all amplifying the expected bands. The third set involved plasmid construction and site-directed mutagenesis, successfully constructing GLuc-WT and RLuc-WT plasmids, along with 15 mutants confirmed by Sanger sequencing. The fourth set was DNA assembly based on the PCA method, involving seven steps, with an initial screening positive rate of 96.9% (93 positives out of 96 candidate clones), and Sanger sequencing confirmed the successful construction of all four target DNA sequences. Additionally, after the first round of PCA assembly and transformation failed due to antibiotic selection issues, the system automatically analyzed the cause and generated a revised protocol, successfully obtaining pickable single colonies in the second round, which were confirmed by sequencing.
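The initial screening positive rate quoted for the fourth set follows directly from the reported clone counts. A short check, using only the two numbers given in the text (the variable names are illustrative):

```python
# Counts reported for the PCA-based DNA assembly screening.
positives = 93
candidates = 96

# Positive rate as a percentage, rounded to one decimal place.
rate_percent = positives / candidates * 100
print(f"{rate_percent:.1f}%")  # prints "96.9%"
```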
Founded in March of this year, Yongsheng Intelligence is a subsidiary of MGI Tech focused on the AI for Science (AI4S) field. Its team has previously published projects such as EvoPlay and PrimeGen in Nature sub-journals and led the development of the flash sequencer E25 Flash. MGI Tech's intelligent laboratory automation products include PrepALL, AlphaTool, and the AIO all-in-one machine, which together had accumulated more than 3,800 users globally by the end of 2025. The capabilities of ProtoPilot and BioLab Bench have been fed back into Yongsheng Intelligence's product ecosystem: they give αLab Brain capabilities that can be evaluated and corrected, and they allow hardware devices such as AlphaTool and PrepALL to connect to the Bio Agent ecosystem through Protocol2Code.
Paper: https://arxiv.org/abs/2606.31763










