en.Wedoany.com Reported - Doctors at tertiary hospitals interviewed by Jiemian News said that a growing number of patients bring AI-generated diagnostic results to consultations, which raises the cost of doctor-patient communication. Some doctors reported that of 30 patients seen in a morning, 25 brought AI conclusions. Against this backdrop, Baichuan Intelligent released the Baichuan-M4 Medical Enhancement Large Model, which is structurally rebuilt on top of a general-purpose large model and specifically enhanced for the medical field, with the aim of improving the reliability of AI in medical decision-making.
In the latest HealthBench evaluation, the M4 achieved an overall score of 68.6 and a Hard-task score of 49.7, and its hallucination rate fell to 3.3%. In the HealthBench Professional evaluation, which is closer to real clinical environments, the M4's basic reasoning score was 55.1, higher than GPT-5.5's 51.8.

The M4's capability improvements fall into four areas. First, dynamic consultation capability: based on the SCAN-bench 2.0 system, the model's training scenarios have been expanded from single standardized consultations to multiple visits and complex patient profiles. In the SCAN-bench evaluation, the M4 scored 79.0 for initial consultations and 74.7 for follow-up consultations; its long-context clinical memory score was 86.9, an improvement of 21.1 points over the previous-generation M3. Second, evidence-based capability: the M4 is built around an atomic clinical pathway system that breaks medical guidelines down into more than 1,000 reusable clinical decision units, covering the complete diagnosis and treatment processes for more than 200 common diseases. In the Baichuan-EBM evaluation, its evidence-based citation accuracy reached 90.0, well above GPT-5.5's 54.7.
Third, scheduling capability: the M4 introduces the Harness architecture, which lets the model decide autonomously when to ask follow-up questions, retrieve evidence, or recall medical history, with all operations carried out under real-time safety constraints. Fourth, full-course memory capability: the model can integrate historical medical records, multiple rounds of consultations, lab test trends, and medication feedback, keeping track of the patient's past medical history and changes in indicators across multiple conversations.
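To make the scheduling idea more concrete, the following is a minimal illustrative sketch in Python of a loop that chooses between asking a follow-up question, recalling history, retrieving evidence, and responding, with a safety check run before every step. It is not Baichuan Intelligent's implementation, whose details have not been published; every name here (Action, ConsultState, RED_FLAGS, choose_action, run_turns) and the rule ordering are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    ASK_FOLLOW_UP = "ask_follow_up"
    RECALL_HISTORY = "recall_history"
    RETRIEVE_EVIDENCE = "retrieve_evidence"
    RESPOND = "respond"
    ESCALATE = "escalate"  # advise the user to seek medical attention


@dataclass
class ConsultState:
    """Hypothetical per-conversation state; not a published data model."""
    facts: list[str] = field(default_factory=list)           # what the patient has said
    missing_fields: list[str] = field(default_factory=list)  # questions still open
    has_prior_visits: bool = False
    history_loaded: bool = False
    evidence_loaded: bool = False


# Assumed red-flag phrases standing in for a real-time safety constraint.
RED_FLAGS = {"chest pain", "difficulty breathing"}


def violates_safety(state: ConsultState) -> bool:
    return any(fact in RED_FLAGS for fact in state.facts)


def choose_action(state: ConsultState) -> Action:
    """Pick the next step; the safety check always runs first."""
    if violates_safety(state):
        return Action.ESCALATE
    if state.has_prior_visits and not state.history_loaded:
        return Action.RECALL_HISTORY
    if state.missing_fields:
        return Action.ASK_FOLLOW_UP
    if not state.evidence_loaded:
        return Action.RETRIEVE_EVIDENCE
    return Action.RESPOND


def run_turns(state: ConsultState, answers: dict[str, str]) -> list[Action]:
    """Run the loop until a terminal action; `answers` simulates patient replies."""
    trace = []
    while True:
        action = choose_action(state)
        trace.append(action)
        if action in (Action.RESPOND, Action.ESCALATE):
            return trace
        if action is Action.RECALL_HISTORY:
            state.history_loaded = True
        elif action is Action.ASK_FOLLOW_UP:
            question = state.missing_fields.pop(0)
            reply = answers.get(question)
            if reply:
                state.facts.append(reply)
        elif action is Action.RETRIEVE_EVIDENCE:
            state.evidence_loaded = True
```

In this sketch, a returning patient with one open question would trigger a history recall, a follow-up question, and an evidence lookup before any answer is given, while a reply that matches a red flag ends the loop with an escalation instead of an answer.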
Baixiaoyi, a consumer-facing product built on the M4 model, is in internal testing with a limited group of users. Through multiple rounds of dialogue, the product gradually fills in the user's medical history, narrows the scope of risk assessment, and guides users to seek medical attention when necessary. According to data released by Baichuan Intelligent, in tests conducted at institutions including the Cancer Hospital of the Chinese Academy of Medical Sciences (Oncology Department), Beijing Children's Hospital Affiliated to Capital Medical University (Pediatrics Department), and Ruijin Hospital Affiliated to Shanghai Jiao Tong University (Department of Respiratory and Critical Care Medicine), 6,944 dialogues were generated within 27 days across 75 patient groups. Baixiaoyi achieved a safety rate of 99.6% and a deep interaction rate ranging from 60% to 73%.


Baichuan Intelligent positions the M4 as the "brain" for medical scenarios and Baixiaoyi as the "body" that connects to users. The former is responsible for professional reasoning, evidence-based analysis, and long-term memory; the latter brings these capabilities into the home. The company plans to adopt a "dual-doctor model," in which AI handles long-term companionship, information organization, and risk reminders outside the consultation room, while human doctors remain responsible for diagnosis and treatment decisions.
This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issues, please notify us promptly, and we will modify or delete it accordingly. Email: news@wedoany.com









