en.Wedoany.com Reported - At the 2026 Apsara Conference held on May 20, Alibaba officially released its new-generation flagship Qwen model, Qwen3.7-Max. In the global large model blind test leaderboard by third-party organization Arena, this model surpassed Kimi-K2.6, DeepSeek-v4-pro, and GLM-5.1, approaching the strongest models from GPT, Claude, and Gemini, ranking first among Chinese models. Qwen3.7 is newly designed for the current agent era, achieving continuous breakthroughs in core capabilities such as coding and reasoning, and can fully autonomously complete ultra-long-duration complex agent tasks lasting up to 35 hours, marking the official entry of Chinese large models into an agent-centric capability validation phase.
Looking at various benchmark tests, Qwen3.7-Max demonstrates product strength closely trailing global leading models. In coding agents, the model scored 80.4 on SWE-Verified, on par with Claude Opus-4.6 Max's 80.8 and DeepSeek-v4-Pro Max's 80.6; it scored 69.7 on Terminal Bench 2.0-Terminus, surpassing DeepSeek-v4-pro-Max's 67.9. In reasoning capability, it scored 92.4 on GPQA Diamond and 41.4 on HLE, both outperforming Claude Opus-4.6's 91.3 and 40.0. In general agent evaluations, it scored 60.8 on MCP-Mark, exceeding GLM-5.1's 57.5; and 76.4 on MCP-Atlas, slightly higher than Claude Opus-4.6's 75.8. On the office automation benchmark SpreadSheetBench-v1, it achieved a top-tier score of 87.
The 35-hour ultra-long-duration autonomous evolution test is the most compelling proof of capability from this release. On the T-Head Zhenwu M890, a completely new hardware platform the model had never encountered before, Qwen3.7-Max started from scratch, receiving only a task description, an SGLang Triton reference implementation, and evaluation scripts. It ran continuously for 35 hours, independently completing 432 kernel evaluations and 1,158 tool calls, fully autonomously handling coding, compilation, performance analysis, and iterative improvement, ultimately boosting inference speed by 10 times compared to the original version. The test trajectory shows that the model still discovered effective optimization points after running independently for over 30 hours and proactively initiated a key architectural redesign. This experiment also compared the performance of several leading models on the same task: GLM 5.1 achieved a 7.3x speedup, Kimi K2.6 achieved 5.0x, while DeepSeek V4 Pro only achieved 3.3x and terminated mid-process automatically.
Breakthroughs at the technical architecture level are the foundation enabling these capabilities. Tongyi Lab adopted an orthogonally decoupled "Task-Runtime Framework-Validator" design for the training architecture of Qwen3.7-Max. By pushing reinforcement learning training from synthetic data towards real distribution, it achieved general agent strategies and cross-framework generalization capabilities. The model integrates office productivity tools like office-cli via the Model Context Protocol (MCP), supports multi-agent orchestration and embodied intelligence control extensions, and fully aligns with OpenAI and Anthropic API protocols, enabling plug-and-play seamless integration with mainstream agent frameworks such as Claude Code, OpenClaw, and Qwen Code. Alibaba's large model development has significantly accelerated; within the past 3 months, the Qwen flagship large model has steadily iterated through three versions—3.5, 3.6, and 3.7—continuously raising the performance ceiling for Chinese models.
This release is not just a model capability upgrade, but a core part of Alibaba Cloud's full-stack technology system reconstruction for the Agentic era. On the day of the summit, Alibaba Cloud announced the completion of a "Chip-Cloud-Model-Inference" full-stack Agentic upgrade, simultaneously launching the new AI product official website "Qwen Cloud" born for agents, and super-node servers equipped with the self-developed AI chip Zhenwu M890. Liu Weiguang, Senior Vice President of Alibaba Cloud, stated that after agents break through the critical point, they can work 24/7 non-stop, creating an endless demand for AI and cloud services. Alibaba Cloud is undergoing full-stack technological innovation, comprehensively upgrading from underlying chips, Agentic Cloud, models to inference platforms, building China's largest AI factory.
Alibaba Group CEO Wu Yongming stated during the May 13 earnings call that the annualized recurring revenue (ARR) from Alibaba's AI models and application services has exceeded 8 billion yuan, and is expected to surpass 30 billion yuan by year-end. AI-related product quarterly revenue reached 8.971 billion yuan, with annualized revenue exceeding 35.8 billion yuan, accounting for over 30% of Alibaba Cloud's external commercialization revenue for the first time. Previously, Alibaba committed to investing over 380 billion yuan (approximately 53 billion USD) in cloud and AI infrastructure over the next three years. Wu Yongming emphasized on the call: "Currently, almost no card in our servers is idle. Given the demand over the next 3 to 5 years, the return on investment for our massive AI data center construction is very certain."
This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issues, please notify us promptly, and we will modify or delete it accordingly. Email: news@wedoany.com










