alphaXiv Adopts Chinese Open-Source Model GLM-5.2 as Alternative to Restricted Claude Models
2026-06-23 09:17

en.Wedoany.com Reported - For the first time, alphaXiv has used the Chinese open-source model GLM-5.2 in its automated researcher demonstration. It replaces Anthropic's frontier models, Claude Fable 5 and Mythos 5, which had become inaccessible due to restrictions imposed by U.S. authorities. The alphaXiv team explicitly stated that this was its own demonstration, not an independent test. It chose an open-source alternative because frontier models have been closed off from research, which has pushed the open-source community to look for substitutes.

In the showcased run, GLM-5.2 autonomously compared two reinforcement learning training schemes: a fully asynchronous approach and a merged synchronous approach. The experiment ran on two nodes, each equipped with eight H100 accelerators, using the SkyRL framework and the Harbor code competition task set. According to the team's description, the agent resolved an environment issue on its own (the libnuma dependency), completed all runs, and aggregated the final comparative data on throughput and reward stability.

alphaXiv's automated researcher feature is designed to address the reproducibility of paper code. When users change "arxiv" to "autoarxiv" in a paper's URL, the agent automatically deploys the repository, fixes the environment, runs minimal reproducibility checks, and estimates the cost of fully reproducing the results. This is engineering work (setting up and validating other people's code) rather than scientific discovery. For private code, there is a separate platform, OpenResearch.sh.
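The URL change described above can be sketched in a few lines of Python. This is only an illustration of the substitution as the article words it: the function name `to_autoarxiv` is hypothetical, the paper ID is just an example, and the resulting host is assumed from the description rather than confirmed against the service.

```python
from urllib.parse import urlsplit, urlunsplit


def to_autoarxiv(url: str) -> str:
    """Swap 'arxiv' for 'autoarxiv' in the host part of a paper URL.

    Assumption: the service expects the substitution in the hostname only;
    the article does not spell out the exact URL scheme.
    """
    parts = urlsplit(url)
    host = parts.netloc
    if "autoarxiv" in host or "arxiv" not in host:
        return url  # already rewritten, or not an arXiv link
    new_host = host.replace("arxiv", "autoarxiv", 1)
    return urlunsplit(parts._replace(netloc=new_host))


# Example paper ID, used only for illustration.
print(to_autoarxiv("https://arxiv.org/abs/1706.03762"))
```

Links that do not point at an arXiv host are returned unchanged, so the helper can be applied to a mixed list of URLs without side effects.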

GLM-5.2, developed by China's Z.ai (formerly Zhipu AI), is an open-source mixture-of-experts (MoE) model with approximately 750 billion parameters, of which around 40 billion are activated per token. It has a context length of 1 million tokens and is released under the MIT license. The team noted that the key feature of this model is not its benchmark scores but the fact that its openly released weights cannot be revoked by regulatory bodies, which is a safeguard for tools that require predictable access.
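To put the parameter counts in proportion, the short calculation below works out what share of the model is active for each token. It uses only the approximate figures quoted above; the variable names are illustrative.

```python
# Approximate figures as reported in the article.
TOTAL_PARAMS = 750e9   # total parameters
ACTIVE_PARAMS = 40e9   # parameters activated per token

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
total_to_active = TOTAL_PARAMS / ACTIVE_PARAMS

print(f"Active share per token: {active_fraction:.1%}")   # 5.3%
print(f"Total-to-active ratio: {total_to_active:.2f}x")   # 18.75x
```

In other words, roughly one parameter in nineteen takes part in computing any single token, which is the usual trade-off of an MoE design: a large total capacity with a much smaller per-token compute cost.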

The alphaXiv team acknowledged that GLM-5.2 lacks visual capabilities: while other models read trends directly from charts in WandB (an experiment tracking service), GLM-5.2 writes NumPy code to parse the raw numbers, which is sufficient for simple runs but potentially cumbersome for complex tasks. The team stated that the current model has not yet truly engaged in research; its strength lies in solving implementation issues and reproducing existing work. Here, autonomous research refers to the engineering cycle of experiments, not scientific discovery.
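As a rough idea of what reading a trend from raw numbers instead of a chart can look like, here is a minimal NumPy sketch. It is not the agent's actual code: the function `summarize_curve` is hypothetical, and the reward values are synthetic placeholders, not data from the alphaXiv run.

```python
import numpy as np


def summarize_curve(values):
    """Summarize a logged metric without plotting it.

    Fits a straight line to the series and reports its slope (the trend)
    and the standard deviation of the residuals (a simple stability proxy).
    """
    y = np.asarray(values, dtype=float)
    x = np.arange(y.size)
    slope, intercept = np.polyfit(x, y, 1)  # least-squares linear fit
    residuals = y - (slope * x + intercept)
    return {"slope": float(slope), "residual_std": float(residuals.std())}


# Synthetic reward curves, for illustration only.
steady_rewards = [0.1, 0.2, 0.3, 0.4, 0.5]
noisy_rewards = [0.1, 0.5, 0.0, 0.6, 0.3]

steady = summarize_curve(steady_rewards)
noisy = summarize_curve(noisy_rewards)

print("steady:", steady)
print("noisy: ", noisy)
```

A larger `residual_std` indicates a less stable curve. This kind of summary covers what a glance at a simple chart would show, but it has to be rewritten or extended for every new question, which matches the team's remark that the approach may become cumbersome on complex tasks.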

This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issues, please notify us promptly, and we will modify or delete it accordingly. Email: news@wedoany.com