China's StepFun Launches Step 3.7 Flash Reasoning Model
2026-06-15 16:43

en.Wedoany.com Reported - Nvidia continues to expand its model portfolio with several new releases, the largest of which is so far available only as a preview. Microsoft, meanwhile, released a series of models at its Build conference in early June; unfortunately, all of them are closed-source, and the move further distances the company from OpenAI.

Shanghai-based AI company StepFun, following the successful launch of its 3.5 model in spring, has released a new reasoning model, Step 3.7 Flash. Its architecture is similar to that of its predecessor but adds a vision encoder, so the model can now understand images. The reasoning effort is now configurable, so simple questions no longer trigger a large number of reasoning tokens, which is particularly useful for agentic applications. Like many Chinese models, the predecessor was tightly restricted, and version 3.7 has changed little in this respect: the model states facts in its reasoning trace that are then suppressed in the final answer, apparently because of guardrails imposed during the final training phase. Apart from this, the answers are mostly correct. Interestingly, for questions asked in German, the reasoning is largely conducted in German as well, while interjections such as "wait" remain in English; almost all other models reason solely in English. The community has rated the model highly, especially for use with coding agents. According to the figures on the StepFun website, it significantly outperforms older models and even surpasses DeepSeek V4 Flash. Results for Step 3.7 Flash can be found in the GitHub repository accompanying this article.

MiniMax's M3 model is labeled "Open Weight," but its weights cannot currently be downloaded from Hugging Face; the model can only be tried directly via MiniMax.ai or OpenRouter. MiniMax has optimized the attention architecture into two stages: the first stage determines which tokens are important, and the second stage runs full attention over only those tokens. MiniMax claims that M3 processes prompts nearly ten times faster than M2 and generates output as much as 15 times faster. No public benchmarks are available yet; if MiniMax's own figures are accurate, M3 could roughly compete with Anthropic's best models in the coding domain. Results for MiniMax M3 can be found in the GitHub repository accompanying this article.
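The two-stage idea described above can be illustrated with a small sketch. This is not MiniMax's implementation, whose details the article does not give; it is a generic top-k sparse attention example for a single query. The function name `two_stage_attention`, the `top_k` and `index_dim` parameters, and the choice of a truncated dot product as the cheap stage-one scorer are all assumptions made for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def two_stage_attention(query, keys, values, top_k, index_dim):
    """Hypothetical two-stage attention for one query vector.

    Stage 1 ranks all tokens with a cheap proxy score (assumption: a dot
    product over only the first `index_dim` components). Stage 2 runs
    ordinary scaled dot-product attention over the `top_k` selected tokens.
    Returns (selected_indices, output_vector).
    """
    # Stage 1: cheap importance score for every token.
    proxy = [dot(query[:index_dim], k[:index_dim]) for k in keys]
    ranked = sorted(range(len(keys)), key=lambda i: proxy[i], reverse=True)
    selected = sorted(ranked[:top_k])  # keep original token order

    # Stage 2: full attention, restricted to the selected tokens.
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, keys[i]) * scale for i in selected]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    norm = sum(weights)
    weights = [w / norm for w in weights]

    dim = len(values[0])
    output = [
        sum(w * values[i][d] for w, i in zip(weights, selected))
        for d in range(dim)
    ]
    return selected, output
```

With `top_k` equal to the number of tokens, the sketch reduces to ordinary full attention; the potential saving comes from stage two touching only `top_k` tokens instead of the whole context.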

Liquid AI uses a distinctive architecture for its Liquid Foundation Models that makes token generation highly efficient and lets the models run well on CPUs. The newly launched LFM2.5-8B-A1B has only one billion active parameters and aims to compete with larger models such as gpt-oss-20b, Qwen3-30B-A3B-Thinking-2507, and Gemma-4-26B-A4B-IT. On a Mac Studio M2 Ultra, the model reaches nearly 200 tokens per second. While it cannot fully match the larger models, it is suitable for specialized applications and agentic scenarios. Results for LFM2.5-8B-A1B can be found in the GitHub repository accompanying this article.
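Several of the model names above appear to encode both the total and the active parameter count in the form "<total>B-A<active>B". The following sketch extracts the two numbers and the resulting active fraction. The naming pattern is an assumption inferred from the names themselves, and the helper `active_fraction` is hypothetical; names that do not follow the pattern, such as gpt-oss-20b, return `None`.

```python
import re

# Assumed naming convention: "<total>B-A<active>B", e.g. "8B-A1B".
_SIZE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)B-A(\d+(?:\.\d+)?)B")


def active_fraction(model_name):
    """Return (total_billion, active_billion, fraction) or None.

    Hypothetical helper: relies only on the naming pattern above and
    knows nothing about the actual model architecture.
    """
    match = _SIZE_PATTERN.search(model_name)
    if match is None:
        return None
    total = float(match.group(1))
    active = float(match.group(2))
    return total, active, active / total


if __name__ == "__main__":
    for name in (
        "LFM2.5-8B-A1B",
        "Qwen3-30B-A3B-Thinking-2507",
        "Gemma-4-26B-A4B-IT",
        "gpt-oss-20b",
    ):
        print(name, active_fraction(name))
```

Read this way, LFM2.5-8B-A1B activates one eighth of its parameters per token, which is in line with the article's statement that it has only one billion active parameters.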

Nvidia has released several model updates. The LocateAnything model analyzes images and outputs bounding boxes around specific objects; its processing is highly parallelized, and it can even analyze scanned documents, which makes it suitable for identifying GUI elements and for agents that operate a browser. The model is approximately 8 GB in size and runs on consumer-grade GPUs. The Pixel Diffusion Decoder introduces a novel diffusion model that works in pixel space, but using it remains cumbersome: checkpoints must be downloaded from the Hugging Face page and processed with specialized programs. The Nemotron 3 Ultra model has 550 billion parameters, of which 55 billion are active, and uses the NVFP4 data type and optimized attention mechanisms (including numerous Mamba layers), with a context length of up to 1 million tokens. However, Nemotron 3 Ultra has not yet fully caught up with Chinese open-source models. As with all Nemotron models, Nvidia provides most of the training data and code, achieving a level of transparency otherwise found only in projects from much smaller organizations, such as Olmo or Apertus. The model's Western origin is evident in its responses: where Chinese models are cautious, it often gives clearer, more politically neutral, or simply different answers. Results for Nemotron 3 Ultra can be found in the GitHub repository accompanying this article.
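To give a feel for what 550 billion parameters in a 4-bit format mean in practice, the following back-of-the-envelope sketch estimates the weight storage. It assumes that every weight is stored in NVFP4, which the article does not state, and it counts decimal gigabytes. The 4.5-bit figure assumes one 8-bit scale factor per block of 16 weights; treat that overhead as an assumption, not a figure from the article. The function `weight_memory_gb` is a hypothetical helper.

```python
TOTAL_PARAMS = 550e9   # from the article: 550 billion parameters
ACTIVE_PARAMS = 55e9   # from the article: 55 billion active

RAW_BITS = 4.0         # 4-bit values only
# Assumption: one 8-bit scale per 16-weight block adds 8 / 16 = 0.5 bits.
BITS_WITH_SCALES = 4.0 + 8.0 / 16.0


def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8.0 / 1e9


if __name__ == "__main__":
    print("all weights, raw 4-bit:  ", weight_memory_gb(TOTAL_PARAMS, RAW_BITS))
    print("all weights, with scales:", weight_memory_gb(TOTAL_PARAMS, BITS_WITH_SCALES))
    print("active weights, raw:     ", weight_memory_gb(ACTIVE_PARAMS, RAW_BITS))
```

Under these assumptions the full set of weights needs roughly 275 GB before scale factors, so the model is aimed at multi-GPU servers, while only about a tenth of the weights is read for each generated token.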

This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issues, please notify us promptly, and we will modify or delete it accordingly. Email: news@wedoany.com