en.Wedoany.com Reported - On April 26, DeepSeek announced an API price adjustment: cache-hit prices across the entire API series have been cut to one-tenth of their launch prices. With an additional limited-time promotion that brings the V4-Pro cache-hit price down to 25% of its new list price, the cost per million tokens on a cache hit falls to as little as 0.025 CNY, a new global low for large-model pricing.
According to DeepSeek's official API pricing page, the cut covers the entire V4 series, with a core focus on cache-hit scenarios. For DeepSeek-V4-Flash, the cache-hit price drops from 0.2 CNY to 0.02 CNY per million tokens. For enterprise users, DeepSeek-V4-Pro offers even greater savings: its cache-hit input price falls from the original 1 CNY to 0.1 CNY per million tokens, and before May 5, 2026, an additional limited-time promotion (25% of the list price) brings the effective price to just 0.025 CNY per million tokens. For cache misses, the input price decreases from 12 CNY to 3 CNY, and the output price from 24 CNY to 6 CNY.
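To make the billing structure above concrete, here is a minimal sketch of a per-call cost calculator using the V4-Pro figures quoted in this article (promotional cache-hit price 0.025 CNY, cache-miss input 3 CNY, output 6 CNY, all per million tokens). The function name and the sample token counts are illustrative, not part of any official SDK.

```python
# Illustrative V4-Pro prices from the article, in CNY per million tokens.
PRICES_V4_PRO = {
    "cache_hit": 0.025,  # promotional cache-hit input price (before May 5, 2026)
    "cache_miss": 3.0,   # cache-miss input price
    "output": 6.0,       # output price
}

def request_cost(hit_tokens, miss_tokens, output_tokens, prices=PRICES_V4_PRO):
    """Cost in CNY for one API call; token counts are raw token counts."""
    return (hit_tokens * prices["cache_hit"]
            + miss_tokens * prices["cache_miss"]
            + output_tokens * prices["output"]) / 1_000_000

# A hypothetical long-context call: 900k cached prompt tokens,
# 100k fresh prompt tokens, 2k output tokens.
cost = request_cost(900_000, 100_000, 2_000)  # ≈ 0.33 CNY
```

Note that even in this long-context example the cached 900k tokens contribute only about 7% of the bill; almost all of the cost comes from the uncached input and the output.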
This significant price reduction is underpinned by the technological upgrades in DeepSeek-V4. The preview version of DeepSeek-V4 was officially released and open-sourced on April 24, comprising two models, V4-Pro and V4-Flash, both supporting an ultra-long context of up to 1 million tokens. A proprietary sparse attention architecture drastically reduces inference compute costs: for the Pro version, compute per token is only 27% of that of V3.2, and the KV cache is reduced to 10%, achieving cost optimization at the foundational level. Official specifications list DeepSeek-V4-Pro at 49B activated parameters, trained on 33T tokens, positioning it as the high-performance flagship; DeepSeek-V4-Flash has 13B activated parameters, is trained on 32T tokens, and focuses on high speed and low cost.
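The 10% KV-cache figure matters most at the 1-million-token context limit, where cache memory, not weights, often dominates serving cost. The sketch below illustrates the effect; the per-token byte count is a made-up baseline for illustration only, as the article does not disclose V3.2's actual KV-cache size, and only the 10% ratio comes from the report.

```python
# Rough illustration of KV-cache memory at long context.
# BASELINE_BYTES is hypothetical; only the 0.10 ratio is from the article.
def kv_cache_gib(context_tokens, bytes_per_token):
    """KV-cache size in GiB for a given context length."""
    return context_tokens * bytes_per_token / 2**30

BASELINE_BYTES = 70 * 1024          # assumed V3.2-like per-token KV size
V4_BYTES = BASELINE_BYTES * 0.10    # article: KV cache reduced to 10%

baseline = kv_cache_gib(1_000_000, BASELINE_BYTES)  # tens of GiB
v4 = kv_cache_gib(1_000_000, V4_BYTES)              # an order of magnitude less
```

Under these assumed numbers, a single 1M-token session shrinks from tens of GiB of cache to a few GiB, which is the kind of saving that lets a provider pass cuts through to API prices.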
In agent capability evaluations, DeepSeek-V4-Pro has achieved the best performance among current open-source models and has also performed strongly in other agent-related benchmarks. Internally, DeepSeek has adopted V4 as the agentic coding model for its employees; evaluation feedback indicates the user experience is better than Sonnet 4.5, and delivery quality is close to Claude Opus 4.6 in non-thinking mode. In world knowledge assessments, V4-Pro significantly surpasses other open-source models but lags slightly behind the top-tier closed-source model Gemini 3.1 Pro. In mathematics, STEM, and competitive coding benchmarks, V4-Pro outperforms all open-source models evaluated so far, rivaling world-class closed-source models. V4-Flash slightly trails the Pro version in world knowledge but exhibits comparable reasoning capability, and thanks to its smaller parameter count and activation size it offers faster, more economical API service.
On the compute ecosystem front, the deep synergy between DeepSeek-V4 and Huawei's Ascend is another key factor behind the price reduction. The entire Ascend SuperNode product line now supports the DeepSeek-V4 series. In its technical report, DeepSeek disclosed that its fine-grained expert parallelism scheme has been validated on both NVIDIA GPU and Huawei Ascend NPU platforms; compared with a strong non-fusion baseline, it achieves a 1.50 to 1.73 times speedup in general inference tasks and up to 1.96 times in latency-sensitive scenarios. DeepSeek emphasized that with the mass rollout of the Ascend SuperNode product line in the second half of 2026, further price reductions for the Pro version can be expected. Cost reductions already exceed 90% for high-frequency calling and long-text processing scenarios, so applications with high cache hit rates, such as RAG knowledge bases, intelligent customer service, and document analysis, can see substantial decreases in commercial operating costs.
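The over-90% cost reduction for high cache-hit workloads can be checked directly against the prices quoted earlier in this article (V4-Pro cache-hit input falling from 1 CNY to the promotional 0.025 CNY, cache-miss input from 12 CNY to 3 CNY). The 98% hit rate below is a hypothetical workload figure typical of RAG-style repeated prompts, not a number from the article.

```python
def blended_input_cost(hit_rate, hit_price, miss_price):
    """Blended input cost in CNY per million tokens at a given cache hit rate."""
    return hit_rate * hit_price + (1 - hit_rate) * miss_price

# Pre-cut V4-Pro list prices vs. new prices (with the cache-hit promotion),
# at an assumed 98% cache hit rate.
old = blended_input_cost(0.98, 1.0, 12.0)
new = blended_input_cost(0.98, 0.025, 3.0)
reduction = 1 - new / old  # > 0.90, i.e. more than a 90% cost reduction
```

At a 100% hit rate the reduction would be 97.5% (1 CNY down to 0.025 CNY); the blended figure shows the >90% claim still holds once a realistic share of cache misses is mixed in.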
This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issues, please notify us promptly, and we will modify or delete it accordingly. Email: news@wedoany.com