China's DeepSeek-V4-Pro API Adjusted to 1/4 of Original Pricing, Long-Term Low-Price Strategy Drives Down Large Model Inference Costs

2026-05-23 17:37


en.Wedoany.com reported: On May 22, DeepSeek's official pricing page showed that the API price for the DeepSeek-V4-Pro model will officially be set at 1/4 of its original pricing once the 75% discount promotion ends at 23:59 Beijing time on May 31, 2026. In other words, the model's temporary 75%-off price becomes the new official price when the promotional period concludes.

This adjustment directly changes what developers can expect to pay for calling a high-tier model. The original pricing for DeepSeek-V4-Pro was 0.1 yuan per million tokens for cache-hit input, 12 yuan per million tokens for cache-miss input, and 24 yuan per million tokens for output. At 1/4 of the original pricing, the corresponding prices are 0.025 yuan, 3 yuan, and 6 yuan per million tokens. DeepSeek's English pricing page likewise lists the discounted DeepSeek-V4-Pro prices as $0.003625 per million tokens for cache-hit input, $0.435 for cache-miss input, and $0.87 for output, against pre-discount prices of $0.0145, $1.74, and $3.48 respectively.
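To make the arithmetic concrete, the following is a minimal sketch of a per-request cost estimate using the yuan prices quoted above. The function name `request_cost` and the token counts are hypothetical and chosen only for illustration; real billing may round or aggregate differently.

```python
from decimal import Decimal

# Prices in yuan per million tokens, as quoted in the article.
ORIGINAL = {
    "cache_hit_input": Decimal("0.1"),
    "cache_miss_input": Decimal("12"),
    "output": Decimal("24"),
}
# The adjusted price is 1/4 of the original in every category.
ADJUSTED = {kind: price / 4 for kind, price in ORIGINAL.items()}

MILLION = Decimal(1_000_000)


def request_cost(prices, hit_tokens, miss_tokens, output_tokens):
    """Return the cost in yuan of one request, given token counts per billing category."""
    return (
        prices["cache_hit_input"] * hit_tokens
        + prices["cache_miss_input"] * miss_tokens
        + prices["output"] * output_tokens
    ) / MILLION


# Hypothetical request: 20,000 cached input tokens, 5,000 uncached, 2,000 output.
before = request_cost(ORIGINAL, 20_000, 5_000, 2_000)
after = request_cost(ADJUSTED, 20_000, 5_000, 2_000)
print(f"original: {before} yuan, adjusted: {after} yuan")
```

Because every category is cut by the same factor, the adjusted cost of any request is exactly a quarter of its original cost, whatever the token mix.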

The industry impact of this price adjustment is concentrated on AI application development, agent invocation, and the cost of replacing models inside enterprises. When a low API price is only a short-term promotion, development teams can rarely rebuild long-term product costs around it. Once the reduction is official, enterprises can estimate invocation costs more reliably when building customer service bots, code assistants, knowledge base Q&A, data analysis, automated workflows, and multi-agent systems. In high-frequency invocation scenarios, output and cache-miss input make up the bulk of the bill. Fixing both of these prices at 1/4 of the original brings the per-token cost of complex reasoning, long text generation, code generation, and multi-turn task orchestration down to a quarter of what it was.
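The claim that output and cache-miss input dominate the bill can be checked with a rough breakdown. The sketch below uses the adjusted yuan prices from the pricing page; the monthly workload figures and the helper `cost_shares` are assumptions made up for illustration, not measured data.

```python
# Adjusted prices in yuan per million tokens, as quoted in the article.
PRICES = {"cache_hit_input": 0.025, "cache_miss_input": 3.0, "output": 6.0}


def cost_shares(tokens):
    """Return each billing category's share of total cost (fractions that sum to 1)."""
    costs = {kind: PRICES[kind] * count / 1_000_000 for kind, count in tokens.items()}
    total = sum(costs.values())
    return {kind: cost / total for kind, cost in costs.items()}


# Hypothetical monthly workload, in tokens per billing category.
workload = {
    "cache_hit_input": 800_000_000,
    "cache_miss_input": 200_000_000,
    "output": 100_000_000,
}

shares = cost_shares(workload)
for kind, share in shares.items():
    print(f"{kind}: {share:.1%} of total cost")
```

In this assumed workload, cache-hit input is the largest category by token volume yet contributes under 2% of the cost. Since all three prices are cut by the same factor, these shares are identical before and after the adjustment; only the total changes.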

DeepSeek-V4-Pro's pricing page also lists a context length of 1M, a maximum output length of 384K, and a concurrency limit of 500, along with support for JSON Output, Tool Calls, conversation prefix continuation, and FIM completion. Compared with DeepSeek-V4-Flash, V4-Pro targets more complex tasks and costs more, but with its official price at 1/4 of the original, developers get a clearer tiered choice between "low-cost batch invocation" and "high-capability complex task invocation."

The large model price war is shifting from competing on individual model launches to competing on platform operational capability. Lower API prices will encourage more small and medium developers, vertical industry software vendors, and in-house enterprise technical teams to try embedding large models into existing business systems. Low pricing alone, however, is no substitute for model stability, concurrency capacity, context processing, tool invocation, observability, and data security. For enterprise customers, API pricing is only one part of the total cost of ownership; an actual deployment must also account for prompt engineering, caching strategies, invocation frequency, failure retries, audit logs, permission control, private data integration, and application-layer security.

DeepSeek had previously reduced cache-hit input prices across its entire model series. The official pricing page states that the cache-hit input price for the whole series was reduced to 1/10 of the initial launch price, effective from 20:15 Beijing time on April 26, 2026. Cheaper cache hits matter most for long contexts, repeated system prompts, knowledge base retrieval, and agent tasks: enterprise applications often pass in similar background information again and again, and the caching mechanism can significantly reduce the cost of that repeated input.
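The effect of caching on input cost can be sketched as a weighted average of the two input prices. The example below assumes the adjusted DeepSeek-V4-Pro yuan prices quoted earlier in this article; the function `blended_input_price` and the hit ratios are hypothetical, and the article does not describe how the platform decides which tokens count as cache hits.

```python
def blended_input_price(hit_ratio, hit_price=0.025, miss_price=3.0):
    """Average input price (yuan per million tokens) when a fraction
    `hit_ratio` of input tokens is billed at the cache-hit price."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_price + (1.0 - hit_ratio) * miss_price


# Hypothetical hit ratios, e.g. for an agent that resends a long system prompt.
for ratio in (0.0, 0.5, 0.9):
    price = blended_input_price(ratio)
    print(f"hit ratio {ratio:.0%}: {price:.4f} yuan per million input tokens")
```

Under these assumptions, the average input price falls almost linearly with the hit ratio, because the cache-hit price is less than 1% of the cache-miss price.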

The variables to watch next are the actual user experience after the price adjustment, concurrency limits, the speed of developer migration, and how enterprises revise their application cost estimates. DeepSeek's pricing page also notes that product prices may change, that the platform reserves the right to modify prices, and that users should top up based on actual usage and check the latest pricing information regularly. What can be confirmed at this stage is that the DeepSeek-V4-Pro API will officially be priced at 1/4 of the original after the 75% discount ends on May 31. This should not be read as free access, unlimited concurrency, or a change that applies at the same time to all historical top-ups and third-party platform pricing.

This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issues, please notify us promptly, and we will modify or delete it accordingly. Email: news@wedoany.com


