China's ShengShu Technology Launches Real-Time Interactive Video Model Vidu S1
2026-07-04 10:43

en.Wedoany.com reported: On July 3, at the 2026 Global Digital Economy Conference, ShengShu Technology released Vidu S1, its next-generation video foundation model. Vidu S1 generates interactive video in real time. This shifts AI video from producing single, fixed clips to sustaining continuous real-time interaction.

Vidu S1 supports real-time video dialogue in which users direct a character by voice. Users can control AI avatars naturally through speech and keep interacting continuously in open-ended sessions. The model outputs 540P (960x540) video at 25 FPS (up to 42 FPS). Users can instantly create a personalized interactive character from a single image of a real person, an anime character, or even a pet, with customizable voice options. The entire system runs on consumer-grade GPUs, significantly lowering the hardware barrier for real-time interactive video generation.
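The stated resolution and frame rates translate directly into a time budget for each frame. The short sketch below does that arithmetic from the article's figures. It is plain calculation, not a measurement of Vidu S1, and the helper names are illustrative.

```python
# Per-frame time budget and raw pixel throughput for a real-time video stream.
# Resolution and frame rates are the figures quoted for Vidu S1; the rest is
# arithmetic, not a benchmark of the model.

WIDTH, HEIGHT = 960, 540  # 540P


def frame_budget_ms(fps: float) -> float:
    """Milliseconds available to produce one frame at the given rate."""
    return 1000.0 / fps


def pixels_per_second(width: int, height: int, fps: float) -> float:
    """Raw pixel throughput the generator must sustain."""
    return width * height * fps


for fps in (25, 42):
    print(f"{fps} FPS: {frame_budget_ms(fps):.1f} ms/frame, "
          f"{pixels_per_second(WIDTH, HEIGHT, fps) / 1e6:.2f} Mpixel/s")
```

At 25 FPS each frame must be ready within 40 ms, and at 42 FPS within roughly 23.8 ms. Budgets this tight help explain why the acceleration work described later targets per-frame cost.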

Most existing video generation models use offline workflows. Users submit a prompt and wait for the video to be generated, and the content is fixed once it is produced. Vidu S1 instead introduces a real-time interactive video generation framework in which users can keep speaking throughout a live video conversation. The model processes each voice input together with the dialogue context and the current visual context, so subsequent video content is generated and updated in real time. The model does not rely on audio-driven lip-sync or predefined animation libraries. Instead, it interprets the semantics, intent, and emotional context of the voice input and generates synchronized lip movements, facial expressions, eye movements, gestures, body postures, and full-body actions in real time.
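To make the offline-versus-interactive contrast concrete, here is a toy Python sketch. An offline generator returns a finished list of frames. A streaming generator yields one frame at a time and accepts new input mid-stream through the standard generator `send()` method. The "frames" are only strings recording which instruction conditioned them, and the function names are hypothetical. No real model or ShengShu API is involved.

```python
from typing import Generator, List, Optional

# Toy contrast between an offline workflow and a streaming, interactive one.
# Each "frame" is a string recording which instruction conditioned it.


def offline_generate(prompt: str, n_frames: int) -> List[str]:
    """Offline: the whole clip is fixed once the prompt is submitted."""
    return [f"frame{i}:{prompt}" for i in range(n_frames)]


def streaming_generate(first_prompt: str) -> Generator[str, Optional[str], None]:
    """Streaming: yields frames one at a time and accepts new input via send()."""
    context: List[str] = [first_prompt]  # running dialogue context
    i = 0
    while True:
        new_input = yield f"frame{i}:{context[-1]}"
        if new_input is not None:
            context.append(new_input)  # later frames follow the newest instruction
        i += 1


stream = streaming_generate("greet the user")
frames = [next(stream), next(stream)]
frames.append(stream.send("look surprised"))  # user speaks mid-stream
frames.append(next(stream))
print(frames)
# ['frame0:greet the user', 'frame1:greet the user', 'frame2:look surprised', 'frame3:look surprised']
```

The generator keeps its context between frames. A new instruction therefore changes only what comes next, without restarting generation.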

Vidu S1 adopts an autoregressive diffusion (AR+Diffusion) architecture. Instead of generating the entire video in advance, it continuously predicts and generates the next video content from the frames already generated, the current voice command, and the dialogue context. When users give new instructions, the model updates the character's expressions, actions, and subsequent behavior in real time, so the interaction keeps evolving over the course of the conversation. ShengShu describes Vidu S1 as a leading model for infinite-duration real-time video generation. The company says it responds in real time over long conversations while keeping the character's identity consistent, motion natural and coherent, and user input continuously processed.
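The general pattern of autoregressive diffusion can be sketched with a toy in which each "frame" is a single number. Every new frame starts as noise and is refined over a few denoising steps, conditioned on recent frames and the current control signal. The finished frame is then appended to the context for the next one. The denoiser is a hand-written stand-in for a trained network. `CONTEXT_LEN`, `DENOISE_STEPS`, and all function names are hypothetical. The bounded `deque` context is one common way to keep open-ended generation at constant memory. It is an assumption here, not a description of Vidu S1's internals.

```python
import random
from collections import deque
from typing import Deque, List

# Toy autoregressive-diffusion rollout. Each "frame" is one float that starts
# as Gaussian noise and is refined over a few denoising steps, conditioned on
# recent frames and the current control signal. The denoiser is a hand-written
# stand-in for a trained network; CONTEXT_LEN and DENOISE_STEPS are hypothetical.

CONTEXT_LEN = 4     # how many past frames condition the next one
DENOISE_STEPS = 4   # a "few-step" sampling schedule


def predict_clean(x: float, context: Deque[float], control: float) -> float:
    """Stand-in denoiser: estimate the clean frame from the noisy one plus conditions."""
    history = sum(context) / len(context) if context else 0.0
    return 0.45 * history + 0.45 * control + 0.1 * x


def denoise_frame(context: Deque[float], control: float, rng: random.Random) -> float:
    x = rng.gauss(0.0, 1.0)  # start from pure noise
    for step in range(DENOISE_STEPS, 0, -1):
        x_clean = predict_clean(x, context, control)
        # Move 1/step of the way toward the clean estimate; the last step
        # (step == 1) adopts the clean estimate outright.
        x += (x_clean - x) / step
    return x


def rollout(controls: List[float], seed: int = 0) -> List[float]:
    """Generate one frame per control value, each conditioned on earlier frames."""
    rng = random.Random(seed)
    context: Deque[float] = deque(maxlen=CONTEXT_LEN)  # bounded memory
    frames: List[float] = []
    for control in controls:
        frame = denoise_frame(context, control, rng)
        context.append(frame)  # the new frame conditions the ones after it
        frames.append(frame)
    return frames


# Steer toward +1, then switch the instruction mid-stream to -1.
frames = rollout([1.0] * 20 + [-1.0] * 20)
print(round(frames[19], 2), round(frames[-1], 2))
```

The first printed value is positive and the last is negative. Changing the control mid-rollout steers later frames without regenerating earlier ones. Because the context holds at most `CONTEXT_LEN` frames, per-frame memory stays constant however long the rollout runs.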

To reach real-time interactive generation at 540P (960x540) and 25 FPS, with support for up to 42 FPS, ShengShu Technology applies model-level inference acceleration techniques. These include TurboDiffusion, low-bit SageAttention, and the sparse-attention methods SLA and SpargeAttention. Few-step generation, model quantization, and optimized inference kernels together reduce the computational cost of each frame. At the system level, the TurboServe inference serving engine schedules inference workloads efficiently, allocating computing resources dynamically according to the state of each interaction. These optimizations let Vidu S1 run real-time interactive generation on consumer-grade GPUs. This provides a technical foundation for applications such as real-time video dialogue, interactive live streaming, AI companions, interactive games, and XR experiences.
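Low-bit attention rests on a simple idea: compute attention scores from integer-quantized queries and keys, then rescale once. The sketch below shows generic symmetric per-tensor INT8 quantization applied to one query-key dot product. It is a textbook scheme chosen for illustration, not SageAttention's actual kernel, which adds smoothing and finer-grained scales. The vectors and function names are made up.

```python
from typing import List, Tuple

# Symmetric per-tensor INT8 quantization of attention query/key vectors.
# Generic illustration of low-bit attention, not SageAttention's kernel.


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dot_float(a: List[float], b: List[float]) -> float:
    """Reference full-precision dot product."""
    return sum(x * y for x, y in zip(a, b))


def dot_int8(a: List[float], b: List[float]) -> float:
    """Integer dot product of quantized vectors, rescaled once at the end."""
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer-only accumulation
    return acc * sa * sb


query = [0.12, -0.53, 0.88, 0.07]
key = [0.40, 0.31, -0.25, 0.91]
print(dot_float(query, key), dot_int8(query, key))
```

The two results agree to about two decimal places. The integer accumulation is what maps onto fast low-precision GPU instructions. The trade-off is a small, bounded rounding error per element.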

For character creation, Vidu S1 introduces a fully generative workflow. Users upload a single image, and the model captures the character's identity, appearance, and visual style. It then generates synchronized lip movements, facial expressions, gestures, and full-body actions in real time, with no modeling or training required for a specific character. Whether the image shows a real person, an anime character, or a pet, it can be turned into a real-time interactive character with customizable voice options.
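Keeping a character recognizable across a long session is often checked by comparing appearance embeddings with cosine similarity. The hypothetical sketch below flags generated frames whose embedding drifts away from the embedding of the single reference image. The embedding vectors are hand-written stand-ins for the output of some appearance encoder, which the article does not specify. The 0.9 threshold is an arbitrary assumption. This is an evaluation idea, not ShengShu's method.

```python
import math
from typing import List, Sequence

# Hypothetical identity-consistency check against one reference image.
# Embeddings are hand-written stand-ins for an unspecified appearance encoder.


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def drifted_frames(reference: Sequence[float],
                   frames: List[Sequence[float]],
                   threshold: float = 0.9) -> List[int]:
    """Indices of frames whose embedding strays from the reference identity."""
    return [i for i, f in enumerate(frames) if cosine(reference, f) < threshold]


reference = [0.9, 0.1, 0.4]        # embedding of the single uploaded image
frames = [[0.88, 0.12, 0.41],      # same character
          [0.85, 0.15, 0.45],      # same character, new expression
          [0.1, 0.9, 0.2]]         # identity drift
print(drifted_frames(reference, frames))  # [2]
```

The reference embedding is computed once and reused for every frame. That mirrors the single-image setup, where no per-character training is involved.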

Vidu S1 is now publicly available. Users can create AI avatars from their own images and interact with them in real time. Developers and enterprise partners can build real-time interactive applications through its API platform.

This bulletin is compiled and reposted from information published on the global Internet and by strategic partners, and is intended to inform readers. If there is any infringement or other issue, please notify us promptly and we will modify or delete the content accordingly. Unauthorized reproduction of this article is strictly prohibited. Email: news@wedoany.com