en.Wedoany.com reported - At Build 2026 on June 2 and 3, Microsoft released updates to the Windows AI APIs, adding an on-device speech recognition API and introducing two small language models that run locally: Aion 1.0 Instruct and Aion 1.0 Plan. These capabilities primarily target Windows 11 developers and enable speech-to-text, intelligent text processing, and local agent-based task execution on personal computers.
This update pushes language processing further onto the device itself. The new speech recognition API produces real-time or batch transcriptions from microphones, audio streams, and audio files, and can be used for caption generation, dictation input, audio/video applications, and accessibility tools. Microsoft emphasizes that because it runs locally, the API can generate transcriptions even without a network connection, reducing reliance on cloud-based inference. For enterprise software, meeting tools, industrial field recording, remote operations and maintenance, and education and training systems, the value of on-device transcription lies in lower latency, lower cloud invocation costs, and the ability to process some sensitive voice data on the device. As AI reaches more office and industrial devices, speech recognition is shifting from a standalone functional module to a foundational operating system-level capability.
Aion 1.0 Instruct is positioned as a small language model for on-device workloads, supporting tasks such as summarization, rewriting, intent recognition, and accessibility-related intelligent text processing.
Aion 1.0 Plan, by contrast, is designed for local agent-based reasoning. It has 14 billion parameters, supports a context length of 32,000 tokens, and offers tool invocation, helping applications understand user intent, call tools, manage files, and orchestrate sub-agents. Microsoft plans to have this model run as part of Windows on eligible devices, moving some agent-based workflows from the cloud to local devices. For developers, this means future desktop applications can invoke text understanding, speech recognition, and tool orchestration directly at the operating system level, without integrating an external model service into each application. For enterprise IT departments, on-device models also raise new governance questions, including model permissions, file access boundaries, user identification, data retention, device performance, and cross-application auditing. Whether these capabilities are widely adopted in enterprise scenarios will depend on local AI capabilities and security management mechanisms maturing together.
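To make the governance point concrete, the sketch below shows one way an application might gate a local model's tool calls behind a file access boundary and record each decision for auditing. It is an illustration only and does not use the Windows AI APIs: `ToolCall`, `Dispatcher`, `read_file`, and `demo` are hypothetical names, and the shape of a tool request is an assumption, since the article does not describe Microsoft's actual interface.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List


@dataclass
class ToolCall:
    """A tool request as a local model might emit it (hypothetical shape)."""
    name: str
    path: str


@dataclass
class Dispatcher:
    """Runs registered tools only on paths inside allowed_root, logging every decision."""
    allowed_root: Path
    tools: Dict[str, Callable[[Path], str]] = field(default_factory=dict)
    audit_log: List[str] = field(default_factory=list)

    def register(self, name: str, fn: Callable[[Path], str]) -> None:
        self.tools[name] = fn

    def dispatch(self, call: ToolCall) -> str:
        # Model permissions: only tools that were explicitly registered may run.
        if call.name not in self.tools:
            self.audit_log.append(f"DENY unknown-tool {call.name}")
            raise PermissionError(f"tool not permitted: {call.name}")

        # File access boundary: resolve the path and require it to stay under the root.
        root = self.allowed_root.resolve()
        target = (root / call.path).resolve()
        if target != root and root not in target.parents:
            self.audit_log.append(f"DENY out-of-bounds {call.name}")
            raise PermissionError(f"path outside allowed root: {call.path}")

        # Auditing: record the allowed call before executing it.
        self.audit_log.append(f"ALLOW {call.name}")
        return self.tools[call.name](target)


def demo() -> List[str]:
    """Issue one in-bounds and one out-of-bounds call; return the audit log."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "notes.txt").write_text("hello", encoding="utf-8")

        dispatcher = Dispatcher(allowed_root=root)
        dispatcher.register("read_file", lambda p: p.read_text(encoding="utf-8"))

        dispatcher.dispatch(ToolCall("read_file", "notes.txt"))
        try:
            dispatcher.dispatch(ToolCall("read_file", "../outside.txt"))
        except PermissionError:
            pass  # the denial is already recorded in the audit log
        return dispatcher.audit_log


if __name__ == "__main__":
    for entry in demo():
        print(entry)
```

In this sketch the boundary check and the audit entry live in the dispatcher rather than in each tool, so a newly registered tool inherits both without extra code. A real deployment would also need to tie each call to a user identity and a retention policy, which the sketch leaves out.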
Microsoft also announced that the Windows AI APIs will be extended to more Windows 11 PCs. In addition to NPUs, some capabilities will also run on CPUs and GPUs. The speech recognition API will initially support English and will gradually expand to more global markets. As on-device models, speech recognition, and local agent capabilities are integrated into the Windows development ecosystem, language processing technology is moving from cloud service interfaces to the operating system layer on the device, becoming a key foundational component for application development, accessible interaction, and intelligent enterprise workflows.
This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issues, please notify us promptly, and we will modify or delete it accordingly. Email: news@wedoany.com