en.Wedoany.com Reported - Nutanix has launched Agent Gateway, an AI control plane that gives enterprises a central place to manage model usage and control token costs. The tool targets the uncontrolled token spend that arises when employees use frontier models for simple tasks such as document summaries, giving enterprises a unified view of who is using which models and a way to limit token consumption.

Sitting between users and applications on one side and a growing number of open-weight and frontier models on the other, the gateway lets enterprises set policies on who can use which models, based on workload and cost. Nutanix CEO Rajiv Ramaswami positions the control plane as a tool for defining the ROI of AI deployments: it specifies which teams can use which tools and models, for which use cases, and how many tokens they are allowed to spend.
Ramaswami said at a press conference that today anyone can access anything, whereas with Agent Gateway enterprises can set rules that let engineering teams use "simple models" for a defined set of use cases and reserve the most advanced systems for the most demanding multi-agent applications. He noted that the AI gateway concept resonated with executives even before launch: it drew widespread attention in meetings with CIOs and COOs during a trip to London, and has become a concern for senior executives including CIOs, COOs, and even CFOs. Nutanix is pushing its partners to catch up and carry this message to customers.
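
To make the idea of per-team rules concrete, the following is a minimal illustrative sketch in Python of how a gateway might check a request against a policy covering allowed models, allowed use cases, and a token budget. It is not Nutanix's implementation or API: the class names (Policy, PolicyGateway), the authorize method, and the model and use-case labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical rule: which models a team may use, for what, and how much."""
    allowed_models: set[str]
    allowed_use_cases: set[str]
    token_budget: int      # total tokens the team is allowed to spend
    tokens_used: int = 0   # running total of tokens already approved


class PolicyGateway:
    """Toy stand-in for a control plane sitting between callers and models."""

    def __init__(self, policies: dict[str, Policy]):
        self.policies = policies

    def authorize(self, team: str, model: str, use_case: str,
                  tokens: int) -> tuple[bool, str]:
        """Return (allowed, reason) and record token spend when allowed."""
        policy = self.policies.get(team)
        if policy is None:
            return False, "no policy for team"
        if model not in policy.allowed_models:
            return False, "model not allowed"
        if use_case not in policy.allowed_use_cases:
            return False, "use case not allowed"
        if policy.tokens_used + tokens > policy.token_budget:
            return False, "token budget exceeded"
        policy.tokens_used += tokens
        return True, "ok"


# Assumed example policy: engineering may use a small model for summaries only.
gateway = PolicyGateway({
    "engineering": Policy({"small-open-model"}, {"summarization"}, 10_000),
})
print(gateway.authorize("engineering", "small-open-model", "summarization", 4_000))
print(gateway.authorize("engineering", "frontier-model", "summarization", 100))
```

In this sketch the first request passes every check and is counted against the team's budget, while the second is refused because the frontier model is not on the team's allowed list. A real gateway would also need authentication, persistent usage accounting, and routing to actual model endpoints, none of which is modeled here.
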
Agent Gateway is part of the Nutanix AI stack (Enterprise AI 2.7), connecting AI users and agents to models as well as tools and servers compatible with the Model Context Protocol (MCP), while enforcing policies and rules preset by infrastructure operators. Ramaswami promised that the platform will evolve over time, envisioning it as an "AI within AI" that can become smarter and understand applications themselves, thereby selecting appropriate models and optimizing costs autonomously.
The gateway is currently designed for Nutanix's GPU-based inference stack, which runs on Kubernetes and provides shared inference endpoints for a mix of open-weight and frontier models. The stack is based on Nvidia hardware today, but Nutanix plans to support AMD "by the end of this year," following AMD's $150 million investment in Nutanix in February. Ramaswami said Nutanix ultimately aims to be hardware-agnostic, offering the inference stack and gateway across multiple hardware platforms so that enterprises can deploy and use AI cost-effectively. In the future the stack could run on Google TPUs or AMD GPUs, giving customers a range of cost options.










