At Cisco Live 2026, company executives made token cost management a central theme, arguing that enterprises are spending on AI without sufficient visibility into what they are consuming or what they are getting for it.
Cisco Chief Product Officer Jeetu Patel described a layered assessment model that spans GPU utilization, model performance, application behavior, and agent activity, with token spend as the connective thread across all of them.
The core commercial concern is that organizations are reaching the end of budget cycles having spent more than they expected, without corresponding gains in output or value.
The token billing model that now dominates AI services is structurally different from the fixed-cost software licensing most enterprises are accustomed to. Costs vary with prompt length, task complexity, and reasoning depth. Chain-of-thought reasoning, which can make model responses more accurate, can consume up to a hundred times more tokens per inference than a standard request.
As AI becomes embedded in more workflows and as more capable models get deployed, consumption tends to rise even as per-token prices fall. This is a version of Jevons' Paradox: cheaper tokens do not necessarily mean lower total spending, because cheaper access encourages more expansive use. The result is that budgeting for AI looks less like procuring software and more like managing a utility with unpredictable demand.
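The arithmetic behind that effect can be sketched in a few lines. The prices and volumes below are hypothetical figures chosen only for illustration, not numbers from Cisco or any vendor, and `total_spend` is an illustrative helper rather than part of any billing API.

```python
def total_spend(price_per_million_tokens: float, tokens_consumed: int) -> float:
    """Return total cost for a billing period under simple per-token pricing."""
    return price_per_million_tokens * tokens_consumed / 1_000_000


# Hypothetical "before" period: higher unit price, modest usage.
before = total_spend(price_per_million_tokens=10.0, tokens_consumed=500_000_000)

# Hypothetical "after" period: the unit price falls by 75%, but usage grows
# eightfold as AI spreads into more workflows and reasoning-heavy tasks.
after = total_spend(price_per_million_tokens=2.5, tokens_consumed=4_000_000_000)

print(f"before: ${before:,.2f}")  # before: $5,000.00
print(f"after:  ${after:,.2f}")   # after:  $10,000.00
```

In this made-up scenario the unit price drops to a quarter of its earlier level while the bill doubles, which is the pattern the Jevons comparison points to.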
Cisco's response is to position itself as the observability and control layer for this problem. The stack assessment Patel described is not a product announcement in itself but a framing for how Cisco intends to organize a set of tools across infrastructure, model, application, and agent layers.
The most operationally specific element he described is agent behavior monitoring, which Cisco is building in part through its recent acquisition of Galileo, a company that developed models capable of detecting when an agent is operating outside its expected parameters.
Patel described the ability to intercept and terminate an agent that is consuming tokens beyond defined limits. The Galileo acquisition resolves a build-versus-buy decision in favor of buying specialized capability rather than developing behavioral evaluation models internally. That choice reflects both the speed at which the agentic AI market is moving and the difficulty of building reliable behavior assessment from scratch.
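The simplest form of the interception Patel described, stopping an agent once it exceeds a defined token limit, can be illustrated with a hard budget check. This is a minimal sketch under stated assumptions and not Cisco's or Galileo's implementation: `TokenBudget`, `TokenBudgetExceeded`, and `run_agent` are hypothetical names, and each agent step is reduced to a pre-known token cost.

```python
from dataclasses import dataclass


class TokenBudgetExceeded(Exception):
    """Raised when an agent's cumulative token use passes its limit."""


@dataclass
class TokenBudget:
    limit: int      # maximum tokens the agent may consume (hypothetical policy)
    used: int = 0   # running total of tokens consumed so far

    def charge(self, tokens: int) -> None:
        """Record token use and raise if the cumulative total exceeds the limit."""
        self.used += tokens
        if self.used > self.limit:
            raise TokenBudgetExceeded(
                f"used {self.used} tokens, limit is {self.limit}"
            )


def run_agent(step_costs: list[int], budget: TokenBudget) -> tuple[int, str]:
    """Run steps in order, terminating the agent once the budget is exceeded.

    Returns the number of steps that stayed within budget and a final status.
    """
    completed = 0
    for cost in step_costs:
        try:
            budget.charge(cost)
        except TokenBudgetExceeded:
            return completed, "terminated"
        completed += 1
    return completed, "finished"


print(run_agent([300, 400, 500], TokenBudget(limit=1000)))  # (2, 'terminated')
print(run_agent([100, 200], TokenBudget(limit=1000)))       # (2, 'finished')
```

Note that this sketch charges tokens after each step, so the step that crosses the limit has already been paid for. A static cap like this also only catches overspending; detecting that an agent is behaving outside its expected parameters is a harder problem that a fixed limit does not address.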
Patel's broader argument is that enterprises are still in an early phase where they are learning to use AI effectively, and that token spending during this phase does not yet produce proportionate value. His framing is that value accretes meaningfully only after organizations get past the familiarization stage, and that the risk to the market is a pullback if costs and outputs stay out of alignment for too long.
For Cisco, the commercial opportunity is to reduce that risk by giving enterprises the instrumentation to manage spending before it becomes a boardroom problem.





