From tokens to trust: AI’s $2.5-trillion reckoning

The era of speed and access is ending. What replaces it will define enterprise winners and losers.

Gartner forecasts worldwide AI spending will reach $2.52 trillion in 2026, up 44% year-on-year. | Credits: Getty Images

For two years, enterprise AI strategy ran on a single instinct: to get to the frontier fastest. The default path was a public cloud account, an API key from OpenAI or Anthropic, and a willingness to absorb cost in exchange for speed. That instinct produced extraordinary experimentation—and it is now hitting a wall.

Gartner forecasts worldwide AI spending will reach $2.52 trillion in 2026, up 44% year-on-year, with $1.37 trillion of that flowing into AI infrastructure alone. In mid-2025, Gartner also said that AI for procurement had entered a “Trough of Disillusionment,” where scaling depends on predictable ROI rather than visionary pilots. The pressure has shifted from how fast enterprises can pilot AI to whether they can sustain, govern, and defend it in production.

From frontier access to inference economics

We are moving from AI 1.0, where access to cutting-edge models was the differentiator, to AI 2.0, where inference economics, data gravity, latency, and control decide the outcome. Token prices have fallen roughly tenfold every year since 2021, yet total AI spend at most enterprises has gone up, not down, because more capable models invite more ambitious workflows.

Anthropic, OpenAI, and Mistral are now stratifying offerings between flagship reasoners and lower-cost workhorses precisely because customers refuse to pay flagship prices for every task. McKinsey’s 2025 State of AI survey confirms the pattern—adoption is broadening, but scaled impact remains elusive at most organizations. The question CIOs are asking is no longer which model but which workload runs where, at what cost, under whose policy.

The next best action test

Take a familiar enterprise use case: a bank delivering the next best action, that is, the in-app, in-branch, or call-centre recommendation served in milliseconds against a customer’s live context. The best banks we work with have shown that personalisation at this layer can lift revenue by 5–15%. A global bank we work with launched an AI assistant that resolved more than 1.5 million customer inquiries in its first year.

However, the inference math is unforgiving. A single agentic decision can chain five to 20 model calls, each carrying its own context window. The gap between $0.50 and $3.30 per million input tokens—trivial in a single-turn demo—becomes the difference between a margin-positive feature and one that quietly burns capital across hundreds of millions of customer events.
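
The arithmetic behind that gap can be sketched in a few lines. The two prices are the ones quoted above and the calls per decision sit inside the five-to-20 range; the tokens per call and the annual event count are illustrative assumptions, not measured figures, and only input tokens are counted.

```python
# Rough cost model for one agentic "next best action" decision.
# Prices per million input tokens are the two figures quoted in the article.
# TOKENS_PER_CALL and EVENTS_PER_YEAR are hypothetical, for illustration only.

CHEAP_PRICE_PER_M = 0.50        # USD per million input tokens
FLAGSHIP_PRICE_PER_M = 3.30     # USD per million input tokens
CALLS_PER_DECISION = 10         # within the article's range of 5 to 20
TOKENS_PER_CALL = 2_000         # assumed context size per call
EVENTS_PER_YEAR = 300_000_000   # assumed "hundreds of millions" of events


def cost_per_decision(price_per_million: float,
                      calls: int = CALLS_PER_DECISION,
                      tokens_per_call: int = TOKENS_PER_CALL) -> float:
    """Input-token cost in USD of one chained decision."""
    return calls * tokens_per_call * price_per_million / 1_000_000


def annual_cost(price_per_million: float,
                events: int = EVENTS_PER_YEAR) -> float:
    """Input-token cost in USD across a year of customer events."""
    return cost_per_decision(price_per_million) * events


if __name__ == "__main__":
    for label, price in [("workhorse", CHEAP_PRICE_PER_M),
                         ("flagship", FLAGSHIP_PRICE_PER_M)]:
        print(f"{label}: ${cost_per_decision(price):.4f} per decision, "
              f"${annual_cost(price):,.0f} per year")
```

Under these assumptions a decision costs about one cent on the cheaper tier and under seven cents on the flagship tier, which looks negligible until it is multiplied by the event count: roughly $3 million a year versus close to $20 million.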

Recent analyses suggest enterprises running a single top-tier model for every task are overspending by 40–85% on inference. Decagon, after re-architecting onto an open-source multi-model stack on NVIDIA Blackwell, cut its cost per voice query sixfold. The next best action isn’t a marketing decision anymore; it’s a unit economics decision, made one routed token at a time.
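
One way to picture routing is a dispatcher that sends each task to the cheapest model tier judged adequate for it. The sketch below is hypothetical throughout: the tier names, the difficulty score, the thresholds, and the task mix are assumptions made for illustration, and the prices simply reuse the two figures quoted earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    price_per_m: float      # USD per million input tokens
    max_difficulty: float   # hypothetical ceiling on task difficulty (0 to 1)


# Hypothetical tiers, ordered cheapest first.
TIERS = [
    Tier("workhorse", 0.50, 0.7),
    Tier("flagship", 3.30, 1.0),
]


def route(difficulty: float) -> Tier:
    """Return the cheapest tier whose ceiling covers the task difficulty."""
    for tier in TIERS:
        if difficulty <= tier.max_difficulty:
            return tier
    return TIERS[-1]  # fall back to the most capable tier


def routed_cost(tasks: list[tuple[float, int]]) -> float:
    """Total USD for (difficulty, input_tokens) tasks after routing."""
    return sum(route(d).price_per_m * tokens / 1_000_000 for d, tokens in tasks)


def flagship_only_cost(tasks: list[tuple[float, int]]) -> float:
    """Total USD if every task goes to the most expensive tier."""
    return sum(TIERS[-1].price_per_m * tokens / 1_000_000 for _, tokens in tasks)


if __name__ == "__main__":
    # Assumed mix: eight easy tasks and two hard ones, one million tokens each.
    tasks = [(0.3, 1_000_000)] * 8 + [(0.9, 1_000_000)] * 2
    routed = routed_cost(tasks)
    baseline = flagship_only_cost(tasks)
    print(f"routed: ${routed:.2f}, flagship only: ${baseline:.2f}, "
          f"saving: {1 - routed / baseline:.0%}")
```

The saving depends entirely on the assumed share of easy tasks and on how reliably difficulty can be scored up front, which in practice is the hard part of any router.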

Sovereignty becomes the strategy

The public cloud versus private AI debate is no longer ideological—it is workload-specific, and geopolitics has entered the equation.

The EU AI Act’s high-risk obligations take full effect in August 2026, and the Act’s penalties run as high as €35 million or 7% of global turnover. France and Germany, meanwhile, are tilting national procurement towards Mistral and open-weight sovereign stacks.

In Asia, AI regulation is taking shape in markedly different ways. Singapore’s Model AI Governance Framework and IMDA testing tools have become regional templates. In East Asia, Japan’s AI Promotion Act layers sector rules atop voluntary guidance, while South Korea’s AI Basic Act mandates liability insurance for high-risk systems. Meanwhile, India launched its sovereign LLM at the February 2026 AI Impact Summit and is channeling $1.25 billion into the IndiaAI Mission, with the DPDP Act phasing in through 2027. China’s state-directed open-source push, Indonesia’s PDP Law, and Australia’s pragmatic sector approach complete a map where no two jurisdictions look alike. At the same time, 96% of APAC organisations planned to increase AI investment, mostly via hybrid infrastructure. A single-cloud, single-jurisdiction AI architecture is now a structural liability.

Our hybrid inference work with NVIDIA, and the broader shift toward on-prem AI for regulated workloads, is a direct response.

The real moat is above the model

The hardest lesson of the past 18 months is that model commoditisation does not reduce enterprise complexity but relocates it. Open weights from Mistral or DeepSeek cut experimentation cost, but orchestration, governance, evaluation, and integration burdens move up the stack and sit with the buyer.

The same dynamic is now playing out in physical AI and defense tech. Physical Intelligence, Figure AI, and Skild AI are pushing robot foundation models into factories, fulfillment centres, and homes where latency, sovereignty, and data residency matter more than benchmark scores. Fei-Fei Li’s World Labs is building the spatial intelligence layer—world models that perceive and reason in 3D—that will anchor the next generation of industrial digital twins. Palantir and Anduril have built entire franchises around the assumption that the control plane, not the model, is the durable advantage.

Enterprise leaders should be measuring unit economics per useful task, operational burden per deployed agent, and the ratio of inference spend to spend on the governance scaffolding around it. That ratio is typically 1:5 or worse.

What’s next for banks, telcos, factories

A second architectural shift is arriving: sub-quadratic attention. Approaches from DeepSeek, Google, and Cartesia are collapsing the cost of long-context reasoning by orders of magnitude, with recent benchmarks showing 100x to 300x cost reductions at comparable accuracy.
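
Why sub-quadratic attention changes long-context cost can be seen from a simple operation count. Full self-attention compares every token with every other token, so work grows with the square of the context length; a sliding-window variant, used here only as one illustrative sub-quadratic scheme, compares each token with at most a fixed number of neighbours. The window size and context lengths below are assumptions, the count covers comparisons rather than dollars, and the sketch does not describe any named vendor's method.

```python
def full_attention_pairs(n: int) -> int:
    """Token-pair comparisons for full self-attention over n tokens."""
    return n * n


def windowed_attention_pairs(n: int, window: int) -> int:
    """Comparisons when each token attends to at most `window` tokens."""
    return n * min(n, window)


def cost_ratio(n: int, window: int) -> float:
    """How many times more comparisons full attention needs than windowed."""
    return full_attention_pairs(n) / windowed_attention_pairs(n, window)


if __name__ == "__main__":
    WINDOW = 4_096  # hypothetical window size
    for n in (8_000, 128_000, 1_000_000):  # hypothetical context lengths
        print(f"{n:>9,} tokens: full attention needs "
              f"{cost_ratio(n, WINDOW):.1f}x the comparisons")
```

With a fixed window the ratio grows linearly with context length: about 31x at 128,000 tokens and about 244x at one million tokens for a 4,096-token window. The longer the context, the larger the gap.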

For large banks, this turns whole-portfolio risk modeling, multi-decade fraud pattern detection, and cross-jurisdiction Know-Your-Customer (KYC) into single-pass operations rather than chunked retrieval workarounds.

For telcos, agentic network operations, predictive maintenance, and multi-year customer journey reasoning become economically defensible at scale.

For manufacturers, full-plant simulation and supply-chain disruption forecasting shift from periodic batch jobs to continuous reasoning.

The architecture that wins will not be the one with the cheapest token. It will be the one that places compute closest to the data, under the right jurisdiction, with governance that holds. Sustainable, sovereign, controlled—that is the new triad. The enterprises that build for it now will define the next decade.

(The author is chief strategy officer, Cloudera. Views are personal.)