PUE is dead. Long live tokens-per-watt

May 16, 2026

Category: Sovereignty  |  Read time: ~3 minutes

Power Usage Effectiveness has been the data centre industry’s primary sustainability metric for nearly two decades. Developed by The Green Grid in 2007, it is the ratio of a facility’s total power consumption to the power actually delivered to IT equipment, with the remainder going to cooling, power distribution, and other overhead.
A PUE of 1.0 is the theoretical ideal: every watt drawn by the facility reaches the IT equipment, with none spent on overhead.
The best hyperscale facilities claim values in the low 1.1s, though few of those claims are independently confirmed, whilst the industry average stubbornly sits at around 1.5.
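
As a minimal sketch of the arithmetic, assuming power readings in kilowatts, the ratio and the overhead share it implies can be computed as follows. The function names and the example figures are illustrative only and do not describe any real facility.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    if total_facility_kw < it_equipment_kw:
        raise ValueError("total facility power cannot be below IT power")
    return total_facility_kw / it_equipment_kw


def overhead_share(pue_value: float) -> float:
    """Fraction of total facility power spent on non-IT overhead."""
    return 1.0 - 1.0 / pue_value


# Illustrative figures only: a 1,500 kW facility feeding 1,000 kW of IT load.
example = pue(1500.0, 1000.0)
print(f"PUE = {example:.2f}, overhead share = {overhead_share(example):.0%}")
```

At a PUE of 1.5, roughly a third of the facility’s power never reaches the IT equipment. Nothing in the calculation refers to what that equipment produces.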

PUE was undoubtedly a useful metric for its time: it drove meaningful improvements in cooling efficiency, power distribution design, and facility engineering. For traditional data centres running databases, web servers, and general compute workloads, it captured the dominant efficiency variable reasonably well.

For AI inference workloads, however, it is almost entirely the wrong measure, and continuing to use it as the primary sustainability metric for AI infrastructure is producing systematically misleading procurement and reporting decisions.

Why PUE doesn’t work for AI inference

The core problem is that PUE measures facility overhead efficiency, not compute efficiency. It tells you how much of your power bill goes to cooling rather than computation, but it tells you nothing about how much useful AI output you get per unit of computation.

Consider two facilities with identical PUE values: one routes all queries to efficiently sized models using intelligent dispatch, whilst the other routes everything to the largest available frontier model regardless of task complexity. Their actual sustainability profiles will be radically different, but PUE cannot distinguish between them.
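
To make the comparison concrete, here is a rough sketch under stated assumptions: the per-token energy figures (0.5 J for a small model, 5 J for a frontier model), the 80/20 routing split, and the PUE of 1.2 are all hypothetical placeholders, not measurements. The point is only that the facility-level result changes whilst PUE does not.

```python
def facility_tokens_per_joule(pue, model_mix):
    """Tokens produced per joule of total facility energy.

    model_mix is a list of (share_of_tokens, it_joules_per_token) pairs
    whose shares must sum to 1.
    """
    total_share = sum(share for share, _ in model_mix)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("token shares must sum to 1")
    # Average compute-layer energy per token across the routing mix.
    it_joules_per_token = sum(share * jpt for share, jpt in model_mix)
    # PUE scales compute-layer energy up to total facility energy.
    return 1.0 / (pue * it_joules_per_token)


SMALL_MODEL_J = 0.5     # hypothetical joules per token
FRONTIER_MODEL_J = 5.0  # hypothetical joules per token
SHARED_PUE = 1.2        # identical for both facilities

# Facility A dispatches 80% of tokens to the small model.
routed = facility_tokens_per_joule(
    SHARED_PUE, [(0.8, SMALL_MODEL_J), (0.2, FRONTIER_MODEL_J)]
)
# Facility B sends everything to the frontier model.
unrouted = facility_tokens_per_joule(SHARED_PUE, [(1.0, FRONTIER_MODEL_J)])

print(f"routed:   {routed:.3f} tokens per joule")
print(f"unrouted: {unrouted:.3f} tokens per joule")
print(f"ratio:    {routed / unrouted:.2f}x at the same PUE")
```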

The AI inference workload profile compounds this issue in ways that may not be immediately obvious. Unlike traditional compute, AI inference is characterised by highly variable demand, spiky request patterns, and extreme sensitivity to context window length.

Research published this year established what its authors call the "1/W law": tokens-per-watt halves every time the serving context window doubles, because concurrency halves whilst the power draw stays flat.

A facility running long-context requests on a homogeneous GPU fleet will thus have dramatically lower useful output per watt than one using routing topology to match context length to appropriately configured hardware pools — even if both facilities have identical PUE.
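
The scaling described above can be sketched as follows. This is an illustration of the 1/W relationship as stated in this post, not the cited authors’ model: the reference window, the pool sizes, the request lengths, and the assumption that every request emits the same number of tokens are all hypothetical, and tokens-per-watt is expressed in relative units.

```python
import bisect


def tokens_per_watt(window, ref_window=8192, ref_tpw=1.0):
    """1/W scaling: doubling the serving window halves tokens-per-watt."""
    return ref_tpw * ref_window / window


def pool_for(context_len, pool_windows):
    """Smallest pool window that fits the request (pool_windows sorted ascending)."""
    i = bisect.bisect_left(pool_windows, context_len)
    if i == len(pool_windows):
        raise ValueError(f"no pool can serve a context of {context_len} tokens")
    return pool_windows[i]


def fleet_tokens_per_watt(context_lens, pool_windows):
    """Aggregate tokens-per-watt, assuming equal output tokens per request.

    Energy per token is the inverse of tokens-per-watt, so the aggregate
    is a harmonic mean rather than a simple average.
    """
    cost = sum(
        1.0 / tokens_per_watt(pool_for(c, pool_windows)) for c in context_lens
    )
    return len(context_lens) / cost


requests = [2_000, 6_000, 30_000, 100_000]  # hypothetical context lengths

homogeneous = fleet_tokens_per_watt(requests, [131_072])
routed = fleet_tokens_per_watt(requests, [8_192, 32_768, 131_072])

print(f"homogeneous fleet: {homogeneous:.4f} (relative tokens-per-watt)")
print(f"routed pools:      {routed:.4f} (relative tokens-per-watt)")
```

In this toy example the longest request still dominates the routed fleet’s energy, which is why the aggregate is computed from per-token energy rather than by averaging the per-pool figures.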

PUE also fails to capture the energy cost of data movement, a notable consideration whilst AI processing hubs remain concentrated in a few geographies and users span the globe.
An inference query that originates in London and is processed in Virginia consumes energy across the transatlantic cable infrastructure, regional interconnects, and multiple routing hops, yet none of that transit energy appears in any PUE calculation.

For global cloud AI platforms where inference routing is determined by the provider’s load balancer rather than the customer’s data governance requirements, this invisible transit cost can represent a substantial fraction of the actual energy associated with a given workload.

What the right metrics look like

The primary measure for AI inference efficiency is tokens-per-watt, which captures the useful AI output per unit of energy consumed at the compute layer.

Combined with cost-per-inference, it provides the operational economics picture that PUE alone cannot. Some in the industry are already extending this further to tokens-per-watt-per-dollar, capturing the combined efficiency of energy and cost in a single comparable figure.

For sovereign and edge deployments, however, where the power envelope is physically constrained, tokens-per-second-per-watt is the unified metric that matters: it captures throughput, efficiency, and the real-world constraint of fixed power availability simultaneously.
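
One way these figures might be derived from a single measurement window is sketched below. The `InferenceRun` record and its field names are hypothetical, the sample values are placeholders, and "tokens-per-watt" is interpreted here as tokens per watt-hour of compute-layer energy, which is an assumption about units rather than an industry definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceRun:
    """Hypothetical record of one measurement window at the compute layer."""
    tokens: int             # output tokens produced in the window
    requests: int           # inference requests served in the window
    seconds: float          # length of the window
    avg_power_watts: float  # average compute-layer power draw
    cost_dollars: float     # total cost attributed to the window

    @property
    def energy_wh(self) -> float:
        return self.avg_power_watts * self.seconds / 3600.0

    @property
    def tokens_per_wh(self) -> float:
        # Assumed interpretation of "tokens-per-watt": output per unit energy.
        return self.tokens / self.energy_wh

    @property
    def tokens_per_second_per_watt(self) -> float:
        # Throughput normalised by the power envelope.
        return (self.tokens / self.seconds) / self.avg_power_watts

    @property
    def cost_per_inference(self) -> float:
        return self.cost_dollars / self.requests

    @property
    def tokens_per_wh_per_dollar(self) -> float:
        return self.tokens_per_wh / self.cost_dollars


# Placeholder values: one hour at 2 kW, 3.6M tokens, 10,000 requests, $12.
run = InferenceRun(
    tokens=3_600_000,
    requests=10_000,
    seconds=3600.0,
    avg_power_watts=2000.0,
    cost_dollars=12.0,
)

print(f"tokens per Wh:              {run.tokens_per_wh:.1f}")
print(f"tokens per second per watt: {run.tokens_per_second_per_watt:.3f}")
print(f"cost per inference:         ${run.cost_per_inference:.4f}")
print(f"tokens per Wh per dollar:   {run.tokens_per_wh_per_dollar:.1f}")
```

Tokens per second per watt reduces to tokens per joule, so it differs from the per-watt-hour figure only by a constant factor of 3,600. Its value for constrained deployments is that it is read directly against a fixed power envelope.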

For sustainability reporting, what’s needed alongside hardware efficiency metrics is full transparency on processing location: where inference actually happens, what the energy mix of that facility is, and whether the carbon cost of data transit is included in the disclosure.

For organisations subject to Scope 3 emissions reporting, or preparing for mandatory climate disclosure frameworks that will increasingly capture digital supply chain emissions, the distinction between visible and invisible carbon is important.

A provider that cannot answer “where does inference for our workload actually occur, and what is the carbon intensity of that location?” isn’t providing the information needed for accurate sustainability reporting, regardless of their PUE.
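
A disclosure that answers that question could, in principle, be reduced to a calculation like the one sketched here. The `WorkloadDisclosure` structure, its field names, and every number in the example are hypothetical placeholders chosen for readability. Real figures would have to come from the provider and from the grid operators at each location.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadDisclosure:
    """Hypothetical inputs a provider would need to disclose for one workload."""
    it_energy_kwh: float         # energy drawn by the compute layer
    pue: float                   # facility overhead multiplier
    grid_gco2_per_kwh: float     # carbon intensity at the processing location
    transit_energy_kwh: float    # network energy attributed to data movement
    transit_gco2_per_kwh: float  # carbon intensity applied to transit energy


def facility_emissions_g(d: WorkloadDisclosure) -> float:
    """Emissions visible through facility-level reporting."""
    return d.it_energy_kwh * d.pue * d.grid_gco2_per_kwh


def transit_emissions_g(d: WorkloadDisclosure) -> float:
    """Emissions from data movement, which PUE does not capture."""
    return d.transit_energy_kwh * d.transit_gco2_per_kwh


def transit_share(d: WorkloadDisclosure) -> float:
    """Fraction of total emissions that facility-level metrics leave out."""
    transit = transit_emissions_g(d)
    return transit / (facility_emissions_g(d) + transit)


# Placeholder values only; none of these are measured figures.
disclosure = WorkloadDisclosure(
    it_energy_kwh=100.0,
    pue=1.2,
    grid_gco2_per_kwh=250.0,
    transit_energy_kwh=20.0,
    transit_gco2_per_kwh=300.0,
)

print(f"facility emissions: {facility_emissions_g(disclosure):.0f} gCO2")
print(f"transit emissions:  {transit_emissions_g(disclosure):.0f} gCO2")
print(f"share outside facility reporting: {transit_share(disclosure):.1%}")
```

If a provider cannot supply the processing location, the third and fifth inputs are unknown and the calculation cannot be completed at all.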

The procurement shift that’s coming

The shift from PUE to tokens-per-watt as the primary AI infrastructure sustainability metric is already underway in the research community and among the most sophisticated infrastructure buyers.

It will no doubt become a procurement standard as sustainability reporting requirements mature. In particular, the energy constraints already reshaping the UK data centre market, with 50 gigawatts of projects queued for grid connections against 45 gigawatts of peak national demand, will force a more honest accounting of what AI infrastructure actually costs to run.

The organisations that get ahead of this shift will have both a cost advantage and a reporting advantage as those requirements become standard. Getting ahead means specifying tokens-per-watt in procurement requirements, demanding processing location transparency in sustainability disclosures, and building AI infrastructure around efficient routing architecture rather than maximum throughput.

Axiom Edge’s performance benchmarks are published as tokens-per-second-per-watt measurements taken under production inference conditions, not as theoretical peak throughput.
Processing location is fixed and auditable by design. Energy source is an openly made deployment decision, not a provider-driven variable. These are architectural properties of how we build sustainable, performant sovereign AI solutions, not aspirational commitments.

Axiom Edge is a sovereign AI inference and cloud provider. Our infrastructure is benchmarked in tokens-per-second-per-watt, with energy source and processing location fixed at deployment. Learn more at axiom-edge.a
