Every figure we give has a verified source.
We don't claim efficiency — we calculate it, from component specifications and peer-reviewed independent benchmarks. Every number on this page is traceable to a primary source we'll share on request.
"Average power draw" suggests sustainability, but hides the real power consumption.
Quoting average power is an industry sustainability trick, because the figure is unknowable at design time.
Power draw depends on workload mix, ambient temperature, CPU & RAM utilisation, and storage I/O patterns that vary continuously.
For this reason, average power can only be measured in situ, during live operations.
Quoting an 'average' lets vendors present a number that will rarely be achieved in production. The number you actually need to size your facility, negotiate your power contract, and forecast your energy bills is the component design maximum power.
We use maximum component-verified draw throughout.
It's the only figure that's independently verifiable, defensible in procurement, and truly honest about what your facility needs to support.
Sources: Component TDP specifications from Qualcomm, AMD, and SoftIron datasheets. Conventional figures from Dell PowerEdge XE9680 and R760 published specifications. Power cost calculated at £0.25/kWh commercial rate, PUE 1.2 (Autonomy) / 1.35 (conventional). Full methodology available on request.
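As a rough illustration of how the cost basis above turns into a yearly figure, the sketch below multiplies a design-maximum IT load by PUE, hours per year, and the £0.25/kWh rate. The 10 kW load and the function name are placeholders chosen for the example, not figures from our comparison.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_power_cost_gbp(max_it_draw_kw: float, pue: float,
                          rate_gbp_per_kwh: float = 0.25) -> float:
    """Yearly energy cost if the IT load ran at design maximum all year."""
    facility_kw = max_it_draw_kw * pue  # PUE scales IT load up to total facility load
    return facility_kw * HOURS_PER_YEAR * rate_gbp_per_kwh


# Hypothetical 10 kW IT load, costed under each PUE quoted above
autonomy = annual_power_cost_gbp(10.0, pue=1.2)
conventional = annual_power_cost_gbp(10.0, pue=1.35)
print(f"PUE 1.2:  £{autonomy:,.0f} per year")
print(f"PUE 1.35: £{conventional:,.0f} per year")
```

For the same IT load, the PUE difference alone accounts for the gap between the two printed figures. Differences in hardware draw come on top of that.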
Benchmarked against NVIDIA. Better on efficiency.
Competitive on throughput.
The Qualcomm AI100 Ultra cards we use by default in Autonomy Cloud are programmable AI accelerators — purpose-built for inference workloads.
In 2025, researchers at UC San Diego's National Research Platform benchmarked them directly against NVIDIA A100, H200, and AMD MI300A hardware across 15 open-source LLMs, measuring throughput (tokens per second) and energy efficiency (tokens per second per watt). Results were published at ACM PEARC '25 and are available at arXiv:2507.00418.
We show the data as it is.
The AI100 Ultra wins on energy efficiency across the majority of models, and wins outright on raw throughput for most models up to ~32B parameters.
Only at 70B+ parameters does the H200 become competitive or pull ahead on throughput, though the AI100 Ultra still matches or beats it on efficiency for some models at that scale.
This pattern is consistent with what you'd expect from a purpose-built inference accelerator versus a general-purpose GPU. Better on most, similar on the rest.
Most importantly, the AI100 Ultra is optimised by design for the energy-constrained, complex deployment environments where Autonomy Cloud operates.
Autonomy Cloud with Qualcomm is specifically built for a life on the AI Edge.
Energy efficiency — tokens per second per watt vs NVIDIA H200
tok/s/W calculated from Table 1, Sada et al. (2025). H200 figures use single-GPU configuration. Qualcomm AI100 Ultra figures use minimum required SoC configuration per model.
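The efficiency metric is throughput divided by power draw. A minimal sketch of that calculation follows. The function name and the input values are illustrative placeholders, not numbers from Sada et al.

```python
def tok_per_s_per_watt(throughput_tok_s: float, power_draw_w: float) -> float:
    """Energy efficiency: tokens per second delivered per watt drawn."""
    if power_draw_w <= 0:
        raise ValueError("power_draw_w must be positive")
    return throughput_tok_s / power_draw_w


# Placeholder inputs for illustration only (not taken from the paper)
example = tok_per_s_per_watt(throughput_tok_s=3000.0, power_draw_w=600.0)
print(f"{example:.1f} tok/s/W")  # prints "5.0 tok/s/W"
```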
Raw throughput — up to ~32B params
AI100 Ultra vs single H200 GPU, tokens per second
70B+ models — throughput & efficiency
A more mixed position...
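Llama3.3-70B: throughput H200 5,366 vs AI100 4,528 tok/s; efficiency H200 wins (16.9 vs 13.1 tok/s/W)
DeepSeek-70B: throughput AI100 4,528 vs H200 4,333 tok/s; efficiency H200 wins (11.4 vs 10.3 tok/s/W)
Llama3.3-90B Vision: throughput AI100 5,961 vs H200 3,556 tok/s; efficiency AI100 wins (13.6 vs 7.5 tok/s/W)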
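To read the 70B+ figures above as ratios, the sketch below divides each AI100 Ultra value by the matching H200 value. It assumes that in each efficiency pair the winner's value is listed first, which is how the figures read but is not stated explicitly. The dictionary layout and function name are ours, for illustration.

```python
# Figures copied from the 70B+ comparison above.
# Each tuple is (throughput in tok/s, efficiency in tok/s/W).
# Assumption: in each efficiency pair the winner's value is listed first.
RESULTS = {
    "Llama3.3-70B": {"ai100": (4528, 13.1), "h200": (5366, 16.9)},
    "DeepSeek-70B": {"ai100": (4528, 10.3), "h200": (4333, 11.4)},
    "Llama3.3-90B Vision": {"ai100": (5961, 13.6), "h200": (3556, 7.5)},
}


def ai100_vs_h200(model: str) -> tuple[float, float]:
    """Return (throughput ratio, efficiency ratio) of AI100 Ultra over H200."""
    ai_tput, ai_eff = RESULTS[model]["ai100"]
    h_tput, h_eff = RESULTS[model]["h200"]
    return ai_tput / h_tput, ai_eff / h_eff


for model in RESULTS:
    tput_ratio, eff_ratio = ai100_vs_h200(model)
    print(f"{model}: throughput {tput_ratio:.2f}x, efficiency {eff_ratio:.2f}x")
```

A ratio above 1.00 means the AI100 Ultra is ahead on that measure. A ratio below 1.00 means the H200 is ahead.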
What the data shows
Where the AI100 Ultra consistently leads
Energy efficiency (tok/s/W) for models up to ~32B parameters — a factor of 1.2× to 2.6× versus H200.
Raw throughput for the same model range: typically 1.6× to 2.1× more tokens per second per card.
NB: This is the sweet spot for enterprise LLM inference: coding assistants, RAG pipelines, document analysis, and real-time classification workloads.
For Edge AI:
Optimised models, speed, and power efficiency are everything, and the AI100 Ultra on Autonomy Cloud has no Edge AI parallel.
Where H200 is still competitive
Raw throughput for 70B-parameter models. The H200's large memory bandwidth and NVLink scaling give it some advantages for latency-sensitive, high-volume processing.
NB: If your primary business workload is serving a single 70B model at maximum throughput, the H200 may be faster — but the efficiency delta at this scale narrows quickly.
At the cloud core:
The H200 can make sense, but for Edge AI it is both undeployable and often inferior.
VM density — like-for-like workload comparison
Autonomy delivers the same VM workload at 6.1× the VM density per kW of the equivalent Dell R760/VMware estate.
Same workload output. Higher density. Less than one sixth of the power draw per virtual machine.
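The step from density to power per VM is a reciprocal: if one platform hosts 6.1 times as many VMs per kW, each VM draws 1/6.1 of the power. A short sketch of that arithmetic, with a function name chosen for illustration:

```python
def power_share_per_vm(density_ratio: float) -> float:
    """Power per VM relative to the baseline, given a VMs-per-kW ratio."""
    if density_ratio <= 0:
        raise ValueError("density_ratio must be positive")
    return 1.0 / density_ratio


share = power_share_per_vm(6.1)
print(f"{share:.1%} of baseline power per VM")  # prints "16.4% of baseline power per VM"
print(share < 1 / 6)  # True: just under one sixth
```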
Source: Sada et al., "Serving LLMs in HPC Clusters: A Comparative Study of Qualcomm Cloud AI 100 Ultra and High-Performance GPUs," ACM PEARC '25, Columbus OH, July 2025. arXiv:2507.00418v1. tok/s/W figures calculated from Tables 1 and 2. VM density figures calculated from component specifications — full methodology available on request.
The full picture.
Every cost included.
The comparison below includes hardware, software licensing, power, facility, and staffing — every cost category that a real deployment incurs. The conventional estate figure uses published VMware VCSP v5.3 list pricing. Nothing is excluded to make either number look better than it is.
The conventional estate's software licensing bill alone exceeds the entire Autonomy platform cost. The hardware CapEx gap closes before year three. From year three onwards, Autonomy is cheaper in every cost category simultaneously — power, facility, licensing, and staffing.
Based on component specifications and VMware VCSP v5.3 published list pricing, March 2026. Staffing based on minimum viable FTE for each platform type.
Full working methodology available on request.
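For readers who want to sanity-check a crossover claim like the one above, here is a generic break-even sketch. It assumes one platform has the higher upfront cost and the lower annual running cost. Every figure in it is a hypothetical placeholder, and it is not the model behind our comparison.

```python
from typing import Optional


def cumulative_cost(capex: float, annual_opex: float, years: float) -> float:
    """Total spend after `years`: upfront cost plus running costs to date."""
    return capex + annual_opex * years


def break_even_years(capex_a: float, opex_a: float,
                     capex_b: float, opex_b: float) -> Optional[float]:
    """Years until platform A's cumulative cost matches platform B's.

    Returns None if A never catches up (no annual saving).
    """
    annual_saving = opex_b - opex_a
    if annual_saving <= 0:
        return None
    return max(0.0, (capex_a - capex_b) / annual_saving)


# Hypothetical placeholder figures in GBP, for illustration only
years = break_even_years(capex_a=500_000, opex_a=100_000,
                         capex_b=300_000, opex_b=200_000)
print(f"Break-even after {years:.1f} years")  # prints "Break-even after 2.0 years"
```

Swapping in the real CapEx and annual cost lines from the full comparison document would show where the actual crossover falls.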
We show our workings.
Ask us to share.
Every figure on this page has a traceable source.
Contact us for the full comparison document: component specs, pricing basis, assumptions, and our calculations.