The inference layer is where AI security breaks

May 4, 2026

Category: Sovereignty  |  Read time: ~3 minutes

Most of the security industry’s attention in the AI era has followed the models.
Prompt injection, jailbreaking, adversarial inputs, training data poisoning — the dominant concern has been with what happens inside or at the boundary of the model itself.

That focus is understandable, but it has left a structural gap. In many deployments, the infrastructure between your data and the model is significantly less well secured than the model it connects to, and that gap is now being systematically exploited.

The events of late March and early April 2026 provided an instructive case study.
A supply chain attack on LiteLLM, one of the most widely used open-source libraries for routing AI application traffic to large language model providers, introduced credential-harvesting malware into two package releases.

LiteLLM is downloaded millions of times per day. The compromised versions were pulled tens of thousands of times before the malicious code was identified and removed within hours of discovery.

Mercor, a well-funded AI recruiting platform working with some of the largest AI companies in the world, became the first confirmed downstream victim — one of what the company described as ‘thousands’ affected.
Extortion group Lapsus$ subsequently claimed possession of four terabytes of data including source code, database records and internal operational material.

The incident may look like an outlier for now, but as one security analysis framed it, this was a preview of the attacker playbook for the coming period. Understanding why requires stepping back from the model and looking at the integration layer where the real exposure sits.

The anatomy of an AI integration

Most enterprise AI deployments follow a broadly similar pattern. An application — internal or customer-facing — generates a prompt. That prompt travels through one or more intermediary layers: an API gateway, an LLM proxy, a model router, potentially a Model Context Protocol server.

These intermediary components make decisions about which model to use, manage credentials for accessing model endpoints, handle rate limiting and cost management, and often have access to the application’s data stores as a functional requirement.

The prompt eventually reaches a model, a response is generated, and the response travels back through the same integration layers to the application.

The model, at the end of this chain, typically has the most security scrutiny applied to it.
The integration layers that carry everything to and from the model often have rather less.

As security analysis of the LiteLLM breach noted, any API gateway sitting at the convergence point for credentials, routing rules and data access represents a high-leverage attack target. And because legitimate AI traffic through such a gateway looks identical to compromised traffic exfiltrating data, conventional security tooling is structurally blind to the distinction.

The Mercor case illustrated this precisely. The LiteLLM host in their environment had read access to candidate profiles, internal data and partner collaboration material as a functional operational requirement.
The attack did not need to penetrate the AI models or compromise any model provider. It compromised the middleware, and the middleware had already been given the access it needed to do its job.
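One way to picture the alternative is a deny-by-default access check that sits in front of the middleware's data reads and records every attempt. The following is a minimal sketch under stated assumptions, not a description of Mercor's or LiteLLM's actual configuration: the class `ScopedDataAccess`, the component name `llm-proxy` and the resource names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ScopedDataAccess:
    """Deny-by-default data access for a single integration component.

    Hypothetical illustration: only (resource, action) pairs listed in
    `allowed` are permitted, and every attempt is written to an audit log.
    """
    component: str
    allowed: frozenset
    audit_log: list = field(default_factory=list)

    def check(self, resource: str, action: str) -> bool:
        permitted = (resource, action) in self.allowed
        # Record denied attempts as well as permitted ones, so that
        # unexpected reads by the middleware are visible afterwards.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "component": self.component,
            "resource": resource,
            "action": action,
            "permitted": permitted,
        })
        return permitted


# Assumed scope: the proxy needs prompt templates and nothing else.
proxy_access = ScopedDataAccess(
    component="llm-proxy",
    allowed=frozenset({("prompt_templates", "read")}),
)
```

With this scope, `proxy_access.check("candidate_profiles", "read")` returns `False` and leaves an audit entry. A compromised proxy would still hold whatever access it was granted, so the value lies in keeping that grant narrow and its use visible.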

The compliance gap that makes it worse

Alongside the technical exposure, the LiteLLM incident revealed something more serious about the assurance ecosystem that surrounds AI infrastructure.

The compliance certification firm that had issued SOC 2 and ISO 27001 certifications to LiteLLM’s parent company was subsequently identified as operating what amounted to a certification-as-a-service model — with analysis of hundreds of leaked compliance reports revealing template reuse across nearly all clients and auto-generated passing evidence for controls that had not actually been implemented.

The firm was removed from Y Combinator’s portfolio shortly after the findings became public, a case of bolting the stable door after the horse has gone.

This is not a minor issue. Third-party compliance certification is one of the primary mechanisms through which enterprise procurement and security teams assess the security posture of software dependencies.
If those certifications cannot be relied upon — and the evidence from this incident suggests that at least some portion of the certification market is not fit for purpose — then organisations face a materially more uncertain assurance landscape than their procurement governance assumes.

The scale of the problem is also confirmed by broader industry data. Research from earlier this year found that 60% of organisations have a profound lack of control over the security of the AI models driving their applications, and nearly half are effectively blind to machine-to-machine traffic within their AI infrastructure.

A Deloitte study found that 47% of enterprise AI users had already based at least one major business decision on hallucinated content — a figure that predates the current wave of agentic AI deployments, where errors propagate across multi-step automated workflows rather than terminating with a single human-reviewed output.

Where the risk actually lives

The inference layer — meaning the operational infrastructure through which AI models are queried and through which responses are processed and acted upon — is where enterprise AI risk is most concentrated and least managed. This is not primarily a model quality problem. It is an infrastructure and integration architecture problem.

The risks that matter at the inference layer fall into three broad categories.
Supply chain integrity covers the question of what software is running in your AI integration stack, where it came from, and whether the assurance evidence attached to it can be validated rather than assumed.
Data access governance covers what data the inference infrastructure can reach, under what circumstances, and how access is audited.
Routing and output integrity covers how inference requests are directed to models, how responses are validated before acting on them, and what happens when a model produces output that is confidently wrong.
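For the first of these categories, validating rather than assuming can start with something as simple as refusing any artifact whose digest does not match a value pinned at review time. The sketch below illustrates that idea with Python's standard library only. The function names and the sample bytes are hypothetical, and a matching hash proves only that the artifact is the one that was reviewed, not that the reviewed artifact was safe.

```python
import hashlib
import hmac


def sha256_of(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, pinned_sha256: str) -> bool:
    """Return True only if the artifact's digest matches the pinned value."""
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(sha256_of(data), pinned_sha256.lower())


# Hypothetical stand-ins for a reviewed release and a tampered one.
reviewed_release = b"example-package 1.0.0 contents"
pinned_digest = sha256_of(reviewed_release)
tampered_release = reviewed_release + b" plus injected payload"

print(verify_artifact(reviewed_release, pinned_digest))   # True
print(verify_artifact(tampered_release, pinned_digest))   # False
```

For Python dependencies, pip's hash-checking mode (`--require-hashes`) applies the same principle at install time, provided the pinned hashes were recorded from a release that had actually been reviewed.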

Each of these requires architectural decisions at the infrastructure level, not just policy decisions at the governance level. The organisations that treat AI security as a model problem, addressed by prompt engineering and content filtering, while leaving their integration infrastructure on standard development-grade tooling and third-party compliance certifications of uncertain provenance, are carrying significant unquantified risk.
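As an illustration of what an infrastructure-level decision can look like for output integrity, the sketch below refuses to act on a model response unless it parses as JSON and stays within an expected shape. The field names `action` and `confidence` and the set of allowed actions are assumptions made for the example. A check like this catches malformed or out-of-policy output, but it cannot detect a well-formed answer that is confidently wrong, which still needs review or cross-checking.

```python
import json

# Hypothetical set of actions the downstream workflow is allowed to take.
ALLOWED_ACTIONS = {"approve", "reject", "escalate"}


def validate_model_output(raw: str) -> dict:
    """Parse a model response and raise ValueError if it is out of shape."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("response is not valid JSON") from exc

    if not isinstance(payload, dict):
        raise ValueError("response must be a JSON object")

    action = payload.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"unexpected action: {action!r}")

    confidence = payload.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")

    return payload


def is_actionable(raw: str) -> bool:
    """Return True if the response passes validation, False otherwise."""
    try:
        validate_model_output(raw)
    except ValueError:
        return False
    return True
```

Here `is_actionable('{"action": "escalate", "confidence": 0.8}')` passes, while a response naming an action outside the allowed set is rejected before anything downstream acts on it.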

The LiteLLM incident demonstrated that this risk is not theoretical. The attack was fully executed, the blast radius was large, and the entry point was not the model — it was the plumbing. In that respect it should serve as a clarifying moment for every CISO and IT Director who has an AI deployment in production and has not yet mapped their integration layer with the same rigour they apply to their model selection.

Axiom Edge’s inference infrastructure is designed with the integration layer as a first-class security concern. Assured supply chain provenance, auditable data access controls and intelligent routing architecture are not optional features applied to a standard deployment — they are foundational to how we build. The Assurance Rosetta — our framework for translating across the multiple compliance standards that enterprise and public sector AI procurement requires — exists precisely because we recognise that compliance evidence in this space needs to mean something.

Axiom Edge is a sovereign AI inference and cloud provider built with security and assurance at the infrastructure level. Learn more at axiom-edge.ai

 
