
AI Assurance Is Not AI Safety

Safety asks whether the model is aligned. The assurance industry asks whether you can prove your program followed the rules. Neither answers the question you have the moment an agent acts.

Two terms get used as if they were the same thing, and they are not. The gap between them is worth clearing up, because that gap is where the next layer of AI infrastructure has to be built, and most of it doesn't exist yet.

AI safety, in 2026, mostly means alignment: red-teaming, model-level evaluations, and the study of whether a model will deceive, scheme, or pursue goals its operators didn't intend. This is slow, important work, and it happens largely at the frontier labs that train the models. Safety in this sense is a property of the model, and it asks one question: is this system, in general, disposed to cause harm?

AI assurance is a different discipline, and contrary to how the startup world sometimes talks, it is not missing. It is large and growing. The UK government's DSIT counts 524 firms supplying AI assurance services, 84 of them specialised. There are standards bodies (ISO, IEC, IEEE, ETSI), conformity-assessment regimes, third-party algorithmic audits, and model-risk validation practices. The EU AI Act runs on it. Assurance, as this industry practices it, is broader than safety: it verifies fairness, robustness, privacy, interpretability, and compliance, not only harm. DSIT and ISACA both define it as the work of measuring, evaluating, and evidencing that an AI system meets a standard.

So assurance is not safety, and the assurance world already knows this. The interesting question is not whether the two differ. It is what kind of assurance the AI industry has actually built — and what kind it hasn't.

The word has an older meaning than the audit market uses

In aerospace, software assurance is a named engineering discipline, distinct from software safety, with its own standards. NASA defines it as the planned, systematic activities that ensure software and its lifecycle conform to requirements — quality, reliability, verification and validation — woven through the work as it happens. Safety is one strand inside it. Assurance is the umbrella. And in this model assurance is engineering: it runs alongside the build, continuously, as part of how the system is made and operated.

That is not what the AI assurance market sells. What it sells is audit.

Audit is the wrong shape for a runtime question

Today's AI assurance is third-party, periodic, and document-shaped. An external assessor examines your system against a framework, on a cadence — quarterly, annually, at procurement — and issues evidence that you conformed. Even the newer continuous-auditing standards, like ETSI's TS 104 008, are built for external auditors to pull evidence through a secure interface. The shape is consistent: someone outside your team, at intervals, proving to someone else that your AI program meets a bar.

This is real and necessary. It is also structurally incapable of answering the question most builders actually have.

The question audit cannot answer is this: did this agent do what it claimed, this afternoon, for this customer? Was an operational gate bypassed on this run? Were the inputs adversarial? Has the audit chain been altered since? A quarterly third-party assessment does not see this. A questionnaire does not see this — periodic assessment was never designed to capture runtime behavior, and the audit industry says as much itself. The model can pass every alignment eval, your organisation can pass every conformity audit, and a specific agent can still quietly do the wrong thing on a Tuesday, with nothing in the current assurance stack watching when it happens.

The layer that isn't built

The missing piece is assurance in the older sense, applied to agents: assurance as engineering rather than assurance as audit. Runtime, not periodic. Built by the team shipping the system, not pulled by an assessor outside it. Continuous and local, woven into the workflow — a gate that fires when the agent acts, a check that verifies the agent's report matches what actually happened, a tamper-evident record written as the work runs rather than reconstructed at audit time.
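One way to picture a tamper-evident record written as the work runs is a hash chain, where each entry commits to the one before it. The sketch below is a minimal illustration under stated assumptions, not a production design: the names `append_event`, `verify_chain`, and `GENESIS`, and the event fields, are all hypothetical, and a real system would also need durable storage and signing.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the chain


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always produces the same hash.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> dict:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every link; an edited, reordered, or removed middle entry fails."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


# Record two agent actions as they happen (illustrative events only).
log = []
append_event(log, {"action": "update_billing_record", "record": "rec_1"})
append_event(log, {"action": "send_email", "to": "customer_1"})
print(verify_chain(log))  # True

# Editing an earlier entry after the fact breaks the chain.
log[0]["event"]["record"] = "rec_2"
print(verify_chain(log))  # False
```

A chain like this detects edits to recorded entries but not truncation of the tail, so the latest hash would have to be anchored somewhere the agent cannot rewrite.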

Concretely: an agent is asked to refund one customer, and it reports back that it issued a single $40 refund. The engineering-assurance question is whether the actions it actually took match the report — one refund, for that amount, to that account — checked against what the agent did rather than what it said, and recorded the moment it happens. If it issued two refunds, credited the wrong account, or reported a refund it never made, that is caught on the run, not at the next quarterly review. No alignment eval and no conformity questionnaire would have been watching that afternoon.
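The refund check can be sketched as a reconciliation between what the agent reported and what the system of record says was executed. This is a minimal sketch under assumptions: the `Refund` type, the `reconcile` function, and the account identifiers are hypothetical, and the executed list is assumed to come from a source the agent cannot write to, such as the payment system's own ledger.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Refund:
    account: str
    amount_cents: int


def reconcile(reported: list, executed: list) -> list:
    """Return discrepancies between claimed refunds and executed refunds."""
    claimed, actual = Counter(reported), Counter(executed)
    problems = []
    # Counter subtraction keeps only positive counts, so each loop
    # sees just the surplus on one side.
    for refund, n in (actual - claimed).items():
        problems.append(f"executed but not reported: {refund} x{n}")
    for refund, n in (claimed - actual).items():
        problems.append(f"reported but not executed: {refund} x{n}")
    return problems


# The agent reports a single $40 refund to one account.
report = [Refund("acct_123", 4000)]

# Report matches what happened: no discrepancies.
print(reconcile(report, [Refund("acct_123", 4000)]))  # []

# Two refunds were issued: the duplicate is flagged.
print(reconcile(report, [Refund("acct_123", 4000), Refund("acct_123", 4000)]))

# The wrong account was credited: flagged on both sides.
print(reconcile(report, [Refund("acct_999", 4000)]))

# The refund was reported but never made.
print(reconcile(report, []))
```

Comparing multisets rather than sets is the design choice that matters here: a duplicate refund is the same action twice, and a set comparison would miss it.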

Very little is built for this. The tools that run at runtime today are shaped for something else: observability traces what happened so you can reason backwards, and runtime-security tools intercept and block an action before it executes. Neither one proves that an action that was allowed to proceed was the correct one. That proof — did the agent do what it said, and can you show it from evidence captured in the moment — is its own discipline, and it barely has vendors. It is the broader category that ambient assurance sits inside.

Why this opens now

It opens because agents now take actions. When AI produced text you read before acting, "is the model aligned" and "did our program pass audit" covered most of the risk. When an agent modifies a billing record, files a pull request, or sends a customer an email on its own, a new question opens up between the model and the audit, and it lives at runtime. The EU AI Act places the responsibility exactly here: for many high-risk systems, internal self-assessment — not third-party review — is the default conformity path. The builder is on the hook. But the builder has been handed audit-shaped tools for a runtime problem.

Three things, two words

There are three things routinely collapsed into two words. AI safety: is the model disposed to cause harm? AI assurance as audit: can you prove, periodically and from the outside, that your program met a standard? And AI assurance as engineering: did this agent do the right thing, on this run, and can you show it from evidence captured while it happened? The first has the frontier labs. The second has an industry of more than 500 firms. The third (the runtime, builder-side engineering layer) is the one worth building, precisely because it does not look like the audit business the assurance market grew up as.

Assurance is not a certificate someone hands you once a year. In every serious engineering field, it is something the system does, continuously, while it works. AI is the field that hasn't built that part yet.

Roli Bosch is the founder of Hermes Labs, where we build the auditability and epistemic-engineering layer for production AI systems. Engineering assurance — the runtime, builder-side discipline this post argues for — is what we're building, as open infrastructure rather than periodic audit. The drift it catches is documented in our preprint, A Taxonomy of Epistemic Failure Modes in Large Language Models (DOI: 10.5281/zenodo.19042469).