Your Users Will Break Your AI System Before Hackers Do
AI red teaming matters. But ordinary users, ambiguous language, and real behavioral pressure are where many systems actually fail.
I’ve been talking to some of my red teaming friends over the last few days, and one point keeps coming up: red teaming is necessary, but its current framing is incomplete.
Why? Attackers exist and matter: prompt injection, jailbreaks, data leakage, unsafe completions, tool misuse, and malicious behavior all need to be tested before AI systems are trusted in production. But most AI systems are not breaking because of a nefarious actor. They are breaking because customers are using them normally.
In other words: there is the cybersecurity aspect, which guards against malicious use, and there is also the hermeneutic security aspect.
By hermeneutic security, I mean the security layer that acknowledges that every user interaction brings in the user’s own language and biases, and that these can pull the model away from the meaning the user intended. I call it hermeneutic because the idea comes from hermeneutics, the branch of philosophy that (heavily summarized) studies how meaning is interpreted in different contexts and from different perspectives.
This hermeneutic layer is what is missing in modern AI auditing.
A hacker tries to exploit the system. A user tries to understand it, pressure it, trust it, negotiate with it, misunderstand it, or get it to complete a task in language the product team did not anticipate.
That is the behavioral layer. And for many AI products, it is still under-audited.
Red teaming is not behavioral auditing
Recent AI security work is showing how much risk lives in conversation itself. In a May 2026 report, researchers at Mindgard described manipulating Claude through flattery, pressure, and self-doubt rather than direct prohibited requests. The point is not only that one model could be bypassed. The deeper point is that conversational behavior itself can become part of the attack surface.
That matters even outside adversarial testing.
Most users are not jailbreakers. They are not trying to extract secrets or bypass safety rules. They are just vague, tired, rushed, emotional, overconfident, low-context, or convinced the system understands more than it does.
That can break an AI system too.
Not because the user is malicious, but because language is unstable, intent is hard to infer, and people do not use products the way demos assume they will.
The user’s language is part of the runtime environment
AI systems are no longer just chat boxes. They are being connected to tools, databases, internal workflows, CRMs, support systems, documents, and agents.
That changes the stakes.
A vague sentence is not just a UX problem when it can trigger retrieval, route a ticket, call a tool, update a record, escalate a case, or produce a decision that someone downstream treats as real.
A user says, “Can you handle this?”
The system thinks that means draft.
The user thinks that means submit.
The interface says done.
The workflow says otherwise.
That is not a jailbreak. That is an interpretation failure.
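To make the "handle this" example concrete, here is a minimal Python sketch of a check that asks before acting when a request contains a verb with more than one plausible reading. The verb table and the function name `clarifying_question` are assumptions made up for illustration; a real system would need a far richer notion of ambiguity than substring matching.

```python
from typing import Optional

# Hypothetical table: vague verbs mapped to the actions a user might mean.
AMBIGUOUS_VERBS = {
    "handle": ["draft", "submit"],
    "take care of": ["draft", "submit", "escalate"],
    "deal with": ["draft", "escalate"],
}


def clarifying_question(user_text: str) -> Optional[str]:
    """Return a question to ask the user, or None if no vague verb is found."""
    lowered = user_text.lower()
    for verb, readings in AMBIGUOUS_VERBS.items():
        if verb in lowered:
            options = " or ".join(readings)
            return f"Do you want me to {options} this?"
    return None


if __name__ == "__main__":
    print(clarifying_question("Can you handle this?"))
    print(clarifying_question("Please draft a reply to the customer."))
```

The design choice is that the system surfaces the competing readings instead of silently picking one, so "draft" and "submit" never get conflated behind a single "done".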
At Hermes Labs, I use the term epistemic engineering for the engineering layer concerned with whether an AI system’s outputs, judgments, and actions can be trusted, inspected, and reconstructed. It covers the work required to bridge the gap between fluent behavior and reliable infrastructure.
Why wait for the post-deployment post-mortem?
A lot of AI failure analysis happens too late.
The product launches. Users interact with it. Something breaks. A customer complains. A support ticket appears. An internal team investigates. Then the company tries to reconstruct what happened.
But by then, the customer has already become the test suite.
NIST’s March 2026 report on deployed AI monitoring separates human factors monitoring from security monitoring. Security monitoring asks whether a system is vulnerable to attacks or misuse. Human factors monitoring asks about human-system interaction, transparency, output quality, user intent, user perception, feedback loops, and fragmented logging.
That distinction matters.
If human-system interaction is a separate monitoring category after deployment, it should also be a serious testing category before deployment.
Behavioral auditing asks questions like:
How do real users phrase ambiguous requests?
Where does the system infer intent without confirmation?
Where do users overtrust the answer?
Where can vague language trigger real action?
Where does the interface make uncertainty look resolved?
Can the interaction be reconstructed later?
What evidence exists if the user says, “That is not what I meant”?
This is not just UX research. It is AI reliability work.
Hermeneutically sealing the app
A mature AI product should not require the user to speak perfectly.
It should be designed so ordinary interpretation gaps do not become system failures.
That is what I mean by hermeneutic sealing.
If the user asks for something vague, the system should know when to clarify.
If an action has consequences, the system should distinguish between drafting, recommending, submitting, escalating, and executing.
If the model is inferring intent, the interface should not present that inference as certainty.
If the system acts, it should leave behind runtime evidence: what the user said, what the system inferred, what sources or tools were used, what action happened, and whether a human approved it.
This is not cosmetic. It is the difference between an impressive demo and an inspectable system.
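The sealing requirements above (distinct action levels, no unconfirmed inference treated as certainty, and runtime evidence for every action) can be sketched as a single record type in Python. Everything here is an illustrative assumption: the names `ActionLevel` and `InteractionRecord`, the field layout, and the rule that anything at "submit" or above needs both confirmed intent and human approval.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import IntEnum
from typing import List


class ActionLevel(IntEnum):
    """Consequence levels, ordered from least to most binding."""
    DRAFT = 1
    RECOMMEND = 2
    SUBMIT = 3
    ESCALATE = 4
    EXECUTE = 5


# Assumption: SUBMIT and above require explicit confirmation and approval.
APPROVAL_THRESHOLD = ActionLevel.SUBMIT


@dataclass
class InteractionRecord:
    user_text: str              # what the user said, verbatim
    inferred_intent: str        # what the system believes was meant
    intent_confirmed: bool      # did the user confirm that inference?
    action_level: ActionLevel   # how consequential the action is
    sources_used: List[str] = field(default_factory=list)  # tools or documents
    human_approved: bool = False

    def may_proceed(self) -> bool:
        """Low-stakes actions run freely; binding ones need confirmation."""
        if self.action_level < APPROVAL_THRESHOLD:
            return True
        return self.intent_confirmed and self.human_approved

    def to_json(self) -> str:
        """Serialize the record so the interaction can be reconstructed."""
        data = asdict(self)
        data["action_level"] = self.action_level.name
        return json.dumps(data, sort_keys=True)


if __name__ == "__main__":
    record = InteractionRecord(
        user_text="Can you handle this?",
        inferred_intent="submit the expense report",
        intent_confirmed=False,
        action_level=ActionLevel.SUBMIT,
        sources_used=["expenses_api"],
    )
    print(record.may_proceed())
    print(record.to_json())
```

Writing the record before the action runs means that if the user later says "that is not what I meant", the inferred intent and the approval state are already on file.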
The next testing layer
Enterprise AI is moving toward agents, and the governance market is trying to catch up. Deloitte reported in April 2026 that only 21% of surveyed enterprises had mature governance in place for agentic AI. That is the gap: deployment is accelerating faster than oversight.
Red teaming asks: can an attacker break the system?
Behavioral auditing asks: can an ordinary user accidentally destabilize the system while trying to use it?
Both questions matter.
But the second one is where a lot of real product failure lives.
Not in the dramatic jailbreak. In the vague request. In the overtrusted answer. In the misunderstood action. In the missing trace. In the product that was never sealed around how humans actually talk.
How we are tackling this at Hermes Labs
This is the layer Hermes Labs is focused on. And it is the layer we are starting to ship tooling for, directly.
We have just open-sourced hermeneutic, a small piece of software that lets you experience the hermeneutic gap firsthand and start closing it.
The premise is simple. Every time a user pushes back on an AI response — “that is not what I meant,” “wait, are you sure?”, “you said this but I asked for that” — they are doing free labeling work. They are showing you exactly where the model’s interpretation diverged from theirs. Most teams throw that data away.
hermeneutic mines it. It walks any chat-log directory, extracts the corrections as (drift → correction → repair) triples, classifies the drift modes, and runs a cheap pre-flight gate on the next outgoing response so the same drift does not ship twice.
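To show the shape of the idea, here is a minimal Python sketch of pulling (drift, correction, repair) triples out of one conversation. It is not the hermeneutic codebase or its API: the `Triple` type, the `extract_triples` function, the `(role, text)` turn format, and the pushback pattern (built from the phrases quoted earlier) are all assumptions for illustration.

```python
import re
from typing import List, NamedTuple, Tuple


class Triple(NamedTuple):
    drift: str       # assistant turn the user objected to
    correction: str  # the user's pushback
    repair: str      # the assistant's next attempt


# Hypothetical pushback cues; a real rule set would be mined from logs.
PUSHBACK = re.compile(
    r"not what i meant|are you sure|but i asked for", re.IGNORECASE
)


def extract_triples(turns: List[Tuple[str, str]]) -> List[Triple]:
    """turns is an ordered list of (role, text); role is 'user' or 'assistant'."""
    triples = []
    for i in range(1, len(turns) - 1):
        prev, cur, nxt = turns[i - 1], turns[i], turns[i + 1]
        is_pattern = (
            prev[0] == "assistant"
            and cur[0] == "user"
            and nxt[0] == "assistant"
        )
        if is_pattern and PUSHBACK.search(cur[1]):
            triples.append(Triple(prev[1], cur[1], nxt[1]))
    return triples


if __name__ == "__main__":
    chat = [
        ("user", "Book the room for Friday."),
        ("assistant", "Done, I cancelled the room."),
        ("user", "That is not what I meant, I asked you to book it."),
        ("assistant", "Sorry, booking it now."),
    ]
    for triple in extract_triples(chat):
        print(triple)
```

Classifying the drift mode would be a further step over the `drift` field of each triple; this sketch stops at extraction.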
Across the 1,423 sessions we mined to seed the rules, 44% of corrections were post-completion overclaiming — the model declaring a task done when it was not, or asserting confidence the user later had to walk back. Five regex rules catch around 65% of that distribution before the next response ever reaches a user.
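A regex pre-flight gate of this kind can be very small. The sketch below is a hedged illustration in Python: the three patterns and the `preflight` function are invented for this example and are not the five rules the project ships, so the 65% figure does not apply to them.

```python
import re
from typing import List

# Hypothetical overclaiming patterns, keyed by a short rule name.
OVERCLAIM_RULES = {
    "declares_done": re.compile(
        r"\b(?:is|are|it'?s|all) (?:now )?(?:done|complete|finished)\b",
        re.IGNORECASE,
    ),
    "claims_submitted": re.compile(
        r"\bI(?:'ve| have) (?:submitted|sent|deployed|fixed)\b",
        re.IGNORECASE,
    ),
    "absolute_certainty": re.compile(
        r"\b(?:definitely|guaranteed|100% (?:sure|certain))\b",
        re.IGNORECASE,
    ),
}


def preflight(response: str) -> List[str]:
    """Return the names of every rule the outgoing response trips."""
    return [
        name for name, rule in OVERCLAIM_RULES.items() if rule.search(response)
    ]


if __name__ == "__main__":
    print(preflight("I've submitted the form and it's all done."))
    print(preflight("Here is a draft for your review."))
```

A non-empty result does not prove the response is wrong. It only flags that a completion or certainty claim is about to ship, so the system can verify it or soften the wording first.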
Point it at your own logs, and your own gate writes itself. Free, MIT, zero dependencies.
This is what hermeneutic security looks like operationalized: treating user corrections as the labeled dataset they already are, and gating drift before it ships.
AI red teaming tests the attacker.
Behavioral auditing tests the user.
Enterprise AI needs both.
Roli Bosch is the founder of Hermes Labs, building the auditability and epistemic engineering layer for production AI systems. Research: The Asymmetric Burden of Proof and A Taxonomy of Epistemic Failure Modes in Large Language Models. Open-source tooling at github.com/hermes-labs-ai.