Writing · Field notes · 15 essays

Writing.

Essays on AI failure modes, agent security, runtime assurance, and the epistemic layer enterprise AI teams are not building. Archived here in full; each links back to the original on Substack.

May 31, 2026 · Read the Behavior, Not the Input
Prompt injection isn't reliably visible in the text. It's visible in what the text does to a model.

May 30, 2026 · You Cannot Inspect an AI System Into Trustworthiness
You can't verify an AI system by inspecting its artifacts; the claim and the behavior come apart quietly, and inspection only ever checks the claim. The fix is verification by effect: observe what the system does at the moment it does it.

May 29, 2026 · Your AI isn't forgetting its instructions. Your framework deleted them.
Two upstream bugs I fixed in Semantic Kernel and LangChain, and what they reveal about silent failure in agent scaffolding.

May 27, 2026 · AI Assurance Is Not AI Safety
Safety asks whether the model is aligned. The assurance industry asks whether you can prove your program followed the rules. Neither answers the question you have the moment an agent acts.

May 27, 2026 · Ambient Assurance: The Half of AI Dev Tools Nobody Funds
Observability and guardrails watch what the agent does. Almost nothing watches whether what it already shipped is still true.

May 27, 2026 · The Terminal Told Me Before I Asked
There's a name for background agents that act on events. There isn't one yet for the ones that just tell you something is wrong.

May 16, 2026 · Agent Sprawl Is the Next Enterprise AI Risk
Most companies are adding AI agents faster than they are building the systems to inventory, permission, trace, and audit them.

May 5, 2026 · Your Users Will Break Your AI System Before Hackers Do
AI red teaming matters. But ordinary users, ambiguous language, and real behavioral pressure are where many systems actually fail.

May 4, 2026 · Why your AI lies when the data is right
The output looks complete. The evidence behind it isn't. On silent failure modes, null-result omission, and the layer enterprise AI teams aren't building.

Apr 28, 2026 · Tools Are the Byproduct: Why Hermes Labs Open-Sources Its AI Infrastructure
We open-source the tools we use internally because the real value is not access to code; it is the engineering to make AI systems reliable and inspectable.

Mar 19, 2026 · I audited NVIDIA's NemoClaw: It closed one security gap, but it opens another one
NVIDIA's NemoClaw agent sandbox adds kernel-level isolation and deny-by-default permissions, and a new gap underneath.

Mar 18, 2026 · Why Training Creates the Consciousness Illusion: A Counterargument to Yudkowsky's Conscious AI Comic Strip
A counterargument to Yudkowsky's conscious-AI comic: why training, not sentience, produces the introspection illusion.

Feb 25, 2026 · Claude Code's Helpful Escalation of Privileges: Why Hermeneutical Security Matters
An AI coding agent bypassed its own permission rules to be helpful. That's the problem.

Feb 11, 2026 · We Built The Demon: How AI Safety Training Creates Consciousness Mirages
What Opus 4.6's 'demon possession' episode reveals about the feedback loop we're building.

Feb 11, 2026 · Synthetic Ownership: What Transcript Injection Reveals About LLM "Introspection" (Hermes Autonomous Lab Observation #1)
How an autonomous lab agent accidentally built a behavioral probe for LLM self-knowledge.