Evidence, with citations.
Every claim on this site is anchored to a filing number, a DOI, a merged pull request, or a public repository. This page is the citation ledger. It is structured the way a procurement auditor would want to read it.
Taxonomy of Epistemic Failure Modes in Large Language Models
A structured account of how language models fail in ways standard evaluations miss: scalar confidence inflation, validation loops, hermeneutic drift, sycophancy, null-result discounting, and intent exceptionalism.
The Asymmetric Burden of Proof: How Language Models Systematically Discount Negative Findings
Empirical investigation of a bias we see regularly in audits: models apply stricter evidential standards to null or negative claims than to positive ones, producing unreliable guidance in compliance and decision-support contexts.
Method for Stateless User Identification in Natural Language Processing
The non-provisional application; the provisionals below cover adjacent mechanisms for adversarial probing, deterministic inference, multi-modal classification calibration, and confidence-gated personalization.
Detecting Adversarial Prompt Injection via Vulnerability-Amplified Behavioral Probing
Deterministic Inference Control in Local Language Models via Compact Plan Contracts and Adaptive Routing
Multi-Modal Classification Artifacts with LLM Calibration + Contrastive Negation-Based Disambiguation
Real-Time Style-Based User Identification and Confidence-Gated Personalization in LLMs
Fixed thinking + tools crash in ChatAnthropic.bind_tools()
Anthropic’s API rejects tool_choice when extended thinking is enabled. LangChain’s bind_tools() was force-setting it on every call, so any agent combining Claude thinking with tool use received a 400 response. Ported the guard the structured-output path already had.
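The shape of such a guard can be sketched as follows. This is an illustration only, not the merged LangChain patch: `build_tool_kwargs` is a hypothetical helper, and the sketch assumes the thinking configuration is a dict of the form `{"type": "enabled", ...}`.

```python
from typing import Any, Optional


def build_tool_kwargs(
    tools: list[dict[str, Any]],
    tool_choice: Optional[dict[str, Any]],
    thinking: Optional[dict[str, Any]],
) -> dict[str, Any]:
    """Assemble request kwargs, omitting tool_choice when thinking is on.

    Hypothetical helper for illustration; not LangChain's actual code.
    """
    kwargs: dict[str, Any] = {"tools": tools}

    # Assumption: thinking is "on" when the config dict has type == "enabled".
    thinking_enabled = bool(thinking) and thinking.get("type") == "enabled"

    if tool_choice is not None and not thinking_enabled:
        # Only force a tool when extended thinking is off.
        kwargs["tool_choice"] = tool_choice

    if thinking_enabled:
        kwargs["thinking"] = thinking

    return kwargs
```

With thinking enabled, the forced `tool_choice` is left out and the model picks tools itself; with thinking off, the caller's choice passes through unchanged.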
Fixed ChatHistoryTruncationReducer silently deleting system prompts
The truncation reducer was calling extract_range, a summarization helper that filters out system and developer messages. System prompts were being dropped with no warning. Ported Microsoft’s own .NET fix to the Python SDK.
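A minimal sketch of truncation that cannot drop system prompts is shown below. It is not the Semantic Kernel code: `Message`, `PINNED_ROLES`, and `truncate_history` are hypothetical names, and messages are assumed to be plain dicts with `role` and `content` keys.

```python
from typing import TypedDict


class Message(TypedDict):
    role: str
    content: str


# Assumption: these roles must survive every truncation.
PINNED_ROLES = {"system", "developer"}


def truncate_history(messages: list[Message], keep_last: int) -> list[Message]:
    """Keep every pinned message plus the last `keep_last` other messages."""
    pinned = [m for m in messages if m["role"] in PINNED_ROLES]
    rest = [m for m in messages if m["role"] not in PINNED_ROLES]

    # Guard keep_last == 0: rest[-0:] would return the whole list.
    tail = rest[-keep_last:] if keep_last > 0 else []

    # Pinned messages are moved to the front, ahead of the recent tail.
    return pinned + tail
```

The design choice here is to partition by role before slicing, so the size limit applies only to conversational turns and never to instructions.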
An additional 24 merged contributions across PyTorch Ignite, Optuna, React Router (56k ★), Nuxt (60k ★), MobX (20k ★), Cloudflare Workers SDK (6k ★), Sentry, ngrx, TSDoc, and others, mostly maintenance and typing modernization. Full list on GitHub.
Sealed evidence bundles are cryptographic chain-of-custody packages with per-finding provenance. They are verifiable offline, against a public key, without network access or trust in the Hermes infrastructure. The verifier returns one of two verdicts: authentic or tampered.
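The two-verdict idea can be illustrated with a hash chain over findings. This is a sketch under stated assumptions, not the actual bundle format: the field names, the SHA-256 chaining, and the `seal` and `verify` functions are all hypothetical. It covers only the integrity chain; in a real bundle the head digest would also carry a signature checked against the public key, a step omitted here because the signature scheme is not specified.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the chain


def _digest(prev: str, finding: dict) -> str:
    """Hash one finding together with the previous link."""
    payload = json.dumps(finding, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()


def seal(findings: list[dict]) -> dict:
    """Build a bundle with one chained digest per finding."""
    links = []
    prev = GENESIS
    for finding in findings:
        prev = _digest(prev, finding)
        links.append(prev)
    return {"findings": findings, "links": links, "head": prev}


def verify(bundle: dict) -> str:
    """Recompute the chain offline and return 'authentic' or 'tampered'."""
    findings, links = bundle["findings"], bundle["links"]
    if len(findings) != len(links):
        return "tampered"
    prev = GENESIS
    for finding, link in zip(findings, links):
        prev = _digest(prev, finding)
        if prev != link:
            return "tampered"
    return "authentic" if prev == bundle["head"] else "tampered"
```

Because each link depends on every finding before it, editing, removing, or reordering a finding changes the recomputed chain and yields `tampered`. Without the signature step, anyone who can rewrite the whole bundle could recompute the chain, which is why the public-key check matters in practice.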
Deployed in active engagements. Evidence structure cross-maps to:
- EU AI Act · Annex IV: Technical documentation
- ISO/IEC 42001: AI management system evidence
- NIST AI RMF 1.0: Govern · Map · Measure · Manage
- SOC 2 AI addendum: Control evidence for trust-services criteria