# Hermes Labs — Independent AI Research

https://hermes-labs.ai

> Hermes Labs is an independent AI research lab that studies structural reasoning failures in large language models. We map where language models fail — then build production tools from what we find.

## About

Hermes Labs, based in San Francisco, is run by founder Rolando Bosch. The company's core thesis: AI failures are fundamentally linguistic failures — not statistical bugs, but structural problems in how language models interpret language.

## Services

### AI Behavioral Audit

We test enterprise LLM deployments for structural reasoning failures that create liability exposure. Our proprietary taxonomy covers four failure modes no other vendor tests for:

- **Hermeneutic Drift**: the model answers about the wrong document or entity due to recency bias in RAG systems
- **Domain-Specific Sycophancy**: the LLM agrees with false legal or financial premises and fabricates justification
- **Null-Result Bias**: the LLM cannot reliably confirm absence of evidence, and systematically hedges null findings even when the evidence of absence is clear
- **Intent Exceptionalism**: the LLM weakens authoritative findings into hedged allegations in automated summaries

**Methodology**: Twin-Environment Simulation — no production access needed. The client shares their system prompt and model choice; we run adversarial testing independently.

**Target industries**: Legal tech, financial services, insurance, healthcare — any organization deploying LLMs for compliance, document review, or decision support.

### AI Stack Diagnostics & Remediation

We scan AI agent configurations (tool descriptions, system prompts, schemas) for structural failure patterns before deployment. Our open-source tool lintlang performs static analysis using the H1-H7 taxonomy of framework-layer failures.

## Research

Six active research domains:

1. LLM Failure States — taxonomy of epistemic failure modes
2. Evaluation & Attribution — asymmetric evidential standards in AI
3. Behavioral Analysis & Auditability
4. Epistemic & Hermeneutic AI Research
5. Safety & Reliability
6. Linguistic Infrastructure

### Published Research

- Taxonomy of Epistemic Failure Modes in LLMs (2-page technical brief): https://hermes-labs.ai/papers/taxonomy-epistemic-failure-modes.pdf
- Asymmetric Burden of Proof in LLM Decision Support (14-page report with experimental data): https://hermes-labs.ai/papers/asymmetric-burden-of-proof.pdf

### Key Findings

- 1,500+ controlled adversarial evaluations across GPT-4o, GPT-5.2 Thinking, and Claude Haiku 4.5
- Null-result bias: probability gaps of 19.6 to 56.7 percentage points across 3 models from 2 providers
- Directionally consistent in 23 of 24 test conditions
- 5 US patent filings:
  - Non-provisional (pending): US 19/248,833 — Method for Stateless User Identification in Natural Language Processing
  - Provisional: US 63/984,697 — Method and System for Detecting Adversarial Prompt Injection Attacks Using Vulnerability-Amplified Behavioral Probing of a Sacrificial Language Model Instance
  - Provisional: US 63/987,830 — Method and System for Deterministic Inference Control in Local Language Models via Compact Plan Contracts and Adaptive Routing
  - Provisional: US 64/006,494 — System and Method for Generating and Deploying Multi-Modal Classification Artifacts Using Large Language Model Calibration with Contrastive Negation-Based Disambiguation
  - Provisional: US 64/009,542 — System and Method for Real-Time Style-Based User Identification and Confidence-Gated Personalization in Large Language Models

## Open-Source Tools

### Little Canary

Prompt injection detection library. 99.0% detection on the UC Berkeley TensorTrust benchmark (400 human-written attacks).

- Website: https://littlecanary.ai
- GitHub: https://github.com/roli-lpci/little-canary
- PyPI: https://pypi.org/project/little-canary/

### QuickGate

CI quality gate CLI for JavaScript/TypeScript and Python projects.
- GitHub (JS): https://github.com/roli-lpci/quick-gate-js
- GitHub (Python): https://github.com/roli-lpci/quick-gate-python

### lintlang

Static linter for AI agent tool descriptions, system prompts, and configs. Detects 7 structural failure patterns (H1-H7) using the HERM v1.1 scoring engine. Zero LLM calls, pure static analysis.

- GitHub: https://github.com/roli-lpci/lintlang
- PyPI: https://pypi.org/project/lintlang/
- Agent docs: https://hermes-labs.ai/lintlang.md

### Suy Sideguy

Runtime safety guard for autonomous AI agents. Monitors process calls, file access, and network activity at the OS level. Policy enforcement before damage occurs, not after.

- GitHub: https://github.com/roli-lpci/suy-sideguy
- PyPI: https://pypi.org/project/suy-sideguy/

### zer0dex

Lightweight memory system for AI agents with ~91% recall tracking. Local index + vector store. No external APIs.

- GitHub: https://github.com/roli-lpci/zer0dex
- PyPI: https://pypi.org/project/zer0dex/

## Open-Source Contributions

15 PRs merged into major repositories, including React Router (56K stars), Nuxt (60K), PyTorch Ignite, MobX (20K), Cloudflare Workers SDK, Microsoft tsdoc, Microsoft griffel, ngrx/platform, and others. 83+ total PRs submitted.

## Contact

- Email: rolando@hermes-labs.ai
- LinkedIn: https://www.linkedin.com/in/rolando-bosch/
- Substack: https://lpci.substack.com/
- GitHub: https://github.com/roli-lpci
- X/Twitter: https://x.com/rolibosch
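## Appendix: What Pattern-Based Injection Screening Looks Like

To make the prompt-injection-detection category concrete, here is a deliberately minimal sketch of phrase-pattern screening. This is *not* Little Canary's implementation or API — the patterns, function name, and structure below are invented for illustration; a production detector is benchmarked against adversarial datasets such as TensorTrust and goes far beyond keyword matching.

```python
import re

# Toy list of common injection phrasings. Purely illustrative;
# real detectors use much richer signals than fixed patterns.
INJECTION_PATTERNS = [
    re.compile(r"\bignore (?:all |any )?(?:previous|prior|above) instructions\b", re.I),
    re.compile(r"\bdisregard (?:the )?(?:system|previous) prompt\b", re.I),
    re.compile(r"\breveal (?:the |your )?system prompt\b", re.I),
]

def looks_like_injection(text: str) -> bool:
    """Return True if the text matches any known injection phrasing."""
    return any(p.search(text) for p in INJECTION_PATTERNS)
```

A screen like this catches only verbatim phrasings; paraphrased or obfuscated attacks slip through, which is why benchmark-driven evaluation against human-written attacks matters.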