
Tools Are the Byproduct: Why Hermes Labs Open-Sources Its AI Infrastructure

We open-source the tools we use internally because the real value is not access to code; it is the engineering that makes AI systems reliable and inspectable.

Open-source the tools. Sell the engineering. That’s how we run Hermes Labs.

We open-source every tool we use internally. If we rely on it, it should be public. Not a crippled “community edition.” Not a repo that exists only to funnel people into a subscription. Not “free” until you hit the point where it’s actually useful. And no, we’re not playing telemetry games either. If you use our tools, we’re not sneaking analytics out of your infra and calling it product insight.

The whole point is simple: tools are cheap. Engineering is not.

You can see the public work here: github.com/hermes-labs-ai

And yes, there’s a lot of it now.

The flagships in the reliability stack: hermes-rubric (evidence-first LLM scoring with κ=0.629 inter-rater agreement), fidelis (zero-LLM agent memory with retrieval fidelity), hermes-blind (multi-turn drift recovery), hermeneutic (overclaim gate for AI), lintlang (static analysis for agent configs), claude-router (no-LLM scaffold-aware routing for the Claude API).

The long tail covers what you’d expect from a working lab. Hooks, routers, utilities, gates, linters, harnesses, evals, audit tools. Every one an internal piece we’ve needed to ship something we trust. Some are tiny. Some are pretty opinionated. Some exist because we got tired of watching the same failure mode happen for the fifth time and wanted a clean way to catch it.

But shipping the repo is the easy part.

The hard part is integrating those pieces into infrastructure that teams actually depend on. The hard part is making them survive contact with auth, queues, secrets, bad data, vague ownership, model drift, provider weirdness, procurement rules, internal governance, audit requirements, and the very normal fact that nobody wants a new “AI platform” dropped into their stack like a glitter bomb.

That’s the work people should pay us for.

Not the pieces. Not the wrappers. Not access to some gated hosted version of a thing you could run yourself in twenty minutes. Pay us to make it fit your environment. Pay us to make it reliable. Pay us to connect it to systems that matter. Pay us to be around when something breaks in production and the issue is not theoretical anymore.

Because that’s where the real cost is.

Anyone can publish a router. The question is whether that router behaves correctly when model pricing changes, a provider silently degrades, your fallback path starts looping, or legal suddenly says a class of prompts needs a different retention policy. Anyone can publish a benchmark. The question is whether you can trust the eval enough to use it in release decisions. Anyone can publish lint rules. The question is whether those rules map to how your org actually ships, or whether they just create noise until everyone ignores them.

A lot of “AI products” right now are just pretty UIs wired to an inaccurate stochastic parrot via an API. That’s why so many of them feel flimsy the second they touch a real company. The demo and the UI are the easy part. The integration and reliability burden got pushed onto the buyer, then renamed onboarding or iteration.

I don’t like that model. I think it’s lazy.

If a tool is useful, you should have it. You should be able to inspect it, fork it, run it locally, pin versions, patch it, and decide for yourself how much you trust it. We’re not interested in trapping value inside closed boxes and then charging rent on access, or telling you “trust me bro, the AI knows.” We’d rather make the boxes good, give them away, and focus our effort where effort actually compounds: architecture, integration, operations, rollout, measurement, debugging, and judgment.

That last one matters more than people admit. There is always a moment where the tool isn’t enough. Something weird is happening, the traces don’t line up, the team is blocked, and now you need someone who has seen this class of mess before and can just say, “No, that’s not right, because X. Here’s how we fix it.” That is the service. That is the value. Not the zip file.

So yes, we open-source everything we use internally. We’ll keep doing that.

If all you need is the tool, great. Take it. Use it. Break it. Improve it.

If you need the system around it, the part that has to work, keep working, and make sense inside an organization that cannot afford surprises, that’s where we come in.

That’s what Hermes Labs is for.

And that’s why the tools are the byproduct.

Roli Bosch is the founder of Hermes Labs, where he builds epistemic engineering infrastructure for production AI: reliability tools, drift detection, and evidence-first scoring. Follow him on [X](https://x.com/rolibosch) or [GitHub](https://github.com/hermes-labs-ai). His academic publications appear under his full legal name, Rolando Bosch Rodriguez.