Why I Built Holo (AI Context Layer) and Open-Sourced It

What I learned generalizing an internal tool into open infrastructure

Jun 09, 2026

I recently open-sourced Holo. It’s a self-hostable shared context layer for AI agents, and it’s the kind of project I would have loved to find six months ago.

This article is partly the story of why I built it, partly the engineering decisions that shaped it, and partly a reflection on what I learned about open-sourcing infrastructure that started as an internal tool. If you’re a technical founder shipping AI agents into production right now, the problem we hit will probably sound familiar.

The pattern I kept seeing

At Kombo, the YC company where I run the engineering team, we ship AI agents for internal use. Several of them. A Slack support bot. An interview-prep agent. A customer-success assistant that drafts replies. A compliance questionnaire responder.

Each one started as a small project. And each one ended up reimplementing the same thing: a custom connector stack, a bespoke retriever, a tangle of prompts about how to handle Slack vs. GitHub vs. Notion. Three agents meant three Linear integrations, three GitHub integrations, three different ACL stories, three drift trajectories.

That’s the part nobody warns you about when you start building agents. The hard part isn’t the agent loop. The hard part is that every agent needs the same context, just at slightly different times, in slightly different shapes, with slightly different filters.

So we built an internal version of what would become Holo. A GitHub repo that mirrored the company’s connected data, with Cursor as the agent driving the loop. The sales and customer solutions teams have been using it daily for months via Slack. Deal prep, support replies, compliance questionnaires, all routed through one backend that owns the context once.

After a few months of seeing that thing actually work, I kept asking the same question: why doesn’t this exist as open infrastructure? Every team shipping multiple agents is hitting this wall. The market has Onyx for enterprise search, Dust for chat assistants, half a dozen RAG-as-a-service products. But the shared context layer that sits underneath all your agents and ingests once on their behalf? Nobody was building that for self-hosters.

So we started rebuilding it in the open. From scratch. Generalized.

What Holo actually is

Holo is one MCP server (with a parallel REST surface) that:

Ingests from 20 connectors today, including GitHub, GitLab, Slack, Notion, Linear, Jira, Confluence, Google Drive, HubSpot, Salesforce, Pylon, Zendesk, Grain, and the usual suspects.
Indexes everything into a single ACL-aware Postgres store with pgvector. No separate vector DB to operate.
Exposes two primitives to any MCP-compatible agent: search for hybrid retrieval, and bash over a read-only virtual filesystem of every synced artifact.
Logs every call. Who asked, what tools fired, what context grounded the answer.
Runs on docker compose up -d. Multi-tenant orgs from day one.

The pitch is simple: connect once, serve every agent. Same chunks, same ACL, same audit trail feeding Claude, Cursor, your in-house agent, all through one backend.

But the part I want to talk about isn’t the feature list. It’s the design bets we made along the way, because those are the parts that I think generalize to anyone building agent infrastructure right now.

Bet #1: Bash and a virtual filesystem, not 20 typed MCP tools

When we started, the obvious thing to do was give each connector its own typed MCP tool. get_pr, get_thread, get_doc, get_call, get_ticket. Every source got a getter with bespoke arguments (owner+repo+number for GitHub, channel+timestamp for Slack, recording_id for Grain) and a bespoke return shape.

I shipped that. It worked. And then it didn’t.

Each new connector meant adding a tool, which meant teaching every agent prompt about it. Twenty connectors quickly turned into twenty tools the model had to remember, each with its own argument shape, each with its own quirks. Prompts got longer. Agents got slower. The maintenance burden compounded.

We replaced the whole tool zoo with a single bash sandbox over a deterministic virtual filesystem. Every synced artifact lives at a path you can guess:

Slack threads at /slack/#<channel>/<date>/thread-<ts>.md
GitHub PRs at /github/<owner>/<repo>/pulls/<n>.md
Pylon tickets at /pylon/tickets/<id>.md
Notion pages at /notion/<workspace>/<page>.md
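To make "a path you can guess" concrete, here is a minimal Python sketch of how source identifiers could map onto the layouts listed above. The function names are hypothetical, and the ISO date format for the Slack segment is an assumption; the post only specifies the path templates, not how Holo builds them.

```python
from datetime import date


def slack_thread_path(channel: str, day: date, ts: str) -> str:
    """Build /slack/#<channel>/<date>/thread-<ts>.md (ISO date is an assumption)."""
    return f"/slack/#{channel}/{day.isoformat()}/thread-{ts}.md"


def github_pr_path(owner: str, repo: str, number: int) -> str:
    """Build /github/<owner>/<repo>/pulls/<n>.md."""
    return f"/github/{owner}/{repo}/pulls/{number}.md"


def pylon_ticket_path(ticket_id: str) -> str:
    """Build /pylon/tickets/<id>.md."""
    return f"/pylon/tickets/{ticket_id}.md"


def notion_page_path(workspace: str, page: str) -> str:
    """Build /notion/<workspace>/<page>.md."""
    return f"/notion/{workspace}/{page}.md"
```

Because each function is a pure mapping from identifiers to a string, the same artifact always lands at the same path, which is what lets an agent guess a location instead of looking it up.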

Agents get ls, cat, grep, find, head, tail, wc, sort, uniq, tree, and echo. That’s it. No eval, no network, no shell escape.
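As an illustration of what an allowlist like this can look like, here is a small Python sketch that accepts a command line only if every pipeline stage starts with one of the eleven commands above. It is not Holo’s sandbox: the function name and the rejection rules are assumptions, and it deliberately fails closed (for example, it rejects any `$` or backtick, even inside quotes).

```python
import shlex

ALLOWED = frozenset(
    {"ls", "cat", "grep", "find", "head", "tail", "wc", "sort", "uniq", "tree", "echo"}
)
# Shell operator characters that shlex splits out as their own tokens.
PUNCTUATION = frozenset("();<>|&")


def is_allowed(command_line: str) -> bool:
    """Return True only for pipelines built entirely from allowlisted commands."""
    # Fail closed on substitution and multi-line input.
    if any(ch in command_line for ch in "`$\n"):
        return False
    lexer = shlex.shlex(command_line, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    lexer.commenters = ""  # keep '#' literal, e.g. /slack/#<channel>/...
    try:
        tokens = list(lexer)
    except ValueError:  # unbalanced quotes
        return False
    expect_command = True
    for tok in tokens:
        if tok == "|":
            if expect_command:  # empty pipeline stage
                return False
            expect_command = True
        elif tok and set(tok) <= PUNCTUATION:
            return False  # ';', '&&', '>', '(' and friends
        elif expect_command:
            if tok not in ALLOWED:
                return False
            expect_command = False
    return not expect_command  # rejects empty input and a trailing pipe
```

Checking command names is only one layer. Arguments such as find’s -exec would need their own handling, and a real sandbox would still need process-level and filesystem-level isolation behind a check like this.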

Our bet is that frontier models already know how to drive a filesystem. They’ve been trained on millions of lines of shell scripts. They don’t need prompt engineering to figure out that grep -r "stripe" /github/kombo/api/pulls/ | head is a reasonable thing to try. They just do it.

We measured this on our internal Slack bot. The old path (search, then read N chunks, then call an LLM to summarize) took 38.6 seconds end-to-end. The new path (the agent navigates the filesystem directly) took 9.7 seconds. Roughly 4x faster, and the answers were more grounded because the model could pull the exact bytes it wanted instead of getting back a soup of chunks.

Deeper lesson here: when you’re designing tools for LLMs, prefer surfaces the model already understands. Don’t invent a new vocabulary. Use the one it grew up with.

Bet #2: Postgres-only, no separate vector DB

Every architecture guide for AI infrastructure right now will tell you to run a dedicated vector database. Pinecone, Weaviate, Qdrant, Chroma. Pick one.

We went the other way. Holo runs hybrid search inside a single Postgres instance with pgvector for embeddings and pg_trgm + tsvector for keyword search. The two are fused in one SQL CTE with reciprocal rank fusion.
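For readers who haven’t met reciprocal rank fusion, the scoring is easy to show outside SQL. The Python sketch below is an illustration of the technique only, not Holo’s query: the function name is hypothetical, and k = 60 is the constant commonly used for RRF, not a value taken from the Holo codebase. Each document scores the sum of 1 / (k + rank) over every ranked list it appears in.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into a single ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]   # e.g. nearest neighbours by embedding
keyword_hits = ["b", "d", "a"]  # e.g. full-text matches
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Here "b" ends up first because it ranks well in both lists, while "c" and "d" each appear in only one. RRF uses only ranks, so the vector distances and text-search scores never have to be put on a common scale.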

Operational simplicity is the reason. Most self-hosted teams I know are running one Postgres anyway. Adding a second specialized database means a second backup strategy, a second auth boundary, a second monitoring story, and a second thing that can fail in a way you don’t know how to debug at 2am. For the scale most teams are at, pgvector is fast enough, and the ergonomics of doing retrieval in SQL alongside your ACL joins is huge.

If we hit a wall later, we can swap the vector tier. But starting with two databases when one would do is the kind of premature complexity that costs you six months of yak-shaving you don’t have.

Bet #3: Audit everything, replay nothing magical

Every Holo call gets logged. Who asked, what tools fired, what files the agent read, what context grounded the final answer. The dashboard has a replay view that shows the recorded query and result diff.
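A record like that can be quite small. The sketch below is a hypothetical shape in Python, not Holo’s schema: every field name is an assumption chosen to mirror the list above (who asked, which tools fired, which files were read, what answer came back). The record is serialized as one JSON line, and a replay view would read that line back, not re-run the query.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)
class AuditRecord:
    """One logged call. Field names are illustrative, not Holo's schema."""
    actor: str
    query: str
    tools_fired: Tuple[str, ...]
    files_read: Tuple[str, ...]
    answer: str
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_log_line(record: AuditRecord) -> str:
    """Serialize a record as a single JSON line."""
    return json.dumps(asdict(record), sort_keys=True)


def from_log_line(line: str) -> AuditRecord:
    """Load a record exactly as it was written, without touching live data."""
    data = json.loads(line)
    return AuditRecord(
        actor=data["actor"],
        query=data["query"],
        tools_fired=tuple(data["tools_fired"]),
        files_read=tuple(data["files_read"]),
        answer=data["answer"],
        recorded_at=data["recorded_at"],
    )
```

The point of storing files_read alongside the answer is that "why did the agent say that?" becomes a lookup in one place.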

This wasn’t on the original roadmap. It got added the first time someone in customer success asked me “why did the agent say that?” and I didn’t have a clean answer. The agent had pulled the right docs, but the prompt path was opaque, and reconstructing it from logs across three systems took longer than just rewriting the response by hand.

If you’re building production agents, you need observability that’s structurally different from what you have for your web app. Stack traces don’t help. Latency graphs don’t help. You need a record of which pieces of context flowed into which decisions. Build that on day one or you’ll wish you had.

One honest caveat: replay shows the recorded query and the recorded result. It’s not live re-execution against current state. That’s a deliberate trade-off, not an oversight. Live replay sounds clean in a demo and is a nightmare in practice once your underlying data has shifted.

Why open-source, and why AGPL

The licensing question got more attention from me than I expected. I had three options on the table: MIT, AGPL, and a fair-source license like BSL.

I picked AGPL-3.0 for the Community Edition and a separate commercial license for the Enterprise Edition. Same model Cal.com, Plausible, and Mattermost use. Self-host inside your company freely. If you want to wrap Holo in a hosted commercial product, talk to me.

The reasoning is straightforward. Holo is infrastructure. Infrastructure gets adopted when teams can read the source, fork it, and run it in their own environment without asking permission. MIT would have made that easier in theory, but it also would have let any well-funded company resell Holo as a hosted product without contributing back. AGPL doesn’t prevent hosting either, but it makes the obligation explicit: if you offer Holo as a network service to third parties, you publish your modifications.

That’s the deal that keeps community-funded infrastructure projects alive.

What I learned shipping this

A few things stand out from the past few weeks.

Generalizing an internal tool is its own project. The version we run at Kombo had a thousand assumptions baked in. Channel naming conventions, ACL shortcuts, prompt tweaks that only made sense for our team. Pulling those out and rebuilding the same primitives in a way that works for any team took longer than I expected. If you’re considering open-sourcing something internal, budget twice the time you think it’ll take.

The MCP ecosystem is moving fast. When we started, Dynamic Client Registration (RFC 7591) for OAuth 2.1 in MCP was a spec that barely anyone was implementing. By the time we shipped it, it was a requirement to interop with the latest clients. If you’re building anything in this space, assume the protocol underneath you will shift twice before you launch.

Don’t pre-build the marketplace. Our original design had a full skills/marketplace UI for sharing procedures between teams. We shipped v0.1 with that surface returning 501 Not Implemented. The plumbing is in the repo, but the routes are off. Procedure synthesis only makes sense once there’s enough cross-connector signal from real usage to bootstrap it. Building the marketplace before the network effects exist is the same mistake so many Web3 projects made in 2021.

Small teams ship faster than you expect, but only on bets you’re sure about. In a few weeks, I rebuilt the core of what a team at Kombo had built internally. That’s only possible because we’d already validated the design choices on real production traffic. We weren’t exploring. We were generalizing.

What’s next

Holo is pre-alpha. The repo is public. The Slack bot path works. The bash sandbox works. Twenty connectors are live. The dashboard is functional. There are rough edges, and I would not put it in front of a customer tomorrow.

The honest reason for posting about this now isn’t that it’s done. It’s that the design assumptions are baked in deep enough that breaking them later will be expensive, and I want to find out where they’re wrong before that happens. If you’re a technical founder running multiple AI agents in production, the duplication pain is probably real for you too. I’d love to know where it hurts most and whether the surface we picked feels right.

Repo: github.com/maakle/holo
Site: holobase.dev

The best part of open-sourcing something is that you stop arguing with yourself about whether it’s good enough and start arguing with strangers who will tell you exactly what’s broken. That’s the loop I’m here for.

Tech Founder Stack
