The agent framework we were replacing had grown to thousands of lines. It had a plugin system, an event bus, a session manager, a health monitor with configurable intervals, a hot-reload mechanism, and a skill registry with dependency injection. Each piece was built to solve a real problem. Together, they were a system that required its own documentation to debug.
When something broke — and things broke weekly — the diagnosis started with “which subsystem?” and ended with reading through three or four files to understand the interaction pattern. A 20-minute fix took 90 minutes because 70 minutes was understanding.
The Problem
Agent runtimes accumulate complexity because each feature addition feels justified in isolation. You need a health check, so you add a health monitor. The health monitor needs configuration, so you add a config system. The config system needs hot-reload, so you add file watching. File watching needs debouncing, so you add a utility. Each step is small. The destination is a codebase where nobody can hold the whole system in their head.
The cost isn’t in writing the code. It’s in reading it six months later at 11pm when the agent stops responding.
Why This Happens
Frameworks are designed for generality. They solve problems you might have, not just problems you do have. That’s their value for a large community — and their liability for a single team. When you adopt a framework, you adopt its assumptions about what problems matter. When you build your own runtime, you can choose to only solve the problems in front of you.
The temptation is to build the framework anyway. “What if we need plugins later?” “What if we need multiple transports?” But “what if” code is inventory, and inventory has carrying costs. Every line of speculative code is a line that needs to be understood, maintained, and debugged.
Part of this is a deliberate learning exercise. I want to understand every layer of the agent stack by building it myself — not by reading someone else’s abstractions. When you build your own runtime, you learn where the real complexity lives and where frameworks are adding ceremony for problems you don’t have. You also learn a critical instinct: use code when code works. An if statement that routes messages is cheaper, faster, and more debuggable than asking an LLM to classify intent. A cron job that fires a morning briefing is better than an LLM deciding it’s time to send one. Not everything needs to be “intelligent” — most of the runtime is just wiring, and wiring should be deterministic.
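The "use code when code works" instinct can be sketched as a plain router. This is a hypothetical example, not any particular runtime's API: the function name and command strings are made up for illustration.

```javascript
// Deterministic routing: string checks before any LLM call.
// No latency, no token cost, no ambiguity about why a message went where it did.
function routeMessage(text) {
  const trimmed = text.trim();
  // Commands are prefix matches — plain code, trivially debuggable.
  if (trimmed.startsWith("!status")) return { kind: "command", name: "status" };
  if (trimmed.startsWith("!restart")) return { kind: "command", name: "restart" };
  // Only free-form conversation reaches the model.
  return { kind: "llm", prompt: trimmed };
}
```

The LLM is reserved for the one job that actually needs it; everything else is an `if` statement you can step through.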
This philosophy extends to the agent architecture itself. Instead of building one monolithic agent that does everything, separate agents by function. Your chief of staff doesn’t need to be your artist. Your engineering lead doesn’t need to parse emails. Each agent has a clear role, a focused skill set, and a profile that fits in its own head. The runtime is simple because each agent is simple. Composition beats complexity.
The Fix
Set a constraint: the runtime stays small enough to debug any problem in 30 minutes. For a Node.js agent runtime, that’s roughly 400 lines across the core files.
What fits in 400 lines:
src/
  index.js (~120 lines) — load config, load agent, connect transport, wire services, route messages, health endpoint
  agent-loader.js (~80 lines) — read profile directory, parse identity, assemble system prompt, discover skills
  services/
    claude.js (~60 lines) — Anthropic API wrapper with tool use
    ollama.js (~40 lines) — local LLM wrapper
    memory.js (~50 lines) — JSON-based conversation persistence
  handlers/
    commands.js (~50 lines) — command routing (harness + skills)
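The wiring in index.js can be sketched in a few lines. Everything here is assumed for illustration: the factory name, the dependency shapes, and the method names are placeholders, not the actual file's contents.

```javascript
// A sketch of the index.js wiring: direct calls, no layers.
// All names (createRuntime, llm.complete, memory.load/append) are hypothetical.
function createRuntime({ agent, llm, memory, handleCommand }) {
  return {
    async onMessage(channel, text) {
      if (text.startsWith("!")) return handleCommand(text); // deterministic path
      const history = memory.load(channel);                 // JSON persistence
      const reply = await llm.complete(agent.systemPrompt, history, text);
      memory.append(channel, { user: text, assistant: reply });
      return reply;
    },
    health() {
      return { ok: true, agent: agent.name };               // health endpoint body
    },
  };
}
```

Every dependency is passed in explicitly, so tracing a bug means reading one function, not discovering what a container injected.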
What stays out:
Plugin systems. If you have two agent skills, you don’t need a plugin architecture. A skills/ directory with markdown files and a loader that reads them is sufficient.
Event buses. Direct function calls are simpler than pub/sub when you have one publisher and one subscriber. An event bus solves the problem of decoupling components that don’t know about each other. In a 400-line codebase, everything knows about everything.
Hot-reload. Restart the process. launchd does it automatically when the process exits. The operational cost of a restart is lower than the code complexity of file watchers, state preservation, and reconnection logic.
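Letting launchd own restarts is a few lines of configuration rather than code. A sketch of a plist with the KeepAlive key, which relaunches the process whenever it exits; the label and paths below are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.example.agent</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/node</string>
    <string>/path/to/src/index.js</string>
  </array>
  <key>KeepAlive</key>
  <true/>
</dict>
</plist>
```

To pick up a config change, exit the process and let the supervisor bring it back with fresh state.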
Abstraction layers. If your orchestrator has a MessageProcessor that calls a ResponseGenerator that wraps an LLMAdapter — you’ve built three layers to do what one function call does. Each layer is a place where bugs hide and debugging slows down.
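What the three layers collapse to is one function. A sketch, assuming the same hypothetical class names as above; the respond function and the llm.complete signature are invented for illustration:

```javascript
// Before: MessageProcessor → ResponseGenerator → LLMAdapter, three files deep.
// After: one function whose whole job is visible at once.
async function respond(llm, systemPrompt, history, userText) {
  // Everything the layers did: assemble messages, call the model, return text.
  const messages = [...history, { role: "user", content: userText }];
  return llm.complete({ system: systemPrompt, messages });
}
```

When this breaks at midnight, the stack trace is one frame deep.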
The 30-minute test:
After every change, apply this test: if this breaks at midnight and I get a message about it, can I trace the problem from symptom to root cause in 30 minutes? If the answer is no, the change added too much complexity.
This isn’t about writing clever code. It’s about writing obvious code. Every function should be readable without context. Every file should have one clear responsibility. The runtime should be something you can explain to someone in five minutes at a whiteboard.
Push complexity to the edges:
The runtime stays small because the interesting stuff lives elsewhere:
- Agent personality → markdown files in the profile directory
- Agent skills → SKILL.md files in the skills subdirectory
- Transport quirks → transport implementations (Discord splits at 2000 chars — that's Discord's file, not the orchestrator's)
- Business logic → the LLM handles it, prompted by the agent profile
The runtime is just plumbing. Plumbing should be boring.
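The Discord 2000-character split is a good example of a quirk that stays at the edge. A sketch of the helper that would live in the Discord transport file (the function name is hypothetical; the 2000-character cap is Discord's documented message limit):

```javascript
// Lives in the Discord transport, not the orchestrator: Discord caps
// messages at 2000 characters, so long replies are split into chunks.
const DISCORD_LIMIT = 2000;

function splitForDiscord(text, limit = DISCORD_LIMIT) {
  const chunks = [];
  let rest = text;
  while (rest.length > limit) {
    // Prefer breaking at a newline inside the window; fall back to a hard cut.
    let cut = rest.lastIndexOf("\n", limit);
    if (cut <= 0) cut = limit;
    chunks.push(rest.slice(0, cut));
    rest = rest.slice(cut).replace(/^\n/, "");
  }
  chunks.push(rest);
  return chunks;
}
```

The orchestrator hands the transport a reply of any length and never learns the limit exists. Swap in a transport with a different cap and nothing upstream changes.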
Key Takeaway
A small codebase isn’t a limitation — it’s a reliability feature. Every line you don’t write is a line that can’t break. Set a size budget for your agent runtime and treat it like a real constraint. When a new feature doesn’t fit, the answer isn’t “make the budget bigger.” The answer is “find a simpler way to do this, or decide it doesn’t belong in the runtime.” Your future self, debugging at midnight, will thank you.