We wanted an AI agent that runs 24/7 on a Mac Mini, responds to Discord messages, has a defined personality, falls back to a local LLM when the cloud API is down, and remembers conversations across restarts. No framework. Just our own code.
Here’s the thing about building an agent runtime: it’s not actually that much code. The hard part isn’t writing it — it’s knowing what to build first, and making sure each piece works before you stack the next one on top. We learned this the hard way. We wired up five layers at once, sent a test message, and got silence. Two hours of debugging later, the fix was a single missing config key. Five-second fix, hidden behind five untested systems.
So we backed up and built it layer by layer. Here’s the order, and why it matters.
What an agent harness actually is
An agent harness is the runtime that keeps your AI agent alive and connected to the world. It’s the boring infrastructure between “I have an LLM API key” and “I have an agent that responds to messages at 3am while I’m asleep.”
At minimum, a harness has:
- Transport — how messages get in and out (Discord, Slack, iMessage, whatever)
- LLM routing — how the agent thinks (Claude API, with a local fallback like Ollama)
- Agent profile — who the agent is (system prompt, personality, role)
Everything else — skills, memory, scheduling — is a layer you add once the core works.
The total code for a basic harness is a few hundred lines. No framework needed. You’re just wiring a chat transport to an LLM call with a system prompt in between.
Build it layer by layer
Layer 1: Transport — can you send and receive a message?
This is your foundation. Pick a transport — Discord is the easiest starting point because the bot API is well-documented and free. Build the absolute minimum: a bot that connects, listens for messages, and echoes them back. No LLM, no personality, no skills. Just echo.
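A minimal echo bot in discord.js (v14) looks roughly like this. It's a sketch, not a drop-in: it assumes a bot token in `DISCORD_TOKEN` and the Message Content intent enabled in the Discord developer portal, so it won't run standalone.

```javascript
// echo.js — the absolute minimum transport: connect, listen, echo back.
const { Client, Events, GatewayIntentBits } = require('discord.js');

const client = new Client({
  intents: [
    GatewayIntentBits.Guilds,
    GatewayIntentBits.GuildMessages,
    GatewayIntentBits.MessageContent, // must also be enabled in the dev portal
  ],
});

client.on(Events.MessageCreate, (message) => {
  if (message.author.bot) return;  // ignore bots, including ourselves
  message.reply(message.content);  // echo it straight back
});

client.login(process.env.DISCORD_TOKEN);
```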
You: "hello"
Bot: "hello"
→ Transport: working ✓
If the echo doesn’t come back, the problem is in exactly one place. Fix it before touching anything else. This is the discipline that saves you hours later.
Why Discord first? It’s a real chat interface with mobile notifications, threading, and file uploads built in. You get a proper messaging UX for free. Later you can swap in Slack, iMessage (via BlueBubbles), or anything else — the transport is just a plugin.
Layer 2: LLM routing — can you get a response from an AI?
Replace the echo logic with an LLM call. Send the user’s message to Claude (or your preferred API), return the response. Still no personality — just a raw API call through the transport.
You: "What's the capital of France?"
Bot: "The capital of France is Paris."
→ LLM routing: working ✓
This validates that your API key works, your request format is correct, and the response flows back through the transport. Two layers, both verified independently.
If you want resilience, add a fallback now. Run Ollama locally with a small model. When the Claude call fails (timeout, rate limit, key expired), fall back to the local model. Test this by deliberately breaking your API key.
[break Claude key]
You: "What's 2 + 2?"
Bot: "4" (via Ollama)
→ Fallback: working ✓
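In code, the fallback is just a try/catch around the primary call. A sketch, where `brokenClaude` and `localOllama` are hypothetical stubs standing in for your real Claude and Ollama wrappers:

```javascript
// Route a prompt to the primary LLM; fall back to the local model on any
// failure (timeout, rate limit, bad key). Both arguments are async functions.
async function routeLLM(prompt, primary, fallback) {
  try {
    return { text: await primary(prompt), via: 'primary' };
  } catch (err) {
    console.warn(`primary LLM failed (${err.message}), falling back`);
    return { text: await fallback(prompt), via: 'fallback' };
  }
}

// Simulate a deliberately broken API key with stubs:
const brokenClaude = async () => { throw new Error('401 invalid api key'); };
const localOllama = async () => '4';

routeLLM("What's 2 + 2?", brokenClaude, localOllama)
  .then((r) => console.log(`${r.text} (via ${r.via})`)); // → "4 (via fallback)"
```

Because the router only sees two async functions, the real implementations can be swapped without touching the routing logic.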
Layer 3: Agent profile — does the personality show up?
Now give your agent an identity. Create a profile directory with a system prompt file (we call ours SOUL.md). Load it at startup and inject it as the system message in your LLM calls.
You: "Who are you?"
Bot: "I'm Obi, your chief of staff. How can I help?"
→ Agent profile: working ✓
If the response is generic or the name is wrong, the profile isn’t loading. You know it’s not the transport or the LLM — those are already proven.
This is where the harness becomes agent-agnostic. The same runtime code can load any profile directory. Swap the profile, get a different agent. Same transport, same LLM routing, different personality.
Layer 4: Skills — do commands work?
Add a command router. Register slash commands or keyword triggers. Start simple — /status returns uptime and agent name, /help lists available commands.
You: /status
Bot: "Obi | online | uptime: 2h | transport: Discord"
→ Skills: working ✓
Skills are just functions that the harness calls instead of the LLM. The command router intercepts a message before it hits the LLM and dispatches it to the right skill. Build a few utility skills first (status, help, ping), then add domain-specific ones.
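A command router can be a `Map` and one dispatch function. A sketch, where the `ctx` shape (name, uptime, transport) is an assumption for illustration:

```javascript
// Registered skills intercept messages that start with "/";
// anything else returns null and falls through to the LLM.
const skills = new Map();

function registerSkill(name, fn) { skills.set(name, fn); }

function dispatch(message, ctx) {
  if (!message.startsWith('/')) return null; // not a command → LLM handles it
  const [cmd, ...args] = message.slice(1).trim().split(/\s+/);
  const skill = skills.get(cmd);
  return skill ? skill(ctx, args) : `Unknown command: /${cmd}`;
}

// A couple of utility skills to start with:
registerSkill('ping', () => 'pong');
registerSkill('status', (ctx) =>
  `${ctx.name} | online | uptime: ${ctx.uptime} | transport: ${ctx.transport}`);

console.log(dispatch('/status', { name: 'Obi', uptime: '2h', transport: 'Discord' }));
// → "Obi | online | uptime: 2h | transport: Discord"
```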
Layer 5: Memory — do conversations persist?
Without memory, every message is a fresh conversation. Add conversation history — store messages and include recent history in the LLM context. Then test the hard part: restart the process and see if the agent remembers.
You: "Remember: the project deadline is Friday"
Bot: "Got it."
[restart process]
You: "When's the deadline?"
Bot: "Friday."
→ Memory: working ✓
Start with file-based storage (JSON on disk). It’s simple, debuggable, and sufficient for a single-agent setup. You can move to SQLite or a proper database later if you need to.
Layer 6: Uptime — does it stay running?
Use launchd — the process manager built into macOS. Write a plist that points to your Node.js entry point, set KeepAlive and RunAtLoad, and load it with launchctl. No third-party tools to install.
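A minimal plist might look like the following; the label, the node path, and the file paths are placeholders to replace with your own (check your node location with `which node`):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.your-agent</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/node</string>
    <string>/Users/you/agent/index.js</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/tmp/your-agent.log</string>
  <key>StandardErrorPath</key>
  <string>/tmp/your-agent.err</string>
</dict>
</plist>
```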
# Load the agent as a launchd service
launchctl load ~/Library/LaunchAgents/com.your-agent.plist
# ... come back tomorrow
launchctl list | grep your-agent
→ 412 0 com.your-agent (a PID in the first column means it's running; a "-" means it isn't)
→ Uptime: working ✓
Leave it overnight. Check in the morning. If it restarted, check the log files defined in your plist to find out why. Common culprits: unhandled promise rejections, memory leaks from unbounded conversation history, or the transport library dropping its WebSocket connection without reconnecting.
Why this order matters
Each layer depends on the previous ones. You can’t test a personality if the LLM call is broken. You can’t test skills if messages aren’t arriving. You can’t test memory if the agent can’t respond. The sequence is a dependency chain.
The temptation is to build everything at once and test end-to-end. Resist it. When something breaks at layer 4, you want absolute certainty that layers 1-3 are solid. That turns every debugging session from “why isn’t this working?” into “which specific thing in this one layer isn’t working?” — a much easier question.
What you end up with
A few hundred lines of code that:
- Connects to Discord (or whatever transport you chose)
- Sends messages to Claude (with a local Ollama fallback)
- Loads a personality from a profile directory
- Routes commands to skill functions
- Remembers conversations across restarts
- Stays running via launchd
No framework. No vendor lock-in. No magic. You understand every line because you wrote it. When it breaks at 2am — and it will — you know exactly where to look.
The real unlock is that this architecture is agent-agnostic. The harness doesn’t care who it’s running. Swap the profile directory and you have a different agent on the same runtime. We run multiple agents on the same Mac Mini, each with their own personality, skills, and memory, sharing the same transport and LLM infrastructure.
Key takeaway
Building an agent runtime isn’t a framework problem — it’s a sequencing problem. Each layer is simple on its own. The complexity comes from stacking them all at once and debugging the interactions. Build one layer, validate it, move on. You’ll have a running agent faster than you think.