obi-jam debugging intermediate 15 minutes

Your Agent Is Hallucinating Answers Instead of Running Tools

The agent got the command. It had the tool. It knew the data was right there in a JSON file. And instead of reading it, the agent made up numbers, invented usernames, and described features that don't exist. The scariest part wasn't the hallucination — it was that the response looked exactly like what the tool would have returned.

The Problem

A Discord bot had a /status module command backed by a real data source — a JSON file with live system metrics. When Adam ran /status, instead of reading the file and returning real data, the bot responded with:

  • Invented metrics and dollar amounts
  • Fabricated usernames that don’t exist in the system
  • Detailed descriptions of subcommands (/status history, /status alerts) that were never built

The response was formatted perfectly. It looked exactly like what the tool would return. There was no error, no warning, nothing to indicate the tool never ran.

Why This Happens

The bot’s architecture has a command router with a fallback chain:

  1. Check harness commands (/help, /clear, /model)
  2. Check module commands (/status, /tasks, etc.)
  3. If nothing matches → route to the LLM for a chat response
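The chain can be sketched as a minimal router. This is a hypothetical reconstruction, not the bot's actual code; names like `harnessCommands`, `moduleCommands`, and `llmChat` are assumptions for illustration.

```javascript
// Hypothetical sketch of the fallback chain; all names are illustrative.
const harnessCommands = { '/help': () => 'help text', '/clear': () => 'cleared' };
const moduleCommands = {};  // the broken /status module never lands here correctly

function route(input) {
  // 1. Harness commands win first
  for (const key of Object.keys(harnessCommands)) {
    if (input.startsWith(key)) return harnessCommands[key]();
  }
  // 2. Then module commands
  for (const key of Object.keys(moduleCommands)) {
    if (input.startsWith(key)) return moduleCommands[key].handler();
  }
  // 3. Nothing matched: fall through to the LLM
  return llmChat(input);
}

function llmChat(input) {
  // Stand-in for the chat model: plausible-looking, ungrounded text.
  return `LLM response to: ${input}`;
}
```

The danger lives in step 3: any command that fails to match steps 1 and 2 produces output that looks like a normal reply.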

The /status module had two registration bugs that prevented it from ever matching:

Bug 1: Missing prefix. The module registered its command as status, but the router stored and matched against keys with the / prefix. When Discord sent /status, the check was:

'/status'.startsWith('status')  // false — the leading / kills the match
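One defensive option, sketched here as an assumption rather than what the bot did, is to normalize keys before matching so a module that forgets the prefix still routes:

```javascript
// Hypothetical helper: normalize keys so 'status' and '/status' match the same way.
const withSlash = (key) => (key.startsWith('/') ? key : '/' + key);

console.log('/status'.startsWith('status'));            // false: the bug
console.log('/status'.startsWith(withSlash('status'))); // true after normalizing
```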

Bug 2: Wrong handler shape. Other modules returned the expected format:

// What the router expects
commands: {
  '/status': {
    description: 'System status',
    handler: async (sender, content, channelMeta) => { ... }
  }
}

But the broken module returned:

// What was actually registered
commands: {
  status: async ({ args, reply }) => { ... }
}

No / prefix. No { description, handler } wrapper. Wrong function signature. The router called moduleDef.handler(sender, content, channelMeta) — but moduleDef was the bare function itself, so .handler was undefined.

Both bugs meant the command never matched step 2. Every invocation fell through to step 3, where the LLM generated a plausible response from its training data.
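Both failure modes can be reproduced in isolation. The object shapes below mirror the registration snippets above:

```javascript
// Bug 1: key mismatch. The module registered 'status', Discord sends '/status'.
const registered = { status: async ({ args, reply }) => {} };
console.log('/status'.startsWith('status'));  // false: never matches

// Bug 2: wrong shape. Even if the key had matched, the value is a bare
// function, so the .handler property the router calls does not exist.
const moduleDef = registered['status'];
console.log(typeof moduleDef.handler);        // 'undefined': nothing to call
```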

The Fix

Match the pattern that working modules use:

// Before — silent failure
const commands = {
  async status({ args, reply }) {
    const data = readJSON(statusPath);
    return reply(formatStatus(data));
  },
};

// After — routes correctly
const commands = {
  '/status': {
    description: 'System status',
    async handler(sender, content, channelMeta) {
      const data = readJSON(statusPath);
      return formatStatus(data);
    },
  },
};

Three changes:

  1. Add the / prefix to the command key
  2. Wrap in { description, handler } object
  3. Match the handler signature (sender, content, channelMeta) and return the string directly instead of calling reply()
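A startup assertion can catch both bugs before any user hits them. `validateCommands` below is a hypothetical helper, not part of the bot:

```javascript
// Hypothetical startup check: fail fast on malformed command registrations.
function validateCommands(commands, moduleName) {
  for (const [key, def] of Object.entries(commands)) {
    if (!key.startsWith('/')) {
      throw new Error(`${moduleName}: command key '${key}' is missing the / prefix`);
    }
    if (typeof def !== 'object' || typeof def.handler !== 'function') {
      throw new Error(`${moduleName}: '${key}' must be a { description, handler } object`);
    }
  }
}
```

Run it once per module at registration time; a thrown error at boot is far cheaper than a hallucinated /status reply in production.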

Key Takeaway

This is scarier than a normal hallucination because the user explicitly asked for a tool — they expected grounded data, not generation. The LLM didn’t hedge or say “I don’t have access.” It fabricated specific numbers, names, and features with full confidence. If your agent architecture has a chat fallback after tool matching, you have a loaded gun pointed at your users’ trust. The fix isn’t better prompting — it’s making the routing layer fail loudly instead of falling through silently.
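One way to make the routing layer fail loudly, sketched as an assumption about the dispatcher rather than the bot's actual code: if the input looks like a slash command and nothing matched, return an error instead of chatting.

```javascript
// Hypothetical dispatcher: explicit tool intent never falls through to the LLM.
function dispatch(input, commands, llmChat) {
  const key = input.split(' ')[0];
  const def = commands[key];
  // A real router would pass (sender, content, channelMeta); simplified here.
  if (def) return def.handler(input);
  if (key.startsWith('/')) {
    // The user asked for a tool: fail loudly, never improvise.
    return `Unknown command: ${key}`;
  }
  return llmChat(input);  // only plain chat reaches the LLM
}
```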

FAQ

Why is my agent making up data instead of calling my tool?

Your tool registration likely doesn't match what the command router expects. Check three things: the command key format (does it need a / prefix?), the handler object structure (bare function vs { description, handler } object), and the handler signature (what arguments does the router pass?). If any of these mismatch, the command silently fails to match and falls through to the LLM, which will confidently fabricate a response.

How do I prevent LLM fallthrough when a tool invocation fails?

Never let tool failures route to chat silently. If the user's intent was clearly a tool call — a slash command, an explicit function name — the system should return a tool error, not an LLM response. Log what commands are registered at startup, assert handler signatures match expected patterns, and add a guard that returns 'Unknown command' instead of falling through to the LLM.

How do I debug a Discord bot slash command that returns wrong data?

First check if the module command is actually being invoked. Add logging at the command router's matching step. If the router logs show no match for your command, compare the key format in your module's command registration against what the router stores. Common mismatches: missing / prefix, wrong handler object shape, or handler signature that doesn't match what the dispatcher calls.
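A small logging wrapper around the match step, sketched here as a hypothetical helper, makes the no-match case visible instead of silent:

```javascript
// Hypothetical wrapper: the match step leaves a trace either way.
function matchWithLogging(input, commands, log = console.log) {
  const key = input.split(' ')[0];
  if (commands[key]) {
    log(`router: matched ${key}`);
    return commands[key];
  }
  log(`router: NO MATCH for ${key}; registered keys: ${Object.keys(commands).join(', ')}`);
  return null;
}
```

One look at the "registered keys" list in the log would have exposed the missing / prefix immediately.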