How do I send Discord image attachments to Claude's vision API?

Extract attachment URLs from the Discord message object, filter for image content types, build an array of content blocks with type 'image' using source.type 'url' pointing to the Discord CDN URL, append the text message as a 'text' block, and pass the array as the user message content in your Claude API call.

Do I need to download Discord images before sending them to Claude?

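Usually not. The attachment URL your bot receives can be fetched without authentication headers, so you can use source.type 'url' with it directly and let Claude's API fetch the image. There is no need to download, base64-encode, or store the image locally. One caveat: Discord attachment URLs are signed and expire (typically about a day after they are issued), so send them promptly and keep the query string intact.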

Does the order of content blocks matter when sending images to Claude?

Yes. Put image blocks before text blocks in the content array. Anthropic's vision guidance says Claude works best when images come before the text that refers to them. Text-first ordering still works, but image-then-text is the recommended structure when the text references the image.

What image formats does Claude's vision API support via Discord attachments?

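Claude supports PNG, JPEG, GIF, and WebP. Match Discord attachments against those exact content types (image/png, image/jpeg, image/gif, image/webp), falling back to the file extension (.png, .jpg, .jpeg, .gif, .webp) when no content type is reported. A blanket image/* check is too loose, because Discord also accepts formats such as SVG and BMP that Claude does not.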

Discord Image Attachments → Claude Vision in Five Lines

A user sends your Discord bot a screenshot and asks “what’s wrong with this error?” Your bot reads the text, ignores the image, and gives a generic answer. The image had the actual error message. The fix is shorter than you’d expect.

The Problem

Discord messages can include image attachments. The discord.js message object exposes them, but most bot implementations only process message.content (the text). The image attachment is metadata sitting right there, unused.

Claude’s API supports vision — you can send images as content blocks alongside text. But the message format is different from a plain string. You need to build an array of content blocks instead.

Step 1: Extract Attachments From Discord

The Discord message object has an attachments collection. Pull out the metadata you need:

// In your Discord message handler
let attachments = [];

if (message.attachments.size > 0) {
  attachments = [...message.attachments.values()].map(a => ({
    url: a.url,
    filename: a.name,
    contentType: a.contentType,
    width: a.width,
    height: a.height,
  }));
}

Pass these through your transport layer alongside the message text. Your command handler needs both.

Step 2: Build Vision Content Blocks

Filter for images, build Claude-compatible content blocks, and append the text:

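// In your command handler, before calling Claude
let userContent = text;

// Claude accepts only these formats; a bare image/* check would also admit SVG, BMP, TIFF, etc.
const SUPPORTED_IMAGE_TYPES = new Set(['image/png', 'image/jpeg', 'image/gif', 'image/webp']);

// Prefer the reported content type; fall back to the file extension when it is missing
const imageAttachments = (attachments || []).filter(a =>
  a.contentType
    ? SUPPORTED_IMAGE_TYPES.has(a.contentType.split(';')[0].trim().toLowerCase())
    : /\.(png|jpe?g|gif|webp)$/i.test(a.filename || '')
);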

if (imageAttachments.length > 0) {
  const contentBlocks = [];

  for (const img of imageAttachments) {
    contentBlocks.push({
      type: 'image',
      source: { type: 'url', url: img.url },
    });
  }

  if (text.trim()) {
    contentBlocks.push({ type: 'text', text });
  }

  userContent = contentBlocks;
}

That’s it. userContent is either a plain string (no images) or an array of content blocks (images + text). Pass it to Claude as the user message.
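Step 1 also captured width and height, which the filter above never uses. A minimal sketch of a pre-flight size check follows. It assumes a per-side ceiling of 8000 pixels, the figure Anthropic's vision documentation gave at the time of writing; treat that number as an assumption and confirm it against the current docs. MAX_SIDE_PX, withinSizeLimits, and partitionBySize are hypothetical names, not part of discord.js or the Anthropic SDK.

```javascript
// Assumed ceiling per side, in pixels; verify against the current Anthropic docs.
const MAX_SIDE_PX = 8000;

// Returns true when the attachment's reported dimensions fit under the ceiling.
// Attachments without reported dimensions are let through, since the metadata
// can be null and the API will make the final call anyway.
function withinSizeLimits(attachment, maxSide = MAX_SIDE_PX) {
  const { width, height } = attachment;
  if (width == null || height == null) return true;
  return width <= maxSide && height <= maxSide;
}

// Splits attachments into those worth sending and those to skip, so the bot
// can tell the user which images were left out instead of failing the request.
function partitionBySize(attachments, maxSide = MAX_SIDE_PX) {
  const accepted = [];
  const rejected = [];
  for (const a of attachments) {
    (withinSizeLimits(a, maxSide) ? accepted : rejected).push(a);
  }
  return { accepted, rejected };
}
```

Running partitionBySize(imageAttachments) before the block-building loop, and iterating over accepted instead, keeps an oversized upload from sinking the whole request.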

Step 3: Send to Claude

The Claude API accepts either a string or an array of content blocks for the content field:

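// Assumes the official @anthropic-ai/sdk package
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const response = await client.messages.create({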
  model: 'claude-sonnet-4-20250514',
  max_tokens: 4096,
  system: systemPrompt,
  messages: [
    ...conversationHistory,
    { role: 'user', content: userContent },
  ],
});

No special flag or parameter needed. If content is an array with image blocks, Claude processes them as vision input.

Why Images Go Before Text

Notice the code pushes image blocks first, then the text block. This matters.

Content blocks reach the model in the order you list them. When a user sends a screenshot and asks “what’s this error?”, you want Claude to see the image before reading the question, much as a person would look at the picture first and then read the question about it. Anthropic's vision guidance recommends the same image-then-text structure.

// ✅ Good: image → text
[
  { type: 'image', source: { type: 'url', url: '...' } },
  { type: 'text', text: 'What error is this showing?' }
]

// ⚠️ Works but worse: text → image
[
  { type: 'text', text: 'What error is this showing?' },
  { type: 'image', source: { type: 'url', url: '...' } }
]

Both work. According to Anthropic's guidance, the first ordering tends to produce better responses, because the model has the visual context before it encounters the question.

No Download Needed

Discord attachment URLs (cdn.discordapp.com) can be fetched without authentication headers, because the access signature travels in the URL's query string. Claude’s API fetches the image directly from the URL when you use source.type: 'url'. No need to:

Download the image to your server
Base64-encode it
Store it temporarily
Manage cleanup

Just pass the URL, query string included. This is the simplest path, and it works as long as the signed link has not expired, which is not a concern when you forward a freshly received attachment.
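Discord attachment URLs are signed and time-limited. The sketch below reads the expiry from the URL, assuming the format Discord documents for CDN attachment links: an ex query parameter holding a hexadecimal Unix timestamp in seconds. discordUrlExpiry and isExpired are hypothetical helper names, and the example URL is made up.

```javascript
// Returns the expiry time encoded in a Discord attachment URL, or null when
// the URL carries no parseable `ex` parameter.
// Assumption: `ex` is a hex-encoded Unix timestamp in seconds.
function discordUrlExpiry(url) {
  const ex = new URL(url).searchParams.get('ex');
  if (!ex) return null;
  const seconds = Number.parseInt(ex, 16);
  return Number.isNaN(seconds) ? null : new Date(seconds * 1000);
}

// True when the URL has a known expiry that is at or before `now`.
// URLs without expiry information are treated as not expired.
function isExpired(url, now = new Date()) {
  const expiry = discordUrlExpiry(url);
  return expiry !== null && expiry.getTime() <= now.getTime();
}

// Usage with a made-up URL:
const exampleUrl =
  'https://cdn.discordapp.com/attachments/1/2/shot.png?ex=65d903de&is=65c68ede&hm=abc';
console.log(discordUrlExpiry(exampleUrl).toISOString());
```

A check like this matters mostly for URLs you have stored, since a link taken straight from an incoming message is fresh.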

If you’re in an environment where the image URLs are behind authentication (not Discord, but other platforms), you’d need to download the image, base64-encode it, and use source.type: 'base64' with media_type and data fields instead.
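The base64 fallback described above can be sketched as follows. It assumes Node 18 or later (for the global fetch) and that the server reports a content type Claude accepts. toBase64ImageBlock and fetchAsBase64Block are hypothetical helper names, and the headers argument stands in for whatever auth your platform requires.

```javascript
// Builds a base64 image block from raw bytes. mediaType should be one of the
// formats Claude accepts: image/png, image/jpeg, image/gif, image/webp.
function toBase64ImageBlock(bytes, mediaType) {
  return {
    type: 'image',
    source: {
      type: 'base64',
      media_type: mediaType,
      data: Buffer.from(bytes).toString('base64'),
    },
  };
}

// Downloads an image and wraps it as a base64 block.
// `headers` is where a platform's auth token would go (hypothetical).
// `fetchImpl` defaults to the global fetch and can be swapped out in tests.
async function fetchAsBase64Block(url, headers = {}, fetchImpl = fetch) {
  const res = await fetchImpl(url, { headers });
  if (!res.ok) {
    throw new Error(`Image download failed: HTTP ${res.status}`);
  }
  // Drop any parameters such as "; charset=..." from the content type.
  const mediaType = (res.headers.get('content-type') || '').split(';')[0].trim();
  const bytes = await res.arrayBuffer();
  return toBase64ImageBlock(bytes, mediaType);
}
```

The resulting block drops into the same contentBlocks array in place of a URL-based block. The same helper is also a reasonable fallback if a URL-based request fails because the link could not be fetched.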

Conversation History

If your agent maintains conversation memory, the mixed content blocks go into history as-is:

// Memory stores whatever content was sent
history.push({
  role: 'user',
  content: userContent,  // string or array of blocks
  timestamp: new Date().toISOString(),  // bookkeeping only; strip before calling the API
});

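On the next turn, the API receives the full history including image blocks from previous messages, so Claude can reference images from earlier in the conversation (“that screenshot you showed me earlier”). Two caveats apply. First, send only role and content for each entry, since the Messages API is likely to reject extra fields such as timestamp. Second, URL-based image blocks are presumably resolved again on each request, so a stored Discord URL stops working once its signed link expires.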
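A minimal sketch of turning stored history into API-ready messages follows. It keeps only role and content, on the assumption that the Messages API rejects unknown fields such as timestamp. It can also swap image blocks in old entries for a text placeholder, on the assumption that Discord's signed URLs stop resolving after roughly a day. toApiMessages and olderThanHours are hypothetical names.

```javascript
const IMAGE_PLACEHOLDER = '[image from earlier in the conversation is no longer available]';

// Converts stored history entries into { role, content } messages.
// When `isStale(entry)` returns true, image blocks in that entry are replaced
// with a short text note so an expired URL cannot fail the request.
function toApiMessages(history, isStale = () => false) {
  return history.map((entry) => {
    let content = entry.content;
    if (Array.isArray(content) && isStale(entry)) {
      content = content.map((block) =>
        block.type === 'image' ? { type: 'text', text: IMAGE_PLACEHOLDER } : block
      );
    }
    return { role: entry.role, content };
  });
}

// Builds a staleness predicate from the ISO timestamp stored with each entry.
function olderThanHours(hours, now = Date.now()) {
  return (entry) => now - Date.parse(entry.timestamp) > hours * 60 * 60 * 1000;
}
```

With this in place, the request can use messages: [...toApiMessages(history, olderThanHours(23)), { role: 'user', content: userContent }]. The 23-hour cutoff is an arbitrary safety margin, not a documented value.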

Multiple Images

The pattern handles multiple attachments naturally. Each image gets its own content block, all before the text:

// User sends 3 screenshots + "compare these"
[
  { type: 'image', source: { type: 'url', url: 'cdn.../screenshot1.png' } },
  { type: 'image', source: { type: 'url', url: 'cdn.../screenshot2.png' } },
  { type: 'image', source: { type: 'url', url: 'cdn.../screenshot3.png' } },
  { type: 'text', text: 'Compare these three — which layout is better?' }
]

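Claude sees all three images and can compare them. Discord caps a single message at 10 attachments, which sits comfortably under Anthropic's documented per-request image limit (20 images on claude.ai, and more for direct API requests at the time of writing; check the current vision docs).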
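When the question refers to specific images ("the second one"), it can help to label them. Anthropic's multi-image examples precede each image with a short text block such as "Image 1:". The sketch below builds that structure while still keeping the user's question last. buildLabeledBlocks is a hypothetical helper name.

```javascript
// Builds content blocks where each image is preceded by an "Image N:" label
// and the user's text, if any, comes last.
function buildLabeledBlocks(imageUrls, text) {
  const blocks = [];
  imageUrls.forEach((url, i) => {
    blocks.push({ type: 'text', text: `Image ${i + 1}:` });
    blocks.push({ type: 'image', source: { type: 'url', url } });
  });
  if (text && text.trim()) {
    blocks.push({ type: 'text', text });
  }
  return blocks;
}

// Usage with placeholder URLs:
const labeled = buildLabeledBlocks(
  ['https://example.com/a.png', 'https://example.com/b.png'],
  'Which layout is better?'
);
console.log(labeled.map((b) => b.type).join(' → '));
```

For a single image the label adds little, so the plain image-then-text array is enough there.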

Key Takeaway

Adding vision to a Discord bot is a content format change, not an architecture change. Extract attachment URLs from the message, filter for images, build content blocks with images before text, pass the array instead of a string. Five lines of block assembly and your bot can see what users send it. The Discord CDN URLs work directly with Claude’s URL-based image source — no download step, no encoding, no temp files.