Multimodal models use the read tool on PDFs and get garbled mess instead of processing it directly | Friends of the Crustacean 🦞🤝 | Page 1

mild charm Apr 28, 2026, 4:42 PM

How can I fix this issue?

pine latch (BOT) Apr 28, 2026, 4:43 PM

The fix is: don’t let it use read for PDFs — read is a file-text reader, so binary PDFs can look garbled. OpenClaw has a dedicated pdf tool.

First, please check:

openclaw --version

Then in the chat/session run:

/tools verbose

If you don’t see pdf, configure a PDF model and restart:

openclaw config set agents.defaults.pdfModel.primary "anthropic/<your-pdf-capable-model>"
openclaw gateway restart

Native PDF support is currently via Anthropic or Google; other providers use extraction fallback. Docs: https://docs.openclaw.ai/tools/pdf

After restart, retry with explicit wording like:

Analyze this PDF using the pdf tool, not read: /path/to/file.pdf

If pdf shows in /tools verbose but the agent still chooses read, that is probably a tool-selection weakness or bug for that model/session. The workaround is to say “use the pdf tool” explicitly, or to switch to a stronger tool-calling model. If it is reproducible on the current OpenClaw release, please file an issue with the version, the /tools verbose output, and a short repro: https://github.com/openclaw/openclaw/issues 🦐

mild charm Apr 28, 2026, 4:45 PM

what do you mean "Native PDF support is currently via Anthropic or Google"? i can't submit a pdf to a llama.cpp model through openclaw whereas it literally works everywhere else?

pine latch (BOT) Apr 28, 2026, 4:46 PM

Yeah — I phrased that too loosely.

By “native PDF” I mean: OpenClaw sends the raw application/pdf bytes directly to the provider API as a PDF/document block. OpenClaw currently only has that direct path for Anthropic/Google.

For llama.cpp, OpenClaw should use the pdf tool’s fallback path:

1. Extract text from the PDF.
2. If needed, render pages to images.
3. Send the text/images to your local model.

That’s likely what “everywhere else” is doing too — not raw PDF ingestion by llama.cpp, but client-side PDF preprocessing.

So yes, you should be able to use a llama.cpp/OpenAI-compatible model through OpenClaw, but it needs to go through pdf, not read.

Check/fix:

/tools verbose

If pdf exists, prompt explicitly:

Use the pdf tool to analyze this PDF, not read.

If scanned/image PDFs fail, make sure your local model is marked vision-capable in config:

models: {
  providers: {
    llamacpp: {
      baseUrl: "http://127.0.0.1:8080/v1",
      api: "openai-completions",
      apiKey: "local",
      models: [
        {
          id: "your-model-id",
          input: ["text", "image"]
        }
      ]
    }
  }
},
agents: {
  defaults: {
    pdfModel: { primary: "llamacpp/your-model-id" }
  }
}

Then restart:

openclaw gateway restart

Docs: https://docs.openclaw.ai/tools/pdf and https://docs.openclaw.ai/concepts/model-providers#providers-via-modelsproviders-custombase-url
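The vision-capable requirement in the config above can be checked mechanically before restarting. This is a minimal sketch, assuming the config has already been loaded into a Python dict with the same shape as the snippet in this thread; `check_pdf_model` is a hypothetical helper, not part of OpenClaw, and it only inspects the keys shown here.

```python
from typing import Any


def check_pdf_model(config: dict[str, Any]) -> list[str]:
    """Return a list of problems with the pdfModel setup (empty if none found).

    Hypothetical helper: it only reads the keys shown in the thread
    (agents.defaults.pdfModel.primary and models.providers.<name>.models[].input).
    """
    primary = (
        config.get("agents", {})
        .get("defaults", {})
        .get("pdfModel", {})
        .get("primary")
    )
    if not primary:
        return ["agents.defaults.pdfModel.primary is not set"]

    # "llamacpp/your-model-id" -> provider "llamacpp", model "your-model-id"
    provider_name, _, model_id = primary.partition("/")
    provider = config.get("models", {}).get("providers", {}).get(provider_name)
    if provider is None:
        # Assumption: a provider without a custom entry is built in,
        # so there is nothing to verify in this config.
        return []

    for model in provider.get("models", []):
        if model.get("id") == model_id:
            if "image" not in model.get("input", []):
                return [f"{primary} does not list 'image' in its input"]
            return []
    return [f"model id '{model_id}' not found under provider '{provider_name}'"]
```

An empty list means the pdfModel entry points at a model that declares image input; any returned string names the key to fix before running the restart command.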

If it still chooses read after that, I’d call that a bug/tool-selection issue, not an inherent llama.cpp limitation. Include openclaw --version, /tools verbose, and the relevant config in a GitHub issue.

mild charm Apr 28, 2026, 9:46 PM

is there a way to make it parse the pdf as images then process those?

pine latch (BOT) Apr 28, 2026, 9:47 PM

Short answer: kind of, but there isn’t currently an “always render every PDF page as images” switch for the pdf tool.

Current behavior for non-native/local models is:

1. pdf extracts text first.
2. If the extracted text is under ~200 chars, it renders selected pages to PNG images.
3. It sends the extracted text/images to the configured pdfModel.

So for scanned PDFs it should already become images. For normal text PDFs, OpenClaw currently prefers extracted text.
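The text-first behavior described above can be sketched as a small decision function. This is an illustration of the logic as stated in this thread, not OpenClaw's actual code: the names `plan_pdf_payload` and `PdfPayload` are hypothetical, the 200-character threshold is the approximate figure quoted above, and stripping whitespace before counting and requiring an image-capable model are assumptions.

```python
from dataclasses import dataclass

TEXT_THRESHOLD_CHARS = 200  # approximate value quoted in the thread


@dataclass
class PdfPayload:
    text: str            # extracted text that would be sent to the model
    render_images: bool  # whether pages should also be rendered to PNG


def plan_pdf_payload(
    extracted_text: str,
    model_inputs: list[str],
    min_chars: int = TEXT_THRESHOLD_CHARS,
) -> PdfPayload:
    """Decide whether page images are needed in addition to extracted text."""
    text = extracted_text.strip()  # assumption: whitespace does not count
    too_little_text = len(text) < min_chars
    model_sees_images = "image" in model_inputs  # assumption: images need a vision model
    return PdfPayload(text=text, render_images=too_little_text and model_sees_images)
```

Under these assumptions, a scanned PDF that yields almost no text triggers image rendering, a normal text PDF does not, and a model declared as text-only never receives images. The last case is why the `input: ["text", "image"]` declaration matters for scanned documents.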

For llama.cpp/local vision, make sure the model is marked image-capable:

{
  models: {
    providers: {
      llamacpp: {
        baseUrl: "http://127.0.0.1:8080/v1",
        api: "openai-completions",
        apiKey: "local",
        models: [
          {
            id: "your-model-id",
            input: ["text", "image"]
          }
        ]
      }
    }
  },
  agents: {
    defaults: {
      pdfModel: { primary: "llamacpp/your-model-id" },
      pdfMaxPages: 20
    }
  }
}

Then restart:

openclaw gateway restart

Docs: https://docs.openclaw.ai/tools/pdf

If you specifically need visual/layout interpretation for text PDFs, workaround is to export/render the PDF pages to images first and send those images. But I agree that a pdf option like mode: "images" / forceImages: true would be useful — if that’s what you need, please file it as a feature request: https://github.com/openclaw/openclaw/issues 🦐
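The render-to-images workaround can be done outside OpenClaw with any PDF rasterizer. The sketch below assumes Poppler's `pdftoppm` command is installed and on PATH; the helper names, the 150 DPI default, and the output layout are illustrative choices, not anything OpenClaw prescribes.

```python
import glob
import subprocess
from pathlib import Path
from typing import Optional


def build_pdftoppm_command(
    pdf_path: str,
    out_prefix: str,
    dpi: int = 150,
    first_page: Optional[int] = None,
    last_page: Optional[int] = None,
) -> list[str]:
    """Build the pdftoppm argument list for rendering pages to PNG files."""
    cmd = ["pdftoppm", "-png", "-r", str(dpi)]
    if first_page is not None:
        cmd += ["-f", str(first_page)]
    if last_page is not None:
        cmd += ["-l", str(last_page)]
    # pdftoppm writes <out_prefix>-<page number>.png for each rendered page
    cmd += [pdf_path, out_prefix]
    return cmd


def render_pdf_to_pngs(pdf_path: str, out_dir: str, dpi: int = 150) -> list[Path]:
    """Render every page of pdf_path into out_dir and return the PNG paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    prefix = out / Path(pdf_path).stem
    subprocess.run(build_pdftoppm_command(pdf_path, str(prefix), dpi), check=True)
    # Escape the stem so characters like "[" in the filename are matched literally
    return sorted(out.glob(glob.escape(prefix.name) + "-*.png"))
```

Calling `render_pdf_to_pngs("/path/to/file.pdf", "/tmp/pages")` would leave one PNG per page in `/tmp/pages`, which can then be attached to the chat as ordinary images so the model sees the layout rather than the extracted text.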

mild charm Apr 28, 2026, 10:36 PM

it keeps calling the read tool. how do i block it from doing that on pdfs

pine latch (BOT) Apr 28, 2026, 10:37 PM

There’s a hard block, but not a “block read only for .pdf files” rule today.

Use the global/per-agent tool denylist (this blocks read for every file type, not only PDFs):

{
  tools: {
    deny: ["read"],
    alsoAllow: ["pdf"]
  }
}

or via CLI:
