#Multimodal models use the read tool on PDFs and get garbled mess instead of processing it directly

1 messages Ā· Page 1 of 1 (latest)

mild charm
#

How can I fix this issue?

remote ironBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> āœ… Mark Solution

pine latchBOT
#

The fix is: don’t let it use read for PDFs — read is a file-text reader, so binary PDFs can look garbled. OpenClaw has a dedicated pdf tool.

First, please check:

openclaw --version

Then in the chat/session run:

/tools verbose

If you don’t see pdf, configure a PDF model and restart:

#
openclaw config set agents.defaults.pdfModel.primary "anthropic/<your-pdf-capable-model>"
openclaw gateway restart

Native PDF support is currently via Anthropic or Google; other providers use extraction fallback. Docs: https://docs.openclaw.ai/tools/pdf

After restart, retry with explicit wording like:

Analyze this PDF using the pdf tool, not read: /path/to/file.pdf

If pdf shows in /tools verbose but the agent still chooses read, that’s probably a tool-selection bug/weakness for that model/session. Workaround is to explicitly say ā€œuse the pdf toolā€, or switch to a stronger tool-calling model. If it’s reproducible on current OpenClaw, please file it with version + /tools verbose output + a short repro: https://github.com/openclaw/openclaw/issues 🦐

mild charm
#

what do you mean "Native PDF support is currently via Anthropic or Google"? i can't submit a pdf to a llama.cpp model through openclaw whereas it literally works everywhere else?

pine latchBOT
#

Yeah — I phrased that too loosely.

By ā€œnative PDFā€ I mean: OpenClaw sends the raw application/pdf bytes directly to the provider API as a PDF/document block. OpenClaw currently only has that direct path for Anthropic/Google.

For llama.cpp, OpenClaw should use the pdf tool’s fallback path:

  1. extract text from the PDF
  2. if needed, render pages to images
  3. send text/images to your local model

That’s likely what ā€œeverywhere elseā€ is doing too — not raw PDF ingestion by llama.cpp, but client-side PDF preprocessing.

So yes, you should be able to use a llama.cpp/OpenAI-compatible model through OpenClaw, but it needs to go through pdf, not read.

Check/fix:

#
/tools verbose

If pdf exists, prompt explicitly:

Use the pdf tool to analyze this PDF, not read.

If scanned/image PDFs fail, make sure your local model is marked vision-capable in config:

models: {
  providers: {
    llamacpp: {
      baseUrl: "http://127.0.0.1:8080/v1",
      api: "openai-completions",
#
      apiKey: "local",
      models: [
        {
          id: "your-model-id",
          input: ["text", "image"]
        }
      ]
    }
  }
},
agents: {
  defaults: {
    pdfModel: { primary: "llamacpp/your-model-id" }
  }
}
mild charm
#

is there a way to make it parse the pdf as images then process those?

pine latchBOT
#

Short answer: kind of, but there isn’t currently an ā€œalways render every PDF page as imagesā€ switch for the pdf tool.

Current behavior for non-native/local models is:

  1. pdf extracts text first.
  2. If extracted text is under ~200 chars, it renders selected pages to PNG images.
  3. It sends the extracted text/images to the configured pdfModel.

So for scanned PDFs it should already become images. For normal text PDFs, OpenClaw currently prefers extracted text.

For llama.cpp/local vision, make sure the model is marked image-capable:

{
  models: {
    providers: {
#
      llamacpp: {
        baseUrl: "http://127.0.0.1:8080/v1",
        api: "openai-completions",
        apiKey: "local",
        models: [
          {
            id: "your-model-id",
            input: ["text", "image"]
          }
        ]
      }
    }
  },
  agents: {
    defaults: {
#
      pdfModel: { primary: "llamacpp/your-model-id" },
      pdfMaxPages: 20
    }
  }
}

Then restart:

openclaw gateway restart

Docs: https://docs.openclaw.ai/tools/pdf

If you specifically need visual/layout interpretation for text PDFs, workaround is to export/render the PDF pages to images first and send those images. But I agree that a pdf option like mode: "images" / forceImages: true would be useful — if that’s what you need, please file it as a feature request: https://github.com/openclaw/openclaw/issues 🦐

mild charm
#

it keeps calling the read tool. how do i block it from doing that on pdfs

pine latchBOT
#

There’s a hard block, but not a ā€œblock read only for .pdf filesā€ rule today.

Use the global/per-agent tool denylist:

{
  tools: {
    deny: ["read"],
    alsoAllow: ["pdf"]
  }
}

or via CLI: