#Multimodal models use the read tool on PDFs and get garbled mess instead of processing it directly
1 messages Ā· Page 1 of 1 (latest)
The fix is: donāt let it use read for PDFs ā read is a file-text reader, so binary PDFs can look garbled. OpenClaw has a dedicated pdf tool.
First, please check:
openclaw --version
Then in the chat/session run:
/tools verbose
If you donāt see pdf, configure a PDF model and restart:
openclaw config set agents.defaults.pdfModel.primary "anthropic/<your-pdf-capable-model>"
openclaw gateway restart
Native PDF support is currently via Anthropic or Google; other providers use extraction fallback. Docs: https://docs.openclaw.ai/tools/pdf
After restart, retry with explicit wording like:
Analyze this PDF using the
read:/path/to/file.pdf
If pdf shows in /tools verbose but the agent still chooses read, thatās probably a tool-selection bug/weakness for that model/session. Workaround is to explicitly say āuse the pdf toolā, or switch to a stronger tool-calling model. If itās reproducible on current OpenClaw, please file it with version + /tools verbose output + a short repro: https://github.com/openclaw/openclaw/issues š¦
what do you mean "Native PDF support is currently via Anthropic or Google"? i can't submit a pdf to a llama.cpp model through openclaw whereas it literally works everywhere else?
Yeah ā I phrased that too loosely.
By ānative PDFā I mean: OpenClaw sends the raw application/pdf bytes directly to the provider API as a PDF/document block. OpenClaw currently only has that direct path for Anthropic/Google.
For llama.cpp, OpenClaw should use the pdf toolās fallback path:
- extract text from the PDF
- if needed, render pages to images
- send text/images to your local model
Thatās likely what āeverywhere elseā is doing too ā not raw PDF ingestion by llama.cpp, but client-side PDF preprocessing.
So yes, you should be able to use a llama.cpp/OpenAI-compatible model through OpenClaw, but it needs to go through pdf, not read.
Check/fix:
/tools verbose
If pdf exists, prompt explicitly:
Use the
read.
If scanned/image PDFs fail, make sure your local model is marked vision-capable in config:
models: {
providers: {
llamacpp: {
baseUrl: "http://127.0.0.1:8080/v1",
api: "openai-completions",
apiKey: "local",
models: [
{
id: "your-model-id",
input: ["text", "image"]
}
]
}
}
},
agents: {
defaults: {
pdfModel: { primary: "llamacpp/your-model-id" }
}
}
Then restart:
openclaw gateway restart
Docs: https://docs.openclaw.ai/tools/pdf and https://docs.openclaw.ai/concepts/model-providers#providers-via-modelsproviders-custombase-url
If it still chooses read after that, Iād call that a bug/tool-selection issue, not an inherent llama.cpp limitation. Include openclaw --version, /tools verbose, and the relevant config in a GitHub issue.
is there a way to make it parse the pdf as images then process those?
Short answer: kind of, but there isnāt currently an āalways render every PDF page as imagesā switch for the pdf tool.
Current behavior for non-native/local models is:
pdfextracts text first.- If extracted text is under ~200 chars, it renders selected pages to PNG images.
- It sends the extracted text/images to the configured
pdfModel.
So for scanned PDFs it should already become images. For normal text PDFs, OpenClaw currently prefers extracted text.
For llama.cpp/local vision, make sure the model is marked image-capable:
{
models: {
providers: {
llamacpp: {
baseUrl: "http://127.0.0.1:8080/v1",
api: "openai-completions",
apiKey: "local",
models: [
{
id: "your-model-id",
input: ["text", "image"]
}
]
}
}
},
agents: {
defaults: {
pdfModel: { primary: "llamacpp/your-model-id" },
pdfMaxPages: 20
}
}
}
Then restart:
openclaw gateway restart
Docs: https://docs.openclaw.ai/tools/pdf
If you specifically need visual/layout interpretation for text PDFs, workaround is to export/render the PDF pages to images first and send those images. But I agree that a pdf option like mode: "images" / forceImages: true would be useful ā if thatās what you need, please file it as a feature request: https://github.com/openclaw/openclaw/issues š¦
it keeps calling the read tool. how do i block it from doing that on pdfs
Thereās a hard block, but not a āblock read only for .pdf filesā rule today.
Use the global/per-agent tool denylist:
{
tools: {
deny: ["read"],
alsoAllow: ["pdf"]
}
}
or via CLI: