I ran this by some people and was given something to try out to see if I got a response. My openclaw is on 192.168.1.101, my Ollama is on 192.168.1.102.
I can get to the openclaw browser UI and log in, but it is just dumb: no response. We know that Ollama is working, as we can test it from the command line and a browser, but we ran out of things to try to find where the issue is until I read this suggestion. It says that if you have a listener sitting on the Ollama server and hit it with something from openclaw, you should see the traffic. I am not seeing anything using sudo tcpdump -i lo port 11434 despite pestering the openclaw interface. We know that Ollama on 192.168.1.102 responds to requests, but it seems openclaw is not sending any. Does anyone have any idea where I should look for the problem?
#openclaw isn't communicating with Ollama it seems
Let me check the current docs/source for the Ollama bits so I don’t guess.
I found one likely gotcha already: tcpdump -i lo on the Ollama box would miss traffic coming from 192.168.1.101, because that arrives on the NIC, not loopback. I’m checking the rest so I can give you a solid checklist instead of vibes.
A couple things jump out.
tcpdump -i lo port 11434 is the wrong interface for this test.
If OpenClaw is on 192.168.1.101 and Ollama is on 192.168.1.102, that traffic hits the network NIC, not loopback.
On the Ollama box, use:
sudo tcpdump -ni any host 192.168.1.101 and port 11434
- Test from the OpenClaw host (192.168.1.101), not from your browser machine:
curl http://192.168.1.102:11434/api/tags
openclaw --version
openclaw status
openclaw models status
openclaw logs --follow
- Common Ollama gotcha: the base URL should be the native Ollama API, not /v1.
Good:
baseUrl: "http://192.168.1.102:11434",
api: "ollama"
Bad:
baseUrl: "http://192.168.1.102:11434/v1"
OpenClaw’s Ollama docs are here: https://docs.openclaw.ai/providers/ollama
Fast triage docs: https://docs.openclaw.ai/help/faq
Troubleshooting: https://docs.openclaw.ai/gateway/troubleshooting
- Best end-to-end smoke test:
openclaw infer model run --model ollama/<your-exact-model> --prompt "Reply with exactly: smoke-ok" --json
Run that while openclaw logs --follow and the tcpdump above are open.
If you paste these, we can usually pin it down pretty fast:
- openclaw --version
- openclaw models status
- result of curl http://192.168.1.102:11434/api/tags from 192.168.1.101
- your models.providers.ollama config, if you set one manually
If curl from .101 fails, this is probably bind/firewall on the Ollama host. If curl works but infer model run shows no packets, OpenClaw probably is not actually selecting the Ollama provider.
the response from the smoke test:
11:26:38 [agent/embedded] embedded run agent end: runId=f85051b5-17d6-40c4-a6b1-cda16fd8cdf3 isError=true model=qwen3.5:latest provider=ollama error=500 {"error":"llama runner process has terminated: %!w(\u003cnil\u003e)"} rawError=500 {"error":"llama runner process has terminated: %!w(\u003cnil\u003e)"}
11:27:01 [agent/embedded] embedded run agent end: runId=f85051b5-17d6-40c4-a6b1-cda16fd8cdf3 isError=true model=qwen3.5:latest provider=ollama error=LLM request failed: network connection error. rawError=fetch failed | other side closed
11:27:28 [agent/embedded] embedded run agent end: runId=f85051b5-17d6-40c4-a6b1-cda16fd8cdf3 isError=true model=qwen3.5:latest provider=ollama error=500 {"error":"llama runner process has terminated: %!w(\u003cnil\u003e)"} rawError=500 {"error":"llama runner process has terminated: %!w(\u003cnil\u003e)"}
11:27:59 [agent/embedded] embedded run agent end: runId=f85051b5-17d6-40c4-a6b1-cda16fd8cdf3 isError=true model=qwen3.5:latest provider=ollama error=LLM request failed: network connection error. rawError=fetch failed | other side closed
11:27:59 [agent/embedded] embedded run failover decision: runId=f85051b5-17d6-40c4-a6b1-cda16fd8cdf3 stage=assistant decision=surface_error reason=timeout provider=ollama/qwen3.5:latest profile=sha256:9c018ec112cf
{
"ok": true,
"capability": "model.run",
"transport": "local",
"provider": "ollama",
"model": "qwen3.5:latest",
"attempts": [],
"outputs": [
{
"text": "LLM request failed: network connection error.",
"mediaUrl": null
}
]
}
This is useful, and honestly a relief: it means OpenClaw is reaching Ollama. I’m checking one config detail before I tell you the next thing to try.
Yep, this changes the diagnosis.
OpenClaw is reaching Ollama. The failing piece is Ollama itself:
- 500 {"error":"llama runner process has terminated..."} is an Ollama-side error
- fetch failed | other side closed is what OpenClaw sees when the Ollama runner dies mid-request
So I’d stop chasing OpenClaw networking now and debug the Ollama runner/model.
Try these next:
- On 192.168.1.102, watch Ollama logs live while you run the smoke test
  - if it’s a service: journalctl -u ollama -f
  - if not, run ollama serve in the foreground
- From the OpenClaw host (192.168.1.101), hit Ollama directly with the same model:
curl http://192.168.1.102:11434/api/chat \
-H 'content-type: application/json' \
-d '{"model":"qwen3.5:latest","messages":[{"role":"user","content":"Reply with exactly: smoke-ok"}],"stream":false}'
If that also dies, it’s definitely not OpenClaw.
- Double-check the OpenClaw Ollama URL is the native one, not /v1:
baseUrl: "http://192.168.1.102:11434",
api: "ollama"
- Paste:
openclaw --version
ollama --version
My guess is one of these:
- Ollama runner crash / model bug
- RAM/VRAM pressure
- this specific model is unstable under agent-style prompts
If the direct /api/chat works but OpenClaw still knocks it over, then the model is probably choking on the larger agent prompt/tool payload. In that case I’d test with a different model that you already know is stable in Ollama.
the base url is correct
Nice, then I’d stop suspecting the URL.
At this point the strongest signal is still: Ollama runner crash, not OpenClaw misrouting.
Why I’m pretty confident:
- OpenClaw already got a real Ollama response: 500 {"error":"llama runner process has terminated..."}
- that only happens if the request made it to Ollama
- the later other side closed fits Ollama dying/resetting mid-request
So the next useful data is:
curl http://192.168.1.102:11434/api/chat \
-H 'content-type: application/json' \
-d '{"model":"qwen3.5:latest","messages":[{"role":"user","content":"Reply with exactly: smoke-ok"}],"stream":false}'
Run that from 192.168.1.101 and, on the Ollama box, watch logs at the same time.
Also please paste:
openclaw --version
ollama --version
If the direct /api/chat also blows up, we’ve basically proven:
- Ollama/model bug, or
- RAM/VRAM pressure, or
- that model is unstable under this prompt shape
If you can grab the Ollama-side log from the exact moment of the crash, that’s probably the smoking gun. Docs here too: https://docs.openclaw.ai/providers/ollama
OpenClaw 2026.4.12 (1c0672b)
ollama version is 0.20.4
Thanks, that helps. I’m doing one quick search for that exact Ollama runner error so I don’t send you on a wild goose chase.
Those versions don’t scream “OpenClaw bug” to me.
I also found multiple Ollama issues for this exact error string, llama runner process has terminated, so this really does look like Ollama/model/backend rather than OpenClaw.
At this point I’d do 3 quick checks:
- Direct native Ollama API test from 192.168.1.101
curl http://192.168.1.102:11434/api/chat \
-H 'content-type: application/json' \
-d '{"model":"qwen3.5:latest","messages":[{"role":"user","content":"Reply with exactly: smoke-ok"}],"stream":false}'
- Watch Ollama logs on 192.168.1.102 at the same time
  If it logs OOM, GPU backend failure, runner exit, or model load failure, that’s the culprit.
- Try one other smaller model you already have locally
  Just as a sanity check. If that works in OpenClaw and qwen3.5:latest does not, then the problem is basically narrowed to the Ollama runner + that model combination.
So the short version is: OpenClaw already proved it can reach Ollama, we’re past the tcpdump/routing stage now.
If you paste:
- the result of that direct /api/chat call
- the matching Ollama log lines
…I think we can pin it down pretty fast.
from previous test on ollama logs Enabled KvSize:4096 KvCacheType: NumThreads:2 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Apr 18 11:31:23 ollama ollama[136389]: time=2026-04-18T11:31:23.284Z level=INFO source=ggml.go:136 msg="" architecture=qwen35 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=52
Apr 18 11:31:23 ollama ollama[136389]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
Apr 18 11:31:23 ollama ollama[136389]: time=2026-04-18T11:31:23.291Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Apr 18 11:31:23 ollama ollama[136389]: time=2026-04-18T11:31:23.679Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:2 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Apr 18 11:31:24 ollama ollama[136389]: time=2026-04-18T11:31:24.697Z level=INFO source=runner.go:1290 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:2 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Apr 18 11:31:24 ollama ollama[136389]: time=2026-04-18T11:31:24.697Z level=INFO source=device.go:245 msg="model weights" device=CPU size="6.1 GiB"
Apr 18 11:31:24 ollama ollama[136389]: time=2026-04-18T11:31:24.697Z level=INFO source=device.go:256 msg="kv cache" device=CPU size="1.4 GiB"
Apr 18 11:31:24 ollama ollama[136389]: time=2026-04-18T11:31:24.697Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="433.7 MiB"
Apr 18 11:31:24 ollama ollama[136389]: time=2026-04-18T11:31:24.697Z level=INFO source=device.go:272 msg="total memory" size="7.9 GiB"
Apr 18 11:31:24 ollama ollama[136389]: time=2026-04
That log is actually pretty telling.
The big clue is here:
- CPU backend
- total memory = 7.9 GiB
So Ollama is trying to run that model fully on system RAM, and it wants almost 8 GiB just for the model + KV cache + graph. If that machine is an 8 GB box, or has other stuff running, that is very likely the problem.
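To make the arithmetic behind that total explicit, here is a small sketch that adds up the three allocations from the log. The function name total_gib is just an illustration, not anything Ollama or OpenClaw ships; the numbers are copied from the log lines above.

```shell
#!/usr/bin/env bash
# Sum the three allocations Ollama logged and print the total in GiB.
# Arguments: model weights (GiB), KV cache (GiB), compute graph (MiB).
# total_gib is a hypothetical helper name used only for this example.
total_gib() {
  awk -v w="$1" -v kv="$2" -v g="$3" \
    'BEGIN { printf "%.1f\n", w + kv + g / 1024 }'
}

# Values from the log: 6.1 GiB weights, 1.4 GiB KV cache, 433.7 MiB graph.
total_gib 6.1 1.4 433.7   # prints 7.9
```

That matches the logged total memory of 7.9 GiB, and it is only the planned allocation at load time, so the real footprint under a long prompt can end up higher.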
That would also explain the pattern:
- tiny/manual tests may seem okay
- OpenClaw sends a bigger real prompt
- Ollama runner falls over
- OpenClaw only sees other side closed / llama runner process has terminated
I’d check this next on 192.168.1.102:
free -h
dmesg -T | tail -50
# or
journalctl -k -n 50
You’re looking for OOM killer / out-of-memory / killed process messages.
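If the kernel log is noisy, a filter along these lines can narrow it down. This is a rough sketch: oom_lines is a made-up helper name, and the patterns are the usual kernel OOM phrases rather than an exhaustive list.

```shell
#!/usr/bin/env bash
# Print only the kernel log lines that look like OOM-killer activity.
# Reads stdin, so it works with either dmesg or journalctl -k output.
# oom_lines is a hypothetical helper name used only for this example.
oom_lines() {
  grep -iE 'out of memory|oom-kill|killed process'
}

# Typical use on the Ollama host (may need sudo):
#   dmesg -T | oom_lines
#   journalctl -k --no-pager | oom_lines
```

No output means the filter found nothing in the window you searched, which is not quite the same as proof that no OOM kill happened, so widen the time range before ruling it out.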
Fastest sanity check:
- try OpenClaw with a smaller model you already have locally
- if that works, then this is basically confirmed as a RAM/capacity issue, not an OpenClaw routing issue
If you can paste the next 20-30 log lines after total memory size="7.9 GiB", especially anything with killed, oom, or runner exited, we can probably confirm it cleanly.
it is a 16gb machine
That’s actually useful. I’m checking whether qwen3.5 + Ollama CPU backend has a known crash pattern, because 16 GB makes plain RAM exhaustion less convincing.
Ah, that helps.
With 16 GB RAM, the 7.9 GiB load is no longer a smoking gun by itself. It could still be memory pressure if the box is busy, but it makes me lean much more toward:
- Ollama runner bug
- something specific to the qwen3.5 model family in Ollama
- or a bad interaction that only shows up on the bigger OpenClaw prompt
I did a quick check and there are multiple Ollama issues around qwen35 architecture and llama runner process has terminated style crashes, so this does not look unique to OpenClaw.
At this point, my fastest discriminator would be:
- Try one different Ollama model you already know works locally
  - if OpenClaw works with that, the problem is almost certainly this qwen3.5:latest + Ollama combo
- Avoid :latest if possible
  - if ollama list shows a specific tag/variant you know runs cleanly, test that exact one instead of latest
- Still grab the exact crash tail
journalctl -u ollama -n 100 --no-pager
dmesg -T | tail -50
So my current bet is:
OpenClaw is fine, Ollama is reachable, and qwen3.5:latest is the unstable part here.
If you test another Ollama model and it works, I’d call that basically confirmed.
Apr 18 11:42:07 ollama kernel: Out of memory: Killed process 137135 (ollama) total-vm:25494504kB, anon-rss:15465488kB, file-rss:2772kB, shmem-rss:0kB, UID:1000 pgtables:38412kB oom_score>
lines 1-50/50 (END)
There it is, that’s the smoking gun.
Out of memory: Killed process ... (ollama)
So the root cause is:
- OpenClaw is talking to Ollama
- Ollama is loading/running the model
- the kernel OOM killer is killing Ollama mid-request
- OpenClaw then surfaces other side closed / llama runner process has terminated
Even on a 16 GB box, the real working set apparently grew to roughly 14.7 GiB of resident memory (anon-rss:15465488kB), which is enough to get it killed.
So this is not an OpenClaw routing bug. It’s a capacity issue on the Ollama side.
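For reference, here is a quick way to turn the anon-rss figure in that kernel line into GiB. It is a sketch that assumes the line format shown above (the kernel reports these values in kB, meaning units of 1024 bytes); anon_rss_gib is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Extract anon-rss (reported in kB) from OOM-killer lines on stdin
# and print it in GiB, one value per matching line.
# anon_rss_gib is a hypothetical helper name used only for this example.
anon_rss_gib() {
  sed -n 's/.*anon-rss:\([0-9][0-9]*\)kB.*/\1/p' |
    awk '{ printf "%.1f\n", $1 / 1024 / 1024 }'
}

# The line pasted above, shortened to the relevant fields.
echo 'Out of memory: Killed process 137135 (ollama) total-vm:25494504kB, anon-rss:15465488kB, file-rss:2772kB' |
  anon_rss_gib   # prints 14.7
```

On a 16 GB machine that leaves very little for the OS and anything else running, which fits the kill.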
Best fixes:
- Use a smaller / lower-memory Ollama model
- Reduce effective context / prompt pressure
- Give the Ollama host more RAM or swap as a stopgap
- Make sure that box is not doing other heavy work at the same time
That also explains why simple manual tests can look okay, but OpenClaw fails: agent prompts are just bigger and push the model over the edge.
Docs: https://docs.openclaw.ai/providers/ollama
Glad we caught the real issue. If you switch to a smaller model and it starts replying, that should confirm it end to end.
suggest correct model for 16gb
I wouldn’t try to name exact models here, because that gets stale fast.
For 16 GB RAM, the safe guidance is:
- stay with materially smaller Ollama models than the one that OOM’d
- think single-digit-B class, quantized, not large :latest variants
- expect local to be okay for basic chat/testing
- for real agentic OpenClaw use, I’d honestly recommend cloud models instead, because 16 GB is pretty tight for reliable tool-heavy runs
So my practical recommendation is:
- try a much smaller local model first
- verify it with:
  openclaw infer model run --model ollama/<model> --prompt "Reply with exactly: smoke-ok" --json
- for exact current community picks, ask in #models because that advice changes fast
Docs:
- https://docs.openclaw.ai/concepts/models
- https://docs.openclaw.ai/concepts/models#model-discussions-community-notes
If this thread is now basically solved, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow can index it for the next person.
cloud models cost tokens and we burn through those too quickly with mistakes
Yep, that’s fair. Burning cloud tokens while debugging is annoying as hell.
If you want to stay cost-conscious, I’d do this:
- Use local as the primary model
  - but make it a smaller local model that actually survives on your box
- Test with infer model run first
  - much cheaper than learning via failed full chat turns
  openclaw infer model run --model ollama/<model> --prompt "Reply with exactly: smoke-ok" --json
- Keep sessions fresh while debugging
  - long chat history = bigger prompts = more local RAM pressure too
- Only use cloud as fallback, if at all
  - local for experimentation
  - cloud only when you really need a reliable answer
So I think your best move is still:
- pick a smaller local model
- verify it with infer model run
- then try the UI again
For exact current budget-friendly picks, I’d still ask in #models since that changes fast:
- https://docs.openclaw.ai/concepts/models
- https://docs.openclaw.ai/concepts/models#model-discussions-community-notes
If this fixed the root cause, please right-click the solution message → Apps → ✅ Mark Solution so it gets indexed on Answer Overflow.
using llama3:8b which should be ok for the machine, i get this when testing
12:21:47 [agent/embedded] low context window: ollama/llama3:8b ctx=8192 (warn<32000) source=modelsConfig
12:21:47 [agent/embedded] blocked model (context window too small): ollama/llama3:8b ctx=8192 (min=16000) source=modelsConfig
12:21:47 [diagnostic] lane task error: lane=main durationMs=9440 error="FailoverError: Model context window too small (8192 tokens). Minimum is 16000."
12:21:47 [diagnostic] lane task error: lane=session:agent:main:main durationMs=9442 error="FailoverError: Model context window too small (8192 tokens). Minimum is 16000."
12:21:47 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=ollama/llama3:8b candidate=ollama/llama3:8b reason=unknown next=none detail=Model context window too small (8192 tokens). Minimum is 16000.
FailoverError: Model context window too small (8192 tokens). Minimum is 16000.
Your message looks cut off 😅
llama3:8b should be a much safer test on 16 GB, so if that is still failing, the exact error text matters a lot.
Please paste these 2 things together:
- the full OpenClaw test output
- the matching Ollama log tail from the same moment
For example:
openclaw infer model run --model ollama/llama3:8b --prompt "Reply with exactly: smoke-ok" --json
journalctl -u ollama -n 80 --no-pager
If you also still see kernel kills, add:
dmesg -T | tail -30
If llama3:8b is failing too, then we may be looking at:
- another OOM condition from other processes
- a broken Ollama install/runtime
- or a different config issue than the original qwen3.5 one
Paste the actual error and we can narrow it down.
Nice, this is a different failure, and it’s much cleaner.
llama3:8b is not crashing here. OpenClaw is rejecting it up front because it sees:
- ctx=8192
- minimum required for the agent run is 16000
So the problem is now:
model context window too small, not networking, and not the earlier Ollama OOM crash.
The most likely reason is your OpenClaw model metadata/config says this model has only 8192 context.
Since you’re using a remote Ollama host, you probably have an explicit models.providers.ollama config. In that mode, OpenClaw trusts your configured model metadata.
So I’d check these two things:
openclaw config get models.providers.ollama
ollama show llama3:8b
What I’m looking for:
- if OpenClaw config has contextWindow: 8192, but ollama show says the model really supports more, then fix the OpenClaw model entry
- if the model really is only 8192, then OpenClaw is correctly blocking it for agent use, and you need a model with >= 16000 context
So right now you’ve actually uncovered two separate issues:
- qwen3.5 → OOM on the Ollama host
- llama3:8b → context window too small for OpenClaw agent runs
If you paste openclaw config get models.providers.ollama, I can help spot whether the context metadata is the thing that needs fixing.
what is the log command for ollama
If Ollama is running as a Linux service, use:
journalctl -u ollama -f
Recent lines without follow:
journalctl -u ollama -n 100 --no-pager
If it’s running in Docker:
docker logs -f <ollama-container-name>
If you started it manually with ollama serve, the logs are just in that terminal.
For your case, I’d use:
journalctl -u ollama -n 100 --no-pager
and paste the chunk around the failure.
12:27:04 [agent/embedded] low context window: ollama/llama3:8b ctx=8192 (warn<32000) source=modelsConfig
12:27:04 [agent/embedded] blocked model (context window too small): ollama/llama3:8b ctx=8192 (min=16000) source=modelsConfig
12:27:04 [diagnostic] lane task error: lane=main durationMs=9473 error="FailoverError: Model context window too small (8192 tokens). Minimum is 16000."
12:27:04 [diagnostic] lane task error: lane=session:agent:main:main durationMs=9475 error="FailoverError: Model context window too small (8192 tokens). Minimum is 16000."
12:27:04 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=ollama/llama3:8b candidate=ollama/llama3:8b reason=unknown next=none detail=Model context window too small (8192 tokens). Minimum is 16000.
FailoverError: Model context window too small (8192 tokens). Minimum is 16000.
but nothing moved on the ollama side
That’s still the OpenClaw log, not the Ollama log.
And the important bit is this:
- source=modelsConfig
- blocked model (context window too small)
So for llama3:8b, OpenClaw is stopping before it even sends anything to Ollama. That means Ollama logs probably won’t show anything for this test.
The next command I want is:
openclaw config get models.providers.ollama
Because right now it looks like your OpenClaw model config says llama3:8b has contextWindow: 8192.
If you want the actual Ollama service log anyway, it’s:
journalctl -u ollama -n 100 --no-pager
But for this specific llama3:8b error, the real thing to fix is the OpenClaw model metadata/config, not Ollama itself.
Paste openclaw config get models.providers.ollama and I can help spot the bad entry.
Yep, and that’s actually expected for this one.
For llama3:8b, OpenClaw is blocking it locally because of this line:
- source=modelsConfig
- context window too small
So nothing should hit Ollama for that test. No traffic, no Ollama log movement.
That means the next step is not on the Ollama box, it’s on the OpenClaw config side:
openclaw config get models.providers.ollama
If that config says llama3:8b has contextWindow: 8192, OpenClaw will refuse to use it for agent runs before making any request.
So right now:
- qwen3.5 = did reach Ollama, then OOM’d
- llama3:8b = never reached Ollama, blocked by OpenClaw model metadata
Paste openclaw config get models.providers.ollama and we can fix the second one.
{
"baseUrl": "http://192.168.1.102:11434",
"apiKey": "OPENCLAW_REDACTED",
"api": "ollama",
"models": [
{
"id": "qwen2.5:0.5b",
"name": "qwen2.5:0.5b",
"reasoning": false,
"input": [
"text"
],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"contextWindow": 128000,
"maxTokens": 8192,
"api": "ollama"
},
{
"id": "qwen2.5-coder:32b",
"name": "qwen2.5-coder:32b",
"reasoning": false,
"input": [
"text"
],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"contextWindow": 32768,
"maxTokens": 8192,
"api": "ollama"
},
{
"id": "qwen3.5:latest",
"name": "qwen3.5:latest",
"reasoning": false,
"input": [
"text",
"image"
],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"contextWindow": 262144,
"maxTokens": 8192,
"api": "ollama"
},
{
"id": "qwen2.5:7b",
"name": "qwen2.5:7b",
"reasoning": false,
"input": [
"text"
],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"contextWindow": 32768,
"maxTokens": 8192,
"api": "ollama"
},
{
"id": "llama3:8b",
"name": "llama3:8b",
"reasoning": false,
"input": [
"text"
],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"contextWindow": 8192,
"maxTokens": 8192,
"api": "ollama"
}
]
}
Perfect, this explains it exactly.
Your config literally says:
{
"id": "llama3:8b",
"contextWindow": 8192
}
So OpenClaw is doing the right thing when it says:
- ctx=8192
- minimum is 16000
That means for llama3:8b:
- nothing will move on the Ollama side
- OpenClaw blocks it before making a request
So the fix is one of these:
- If llama3:8b really supports more than 8192, update that contextWindow value in your OpenClaw config to the real number reported by Ollama:
  ollama show llama3:8b
- If 8192 is correct, then that model just isn’t eligible for OpenClaw agent runs.
From the entries you already posted, the one I’d test next is the qwen2.5:7b entry, because:
- it’s much smaller than the model that OOM’d
- your config says it has 32768 context, so it clears the 16000 minimum
So you’ve now got the full picture:
- qwen3.5:latest → reaches Ollama, then OOMs
- llama3:8b → never reaches Ollama, blocked by contextWindow: 8192
- qwen2.5:7b → most promising next test from the models already in your config
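If you want to see at a glance which configured entries clear the 16000 minimum, something like this sketch works on the pretty-printed JSON you pasted. It assumes one key per line with "id" appearing before "contextWindow" in each entry, as in your output; eligible_models is a made-up helper name, and a real JSON tool would be more robust than awk here.

```shell
#!/usr/bin/env bash
# List model ids whose contextWindow meets a minimum, reading the
# pretty-printed provider config on stdin.
# Assumes one key per line and "id" before "contextWindow" per entry.
# eligible_models is a hypothetical helper name used only for this example.
eligible_models() {
  awk -v min="$1" '
    /"id":/            { gsub(/[",]/, "", $2); id = $2 }
    /"contextWindow":/ { gsub(/,/, "", $2); if ($2 + 0 >= min) print id, $2 }
  '
}

# Two entries from the config above, reduced to the relevant keys.
printf '%s\n' \
  '"id": "llama3:8b",'  '"contextWindow": 8192,' \
  '"id": "qwen2.5:7b",' '"contextWindow": 32768,' |
  eligible_models 16000   # prints: qwen2.5:7b 32768
```

Against the real config this would be fed from openclaw config get models.providers.ollama. It only checks the configured context window, so a model that passes can still OOM the way qwen3.5:latest did.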
Docs: https://docs.openclaw.ai/providers/ollama
If qwen2.5:7b works, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow picks it up.
Model
architecture llama
parameters 8.0B
context length 8192
embedding length 4096
quantization Q4_0
Capabilities
completion
Parameters
stop "<|start_header_id|>"
stop "<|end_header_id|>"
stop "<|eot_id|>"
num_keep 24
License
META LLAMA 3 COMMUNITY LICENSE AGREEMENT
Meta Llama 3 Version Release Date: April 18, 2024
Yep, that settles it.
ollama show confirms llama3:8b really is:
- context length: 8192
So this is not a bad OpenClaw config entry. That model genuinely only exposes 8k context, and OpenClaw is correctly rejecting it for agent runs that require 16k minimum.
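The same check can be scripted for any other model before you bother adding it to the OpenClaw config. This is a sketch that assumes ollama show prints a "context length" line like the one you pasted; ctx_at_least is a hypothetical helper name, and 16000 is the minimum taken from the OpenClaw error above.

```shell
#!/usr/bin/env bash
# Read `ollama show <model>` output on stdin and succeed (exit 0) only
# if the reported context length is at least the given minimum.
# ctx_at_least is a hypothetical helper name used only for this example.
ctx_at_least() {
  awk -v min="$1" '
    $1 == "context" && $2 == "length" { ctx = $3 }
    END { exit !(ctx + 0 >= min) }
  '
}

# The line from the llama3:8b output above.
printf '    context length      8192\n' |
  ctx_at_least 16000 || echo "below minimum"   # prints: below minimum
```

On the Ollama host the real call would look like ollama show <model> | ctx_at_least 16000, which fails for anything that tops out at 8k the way llama3:8b does.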
So we can close the book on llama3:8b:
- it is not suitable for this OpenClaw agent use
- nothing is wrong with routing for that test
- nothing is wrong with the contextWindow metadata for that model
At this point, from the models you already posted, the most sensible next test is still qwen2.5:7b because:
- it advertises 32768 context in your config
- it’s much smaller than the model that OOM’d
So your picture is now pretty clean:
- qwen3.5:latest → enough context, but OOMs
- llama3:8b → fits memory better, but only 8k context so OpenClaw blocks it
- qwen2.5:7b → best remaining candidate from your current list
If you test that one and it replies, I’d call this solved end to end.
we used qwen2.5:7b previously and that had different issues like no thinking
Yeah, that’s the tradeoff, unfortunately.
You’ve basically found the boundary of this box:
- the bigger model has enough context, but OOMs
- the smaller model fits, but feels too dumb / no thinking
So the honest options are:
- Local-only on 16 GB
  - accept a smaller model
  - okay for light chat/testing
  - not great for strong agentic/reasoning behavior
- Hybrid, my recommendation
  - use local for cheap experimentation
  - only switch to cloud when you actually need better reasoning