#⚠️ Context limit exceeded. I've reset our conversation to start fresh - please try again.
You’re not crazy — this is almost certainly a config mismatch, and your reserveTokensFloor experiments probably made it worse.
reserveTokensFloor is reserved headroom, not “more context.”
So on a 16k model:
- reserveTokensFloor: 100000 = totally impossible
- even 20000 can be too high for 16k contexts
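To make the headroom point concrete, here is a rough sketch of the arithmetic. It assumes a simplified model where usable room is just the context window minus the reserve; `usable_tokens` is a made-up helper name, not an OpenClaw function, and the real compaction logic may differ.

```python
# Simplified model (assumption): usable room = context window - reserved tokens.
def usable_tokens(context_window: int, reserve: int) -> int:
    """Tokens left for prompt + history once the reserve is set aside."""
    return context_window - reserve


CONTEXT_WINDOW = 16_384  # a "16k" model

for reserve in (100_000, 20_000, 2_000):
    # A negative result means the reserve alone is bigger than the whole window.
    print(f"reserve={reserve:>7}  usable={usable_tokens(CONTEXT_WINDOW, reserve)}")
```

Under that model, 100000 and 20000 both come out negative on a 16k window, while 2000 leaves 14384 tokens to work with.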
Do this first
openclaw --version
openclaw config set agents.defaults.compaction.reserveTokensFloor 0
openclaw config set agents.defaults.compaction.reserveTokens 2000
openclaw config set agents.defaults.compaction.keepRecentTokens 4000
openclaw gateway restart
Then in each chat surface (TUI/web/Discord/Telegram), send:
/new
Big second issue in your setup
If your model has contextWindow: 16000 and maxTokens: 8192, that leaves very little room for prompt + history + tools.
Set maxTokens much lower (like ~1024–2048) for 16k contexts.
Also verify Ollama mode
If you accidentally use OpenAI-compatible mode (/v1) it can cause weird context behavior unless perfectly configured.
Check:
openclaw config get models.providers.ollama.baseUrl
openclaw config get models.providers.ollama.api
For stable tool behavior, prefer:
- base URL without /v1
- api: "ollama"
If it still loops after this, paste these and I’ll pinpoint it fast:
openclaw --version
openclaw status --all
openclaw models status --json
openclaw config get agents.defaults.compaction
Docs: https://docs.openclaw.ai/concepts/compaction • https://docs.openclaw.ai/reference/session-management-compaction • https://docs.openclaw.ai/providers/ollama
ok can you explain to me what max tokens even do/are ?
Yep — super important one:
maxTokens = the maximum number of tokens the model is allowed to generate in its reply.
It’s an output cap, not extra memory.
Think of it like:
- Context window = total budget (input + output)
- Input = system prompt + chat history + your message + tool text
- Output = assistant reply (capped by maxTokens)
So roughly:
input tokens + output tokens <= context window
For your case (16k context), if maxTokens is 8192, you’re letting output take up half the entire window.
That leaves much less room for history/tools, which makes overflow way easier.
Practical rule for 16k models
- Start with maxTokens around 1024–2048
- Only raise it if you truly need long replies
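If it helps, the budget rule above can be sketched as a quick check. This assumes the full maxTokens is set aside for the reply up front; the `fits` helper and the per-part token counts are hypothetical numbers for illustration, not measurements from your setup.

```python
def fits(context_window: int, max_tokens: int, input_parts: dict[str, int]) -> bool:
    """True if all input tokens plus the output cap fit inside the window."""
    return sum(input_parts.values()) + max_tokens <= context_window


# Hypothetical token counts, for illustration only (they total 10,700).
parts = {"system_prompt": 6_000, "history": 3_000, "message": 200, "tools": 1_500}

print(fits(16_384, 8_192, parts))  # 10700 + 8192 = 18892 > 16384  -> False
print(fits(16_384, 2_048, parts))  # 10700 + 2048 = 12748 <= 16384 -> True
```

Same input, same window: the only thing that changed between overflow and no overflow is the output cap.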
Also: that error text saying “set reserveTokensFloor to 20000+” is a generic hint — for small-context local models, that can actually be the wrong move.
ok well it's still giving me this: ⚠️ Context limit exceeded. I've reset our conversation to start fresh - please try again.
To prevent this, increase your compaction buffer by setting agents.defaults.compaction.reserveTokensFloor to 20000 or higher in your config.
so here's my ollama serve output, at least the ending of it:
time=2026-04-08T04:37:26.864-04:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\Users\Sour\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 56108"
time=2026-04-08T04:37:29.452-04:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1032 name=ROCm0 description="AMD Radeon RX 6600 XT" libdirs=ollama,rocm driver=60450.10 pci_id=0000:09:00.0 type=discrete total="8.0 GiB" available="6.4 GiB"
time=2026-04-08T04:37:29.453-04:00 level=INFO source=routes.go:1832 msg="vram-based default context" total_vram="8.0 GiB" default_num_ctx=4096
⚠️ Context limit exceeded. I've reset our conversation to start fresh - please try again.
@pale vine
alright, I'm in one hell of a pickle because my setup is a little unusual. I'm running a 6600 XT, which Ollama doesn't support, so I did the AMD-for-Ollama setup with the SDK/ROCm driver stuff. I then had a WSL2 Ubuntu instance running my OpenClaw, and the two were linked to each other over localhost, running qwen3.5:4b on my 6600 XT through ollama serve. It was good enough, it worked, and this all took me like 3 days to set up. BUT right now I keep getting hit with the issue, even after swapping my model to qwen3.5:2b and after using different versions of this command for "different amounts?" (openclaw config set agents.defaults.compaction.reserveTokensFloor 100000) (I would use like 8000 or 16000 etc). NOW IT'S AT 2048. I have a custom context amount set for my qwen3.5:4b: it's set to 16384 context + 1024 of "max tokens"? This ran fine, but now I have THE ISSUE:
⚠️ Context limit exceeded. I've reset our conversation to start fresh - please try again.
To prevent this, increase your compaction buffer by setting agents.defaults.compaction.reserveTokensFloor to 20000 or higher in your config.
WHAT DO I DO ABOUT THIS, AS I'VE ALREADY CONFIGURED IT?
@plain sentinel unfortunately the openclaw sys prompt is quite chunky and will be gobbling up your 16k ctx window pretty quickly. You should also consider testing with a llama.cpp ROCm build
https://rocm.docs.amd.com/projects/llama-cpp/en/docs-26.02/install/llama-cpp-install.html
I don't think my GPU even supports ROCm/HIP SDK, which is why I used the AMD-for-Ollama GitHub fork, plus the ROCm and rocBLAS stuff for my GPU (6600 XT, gfx1032)
@plain sentinel you will get so many more server configuration parameters to experiment with on llama.cpp until you land on the one that works for your env, best of luck !!!
@pale vine
@pale vine I am trying to run my AI on my local machine but I keep getting this issue: Error: 500 Internal Server Error: model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details
hey i have built a solution for this, should I share?
⚠️ Missing API key for provider "openai". Configure the gateway auth for that provider, then try again.
Yes, please do, as I still haven’t fixed it
Yeah, please share it here. The steps or workaround, plus openclaw --version, would help a lot, especially if it’s version-specific.
Can you look into the context issue yourself?
Let me check the docs and source for that specific context-limit path so I don’t guess.
Yeah, I dug into the docs/source a bit, and I think I see the trap.
For a 16k local model, that “set reserveTokensFloor to 20000+” message is basically a generic hint, not good advice for your setup.
What stands out from your thread:
- Pi compaction uses contextWindow - reserveTokens
- OpenClaw also has a reserveTokensFloor that can bump reserveTokens up
- with a 16k model, if reserveTokens is still something like 16384 or gets bumped high, you can end up with almost no usable room, so resets keep happening
- your Ollama log also showed default_num_ctx=4096, so there’s a decent chance the runtime is effectively still 4k, not 16k
So the likely issue is not just reserveTokensFloor, it’s also reserveTokens and/or the actual Ollama context being lower than you expect.
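Here’s a rough sketch of how those pieces could combine. It is a simplified model built only from the points above, not OpenClaw’s actual code: it assumes the floor raises the reserve via a plain max(), and that the smaller of the configured window and Ollama’s runtime num_ctx is what really applies. `usable_context` and its parameters are made-up names.

```python
def usable_context(context_window: int, runtime_num_ctx: int,
                   reserve_tokens: int, reserve_floor: int) -> int:
    """Room left for prompt + history under the simplified model."""
    # Assumption: the runtime cannot use more context than Ollama actually loaded.
    effective_window = min(context_window, runtime_num_ctx)
    # Assumption: the floor bumps the reserve up, never down.
    effective_reserve = max(reserve_tokens, reserve_floor)
    return effective_window - effective_reserve


print(usable_context(16_384, 16_384, 2_000, 0))       # 14384: healthy
print(usable_context(16_384, 16_384, 16_384, 0))      # 0: reserve eats the whole window
print(usable_context(16_384, 16_384, 2_000, 20_000))  # -3616: floor exceeds the window
print(usable_context(16_384, 4_096, 2_000, 0))        # 2096: runtime is really 4k
```

The last case is the one your log hints at: the config says 16k, but if Ollama only loaded 4096, a chunky system prompt alone could blow past what is left.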
Can you paste these first:
openclaw --version
openclaw status
openclaw config get agents.defaults.compaction --json
openclaw models status
And also tell me:
- are you using native Ollama (http://...:11434) or OpenAI-compatible /v1?
- what are your model settings for: contextWindow, contextTokens (if set), maxTokens
If you’re on /v1, that’s another red flag, docs recommend the native Ollama API instead: https://docs.openclaw.ai/providers/ollama