⚠️ Context limit exceeded. I've reset our conversation to start fresh - please try again. | Friends of the Crustacean 🦞🤝

frigid cryptBOT Apr 8, 2026, 8:28 AM

#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

jolly bloomBOT Apr 8, 2026, 8:28 AM

#

pale vineBOT Apr 8, 2026, 8:31 AM

#

You’re not crazy — this is almost certainly a config mismatch, and your reserveTokensFloor experiments probably made it worse.

reserveTokensFloor is reserved headroom, not “more context.”
So on a 16k model:

reserveTokensFloor: 100000 = totally impossible
even 20000 is too high: it is larger than the entire 16k window

Do this first

openclaw --version
openclaw config set agents.defaults.compaction.reserveTokensFloor 0
openclaw config set agents.defaults.compaction.reserveTokens 2000
openclaw config set agents.defaults.compaction.keepRecentTokens 4000
openclaw gateway restart

#

Then in each chat surface (TUI/web/Discord/Telegram), send:

/new

Big second issue in your setup

If your model has contextWindow: 16000 and maxTokens: 8192, roughly half the window is set aside for the reply, leaving only about 7,800 tokens for prompt + history + tools.
Set maxTokens much lower (like ~1024–2048) for 16k contexts.

Also verify Ollama mode

If you accidentally use OpenAI-compatible mode (/v1) it can cause weird context behavior unless perfectly configured.

Check:

#

openclaw config get models.providers.ollama.baseUrl
openclaw config get models.providers.ollama.api

For stable tool behavior, prefer:

base URL without /v1
api: "ollama"
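
A quick way to sanity-check the base URL you get back from the config command is sketched below. This is only a string check under stated assumptions: the helper name `looks_openai_compatible` is made up for illustration, it does not read OpenClaw's config itself, and it simply treats a path ending in /v1 as the OpenAI-compatible endpoint.

```python
from urllib.parse import urlparse


def looks_openai_compatible(base_url: str) -> bool:
    """Return True if the URL path ends in /v1 (hypothetical helper).

    Assumption: a base URL ending in /v1 points at Ollama's
    OpenAI-compatible endpoint, while a bare host:port is the native API.
    """
    path = urlparse(base_url).path.rstrip("/")
    return path.endswith("/v1")


# Paste in whatever `openclaw config get models.providers.ollama.baseUrl` printed.
for url in ("http://localhost:11434", "http://localhost:11434/v1"):
    mode = "OpenAI-compatible (/v1)" if looks_openai_compatible(url) else "native Ollama"
    print(f"{url} -> {mode}")
```

If the second form is what you have configured, dropping the /v1 suffix and setting api to "ollama" matches the recommendation above.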

If it still loops after this, paste these and I’ll pinpoint it fast:

openclaw --version

#

openclaw status --all
openclaw models status --json
openclaw config get agents.defaults.compaction

Docs: https://docs.openclaw.ai/concepts/compaction • https://docs.openclaw.ai/reference/session-management-compaction • https://docs.openclaw.ai/providers/ollama

plain sentinel Apr 8, 2026, 8:33 AM

#

ok can you explain to me what max tokens even do/are ?

pale vineBOT Apr 8, 2026, 8:33 AM

#

Yep — super important one:

maxTokens = the maximum number of tokens the model is allowed to generate in its reply.
It’s an output cap, not extra memory.

Think of it like:

Context window = total budget (input + output)
Input = system prompt + chat history + your message + tool text
Output = assistant reply (capped by maxTokens)

So roughly:

input tokens + output tokens <= context window

#

For your case (16k context), if maxTokens is 8192, you’re letting output take up half the entire window.
That leaves much less room for history/tools, which makes overflow way easier.
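
To make the arithmetic concrete, here is a minimal sketch of the budget, assuming "16k" means 16384 tokens and that the full maxTokens is set aside for the reply. The function name `input_budget` is illustrative only, not an OpenClaw or Ollama API.

```python
def input_budget(context_window: int, max_tokens: int) -> int:
    """Tokens left for system prompt + history + tools + your message,
    assuming the whole maxTokens is reserved for the reply."""
    return context_window - max_tokens


CONTEXT_WINDOW = 16384  # assumption: "16k" taken as 16384

for max_tokens in (8192, 2048, 1024):
    left = input_budget(CONTEXT_WINDOW, max_tokens)
    print(f"maxTokens={max_tokens}: {left} tokens left for input")
```

With maxTokens at 8192 the input side gets 8192 tokens; at 1024 it gets 15360.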

Practical rule for 16k models

Start with maxTokens around 1024–2048
Only raise it if you truly need long replies

Also: that error text saying “set reserveTokensFloor to 20000+” is a generic hint — for small-context local models, that can actually be the wrong move.

plain sentinel Apr 8, 2026, 8:40 AM

#

ok well it's still giving me this: ⚠️ Context limit exceeded. I've reset our conversation to start fresh - please try again.

To prevent this, increase your compaction buffer by setting agents.defaults.compaction.reserveTokensFloor to 20000 or higher in your config.

so here's my ollama serve output, at least the ending of it:
time=2026-04-08T04:37:26.864-04:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\Users\Sour\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 56108"
time=2026-04-08T04:37:29.452-04:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1032 name=ROCm0 description="AMD Radeon RX 6600 XT" libdirs=ollama,rocm driver=60450.10 pci_id=0000:09:00.0 type=discrete total="8.0 GiB" available="6.4 GiB"
time=2026-04-08T04:37:29.453-04:00 level=INFO source=routes.go:1832 msg="vram-based default context" total_vram="8.0 GiB" default_num_ctx=4096

#

⚠️ Context limit exceeded. I've reset our conversation to start fresh - please try again.

#

@pale vine

plain sentinel Apr 8, 2026, 3:27 PM

#

@pale vine

plain sentinel Apr 8, 2026, 5:44 PM

#

Alright, I'm in one hell of a pickle because my setup is a little interesting. I'm running a 6600 XT, which Ollama doesn't support, so I did the AMD-for-Ollama setup with the SDK/ROCm driver stuff. I then had Ubuntu on WSL2 running my OpenClaw, and the two were linked to each other over localhost, running qwen3.5:4b on my 6600 XT through ollama serve. It was good enough, it worked, and this all took me like 3 days to set up.

BUT right now I keep getting hit with the issue, even after swapping my model to qwen3.5:2b, and after using different versions of this command for "different amounts?" (openclaw config set agents.defaults.compaction.reserveTokensFloor 100000) (I would use like 8000 or 16000 etc). NOW IT'S AT 2048. I have a custom context amount set for my qwen3.5:4b: it's set to 16384 context + 1024 of "max tokens"? This ran fine, but now I have THE ISSUE:

⚠️ Context limit exceeded. I've reset our conversation to start fresh - please try again.

To prevent this, increase your compaction buffer by setting agents.defaults.compaction.reserveTokensFloor to 20000 or higher in your config.

WHAT DO I DO ABOUT THIS, AS I'VE ALREADY CONFIGURED IT?

quasi anchor Apr 8, 2026, 6:39 PM

#

@plain sentinel unfortunately the OpenClaw system prompt is quite chunky and will gobble up your 16k ctx window pretty quickly. You should also consider testing with a llama.cpp ROCm build:

https://rocm.docs.amd.com/projects/llama-cpp/en/docs-26.02/install/llama-cpp-install.html

plain sentinel Apr 8, 2026, 6:45 PM

#

quasi anchor <@252481944741347328> unfortunately the openclaw sys prompt is quite chunky and ...

I don't think my GPU even supports ROCm/HIP SDK, which is why I used the AMD-for-Ollama GitHub fork, plus the ROCm and rocBLAS stuff for my GPU (6600 XT, gfx1032).

quasi anchor Apr 8, 2026, 7:00 PM

#

plain sentinel i dont think my gpu even supports rocm/hipsdk, as why i used the amd for ollama...

https://www.reddit.com/r/LocalLLaMA/comments/1odzx7g/trying_to_run_gguf_with_amd_radeon_rx6600xt/

#

@plain sentinel you will get so many more server configuration parameters to experiment with on llama.cpp until you land on the one that works for your env, best of luck !!!

plain sentinel Apr 8, 2026, 9:37 PM

#

@pale vine

plain sentinel Apr 9, 2026, 12:23 AM

#

@pale vine

#

@pale vine I am trying to run my AI on my local machine but I keep getting this issue: Error: 500 Internal Server Error: model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details

spiral garden Apr 9, 2026, 12:10 PM

#

hey i have built a solution for this, should I share?

pale vineBOT Apr 9, 2026, 12:10 PM

#

⚠️ Missing API key for provider "openai". Configure the gateway auth for that provider, then try again.

plain sentinel Apr 10, 2026, 9:56 AM

#

spiral garden hey i have built a solution for this, should I share?

Yes, please do, as I still haven’t fixed it.

pale vineBOT Apr 10, 2026, 9:56 AM

#

Yeah, please share it here. The steps or workaround, plus openclaw --version, would help a lot, especially if it’s version-specific.

plain sentinel Apr 10, 2026, 3:14 PM

#

pale vine Yeah, please share it here. The steps or workaround, plus `openclaw --version`, ...

Can you look into the context issue yourself?

pale vineBOT Apr 10, 2026, 3:20 PM

#

Let me check the docs and source for that specific context-limit path so I don’t guess.

#

Yeah, I dug into the docs/source a bit, and I think I see the trap.

For a 16k local model, that “set reserveTokensFloor to 20000+” message is basically a generic hint, not good advice for your setup.

What stands out from your thread:

Pi compaction uses contextWindow - reserveTokens
OpenClaw also has a reserveTokensFloor that can bump reserveTokens up
with a 16k model, if reserveTokens is still something like 16384 or gets bumped high, you can end up with almost no usable room, so resets keep happening
your Ollama log also showed default_num_ctx=4096, so there’s a decent chance the runtime is effectively still 4k, not 16k

So the likely issue is not just reserveTokensFloor, it’s also reserveTokens and/or the actual Ollama context being lower than you expect.
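
Here is a rough sketch of how those pieces could interact. Treat it as an illustration under assumptions, not OpenClaw's actual logic: it assumes the effective reserve is max(reserveTokens, reserveTokensFloor), and that the default_num_ctx value in the Ollama log is what the runtime really uses (a model with its own num_ctx setting could override it). Both function names are made up.

```python
import re
from typing import Optional


def runtime_ctx_from_log(line: str) -> Optional[int]:
    """Pull default_num_ctx out of an `ollama serve` log line, if present."""
    match = re.search(r"default_num_ctx=(\d+)", line)
    return int(match.group(1)) if match else None


def usable_before_compaction(context_window: int,
                             reserve_tokens: int,
                             reserve_floor: int = 0) -> int:
    """contextWindow minus the reserve.

    Assumption: the floor bumps the reserve up, i.e. the effective
    reserve is max(reserve_tokens, reserve_floor).
    """
    effective_reserve = max(reserve_tokens, reserve_floor)
    return context_window - effective_reserve


log_line = ('level=INFO source=routes.go:1832 msg="vram-based default context" '
            'total_vram="8.0 GiB" default_num_ctx=4096')

configured = 16384
runtime = runtime_ctx_from_log(log_line) or configured
effective_window = min(configured, runtime)

print(usable_before_compaction(configured, 16384))        # 0: reserve eats the whole window
print(usable_before_compaction(configured, 2000, 20000))  # -3616: floor exceeds the window
print(usable_before_compaction(effective_window, 2000))   # 2096: if the runtime is really 4k
```

A zero or negative result means there is no room left before compaction kicks in, which would be consistent with the resets you keep seeing.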

Can you paste these first:

#

openclaw --version
openclaw status
openclaw config get agents.defaults.compaction --json
openclaw models status

And also tell me:

are you using native Ollama (http://...:11434) or OpenAI-compatible /v1?
what are your model settings for:
- contextWindow
- contextTokens (if set)
- maxTokens

If you’re on /v1, that’s another red flag, docs recommend the native Ollama API instead: https://docs.openclaw.ai/providers/ollama
