Context size reduction issue | Nous Research

topaz ridge · 2026-05-23T15:04:23.719Z

Hi, I'm using the llama.cpp server with a 96k context size. During prompt processing, the agent dynamically reduces the context size, e.g. from 96k to 32k and then to 8k. I have also set the size in the config file, but the issue persists. I'm using the latest version as of today. Thanks.

For llama.cpp, both sides (Hermes and the llama.cpp server) need to agree on the context window.

Hermes can use model.context_length to decide its own compression/request budget, but llama.cpp still has to be launched with a matching context size. If the server is actually running with 32k or 8k, setting 96000 in Hermes will not make the backend accept 96k prompts.

On the llama.cpp server side, make sure it is started with the 96k context you intend to use, for example the equivalent of:

```
--ctx-size 96000
```
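To confirm what the running server actually uses, rather than what you think you passed, you can ask it. Below is a minimal Python sketch, assuming the server is reachable at http://127.0.0.1:8080 (the llama.cpp default; adjust for your setup) and that your build's GET /props response carries the context size at default_generation_settings.n_ctx. That field path is an assumption and may differ between llama.cpp versions, so look at the raw JSON if it raises a KeyError. The helper names are hypothetical, made up for this example.

```python
import json
import urllib.request

# Assumed address: llama.cpp's server listens on 127.0.0.1:8080 by default.
SERVER_URL = "http://127.0.0.1:8080"
INTENDED_CTX = 96000  # the value passed to --ctx-size above


def fetch_props(base_url: str = SERVER_URL, timeout: float = 5.0) -> dict:
    """GET <base_url>/props from the llama.cpp server and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/props", timeout=timeout) as resp:
        return json.load(resp)


def server_n_ctx(props: dict) -> int:
    """Extract the context size from a /props response.

    Assumes the layout {"default_generation_settings": {"n_ctx": ...}};
    a KeyError here means your build nests it differently.
    """
    return int(props["default_generation_settings"]["n_ctx"])


def check_server_ctx(props: dict, intended: int = INTENDED_CTX) -> str:
    """Compare the server's reported context size with the intended one."""
    n_ctx = server_n_ctx(props)
    if n_ctx < intended:
        return (f"server n_ctx={n_ctx} is smaller than intended {intended}; "
                f"relaunch with --ctx-size {intended}")
    return f"server n_ctx={n_ctx} covers the intended {intended}"
```

Against a live server you would run print(check_server_ctx(fetch_props())). If it reports something like 32768 or 8192, the server is not running with the context you intended, so fix the launch command before changing anything in Hermes.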

Then set Hermes to the same plain integer under the top-level model: block:

```yaml
model:
  context_length: 96000
```

That means the plain integer 96000, not 96k; directly under model:, not nested under compression; and set in Hermes itself, not only in the llama.cpp server config.
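If you want to sanity-check the Hermes side before restarting anything, the three rules above can be expressed as a small check over the parsed config. This is a sketch, not part of Hermes: it assumes you have already loaded ~/.hermes/config.yaml into a dict (for example with PyYAML's yaml.safe_load, which is an assumption and not required by the code), and the function names are hypothetical. Note that YAML reads context_length: 96k as the string "96k", which is why the type check matters.

```python
def find_context_length_keys(node, path=()):
    """Yield the key path of every context_length entry in a parsed config."""
    if isinstance(node, dict):
        for key, child in node.items():
            here = path + (str(key),)
            if key == "context_length":
                yield here
            yield from find_context_length_keys(child, here)


def check_hermes_context_length(config: dict, server_ctx: int = 96000) -> list:
    """Return a list of problems with model.context_length (empty means OK).

    `config` is the already-parsed ~/.hermes/config.yaml; `server_ctx` is the
    --ctx-size the llama.cpp server was launched with.
    """
    problems = []
    model = config.get("model")
    if not isinstance(model, dict):
        problems.append("no top-level model: block")
    else:
        value = model.get("context_length")
        if value is None:
            problems.append("model.context_length is missing")
        # bool is a subclass of int in Python, so rule it out explicitly.
        elif isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"model.context_length should be a plain integer, got {value!r}")
        elif value != server_ctx:
            problems.append(f"model.context_length is {value} but the server uses {server_ctx}")
    # Flag context_length anywhere else, e.g. nested under compression.
    for path in find_context_length_keys(config):
        if path != ("model", "context_length"):
            problems.append(f"{'.'.join(path)} found; put it under the top-level model: block")
    return problems
```

Any context_length found outside the top-level model: block is reported rather than silently ignored, so you can check whether it is the one you meant to set.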

Docs for the Hermes setting:

https://hermes-agent.nousresearch.com/docs/integrations/providers#context-length

If it still steps down to 32k or 8k after both sides are set, please paste the output of hermes debug share from that same turn, plus the relevant model: block from ~/.hermes/config.yaml and the llama.cpp server launch command. That should show whether Hermes is applying the override or reacting to an actual context-limit error from llama.cpp.
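For the config part of that, a tiny helper can print just the top-level model: block from ~/.hermes/config.yaml so you paste only that section. This is a plain line scan, not a YAML parser: it starts at an unindented model: line and stops at the next non-blank line that begins at column 0, which fits the usual block layout but will not handle unusual YAML. The function names are hypothetical.

```python
from pathlib import Path

# Path taken from the post above; change it if your config lives elsewhere.
CONFIG_PATH = Path.home() / ".hermes" / "config.yaml"


def top_level_block(text: str, key: str = "model") -> str:
    """Return the unindented `key:` block of a YAML document as raw text.

    Plain line scan, not a YAML parser: the block starts at a line that
    begins with 'key:' and ends at the next non-blank line that starts at
    column 0 (another top-level key or a top-level comment).
    """
    out = []
    inside = False
    for line in text.splitlines():
        if not inside:
            if line.startswith(f"{key}:"):
                inside = True
                out.append(line)
            continue
        if line.strip() and not line[0].isspace():
            break  # reached the next top-level entry
        out.append(line)
    # Drop trailing blank lines collected before the next top-level entry.
    return "\n".join(out).rstrip()


def read_model_block(path: Path = CONFIG_PATH) -> str:
    """Read the config file and return its top-level model: block."""
    return top_level_block(path.read_text(encoding="utf-8"))
```

Running print(read_model_block()) prints the block ready to paste alongside the debug share output and the server launch command.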
