Error: Response truncated due to output length limit | Nous Research

hazy flower May 19, 2026, 5:34 PM

Hi! I am still getting this error even though I am now using DeepSeek v4 Pro exclusively, with max_tokens set to 131k, which is the max for this model. What do I need to do to avoid this error? Thanks

tidal sedge May 19, 2026, 5:46 PM

@hazy flower https://discord.com/channels/1053877538025386074/1496283715083567256

logs please

static kettle May 19, 2026, 6:01 PM

max_tokens is prob not what you want to mess with here.

max_tokens is the output cap for one generated answer, not the model's total context window.

So if DeepSeek v4 Pro has a 131k context window, setting max_tokens to 131k is usually the wrong direction. Hermes still has to fit the system prompt, tools, conversation history, and your new message into that same request.
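
To make the arithmetic concrete, here is a rough sketch of the budget. The 131k window is the figure quoted in this thread; the prompt sizes and the helper names `output_budget` and `fits` are made up for illustration and are not part of Hermes.

```python
# Illustrative numbers only. The window size is the figure quoted in this
# thread; the per-part prompt sizes are hypothetical placeholders.
CONTEXT_WINDOW = 131_000

def output_budget(context_window: int, prompt_tokens: int) -> int:
    """Tokens left for the reply once the prompt has been counted."""
    return max(context_window - prompt_tokens, 0)

def fits(context_window: int, prompt_tokens: int, max_tokens: int) -> bool:
    """True if the prompt plus the requested output cap fit in the window."""
    return prompt_tokens + max_tokens <= context_window

# system prompt + tool definitions + conversation history + new message
prompt_tokens = 4_000 + 6_000 + 30_000 + 500

print(output_budget(CONTEXT_WINDOW, prompt_tokens))   # room left for the reply
print(fits(CONTEXT_WINDOW, prompt_tokens, 131_000))   # cap equal to the whole window
print(fits(CONTEXT_WINDOW, prompt_tokens, 32_000))    # a smaller per-response cap
```

With these placeholder sizes, a cap equal to the whole window cannot fit alongside the prompt, while a 32k cap leaves plenty of room.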

Remove the model.max_tokens: 131000 override, or set it to a much smaller per-response cap like 16k or 32k. Use context_length only if you need to override the detected total context window; do not use max_tokens for that.

After changing it, start a fresh session with /new and retry the same task.

If it still happens, send a /debug captured right after one failed turn. The exact log line matters here because Hermes has different paths for normal long-answer truncation, truncated tool-call JSON, and provider-side output-cap/context errors.
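
As a rough illustration of why the exact failure matters, the first two cases can be told apart from the response itself. This is not Hermes's code: it is a sketch that assumes an OpenAI-style chat-completion choice (a dict with `finish_reason` and `message`), and `classify_truncation` is a hypothetical helper name. Provider-side output-cap or context errors usually arrive as an API error instead of a response, so they are not covered here.

```python
import json

def classify_truncation(choice: dict) -> str:
    """Roughly classify one OpenAI-style chat-completion choice.

    Assumes the provider reports finish_reason == "length" when the
    output cap cut the generation short.
    """
    if choice.get("finish_reason") != "length":
        return "not_truncated"

    message = choice.get("message") or {}
    for call in message.get("tool_calls") or []:
        arguments = (call.get("function") or {}).get("arguments") or ""
        try:
            json.loads(arguments)
        except (ValueError, TypeError):
            # The cap hit in the middle of the tool-call arguments.
            return "truncated_tool_call_json"

    # The cap hit in the middle of ordinary answer text.
    return "truncated_text"
```

A plain long answer that hits the cap comes back as `"truncated_text"`, while a tool call whose arguments were cut off mid-object comes back as `"truncated_tool_call_json"`.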

hazy flower May 20, 2026, 6:19 PM

Thanks, will do. For now I made the change you suggested and I'll see if it happens again.
