#Error: Response truncated due to output length limit

hazy flower
#

Hi! I am still getting this error even though I am now using DeepSeek v4 Pro exclusively with max_tokens set to 131k, which is the max for this model. What do I need to do to avoid this error? Thanks

tidal sedge
#

logs please

static kettle
#

max_tokens is prob not what you want to mess with here.

#

max_tokens is the output cap for one generated answer, not the model's total context window.

So if DeepSeek v4 Pro has a 131k context window, setting max_tokens to 131k is usually the wrong direction. Hermes still has to fit the system prompt, tools, conversation history, and your new message into that same request.
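
Rough sketch of that arithmetic, in case it helps. It assumes "131k" means 131,000 tokens and that the output cap is reserved out of the same window as the input, which can differ by provider. `input_budget` is a made-up helper for illustration, not something Hermes exposes.

```python
def input_budget(context_window: int, max_tokens: int) -> int:
    """Tokens left for system prompt, tools, history and the new message.

    Assumes the provider reserves max_tokens out of the same context window.
    """
    return max(context_window - max_tokens, 0)


# Assumed value: the thread only says "131k".
CONTEXT_WINDOW = 131_000

# Compare the current override with the smaller caps suggested in this thread.
for cap in (131_000, 32_000, 16_000):
    left = input_budget(CONTEXT_WINDOW, cap)
    print(f"max_tokens={cap:>7}: {left:>7} tokens left for input")
```

Under those assumptions, the 131k cap leaves nothing for the prompt, while a 16k or 32k cap leaves most of the window for it.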

Remove the model.max_tokens: 131000 override, or set it to a much smaller per-response cap like 16k or 32k. Use context_length only if you need to override the detected total context window; do not use max_tokens for that.

After changing it, start a fresh session with /new and retry the same task.

If it still happens, send a /debug captured right after one failed turn. The exact log line matters here because Hermes has different paths for normal long-answer truncation, truncated tool-call JSON, and provider-side output-cap/context errors.

hazy flower
#

Thanks, will do. For now I made the change you suggested and I'll see if it happens again.