#Error: Response truncated due to output length limit
max_tokens is probably not what you want to mess with here.
max_tokens is the output cap for one generated answer, not the model's total context window.
So if DeepSeek v4 Pro has a 131k context window, setting max_tokens to 131k is usually the wrong direction. Hermes still has to fit the system prompt, tools, conversation history, and your new message into that same request.
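To make the arithmetic concrete, here is a rough sketch. It assumes the provider reserves max_tokens inside the context window (so input plus max_tokens has to fit within the window), which is common but not confirmed for this setup. The helper name `input_budget` is made up for illustration and is not part of Hermes.

```python
# Rough budget sketch. Assumption: input tokens + max_tokens must fit
# inside the context window. input_budget is a hypothetical helper,
# not a Hermes function.

def input_budget(context_window: int, max_tokens: int) -> int:
    """Tokens left for system prompt, tools, history, and the new message."""
    return context_window - max_tokens

CONTEXT_WINDOW = 131_000  # the 131k figure mentioned in this thread

for cap in (131_000, 32_000, 16_000):
    left = input_budget(CONTEXT_WINDOW, cap)
    print(f"max_tokens={cap}: {left} tokens left for input")
```

Under that assumption, a 131k cap leaves nothing for the prompt, while a 16k or 32k cap leaves most of the window for input.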
Remove the model.max_tokens: 131000 override, or set it to a much smaller per-response cap like 16k or 32k. Use context_length only if you need to override the detected total context window; do not use max_tokens for that.
After changing it, start a fresh session with /new and retry the same task.
If it still happens, send a /debug captured right after one failed turn. The exact log line matters here because Hermes has different paths for normal long-answer truncation, truncated tool-call JSON, and provider-side output-cap/context errors.
Thanks, will do. For now I made the change you suggested and I'll see if it happens again.