The logs tell a pretty clear story. Here's what happened:
Root cause: compression.threshold: 0.9 is too aggressive
MiniMax-M2.7 has a ~128k context window. Your sessions keep hitting ~105-110k tokens, at which point Hermes tries to compress the transcript. With compression.threshold: 0.9, it waits until the context is 90% full before triggering compression, and that's cutting it extremely close. By the time it tries to compress, there's no room left to do the work, so it fails and resets.
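To make the headroom concrete, here's a rough back-of-the-envelope sketch. It assumes "~128k" means 128,000 tokens and that the threshold is a simple fraction of the full window; Hermes may count things differently (for example by reserving output tokens), so treat the numbers as approximate. The function name is just for illustration.

```python
# Assumption: "~128k" is treated as 128,000 tokens; the real limit may differ.
CONTEXT_WINDOW = 128_000


def compression_headroom(threshold: float, window: int = CONTEXT_WINDOW) -> tuple[int, int]:
    """Return (trigger_tokens, headroom_tokens) for a given threshold.

    Assumes compression triggers at threshold * window, which is a
    simplification of whatever Hermes actually does internally.
    """
    trigger = round(window * threshold)
    return trigger, window - trigger


for t in (0.9, 0.7, 0.6):
    trigger, headroom = compression_headroom(t)
    print(f"threshold {t}: triggers at {trigger:,} tokens, {headroom:,} left")
```

Under those assumptions, 0.9 leaves roughly 12.8k tokens of headroom, while 0.7 leaves roughly 38.4k and 0.6 roughly 51.2k.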
Contributing factors from your logs:
- 16 sessions reset in a single day: this isn't a one-off, it's a configuration problem
- agent.max_turns: 9999 means sessions run indefinitely, accumulating massive context
- You have 137 skills loaded + mempalace MCP with 29 tools, so the system prompt alone is huge, leaving less room for conversation
- You were literally asking the agent to "split skills to reduce context" but the task itself kept overflowing
The fix:
Lower your compression threshold so it kicks in earlier:
hermes config set compression.threshold 0.7
Or even 0.6 given your setup. This gives the compressor room to actually do its job before you're jammed against the ceiling.
Also consider:
hermes config set agent.max_turns 50
9999 turns is effectively unlimited, and long-running tasks are the direct cause of 100k+ token sessions.
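If you want a feel for how turn count and context size interact, here's a rough sketch. The base prompt size and per-turn growth below are made-up illustrative numbers, not measured from your logs, and the function name is hypothetical; plug in your own figures.

```python
def turns_until_compression(window: int, threshold: float,
                            base_tokens: int, tokens_per_turn: int) -> int:
    """Estimate how many turns fit before the compression trigger is reached.

    Assumes the context starts at base_tokens (system prompt, skills, tools)
    and grows by a constant tokens_per_turn, which is a simplification.
    """
    trigger = round(window * threshold)
    remaining = max(0, trigger - base_tokens)
    return -(-remaining // tokens_per_turn)  # ceiling division


# Hypothetical inputs: 20k-token system prompt, 1.5k tokens added per turn.
print(turns_until_compression(128_000, 0.7, 20_000, 1_500))
```

With those illustrative inputs, the session reaches the 0.7 trigger after about 47 turns, so a max_turns in the tens is in the same ballpark as the context limit, while 9999 never comes into play.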
The kanban duplicate column max_retries error is a separate known migration bug. It's benign (won't crash anything) but worth noting.
The gateway restart chaos was caused by you repeatedly trying to restart while an agent was mid-run. Each drain hit the 60s timeout with an active agent, requiring SIGKILL. Not a bug, just rough to deal with when sessions are hanging due to context issues.
tl;dr: drop compression.threshold to 0.65-0.7 and set a sane max_turns limit. The resets should stop.