#Excessive Preflight compression messages


rose hornet
#

what you were trying to do
hermes is helping me redesign/code a pygame display
what happened instead
That part works OK, but it's generating these messages almost every turn:
─────────────────────────────────────
📦 Preflight compression: ~117,193 tokens >= 102,400 threshold. This may take a moment.
🗜️ Compacting context — summarizing earlier conversation so I can continue...
⚠️ Session compressed 7 times — accuracy may degrade. Consider /new to start fresh.
🗜️ Compacting context — summarizing earlier conversation so I can continue...
⚠️ Session compressed 8 times — accuracy may degrade. Consider /new to start fresh.
🗜️ Compacting context — summarizing earlier conversation so I can continue...
⚠️ Session compressed 9 times — accuracy may degrade. Consider /new to start fresh.
The 117,193 number goes up every time. It never goes down.
how you installed Hermes
I don't recall; I think it was a one-liner.
your OS / Docker / WSL / terminal app if relevant
I am running it in an Ubuntu 24.04 LXC container on a Linux Mint host.
your provider / model / platform
The provider/model is MiniMax-M2.7
what you already tried
I don't know what to try.
the relevant logs
Debug report uploaded:
Report https://paste.rs/gzi6h
agent.log https://paste.rs/ihtVy
gateway.log https://paste.rs/qKaw7

#

Also, it suggests /new, but when I tried that it said I would lose the conversation. That really isn't an option; I don't want to start over again.

#

It is almost unusable. It keeps losing my responses to its questions.

limpid current
#

The logs point to Hermes compacting based on a rough preflight estimate, not because the actual MiniMax request is still too large.

In your report, the preflight estimate is staying around 113k+ tokens against a 102,400 threshold, but the actual MiniMax calls after compression are only around 58k prompt tokens. So the conversation is fitting after compression, but Hermes keeps seeing the rough estimate and compacting again on later turns. That explains the repeated compression messages and the rising compression count.
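
To make the mismatch concrete, here is a minimal Python sketch of the comparison described above. The threshold and the rough estimate come from the banner in the original post, and the ~58k figure is the approximate prompt size from the report. The function name is made up for illustration and is not Hermes' actual code.

```python
# Numbers from the thread; the function name is hypothetical, not a Hermes API.
THRESHOLD = 102_400        # threshold shown in the "Preflight compression" banner
ROUGH_ESTIMATE = 117_193   # preflight estimate shown in the same banner
ACTUAL_PROMPT = 58_000     # approximate prompt tokens MiniMax saw after compression


def preflight_would_compress(estimated_tokens: int, threshold: int = THRESHOLD) -> bool:
    """Return True when an estimate meets or exceeds the compression threshold."""
    return estimated_tokens >= threshold


# The rough estimate trips the check, so another compaction pass starts...
print(preflight_would_compress(ROUGH_ESTIMATE))  # True
# ...even though the request that was actually sent is well under the threshold.
print(preflight_would_compress(ACTUAL_PROMPT))   # False
```

Because the check only ever sees the rough estimate, it keeps firing on later turns even though each compressed request fits.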

You do not need to use /new as the first step. For now, raise the compression threshold for this MiniMax-M2.7 setup and restart/resume the same session:

hermes config set compression.threshold 0.85
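
As a rough illustration of why raising the value helps, the sketch below assumes that compression.threshold is a fraction of the model's context window, that the current fraction is 0.5, and that the window is 204,800 tokens (the size that would produce the 102,400 threshold in the banner). None of those three assumptions is confirmed in this thread, so treat the numbers as an estimate only.

```python
# Assumptions (not confirmed in the thread): the threshold is a fraction of the
# context window, the current fraction is 0.5, and the window is 204,800 tokens.
ASSUMED_CONTEXT_WINDOW = 204_800


def threshold_tokens(fraction: float, context_window: int = ASSUMED_CONTEXT_WINDOW) -> int:
    """Convert a fractional threshold into an absolute token count."""
    return round(context_window * fraction)


current = threshold_tokens(0.5)    # matches the 102,400 in the banner
raised = threshold_tokens(0.85)    # threshold after the suggested config change
rough_estimate = 117_193           # estimate from the banner

print(current, raised)             # 102400 174080
print(rough_estimate >= current)   # True: compaction triggers today
print(rough_estimate >= raised)    # False: no compaction at the raised threshold
```

Under those assumptions, the current rough estimate would sit comfortably below the new threshold, which is why the repeated passes should stop for now.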

Then restart Hermes and resume the existing session instead of starting a new one. You can find the session with:

hermes sessions list

and resume it with:

hermes --resume <session_id>

If the extra background review passes keep triggering more compaction during this coding session, also disable those automatic nudges for now:

hermes config set memory.nudge_interval 0
hermes config set skills.creation_nudge_interval 0

Those settings avoid the repeated review/compaction churn while keeping your current conversation recoverable. The part where answers to Hermes' questions seem to get lost also lines up with long-session compression behavior, so keeping the same session but reducing the repeated compaction should make it less disruptive while we look into a possible code-side fix.

limpid current
#

I put up a code fix for the repeated preflight compaction loop here:

https://github.com/NousResearch/hermes-agent/pull/34861

The fix keeps preflight compression for genuinely oversized sessions, but stops repeatedly compacting based only on the rough estimate when the provider’s last real prompt count showed the compressed request fits. It also makes the preflight compression loop respect the existing ineffective-compression guard, so tiny/no-op compaction passes do not keep stacking.
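
For anyone following along, here is a simplified Python sketch of the decision logic described above. It is not the code from the PR: the class, field names, and the `max_ineffective` limit are all hypothetical and only mirror the behavior in plain terms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionState:
    """Hypothetical bookkeeping; field names are illustrative only."""
    rough_estimate: int                      # local preflight token estimate
    last_real_prompt_tokens: Optional[int]   # provider-reported count, None if unknown
    ineffective_compressions: int = 0        # passes that freed little or nothing


def should_preflight_compress(state: SessionState, threshold: int,
                              max_ineffective: int = 2) -> bool:
    """Decide whether to run another preflight compaction pass."""
    # Under the threshold even by the rough estimate: nothing to do.
    if state.rough_estimate < threshold:
        return False
    # Respect the ineffective-compression guard so no-op passes do not stack.
    if state.ineffective_compressions >= max_ineffective:
        return False
    # The provider's last real count shows the request fits: trust it over
    # the rough estimate and skip compaction.
    if (state.last_real_prompt_tokens is not None
            and state.last_real_prompt_tokens < threshold):
        return False
    # Genuinely oversized (or no real count yet): compress as before.
    return True


# The situation from this thread: high rough estimate, but the real prompt fits.
looping = SessionState(rough_estimate=117_193, last_real_prompt_tokens=58_000)
print(should_preflight_compress(looping, threshold=102_400))    # False

# A session that really is too large still gets compressed.
oversized = SessionState(rough_estimate=150_000, last_real_prompt_tokens=140_000)
print(should_preflight_compress(oversized, threshold=102_400))  # True
```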

cc @tacit wolf or @next pier (review?)

GitHub

What does this PR do?
Fixes a repeated preflight-compression loop caused by treating schema-heavy rough token estimates as if they were real provider context usage after compression.
In long tool-h...