When an agent accumulates many tool calls in a single response (e.g., reading 18 files from a GitHub repo, browsing multiple web pages), the model's final text generation can take several minutes. During this phase no new stream chunks are sent, and after 180 seconds of inactivity the stale-stream timeout, HERMES_STREAM_STALE_TIMEOUT (the 180s default is hardcoded in run_agent.py, line 4430), kills the connection.
The user sees no progress update, so the agent appears frozen. On Telegram, the message simply never arrives.
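To make the failure mode concrete, here is a minimal sketch of how a stale-stream check of this kind typically works. It is not the code in run_agent.py; the class name `StaleWatchdog`, its methods, and the `>=` comparison are assumptions for illustration only.

```python
import time


class StaleWatchdog:
    """Hypothetical helper: tracks how long the stream has been silent."""

    def __init__(self, stale_timeout: float, clock=time.monotonic):
        self.stale_timeout = stale_timeout
        self._clock = clock  # injectable so the logic can be tested without waiting
        self._last_chunk_at = clock()

    def touch(self) -> None:
        """Call whenever a stream chunk arrives."""
        self._last_chunk_at = self._clock()

    def idle_seconds(self) -> float:
        """Seconds elapsed since the last chunk (or since creation)."""
        return self._clock() - self._last_chunk_at

    def is_stale(self) -> bool:
        """True once the stream has been silent for stale_timeout seconds."""
        return self.idle_seconds() >= self.stale_timeout
```

With a 180s timeout, a generation phase that emits no chunks for three minutes trips `is_stale()` even though the model is still working, which matches the behavior described above.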
Current env vars:
- HERMES_STREAM_READ_TIMEOUT=350 (configurable via .env ✅)
- HERMES_STREAM_STALE_TIMEOUT=180 (default; NOT configurable via config.yaml ❌, env var only)
- HERMES_API_TIMEOUT=1800 (default)
Problems:
- HERMES_STREAM_STALE_TIMEOUT is not exposed in config.yaml or hermes setup, so users don't know it exists
- Default of 180s is too aggressive for complex tasks (reading many files, large codebases, multi-step research)
- No heartbeat/progress mechanism during text generation — the agent can only report progress via tool calls, not during the final generation phase
- .env file is protected by the security scanner, so even the agent can't modify it without terminal workarounds
Suggested fixes:
- Expose HERMES_STREAM_STALE_TIMEOUT in config.yaml under a sensible section
- Increase default to 300-600s
- Add a gateway-level heartbeat that sends a "typing..." or progress indicator to messaging platforms during long generations (e.g., Telegram's "typing" chat action)
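The first two fixes could be sketched roughly as follows. This is an illustration only: the `streaming.stale_timeout` key, the function name `resolve_stale_timeout`, and the precedence order (env var, then config.yaml, then default) are assumptions, not existing Hermes behavior. The config is passed in as an already-parsed dict.

```python
import os

DEFAULT_STALE_TIMEOUT = 180.0  # current default per this report; the proposal is 300-600


def resolve_stale_timeout(config: dict, env=None) -> float:
    """Resolve the stale timeout: env var first, then config, then default."""
    env = os.environ if env is None else env
    raw = env.get("HERMES_STREAM_STALE_TIMEOUT")
    if raw is None:
        # Hypothetical config.yaml location: streaming.stale_timeout
        section = config.get("streaming") or {}
        raw = section.get("stale_timeout")
    if raw is None:
        return DEFAULT_STALE_TIMEOUT
    try:
        value = float(raw)
    except (TypeError, ValueError):
        # Unparseable values fall back to the default instead of crashing.
        return DEFAULT_STALE_TIMEOUT
    return value if value > 0 else DEFAULT_STALE_TIMEOUT
```

Keeping the env var as the highest-priority source would preserve the behavior of existing .env setups while letting everyone else set the value in config.yaml.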
Affected: gateway/Telegram streaming, and any platform where the user can't see partial responses. Less impactful on the CLI, where the user sees the spinner.
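For the affected gateways, the heartbeat idea could look something like the sketch below. The names `with_heartbeat` and `send_heartbeat` are hypothetical; on Telegram, `send_heartbeat` would presumably wrap a "typing" chat action, which clients display for only about five seconds, hence the 4-second default interval chosen here.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_heartbeat(
    work: Awaitable[T],
    send_heartbeat: Callable[[], Awaitable[None]],
    interval: float = 4.0,
) -> T:
    """Await `work` while calling `send_heartbeat` every `interval` seconds."""

    async def _beat() -> None:
        while True:
            try:
                await send_heartbeat()
            except Exception:
                # A failed heartbeat should never abort the generation itself.
                pass
            await asyncio.sleep(interval)

    beat_task = asyncio.create_task(_beat())
    try:
        return await work
    finally:
        # Stop the heartbeat as soon as the work finishes or fails.
        beat_task.cancel()
        try:
            await beat_task
        except asyncio.CancelledError:
            pass
```

Note that a heartbeat like this only changes what the user sees. It would not by itself stop the stale timeout from killing the connection, so it complements a longer or configurable timeout and does not replace one.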