Kanban DB Corruption (recurring)
Hermes version: 0.15.1 (2026.5.29)
OS: Linux 5.15.0-177-generic x86_64
Python: 3.11.15
Memory provider: mnemosyne
Platform: telegram (gateway via systemd)
Kanban dispatcher: cron-based (dispatch_in_gateway: false)
Two corruption incidents
May 27 — Physical page corruption:
- Dashboard kanban plugin polled DB every ~2s and detected corruption
- Integrity check returned 10 pages unreadable (trees 2, 6, 7, 8, 14, 15, 18, 19, 26 — error 522)
- Pages 29 and 30 marked "never used"
- Created 311 .bak backup files in a crash loop before I fixed it
- Preceded by "disk I/O error" events
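For anyone trying to reproduce the check, this is a minimal sketch of an integrity check using Python's standard sqlite3 module. It is not the dashboard plugin's actual code; the function name check_integrity is hypothetical, and the path you pass in is whatever your kanban DB file is.

```python
import sqlite3
from pathlib import Path


def check_integrity(db_path: str) -> list[str]:
    """Run PRAGMA integrity_check and return its messages (["ok"] if healthy)."""
    # Open read-only so the check itself cannot modify the file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
    except sqlite3.Error as exc:
        return [f"open failed: {exc}"]
    try:
        return [row[0] for row in conn.execute("PRAGMA integrity_check")]
    except sqlite3.DatabaseError as exc:
        # Badly damaged files can fail before the check produces any rows.
        return [f"check failed: {exc}"]
    finally:
        conn.close()
```

A healthy DB returns a single "ok" row; anything else is a list of problems like the ones quoted in this report.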
May 30 — Index corruption:
- Reported error: "wrong # of entries in index idx_events_run"
- Dashboard plugin detected once (not a crash loop this time)
- DB otherwise readable — 12 tasks, 7,557 task_events, 493 task_runs all salvageable by rowid scan
- Rebuilt DB by recreating schema + rowid-insert data → integrity: ok
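The rowid-scan rebuild can be sketched roughly as below. This is an illustration, not the exact script I ran: salvage_table is a hypothetical helper, it assumes the destination DB already has the fresh schema with the same column order, and it assumes rowids are reasonably dense (it probes every rowid from 1 to the maximum).

```python
import sqlite3


def salvage_table(src: sqlite3.Connection, dst: sqlite3.Connection, table: str):
    """Copy readable rows of `table` from src to dst one rowid at a time.

    Returns (copied, skipped). Looking rows up by rowid should go through
    the table b-tree rather than a damaged secondary index.
    """
    max_rowid = src.execute(f'SELECT max(rowid) FROM "{table}"').fetchone()[0] or 0
    copied = skipped = 0
    for rowid in range(1, max_rowid + 1):
        try:
            row = src.execute(
                f'SELECT * FROM "{table}" WHERE rowid = ?', (rowid,)
            ).fetchone()
        except sqlite3.DatabaseError:
            # This row sits on an unreadable page; note it and move on.
            skipped += 1
            continue
        if row is None:
            continue  # gap in the rowid sequence, nothing to copy
        placeholders = ", ".join(["?"] * len(row))
        dst.execute(f'INSERT INTO "{table}" VALUES ({placeholders})', row)
        copied += 1
    dst.commit()
    return copied, skipped
```

Running it once per table (tasks, task_events, task_runs) and then running an integrity check on the new file is the general shape of the rebuild.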
Possible cause
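Concurrent writes from the dashboard plugin (polling every 2s), the gateway, and the hermes kanban CLI. SQLite in WAL mode relies on file locks and a shared-memory index to coordinate writers across processes; if that coordination breaks down, the DB can be corrupted. The dashboard's kanban event stream polls aggressively, which makes overlapping access more likely.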
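If uncoordinated access really is the trigger, one thing that might help is having the poller open the DB read-only and giving every connection a busy timeout. The sketch below uses only the standard sqlite3 module; open_readonly and open_writer are hypothetical names, not Hermes functions, and the 5000 ms timeout is an arbitrary assumption.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path: str, busy_timeout_ms: int = 5000) -> sqlite3.Connection:
    """Connection for pollers: cannot write, waits on locks instead of failing."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    conn.execute(f"PRAGMA busy_timeout = {int(busy_timeout_ms)}")
    return conn


def open_writer(db_path: str, busy_timeout_ms: int = 5000) -> sqlite3.Connection:
    """Connection for the gateway/CLI: WAL journal plus a lock wait timeout."""
    # The timeout argument is in seconds and sets SQLite's busy handler.
    conn = sqlite3.connect(db_path, timeout=busy_timeout_ms / 1000)
    conn.execute("PRAGMA journal_mode = WAL")
    return conn
```

With this split, an accidental write from the polling side fails with sqlite3.OperationalError instead of touching the file.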
Logs
- Report: https://paste.rs/s2RpS
- agent.log: https://paste.rs/jgFTB
- gateway.log: https://paste.rs/QOXbj
- Full extract with both corruption signatures and server stats is also on Unraid: /mnt/user/data/Backups/hermes/hermes-kanban-debug-logs.tar.gz (the actual broken DB was accidentally cleaned up, so only the rebuilt healthy DB remains)
Fix applied locally
- Rebuilt DB from rowid scan + fresh schema
- Added --exclude='*.corrupt*' to backup script
- Removed 45 stale .bak files and 497 MB of workspace artifacts
- Sync script changed to send only latest backup instead of entire folder
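The "send only the latest backup" logic amounts to something like the following. My real sync script is a shell script, so this Python version is only an illustration; latest_backup is a hypothetical helper, and it assumes backups sit as plain files in a single directory.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import Optional


def latest_backup(backup_dir: str, exclude: str = "*.corrupt*") -> Optional[Path]:
    """Return the most recently modified backup file, skipping excluded names."""
    candidates = [
        p
        for p in Path(backup_dir).iterdir()
        if p.is_file() and not fnmatch(p.name, exclude)
    ]
    # default=None covers an empty directory or one with only excluded files.
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

Only the returned path is handed to the sync step, so a pile of stale or corrupt copies no longer gets shipped every run.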