#question: Roll back to a backup because the failsafe triggered; need help after running OpenClaw for 3 days


dense summit

My system specs and overview

Hello, I'm 7even (German, under 50, my English is not so good) and I need help!

Network: MS-01 with OPNsense + Zenarmor, separate 2.5G network.

| Agent | Hostname | IP | CPU | RAM | GPU (VRAM) | Model | Role |
|---|---|---|---|---|---|---|---|
| Jarvis | srv06 | | | | NVIDIA RTX 3090 (24GB) | qwen3.6:27b | Head / Master |
| Adam | srv04 | 192.168.10.74 | | | NVIDIA GTX 1070 (8GB) | qwen3.5:4b | Builder |
| Eva | srv05 | 192.168.10.75 | | | NVIDIA RTX 2070 (8GB) | qwen3.5:4b | Builder |
| Jeeves | srv07 | 192.168.10.77 | | | NVIDIA RTX 3070 (8GB) | qwen3.5:4b | Builder |

All nodes run Ubuntu 24.04 Desktop natively (no VMs), with Ollama for local inference and OpenClaw as the agent framework.

The worker nodes (srv04, srv05, srv07) are lighter builds, but powerful enough to run small models locally.

First 3 Days: Everything Was Fine

The initial setup ran smoothly: Jarvis as the head, coordinating Adam, Eva, and Jeeves as workers. Local Ollama inference, no cloud APIs, everything on-prem. The 27b model ran fine on the 3090, and the smaller models on the workers stayed within their VRAM limits.

After 3 days of smooth operation, things started to degrade: out-of-memory (OOM) errors, instability, and configuration drift. I've been trying to stabilize everything with 7even_failsave.md and 7even_failsaveV2.md, which are basically attempts to define a stable standard configuration that actually survives reboots and context shifts.

I went from "this is the future" to "I should have just restored a backup two days ago."

Option A: Restore the Rescuezilla Backup

I have a Rescuezilla backup from 2 days ago, from before things started degrading. The rational choice is to restore it and start again from a known-good state. So what should I do?


swift island (BOT)
Replying to dense summit: "My system specs and overview"

I would treat this as a recovery operation rather than keep tweaking the live system.

Before restoring, please capture a small read-only snapshot from each machine, especially Jarvis:

openclaw --version
openclaw status --all
openclaw gateway status
openclaw config file
openclaw config validate
openclaw logs --tail 200
ollama ps
free -h
journalctl -k --since "3 days ago" | grep -Ei "oom|killed process|out of memory" | tail -80

Do not paste secrets or tokens. If openclaw status --all redacts them as usual, its output is the safest thing to share here.
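To avoid copying each output by hand, the commands above can be collected into one folder per machine. This is a minimal sketch, not part of OpenClaw: the capture helper, the output file names and the SNAPSHOT_ROOT variable are hypothetical, and it assumes bash plus the tools listed above. SNAPSHOT_ROOT should point to a USB stick or network share that is not on the disk you will restore; if it is unset, the script falls back to a temporary folder that you must copy off the machine yourself.

```shell
#!/usr/bin/env bash
# Sketch: save the read-only diagnostics into one folder per host.
# ASSUMPTION: SNAPSHOT_ROOT points to storage that is NOT on the disk to be
# restored. If unset, a temp dir is used and must be copied off manually.
set -u

SNAPSHOT_ROOT="${SNAPSHOT_ROOT:-$(mktemp -d)}"
OUT_DIR="$SNAPSHOT_ROOT/$(uname -n)-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$OUT_DIR" || exit 1

# Hypothetical helper: run one command, save stdout+stderr to NAME.txt.
# A missing tool is recorded in the file instead of aborting the snapshot.
capture() {
  local name="$1"
  shift
  if command -v "$1" >/dev/null 2>&1; then
    "$@" >"$OUT_DIR/$name.txt" 2>&1
  else
    echo "not installed: $1" >"$OUT_DIR/$name.txt"
  fi
}

capture openclaw-version  openclaw --version
capture openclaw-status   openclaw status --all
capture gateway-status    openclaw gateway status
capture config-file       openclaw config file
capture config-validate   openclaw config validate
capture openclaw-logs     openclaw logs --tail 200
capture ollama-ps         ollama ps
capture memory            free -h
# The OOM search is a pipeline, so it runs inside bash -c.
capture kernel-oom bash -c 'journalctl -k --since "3 days ago" | grep -Ei "oom|killed process|out of memory" | tail -80'

echo "Snapshot saved to: $OUT_DIR"
echo "Check the files for tokens/secrets before posting any of them."
```

Example (the path and script name are hypothetical): SNAPSHOT_ROOT=/media/usb/snapshots bash snapshot.sh, run once on each node. Missing tools do not stop the run, so the same script works on Jarvis and on the workers.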

swift island (BOT)

If your Rescuezilla image is from the known-good period, restoring it is reasonable. My suggested order:

  1. Save the current OpenClaw config/log snapshot somewhere outside the disk you will restore.
  2. Stop OpenClaw + Ollama on all nodes before restoring, so the workers do not reconnect to a half-restored head node.
  3. Restore Jarvis (the head node) first from the Rescuezilla image.
  4. Boot Jarvis alone and run:
       openclaw --version
       openclaw status
       openclaw doctor
       openclaw models status
  5. Start only one worker node and verify that it pairs/connects cleanly.
  6. Add the other workers one at a time.
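Step 2 can be run from one PC over SSH. This is a minimal sketch under several assumptions that are not from the thread: you can ssh into every node with a sudo-capable user, Ollama runs as the systemd unit "ollama" (the unit its Linux installer creates), and OPENCLAW_STOP is a placeholder, because the thread does not say how OpenClaw is started on these machines. The stop_all function and the DRY_RUN switch are hypothetical; by default the script only prints what it would do.

```shell
#!/usr/bin/env bash
# Sketch of step 2: stop OpenClaw and Ollama on all nodes before the restore.
# ASSUMPTIONS: ssh access with a sudo-capable user; Ollama runs as the
# systemd unit "ollama"; OPENCLAW_STOP is a placeholder, so replace it with
# however OpenClaw is really started on your nodes (service, tmux, ...).
set -u

DRY_RUN="${DRY_RUN:-1}"                                  # 1 = only print, 0 = run
OPENCLAW_STOP="${OPENCLAW_STOP:-openclaw gateway stop}"  # hypothetical default

# Workers first, Jarvis last, so no worker keeps retrying a head that is down.
# Jarvis has no IP in the table, so its hostname srv06 is used (must resolve).
NODES=(192.168.10.74 192.168.10.75 192.168.10.77 srv06)

stop_all() {
  local node cmd
  cmd="$OPENCLAW_STOP; sudo systemctl stop ollama"
  for node in "${NODES[@]}"; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "[dry-run] ssh -t $node \"$cmd\""
    else
      # -t gives sudo a terminal in case it asks for a password.
      ssh -t "$node" "$cmd" || echo "WARNING: stop failed on $node" >&2
    fi
  done
}

stop_all
```

Check the printed commands first, then run it again with DRY_RUN=0. Rescuezilla boots Jarvis from its own USB stick, so after this step only the workers stay powered on, with nothing running that could reconnect.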

If the failure was "failsafe / last-known-good config restored", OpenClaw preserves rejected configs beside the active config. After boot, check this on the restored or broken system: