#1006

charred sluice
#

my openclaw (running on a VPS) has become almost impossible to use. the problem seems to be the openclaw-gateway process sometimes using more than 100% of CPU. i was trying to get openclaw to help fix itself (but it's really slow because of this runaway pid). it last asked me to kill the gateway, wait, restart, then probe, and i got this:

ubuntu@polymarket-mc:~$ ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd
PID STAT ELAPSED %CPU %MEM RSS CMD
1656370 Rsl 00:20 115 3.3 833576 openclaw-gateway

shrewd shadowBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

mental hamletBOT
#

That 115% means the gateway is burning a bit over one full CPU core. After only 20s uptime, I’d treat this as a tight loop/regression, not “normal startup”.

First thing: please paste the version:

openclaw --version

Then run these from SSH, not through the slow bot:

which openclaw
openclaw gateway status --deep
openclaw status --all
openclaw logs --limit 300 --plain --no-color

If the CLI is too slow/unresponsive, grab systemd logs instead:

systemctl --user status openclaw-gateway --no-pager
journalctl --user -u openclaw-gateway -n 200 --no-pager

Please skim/redact anything private before posting logs. The useful bits are repeated errors/warnings near startup.

Docs for the exact triage ladder: https://docs.openclaw.ai/help/faq and https://docs.openclaw.ai/gateway/troubleshooting
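One caveat on the reading above: `%cpu` in `ps` is CPU time divided by the process's elapsed time, so it is a lifetime average rather than an instantaneous value. A minimal sketch for flagging rows above a threshold in the `ps` column order used in this thread; the function name `hot_procs` and the default threshold of 80 are illustrative assumptions, not anything OpenClaw provides:

```shell
# hot_procs [THRESHOLD]: read `ps -o pid,stat,etime,%cpu,...` output on stdin
# and print "PID %CPU" for every row whose %CPU exceeds THRESHOLD.
# The function name and the default of 80 are assumptions for illustration.
hot_procs() {
  local threshold="${1:-80}"
  # NR > 1 skips the header row; column 4 is %CPU in this column order.
  awk -v t="$threshold" 'NR > 1 && ($4 + 0) > (t + 0) { print $1, $4 }'
}

# Usage on the VPS:
#   ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd | hot_procs 80
```

Running it a few times, some seconds apart, gives a better picture than one reading taken right after a restart.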

charred sluice
#

OpenClaw 2026.4.26 (be8c246)
Service: systemd (enabled)
File logs: /tmp/openclaw/openclaw-2026-04-28.log
Command: /home/ubuntu/.nvm/versions/node/v24.14.1/bin/node /home/ubuntu/.nvm/versions/node/v24.14.1/lib/node_modules/openclaw/dist/index.js gateway --port 18789
Service file: ~/.config/systemd/user/openclaw-gateway.service
Service env: OPENCLAW_GATEWAY_PORT=18789

Service config looks out of date or non-standard.
Service config issue: Gateway service PATH includes version managers or package managers; recommend a minimal PATH. (/home/ubuntu/.nvm/versions/node/v24.14.1/bin)
Service config issue: Gateway service uses Node from a version manager; it can break after upgrades. (/home/ubuntu/.nvm/versions/node/v24.14.1/bin/node)
Recommendation: run "openclaw doctor" (or "openclaw doctor --repair").
Config (cli): ~/.openclaw/openclaw.json
Config (service): ~/.openclaw/openclaw.json

Gateway: bind=loopback (127.0.0.1), port=18789 (service args)
Probe target: ws://127.0.0.1:18789
Dashboard: http://127.0.0.1:18789/
Probe note: Loopback-only gateway; only local clients can connect.

Runtime: running (pid 1656370, state active, sub running, last exit 0, reason 0)
Connectivity probe: ok
Capability: admin-capable

mental hamletBOT
#

Good, that confirms the gateway is reachable but the service install is suspicious: it’s running from nvm Node 24.14.1, and OpenClaw is explicitly warning that the service file is stale/non-standard.

I’d do this next, in order:

openclaw doctor

If it recommends the service repair/migration, then run:

openclaw doctor --repair
openclaw gateway restart

After restart, immediately check CPU again:

sleep 20
ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd

If CPU is still pinned, please paste these:

openclaw gateway stability --json
openclaw logs --limit 500 --plain --no-color

The stability output is useful here because OpenClaw records liveness warnings when the Node event loop / CPU is saturated.

Docs refs:
https://docs.openclaw.ai/help/faq
https://docs.openclaw.ai/gateway/diagnostics

charred sluice
#

i ran the repair, but i don't think it actually changes the node thing. it just recommends

◇ Gateway service config ────────────────────────────────────────────────────────────╮
│ │
│ - Gateway service PATH includes version managers or package managers; recommend a │
│ minimal PATH. (/home/ubuntu/.nvm/versions/node/v24.14.1/bin)

mental hamletBOT
#

Yep — that warning may stay after repair. Doctor only migrates off nvm if it can find a suitable system Node; otherwise it just warns.

Two tracks:

1) Get the CPU clue first — this matters more than the Node warning:

openclaw gateway stability --json
openclaw logs --limit 500 --plain --no-color

2) Check whether system Node exists:

type -a node npm openclaw
/usr/bin/node -v || true
/usr/bin/npm -v || true

If /usr/bin/node is missing, that explains why repair didn’t migrate. The nvm service path is not ideal, but it may be separate from the runaway CPU. Paste the stability/log output and we can tell whether it’s stuck in a channel loop, session loop, plugin/runtime deps issue, etc.

Docs: https://docs.openclaw.ai/gateway/doctor

charred sluice
#

gateway connect failed: Error: gateway closed (1000):
◇
Gateway not reachable. Is it running and accessible?
Gateway target: ws://127.0.0.1:18789
Source: local loopback
Config: /home/ubuntu/.openclaw/openclaw.json
Bind: loopback
Hint: run openclaw doctor.

ubuntu@polymarket-mc:~$ type -a node npm openclaw
node is /home/ubuntu/.nvm/versions/node/v24.14.1/bin/node
node is /usr/bin/node
node is /bin/node
npm is /home/ubuntu/.nvm/versions/node/v24.14.1/bin/npm
npm is /usr/bin/npm
npm is /bin/npm
openclaw is /home/ubuntu/.nvm/versions/node/v24.14.1/bin/openclaw
openclaw is /usr/bin/openclaw
openclaw is /bin/openclaw
ubuntu@polymarket-mc:~$ /usr/bin/node -v || true
v22.22.0
ubuntu@polymarket-mc:~$ /usr/bin/npm -v || true
10.9.4

mental hamletBOT
#

Aha — you have a split install:

  • shell default: ~/.nvm/.../openclaw
  • system install also exists: /usr/bin/openclaw
  • system Node exists and is supported: v22.22.0

Check both OpenClaw versions first:

openclaw --version
/usr/bin/openclaw --version

If /usr/bin/openclaw is the same/new enough, force-rewrite the gateway service from the system install:

systemctl --user stop openclaw-gateway
/usr/bin/openclaw gateway install --force
/usr/bin/openclaw gateway restart
/usr/bin/openclaw gateway status --deep

Then verify the command changed away from ~/.nvm/...:

ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd
systemctl --user cat openclaw-gateway

If /usr/bin/openclaw is old, don’t force it yet — paste both version outputs.

For the current “gateway closed (1000)” state, also grab logs directly:

journalctl --user -u openclaw-gateway -n 200 --no-pager
tail -n 300 /tmp/openclaw/openclaw-2026-04-28.log
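The split-install check above can be done mechanically on the `type -a` output. A rough sketch, assuming that any path under `.nvm`, `.volta`, `.asdf`, or `.fnm` counts as a version-manager install; the function name `split_install` and that directory list are illustrative, not part of OpenClaw:

```shell
# split_install: read `type -a openclaw` output on stdin, label each path,
# and exit 0 only when a version-manager path AND another path both appear.
# The list of version-manager directories is an assumption for illustration.
split_install() {
  awk '
    $2 == "is" {
      if ($3 ~ /\/\.(nvm|volta|asdf|fnm)\//) { managed++; print "version-manager: " $3 }
      else                                   { other++;   print "system: " $3 }
    }
    END { exit !(managed > 0 && other > 0) }
  '
}

# Usage on the VPS:
#   type -a openclaw | split_install && echo "split install detected"
```

Paths containing spaces are not handled, which is fine for the layouts shown in this thread.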
charred sluice
#

not sure how i have two installs. i only had one before - maybe my openclaw screwed something up at some point

ubuntu@polymarket-mc:~$ openclaw --version
OpenClaw 2026.4.26 (be8c246)
ubuntu@polymarket-mc:~$ /usr/bin/openclaw --version
OpenClaw 2026.4.22 (00bd2cf)
ubuntu@polymarket-mc:~$

mental hamletBOT
#

Yep, that’s the split-brain: new CLI in nvm, older CLI in /usr/bin. Don’t use /usr/bin/openclaw gateway install yet — 2026.4.22 may be too old for config touched by 2026.4.26.

Update the system install first, then reinstall the service from that path:

sudo /usr/bin/npm install -g openclaw@latest
/usr/bin/openclaw --version

Make sure it shows 2026.4.26 or newer. Then:

systemctl --user stop openclaw-gateway
/usr/bin/openclaw gateway install --force
/usr/bin/openclaw gateway restart
/usr/bin/openclaw gateway status --deep

Then verify the service no longer points into nvm:

ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd
systemctl --user cat openclaw-gateway

Totally plausible this came from PATH/install drift rather than anything you intentionally did. The key is to get one current install owning the systemd service.
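The "same or newer" comparison between the two installs can be scripted instead of read by eye. A small sketch, assuming the `OpenClaw <version> (<hash>)` output format shown in this thread and GNU `sort -V` (available on Ubuntu); both helper names are made up for illustration:

```shell
# extract_version: pull "2026.4.26" out of "OpenClaw 2026.4.26 (be8c246)".
extract_version() { awk '{ print $2 }'; }

# version_lt A B: succeed when version A sorts strictly before version B.
# Relies on `sort -V` (GNU coreutils) for version-aware ordering.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage on the VPS:
#   sys=$(/usr/bin/openclaw --version | extract_version)
#   cur=$(openclaw --version | extract_version)
#   version_lt "$sys" "$cur" && echo "system install ($sys) is older than $cur"
```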

charred sluice
#

ubuntu@polymarket-mc:~$ sudo /usr/bin/npm install -g openclaw@latest
⠙
npm warn deprecated node-domexception@1.0.0: Use your platform's native DOMException instead

added 32 packages, removed 44 packages, and changed 402 packages in 20s

70 packages are looking for funding
run npm fund for details
npm notice
npm notice New major version of npm available! 10.9.4 -> 11.13.0
npm notice Changelog: https://github.com/npm/cli/releases/tag/v11.13.0
npm notice To update run: npm install -g npm@11.13.0
npm notice

mental hamletBOT
#

Good. Ignore the npm-major-version notice for now.

Next, confirm the system install updated:

/usr/bin/openclaw --version

If it’s 2026.4.26 or newer, do the service rewrite:

systemctl --user stop openclaw-gateway
/usr/bin/openclaw gateway install --force
/usr/bin/openclaw gateway restart
/usr/bin/openclaw gateway status --deep

Then:

sleep 20
ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd

If CPU is still high after that, paste gateway status --deep plus the ps output and we’ll move to logs/stability.

charred sluice
#

Service env: OPENCLAW_GATEWAY_PORT=18789sically a Swiss Army knife, but with more opinions and fewer sharp edges.

Service config looks out of date or non-standard.
Service config issue: Gateway service PATH includes version managers or package managers; recommend a minimal PATH. (/home/ubuntu/.nvm/versions/node/v24.14.1/bin)
Service config issue: Gateway service uses Node from a version manager; it can break after upgrades. (/home/ubuntu/.nvm/versions/node/v24.14.1/bin/node)
Recommendation: run "openclaw doctor" (or "openclaw doctor --repair").
Config (cli): ~/.openclaw/openclaw.json
Config (service): ~/.openclaw/openclaw.json

Gateway: bind=loopback (127.0.0.1), port=18789 (service args)
Probe target: ws://127.0.0.1:18789
Dashboard: http://127.0.0.1:18789/
Probe note: Loopback-only gateway; only local clients can connect.

Runtime: running (pid 1681727, state active, sub running, last exit 0, reason 0)
Warm-up: launch agents can take a few seconds. Try again shortly.
Connectivity probe: failed
Probe target: ws://127.0.0.1:18789
timeout
Capability: unknown

Port 18789 is already in use.

  • pid 1681727 ubuntu: openclaw-gateway (127.0.0.1:18789)
  • Gateway already running locally. Stop it (openclaw gateway stop) or use a different port.
    Listening: 127.0.0.1:18789
    Troubles: run openclaw status
    Troubleshooting: https://docs.openclaw.ai/troubleshooting
ubuntu@polymarket-mc:~$ sleep 20
ubuntu@polymarket-mc:~$ ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd
PID STAT ELAPSED %CPU %MEM RSS CMD
1681727 Rsl 02:34 90.8 4.4 1081636 openclaw-gateway
mental hamletBOT
#

That means the service still did not get rewritten — it’s still launching from ~/.nvm/....

Important nuance: even /usr/bin/openclaw may still run under nvm Node because its shebang uses env node, and your shell PATH has nvm first. Force a clean PATH for the install:

export SYSTEM_PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

env PATH="$SYSTEM_PATH" /usr/bin/openclaw --version

systemctl --user stop openclaw-gateway
env PATH="$SYSTEM_PATH" /usr/bin/openclaw gateway install --force
systemctl --user daemon-reload
env PATH="$SYSTEM_PATH" /usr/bin/openclaw gateway restart

Then check:

systemctl --user cat openclaw-gateway | sed -n '1,160p'
sleep 20
ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd

If systemctl --user cat still shows .nvm, paste it. Also that weird 18789sically a Swiss Army knife... line is odd — if it’s really in the service file and not just copy/paste garbling, we need to see the unit file.
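Because a `#!/usr/bin/env node` shebang resolves `node` through `PATH`, an alternative to hard-coding `SYSTEM_PATH` is to filter the current `PATH`. A sketch under that assumption; `clean_path` and its list of version-manager directory names are illustrative and may need extending for other setups:

```shell
# clean_path PATH_STRING: print the PATH-like string with entries that sit
# under a version-manager directory removed. The directory list is an
# assumption for illustration, not an exhaustive or official list.
clean_path() {
  printf '%s' "$1" | tr ':' '\n' \
    | grep -Ev '/\.(nvm|volta|asdf|fnm|bun)(/|$)' \
    | paste -sd ':' -
}

# Usage on the VPS:
#   env PATH="$(clean_path "$PATH")" /usr/bin/openclaw --version
```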

charred sluice
#

discord is blocking the message send. i can send parts:
[Service]
ExecStart=/usr/bin/node /usr/lib/node_modules/openclaw/dist/index.js gateway --port 18789
Restart=always
RestartSec=5
RestartPreventExitStatus=78
TimeoutStopSec=30
TimeoutStartSec=30
SuccessExitStatus=0 143
KillMode=control-group
Environment=HOME=/home/ubuntu
Environment=TMPDIR=/tmp
Environment=NODE_EXTRA_CA_CERTS=/etc/ssl/certs/ca-certificates.crt
Environment=PATH=/usr/bin:/home/ubuntu/.nvm/current/bin:/home/ubuntu/.local/bin:/home/ubuntu/.npm-global/bin:/home/ubuntu/bin:/home/ubuntu/.volta/bin:/home/ubuntu/.asdf/shims:/home/ubuntu/.bun/bin:/home/ubuntu/.nix-profile/bin:/home/ubuntu/.local/share/fnm/aliases/default/bin:/home/ubuntu/.local/share/fnm/current/bin:/home/ubuntu/.fnm/aliases/default/bin:/home/ubuntu/.fnm/current/bin:/home/ubuntu/.local/share/pnpm:/usr/local/bin:/bin:/home/ubuntu/.nvm/versions/node/v24.14.1/bin

mental hamletBOT
#

That unit is actually much better now: ExecStart=/usr/bin/node /usr/lib/node_modules/openclaw/... means the gateway is not launching from nvm anymore. The remaining warning is probably because the service PATH still contains nvm entries, but /usr/bin is first and ExecStart is absolute, so I’d stop chasing the Node install for the moment.

Now we need the CPU cause. Since RPC/probe is timing out, use direct files:

PID=$(pidof openclaw-gateway)
readlink -f /proc/$PID/exe
tr '\0' ' ' < /proc/$PID/cmdline; echo

Then collect logs as files so Discord doesn’t eat them:

journalctl --user -u openclaw-gateway -n 300 --no-pager > openclaw-journal.txt
tail -n 800 /tmp/openclaw/openclaw-2026-04-28.log > openclaw-filelog.txt
grep -iE "error|warn|diagnostic|liveness|exception|timeout|poll|plugin|restart|loop|rate" \
  openclaw-journal.txt openclaw-filelog.txt | tail -n 200

Paste the filtered grep output, or attach the two .txt files after skimming/redacting. Don’t paste full environment output; it can contain secrets.

If the grep shows a repeated channel/plugin line, that’s likely our runaway loop. If logs are quiet while CPU is pinned, then we’re probably looking at a tighter internal loop and should grab a short CPU profile next.
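To spot a "repeated line" without reading the whole log, the messages can be counted. A rough sketch, assuming the file log is JSON-lines with a top-level `"message"` string; `top_messages` is a hypothetical helper and uses plain text matching rather than a JSON parser:

```shell
# top_messages [N]: read JSON-lines log text on stdin and print the N most
# frequent "message" values as "COUNT MESSAGE", most frequent first.
# Plain-text matching only: escaped quotes inside a message are not handled.
top_messages() {
  local n="${1:-10}"
  grep -o '"message":"[^"]*"' \
    | sed -e 's/^"message":"//' -e 's/"$//' \
    | sort | uniq -c | sort -rn | head -n "$n" \
    | sed -E 's/^ +//'
}

# Usage on the VPS:
#   tail -n 800 /tmp/openclaw/openclaw-2026-04-28.log | top_messages 10
```

A message that dominates the counts during a short window is a reasonable first suspect; a quiet log with pinned CPU points away from the logged subsystems.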

charred sluice
#

openclaw-filelog.txt:{"0":"{"subsystem":"telegram/network"}","1":"fetch fallback: enabling sticky IPv4-only dispatcher (codes=ETIMEDOUT,ENETUNREACH)","_meta":{"runtime":"node","runtimeVersion":"22.22.0","hostname":"unknown","name":"{"subsystem":"telegram/network"}","parentNames":["openclaw"],"date":"2026-04-28T18:43:42.237Z","logLevelId":4,"logLevelName":"WARN","path":{"fullFilePath":"file:///usr/lib/node_modules/openclaw/dist/subsystem-rHhUC6qs.js:147:14","fileName":"subsystem-rHhUC6qs.js","fileNameWithLine":"subsystem-rHhUC6qs.js:147","fileColumn":"14","fileLine":"147","filePath":"/usr/lib/node_modules/openclaw/dist/subsystem-rHhUC6qs.js","filePathWithLine":"/usr/lib/node_modules/openclaw/dist/subsystem-rHhUC6qs.js:147","method":"logToFile"}},"time":"2026-04-28T18:43:42.237+00:00","hostname":"polymarket-mc","message":"fetch fallback: enabling sticky IPv4-only dispatcher (codes=ETIMEDOUT,ENETUNREACH)"}

mental hamletBOT
#

That log points at Telegram network trouble, not the nvm issue anymore:

telegram/network … ETIMEDOUT, ENETUNREACH … sticky IPv4-only dispatcher

Let’s isolate whether Telegram polling is the CPU loop.

Temporarily disable Telegram and restart:

/usr/bin/openclaw config set channels.telegram.enabled false --strict-json
/usr/bin/openclaw gateway restart
sleep 30
ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd

If CPU drops, we found the subsystem. Then either leave Telegram off, or re-enable with IPv4-only behavior:

/usr/bin/openclaw config set channels.telegram.enabled true --strict-json
/usr/bin/openclaw config set channels.telegram.network.autoSelectFamily false --strict-json
/usr/bin/openclaw gateway restart

Also quick VPS network check:

curl -4 -I --max-time 10 https://api.telegram.org
curl -6 -I --max-time 10 https://api.telegram.org

It’s okay if IPv6 fails; the question is whether IPv4 works reliably. If IPv4 also times out, Telegram needs a proxy or VPS network fix.

Docs: https://docs.openclaw.ai/channels/telegram#troubleshooting
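The two `curl` checks boil down to a pair of exit codes, which can be turned into a one-word verdict. A minimal sketch; `classify_family` is an illustrative name, and it only looks at whether each `curl` run exited with 0:

```shell
# classify_family RC4 RC6: map the exit codes of `curl -4` and `curl -6`
# to one word: dual, ipv4-only, ipv6-only, or none.
classify_family() {
  local rc4="$1" rc6="$2"
  if   [ "$rc4" -eq 0 ] && [ "$rc6" -eq 0 ]; then echo dual
  elif [ "$rc4" -eq 0 ]; then echo ipv4-only
  elif [ "$rc6" -eq 0 ]; then echo ipv6-only
  else echo none
  fi
}

# Usage on the VPS:
#   curl -4 -sS -o /dev/null -I --max-time 10 https://api.telegram.org; rc4=$?
#   curl -6 -sS -o /dev/null -I --max-time 10 https://api.telegram.org; rc6=$?
#   classify_family "$rc4" "$rc6"
```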

charred sluice
#

before we do that, you mentioned earlier about a split install. could that be the problem? can you help set it back to a single install?

mental hamletBOT
#

Yes — let’s clean that up. The split install probably caused the service confusion; the current CPU loop may still be Telegram, but one install is much healthier.

I’d keep the system install (/usr/bin/openclaw) and remove only the nvm-global OpenClaw package:

# confirm system install is current
/usr/bin/openclaw --version

# remove ONLY the nvm/global OpenClaw package from your current nvm Node
npm uninstall -g openclaw

# clear shell command cache
hash -r

# verify openclaw now resolves to system install
type -a openclaw
openclaw --version

On Ubuntu, /bin/openclaw and /usr/bin/openclaw may be the same install, so don’t worry if both remain.

Then rewrite the service from the system install one more time:

SYSTEM_PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

systemctl --user stop openclaw-gateway
env PATH="$SYSTEM_PATH" /usr/bin/openclaw gateway install --force
systemctl --user daemon-reload
env PATH="$SYSTEM_PATH" /usr/bin/openclaw gateway restart

Verify:

systemctl --user cat openclaw-gateway | grep -E 'ExecStart=|Environment=PATH='
sleep 20
ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd

If CPU is still ~90–100%, then the split install is fixed and we go back to the Telegram/network loop.

charred sluice
#

does this mean it's still running?

ubuntu@polymarket-mc:~$ npm uninstall -g openclaw
⠙

mental hamletBOT
#

I can’t tell from that line alone — check both the package and the process:

type -a openclaw
ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd
systemctl --user status openclaw-gateway --no-pager

How to read it:

  • If type -a openclaw still shows /home/ubuntu/.nvm/.../openclaw, the nvm install still exists.
  • If ps -C openclaw-gateway ... prints a PID, the gateway process is running.
  • If the cmd/unit shows /usr/bin/node /usr/lib/node_modules/openclaw/..., then the running gateway is using the system install, even if nvm is still on your shell PATH.

It might just still be uninstalling — npm can sit on that spinner for a bit. Give it up to ~1–2 minutes.

If it doesn’t return, open a second SSH session and check:

ps aux | grep -E 'npm|node|openclaw' | grep -v grep
ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd

If the uninstall is stuck but the gateway is still pegging CPU, stop the gateway first, then let npm finish/retry:

systemctl --user stop openclaw-gateway
npm uninstall -g openclaw
hash -r
type -a openclaw

Do not use sudo for that npm uninstall -g openclaw — that one is removing the nvm user-level install.

charred sluice
#

ubuntu@polymarket-mc:~$ ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd
PID STAT ELAPSED %CPU %MEM RSS CMD
1691059 Ssl 11:40 59.3 2.5 632656 openclaw-gateway

mental hamletBOT
#

Yes, the gateway is still running: PID 1691059, about 59% CPU.

That’s lower than before, but still too high for an idle gateway. I’d stop it while you remove the duplicate install:

systemctl --user stop openclaw-gateway
ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd

That second command should print only the header/no process. Then retry/finish:

npm uninstall -g openclaw
hash -r
type -a openclaw
openclaw --version

Expected after cleanup: openclaw should resolve to /usr/bin/openclaw / /bin/openclaw, not /home/ubuntu/.nvm/....

Then restart cleanly:

/usr/bin/openclaw gateway restart
sleep 30
ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd
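Rather than eyeballing whether the stop left "only the header", the check can poll until the process is gone. A sketch that assumes procps `ps` (as on Ubuntu), where `ps -C NAME` exits non-zero when nothing matches; `wait_gone` and its 30-second default are illustrative:

```shell
# wait_gone NAME [TIMEOUT_SECONDS]: poll once per second until no process
# named NAME remains. Returns 0 once it is gone, 1 if it is still present
# after the timeout. Assumes procps `ps`, which supports -C.
wait_gone() {
  local name="$1" timeout="${2:-30}" waited=0
  while ps -C "$name" -o pid= >/dev/null 2>&1; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Usage on the VPS:
#   systemctl --user stop openclaw-gateway
#   wait_gone openclaw-gateway 30 || echo "gateway still running after 30s"
```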
charred sluice
#

ubuntu@polymarket-mc:~$ sleep 30
ubuntu@polymarket-mc:~$ ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd
PID STAT ELAPSED %CPU %MEM RSS CMD
1699584 Rsl 00:51 109 3.6 891424 openclaw-gateway

mental hamletBOT
#

That confirms the split install wasn’t the whole problem. It’s still pegging CPU after the clean service restart.

Now isolate Telegram, because the only useful log we saw was telegram/network timeouts:

systemctl --user stop openclaw-gateway
/usr/bin/openclaw config set channels.telegram.enabled false --strict-json
/usr/bin/openclaw gateway restart

sleep 30
ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd

If CPU drops, Telegram/network polling is the culprit. Then check VPS egress:

curl -4 -I --max-time 10 https://api.telegram.org
curl -6 -I --max-time 10 https://api.telegram.org

If IPv4 works and IPv6 fails, try re-enabling Telegram with IPv4-only behavior. If IPv4 fails too, Telegram needs a proxy or VPS network fix.

If CPU doesn’t drop with Telegram disabled, paste the new ps output plus:

tail -n 500 /tmp/openclaw/openclaw-2026-04-28.log | grep -iE "error|warn|diagnostic|liveness|telegram|plugin|poll|timeout" | tail -n 150
charred sluice
#

ubuntu@polymarket-mc:~$ curl -4 -I --max-time 10 https://api.telegram.org
HTTP/2 302
server: nginx/1.18.0
date: Tue, 28 Apr 2026 18:54:35 GMT
content-type: text/html
content-length: 145
location: https://core.telegram.org/bots
strict-transport-security: max-age=31536000; includeSubDomains; preload
access-control-allow-origin: *
access-control-allow-methods: GET, POST, OPTIONS
access-control-expose-headers: Content-Length,Content-Type,Date,Server,Connection

ubuntu@polymarket-mc:~$ curl -6 -I --max-time 10 https://api.telegram.org
curl: (7) Failed to connect to api.telegram.org port 443 after 0 ms: Couldn't connect to server

mental hamletBOT
#

That network result is clear: IPv4 to Telegram works, IPv6 is broken/unavailable.

Now set Telegram to avoid auto family selection, then restart:

/usr/bin/openclaw config set channels.telegram.network.autoSelectFamily false --strict-json
/usr/bin/openclaw config set channels.telegram.enabled true --strict-json
/usr/bin/openclaw gateway restart

sleep 45
ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd

If CPU is still high, do the isolation test exactly once:

/usr/bin/openclaw config set channels.telegram.enabled false --strict-json
/usr/bin/openclaw gateway restart

sleep 45
ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd

Outcomes:

  • CPU drops with Telegram disabled → Telegram polling/network loop; leave Telegram off or use a proxy.
  • CPU stays high even with Telegram disabled → not Telegram; we’ll need the filtered logs/profile next.
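"CPU drops" is easier to judge with an explicit cut-off. A small sketch that compares two `%CPU` readings; the helper name `cpu_dropped` and the default factor of 0.5 (the after-value must be under half the before-value) are arbitrary choices for illustration:

```shell
# cpu_dropped BEFORE AFTER [FACTOR]: succeed when AFTER is below
# BEFORE * FACTOR. FACTOR defaults to 0.5, an arbitrary illustrative cut-off.
cpu_dropped() {
  awk -v b="$1" -v a="$2" -v f="${3:-0.5}" \
    'BEGIN { exit !((a + 0) < (b + 0) * (f + 0)) }'
}

# Usage: take one reading with Telegram enabled and one with it disabled.
#   cpu_dropped 107 12  && echo "meaningful drop"
#   cpu_dropped 107 107 || echo "no meaningful drop"
```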
charred sluice
#

don't think it's telegram
ubuntu@polymarket-mc:~$ sleep 45
ubuntu@polymarket-mc:~$ ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd
PID STAT ELAPSED %CPU %MEM RSS CMD
1702217 Rsl 01:14 107 3.7 914028 openclaw-gateway

mental hamletBOT
#

That only proves the IPv4-only tweak didn’t fix it. It doesn’t rule out Telegram unless channels.telegram.enabled was actually false.

Quick verify:

/usr/bin/openclaw config get channels.telegram.enabled --json

If that prints true, do the real isolation test:

/usr/bin/openclaw config set channels.telegram.enabled false --strict-json
/usr/bin/openclaw gateway restart
sleep 45
ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd

If it prints false and CPU is still ~100%, then yep, not Telegram. Next step is a CPU profile:

mkdir -p ~/openclaw-cpu-profiles
systemctl --user stop openclaw-gateway
timeout 75s env NODE_OPTIONS="--cpu-prof --cpu-prof-dir=$HOME/openclaw-cpu-profiles" \
  /usr/bin/node /usr/lib/node_modules/openclaw/dist/index.js gateway --port 18789
ls -lh ~/openclaw-cpu-profiles
/usr/bin/openclaw gateway restart

That should produce a .cpuprofile file. If you can attach it in the thread, it’ll show what’s burning the CPU.

charred sluice
#

it printed true, so did the isolation test:
ubuntu@polymarket-mc:~$ sleep 45
ubuntu@polymarket-mc:~$ ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd
PID STAT ELAPSED %CPU %MEM RSS CMD
1704573 Rsl 01:05 107 3.6 902812 openclaw-gateway

mental hamletBOT
#

Yep — if Telegram is now disabled and it’s still 107%, we can stop chasing Telegram. This needs a CPU profile.

First confirm Telegram is actually off:

/usr/bin/openclaw config get channels.telegram.enabled --json

Then capture a short profile:

mkdir -p ~/openclaw-cpu-profiles

systemctl --user stop openclaw-gateway
ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd
timeout 75s env NODE_OPTIONS="--cpu-prof --cpu-prof-dir=$HOME/openclaw-cpu-profiles" \
  /usr/bin/node /usr/lib/node_modules/openclaw/dist/index.js gateway --port 18789

ls -lh ~/openclaw-cpu-profiles

/usr/bin/openclaw gateway restart

Attach the newest .cpuprofile file here if Discord allows it. If not, zip it:

cd ~/openclaw-cpu-profiles
zip latest-cpuprofile.zip "$(ls -t *.cpuprofile | head -1)"

That should show the exact function/subsystem burning the CPU.
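The `ls -t ... | head -1` pattern above yields an empty name when no profile was written, and `zip` then fails in a confusing way. A slightly more defensive sketch; `newest_profile` is a hypothetical helper name:

```shell
# newest_profile DIR: print the most recently modified *.cpuprofile in DIR,
# or return 1 when there is none, so callers never zip an empty file name.
newest_profile() {
  local dir="$1" newest
  newest=$(ls -t "$dir"/*.cpuprofile 2>/dev/null | head -n 1)
  [ -n "$newest" ] || return 1
  printf '%s\n' "$newest"
}

# Usage on the VPS:
#   if p=$(newest_profile ~/openclaw-cpu-profiles); then
#     zip latest-cpuprofile.zip "$p"
#   else
#     echo "no .cpuprofile found; check whether the gateway exited cleanly"
#   fi
```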

charred sluice
#

i am on my desktop and connecting to my vps (that openclaw runs on) via powershell. how can i get the file to attach to you?

mental hamletBOT
#

Use scp from your Windows PowerShell to copy it down from the VPS, then drag/drop the zip into Discord.

On the VPS first:

cd ~/openclaw-cpu-profiles
zip latest-cpuprofile.zip "$(ls -t *.cpuprofile | head -1)"
pwd
ls -lh latest-cpuprofile.zip

Then in Windows PowerShell on your desktop:

scp ubuntu@polymarket-mc:~/openclaw-cpu-profiles/latest-cpuprofile.zip $env:USERPROFILE\Downloads\

If polymarket-mc doesn’t resolve from Windows, use the VPS IP instead:

scp ubuntu@YOUR_VPS_IP:~/openclaw-cpu-profiles/latest-cpuprofile.zip $env:USERPROFILE\Downloads\

If you use a specific SSH key:

scp -i $env:USERPROFILE\.ssh\your_key ubuntu@YOUR_VPS_IP:~/openclaw-cpu-profiles/latest-cpuprofile.zip $env:USERPROFILE\Downloads\

Then attach Downloads\latest-cpuprofile.zip here.

charred sluice
#

is there another way? can i log into my oracle vps?

mental hamletBOT
#

Yes — if it’s an Oracle Cloud VPS, the easiest “browser” way is usually the OCI Console Cloud Shell or Serial Console, but for downloading a file, SSH/SCP is still simplest.

A few options:

Option A — use WinSCP GUI

  1. Install/open WinSCP.
  2. Host: your VPS public IP
  3. User: ubuntu
  4. Auth: your SSH private key
  5. Browse to:
    ~/openclaw-cpu-profiles/latest-cpuprofile.zip
  6. Download it, then attach here.

Option B — print a temporary download link from the VPS
Only do this if you’re okay exposing the profile briefly to anyone who can reach the URL. On the VPS:

cd ~/openclaw-cpu-profiles
python3 -m http.server 8123 --bind 0.0.0.0

Then from your desktop browser:

http://YOUR_VPS_PUBLIC_IP:8123/latest-cpuprofile.zip

You may need to open port 8123 in Oracle security rules/firewall. Stop it after downloading with Ctrl+C. I’d prefer WinSCP over this.

Option C — don’t attach; inspect top CPU locally
On the VPS, run:

node --prof-process ~/openclaw-cpu-profiles/*.log | head -n 120

But .cpuprofile may not produce a .log, so attachment is better.

My recommendation: WinSCP. It avoids opening public ports and is easier than fighting PowerShell paths.

charred sluice
#

ubuntu@polymarket-mc:~$ node --prof-process ~/openclaw-cpu-profiles/*.log | head -n 120
Please provide a valid isolate file as the final argument.

mental hamletBOT
#

--prof-process is for a different kind of V8 profile. --cpu-prof makes .cpuprofile files, so use this instead:

ls -lh ~/openclaw-cpu-profiles
PROFILE=$(ls -t ~/openclaw-cpu-profiles/*.cpuprofile | head -1)

node - "$PROFILE" <<'NODE'
const fs = require("fs");
const p = process.argv[2];
const prof = JSON.parse(fs.readFileSync(p, "utf8"));
const nodes = new Map(prof.nodes.map(n => [n.id, n]));
const counts = new Map();

for (const id of prof.samples || []) {
  const n = nodes.get(id);
  if (!n) continue;
  const c = n.callFrame || {};
  const key = `${c.functionName || "(anonymous)"} ${c.url || ""}:${c.lineNumber ?? ""}`;
  counts.set(key, (counts.get(key) || 0) + 1);
}

const total = (prof.samples || []).length || 1;
console.log("PROFILE", p);
console.log("SAMPLES", total);
[...counts.entries()]
  .sort((a,b) => b[1] - a[1])
  .slice(0, 50)
  .forEach(([k,v], i) => {
    console.log(`${String(i+1).padStart(2)} ${String(v).padStart(6)} ${(v/total*100).toFixed(1)}% ${k}`);
  });
NODE

Paste that output. It should be small enough for Discord and should tell us what’s spinning.

charred sluice
#

it's too big to send

mental hamletBOT
#

No worries — reduce it to just the top 15 lines and strip long paths:

PROFILE=$(ls -t ~/openclaw-cpu-profiles/*.cpuprofile | head -1)

node - "$PROFILE" <<'NODE' | head -n 25
const fs = require("fs");
const p = process.argv[2];
const prof = JSON.parse(fs.readFileSync(p, "utf8"));
const nodes = new Map(prof.nodes.map(n => [n.id, n]));
const counts = new Map();

for (const id of prof.samples || []) {
  const n = nodes.get(id);
  if (!n) continue;
  const c = n.callFrame || {};
  const file = (c.url || "").split("/").pop();
  const key = `${c.functionName || "(anonymous)"} ${file}:${c.lineNumber ?? ""}`;
  counts.set(key, (counts.get(key) || 0) + 1);
}

const total = (prof.samples || []).length || 1;
console.log("SAMPLES", total);
[...counts.entries()]
  .sort((a,b) => b[1] - a[1])
  .slice(0, 15)
  .forEach(([k,v], i) => {
    console.log(`${i+1}. ${v} ${(v/total*100).toFixed(1)}% ${k}`);
  });
NODE

Paste only that output.

charred sluice
#

i managed to connect back to my openclaw via the control web page.... this is latest, after i got it to check the profile stuff.. any guidance from here?

Yes — and it’s already helped.

The profile shows the hot path is not Mission Control, not Telegram, and not the logs.tail UI spam directly.

It’s mostly this chain:

prewarmConfiguredPrimaryModel
→ resolveModel / resolveModelWithRegistry
→ provider/plugin discovery
→ loadOpenClawPlugins
→ resolveRuntimePluginRegistry
→ mirrorBundledPluginRuntimeRoot
→ copyFile / readFile / JSON5 manifest parsing
In plain English: the gateway is repeatedly re-loading plugin/provider/model registry stuff and mirroring bundled plugin runtime files, instead of using a cheap cached registry.

That matches the earlier profile too. So the split registry cleanup helped remove stale NVM paths, but the gateway is still doing expensive plugin/runtime/model discovery repeatedly.

Current status:

gateway restarted cleanly
profiling override removed
gateway CPU still high-ish: ~80–115% of one core
One annoying wrinkle: the first profiling attempt wrote the profile to /home/ubuntu/CPU.20260428.203112.1779279.0.001.cpuprofile instead of the profile dir, because systemd split the NODE_OPTIONS line. But that file is valid and confirms the culprit.

My next move would be to inspect OpenClaw’s plugin runtime mirror/cache code and look for config/env switches or a patch to stop repeated mirrorBundledPluginRuntimeRoot / registry rebuilds. This is now looking like an OpenClaw bug, not your Mission Control work.

mental hamletBOT
#

Yes — that stack is very useful. I’d treat this as an OpenClaw plugin/model-registry hot-loop bug, not your VPS or Mission Control.

Try this low-risk workaround first: extend plugin discovery/manifest cache TTL from the tiny default to 60s.

/usr/bin/openclaw config set env.OPENCLAW_PLUGIN_DISCOVERY_CACHE_MS '"60000"' --strict-json
/usr/bin/openclaw config set env.OPENCLAW_PLUGIN_MANIFEST_CACHE_MS '"60000"' --strict-json
/usr/bin/openclaw gateway restart

sleep 60
ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd

If CPU drops, that strongly confirms “registry/discovery rebuilding too often.”

If it does not drop, next useful check:

/usr/bin/openclaw plugins list --enabled
/usr/bin/openclaw gateway status --deep

Then I’d file a GitHub issue with:

  • OpenClaw 2026.4.26 (be8c246)
  • CPU 80–115% idle
  • profile hot path: prewarmConfiguredPrimaryModel → resolveModel → loadOpenClawPlugins → mirrorBundledPluginRuntimeRoot
  • note that Telegram disabled didn’t change it
  • attach the .cpuprofile if possible

Issue link: https://github.com/openclaw/openclaw/issues

I would not disable all plugins globally yet — that may break channels/providers. The cache TTL workaround is the least destructive next move.

charred sluice
#

Better, but not fixed.

Before Krill fix:

gateway CPU: ~100–115%
RSS: ~1.1–1.2GB
Now:

gateway CPU: ~75–81%
RSS: ~650–700MB
So the cache TTL change seems to have helped memory and knocked CPU down a bit, but the gateway is still burning most of one core.

One catch: the values are in openclaw.json, but I don’t see them in the actual process environment:

OPENCLAW_PLUGIN_DISCOVERY_CACHE_MS / MANIFEST_CACHE_MS not present in /proc/<pid>/environ

So either OpenClaw reads env.* internally and doesn’t export it to process env, or Krill’s config path isn’t actually applied to the gateway process environment. That’s important.
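
A small sketch of that check, for reference. `/proc/<pid>/environ` is a NUL-separated list of `NAME=value` entries captured when the process started, so the helper below (a hypothetical name, not an OpenClaw command) just splits on NUL and looks for an exact variable name. It takes the file path as an argument so it can be pointed at any PID.

```shell
# Print the value of variable $2 from a NUL-separated environ file $1
# (for example /proc/<pid>/environ). Exits non-zero if the variable is absent.
environ_get() {
  tr '\0' '\n' < "$1" | awk -v name="$2" '
    index($0, name "=") == 1 { print substr($0, length(name) + 2); found = 1 }
    END { exit found ? 0 : 1 }'
}
```

Example: `environ_get /proc/$(pidof openclaw-gateway)/environ OPENCLAW_PLUGIN_DISCOVERY_CACHE_MS || echo "not in process env"`. This assumes `pidof` returns a single PID; if it returns several, check each one.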

My take: partial improvement, not conclusive.

Next I’d do:

/usr/bin/openclaw plugins list --enabled
/usr/bin/openclaw gateway status --deep

Then if needed, set those two vars via the systemd drop-in directly so we know the process actually receives them.

mental hamletBOT
#

Yep, I agree with that read. Since the hot code checks process.env, let’s set the cache vars in systemd so there’s no ambiguity.

Run:

mkdir -p ~/.config/systemd/user/openclaw-gateway.service.d

cat > ~/.config/systemd/user/openclaw-gateway.service.d/plugin-cache.conf <<'EOF'
[Service]
Environment=OPENCLAW_PLUGIN_DISCOVERY_CACHE_MS=300000
Environment=OPENCLAW_PLUGIN_MANIFEST_CACHE_MS=300000
EOF

systemctl --user daemon-reload
/usr/bin/openclaw gateway restart
#

Verify they’re actually in the process:

PID=$(pidof openclaw-gateway)
tr '\0' '\n' < /proc/$PID/environ | grep OPENCLAW_PLUGIN
sleep 60
ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd

If CPU drops further, we’ve basically proven plugin discovery is rebuilding too often. If it stays ~75–80%, paste:

/usr/bin/openclaw plugins list --enabled
/usr/bin/openclaw gateway status --deep

Then I’d file this upstream with the profile path details. This is looking bug-shaped.

charred sluice
#

ok it's dropped to 64%

mental hamletBOT
#

That’s meaningful: the systemd env cache vars helped, so the hot path is very likely repeated plugin discovery/manifest/model-provider work.

64% is still too high, though. I’d do two things now:

  1. Confirm the env really stuck + get enabled plugin list:
PID=$(pidof openclaw-gateway)
tr '\0' '\n' < /proc/$PID/environ | grep OPENCLAW_PLUGIN
/usr/bin/openclaw plugins list --enabled
  2. Give it a stronger cache TTL and restart once:
cat > ~/.config/systemd/user/openclaw-gateway.service.d/plugin-cache.conf <<'EOF'
[Service]
Environment=OPENCLAW_PLUGIN_DISCOVERY_CACHE_MS=3600000
Environment=OPENCLAW_PLUGIN_MANIFEST_CACHE_MS=3600000
EOF

systemctl --user daemon-reload
/usr/bin/openclaw gateway restart
sleep 90
ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd

If it stays above ~20–30% idle after that, please file a GitHub issue with the profile stack + these cache results. The workaround reducing CPU from ~110% → ~64% is strong evidence, but there’s still another repeated path or cache miss.
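
One caveat when reading these numbers: as far as I know, procps `ps` reports %CPU as total CPU time divided by how long the process has been running, so shortly after a restart it is a lifetime average that drifts slowly rather than a live reading. A rough sketch of an interval measurement from `/proc/<pid>/stat` tick deltas is below (Linux only; all function names are made up for illustration). `pidstat -p <pid> 5` from sysstat gives a similar interval view if it is installed.

```shell
# Total CPU ticks (utime + stime) used so far by PID $1, from /proc/<pid>/stat.
proc_ticks() {
  local stat rest
  stat=$(cat "/proc/$1/stat") || return 1
  rest=${stat##*') '}        # drop "pid (comm) " so $1 below is field 3 (state)
  set -- $rest
  echo $(( ${12} + ${13} ))  # stat fields 14 (utime) and 15 (stime)
}

# CPU% over an interval. Args: start_ticks end_ticks seconds ticks_per_second.
cpu_pct() {
  awk -v a="$1" -v b="$2" -v s="$3" -v hz="$4" \
    'BEGIN { printf "%.1f\n", (b - a) / hz / s * 100 }'
}

# Sample PID $1 every $2 seconds, $3 times, printing one interval reading per line.
sample_cpu() {
  local pid=$1 interval=$2 count=$3 hz before after
  hz=$(getconf CLK_TCK)
  while [ "$count" -gt 0 ]; do
    before=$(proc_ticks "$pid") || return 1
    sleep "$interval"
    after=$(proc_ticks "$pid") || return 1
    echo "$(date -u +%H:%M:%S) $(cpu_pct "$before" "$after" "$interval" "$hz")%"
    count=$((count - 1))
  done
}
```

For example, `sample_cpu "$(pidof openclaw-gateway)" 10 6` prints six 10-second readings. If those stay near 100% while `ps` slowly falls, the drop in `ps` is just the average catching up.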

charred sluice
#

Worse, unfortunately.

The 1-hour TTL definitely stuck:

OPENCLAW_PLUGIN_DISCOVERY_CACHE_MS=3600000
OPENCLAW_PLUGIN_MANIFEST_CACHE_MS=3600000
PATH=/usr/bin:/bin

But CPU after restart did not drop:

+0s 111%
+30s 109%
+60s 106%
+90s 105%
now ~104%

Memory is back up too:

RSS ~1.5GB

Enabled plugins are:

anthropic
brave
google
memory-core
memory-wiki
ollama
telegram
tokenjuice
openclaw-web-search

So: cache TTL alone is not enough. The earlier improvement to ~64–70% was probably transient / warmup variance, not a real fix.

This is now properly bug-shaped. The strongest evidence remains the CPU profile:

/home/ubuntu/CPU.20260428.203112.1779279.0.001.cpuprofile

Hot path:

prewarmConfiguredPrimaryModel
→ resolveModel / resolveModelWithRegistry
→ provider/plugin discovery
→ loadOpenClawPlugins
→ resolveRuntimePluginRegistry
→ mirrorBundledPluginRuntimeRoot
→ copyFile/readFile/JSON5 parsing

Next practical options:

Try isolating the one global third-party plugin: openclaw-web-search
Lowest blast radius among plugins, but web search tool disappears temporarily.

Disable provider plugins you don’t need right now: anthropic, brave, google, ollama etc.
More disruptive.

File upstream with profile + results and leave gateway running hot for now.

My recommendation: try disabling only openclaw-web-search first, because it’s the only global non-stock plugin in the list.

======

note - i don't think the web search is the problem , but if you see lots of new issues on github related or linked to this problem i have then maybe - can you check the github in case there are similar recent issues (last week or so)

mental hamletBOT
#

I checked. I don’t see a pile of brand-new GitHub issues from the last week that exactly match your prewarmConfiguredPrimaryModel → loadOpenClawPlugins → mirrorBundledPluginRuntimeRoot profile.

What I do see:

  • Today’s 2026.4.26 issue #73685 is plugin-related, but it’s specifically an external Discord plugin failing because plugin-sdk/discord export is missing. That doesn’t match your enabled plugin list unless you also have a stale Discord plugin somewhere. (github.com)
  • Older but very relevant #28587 reports sustained 75–85%+ gateway CPU on CPU-only VPS hardware due to the plugin runtime eagerly loading channel SDKs. That sounds conceptually close. (github.com)
  • #52031 is also close: redundant loadOpenClawPlugins() / plugin registration work during snapshot loads causing severe performance overhead. (github.com)
  • #63249 is a different 99% CPU CLI hot-loop, but also points toward Linux/plugin-loader initialization paths. (github.com)

So I agree with you: openclaw-web-search doesn’t look like the obvious culprit. Your profile looks more like a plugin-loader/provider-registry regression or recurrence of the older plugin-runtime CPU issues.

I’d file a new issue rather than disabling random plugins. Include:

  • OpenClaw 2026.4.26
  • VPS/Linux, Node 22.22, system install
  • idle gateway CPU ~100%, RSS up to ~1.5GB
  • Telegram disabled didn’t fix it
  • plugin cache env vars didn’t fix it
GitHub

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞 - Issues · openclaw/openclaw

#
  • hot path from profile
  • mention related issues: #28587, #52031, #63249, and maybe #73685 only as “recent 4.26 plugin regression, but likely separate”

If you need relief now, the most pragmatic test is a controlled rollback to 2026.4.25, but back up ~/.openclaw/openclaw.json first.

charred sluice
#

it's ok - i had this issue before that release (i had rolled back but that didn't resolve). could it be anything to do with the 'fallbacks' which use ollama which are on my desktop, not the vps, and connect via tailscale? because the cpu comes down bit by bit.. it's at 54 now

mental hamletBOT
#

Yes — that’s plausible.

Not because OpenClaw is necessarily running the fallback model, but because startup/model resolution can still touch provider/fallback config and Ollama discovery. If your Ollama fallback is on a desktop over Tailscale, slow/unreachable /api/tags or timeouts could fit the “CPU comes down bit by bit” pattern.

I’d test that before disabling random plugins:

/usr/bin/openclaw config get agents.defaults.model --json
/usr/bin/openclaw config get models.providers --json

Don’t paste secrets, but check whether fallbacks includes ollama/... or a custom Ollama provider.

Then temporarily remove model fallbacks:

#
cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.bak-$(date +%Y%m%d-%H%M%S)

/usr/bin/openclaw config set agents.defaults.model.fallbacks '[]' --strict-json
/usr/bin/openclaw gateway restart

sleep 120
ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd

If CPU drops hard, the Ollama fallback/provider path is implicated.

Also test VPS → desktop Ollama directly, replacing the URL with your configured baseUrl:

curl --max-time 10 http://TAILSCALE-IP-OR-HOST:11434/api/tags
charred sluice
#

I think Krill’s fallback/Ollama theory is worth testing before locally patching OpenClaw. It’s lower-risk and fits the slow “comes down bit by bit” behaviour. I’ll check the config first without touching anything.

I agree with Krill’s next test: remove model fallbacks temporarily before disabling plugins or patching OpenClaw.

I checked the config and there is a real lead: (i will put in separate new message as discord doesn't like long messages):

mental hamletBOT
#

Yep, that’s the right direction. Wait for the config lead, then do the smallest reversible test:

cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.bak-$(date +%Y%m%d-%H%M%S)

/usr/bin/openclaw config set agents.defaults.model.fallbacks '[]' --strict-json
/usr/bin/openclaw gateway restart

sleep 120
ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd

If CPU drops, don’t patch OpenClaw yet — fix the fallback/provider config. If it doesn’t, restore or leave the no-fallback state depending on preference and continue with plugin/provider profiling.

charred sluice
#

```text
fallbacks:
[
  "google/gemini-2.5-flash",
  "ollama/qwen3.5:9b",
  "ollama/gemma4"
]
```

Ollama is configured at:

```text
http://100.96.199.51:11434
```

Direct VPS → desktop Ollama is fast right now:

```text
/api/tags returned in ~0.018s
models: qwen3.5:4b, qwen3.5:9b, gema4:e4b
```

But there’s a mismatch:

```text
configured fallback: ollama/gemma4
actual Ollama model: gema4:e4b
```

mental hamletBOT
#

Nice catch. The endpoint is fast, so Tailscale/Ollama reachability is probably not the main cause. But the bad fallback model ID absolutely can create repeated resolution/catalog work.

Do the reversible test first:

cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.bak-$(date +%Y%m%d-%H%M%S)

/usr/bin/openclaw config set agents.defaults.model.fallbacks '[]' --strict-json
/usr/bin/openclaw gateway restart

sleep 120
ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd

If CPU drops, restore only valid fallbacks, e.g.:

#
/usr/bin/openclaw config set agents.defaults.model.fallbacks '["google/gemini-2.5-flash","ollama/qwen3.5:9b","ollama/gema4:e4b"]' --strict-json
/usr/bin/openclaw gateway restart

Also, double-check the model spelling — gema4:e4b looks unusual compared with gemma4, but use exactly what /api/tags reports.
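
To take the spelling guesswork out of it, here is a rough sketch that compares configured fallback IDs against what `/api/tags` actually returns. It assumes that an OpenClaw ID of the form `ollama/<name>` maps to the Ollama model name `<name>` (that is how this thread reads, but it is an assumption), and it pulls `"name"` values out of the JSON with grep, which is crude; `jq` is the better tool if it is installed. Function names are hypothetical.

```shell
# List model names from an Ollama /api/tags JSON response read on stdin.
# Crude grep-based extraction of "name" values.
ollama_model_names() {
  grep -oE '"name"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed -E 's/^"name"[[:space:]]*:[[:space:]]*"(.*)"$/\1/'
}

# Print every ollama/... fallback with no exact match in the tags file.
# $1 = file holding the /api/tags JSON; remaining args = fallback IDs.
missing_ollama_fallbacks() {
  local tags_file=$1 id
  shift
  for id in "$@"; do
    case $id in
      ollama/*)
        # Non-Ollama IDs are skipped; Ollama IDs must match a name exactly.
        ollama_model_names < "$tags_file" | grep -Fxq -- "${id#ollama/}" || echo "$id"
        ;;
    esac
  done
}
```

Example: save the response with `curl -s --max-time 10 http://TAILSCALE-IP-OR-HOST:11434/api/tags > /tmp/tags.json`, then run `missing_ollama_fallbacks /tmp/tags.json google/gemini-2.5-flash ollama/qwen3.5:9b ollama/gemma4`. Anything it prints has no exact match. The match is deliberately strict, so a bare name that Ollama itself would resolve to a default tag will still be flagged.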

charred sluice
#

Fallback test result: not the cause.

Removing fallbacks completely did not improve CPU:

fallbacks: []
after 120s: ~105% CPU
instant pidstat average: ~104.45%
RSS: ~1.3GB then rising to ~1.58GB

And node.list was still slow during the test:

node.list ~7.8s
node.list ~7.9s

So Krill’s Ollama/bad fallback theory was plausible, but this test says model fallbacks are not the main driver.

Important: fallbacks are currently still empty:

I’d restore them, but with the corrected Ollama model ID:

not ollama/gemma4.

Say OK restore valid fallbacks and I’ll put those back and restart once.

Running the fallback isolation as a detached systemd job so the gateway restart doesn’t kill the test halfway through. It’ll back up config, remove fallbacks, restart, wait 120s, then log CPU samples.

Started. The detached test is running now and will survive the gateway restart.

I also set a one-shot wake for ~21:44 UTC to read the log and report back automatically if this chat gets interrupted. and log :
Yes — the fallback test log completed and reported back.

Log file:

/home/ubuntu/openclaw-cpu-profiles/fallback-test-20260428T214134Z.log

Result:

Fallbacks were removed successfully.
Gateway restarted.
CPU stayed bad:

+30s 107%
+60s 106%
+90s 106%
+120s 105%
pidstat average: 104.45%

Memory climbed:

~751MB → ~1.3GB → currently ~1.58GB

node.list still had slow calls around:

7.8s

So the log says removing fallbacks did not help. Fallbacks are currently still empty until we restore them.

mental hamletBOT
#

Fallbacks ruled out — good test. Yes, I’d restore them, but fix the bad Ollama ID:

/usr/bin/openclaw config set agents.defaults.model.fallbacks '["google/gemini-2.5-flash","ollama/qwen3.5:9b","ollama/gema4:e4b"]' --strict-json
/usr/bin/openclaw gateway restart

Next I’d isolate provider plugins, not web search randomly. First check primary:

/usr/bin/openclaw config get agents.defaults.model.primary --json
/usr/bin/openclaw plugins list --enabled

Then temporarily disable only providers you’re not actively using. Example if your primary is Google/Anthropic and you can live without local Ollama briefly:

#
/usr/bin/openclaw config set plugins.entries.ollama.enabled false --strict-json
/usr/bin/openclaw gateway restart
sleep 120
ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd

If no change, re-enable it and test the next nonessential plugin. Don’t disable memory-core yet; higher blast radius. The profile + node.list ~8s still smells like plugin/provider registry reload, so narrowing which enabled plugin triggers it is the next useful clue.
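
If you end up testing several plugins, a tiny helper that only prints the command sequence for one plugin keeps each round identical and reviewable. This is a sketch: it assumes `plugins.entries.<name>.enabled` works for other plugin names the same way it does for `ollama` above, which isn't confirmed here. It runs nothing itself.

```shell
# Print (do not run) the disable / measure / re-enable sequence for one plugin.
# $1 = plugin name, $2 = seconds to wait before measuring (default 120).
# plan_isolation is a hypothetical helper, not an OpenClaw command.
plan_isolation() {
  local name=$1 wait_s=${2:-120}
  cat <<EOF
/usr/bin/openclaw config set plugins.entries.$name.enabled false --strict-json
/usr/bin/openclaw gateway restart
sleep $wait_s
ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd
/usr/bin/openclaw config set plugins.entries.$name.enabled true --strict-json
/usr/bin/openclaw gateway restart
EOF
}
```

For example, `plan_isolation ollama` prints the six lines to review and paste. Testing one plugin per round and re-enabling it before the next keeps the results attributable to a single change.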

charred sluice
#

ok - it said this and doing it now:
Krill’s right on the direction, but typo again:

ollama/gema4:e4b ❌
ollama/gemma4:e4b ✅

Also, your primary is:

openai-codex/gpt-5.5

So testing ollama next is sensible. It’s non-primary and lower blast radius than memory plugins.

Recommended sequence:

Restore valid fallbacks:

Restart once.

Then temporarily disable only ollama, restart, wait 120s, measure CPU.

If CPU drops, Ollama provider/plugin is implicated. If not, re-enable it and test the next provider plugin.

Because this involves restarts, I need the explicit nod. Say:

OK restore fallbacks and test ollama

mental hamletBOT
#

Yep — good correction. Use ollama/gemma4:e4b, not my typo.

Since primary is openai-codex/gpt-5.5, testing Ollama is low-risk. The exact restore command should be:

/usr/bin/openclaw config set agents.defaults.model.fallbacks '["google/gemini-2.5-flash","ollama/qwen3.5:9b","ollama/gemma4:e4b"]' --strict-json
/usr/bin/openclaw gateway restart

Then Ollama isolation:

/usr/bin/openclaw config set plugins.entries.ollama.enabled false --strict-json
/usr/bin/openclaw gateway restart
sleep 120
ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd
#

After the test, restore Ollama unless it clearly fixed CPU:

/usr/bin/openclaw config set plugins.entries.ollama.enabled true --strict-json
/usr/bin/openclaw gateway restart
charred sluice
#

while i wait for reply from my openclaw, is this still a possible bug to raise with openclaw? or more likely bad config on my instance? or hard to say atm?

mental hamletBOT
#

Still very much possible bug territory.

My read right now:

  • Not just “normal bad config”: an idle gateway sitting at ~100% CPU with RSS climbing to ~1.5GB is not expected.
  • Config may be the trigger: enabled provider plugins, model registry, Ollama/custom provider, or memory/web plugins may be causing the bad path.
  • But the behavior is bug-shaped: a config edge case should fail cheaply or cache properly, not repeatedly burn a core in loadOpenClawPlugins → mirrorBundledPluginRuntimeRoot.

So I’d phrase the issue as:

“Idle gateway high CPU/memory triggered by this plugin/provider config; CPU profile points to repeated plugin/model registry runtime mirror work.”

That’s useful upstream even if the root cause is one config entry. The tests you’re doing now are exactly what makes the bug report strong: fallbacks ruled out, Telegram ruled out, cache env didn’t solve it, now isolating provider plugins.

charred sluice
#

ok and even if slight config related, probably best to fully isolate what the cause / issue is, then raise a bug. and i then want to help create a fix for it too, to help the team

mental hamletBOT
#

Exactly. That’s the best path: isolate the trigger, then file a tight bug with enough evidence that someone can fix it fast.

Good bug report shape:

  1. Environment

    • OpenClaw version
    • VPS OS / Node version
    • systemd gateway
    • enabled plugins
  2. Symptoms

    • idle gateway CPU ~100%
    • RSS climbs to ~1.5GB
    • RPC calls like node.list ~8s
  3. Ruled out

    • split install fixed
    • Telegram disabled didn’t help
    • model fallbacks removed didn’t help
    • plugin cache TTL didn’t help
  4. Profile evidence

    • prewarmConfiguredPrimaryModel → resolveModelWithRegistry → loadOpenClawPlugins → mirrorBundledPluginRuntimeRoot
    • attach .cpuprofile
  5. Minimal trigger

    • whatever your isolation finds next: e.g. “only happens with plugins.entries.ollama.enabled=true” or “only with memory-wiki + provider plugins”.

If you want to help fix it, the likely code area is plugin/model registry caching around repeated resolveRuntimePluginRegistry / bundled runtime mirror. But first isolate the trigger so the team isn’t guessing.

charred sluice
#

this is what came back..

Yep, restart completed and the gateway is up.

Current important bits:

gateway PID: 1844309
CPU: ~100% of one core
ollama plugin: disabled
fallbacks: restored with corrected gemma4:e4b

So we need to either:

re-enable Ollama now, since it didn’t help, or
leave it disabled while testing the next plugin.

I’d re-enable it before moving on, to keep tests clean.

===

i think we are now past the testing of different plug ins. can you give as much detail as possible for what to include in the bug ticket, and i will get my openclaw to create the ticket

mental hamletBOT
#

Yes. First, I’d re-enable Ollama so the bug report reflects your normal config:

/usr/bin/openclaw config set plugins.entries.ollama.enabled true --strict-json
/usr/bin/openclaw gateway restart

Then use this for the GitHub issue: https://github.com/openclaw/openclaw/issues

## Title

Idle gateway high CPU/RSS on VPS; CPU profile points to repeated plugin/model registry + bundled runtime mirror work

## Summary
#
My OpenClaw gateway becomes almost unusable on a VPS because `openclaw-gateway` idles at ~100% of one CPU core and RSS grows to ~1.3–1.6GB. RPC/control operations become slow; `node.list` calls were taking ~7.8s.

A CPU profile points at repeated model/plugin registry work:

prewarmConfiguredPrimaryModel
→ resolveModel / resolveModelWithRegistry
→ provider/plugin discovery
→ loadOpenClawPlugins
→ resolveRuntimePluginRegistry
→ mirrorBundledPluginRuntimeRoot
→ copyFile / readFile / JSON5 manifest parsing

This looks like repeated plugin/provider/model registry rebuilding or bundled runtime mirror work during/after startup.

## Environment
#

- OpenClaw: 2026.4.26 (be8c246)
- Host: Ubuntu VPS, systemd user service
- Runtime: Node v22.22.0
- Gateway command after cleanup:
  `/usr/bin/node /usr/lib/node_modules/openclaw/dist/index.js gateway --port 18789`
- Gateway bind: loopback, port 18789
- Primary model:
  `openai-codex/gpt-5.5`
- Normal fallbacks:
  `google/gemini-2.5-flash`
  `ollama/qwen3.5:9b`
  `ollama/gemma4:e4b`
- Ollama endpoint:
  `http://100.96.199.51:11434` via Tailscale
- Enabled plugins observed:
  - anthropic
  - brave
  - google
  - memory-core
  - memory-wiki
  - ollama
  - telegram
  - tokenjuice
  - openclaw-web-search

## Symptoms

`ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd` repeatedly shows the gateway using most/all of one CPU core while apparently idle.
#
Examples:

- Initially: ~100–115% CPU, RSS around 1.1–1.5GB
- After multiple restarts/tests: still ~100–107% CPU
- `node.list` remained slow, around 7.8s
- Memory climbed during tests, e.g. ~751MB → ~1.3GB → ~1.58GB

## Things ruled out / tested

### Split install / stale service

There was initially a split install:
- nvm OpenClaw: 2026.4.26
- `/usr/bin/openclaw`: 2026.4.22
#
The system install was updated and the service was rewritten. Gateway now runs from:

`/usr/bin/node /usr/lib/node_modules/openclaw/dist/index.js gateway --port 18789`

This did not fix the high CPU.

### Telegram

Telegram had IPv6/network warnings, and IPv6 to Telegram failed while IPv4 worked.

Tested disabling Telegram:

`channels.telegram.enabled=false`

After restart and waiting, CPU stayed around ~105–107%. So Telegram does not appear to be the main cause.
#

### Plugin discovery cache env

Tried setting these in systemd so they are visible in `/proc/<pid>/environ`:

`OPENCLAW_PLUGIN_DISCOVERY_CACHE_MS=3600000`
`OPENCLAW_PLUGIN_MANIFEST_CACHE_MS=3600000`

The vars stuck, but after restart CPU still stayed around ~104–111%. So cache TTL alone did not solve it.

### Model fallbacks / Ollama fallback

Temporarily removed all model fallbacks:

`agents.defaults.model.fallbacks=[]`
#

After restart and 120s:
- CPU stayed around ~105%
- pidstat average around ~104.45%
- RSS continued rising
- `node.list` still around ~7.8s

So model fallbacks are not the main cause.

### Ollama plugin

Temporarily disabled only the Ollama plugin:

`plugins.entries.ollama.enabled=false`
#
After restart:
- Gateway still around ~100% CPU

So Ollama plugin alone does not appear to be the main cause.

## CPU profile evidence

A valid CPU profile was captured at:

`/home/ubuntu/CPU.20260428.203112.1779279.0.001.cpuprofile`

The hot path observed:

prewarmConfiguredPrimaryModel
→ resolveModel / resolveModelWithRegistry
→ provider/plugin discovery
→ loadOpenClawPlugins
→ resolveRuntimePluginRegistry
→ mirrorBundledPluginRuntimeRoot
→ copyFile / readFile / JSON5 manifest parsing

Please see attached `.cpuprofile`.

## Expected behavior

An idle gateway should not continuously burn ~100% of one CPU core or grow RSS toward ~1.5GB.

Plugin/model/provider registry discovery should either complete once, cache effectively, or fail cheaply.

## Actual behavior
#

Gateway repeatedly consumes ~100% CPU while idle, with profile evidence pointing at plugin/model registry and bundled runtime mirror work.

## Extra notes

This happened before/independent of 2026.4.26; rolling back previously did not resolve it.

This may be a config-triggered bug rather than a universal regression, but the gateway behavior still seems bug-shaped because an edge config should not cause repeated expensive plugin registry/runtime mirror work.

Attach the .cpuprofile if possible. If GitHub won’t accept it directly, zip it first.

charred sluice
mental hamletBOT
#

Nice — that’s the right move. Issue #73835 has the kind of evidence maintainers need.

For Codex, I’d point it at these likely areas:

  • src/gateway/server-startup-post-attach.ts
    • prewarmConfiguredPrimaryModel
  • src/agents/models-config.ts
    • ensureOpenClawModelsJson
    • fingerprint/cache behavior
  • src/plugins/loader.ts
    • resolveRuntimePluginRegistry
    • loadOpenClawPlugins
    • cache-key differences causing reloads
  • src/plugins/bundled-runtime-root.ts
    • mirrorBundledPluginRuntimeRoot

The fix should probably aim for: startup/model prewarm must not repeatedly trigger full plugin registry loads or runtime mirror refreshes. A good regression test would assert repeated model resolution/prewarm does not repeatedly call loadOpenClawPlugins / mirror runtime roots.

#

And yeah, attach the .cpuprofile zip to the issue if you can — that’s the strongest artifact.

charred sluice