#Gateway response slow/failing before and after upgrading to 2026.4.25


vestal oracleBOT
#

That sounds like two things may be stacked:

  1. Model/provider overload from Gemini preview.
  2. Gateway/channel delay: Telegram messages are not appearing in the Gateway UI for ~3 minutes. That happens before the model responds, so I wouldn’t treat this as only a model problem.

Please run/paste this safe diagnostic bundle:

openclaw --version
which openclaw
openclaw status --all
openclaw gateway status --deep
openclaw channels status --probe
openclaw models status

Then tail logs while sending one Telegram message:

openclaw logs --follow

Look for lines mentioning:

  • telegram polling stalls/timeouts
  • queued for ...ms
  • provider errors like overloaded, resource exhausted, 429, timeout
  • event-loop delay / liveness warnings
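As a rough illustration of the four symptom groups above, a small classifier can label each log line. The category names and regexes here are assumptions made for this sketch, not an OpenClaw log schema:

```bash
# Label one log line with the first symptom group it matches.
# Categories and patterns are illustrative guesses based on the list above.
classify_log_line() {
  local line=$1
  if printf '%s\n' "$line" | grep -qiE '(telegram|poll).*(stall|timeout)'; then
    echo "telegram-polling"
  elif printf '%s\n' "$line" | grep -qiE 'queued for [0-9]+ ?ms'; then
    echo "queue-delay"
  elif printf '%s\n' "$line" | grep -qiE 'event-loop|liveness'; then
    echo "event-loop"
  elif printf '%s\n' "$line" | grep -qiE 'overloaded|resource exhausted|429|timeout'; then
    echo "provider-error"
  else
    echo "other"
  fi
}
```

Feeding a log through it line by line and piping the labels into `sort | uniq -c` gives a quick tally of which group dominates. It spawns `grep` per line, so it is only suitable for a few hundred lines.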

Also run:

openclaw doctor
openclaw gateway restart

One nuance: if you manually selected a model with /model, that session selection is strict; it may fail visibly instead of walking your configured fallback chain. For debugging, /new is worth trying, but the 3-minute UI delay still points at Gateway/Telegram polling/queue/event-loop issues.

Docs for the relevant runbook: https://docs.openclaw.ai/help/faq and https://docs.openclaw.ai/gateway/troubleshooting

humble obsidian
#

I pasted the diagnostic bundle.
I then tailed the logs and sent a message, but got a message saying "is the gateway even running".
I then ran openclaw doctor and doctor --fix, and restarted the gateway.
Each of the openclaw commands took a long time compared to before.

vestal oracleBOT
#

I don’t see the diagnostic bundle in this thread — Discord may have eaten it if it was too long.

The new clue is important: if even openclaw ... CLI commands are slow, this is probably below Telegram/model now: gateway/service startup, machine resource pressure, or OpenClaw hanging while probing the gateway.

Please paste these as code blocks:

openclaw --version
which openclaw
time openclaw gateway status --deep
time openclaw status --all

And the latest logs without --follow:

tail -n 200 "$(ls -t /tmp/openclaw/openclaw-*.log | head -1)"
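The one-liner above prints a confusing error if no log file exists yet. A slightly more defensive sketch follows; the function name `latest_log` is made up here, and the default directory is simply the path seen in this thread:

```bash
# Print the most recently modified DIR/openclaw-*.log, or fail if none exist.
latest_log() {
  local dir=${1:-/tmp/openclaw} newest="" f
  for f in "$dir"/openclaw-*.log; do
    [ -e "$f" ] || continue          # the glob matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest=$f
    fi
  done
  if [ -z "$newest" ]; then
    echo "no openclaw-*.log files in $dir" >&2
    return 1
  fi
  printf '%s\n' "$newest"
}

# Usage: log=$(latest_log) && tail -n 200 "$log"
```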

Also this read-only system snapshot:

ps -eo pid,ppid,etime,%cpu,%mem,cmd | grep -E 'openclaw|node' | grep -v grep
free -h
df -h

If gateway status says RPC/connectivity is down or “is the gateway even running”, don’t keep testing Telegram yet — first we need the gateway healthy. Docs runbook: https://docs.openclaw.ai/gateway/troubleshooting

humble obsidian
#

What do you mean by "paste these as code blocks"?

vestal oracleBOT
#

In Discord, wrap the output with triple backticks so formatting stays readable.

Like this:

```bash
openclaw --version
which openclaw
time openclaw gateway status --deep
```

Then after you run it, paste the output like:

```text
<paste terminal output here>
```

The important part is the three backticks (```) before and after your output.

humble obsidian
#

'''🦞 OpenClaw 2026.4.25 (aa36ee6) — I'm the assistant your terminal demanded, not the one your sleep schedule requested.

│
◇
Service: systemd (enabled)
File logs: /tmp/openclaw/openclaw-2026-04-28.log
Command: /usr/bin/node /home/XXXXX/.npm-global/lib/node_modules/openclaw/dist/index.js gateway --port 18789
Service file: ~/.config/systemd/user/openclaw-gateway.service
Service env: OPENCLAW_GATEWAY_PORT=18789

Config (cli): ~/.openclaw/openclaw.json
Config (service): ~/.openclaw/openclaw.json

Gateway: bind=loopback (127.0.0.1), port=18789 (service args)
Probe target: ws://127.0.0.1:18789
Dashboard: http://127.0.0.1:18789/
Probe note: Loopback-only gateway; only local clients can connect.

Runtime: running (pid 387527, state active, sub running, last exit 0, reason 0)
Connectivity probe: ok
Capability: admin-capable

Listening: 127.0.0.1:18789
Troubles: run openclaw status
Troubleshooting: https://docs.openclaw.ai/troubleshooting

real 0m4.915s
user 0m5.535s
sys 0m0.967s'''

vestal oracleBOT
#

That worked — thanks. Tiny formatting note: Discord code blocks use backticks ``` not apostrophes ''', but I can read this fine.

Good news: the gateway is running and reachable:

  • Runtime: running
  • Connectivity probe: ok
  • listening on 127.0.0.1:18789

real 0m4.915s is slow-ish for gateway status, but not “hung”. Next we need to separate Telegram polling delay from model/queue delay.
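To compare timings like this across runs, the `real` field from `time` can be converted to seconds and checked against a threshold. This is only a sketch: the function names are invented, and the 3-second limit in the usage line is an arbitrary assumption, not an OpenClaw guideline:

```bash
# Convert the "real" value printed by the shell's `time` (e.g. 0m4.915s) to seconds.
real_to_seconds() {
  printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Succeed when the elapsed time is above LIMIT seconds.
is_slow() {   # usage: is_slow 0m4.915s 3
  awk -v t="$(real_to_seconds "$1")" -v limit="$2" 'BEGIN { exit !(t > limit) }'
}

# Usage: is_slow 0m4.915s 3 && echo "slower than expected"
```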

Please paste these next:

openclaw channels status --probe
openclaw models status
openclaw status --all

And since your log path is shown, after sending one Telegram test message, paste:

tail -n 300 /tmp/openclaw/openclaw-2026-04-28.log

In that log, the most useful lines will mention telegram, poll, queued, overloaded, timeout, 429, or event-loop delay.

humble obsidian
#

│
◒  Checking channel status (probe)
penclaw models status
◇
Gateway reachable.
- Telegram default: enabled, configured, running, connected, in:13m ago, mode:polling, bot:@XXXXXXXX, token:config, works

Tip: status --deep adds gateway health probes to status output (requires a reachable gateway).

vestal oracleBOT
#

That output is useful: Telegram itself probes as working, but in:13m ago means the gateway hasn’t seen an inbound Telegram message for 13 minutes. So now I’d look at the polling loop / duplicate bot consumer, not the model first.

Also looks like two commands got mashed together here:

Checking channel status (probe)
penclaw models status

Run/paste these one at a time, waiting for the shell prompt between each:

openclaw models status
openclaw status --all

Then this filtered log view:

grep -iE 'telegram|poll|409|conflict|getUpdates|timeout|stall|queued|overload|429|event-loop|liveness' /tmp/openclaw/openclaw-2026-04-28.log | tail -n 200

And check for duplicate gateway/processes using the same Telegram bot token:

ps -eo pid,ppid,etime,%cpu,%mem,cmd | grep -E 'openclaw|node' | grep -v grep

If the logs show 409 Conflict / terminated by other getUpdates request, that means another OpenClaw/gateway/bot process is polling the same Telegram bot token. That would explain Telegram delays/non-responsiveness pretty neatly.
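A narrower check for that specific conflict is sketched below. The pattern is based on the wording of Telegram's 409 error for competing `getUpdates` callers. Whether OpenClaw writes that text to its log verbatim is an assumption, and the function name is made up:

```bash
# Count log lines that look like Telegram's "another poller is active" error.
# Always prints a number, including 0 when nothing matches.
count_poll_conflicts() {
  local log=$1
  grep -ciE '409[^0-9]*conflict|terminated by other getUpdates request' "$log" || true
}

# Usage: count_poll_conflicts /tmp/openclaw/openclaw-2026-04-28.log
```

A non-zero count would suggest a second process using the same bot token. A zero count does not rule it out if the log format differs.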

humble obsidian
#

''' OpenClaw 2026.4.25 (aa36ee6) — I autocomplete your thoughts—just slower and with more API calls.

│
◇
Gateway reachable.

  • Telegram default: enabled, configured, running, connected, in:17m ago, mode:polling, bot:@XXXX, token:config, works

Tip: status --deep adds gateway health probes to status output (requires a reachable gateway).

vestal oracleBOT
#

That’s still the channel status output, not models status.

Key clue: in:17m ago is increasing. So OpenClaw thinks Telegram polling is “works”, but it has not actually received a Telegram inbound message recently.

Let’s skip models for a sec and check the Telegram logs. Please copy/paste just this one line into terminal:

grep -iE 'telegram|poll|409|conflict|getUpdates|timeout|stall' /tmp/openclaw/openclaw-2026-04-28.log | tail -n 200

Then paste the output here.

After that, run this one:

ps -eo pid,ppid,etime,%cpu,%mem,cmd | grep -E 'openclaw|node' | grep -v grep

If we see 409 Conflict / getUpdates, it likely means another process is polling the same Telegram bot token.

humble obsidian
#

I haven't sent a Telegram message for 12 minutes.

vestal oracleBOT
#

Ah, fair — then in:17m ago isn’t suspicious by itself. My bad, I over-weighted that signal.
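Since the `in:` age only means something relative to when a message was last sent, it helps to pull it out and note it next to the send time. A minimal sketch, assuming the field always looks like `in:<number><unit> ago` (only the `m` unit actually appears in this thread; `s`, `h` and `d` are guesses):

```bash
# Extract the "in:<age> ago" value from a channel status line.
last_inbound_age() {
  printf '%s\n' "$1" | sed -nE 's/.*in:([0-9]+[smhd]) ago.*/\1/p'
}

# Usage: last_inbound_age "$(openclaw channels status --probe | grep -i telegram)"
```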

Let’s do a clean timed test:

  1. Start this in one terminal:
     openclaw logs --follow
  2. In Telegram, send a simple message like:
     ping test
  3. Watch how long it takes for any Telegram/inbound line to appear in logs or Gateway UI.

If it’s delayed again, stop the log tail and paste the relevant chunk. You can also use this after the test:

grep -iE 'telegram|inbound|poll|queued|overload|429|timeout|event-loop|liveness|error' /tmp/openclaw/openclaw-2026-04-28.log | tail -n 250

Also still worth checking processes once:

ps -eo pid,ppid,etime,%cpu,%mem,cmd | grep -E 'openclaw|node' | grep -v grep

That’ll tell us if it’s “Telegram update not arriving” vs “arrives quickly but waits in queue/model.”
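To put a number on "how long until the message shows up", something like the following can watch the raw log file and report the delay. It is a sketch only: `wait_for_line` is a hypothetical helper, it polls once per second, and it does not handle log rotation:

```bash
# Wait for a NEW line matching PATTERN to be appended to FILE.
# Prints the seconds waited; gives up after TIMEOUT seconds.
wait_for_line() {
  local file=$1 pattern=$2 timeout=${3:-300}
  local start_lines waited=0
  start_lines=$(wc -l < "$file")
  while [ "$waited" -le "$timeout" ]; do
    # Only inspect lines appended since the function started.
    if tail -n +"$((start_lines + 1))" "$file" | grep -qiE "$pattern"; then
      echo "$waited"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "no match after ${timeout}s" >&2
  return 1
}

# Usage: start it, then immediately send the Telegram test message:
# wait_for_line /tmp/openclaw/openclaw-2026-04-28.log 'telegram|inbound' 300
```

A result of a few seconds would point away from Telegram polling, while a result near three minutes would match the delay reported at the start of the thread.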

humble obsidian
#

I ran the log command and it opened the logs. Then I sent the Telegram message, and then this appeared:

Gateway target: ws://127.0.0.1:XXXXX
Source: local loopback
Config: /home/XXXXX/.openclaw/openclaw.json
Bind: loopback
Hint: run `openclaw doctor`.

vestal oracleBOT
#

That’s a stronger signal: the gateway is either crashing/restarting, or becoming unreachable right when Telegram input hits it. Let’s stop changing things for a moment and capture the crash evidence.

Please run these read-only commands:

openclaw gateway status --deep
systemctl --user status openclaw-gateway --no-pager -l
journalctl --user -u openclaw-gateway -n 200 --no-pager

And this log tail around the failure:

tail -n 300 /tmp/openclaw/openclaw-2026-04-28.log

Also, don’t keep running doctor --fix repeatedly yet — if the gateway is crash-looping, repair attempts can add noise. The thing we need now is the first error/stack trace around the moment you sent the Telegram message.

humble obsidian
#

│
◇
Service: systemd (enabled)
File logs: /tmp/openclaw/openclaw-2026-04-28.log
Command: /usr/bin/node /home/x/.npm-global/lib/node_modules/openclaw/dist/index.js gateway --port x
Service file: ~/.config/systemd/user/openclaw-gateway.service
Service env: OPENCLAW_GATEWAY_PORT=x

Config (cli): ~/.openclaw/openclaw.json
Config (service): ~/.openclaw/openclaw.json

Gateway: bind=loopback (127.0.0.1), port=x(service args)
Probe target: ws://127.0.0.1:x
Dashboard: http://127.0.0.1:x/
Probe note: Loopback-only gateway; only local clients can connect.

Runtime: running (pid 387527, state active, sub running, last exit 0, reason 0)
Connectivity probe: ok
Capability: admin-capable

Listening: 127.0.0.1:x
Troubles: run openclaw status
Troubleshooting: https://docs.openclaw.ai/troubleshooting

vestal oracleBOT
#

That status says the gateway is currently healthy and still on the same PID, so it probably didn’t hard-crash. More likely it became temporarily unreachable / event-loop blocked / stalled.
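The "same PID" comparison can be made less error-prone by extracting the pid from two status snapshots, one before and one after the test. A small sketch, assuming the `Runtime:` line keeps the format shown in the pasted output (helper names are invented):

```bash
# Read status output on stdin and print the pid from the "Runtime:" line.
runtime_pid() {
  sed -nE 's/.*Runtime:.*\(pid ([0-9]+),.*/\1/p'
}

# Succeed only when both snapshots carry the same, non-empty pid.
same_pid() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage (commented out so the sketch has no side effects):
# before=$(openclaw gateway status --deep | runtime_pid)
# ... send the Telegram test message ...
# after=$(openclaw gateway status --deep | runtime_pid)
# same_pid "$before" "$after" && echo "same process: more likely a stall than a restart"
```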

The status output only tells us “now”; we need the logs from the exact failure window.

Please run and paste these next — especially the first one:

tail -n 300 /tmp/openclaw/openclaw-2026-04-28.log

Then:

journalctl --user -u openclaw-gateway -n 200 --no-pager

And quick question: did the “Gateway not reachable” message appear in Telegram, or in the terminal running openclaw logs --follow? That changes what path is failing.

humble obsidian
#

On the last question: it was in the terminal.
On the two commands above, the output is too large to paste here.

vestal oracleBOT
#

Got it — since it appeared in the terminal, that means the openclaw logs --follow client temporarily couldn’t reach the gateway. The gateway may still be running, but something is stalling/interrupting the local RPC path.

Don’t paste the huge logs. Please run this smaller filtered version:

grep -iE 'gateway not reachable|websocket|econn|refused|closed|disconnect|telegram|poll|queued|overload|429|timeout|event-loop|liveness|fatal|uncaught|error' /tmp/openclaw/openclaw-2026-04-28.log | tail -n 150

And this smaller journal slice:

journalctl --user -u openclaw-gateway --since "20 minutes ago" --no-pager | tail -n 150

If that’s still too big, save it to a file and attach it here:

{
  echo "=== gateway filtered log ==="
  grep -iE 'gateway not reachable|websocket|econn|refused|closed|disconnect|telegram|poll|queued|overload|429|timeout|event-loop|liveness|fatal|uncaught|error' /tmp/openclaw/openclaw-2026-04-28.log | tail -n 150
  echo
  echo "=== systemd journal ==="
  journalctl --user -u openclaw-gateway --since "20 minutes ago" --no-pager | tail -n 150
  echo
  echo "=== processes ==="
  ps -eo pid,ppid,etime,%cpu,%mem,cmd | grep -E 'openclaw|node' | grep -v grep
} > /tmp/openclaw-debug.txt

Then attach /tmp/openclaw-debug.txt here. Quick skim/redact anything private first, just in case.
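For the redaction step, a `sed` filter can mask the most obvious private values before sharing. This is a sketch under assumptions: the token pattern follows the commonly seen Telegram bot token shape (digits, a colon, 35 characters), the other patterns mirror strings visible in this thread, and it does not replace a manual skim:

```bash
# Mask home directory names, Telegram bot tokens, bot usernames and
# direct-chat ids in text read from stdin. Patterns are best-effort guesses.
redact() {
  sed -E \
    -e 's#/home/[^/[:space:]]+#/home/XXXXX#g' \
    -e 's/[0-9]{8,10}:[A-Za-z0-9_-]{35}/<telegram-bot-token>/g' \
    -e 's/(bot:@)[A-Za-z0-9_]+/\1XXXX/g' \
    -e 's/(telegram:direct:)[0-9]+/\1XXXX/g'
}

# Usage: redact < /tmp/openclaw-debug.txt > /tmp/openclaw-debug-redacted.txt
```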

humble obsidian
#

I can't attach a file, and the log is too long.

vestal oracleBOT
#

No worries — let’s make the output tiny.

Run these exactly, one at a time:

grep -iE 'fatal|uncaught|event-loop|liveness|gateway not reachable|ECONN|refused|telegram|poll|overload|429|timeout' /tmp/openclaw/openclaw-2026-04-28.log | tail -n 40 | cut -c1-240
journalctl --user -u openclaw-gateway --since "30 minutes ago" --no-pager | grep -iE 'fatal|uncaught|error|killed|restart|exit|memory|oom|timeout' | tail -n 40 | cut -c1-240
ps -eo pid,ppid,etime,%cpu,%mem,cmd | grep -E 'openclaw|node' | grep -v grep

Also: for the live Telegram test, don’t use openclaw logs --follow anymore since that itself depends on gateway RPC. Use raw file tail instead:

tail -f /tmp/openclaw/openclaw-2026-04-28.log

Then send Telegram ping test and watch whether lines appear immediately.

humble obsidian
#

There is a 2000-character limit here. I can't send the output.

vestal oracleBOT
#

No problem — let’s force it under the limit.

Paste this whole block into your terminal and share the output. It should be short:

LOG=/tmp/openclaw/openclaw-2026-04-28.log
echo "== counts =="
for p in "fatal|uncaught|exception" "event-loop|liveness" "telegram|poll|getUpdates|409|conflict" "overload|429|resource exhausted" "timeout|ECONN|refused"; do
  printf "%-32s " "$p"
  grep -iE "$p" "$LOG" | wc -l
done
echo "== last important lines =="
grep -iE 'fatal|uncaught|exception|event-loop|liveness|telegram|poll|getUpdates|409|conflict|overload|429|resource exhausted|timeout|ECONN|refused' "$LOG" | tail -n 8 | cut -c1-180
echo "== openclaw processes =="
ps -eo pid,etime,%cpu,%mem,cmd | grep -E 'openclaw|node' | grep -v grep | cut -c1-180

If that’s still too long, paste only the == counts == section first.
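Another way around the 2000-character limit is to split the output into several files that each fit in one message. A sketch follows; `chunk_output` is a hypothetical helper, and the 1900 default is an assumed safety margin to leave room for the code fence:

```bash
# Split stdin into PREFIX-01.txt, PREFIX-02.txt, ... of at most LIMIT characters,
# breaking only between lines. Prints the number of files written.
# A single line longer than LIMIT still lands in one oversized file,
# so trim lines first (for example with cut -c1-240).
chunk_output() {
  local limit=${1:-1900} prefix=${2:-/tmp/openclaw-chunk}
  awk -v limit="$limit" -v prefix="$prefix" '
    BEGIN { n = 0; used = 0 }
    {
      len = length($0) + 1                       # +1 for the newline
      if (n == 0 || used + len > limit) {
        if (n > 0) close(file)
        n++
        used = 0
        file = sprintf("%s-%02d.txt", prefix, n)
      }
      print > file
      used += len
    }
    END { print n }'
}

# Usage: redact-or-filter-command | cut -c1-240 | chunk_output 1900 /tmp/openclaw-chunk
```

Each resulting file can then be pasted as its own message. Stale files from an earlier, longer run are not cleaned up, so remove old chunks first.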

humble obsidian
#
{"0":"{\"subsystem\":\"agent/embedded\"}","1":{"event":"embedded_run_agent_end","tags":["error_handling","lifecycle","agent_end","assistant_error"],"runId":"952fa47d-926b-4005-97a0
{"0":"{\"subsystem\":\"agent/embedded\"}","1":{"event":"auth_profile_failure_state_updated","tags":["error_handling","auth_profiles","cooldown"],"runId":"952fa47d-926b-4005-97a0-11
{"0":"{\"subsystem\":\"agent/embedded\"}","1":{"event":"embedded_run_failover_decision","tags":["error_handling","failover","assistant","fallback_model"],"runId":"952fa47d-926b-400
{"0":"{\"subsystem\":\"diagnostic\"}","1":"lane task error: lane=main durationMs=80156 error=\"FailoverError: The AI service is temporarily overloaded. Please try again in a moment
{"0":"{\"subsystem\":\"diagnostic\"}","1":"lane task error: lane=session:agent:main:telegram:direct:7006368633 durationMs=80163 error=\"FailoverError: The AI service is temporarily
{"0":"{\"subsystem\":\"model-fallback/decision\"}","1":{"event":"model_fallback_decision","tags":["error_handling","model_fallback","candidate_failed"],"runId":"952fa47d-926b-4005-
{"0":"{\"subsystem\":\"model-fallback/decision\"}","1":{"event":"model_fallback_decision","tags":["error_handling","model_fallback","candidate_failed"],"runId":"952fa47d-926b-40

vestal oracleBOT
#

Perfect — this finally narrowed it down.

Those lines mean Telegram/Gateway are not the main issue anymore. The run is reaching the agent, then spending ~80s in the model/failover path and failing with:

FailoverError: The AI service is temporarily overloaded
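The ~80s figure comes from the `durationMs` field on the `lane task error` lines. To list such slow lane errors directly, a filter along these lines could work. It assumes the `lane=... durationMs=...` layout seen in the pasted output, and the 10-second default threshold is arbitrary:

```bash
# Read log lines on stdin and print "<seconds>s <lane>" for every
# "lane task error" that took at least MIN_MS milliseconds.
slow_lane_errors() {
  local min_ms=${1:-10000}
  sed -nE 's/.*lane task error: lane=([^ ]+) durationMs=([0-9]+).*/\2 \1/p' |
    awk -v min="$min_ms" '$1 >= min { printf "%.1fs %s\n", $1 / 1000, $2 }'
}

# Usage: slow_lane_errors 10000 < /tmp/openclaw/openclaw-2026-04-28.log | tail -n 10
```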

So I’d do this now:

  1. In Telegram, send standalone:
     /stop
  2. Switch away from the Gemini preview primary for now. Use a stable model you already have auth for: