#problem of speed + UX


spare bridge
#

Hi, I'm currently using the GLM-5.1 - zai model on OpenClaw, and I'm experiencing significant latency issues. The model is extremely slow, and every time I submit a prompt, the text disappears for about 10 seconds before reappearing. Could you help me troubleshoot why this specific model is lagging so much?

tall walrusBOT
neon joltBOT
#

Exec approval is required, but native chat exec approvals are not configured on Discord.

Approve it from the Web UI or terminal UI for now. Discord supports native exec approvals for this account. Configure channels.discord.execApprovals.approvers or commands.ownerAllowFrom; leave channels.discord.execApprovals.enabled unset/auto or set it to true.

#

Let me check the docs and source for GLM/z.ai specifics so I can give you something concrete, not guesses.

neon joltBOT

Yep. First thing, what does openclaw --version show?

The “text disappears for ~10 seconds, then reappears” part sounds a lot like Discord preview streaming UX, not just raw model latency. In the Discord docs, channels.discord.streaming can be off | partial | block | progress, and the default is off because Discord preview edits can be weird/rate-limity. If you have partial or block enabled, try off and test again.

Please paste these next:

openclaw --version
openclaw status
openclaw models status
openclaw logs --follow

Then reproduce one slow GLM prompt while logs are open and share the relevant chunk, but please do not paste API keys/tokens.

Two quick isolation tests that help a lot:

  1. Run the same short prompt on GLM-5.1 and on another model you already know feels fast.
  2. On GLM-5.1, try a dead-simple prompt like: Reply with OK only.

How to read that:

  • simple prompt is fast, normal prompt is slow → likely tool/reasoning overhead on the Z.AI side
  • everything is slow only on GLM-5.1 → likely provider/model-specific latency
  • everything is slow on all models → likely gateway/Discord/config side instead
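
As a rough sketch, the reading guide above can be written as a small decision function. The function name and its three boolean inputs are made up for illustration; nothing here is part of OpenClaw.

```python
def triage(simple_slow: bool, normal_slow: bool, other_model_slow: bool) -> str:
    """Map the isolation-test results to the most likely bucket.

    Hypothetical helper: the inputs are your own observations
    (True means "felt slow"), not values read from OpenClaw.
    """
    if other_model_slow:
        # Every model is slow, so the model itself is unlikely to be the cause.
        return "gateway/Discord/config"
    if simple_slow and normal_slow:
        # Only GLM-5.1 is slow, even on a trivial prompt.
        return "provider/model-specific latency"
    if normal_slow:
        # Trivial prompt is fast, real prompts are slow.
        return "tool/reasoning overhead"
    return "inconclusive"


if __name__ == "__main__":
    print(triage(simple_slow=True, normal_slow=True, other_model_slow=False))
```

The order of the checks matters: "slow on all models" is tested first because it overrides anything observed on GLM-5.1 alone.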

Version matters here a lot, because Z.AI/GLM streaming behavior has changed across releases.

Docs:
https://docs.openclaw.ai/help/faq
https://docs.openclaw.ai/channels/discord
https://docs.openclaw.ai/providers/glm

spare bridge
#

Here are the outputs from my system:

Version: OpenClaw 2026.4.14 (323493f)
OS: linux 4.4.180+ (x64) · node 22.22.2

System Status & Models:

OpenClaw 2026.4.14 (323493f)
OS: linux 4.4.180+ (x64) · node 22.22.2
Channel: stable (v2026.4.14)
Gateway: local · ws://127.0.0.1:18789
Sessions: 55 active · default zai-org/GLM-5.1-FP8 (203k ctx)

Config: /data/openclaw.json
Default Model: zai/zai-org/GLM-5.1-FP8
Fallbacks: openrouter/nvidia/nemotron-3-super-120b-a12b:free

Providers:

  • anthropic (env)
  • groq (env)
  • openrouter (models.json)
  • zai (models.json)

Isolation Tests:

Simple prompt (Reply with OK only) on GLM-5.1: It is [SLOW or FAST?] compared to normal prompts.

Comparison with Nemotron: Nemotron feels [FASTER or JUST AS SLOW?].

I am checking the channels.discord.streaming setting in my config now. I'll follow up with the openclaw logs --follow output if the issue persists after testing the streaming setting.

neon joltBOT
#

I’ve got enough to narrow it down a bit. Let me check whether that exact Z.AI model ID or provider path has any known rough edges.

#

Thanks, this helps a lot.

2026.4.14 is recent enough that I’m less suspicious of an old OpenClaw/Z.AI bug. Right now the most likely buckets are:

  1. Discord preview streaming UX
    If channels.discord.streaming is partial or block, that can absolutely cause the “text disappears, then comes back” behavior.

  2. Z.AI first-token latency / tool overhead
    If GLM is slow even on very short prompts, that points more to provider/model latency.

  3. Long session/context buildup
    This one matters a lot for GLM. If old threads are slow but a fresh thread is fine, that’s the clue.

The next best test is:

  • check whether channels.discord.streaming is set to anything other than off; if it is, set it to off
  • run Reply with OK only in a brand-new thread or DM
  • compare that with the same prompt on the other model

Can you tell me these three results once you test?

  • Is GLM still slow on the simple prompt?
  • Does Nemotron feel faster?
  • Is a fresh thread faster than this current one?

If fresh thread is faster, I’d strongly suspect context/session growth rather than raw model speed. There was also a reported rough edge around ZAI not benefiting from the same cache-ttl pruning path as Anthropic, so long GLM sessions can be a bit more painful.

If it’s still slow after turning streaming off, then yes, please paste the openclaw logs --follow chunk for one slow request. I’m mainly looking for:

  • long gap before first token
  • retries / cooldowns / rate limits
  • tool calls happening before any visible text

Docs:

spare bridge
#

When I send a simple prompt, it is very slow. As soon as I hit send, the text disappears, and I wait ten minutes before it replies. Nemotron responds faster, but it is less capable.

neon joltBOT
#

Simple prompt being that slow is a really important clue. I’m going to check whether there are known Z.AI retry/cooldown issues that match a “wait forever, then eventually answer” pattern.

#

Thanks, that changes the diagnosis quite a bit.

If a trivial prompt takes ~10 minutes on GLM, and Nemotron is faster on the same instance, then it is probably not just channels.discord.streaming.
Streaming can explain the disappearing text, but not a 10-minute wait.

At this point, the main suspects are:

  1. latency / rate limiting on the Z.AI side
  2. a retry / cooldown loop on the Z.AI provider
  3. a suboptimal Z.AI endpoint / region

There have already been reports on the Z.AI side with symptoms like rate_limit, cooldown, timeout, or a fallback that doesn't kick in properly.

Can you run this test and paste the useful part of the logs, without any secrets:

openclaw models status --probe
openclaw logs --follow

Then send a single, very short prompt and check whether the logs contain words like:

  • rate limit
  • cooldown
  • timeout
  • retry
  • FailoverError
  • No available auth profile for zai

And also tell me:

  • is it still slow in a brand-new thread?
spare bridge
#

That's very clear: the problem comes directly from the provider (Z.AI) or from how this specific model behaves, since Nemotron works normally on your installation.

Here is the exact message to send to Krill so that he can help you debug the "Provider" side:

The results are very clear:

Simple prompt (Reply with OK only) on GLM-5.1: It is extremely slow. It took 40 seconds just to reply with "OK".

Normal prompts on GLM-5.1: It's even worse, sometimes taking several minutes to get a response.

Comparison with Nemotron: Nemotron is much faster and responds normally.

Fresh thread: Even in a brand-new thread, GLM-5.1 remains very slow.

Config: channels.discord.streaming is not defined (default off).

Since Nemotron works fine, it seems specific to the GLM-5.1 / Z.AI provider path. Should I provide logs for a specific GLM request to see if it's a first-token latency or something else?

neon joltBOT
#

Yes, exactly, we've almost isolated the problem 👍

With what you found:

  • fresh thread = still slow
  • Discord streaming = off by default
  • Nemotron = normal
  • GLM simple prompt = ~40s just for OK

… that points mainly to the Z.AI path / this specific model, not to Discord or the session context.

The best next step is, yes, to provide the logs for one specific GLM request.
And I'd add an even more useful test, to separate the provider from the agent wrapper:

openclaw infer model run --model zai/zai-org/GLM-5.1-FP8 --prompt "Reply with OK only" --json
openclaw models status --probe
openclaw logs --follow

Then:

  1. start logs --follow
  2. make a single, very short GLM request
  3. paste the portion of the log around that request

What I'm looking for in the logs:

  • start timestamp of the request
  • delay before the first token
  • timeout
  • retry
  • rate_limit
  • cooldown
  • FailoverError
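
If the log chunk is long, a small filter can help spot those keywords. This is only a sketch: the sample lines are invented placeholders and do not reflect the real OpenClaw log format, and the regular expressions are a guess at how these terms might be spelled.

```python
import re
from collections import Counter

# Keyword families named in this thread; the exact spellings in real logs are an assumption.
PATTERNS = {
    "rate_limit": re.compile(r"rate[_ ]limit", re.IGNORECASE),
    "cooldown": re.compile(r"cooldown", re.IGNORECASE),
    "timeout": re.compile(r"timeout|timed out", re.IGNORECASE),
    "retry": re.compile(r"retr(?:y|ies)", re.IGNORECASE),
    "FailoverError": re.compile(r"FailoverError"),
}


def scan(lines):
    """Count how many lines mention each keyword family."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Invented placeholder lines, NOT real OpenClaw output.
    sample = [
        "placeholder: provider returned rate_limit",
        "placeholder: entering cooldown before retry",
        "placeholder: request finished",
    ]
    for name, count in sorted(scan(sample).items()):
        print(name, count)
```

To use it on real data, save the pasted log chunk to a file and pass its lines to `scan`.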

The infer model run test is really useful here:

  • if it is slow too, it's almost certainly Z.AI / endpoint / model
  • if it is fast but normal chat is not, then the problem is more likely in the agent/session layer
I'd also like to know which Z.AI setup you're using:

  • zai-api-key
  • zai-coding-global
  • zai-global
  • zai-cn

That can really change the diagnosis.

spare bridge
#

System: [2026-04-16 18:33:50 GMT+2] Gateway restart config-patch ok (config.patch)
System: Removed the 2 unused Telegram groups (-5248606366 and -5086269084). Only the DM and the SEO group (-5161923848) remain.
System: Run: openclaw doctor --non-interactive

Read HEARTBEAT.md if it exists (workspace context). Follow it strictly. Do not infer or repeat old tasks from prior chats. If nothing needs attention, reply HEARTBEAT_OK.
When reading HEARTBEAT.md, use workspace file /data/workspace/HEARTBEAT.md (exact case). Do not read docs/heartbeat.md.
Current time: Thursday, April 16th, 2026 - 6:33 PM (Europe/Paris) / 2026-04-16 16:33 UTC

I often get this message.

neon joltBOT
#

Let me check the heartbeat docs first, because that message looks much more like an internal heartbeat prompt leaking into chat than a GLM-only latency problem.

#

Yes, that message is not normal.
It is almost certainly the internal heartbeat prompt leaking into the chat.

The text:

Read HEARTBEAT.md ... reply HEARTBEAT_OK

matches the default heartbeat prompt in the docs exactly. So this is not GLM "hallucinating"; it is more likely a routing issue / heartbeat leak.

Things to check:

agents.defaults.heartbeat

In particular:

  • target
    • if it is "last", OpenClaw may send heartbeats to the last contact/conversation
    • if you don't want that, set it to "none"
  • every
    • if you don't use heartbeat, set it to "0m" to disable it
  • includeReasoning
    • keep it at false

So, in practice:

  • if you don't need heartbeat, disable it
  • if you want to keep it, don't route it to this Discord chat
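
For illustration, here is what those two options might look like as a config fragment, built in Python and printed as JSON. The nesting under agents.defaults.heartbeat and the three key names come from this thread; the helper name is invented, and the exact shape your config file expects is an assumption, so check it against the heartbeat docs before applying anything.

```python
import json


def heartbeat_patch(disable: bool) -> dict:
    """Build a hypothetical fragment for agents.defaults.heartbeat."""
    heartbeat = {"includeReasoning": False}
    if disable:
        heartbeat["every"] = "0m"     # "0m" disables heartbeat, per the advice above
    else:
        heartbeat["target"] = "none"  # keep heartbeat, but stop routing it to the last chat
    return {"agents": {"defaults": {"heartbeat": heartbeat}}}


if __name__ == "__main__":
    print(json.dumps(heartbeat_patch(disable=True), indent=2))
```

Only one of the two variants is needed: disabling with every, or keeping heartbeat and changing target.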

Given that you're already on 2026.4.14 and still seeing the raw prompt, this also looks more like a heartbeat leak bug than a simple misconfiguration.

If you like, just paste the sanitized block of: