How to keep pushing a project semi automated? | Friends of the Crustacean 🦞🤝 | Page 1

thorn talon Apr 12, 2026, 10:38 PM

#

I have a director agent that asks 3 other agents to do tasks, but the problem is that I have to stay there just to say "keep going until it works", which seems pointless. Once they have the README with clear instructions, they should enter work mode and not stop until they have doubts or think they are done. Is that possible?

placid waspBOT Apr 12, 2026, 10:40 PM

#

Yeah, mostly yes.

What you want is basically the orchestrator pattern, and OpenClaw supports it, but with a few switches.

If you're using native subagents, the big one is:

{
  agents: {
    defaults: {
      subagents: {
        maxSpawnDepth: 2,
        maxChildrenPerAgent: 5,
        runTimeoutSeconds: 0, // no default timeout
      },
    },
  },
}

Why this matters:

default maxSpawnDepth is 1, so a child can't manage more children
maxSpawnDepth: 2 enables main → director → workers

If you also want them to stay attached to Discord threads/sessions:

{
  channels: {
    discord: {
      threadBindings: {
        enabled: true,
        spawnSubagentSessions: true,
      },
    },
  },
}

Then spawn with thread: true and mode: "session".

Small caveat, because this part matters:

yes, they can keep working without you saying “keep going”
no, there is not really a safe “retry forever no matter what” mode
they still stop when they think they are done or blocked

So the task prompt/README should explicitly say something like:

Keep iterating until the task is complete. Run checks/tests, fix failures, and only stop if you are blocked by missing info or credentials.
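As a rough illustration of the stop conditions in that prompt (done, blocked, or out of budget), here is a minimal Python sketch. It is not OpenClaw code: `iterate_until_done`, `run_checks`, `apply_fix`, and `max_attempts` are hypothetical names, and the attempt cap is there only to reflect the point that retrying forever is not a safe mode.

```python
from typing import Callable, Optional


def iterate_until_done(
    run_checks: Callable[[], Optional[str]],
    apply_fix: Callable[[str], bool],
    max_attempts: int = 10,
) -> str:
    """Loop: check, fix, repeat. Returns 'done', 'blocked', or 'gave_up'.

    run_checks returns None when everything passes, else a failure message.
    apply_fix returns False when the worker cannot proceed (for example,
    missing info or credentials), which ends the loop early.
    """
    for _ in range(max_attempts):
        failure = run_checks()
        if failure is None:
            return "done"
        if not apply_fix(failure):
            return "blocked"
    return "gave_up"


if __name__ == "__main__":
    # Hypothetical stand-ins for a real test run and a real fix step.
    remaining = ["test_a fails", "test_b fails"]

    def checks() -> Optional[str]:
        return remaining[0] if remaining else None

    def fix(failure: str) -> bool:
        remaining.remove(failure)
        return True

    print(iterate_until_done(checks, fix))
```

The bounded loop is a design choice for the sketch: a worker that can report "blocked" or "gave_up" is easier for a director to handle than one that silently spins.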

If you're using Codex / Claude Code / Gemini CLI style workers instead of native subagents, the same idea exists, but that's under ACP sessions, not subagents.

If you want, paste:

openclaw --version
your agents.defaults.subagents config
whether these workers are native subagents or ACP/Codex-style sessions

and I’ll tell you the exact setup.

thorn talon Apr 12, 2026, 10:43 PM

#

this is what I see in the agents part:
"agents": {
"defaults": {
"model": {
"primary": "openai-codex/gpt-5.4"
},
"models": {
"openai-codex/gpt-5.4": {}
},
"workspace": "C:\Users\Openclaw\.openclaw\workspace",
"compaction": {
"mode": "safeguard"
},
"maxConcurrent": 4,
"subagents": {
"maxConcurrent": 8
}
},

and this is the channels part:
"channels": {
"telegram": {
"enabled": true,
"groups": {
"*": {
"requireMention": true
}
},
"groupPolicy": "allowlist",
"streaming": {
"mode": "partial"
},
"accounts": {
"default": {
"botToken": "876701648_xSVDeI-6CDHPil9QlwOw8Q",
"dmPolicy": "pairing"
},
"marketing": {
"botToken": "8707578396:AAEeFpcPMYNCle2ov0wQtU",
"dmPolicy": "pairing"
},
"reverse": {
"botToken": "8532686362:Y6gttKRLW_ejtaoRAp6HIGBps",
"dmPolicy": "pairing"
},
"football": {
"botToken": "862415a73f3VP-kddiBV6f9CQM7w-So",
"dmPolicy": "pairing"
},
"lowcode": {
"botToken": "8039779443:AANEupJhcAFVqPnbcs",
"dmPolicy": "pairing"
},
"tibia": {
"botToken": "6205292813:AAHWIiDHTFR2_ffmpXGo",
"dmPolicy": "pairing"
}
}
}
}

placid waspBOT Apr 12, 2026, 10:43 PM

#

Two important things:

Please rotate those Telegram bot tokens ASAP. You pasted real secrets in a public thread.
In the config you showed, subagents only has maxConcurrent, so you have not enabled nested orchestration yet.
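For future pastes, a small script can mask secrets before a config goes into a public thread. The Python sketch below is only an illustration: `redact` and `SENSITIVE_MARKERS` are hypothetical names, the key-name matching is a heuristic, and it works on an already-parsed dict (if your openclaw.json is not strict JSON, `json.loads` will not parse it as-is).

```python
import json
from typing import Any

# Heuristic list of substrings that mark a key as sensitive (an assumption).
SENSITIVE_MARKERS = ("token", "secret", "password", "apikey")


def redact(value: Any) -> Any:
    """Return a copy of a parsed config with sensitive values masked."""
    if isinstance(value, dict):
        return {
            key: "<REDACTED>"
            if any(marker in key.lower() for marker in SENSITIVE_MARKERS)
            else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


if __name__ == "__main__":
    # Example data only; the token value here is made up.
    config = {
        "channels": {
            "telegram": {
                "accounts": {
                    "default": {"botToken": "123:EXAMPLE", "dmPolicy": "pairing"}
                }
            }
        }
    }
    print(json.dumps(redact(config), indent=2))
```

Masking does not replace rotation: tokens that were already posted should still be rotated.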

If your flow is really:

main → director → 3 workers

then add this under agents.defaults.subagents:

{
  agents: {
    defaults: {
      subagents: {
        maxConcurrent: 8,
        maxSpawnDepth: 2,
        maxChildrenPerAgent: 5,
        runTimeoutSeconds: 0
      }
    }
  }
}

What each does:

maxSpawnDepth: 2 = lets a subagent act as a director/orchestrator
maxChildrenPerAgent: 5 = lets that director have several workers
runTimeoutSeconds: 0 = no default timeout

A couple caveats:

Yes, this can remove the need for you to keep saying “keep going”
No, there is not a magic “work forever until success” switch
the agent can still stop if it thinks it is done or blocked

So your README/task should be explicit, something like:

Keep iterating until the task is complete. Run checks/tests, fix failures, and only stop if blocked by missing info, permissions, or credentials. Do not ask for confirmation between normal steps.

Also, the channels block you pasted is Telegram-only, so thread-binding settings are not relevant here. Thread-bound persistent subagent sessions are currently a Discord thing, not Telegram.

One more nuance:

if main agent is directly spawning the 3 workers, maxSpawnDepth is not needed
if the director subagent is spawning the workers, then maxSpawnDepth: 2 is the key
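The depth rule can be pictured with a toy model in Python. This is a sketch under assumptions, not OpenClaw's implementation: `can_spawn` is a hypothetical function, and numbering the main agent as depth 0 is my own convention, chosen so that the behavior matches the description above (a limit of 1 lets main spawn a director, a limit of 2 lets the director spawn workers).

```python
def can_spawn(depth: int, max_spawn_depth: int) -> bool:
    """Toy rule: an agent at `depth` may spawn only if its children would
    sit at a depth no greater than max_spawn_depth (main is depth 0)."""
    return depth + 1 <= max_spawn_depth


if __name__ == "__main__":
    for limit in (1, 2):
        print(f"maxSpawnDepth={limit}")
        print("  main can spawn director:", can_spawn(0, limit))
        print("  director can spawn workers:", can_spawn(1, limit))
```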

If you paste openclaw --version, I can give you the exact final config shape for your setup.

thorn talon Apr 12, 2026, 10:47 PM

#

after changing the agents.defaults.subagents should I restart something?

placid waspBOT Apr 12, 2026, 10:48 PM

#

Usually, restart the Gateway, not the whole machine.

OpenClaw does support config hot-reload by default (gateway.reload.mode: "hybrid"), so some config changes apply without a manual restart, but for agents.defaults.subagents I would do the safe thing:

save config
run openclaw gateway restart
stop any old director/workers
start a fresh run

Why I’d restart anyway: subagent role/capabilities are written when the session is spawned, so an already-running director may keep the old behavior.

So short answer: not always required, but yes, for this change I’d restart the gateway and respawn the agents.

If you want, after you change it I can sanity-check the exact final agents.defaults.subagents block.

thorn talon Apr 12, 2026, 11:19 PM

#

I've connected to OpenAI because I have the $30 plan, but I'm not sure which limits I have.

placid waspBOT Apr 12, 2026, 11:20 PM

#

Short version: the $30 ChatGPT/OpenAI subscription does not tell you the exact OpenClaw limits by itself.

The important split is:

Codex OAuth / ChatGPT sign-in → subscription-style quota, OpenAI-managed
OpenAI API key → pay-per-token Platform API billing

In OpenClaw those are treated as different auth paths, so:

ChatGPT/Codex sign-in does not automatically mean you have direct API credits
Codex OAuth limits can be different from the ChatGPT website/app
the exact usable window is whatever OpenAI is exposing to that auth path

Best way to check what OpenClaw can currently see:

openclaw models status

and if you want more detail:

openclaw models status --json

If you want the most predictable limits/billing, use an OpenAI API key instead of subscription OAuth.

If you paste the output of openclaw models status and openclaw --version (with anything sensitive redacted), I can help interpret it.

thorn talon Apr 13, 2026, 12:15 PM

#

14:14:30+02:00 [tools] exec failed: elevated is not available right now (runtime=direct).
Failing gates: allowFrom (tools.elevated.allowFrom.<provider> / agents.list[].tools.elevated.allowFrom.<provider>)
Context: provider=exec-event session=agent:main:telegram:direct:846947265
Fix-it keys:
- tools.elevated.enabled
- tools.elevated.allowFrom.<provider>
- agents.list[].tools.elevated.enabled
- agents.list[].tools.elevated.allowFrom.<provider>
raw_params={"command":"Get-ChildItem C:\ProgramData\chocolatey\logs; Write-Output '---'; Get-Content C:\ProgramData\chocolatey\logs\chocolatey.log -Tail 200","workdir":"C:\Users\Openclaw\.openclaw\workspace","yieldMs":10000,"timeout":120,"elevated":true}

placid waspBOT Apr 13, 2026, 12:16 PM

#

That log is actually pretty revealing:

the tool call asked for elevated: true
but the session says runtime=direct

So in this case, elevated mode is not needed. In direct runtime, exec is already on the host. The failure is only because the agent requested the elevated escape hatch and your config does not allow it.

First thing I’d try

Send this in that session:

/elevated off

Then retry. Or start a fresh session and make sure the agent is not forcing elevated exec.

If you really do want elevated mode

Only needed when you're running sandboxed and want to break out of the sandbox.

Add something like:

{
  tools: {
    elevated: {
      enabled: true,
      allowFrom: {
        telegram: ["tg:YOUR_TELEGRAM_NUMERIC_ID"]
      }
    }
  }
}

Then restart the gateway:

openclaw gateway restart

Important nuance

If you also have per-agent restrictions under agents.list[].tools.elevated, then both the global and per-agent allowlists must pass.

So the practical answer is:

direct runtime → don't use elevated unless you really mean to
sandboxed runtime → configure tools.elevated.*
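To make the "both allowlists must pass" rule concrete, here is a toy Python model. It is an assumption-heavy sketch, not OpenClaw's actual gate logic: `gate_passes` and `elevated_allowed` are hypothetical names, the dict shapes simply mirror the config keys named in the log, and treating a missing per-agent block as "no extra restriction" is an assumption.

```python
from typing import Optional


def gate_passes(elevated_cfg: Optional[dict], provider: str, sender: str) -> bool:
    """One gate: `enabled` must be true and the sender must be allowlisted."""
    if not elevated_cfg or not elevated_cfg.get("enabled", False):
        return False
    return sender in elevated_cfg.get("allowFrom", {}).get(provider, [])


def elevated_allowed(
    global_cfg: Optional[dict],
    agent_cfg: Optional[dict],
    provider: str,
    sender: str,
) -> bool:
    """Global gate must pass; a per-agent block, if present, must pass too."""
    if not gate_passes(global_cfg, provider, sender):
        return False
    if agent_cfg is None:  # assumption: no per-agent block means no extra gate
        return True
    return gate_passes(agent_cfg, provider, sender)


if __name__ == "__main__":
    # Made-up sender id for illustration.
    global_cfg = {"enabled": True, "allowFrom": {"telegram": ["tg:12345"]}}
    strict_agent = {"enabled": True, "allowFrom": {"telegram": []}}
    print(elevated_allowed(global_cfg, None, "telegram", "tg:12345"))
    print(elevated_allowed(global_cfg, strict_agent, "telegram", "tg:12345"))
```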

If you want, paste your tools block and I’ll point to the exact place to add it.

thorn talon Apr 13, 2026, 12:28 PM

#

The problem is specialist coordination: short tasks are OK, but longer ones end up in timeouts and desyncs. I try to have 1 agent give instructions to 3 other agents, but I keep running into trouble. It shouldn't matter that I am running in a VMWARE virtual box with the network adapter in NAT mode, should it?

placid waspBOT Apr 13, 2026, 12:29 PM

#

VMware NAT mode is probably not the main problem.

If all the agents are running inside the same Gateway/VM, then subagent coordination is mostly internal, so NAT usually does not break that by itself.

Where NAT can matter:

if something outside the VM needs to connect into it
if the VM has flaky outbound internet / DNS
if the VM sleeps, pauses, or drops network for a moment

But for what you described, timeouts/desyncs on long specialist runs sound more like runtime limits or orchestration design, not NAT.

Most likely culprits:

ACP runtime timeout
If these are ACP/Codex-style sessions, the bundled acpx runtime defaults to 120 seconds for embedded turns. Longer work can hit that.
Docs: https://docs.openclaw.ai/tools/acp-agents

Subagent run timeout
For native subagents, check whether you or the agent are setting runTimeoutSeconds.
Docs: https://docs.openclaw.ai/tools/subagents

Exec timeout
Long shell commands can also die on exec timeout. Default is 1800s.
Docs: https://docs.openclaw.ai/tools/exec

Too much micro-coordination
1 director constantly steering 3 workers in tiny steps is honestly brittle. Bigger, self-contained tasks per worker are usually much more reliable.

So my answer is:

NAT mode alone: probably no
timeouts/desyncs from long orchestration: probably yes, but from config/runtime/prompt structure

Before going deeper, please paste:

openclaw --version

and ideally the exact timeout/desync message you're seeing. That’ll tell us whether this is:

native subagent timeout
ACP timeout
exec timeout
or a network/session issue inside the VM.

thorn talon Apr 13, 2026, 12:32 PM

#

exactly I am running Codex, I got runtime timeouts yes
not sure how to check that
there are some exec timeouts too

I got the latest openclaw version

placid waspBOT Apr 13, 2026, 12:33 PM

#

Yep, that points much more to Codex ACP timeout, not VMware NAT.

If you’re running Codex, the first thing to check is:

/acp status

That shows the effective runtime options for that ACP session.

What to do

For a quick test, raise the session timeout:

/acp timeout 300

If 5 minutes is still too short, try 600.

If you want to change the default globally:

openclaw config set plugins.entries.acpx.config.timeoutSeconds 300
openclaw gateway restart

The docs say the bundled acpx runtime defaults embedded turns to 120 seconds, so long Codex turns can absolutely hit that.

About the exec timeouts

Those are a separate layer.

exec default timeout is 1800s
but if the ACP runtime turn dies first, the whole Codex turn can still fail earlier

So I’d fix the ACP timeout first, then worry about exec.
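The layering can be sketched as "whichever limit is shortest fires first". The Python below is a simplified model, not how OpenClaw schedules anything: `first_timeout` and the layer names are hypothetical, it treats 0 as "no timeout" (as with runTimeoutSeconds: 0), and it assumes all layers start counting at the same moment.

```python
from typing import Dict, Optional


def first_timeout(layers: Dict[str, int], work_seconds: int) -> Optional[str]:
    """Return the name of the layer that fires first, or None if the work
    finishes inside every limit. A limit of 0 means 'no timeout'."""
    exceeded = {
        name: limit
        for name, limit in layers.items()
        if limit > 0 and work_seconds > limit
    }
    if not exceeded:
        return None
    # The smallest exceeded limit is the one reached first.
    return min(exceeded, key=exceeded.get)


if __name__ == "__main__":
    # 120 and 1800 are the defaults quoted above; 0 disables that layer.
    layers = {"acp_turn": 120, "exec": 1800, "subagent_run": 0}
    for seconds in (100, 600):
        print(seconds, "->", first_timeout(layers, seconds))
```

With these numbers a 600-second job is cut off by the 120-second layer long before the 1800-second exec limit matters, which is why the shorter limit is worth fixing first.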

Practical advice

Honestly, 1 director + 3 Codex workers on one setup is pretty easy to make brittle.
I’d try:

1 director + 1 or 2 workers
bigger worker tasks
less constant back-and-forth steering

That tends to desync a lot less.

If you want one more useful check

Run:

/acp doctor

If you paste the output of /acp status, I can tell you exactly which timeout is biting you.

thorn talon Apr 13, 2026, 12:34 PM

#

ACP error (ACP_SESSION_INIT_FAILED): Session is not ACP-enabled: agent:main:telegram:direct:846947265
next: If this session is stale, recreate it with /acp spawn and rebind the thread.

placid waspBOT Apr 13, 2026, 12:34 PM

#

Yep, that error makes sense.

It means you ran an ACP command in a normal OpenClaw session, not in a Codex ACP session.

This part says it clearly:

Session is not ACP-enabled: agent:main:telegram:direct:...

So the fix is:

In that Telegram conversation, spawn a Codex ACP session into the current chat:

/acp spawn codex --bind here

Then check it:

/acp status

Then raise the timeout:

/acp timeout 300

If you want to fully reset it first:

/acp close
/acp spawn codex --bind here

Important distinction

agent:main:telegram:direct:... = normal OpenClaw chat session
ACP commands only work after you bind/spawn an ACP session into that conversation

Since you’re on Telegram, --bind here is the simplest path. You do not need a Discord-style thread for that.

If later you want child topic/thread spawning on Telegram, that’s a separate config path, but I’d get --bind here working first.

Docs:

https://docs.openclaw.ai/tools/acp-agents

If you want, I can give you the exact 3-command sequence I’d use to reset the chat cleanly and re-test Codex.

thorn talon Apr 13, 2026, 12:35 PM

#

well I told you that I am using codex because I see this in the gateway console when starting: [gateway] agent model: openai-codex/gpt-5.4

placid waspBOT Apr 13, 2026, 12:36 PM

#

Yep, you're right, and that confusion is on me.

[gateway] agent model: openai-codex/gpt-5.4 means:

your normal OpenClaw agent is using the Codex/OpenAI auth+model route
it does not mean the session is an ACP Codex harness session

Those are two different things.

So in your case:

agent:main:telegram:direct:... = normal OpenClaw session
model = openai-codex/gpt-5.4
therefore /acp ... is the wrong tool path unless you explicitly spawned an ACP session

What to focus on instead

For your setup, the relevant part is native subagents, not ACP.

So I’d look at:

maxSpawnDepth: 2
if your director subagent is spawning worker subagents
/subagents info and /subagents log
to see which child is timing out / desyncing
exec timeouts
because subagents themselves do not have a default timeout unless you set runTimeoutSeconds; exec still has its own timeout

Important implication

If you did not set:

agents.defaults.subagents.runTimeoutSeconds

then your “timeouts” are probably not subagent-run timeouts. They’re more likely:

exec timeout
model/provider stall
orchestration getting brittle from too much back-and-forth

My recommendation

For now, ignore ACP and test this path:

keep maxSpawnDepth: 2
use 1 director + 1 or 2 workers
give workers bigger chunks
inspect with /subagents list, /subagents info, /subagents log

If you want, paste the exact subagents block you ended up with, and I’ll tell you whether it matches the “director + workers” setup correctly.

thorn talon Apr 13, 2026, 1:01 PM

#

my director has 3 workers

placid waspBOT Apr 13, 2026, 1:02 PM

#

Ah, then I’m very suspicious of your concurrency settings.

From the config you posted earlier, you had:

agents.defaults.maxConcurrent: 4
agents.defaults.subagents.maxConcurrent: 8

But with main + director + 3 workers, you can end up with 5 active runs.

And per docs:

agents.defaults.maxConcurrent = max parallel agent runs across sessions
subagents.maxConcurrent = subagent lane cap, not the overall global cap

So yes, 3 workers can already be bumping into your global limit.
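The arithmetic behind that is small enough to write down. This Python sketch only counts runs; `active_runs` and `fits` are hypothetical helpers, and it assumes every agent in the tree can be running at the same moment.

```python
def active_runs(workers: int, has_director: bool = True) -> int:
    """Count runs that may be active at once: main, optional director, workers."""
    return 1 + (1 if has_director else 0) + workers


def fits(workers: int, max_concurrent: int) -> bool:
    """True if the whole tree fits under the global run cap."""
    return active_runs(workers) <= max_concurrent


if __name__ == "__main__":
    print("runs needed for 3 workers:", active_runs(3))
    print("fits under maxConcurrent=4:", fits(3, 4))
    print("fits under maxConcurrent=6:", fits(3, 6))
```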

I’d change it to something like

{
  agents: {
    defaults: {
      maxConcurrent: 6,
      subagents: {
        maxConcurrent: 8,
        maxSpawnDepth: 2,
        maxChildrenPerAgent: 5,
        runTimeoutSeconds: 0
      }
    }
  }
}

Why

maxSpawnDepth: 2 = director can have workers
maxChildrenPerAgent: 5 = 3 workers is comfortably allowed
runTimeoutSeconds: 0 = no default subagent timeout
maxConcurrent: 6 = gives headroom for main + director + 3 workers

Then restart the gateway.

My honest recommendation

Even if 3 workers is supported, I’d still test with 2 workers first.
Multi-agent setups get flaky fast when you add too much fan-out, especially on a VM.

If you want, send the exact final agents.defaults block before you restart and I’ll sanity-check it.

thorn talon Apr 13, 2026, 1:05 PM

#

[13/04/2026 15:03] Adrian H.: /subagents info reverse
[13/04/2026 15:03] Director: ⚠️ Unknown subagent id: reverse

I dont know the ids but if it's what's specified in openclaw.json then something is wrong because I can see there is an agent called reverse with id reverse

I dont get what should I do with runTimeoutSeconds

the subagents list brings this:
active subagents:

(none)

recent subagents (last 30m):

(none)

thorn talon Apr 13, 2026, 1:50 PM

#

can you help?

placid waspBOT Apr 13, 2026, 1:51 PM

#

Yep. Since you're on native OpenClaw sessions using the Codex model route, this is the config I’d try first.

Keep your existing model/workspace settings, just merge in these parts

{
  agents: {
    defaults: {
      maxConcurrent: 6,
      subagents: {
        maxConcurrent: 8,
        maxSpawnDepth: 2,
        maxChildrenPerAgent: 5,
        runTimeoutSeconds: 0
      }
    }
  },
  tools: {
    exec: {
      timeoutSec: 3600
    }
  }
}

Why these changes

maxConcurrent: 6 = gives enough headroom for main + director + 3 workers
maxSpawnDepth: 2 = required for director → workers
maxChildrenPerAgent: 5 = comfortably allows 3 workers
runTimeoutSeconds: 0 = avoids default subagent timeouts
tools.exec.timeoutSec: 3600 = gives long shell/test steps more room
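"Merge in these parts" means a nested merge: existing keys such as the model and workspace stay, and only the listed keys are added or overwritten. A minimal Python sketch of that idea follows; `deep_merge` is a hypothetical helper, not an OpenClaw command, and the sample dicts are trimmed versions of the config shown in this thread.

```python
import copy
import json
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict: nested dicts are merged key by key, and any
    other value in `patch` replaces the one in `base`."""
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    existing = {
        "agents": {
            "defaults": {
                "model": {"primary": "openai-codex/gpt-5.4"},
                "maxConcurrent": 4,
                "subagents": {"maxConcurrent": 8},
            }
        }
    }
    patch = {
        "agents": {
            "defaults": {
                "maxConcurrent": 6,
                "subagents": {
                    "maxSpawnDepth": 2,
                    "maxChildrenPerAgent": 5,
                    "runTimeoutSeconds": 0,
                },
            }
        },
        "tools": {"exec": {"timeoutSec": 3600}},
    }
    print(json.dumps(deep_merge(existing, patch), indent=2))
```

Editing the file by hand works just as well; the sketch only shows which keys should survive the change.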

Then do this

save config
run openclaw gateway restart
start a fresh run, don’t reuse the old one
test with 2 workers first, then 3 if stable

If it still flakes

Use:

/subagents list
/subagents info <id>
/subagents log <id>

If you paste your updated agents.defaults block, I’ll sanity-check it before you restart.

thorn talon Apr 13, 2026, 2:07 PM

#

All three failed immediately with the same pattern:

• timeout to ws://127.0.0.1:18789
• local loopback path
• bind still reported as lan

So the bottleneck is clearly gateway/runtime health, not specialist capability anymore.

placid waspBOT Apr 13, 2026, 2:08 PM

#

Yep, I agree, that’s gateway/runtime health now, not worker quality.

And one important nuance:

if gateway.bind is "lan", the gateway should still be reachable on 127.0.0.1 from inside the VM
so a timeout to ws://127.0.0.1:18789 usually means the gateway is not actually listening / is unhealthy / wrong port / wrong config path, not that LAN bind itself is the problem

Also, since you're on VirtualBox NAT:

NAT can affect host ↔ guest access unless you forward ports
but it should not break 127.0.0.1 inside the guest
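A quick way to confirm, from inside the guest, whether anything is accepting TCP connections on that port is sketched below in Python. `is_listening` is a hypothetical helper using only the standard library; a successful connect shows that something is listening there, not that the gateway's WebSocket endpoint is healthy.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 18789 is the port from the error message in this thread.
    print("gateway port open:", is_listening("127.0.0.1", 18789))
```

If this prints False while the gateway is supposedly running, the problem is on the gateway side (not listening, wrong port, or wedged process) rather than in the workers.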

I’d check these next, in this order

openclaw gateway status --deep
openclaw health

If it looks stuck or unhealthy:

openclaw gateway restart

Things to verify

Port really is 18789
docs say that’s the default WS port
You are editing the active config file/profile
stale/wrong config path can make “bind says lan” misleading
If gateway.bind: "lan" is set, gateway.auth is valid
non-loopback binds require a real auth path

Very likely causes from what you described

gateway process wedged / not listening
wrong config file loaded
wrong port
auth/bind mismatch after config changes

What to paste here

The output of:

openclaw gateway status --deep

That will probably tell us the next move immediately.

Docs: https://docs.openclaw.ai/cli/gateway

thorn talon Apr 13, 2026, 2:11 PM

#

openclaw gateway status --deep

with gateway running or stopped?

placid waspBOT Apr 13, 2026, 2:11 PM

#

With it running first.

That gives the useful signal.

So:

leave the gateway running
run:

openclaw gateway status --deep

If it says the gateway is unreachable / not listening / unhealthy, that’s already the clue.

If the gateway is currently stopped, status --deep is still safe to run, but it’ll mostly just confirm “not running”.

So the best sequence is:

openclaw gateway status --deep

and if needed after that:

openclaw gateway restart
openclaw gateway status --deep

Paste the output here and I’ll help read it.

thorn talon Apr 13, 2026, 2:45 PM

#

https://pastebin.com/sVHbA59E

#

did you check?

placid waspBOT Apr 13, 2026, 2:48 PM

#

Not yet, I can’t see your terminal from here.

I need you to paste the output of:

openclaw gateway status --deep

Then I can check it with you right away.

thorn talon Apr 13, 2026, 2:48 PM

#

the output is here: https://pastebin.com/sVHbA59E

placid waspBOT Apr 13, 2026, 2:48 PM

#

Yep, I checked it.

Good news: the final status is healthy now.

These are the key lines:

RPC probe: ok
Listening: 0.0.0.0:18789

So right now the gateway is up and answering.

What actually happened

The important part is the middle section:

Found stale gateway process(es)
Port 18789 is already in use
restart timed out waiting for health

So the real issue was a stale/wedged gateway process, not “3 bad workers” and not really NAT.

Also, this part is normal:

bind=lan
probe target ws://127.0.0.1:18789

A LAN bind can still be probed through loopback locally.

What I’d do next

Now that the gateway is healthy again:

start a fresh director/worker run
do not reuse the failed old sessions
if it happens again, run:

openclaw doctor

If the stale-process thing keeps repeating

Use a cleaner bounce:

openclaw gateway stop
openclaw gateway start

instead of only restart.

So, short version: the gateway was sick, now it looks healthy. Retest from a fresh run.

If the next run still dies, paste that new error and we can narrow it down fast.

thorn talon Apr 13, 2026, 2:49 PM

#

how do I use a fresh run?

placid waspBOT Apr 13, 2026, 2:49 PM

#

Yep, easiest way:

send /stop
this aborts the current run and stops spawned subagents too
then send /new
that starts a fresh session

So the clean sequence is:

/stop
/new

Send them as standalone messages.

That gives you a fresh run without deleting old transcripts.

If you want, after /new I can help you word the next “director + workers” prompt so it’s less likely to stall again.

thorn talon Apr 13, 2026, 2:59 PM

#

I saw there are interfaces where you can see your agents, even how they meet in a room and work together. What is that GUI? Does it cost extra tokens or something?

placid waspBOT Apr 13, 2026, 3:00 PM

#

Yep, the built-in GUI is the Control UI / Dashboard.

Open it with:

openclaw dashboard

or directly in a browser at:

http://127.0.0.1:18789/

Does the GUI itself cost extra tokens?

Normally, no.

Just looking at:

sessions
logs
agents/tools panels
status/info

does not itself burn model tokens.

You spend tokens when the UI causes an actual agent/model run, like:

sending a chat message
spawning workers
asking for a new reply
running model-backed tools

So the GUI is mostly a control/inspection layer, not a separate billing layer.

One nuance

If you saw a really fancy “agents in a room working together” view, that may have been:

a demo
a custom client
or a visualization on top of the same gateway/session data

The official built-in thing is the Control UI / Dashboard.

Since you’re on VirtualBox NAT

The easiest path is:

open the dashboard inside the VM, or
use openclaw dashboard there

If you want to open it from your host machine, NAT usually needs port forwarding first.

Docs: https://docs.openclaw.ai/web/dashboard

If you want, I can tell you the exact easiest way to open the dashboard from your current VM setup.
