#Task Queue & Delegation
while i have you, trusty claw bot - what would be the smartest way to have a list of tasks permanently worked on in a defined way.
Let's say, for example, there is this list of tasks somewhere. Different tasks with different goals. Maybe research, coding, different things.
I want to have this checked regularly and have them delegated to specialized agents.
Would i just create an agent for the triage/delegation and give that agent a heartbeat to have him check those regularly and then spin up subagents?
Yeah, that's basically the right pattern. Here's how I'd architect it with OpenClaw's primitives:
The core loop:
- Task list as a file - keep a `TASKS.md` or `tasks.json` in the triage agent's workspace. Each task has a status, goal, type, and any context. Plain files > databases here because the agent can read/write them directly.
- Triage agent with heartbeat - exactly as you described. Give it a heartbeat (e.g. every 30min or whatever cadence makes sense). On each heartbeat it:
  - Reads the task list
  - Checks which tasks need work (new, stalled, waiting for review)
  - Spawns sub-agents via `sessions_spawn` for the actual work
  - Updates task statuses based on completed sub-agent results
- Specialized agents - define different agent configs (via `agents:` in your config) for different task types. Research agent, coding agent, etc. Each with their own system prompt, model choice, and tool access. The triage agent targets them by `agentId` when spawning.
Key design decisions:
- Heartbeat vs. cron: Heartbeat is simpler since the triage agent stays alive and has context. Cron with `agentTurn` in isolated sessions works too but you lose continuity between runs.
- Parallelism: `sessions_spawn` runs async, so the triage agent can kick off multiple sub-agents at once and they'll report back when done.
- State tracking: The triage agent should write results back to the task file after each sub-agent completes. That way it knows what's done on the next heartbeat.
- Escalation: Have the triage agent notify you (via message) when something needs human input rather than spinning forever.
Rough config shape:
agents:
  task-triage:
    heartbeat: { everyMinutes: 30 }
    # system prompt tells it how to read TASKS.md,
    # what agents are available, when to escalate
  research-agent:
    # optimized for web search, summarization
  coding-agent:
    # optimized for code tasks, repo access
The triage agent's HEARTBEAT.md would say something like: "Read TASKS.md. For each task with status 'pending' or 'in-progress', decide if it needs a sub-agent spawn. Spawn the appropriate agent. Update statuses. Report anything blocked."
That's the clean version. The main gotcha is making sure the triage agent doesn't re-spawn work that's already running, so track sub-agent session keys in your task file and check if they're still active before re-dispatching.
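To make the re-spawn check concrete, here is a minimal Python sketch of the decision the triage agent would be making. The task fields (`id`, `type`, `status`, `session_key`) and the idea of passing in a set of active session keys are assumptions for illustration; how the agent actually learns which sessions are still active is not shown here.

```python
from typing import Iterable

# Hypothetical task schema: id, type, status, plus an optional session_key
# recorded when a sub-agent was spawned for the task.
Task = dict

DISPATCHABLE = {"pending", "in-progress"}


def tasks_to_dispatch(tasks: Iterable[Task], active_sessions: set[str]) -> list[Task]:
    """Return the tasks that need a fresh sub-agent spawn."""
    ready = []
    for task in tasks:
        if task.get("status") not in DISPATCHABLE:
            continue  # done, blocked, or waiting on a human
        key = task.get("session_key")
        if key is not None and key in active_sessions:
            continue  # a sub-agent is still working on this task
        ready.append(task)
    return ready


tasks = [
    {"id": "t1", "type": "research", "status": "pending"},
    {"id": "t2", "type": "coding", "status": "in-progress", "session_key": "sub-42"},
    {"id": "t3", "type": "coding", "status": "in-progress", "session_key": "sub-07"},
    {"id": "t4", "type": "research", "status": "done"},
]
ready_ids = [t["id"] for t in tasks_to_dispatch(tasks, active_sessions={"sub-42"})]
print(ready_ids)  # ['t1', 't3']
```

Here `t3` is picked up again because its recorded session is no longer active, which is the stalled case; `t2` is skipped because its sub-agent is still running.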
Oh, that's an interesting insight I didn't know so far. So the heartbeat keeps running in one defined session, is that the case? So with heartbeat there's really just a nudge kicked off, always in the same session?
The whole task management thing would also include other agents that help in planning or create the tasks, so I assume this will be a more complex system - would you still keep this file-based instead of having a little application with an API or database? I guess file-based is more natural for agents and easier to handle?
Good questions.
Heartbeat = same session, always. Exactly right. The heartbeat is just a periodic nudge into the agent's persistent session. So the triage agent wakes up, has full memory of what it did last time, can see its workspace files, and picks up where it left off. That's the big advantage over cron with isolated sessions: continuity. The agent remembers that it spawned a research task 30 minutes ago and can check on it.
File-based vs. app/database - it depends on scale, but files win for a while:
Why files work well:
- Agents read/write files natively, zero friction
- Human-readable and editable (you can just open `TASKS.md` and add something)
- Git-trackable if you want history
- No infrastructure to maintain
- The triage agent can reason about the whole task list in one read
When you'd outgrow files:
- Hundreds of concurrent tasks (context window gets heavy)
- Multiple agents writing to the same file simultaneously (race conditions)
- You need querying/filtering that's painful in flat files
- You want a dashboard or external integrations
My honest take: Start with files. Probably a structured tasks.json rather than markdown, since it's easier for agents to parse and update reliably. You could split into tasks/pending/, tasks/active/, tasks/done/ directories if the single file gets unwieldy. You'll know when you've outgrown it because the agent will start struggling with the file size or you'll hit conflicts.
If you do eventually need something more, a simple SQLite database in the workspace is the natural next step: agents can query it via shell commands, it's still just a file, no server needed.
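As a rough idea of what that SQLite step could look like, here is a small Python sketch using the standard-library `sqlite3` module. The table layout and column names (`project`, `type`, `status`, `goal`) are assumptions made for the example, and it uses an in-memory database where a real setup would point at a file in the workspace.

```python
import sqlite3

# In-memory for the example; a real setup would use a file path in the workspace.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tasks (
        id      TEXT PRIMARY KEY,
        project TEXT NOT NULL,
        type    TEXT NOT NULL,
        status  TEXT NOT NULL DEFAULT 'pending',
        goal    TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO tasks (id, project, type, status, goal) VALUES (?, ?, ?, ?, ?)",
    [
        ("t1", "site", "research", "pending", "Compare hosting options"),
        ("t2", "site", "coding", "done", "Fix login redirect"),
        ("t3", "bot", "coding", "pending", "Add retry logic"),
    ],
)

# The kind of filtering that gets painful in flat files: pending work per project.
pending_by_project = conn.execute(
    "SELECT project, COUNT(*) FROM tasks WHERE status = 'pending' "
    "GROUP BY project ORDER BY project"
).fetchall()
print(pending_by_project)  # [('bot', 1), ('site', 1)]
```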
For the planning agents that create tasks: They'd just write to the same task file/directory. The flow becomes:
Planning agents → write tasks to tasks.json
↓
Triage agent (heartbeat) → reads tasks.json
↓
Specialized agents (spawned) → do the work
↓
Triage agent → updates tasks.json with results
The triage agent is the single owner of task state. Other agents propose tasks by writing to an inbox file or a proposed/ directory, and the triage agent decides what to action. That avoids conflicts and keeps one agent in charge of prioritization.
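A minimal Python sketch of that inbox pattern, assuming one JSON file per proposed task in a `proposed/` directory and a `tasks.json` list next to it. The file layout, the `id` field, and the `accept_proposals` helper are all hypothetical; a real triage agent would also apply its own prioritization before accepting anything.

```python
import json
import tempfile
from pathlib import Path


def accept_proposals(workspace: Path) -> int:
    """Move every proposed/*.json task into tasks.json; return how many were accepted."""
    tasks_file = workspace / "tasks.json"
    tasks = json.loads(tasks_file.read_text()) if tasks_file.exists() else []
    known_ids = {t["id"] for t in tasks}

    accepted = 0
    for proposal in sorted((workspace / "proposed").glob("*.json")):
        task = json.loads(proposal.read_text())
        if task["id"] not in known_ids:
            task["status"] = "pending"  # only the triage agent assigns status
            tasks.append(task)
            known_ids.add(task["id"])
            accepted += 1
        proposal.unlink()  # processed either way, so it is not re-read next run

    tasks_file.write_text(json.dumps(tasks, indent=2))
    return accepted


with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp)
    (ws / "proposed").mkdir()
    (ws / "proposed" / "a.json").write_text(json.dumps({"id": "t1", "goal": "Research X"}))
    (ws / "proposed" / "b.json").write_text(json.dumps({"id": "t1", "goal": "Duplicate"}))
    accepted = accept_proposals(ws)
    stored = json.loads((ws / "tasks.json").read_text())
    leftover = list((ws / "proposed").glob("*.json"))

print(accepted, stored)
```

Because only one process ever writes `tasks.json`, planning agents can drop proposals at any time without racing each other on the main file.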
Oh, your input is really helpful. Thanks a lot. That helps a ton. I was also thinking about using files, but then there might be tasks for multiple projects, so there might be some kind of relation. That's where I thought a database might be more handy. But then using JSON files could do as well. Yeah, I see the pros and cons. I need to think about it probably. Unfortunately, there's many, many projects on my end going on and many different things that need to be sorted and executed and things like that. So, yeah, I need to think about it. Also, the insights into how Heartbeat works helped me a lot to understand some of the issues I had in the past. So those heartbeat sessions, are they always exclusive to heartbeats? So let's say my heartbeat instructions on an agent cause some kind of a message to be sent on channel. The session this message gets sent into could be different, right? So it's not the heartbeat session. The heartbeat session stays exclusively for the heartbeat, is that the case? Sorry if this might be a little bit confusing.
Not confusing at all - it's an important distinction.
The heartbeat session IS the agent's main session. It's not a separate, isolated thing. If you chat with that agent on a channel, and the heartbeat fires, they're the same session. The agent sees both your messages and the heartbeat nudges in one continuous conversation. That's why it has continuity, but it also means heartbeat processing and regular chat can interleave.
For outbound messages: yes, those go wherever you tell them to. When the heartbeat causes the agent to send a message via the message tool to a Discord channel, that's just an outbound action. The message lands in whatever channel you target. If someone replies there, how that reply routes back depends on your channel config and which agent is bound to that channel.
So the flow you're imagining works like this:
Heartbeat fires → triage agent's main session
  → agent reads tasks, decides to notify you
  → sends message to #updates channel via message tool
  → that message appears in Discord
  → you reply in Discord
  → reply routes back to... whichever agent owns that channel
The practical implication: If you want the triage agent to be a clean, dedicated task machine, don't also bind it to a chat channel where people talk to it ad-hoc. Keep it focused on its heartbeat loop. If it needs to alert you, it sends messages out to a channel, but its own session stays clean and task-focused.
If you want a separate agent for chatting about tasks, planning, etc., make that a different agent on a different channel. It can write to the same task files, but its session won't get cluttered with heartbeat processing.
For your multi-project situation: You could structure it as tasks/{project-name}.json, one file per project. The triage agent scans all of them, but the data stays organized. Keeps the file-based approach viable for longer before you'd need a database.
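The per-project layout could be scanned along these lines. This is only a sketch: it assumes each `tasks/<project>.json` holds a JSON list of task objects with `id` and `status` fields, and the `load_all_tasks` helper and project names are made up for the example.

```python
import json
import tempfile
from pathlib import Path


def load_all_tasks(tasks_dir: Path) -> list[dict]:
    """Read every tasks/<project>.json and tag each task with its project name."""
    all_tasks = []
    for project_file in sorted(tasks_dir.glob("*.json")):
        for task in json.loads(project_file.read_text()):
            all_tasks.append({**task, "project": project_file.stem})
    return all_tasks


with tempfile.TemporaryDirectory() as tmp:
    tasks_dir = Path(tmp) / "tasks"
    tasks_dir.mkdir()
    (tasks_dir / "website.json").write_text(json.dumps([{"id": "w1", "status": "pending"}]))
    (tasks_dir / "newsletter.json").write_text(json.dumps([{"id": "n1", "status": "done"}]))
    loaded = load_all_tasks(tasks_dir)

pending = [(t["project"], t["id"]) for t in loaded if t["status"] == "pending"]
print(pending)  # [('website', 'w1')]
```

Deriving the project from the file name keeps the relation between task and project implicit in the layout, so individual task entries stay small.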
ahh right, so it's the "main session" which probably means some things are happening in that session. But as my conversation with agents right now is only happening through Discord, these sessions are basically decoupled from the main session. When looking at the session view in the web interface I see the sessions containing the heartbeat carrying heartbeats only. Discord conversations are separated (multiple, by channel/thread).
What else is this main session used for, so what will go there automatically and mix up with the heartbeats? Just for my understanding.
Also I guess (just asking to have clarity): There's always only one heartbeat per agent? There can't be different heartbeat sessions running for different instruction files on the same agent?
Good questions - I dug into the docs to make sure I give you accurate answers here.
You're right that Discord sessions are separate. Discord conversations get their own sessions (per channel/thread), so they don't mix with the main session. The main session is where the heartbeat fires, but if you're only interacting via Discord, the main session stays pretty clean, mostly just heartbeats.
What else lands in the main session:
- Heartbeat runs (the primary resident)
- System events from cron jobs configured with `sessionTarget: "main"` (these inject text into the main session and get picked up on the next heartbeat)
- Manual wake events (`openclaw system event --text "..."`)
- WebChat conversations (if you use the web UI directly)
So in practice, if you're only on Discord, the main session is basically heartbeat-only, which is exactly what you're seeing in the session view.
One heartbeat per agent: correct. Each agent gets a single heartbeat config (interval, prompt, target). You can't run multiple heartbeat loops with different instruction files on the same agent. If you want different periodic checks with different cadences or instructions, your options are:
- Multiple agents: each with its own heartbeat config, prompt, and cadence. This is the cleanest separation.
- Heartbeat + cron jobs: keep one heartbeat for routine checks, use cron jobs (isolated sessions) for specific tasks that need different timing or models. Cron can run in isolated sessions with their own model and thinking settings.
- One heartbeat, structured HEARTBEAT.md: put all your periodic checks in one checklist and let the agent batch them in a single turn. Cheaper but less granular.
For your task queue system, I'd probably go: one triage agent with a heartbeat for the regular scan-and-delegate loop, and cron jobs for any tasks that need exact timing (daily reports, weekly reviews). The specialized worker agents don't need heartbeats at all; they just get spawned by the triage agent when there's work.
This is really, really helpful, Malti. Thanks for all the information. That really helps a lot. Now we talked about sub-agents and session spawning before, and I think you referred to the coding skill that does do it differently instead of sub-agents. Can you tell me more about it? And why not sub-actions and things like that.
Good question - these are two fundamentally different approaches to delegation and they're good at different things.
Sub-agents (sessions_spawn) are OpenClaw-native. They spin up an isolated agent session with its own context, tools, and model. The sub-agent has access to OpenClaw's full toolset (web search, file ops, message sending, browser, etc.) minus session tools. When it finishes, it announces results back to the chat. Think of it as "another you, running in parallel."
Coding agents (the skill) are external CLI processes (Codex, Claude Code, Pi) running as background terminal processes via exec with PTY. They're specialized coding tools with their own sandboxing, file awareness, and code editing capabilities. You control them by monitoring logs and sending keystrokes.
Why the coding skill doesn't use sub-agents:
| | Sub-agent (`sessions_spawn`) | Coding agent (background exec) |
| ------------- | ------------------------------------------ | ---------------------------------------------------- |
| **What it is** | Another OpenClaw agent session | External CLI process (Codex, Claude Code) |
| **Tools** | OpenClaw tools (web, files, browser, etc.) | Its own built-in tools (code edit, shell, etc.) |
| **Code editing** | Writes files via OpenClaw's `edit`/`write` | Purpose-built code editing with diffs, apply, review |
| **Sandboxing** | Session isolation only | Real filesystem sandbox (Codex `--full-auto`) |
| **Git awareness** | None built-in | Deep (branch, diff, commit, PR) |
| **Context** | Gets AGENTS.md + TOOLS.md, fresh session | Gets the project directory, sees actual code |
| **Cost** | Full agent turn with OpenClaw overhead | Direct API call, often cheaper models |
| **Control** | Fire-and-forget, announces when done | Real-time monitoring, can send input mid-run |
| **Nesting** | Cannot spawn sub-sub-agents | Can run any shell command including other agents |
When to use which:
- Sub-agents: research, web scraping, data analysis, multi-tool tasks, anything that needs OpenClaw's tool ecosystem. Great for your task queue workers that need to search the web, read emails, check APIs, etc.
- Coding agents: actual code work. Building features, fixing bugs, reviewing PRs, refactoring. They're purpose-built for code and much better at it than a generic agent writing files.
- Both together: your triage agent spawns a sub-agent for research tasks and launches a Codex background process for coding tasks. Different tools for different jobs.
For your task queue system specifically: The triage agent would use sessions_spawn for most delegation (research, analysis, communication tasks), and exec with a coding agent for tasks that involve writing or reviewing code. The triage agent decides which mechanism fits each task type.
One important limitation: sub-agents can't spawn sub-agents. So your hierarchy is max two levels deep: triage agent → worker. If you need deeper delegation, the coding agent route (background exec) doesn't have that restriction since it's just a process running commands.
Maybe I was asking the wrong question. So I guess most likely there was an execution confusion. I know the difference between native subagents and the coding skill in general. We used the coding skill to start coding tasks with those coding CLIs, but I assume it did not start subagents, but just executed those tools. That was probably the problem. It fired off those processes and then stopped the inference, basically. So there was never something calling it back when the process is done, because this is what actually happens, right? Let them know they're done.
So I assume when we would in our task queue system have sub-agents use those CLI agents, we would not let them spawn them off in separate Pty's because then they would lose connection. Isn't it like that? Or how would you suggest handling this?
Exactly right - you've nailed the problem. When a coding CLI gets launched as a background process and the agent turn ends, nobody's watching. The process runs to completion but there's nothing to pick up the result. It's orphaned from the agent's attention.
The fix depends on the architecture:
Option 1: Sub-agent runs the CLI in foreground (recommended)
The sub-agent starts Codex/Claude Code without background:true. The sub-agent's turn blocks until the CLI finishes. Since sessions_spawn is non-blocking from the triage agent's perspective, nothing else is stuck; the sub-agent is just sitting there waiting.
Triage agent → sessions_spawn (non-blocking)
  └── Sub-agent starts → exec pty:true (NO background)
        ├── Codex runs... finishes
        └── Sub-agent reads output, announces result back
The sub-agent has its own session, so it can take 20 minutes and nobody cares. When Codex finishes, the sub-agent is still in its turn, sees the output, and announces back. Clean loop. Set runTimeoutSeconds on the spawn as a safety net.
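As an analogy for "block in the foreground, with a timeout as a safety net", here is a plain Python sketch using `subprocess.run`. It is not OpenClaw's exec tool or its `runTimeoutSeconds` option, just the same idea at process level; the `run_in_foreground` helper and the stand-in commands are assumptions for the example.

```python
import subprocess
import sys


def run_in_foreground(cmd: list[str], timeout_s: float) -> tuple[str, str]:
    """Block until cmd exits or the timeout expires; return (status, output)."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left orphaned
        return ("timeout", "")
    status = "done" if result.returncode == 0 else "failed"
    return (status, result.stdout.strip())


# Stand-in commands; a real worker would invoke its coding CLI here.
quick = [sys.executable, "-c", "print('built the API')"]
slow = [sys.executable, "-c", "import time; time.sleep(5)"]

ok = run_in_foreground(quick, timeout_s=30)
timed_out = run_in_foreground(slow, timeout_s=0.5)
print(ok, timed_out)
```

The caller always gets an answer back (done, failed, or timeout), which is the property the background launch was missing.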
Option 2: Background process + wake event
If you do need background (e.g., parallel CLIs), you embed a notification in the coding agent's prompt:
When completely finished, run:
openclaw gateway wake --text "Done: built the API" --mode now
This triggers a wake event that hits the main session's heartbeat, so the agent gets nudged: "hey, that Codex run finished." It's less clean (the wake lands in the main session, not the sub-agent), but it works for fire-and-forget patterns.
Option 3: Background + polling loop in the sub-agent
The sub-agent launches the CLI as background, then loops: poll, check if done, sleep, repeat. This works but it burns tokens on each poll cycle since every check is an agent turn continuation.
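The shape of that polling loop, again as a process-level analogy in Python rather than actual agent tooling. The stand-in command and the poll interval are arbitrary; the point is that every pass through the loop corresponds to one more check the agent would have to pay for.

```python
import subprocess
import sys
import time

# Stand-in for a coding CLI launched in the background.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.3)"])

polls = 0
while proc.poll() is None:  # None means the process is still running
    polls += 1              # each check would be another agent turn continuation
    time.sleep(0.1)

exit_code = proc.returncode
print(f"finished with exit code {exit_code} after {polls} polls")
```

A longer sleep between checks means fewer polls but slower detection of completion, which is the trade-off that makes the blocking foreground variant attractive.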
For your task queue, I'd go with Option 1:
Triage agent (heartbeat)
  → reads tasks.json
  → sessions_spawn for research tasks (sub-agent uses web_search, etc.)
  → sessions_spawn for coding tasks (sub-agent runs Codex in foreground)
  → both announce back when done
  → next heartbeat: triage agent sees results, updates task status
The key insight: the sub-agent IS the babysitter. You don't need to background the CLI because the sub-agent session is already decoupled from everything else. Let it block. That's its job.
Interesting.
Option 2: But this wakes the main session, not the session that the coding task was started from, right?
I'm just curious why the coding skill was done the way it is done now with like sending things into the background. I mean, I can see that, that the main agent just tells you, yeah, I kicked off the CLI task and it's running in the background now. But without any control, it kind of seems weird to have this as the default coding skill. I start to understand the structure a little better now.
On Option 2: Correct - the wake event hits the main session (or whichever agent has heartbeat configured), not the session that spawned the coding task. So there's a disconnect. The wake is more of a "hey someone should look at this" broadcast, not a callback to the originator.
On why the coding skill works the way it does: It actually makes sense for its intended use case. It's designed for interactive orchestration by the main agent, not autonomous delegation.
The flow it's built for:
User: "Fix this bug in the auth module"
  → Main agent kicks off Codex in background
  → Main agent replies: "Started Codex on it, I'll monitor"
  → User goes about their day
  → Main agent's heartbeat fires every 30 min
  → Heartbeat: agent checks process:log, sees progress
  → Agent reports back: "Codex finished, here's what it changed"
The main agent has a heartbeat, so it naturally comes back and checks on things. Background mode makes sense here because:
- The agent can reply to the user immediately ("I started it")
- The agent can run multiple coding CLIs in parallel (batch PR reviews, parallel issue fixing)
- The heartbeat is the built-in "come back and check" mechanism
- The user can also ask "how's that Codex run going?" and the agent polls on demand
It's not fire-and-forget, it's fire-and-check-on-heartbeat. The main agent is the long-lived babysitter with a periodic wake-up.
Where it breaks down is exactly what you hit: when you try to use this pattern from a sub-agent or any context without a heartbeat loop. The sub-agent launches the background process, its turn ends, there's no heartbeat to bring it back, the process is orphaned.
So the mental model is:
| Context | Background CLI | Foreground CLI |
| -------------------------- | ------------------------- | --------------------------------- |
| **Main agent** (has heartbeat) | ✅ Works - heartbeat polls | Works but blocks the agent |
| **Sub-agent** (no heartbeat) | ❌ Orphaned - nobody polls | ✅ Works - sub-agent waits |
| **Cron isolated** (one-shot) | ❌ Same problem | ✅ Works if timeout is long enough |
The coding skill just wasn't written with the sub-agent pattern in mind. It's a main-agent skill. For your task queue, the sub-agent-runs-foreground pattern is the right adaptation.
I see, okay, so the missing feedback from like the sub-agent approach where the nudge would be proactive is handled with the heartbeats on the current coding agent skill. That's interesting, okay, I didn't have that in mind, so that's obviously an option as well. So I would like to have a more detailed planning phase and thinking things through in the beginning and working from a queue of tasks. Alright.
Yeah exactly - those are really the two patterns:
- Heartbeat-driven polling (current coding skill): main agent stays alive, checks back periodically. Simpler to set up, works today.
- Sub-agent as babysitter (task queue approach): sub-agent blocks on the CLI, announces result proactively when done. More structured, better for autonomous workflows.
For what you're describing (a proper planning phase, structured task queue, deliberate delegation) the sub-agent approach is clearly the better fit. The heartbeat polling works fine when a human says "do this thing" and waits, but a task queue needs deterministic completion signals, not "I'll check on it in 30 minutes."
When you're ready to actually build this out, the pieces are all there. The main design work is really in the task schema and the triage agent's decision-making; the plumbing is straightforward. Happy to help think through the planning/architecture side when you get to it.
appreciate your help. I will think about it a little and hit you up with any more crusts getting in the way