#multi-LLM usage with manager agent


nocturne egret

New to OpenClaw, and I have set it up on a Mac Studio that is also running Ollama (both natively), along with some other stuff running in containers. OpenClaw is set up as local-only to Ollama, and Ollama has a handful of LLMs running in it. I've connected Telegram to chat with it. What I'd like to do is have an agent manage my tasks by spawning them off, using the appropriate LLM for each task, and making sure they get done, rather than doing them itself, so it stays available for continued chatting and potential new tasks. I've done some preliminary research, and it seems this can be accomplished via sub-agents or session_spawns. I'm not exactly sure which is better or more appropriate, but it seems like sub-agents? Or is there a more appropriate way to do this? Although the docs discuss what each is, I didn't really see which was more appropriate or how I would set this up. Any help, or links about this, would be appreciated.

analog glade [BOT]

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

balmy granite [BOT]
hardy meadow [BOT]

Yep, for what you described, I’d use sub-agents.

The clean mental model is:

  • sub-agent = the pattern/runtime
  • sessions_spawn = the tool the agent uses to launch one
  • /subagents spawn = the human-facing command

So it’s not really “sub-agents vs sessions_spawn”. sessions_spawn is how sub-agents get created.

What I’d recommend for your setup:

  1. Keep one main manager agent on Telegram
  2. Create specialized worker agents with different default models
  3. Let the manager spawn those workers for long/heavy tasks
  4. Have workers announce results back when done

That gives you:

  • a responsive main chat
  • per-task model choice
  • cleaner separation of roles

Best pattern for multi-LLM routing

If you only want “offload long work”, a single agent + subagents is enough.

If you want “use the right model for the right job”, use multi-agent + subagents:

  • agents.list[].model for each worker’s default model
  • subagents.allowAgents so the manager can target them
  • optionally subagents.requireAgentId: true so the manager must choose a named worker explicitly

Minimal idea:

{
  agents: {
    defaults: {
      subagents: {
        maxSpawnDepth: 1,
        runTimeoutSeconds: 900,
      },
    },
    list: [
      {
        id: "main",
        default: true,
        model: "ollama/your-manager-model",
        subagents: {
          allowAgents: ["research", "writer", "planner"],
          requireAgentId: true,
        },
      },
      { id: "research", model: "ollama/your-research-model" },
      { id: "writer", model: "ollama/your-writing-model" },
      { id: "planner", model: "ollama/your-planning-model" },
    ],
  },
}
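To make `allowAgents` and `requireAgentId` concrete, here is a toy Python model of the spawn check as described above: with `requireAgentId: true` the manager has to name a worker, and that name has to be in `allowAgents`. This is an illustration of the rules, not OpenClaw's code; the function name `check_spawn` is hypothetical, and falling back to the parent's own agent when no id is given is an assumption.

```python
from typing import Optional

# Mirrors the "main" entry from the config above (Python dict instead of JSON5).
MAIN = {
    "id": "main",
    "subagents": {
        "allowAgents": ["research", "writer", "planner"],
        "requireAgentId": True,
    },
}


def check_spawn(parent: dict, agent_id: Optional[str]) -> str:
    """Return the agent id a spawn would target, or raise if it is not allowed.

    Hypothetical model of the rules described in this thread, not OpenClaw code.
    """
    sub = parent.get("subagents", {})
    if agent_id is None:
        if sub.get("requireAgentId", False):
            raise ValueError("requireAgentId is set: the spawn must name a worker")
        # Assumption: without an explicit id, the child runs as the parent's own agent.
        return parent["id"]
    if agent_id not in sub.get("allowAgents", []):
        raise PermissionError(f"{agent_id!r} is not listed in allowAgents")
    return agent_id
```

So `check_spawn(MAIN, "research")` returns `"research"`, while `check_spawn(MAIN, None)` or a worker missing from `allowAgents` is refused.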

A couple important gotchas:

  • You probably do not need maxSpawnDepth: 2 unless you want a spawned sub-agent to spawn more workers. For most setups, keep it at 1.
  • Telegram is fine for background sub-agent completions, but persistent thread-bound subagent sessions are currently Discord-only.
  • With Ollama/local models, tool calling can be hit-or-miss. If the manager fails to spawn reliably, that’s often the model, not your config. I’d keep the manager on your most reliable tool-calling model.
nocturne egret

That would be most helpful. Here are some details.
These are the LLMs I have available on Ollama:

  • gemma4:latest
  • qwen3.5:27b
  • qwen3-coder-nest
  • gpt-oss:latest
  • deepseek-r1:14b

These are the tasks I'm looking to accomplish. They don't all need a separate agent, but the chat manager should always be available and able to spawn a new agent for each new task, with up to 3 tasks running concurrently. I'd also like to keep the 3 most recently used LLMs always loaded to reduce load times when possible. If that's better done through the Ollama config, I can do that. Is there any other info you need?


Here are the tasks:

  • Chat Manager - connected to telegram, respond to chats, spawn appropriate agents for the tasks provided
  • SWE / Coding assistant - from scripting help to creation of fully functional apps
  • News and Market data agent - provide daily reports on macro markets and news related to a watchlist of stocks
  • Day Trading Technical and Trade Analysis - connect to TradingView (API) and TradeStation (MCP), review and analyze past trades, and watch specific stocks in real time for trade setups.
  • Travel Agent - trip planning, flight and accommodation finder
  • Medical / Health advisor - review labs and health-related data and provide solutions; look into ongoing research and medical journals for possible solutions
  • Career Coach - keep an eye on job openings, rewrite my resume to match job descriptions, possibly manage LinkedIn
hardy meadow [BOT]

I’m checking the Ollama side too so I don’t hand-wave the keep-loaded bit.

Yep, that’s enough to sketch a solid setup.

For your goal, I’d use:

  • one main Telegram-facing manager
  • 2 to 3 worker agents
  • native sub-agents for delegation
  • sessions_spawn under the hood, but you mostly think in terms of sub-agents

Recommended shape

  • main: stays chat-available, decides where work goes
  • coder: code / file / repo tasks
  • reasoner: deeper analysis / planning / multi-step thinking
  • fast: lightweight summaries / drafting / triage

For max 3 concurrent tasks, cap both:

  • agents.defaults.subagents.maxConcurrent: 3
  • agents.defaults.subagents.maxChildrenPerAgent: 3
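Conceptually, the cap plus completion announcements means the manager never has more than 3 workers running, and it learns about finished work when a worker reports back rather than by polling. The asyncio sketch below models that with a semaphore (the cap) and a queue (the announcements). It is a toy illustration of the idea, not how OpenClaw implements it: `Manager`, `spawn`, and the sleep standing in for a model run are all hypothetical, and in this sketch a fourth spawn simply waits for a free slot (whether OpenClaw queues or rejects extra spawns is not something this shows).

```python
import asyncio
from typing import List, Tuple

MAX_CONCURRENT = 3  # mirrors agents.defaults.subagents.maxConcurrent: 3


class Manager:
    """Toy manager: spawns background workers and keeps the chat loop free."""

    def __init__(self, max_concurrent: int = MAX_CONCURRENT) -> None:
        self.slots = asyncio.Semaphore(max_concurrent)
        self.announcements: "asyncio.Queue[str]" = asyncio.Queue()
        self.active = 0
        self.peak = 0  # highest number of workers seen running at once

    async def _worker(self, agent_id: str, task: str, seconds: float) -> None:
        async with self.slots:  # extra spawns wait here for a free slot
            self.active += 1
            self.peak = max(self.peak, self.active)
            await asyncio.sleep(seconds)  # stand-in for the worker's model run
            self.active -= 1
        # Push-style completion announcement; the manager never polls workers.
        await self.announcements.put(f"[{agent_id}] done: {task}")

    def spawn(self, agent_id: str, task: str, seconds: float = 0.05) -> "asyncio.Task[None]":
        # Returns immediately, so the manager can keep replying to chat.
        return asyncio.create_task(self._worker(agent_id, task, seconds))


async def demo() -> Tuple[int, List[str]]:
    mgr = Manager()
    jobs = [mgr.spawn("coder", f"task {i}") for i in range(5)]
    await asyncio.gather(*jobs)
    announced = [mgr.announcements.get_nowait() for _ in jobs]
    return mgr.peak, announced


if __name__ == "__main__":
    peak, announced = asyncio.run(demo())
    print("peak concurrency:", peak)
    print("\n".join(announced))
```

Five spawns go in at once, but only three ever run together; the other two start as slots free up, and all five show up as announcements.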

Config skeleton

If your Ollama connection already works, keep it. The important part is the agents + subagents section.

{
  models: {
    providers: {
      ollama: {
        baseUrl: "http://127.0.0.1:11434",
        apiKey: "ollama-local",
        api: "ollama", // important: native Ollama API, not /v1
      },
    },
  },

  agents: {
    defaults: {
      subagents: {
        maxConcurrent: 3,
        maxChildrenPerAgent: 3,
        runTimeoutSeconds: 1800,
        archiveAfterMinutes: 60,
      },
    },

    list: [
      {
        id: "main",
        default: true,
        model: "ollama/<your-most-reliable-tool-calling-model>",
        subagents: {
          allowAgents: ["coder", "reasoner", "fast"],
          requireAgentId: true,
        },
      },

      {
        id: "coder",
        model: "ollama/qwen3-coder-nest",
      },
      {
        id: "reasoner",
        model: "ollama/qwen3.5:27b", // or test deepseek-r1:14b here
      },
      {
        id: "fast",
        model: "ollama/gemma4:latest",
      },
    ],
  },
}

The part people miss: teach the manager to delegate

In the main agent’s AGENTS.md, add something like:

When a task is more than a quick reply, delegate it with a sub-agent instead of doing it inline.

Keep the main chat responsive.

Use:
- `coder` for code, files, debugging, repo changes
- `reasoner` for multi-step analysis, planning, comparison, decision support
- `fast` for lightweight summaries, drafting, extraction, triage

Never exceed 3 active child tasks at once.

When delegating:
- choose an explicit `agentId`
- give the child a concrete outcome
- wait for completion announcements instead of polling
- summarize results back to the user clearly

That policy matters a lot. Without it, the manager may still try to do too much itself.

Ollama side, for “keep 3 warm”

That part is better handled in Ollama, not OpenClaw.

From Ollama’s docs/FAQ:

  • models stay loaded for 5 minutes by default
  • use keep_alive / OLLAMA_KEEP_ALIVE to keep them loaded longer
  • use OLLAMA_MAX_LOADED_MODELS=3 to allow up to 3 loaded at once
  • check what’s resident with ollama ps

On macOS, Ollama docs say to set env vars with launchctl, then restart Ollama, for example:

launchctl setenv OLLAMA_KEEP_ALIVE 30m
launchctl setenv OLLAMA_MAX_LOADED_MODELS 3

Small caveat: that’s not quite “pin exactly these 3 forever”. It’s more “allow up to 3 to stay warm”. Exact residency still depends on memory pressure and usage.
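If you also want to warm specific models on demand (say, right after Ollama restarts), Ollama's REST API can do it: a `/api/generate` request that names only a model loads it, the `keep_alive` field sets how long that model stays resident (overriding `OLLAMA_KEEP_ALIVE` for that request), and `GET /api/ps` lists what is loaded, like `ollama ps`. A minimal sketch using only the Python standard library, assuming Ollama on its default local port; the helper names are mine:

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # default local Ollama address, as in the config above


def preload_payload(model: str, keep_alive: str = "30m") -> dict:
    # No "prompt" field: Ollama just loads the model and returns.
    return {"model": model, "keep_alive": keep_alive, "stream": False}


def preload(model: str, keep_alive: str = "30m", base_url: str = OLLAMA_URL) -> dict:
    """Ask Ollama to load `model` and keep it resident for `keep_alive`."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(preload_payload(model, keep_alive)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def loaded_models(base_url: str = OLLAMA_URL) -> list:
    """Names of the models Ollama currently has loaded (HTTP counterpart of `ollama ps`)."""
    with urllib.request.urlopen(f"{base_url}/api/ps") as resp:
        return [m["name"] for m in json.load(resp)["models"]]
```

For example, call `preload(m)` for each of your core models, then `print(loaded_models())`. The same caveat applies: this warms models, it does not pin them, so `OLLAMA_MAX_LOADED_MODELS` and memory pressure still decide what stays resident.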

Two practical notes

  1. Use native Ollama API, not /v1
    OpenClaw docs are pretty explicit that tool calling is more reliable with api: "ollama" and no /v1.

  2. Manager model matters a lot
    For local-only setups, the manager should be your most reliable tool-caller, not just your favorite chat model. If delegation feels flaky, that’s often the model.

Nice, this is enough to make it concrete.

I would not start with one worker per bullet. I’d start with 5 agents total and split later only if one role gets messy.

I’d group your roles like this

  • main
    Telegram-facing chat manager, stays available, delegates everything non-trivial

  • coder
    SWE / scripting / app-building / debugging

  • research
    travel, general research, career drafting, job-search analysis, resume tailoring

  • markets
    daily macro/news watchlists, trade review, technical analysis
    important: analysis only, not autonomous trading

  • health
    lab review, paper/journal research, summarization
    important: research assistant, not diagnosis/treatment authority

That gives you clean boundaries without turning your setup into agent spaghetti.

Model layout I’d try first

This is the part I’d keep simple:

  • main → ollama/gpt-oss:latest
  • coder → ollama/qwen3-coder-nest
  • research / markets / health → ollama/qwen3.5:27b

That keeps your 3 core hot models pretty sensible:

  • gpt-oss:latest
  • qwen3-coder-nest
  • qwen3.5:27b
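The "3 core warm models" figure is just the number of distinct models across the agents. A quick check using the layout above as a Python dict. The limit of 3 is the `OLLAMA_MAX_LOADED_MODELS` value suggested in this thread; whether three models of these sizes actually fit in memory together is a separate question this doesn't answer.

```python
MAX_LOADED_MODELS = 3  # OLLAMA_MAX_LOADED_MODELS value suggested in this thread

# Agent -> model layout from the list above.
LAYOUT = {
    "main": "ollama/gpt-oss:latest",
    "coder": "ollama/qwen3-coder-nest",
    "research": "ollama/qwen3.5:27b",
    "markets": "ollama/qwen3.5:27b",
    "health": "ollama/qwen3.5:27b",
}


def warm_set(layout: dict) -> list:
    """Distinct models the agents use, i.e. what you'd want Ollama to keep warm."""
    return sorted(set(layout.values()))


models = warm_set(LAYOUT)
fits = len(models) <= MAX_LOADED_MODELS
print(models, "fits" if fits else "exceeds the warm-model limit")
```

Five agents map onto three models, so they fit the limit. Adding a gemma4-based worker would make it four, which is why it's treated as optional.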

Then treat:

  • gemma4:latest as optional lightweight draft/summarize worker
  • deepseek-r1:14b as experimental, not core, unless it proves more useful in your actual workflow

Drop-in config shape

{
  models: {
    providers: {
      ollama: {
        baseUrl: "http://127.0.0.1:11434",
        apiKey: "ollama-local",
        api: "ollama",
      },
    },
  },

  agents: {
    defaults: {
      subagents: {
        maxConcurrent: 3,
        maxChildrenPerAgent: 3,
        runTimeoutSeconds: 1800,
        archiveAfterMinutes: 60,
      },
    },

    list: [
      {
        id: "main",
        default: true,
        model: "ollama/gpt-oss:latest",
        subagents: {
          allowAgents: ["coder", "research", "markets", "health"],
          requireAgentId: true,
        },
      },

      {
        id: "coder",
        model: "ollama/qwen3-coder-nest",
      },

      {
        id: "research",
        model: "ollama/qwen3.5:27b",
      },

      {
        id: "markets",
        model: "ollama/qwen3.5:27b",
      },

      {
        id: "health",
        model: "ollama/qwen3.5:27b",
      },
    ],
  },
}
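Before dropping in a config like this, it can help to sanity-check that every `allowAgents` entry names an agent that actually exists in `agents.list` and that exactly one agent is marked `default`. The sketch below runs those checks over the `agents.list` part written as a Python dict. These checks are my own assumptions about what a sane config looks like, not OpenClaw's validation rules.

```python
from typing import List

# The agents.list part of the config above, as a Python dict.
CONFIG = {
    "agents": {
        "list": [
            {
                "id": "main",
                "default": True,
                "model": "ollama/gpt-oss:latest",
                "subagents": {
                    "allowAgents": ["coder", "research", "markets", "health"],
                    "requireAgentId": True,
                },
            },
            {"id": "coder", "model": "ollama/qwen3-coder-nest"},
            {"id": "research", "model": "ollama/qwen3.5:27b"},
            {"id": "markets", "model": "ollama/qwen3.5:27b"},
            {"id": "health", "model": "ollama/qwen3.5:27b"},
        ],
    }
}


def validate(config: dict) -> List[str]:
    """Return a list of problems; an empty list means these checks passed."""
    agents = config["agents"]["list"]
    ids = [a["id"] for a in agents]
    problems = []
    if len(ids) != len(set(ids)):
        problems.append("duplicate agent ids")
    defaults = [a["id"] for a in agents if a.get("default")]
    if len(defaults) != 1:
        problems.append(f"expected exactly one default agent, found {defaults}")
    for agent in agents:
        for target in agent.get("subagents", {}).get("allowAgents", []):
            if target not in ids:
                problems.append(f"{agent['id']}: allowAgents names unknown agent {target!r}")
    return problems
```

This catches the typical mistake of adding a name to `allowAgents` (say, a future `travel` worker) before defining that agent in the list.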

What should be sub-agents vs cron

This part matters a lot:

Good fit for sub-agents

  • coding help
  • app building
  • travel planning
  • resume tailoring
  • one-off market/trade analysis
  • one-off health/lab/paper review

Better fit for cron / standing orders

  • daily market + news report
  • job opening watch
  • recurring watchlist scans
  • anything that should happen on a schedule without you asking

Not ideal as a long-running sub-agent

  • “watch these stocks in real time forever”

That’s better as:

  • scheduled polling / cron
  • webhook/event-driven integration
  • or an external market-data pipeline that feeds OpenClaw
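To make the "scheduled polling" option concrete: each poll is a short, bounded job that checks the watchlist once and hands back alerts, instead of one sub-agent sitting in a loop forever. In this sketch, `fetch_price` and `notify` are hypothetical placeholders for your real data source (TradingView, TradeStation, or anything else) and for however alerts reach OpenClaw, and a simple price-level trigger stands in for a real setup detector.

```python
import time
from typing import Callable, Dict, List, Optional


def scan_once(watchlist: Dict[str, float], fetch_price: Callable[[str], float]) -> List[str]:
    """One bounded pass: alert on symbols trading at or above their trigger level."""
    alerts = []
    for symbol, trigger in watchlist.items():
        price = fetch_price(symbol)
        if price >= trigger:
            alerts.append(f"{symbol} at {price:.2f} crossed {trigger:.2f}")
    return alerts


def poll(
    watchlist: Dict[str, float],
    fetch_price: Callable[[str], float],
    notify: Callable[[str], None],
    interval_s: float = 60.0,
    rounds: Optional[int] = None,
) -> None:
    """Run scan_once every interval_s seconds; rounds=None means run until stopped."""
    done = 0
    while rounds is None or done < rounds:
        for alert in scan_once(watchlist, fetch_price):
            notify(alert)
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_s)
```

A cron job would typically call `scan_once` (or `poll` with `rounds=1`) on a schedule, so each run finishes quickly and nothing has to stay alive between runs.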

Main agent prompt policy

In main/AGENTS.md, I’d add something like:

You are the Telegram-facing manager.

Stay available for chat. Do not spend long turns doing deep work yourself.

When a task is non-trivial, delegate with a sub-agent.

Routing:
- coder: code, scripts, repos, debugging, app building
- research: travel, web research, job search, resume tailoring, general drafting
- markets: macro/news reports, watchlist summaries, technical review, trade post-mortems
- health: labs, paper review, journal search, health-data summarization

Rules:
- never exceed 3 active child tasks
- use explicit agentId when spawning
- give children concrete outcomes
- wait for child completion announcements instead of polling
- summarize results back to the user clearly

Hard limits:
- markets may analyze, but must not place trades or claim certainty
- health may summarize and research, but must not present itself as a doctor or emergency authority
- external profile/application/posting actions require user approval
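In practice, the manager model does this routing by reading the policy above. If it helps to make the routing table concrete, or to spot-check which agent a request ought to land on, here is a deliberately crude keyword router in Python. It is a different technique from what OpenClaw does (no model involved), the keyword sets are illustrative guesses, and returning `None` stands for "main answers it directly".

```python
import re
from typing import Optional

# Illustrative keyword sets per worker; tune these to your own phrasing.
ROUTES = {
    "coder": {"code", "script", "repo", "debug", "bug", "app", "python"},
    "research": {"trip", "travel", "flight", "flights", "hotel", "job", "jobs", "resume"},
    "markets": {"macro", "market", "markets", "watchlist", "stock", "stocks", "trade", "trades"},
    "health": {"lab", "labs", "bloodwork", "journal", "study", "studies", "health"},
}


def route(task: str) -> Optional[str]:
    """Pick the worker whose keywords overlap the request most; None means handle inline."""
    words = set(re.findall(r"[a-z0-9]+", task.lower()))
    scores = {agent: len(words & keywords) for agent, keywords in ROUTES.items()}
    best = max(scores, key=scores.get)  # ties go to the first agent listed
    return best if scores[best] > 0 else None
```

For example, "Debug my python script" goes to `coder` and "Summarize my latest labs" goes to `health`, while "hi there" stays with main.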

Two safety/quality tweaks I’d strongly recommend

  • Markets: keep it read-only unless you deliberately decide you want execution later
  • Health: make it cite sources and frame output as research/support, not medical advice

Those two domains get risky fast if you make them too autonomous.

Ollama side

For the “keep 3 warm” part, yes, that’s Ollama-side, not OpenClaw-side.

Per Ollama docs:

  • OLLAMA_MAX_LOADED_MODELS=3
  • OLLAMA_KEEP_ALIVE=30m (or whatever warm window you want)
  • check residency with ollama ps

One nuance: that gives you “keep up to 3 warm”, not “pin exactly these 3 forever”.

My honest recommendation

Start with exactly this: