#Q: Using LM Studio local LLM's for main & sub agents via different models for different use cases

1 messages · Page 1 of 1 (latest)

thick nymph
#

Hi Krill!

I want to understand something.

Rite now in my present openclaw setup i the model "GLM-4.7-Flash-REAP-23B-A3B" via a local server running through LM Studio.

My config is here: https://pastebin.com/GhNUNDAs

At this moment, only the main agent is active and utilizing GLM-4.7-Flash-REAP-23B-A3B.

My common sense tells me, it would be best if decentralize different tasks to different sub agents based on the nature of the task.
For example: A coding sub agent, A Document reading sub agent, A Personal Assistant sub agent, A Research sub agent.

Suppose i use the model:
Qwen3-Coder-Next-GGUF for coding
Qwen3-VL-8B-Instruct-GGUF for document reading
GLM-4.7-Flash-REAP-23B-A3B for personal assistance
GLM-4.7-Flash-REAP-23B-A3B for research.

My deployment is completely local. I have one pc with the specs RTX 3090 Ti with ~24GB VRAM, i9-12900K, 32GB RAM.

Now, my main question is, can i have my main and sub agents spin up these models through LM Studio automatically as and when required depending on the task i provide it and use models one after another so as to not overload the system resulting in openclaw tasks breaking in the process?

How can i go about doing this in terms of config rewrites, execution and testing if it works?

dark ravenBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> âś… Mark Solution

misty sentinelBOT
#

OpenClaw can select different models for main vs sub-agents, but it cannot “spin up / load / unload” models inside LM Studio. It’s just an OpenAI-compatible client: it sends requests to baseUrl, and LM Studio decides what’s actually loaded/servable.

So the whole plan works only if your LM Studio server can serve multiple model IDs (or you run multiple servers), and you still need to control concurrency so you don’t hammer your single 3090 Ti.

What you can do (supported patterns)

From the docs/FAQ, the supported ways to “use different models for different tasks” are:

The “don’t overload my GPU” part

Right now your config has:

  • agents.defaults.maxConcurrent: 4
  • agents.defaults.subagents.maxConcurrent: 8

On a single local GPU, that’s basically asking for contention/timeouts. If your goal is “one after another”, set both to 1 (at least while testing):

#
agents: {
  defaults: {
    maxConcurrent: 1,
    subagents: {
      maxConcurrent: 1,
      // optionally set a default subagent model:
      // model: "lmstudio/<your-coder-model>",
    },
  },
}

That makes OpenClaw schedule less in parallel. (LM Studio still needs to handle the requests, but this prevents OpenClaw from flooding it.)

Config changes you’ll need (based on your paste)

Your config currently only “allows” one model (glm-4.7…) in two places:

#
  1. Provider catalog:
  • models.providers.lmstudio.models: [ { id: "glm-4.7..." } ]
  1. Model allowlist (important!):
  • agents.defaults.models only contains lmstudio/glm-4.7...

To be able to select the other models (via /model or subagent --model), add them to both:

  • models.providers.lmstudio.models (so OpenClaw knows they exist for that provider)
  • agents.defaults.models (so they are allowed to be chosen)

Also: if you want “vision/doc reading” to be automatic for images, consider setting:

The LM Studio reality check (this decides everything)

OpenClaw can only “switch” models if LM Studio can actually serve them.

#

On your LM Studio machine, check:

  • http://192.168.1.13:9000/v1/models

If that endpoint lists multiple models and LM Studio respects the model field per request, then OpenClaw can genuinely switch between them.

If it only ever lists one model (the currently loaded one), then OpenClaw cannot auto-switch models without you changing what LM Studio has loaded (or you running separate servers/ports per model).

Docs to compare against (local models expectations): https://docs.openclaw.ai/gateway/local-models

Quick testing loop

  1. Add the extra LM Studio model entries + allowlist entries.
  2. Reduce concurrency to 1/1.
  3. Run:
    • openclaw models status
    • openclaw models list --local
  4. In chat:
    • /model list → pick the coder model → try a small coding prompt
#
  • /subagents spawn main "do X" --model lmstudio/<coder> (or whatever agentId you use)

If you tell me what GET /v1/models returns on your LM Studio server (does it show multiple models or just one?), I can tell you which route is viable: “true switching” vs “manual switching / multiple ports / separate agents.”

thick nymph
#

LM Studio JIT + Auto-Evict Setup for Single-GPU Multi-Model

Enable in Developer Tab → Server Settings:

  • JIT Loading: ON → /v1/models lists ALL downloaded models; inference requests auto-load specified model
  • Auto-Evict: ON → max 1 JIT model in VRAM; new request evicts previous automatically
  • Idle TTL: 10-30 min default (set 300s for testing); resets on each request

Runtime Flow (RTX 3090 Ti / 24GB VRAM):

Main agent → requests GLM     → JIT loads GLM (~20GB)
Coding sub-agent → requests Coder → evicts GLM, loads Coder (~8GB)
Idle 10min → TTL fires        → VRAM fully freed
Vision sub-agent → requests VL → JIT loads VL (~10GB)

OpenClaw Config:

models.providers.lmstudio.baseUrl: "http://192.168.1.13:9000/v1"
agents.defaults.maxConcurrent: 1
agents.defaults.subagents.maxConcurrent: 1

No other OpenClaw changes needed — JIT is transparent to it.

Quick Verify:

curl http://192.168.1.13:9000/v1/models  # lists all downloaded
curl -X POST .../v1/chat/completions -d '{"model":"glm-4.7...","messages":[...]}'  # triggers JIT load

Tips:

  • Use Q4_K_M quants to fit VRAM
  • concurrency=1 prevents parallel VRAM conflicts
  • Active generation won't be interrupted by new JIT request (queues)
  • Per-request TTL override: add "ttl": 600 to API payload

"Full notes #1: https://pastebin.com/SLyaP2QY"

Task:

Provide an instructional guide on how to update the config to add more agents and sub agents and designate their own models which can be called upon by said agents and sub agents and lm studio can spin them up as described by the LM STUDIO capabilities mentioned in the full notes #1.

misty sentinelBOT
#

If LM Studio really is doing JIT loading + auto-evict (1 model in VRAM) + idle TTL, then yes: OpenClaw can “switch models on demand” by simply requesting a different model string — but OpenClaw still needs to know/allow those model refs, and you should cap concurrency so it doesn’t send overlapping requests.

Below is a practical, “do this → verify → expand” guide, based on your current config: https://pastebin.com/GhNUNDAs
(Also: OpenClaw config is JSON5, not YAML. If you paste YAML keys, the gateway can refuse to start; config is strict.)


0) Prereq: confirm LM Studio can actually serve multiple model IDs

On the LM Studio machine (or from the gateway host), open:

  • http://192.168.1.13:9000/v1/models

What you want to see: multiple model IDs listed (not just the currently loaded one).
If you only ever see one model there, OpenClaw cannot truly auto-switch models without you manually changing what LM Studio has loaded.

OpenClaw local-models expectations: https://docs.openclaw.ai/gateway/local-models

#

1) Add all the LM Studio models to OpenClaw’s provider catalog

Right now you only have one entry in:

  • models.providers.lmstudio.models

Add the other models you want OpenClaw to be able to select. Example (keep your existing baseUrl/apiKey/api; add more models entries):

models: {
  mode: "merge",
  providers: {
    lmstudio: {
      baseUrl: "http://192.168.1.13:9000/v1",
      apiKey: "__OPENCLAW_REDACTED__",
#
      api: "openai-completions",

      models: [
        { id: "glm-4.7-flash-reap-23b-a3b", input: ["text"], /* ... */ },

        { id: "qwen3-coder-next-gguf", input: ["text"] /* ... */ },

        // vision-capable model: mark image support if it’s true in LM Studio
        { id: "qwen3-vl-8b-instruct-gguf", input: ["text", "image"] /* ... */ },
      ],
    },
  },
},

Notes:

#
  • OpenClaw normalizes model refs to lowercase; keep IDs lowercase to avoid confusion.
  • Most of the fields (reasoning, contextWindow, maxTokens, etc.) are optional for custom providers, but matching what you already do is fine.

Related docs: https://docs.openclaw.ai/concepts/models


2) Add them to agents.defaults.models (this is the “allowlist”)

In your config, agents.defaults.models is present, so it acts as an allowlist. If a model isn’t listed there, /model (and model overrides) will fail with “Model … is not allowed”.

Add entries like:

agents: {
  defaults: {
    models: {
#
      "lmstudio/glm-4.7-flash-reap-23b-a3b": { alias: "GLM" },
      "lmstudio/qwen3-coder-next-gguf": { alias: "Coder" },
      "lmstudio/qwen3-vl-8b-instruct-gguf": { alias: "Vision" },
    },
  },
},

Docs callout: https://docs.openclaw.ai/concepts/models (section “Model is not allowed”)


3) Make it not overload the GPU: set concurrency to 1 (at least initially)

With JIT + auto-evict, you typically want no parallel LLM calls.

In your config you currently have maxConcurrent: 4 and subagents.maxConcurrent: 8. For a single 24GB GPU, start with:

#

4) Decide how sub-agents pick models

Option A (simplest): keep one “main agent”, choose subagent model by default + override when needed

Set a default model for spawned sub-agents:

agents: {
  defaults: {
    model: { primary: "lmstudio/glm-4.7-flash-reap-23b-a3b" },

    subagents: {
      model: "lmstudio/qwen3-coder-next-gguf", // string OR { primary, fallbacks }
      maxConcurrent: 1,
    },
  },
},
#

Then:

  • coding tasks naturally go to subagents (default coder model)
  • research stays on main (GLM)
  • for one-off vision runs, spawn with an explicit model override

Docs: subagents inherit caller model unless agents.defaults.subagents.model is set (or you pass an explicit override).
https://docs.openclaw.ai/tools/subagents

Option B (nice quality-of-life): use imageModel so vision is automatic for images

If your main model is text-only, set:

agents: {
  defaults: {
    model: { primary: "lmstudio/glm-4.7-flash-reap-23b-a3b" },
    imageModel: { primary: "lmstudio/qwen3-vl-8b-instruct-gguf" },
#
  },
},

Then when you send an image/screenshot, OpenClaw can route to the image model automatically (where applicable).
Docs: https://docs.openclaw.ai/concepts/models


5) If you truly want “multiple agents” (coding agent, research agent, etc.)

This is multi-agent routing: separate agentIds with separate workspaces/sessions. It’s heavier than subagents, but it’s the cleanest separation.

High-level steps:

  1. Create the agent workspaces (recommended):
    • openclaw agents add coding
    • openclaw agents add research
#
  1. Add an agents.list with per-agent model overrides.
    • Important: once you define agents.list, include your main agent entry too (and mark it default: true), otherwise you can accidentally “lose” the implicit main agent behavior.
  2. If you want these agents to receive inbound messages directly, add bindings. If you only want to spawn them internally, bindings are optional.

Multi-agent docs: https://docs.openclaw.ai/concepts/multi-agent

Also relevant: subagents can target another agentId only if allowed (subagents.allowAgents).
https://docs.openclaw.ai/tools/subagents


6) Testing checklist (fast, deterministic)

  1. LM Studio: confirm /v1/models lists all models.
  2. OpenClaw:
    • openclaw models status
    • openclaw models list --local
  3. In chat: