I have a Python agent (Pydantic AI + FastAPI) that handles data queries against database. It does entity resolution, disambiguation, and response formatting, all internally using Claude as the LLM via Pydantic AI.
To integrate it with OpenClaw, I wrapped it as a fake OpenAI-compatible API, it exposes /v1/chat/completions, accepts the standard ChatCompletionRequest, runs my agent internally, and returns a ChatCompletionResponse. From OpenClaw's perspective, it looks like any other LLM model.
In my openclaw.json, I register it as a model provider:
Then I have a subagent configured to use this model:
{
"id": "fake-model",
"model": { "primary": "fake/model" },
"workspace": "/data/workspace-fake-model"
}
The workspace SOUL.md and AGENTS.md tell OpenClaw's agent layer to be a pure passthrough, don't reformulate, don't add tools, just relay the response from the "model" as-is, since the real intelligence is in my backend.
Why I did it this way:
- I don't want to expose my agent's internals (prompts, tools, pipeline logic) — it's all encapsulated behind the OpenAI endpoint.
- I can use any framework/architecture inside (Pydantic AI today, something else tomorrow) without changing the OpenClaw integration.
- The orchestrator agent delegates to it like any other subagent.
The problem:
Every time the orchestrator delegates to my subagent, OpenClaw creates a new sub-session visible in the Control UI. For queries that require disambiguation (user asks about "something" → multiple matches → user picks one → re-query), that's 2+ sub-sessions per question. This clutters the interface and makes it hard to track conversations.
Is there a built-in OpenClaw mechanism for integrating an external agent/service that isn't an LLM? Something that avoids the sub-session overhead while still letting the orchestrator delegate to it? Or is the fake-model approach the intended way to do this?