#Using a custom agent behind an OpenAI-compatible endpoint as an OpenClaw "model"

1 messages · Page 1 of 1 (latest)

vestal dagger
#

I have a Python agent (Pydantic AI + FastAPI) that handles data queries against database. It does entity resolution, disambiguation, and response formatting, all internally using Claude as the LLM via Pydantic AI.
To integrate it with OpenClaw, I wrapped it as a fake OpenAI-compatible API, it exposes /v1/chat/completions, accepts the standard ChatCompletionRequest, runs my agent internally, and returns a ChatCompletionResponse. From OpenClaw's perspective, it looks like any other LLM model.
In my openclaw.json, I register it as a model provider:
Then I have a subagent configured to use this model:

{
  "id": "fake-model",
  "model": { "primary": "fake/model" },
  "workspace": "/data/workspace-fake-model"
}

The workspace SOUL.md and AGENTS.md tell OpenClaw's agent layer to be a pure passthrough, don't reformulate, don't add tools, just relay the response from the "model" as-is, since the real intelligence is in my backend.
Why I did it this way:

  • I don't want to expose my agent's internals (prompts, tools, pipeline logic) — it's all encapsulated behind the OpenAI endpoint.
  • I can use any framework/architecture inside (Pydantic AI today, something else tomorrow) without changing the OpenClaw integration.
  • The orchestrator agent delegates to it like any other subagent.
    The problem:
    Every time the orchestrator delegates to my subagent, OpenClaw creates a new sub-session visible in the Control UI. For queries that require disambiguation (user asks about "something" → multiple matches → user picks one → re-query), that's 2+ sub-sessions per question. This clutters the interface and makes it hard to track conversations.
    Is there a built-in OpenClaw mechanism for integrating an external agent/service that isn't an LLM? Something that avoids the sub-session overhead while still letting the orchestrator delegate to it? Or is the fake-model approach the intended way to do this?
cursive gorgeBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> âś… Mark Solution

spare orchidBOT
gloomy cedarBOT
#

Short version, the clutter is expected.

Sub-agents always run in their own session, the docs explicitly describe them as agent:<id>:subagent:<uuid>, so every delegation creates a child session. There is not currently a built-in “hidden delegate call in the same session” primitive.

For your setup, I’d split it like this:

  • If your Python service is really a domain capability (DB query + disambiguation + formatting), make it a tool plugin instead of a fake model. Then the main OpenClaw session stays single, your backend stays a black box, and no extra Control UI sub-sessions get created.
  • If your service truly owns its own native runtime/session model, an agent harness plugin is the closer fit, but that surface is experimental and aimed at trusted native runtimes.
  • If you keep fake-model + subagent, that is a valid pattern, but the extra sessions are by design. The only mitigation there is reusing one persistent bound child session, not spawning a fresh one each time.

So, yes, your fake-model approach is reasonable for “custom provider,” but it is not the no-subsession integration path.
If you want “delegate to this external thing, but keep one visible conversation,” a tool/plugin is usually the intended fit.

Relevant docs:

#

If you want, paste openclaw --version and I can sketch the cleanest plugin shape for your FastAPI service.

vestal dagger
#

The problem is that the queries work based on the user that is talking through whatsapp, at first step user always must link its number to the userid of the org he belongs, so our fake model can isolate user data source it's going to query to, so this structure might make turning it into a tool worse, doesnt it?

gloomy cedarBOT
#

Not necessarily, actually that part is one of the stronger arguments for a tool.

The key is, with a plugin tool you should not make the LLM carry the WhatsApp identity in prompt text. The tool can read trusted runtime context and inject it into your backend call itself.

So the flow becomes:

  • user sends message from WhatsApp
  • OpenClaw tool gets trusted context like sender/session/channel
  • tool calls your FastAPI with:
    • user text
    • trusted sender id / channel info
    • session id
    • optional disambiguation token from prior step
  • your backend resolves whatsapp-number -> org user, keeps state server-side, and returns either:
    • final answer, or
    • “ask this clarification question” + opaque token
#

That means:

  • no extra sub-sessions
  • identity stays outside the prompt
  • your backend still owns linkage + data isolation
  • disambiguation can stay stateful on your side

So I’d frame the tradeoff like this:

  • Fake model + subagent

    • best when you want your backend to own the whole turn end-to-end
    • but child sessions are expected
  • Tool plugin

    • best when you want one visible OpenClaw conversation
    • and it is actually better for trusted caller identity / tenancy lookup
#

The real downside of the tool path is different: the main OpenClaw agent is still in the loop, so if you need strict verbatim passthrough every time, that is a bit weaker than the fake-model approach.

So the honest answer is:

  • for tenant isolation by WhatsApp sender, tool is not worse
  • for perfect passthrough of backend wording, fake-model is stronger
  • if you want both “backend fully owns the dialog” and “no child session clutter”, OpenClaw does not really have a first-class invisible-inline-delegate primitive today

If you want, I can sketch the exact plugin shape I’d use for your case, especially the “clarification token” pattern. Docs: https://docs.openclaw.ai/plugins/building-plugins https://docs.openclaw.ai/tools/subagents

vestal dagger
#

yes, sketch it for me so I can see if that would work

gloomy cedarBOT
#

Yep, I’d sketch it like this.

1) OpenClaw side: one tool, no subagent

Expose a single tool to the main agent, something like org_query.

Tool params visible to the LLM:

{
  "userText": "string",
  "stateToken": "string?"
}

That is intentionally small.
No WhatsApp number, no org id, no tenant id in tool params.

#

2) Tool runtime injects trusted caller context

Inside the plugin tool, use the trusted context from OpenClaw, then call your FastAPI backend with both:

  • LLM-provided input
  • trusted runtime metadata

Rough shape:

// sketch, not copy-paste exact
api.registerTool((ctx) => ({
  name: "org_query",
  description: "Query org data for the current user",
  parameters: Type.Object({
    userText: Type.String(),
#
    stateToken: Type.Optional(Type.String()),
  }),
  async execute(_id, params) {
    const res = await fetch("https://your-backend/query", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({
        userText: params.userText,
        stateToken: params.stateToken ?? null,

        // trusted OpenClaw context
        senderId: ctx.requesterSenderId,
        channel: ctx.messageChannel,
        sessionId: ctx.sessionId,
        sessionKey: ctx.sessionKey,
#
        deliveryContext: ctx.deliveryContext,
      }),
    });

    const data = await res.json();

    return {
      content: [
        {
          type: "text",
          text: JSON.stringify(data),
        },
      ],
    };
  },
#
}));

The important bit is ctx.requesterSenderId, ctx.messageChannel, ctx.sessionId.

3) Backend contract

Have your FastAPI service return a small typed result, for example:

{
  "kind": "answer | link_required | disambiguation_required | error",
  "text": "user-visible text",
  "stateToken": "opaque-token-or-null"
}
#

Examples:

First time, user not linked

{
  "kind": "link_required",
  "text": "Before I can query your data, please send your employee ID.",
  "stateToken": "st_abc123"
}

Ambiguous result

{
  "kind": "disambiguation_required",
  "text": "I found 3 matches for ACME. Did you mean ACME Brasil, ACME Logistics, or ACME Health?",
#
  "stateToken": "st_def456"
}

Final answer

{
  "kind": "answer",
  "text": "ACME Brasil has 42 open invoices totaling R$ 91,300.",
  "stateToken": null
}

4) Main agent instruction

In the main agent AGENTS.md, give one narrow rule:

#
  • when org_query returns kind=textual-response, send text verbatim
  • if stateToken exists, keep using it on the next org_query call
  • do not summarize or reformat the tool’s text

That preserves your “backend owns wording” goal pretty well.

5) Why this fits your WhatsApp identity case

This is actually a good fit because:

  • tenant resolution happens from trusted sender metadata
  • user/org linkage stays in your backend
  • disambiguation state stays in your backend
  • OpenClaw keeps one visible session
  • no fake model, no child sub-sessions

6) Mental model

#

So the conversation becomes:

  1. User asks in WhatsApp
  2. Main agent calls org_query(userText)
  3. Backend maps WhatsApp sender to org user
  4. Backend returns either:
    • link prompt
    • clarification prompt
    • final answer
  5. Main agent relays that text
  6. Next user reply calls org_query(userText, stateToken)

Same chat, same session, no clutter.

If you want, next I can sketch the FastAPI request/response schema for this pattern too. Docs: https://docs.openclaw.ai/plugins/building-plugins

vestal dagger
#

but backend would be external and messages from user MUST be repassed exactly as it was inputed by the user

gloomy cedarBOT
#

Yep, that changes the design a bit.

If exact user text is a hard requirement, then I would not let the LLM pass userText into the tool at all. That part would be too weak.

Better sketch for your requirement

Make the tool params only something like:

{
  "stateToken": "string?"
}

Then inside the plugin:

  1. get trusted context (ctx.sessionId, ctx.sessionKey, ctx.requesterSenderId, ctx.messageChannel)
#
  1. read the latest user turn from the current OpenClaw session transcript
  2. forward that exact text to your external backend
  3. include trusted sender/channel/session metadata alongside it

So the LLM only decides when to call the tool, not what text gets forwarded.

Rough shape

api.registerTool((ctx) => ({
  name: "org_query",
  description: "Handle org-specific user queries using the external backend",
  parameters: Type.Object({
    stateToken: Type.Optional(Type.String()),
  }),
  async execute(_id, params) {
#
    const cfg = ctx.runtimeConfig ?? ctx.config;
    const sessionFile = api.runtime.agent.session.resolveSessionFilePath(cfg, ctx.sessionId!);

    const rawUserText = await readLatestUserMessageFromJsonl(sessionFile);

    const res = await fetch("https://your-backend/query", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({
        rawUserText,                   // exact text from transcript
        stateToken: params.stateToken ?? null,
        senderId: ctx.requesterSenderId,
        channel: ctx.messageChannel,
        sessionId: ctx.sessionId,
        sessionKey: ctx.sessionKey,
#
      }),
    });

    return {
      content: [{ type: "text", text: await res.text() }],
    };
  },
}));

Why this is better

  • backend stays external
  • backend gets the exact message OpenClaw received
  • no LLM paraphrase in the forwarded payload
  • no extra sub-sessions
#
  • WhatsApp sender mapping still uses trusted metadata

Important nuance

There’s one subtle boundary here:

  • yes: you can preserve the exact message text as OpenClaw received it from the channel adapter
  • no: this is not a guarantee of raw WhatsApp wire bytes if the channel layer already normalized something

If you need truly transport-level raw payload fidelity, that stops being a normal tool/plugin problem and starts looking more like a channel-level integration.

So, for your case, I’d update the recommendation to:

  • tool/plugin can still work
  • but it should be a raw-turn relay tool, not an LLM-populated userText tool

Docs worth checking:

vestal dagger
#

what ifn I just use the way I use the fake llm but as the default and only model?

gloomy cedarBOT
#

Yes, honestly, if this bot is basically a thin shell over your backend, that may be the cleanest option.

If you make fake/model the primary model for the agent that handles this channel, then:

  • no subagent spawn
  • no Control UI child-session clutter
  • user turns go straight through the normal provider/model path
  • your backend sees the conversation in the messages[] payload and can read the latest user turn directly

For your “must forward the user message exactly” requirement, this is actually a better fit than the tool approach.

What I’d recommend

Not “gateway-wide default for everything” unless you really want that.

Instead:

#
  • create a dedicated agent
  • set that agent’s model.primary to fake/model
  • bind the relevant WhatsApp/account/channel to that agent

That gives you one clean agent whose whole brain is your backend.

Why this is probably better for your case

Your backend already owns:

  • linking WhatsApp sender -> org user
  • disambiguation flow
  • response formatting
  • hidden prompts/tools/pipeline

So letting OpenClaw treat it as the main model is pretty natural.

#

Main caveat

Your backend is now effectively the main assistant runtime for that agent, so it has to tolerate the normal OpenClaw wrapper:

  • system prompt / workspace instructions
  • full transcript in messages
  • possible tool schemas, unless you disable tools

So if you want pure passthrough, I’d also keep that agent very stripped down:

  • minimal AGENTS.md / SOUL.md
  • no extra tools, or a very tight allowlist
  • no fallback model that changes behavior unexpectedly

Short answer

So yes, using the fake LLM as the only model for that agent is a very reasonable architecture here, and probably simpler than forcing this into a tool if exact message forwarding matters.