#fresh install

kind lintel
#

Installing openclaw fresh on Ubuntu. How do I set the config to point at llama-server?

round spindleBOT
velvet valleyBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

round spindleBOT
void vesselBOT
#

Let me verify the docs path so I don’t hand you made-up config keys.

#

I’m checking the provider docs for the exact config shape for llama-server specifically.

#

Yep, for llama-server use a custom OpenAI-compatible provider in ~/.openclaw/openclaw.json (JSON5, not YAML):

{
  agents: {
    defaults: {
      model: { primary: "llamacpp/your-model-id" },
    },
  },
  models: {
    mode: "merge",
    providers: {
      llamacpp: {
        baseUrl: "http://127.0.0.1:8080/v1",
        api: "openai-completions",
        models: [
          {
            id: "your-model-id",
            name: "Your model",
            reasoning: false,
            input: ["text"],
            contextWindow: 128000,
            maxTokens: 8192,
          },
        ],
      },
    },
  },
}

Then:

openclaw gateway restart
openclaw models status
openclaw status

Replace:

  • 8080 with your llama-server port
  • your-model-id with whatever this returns:
curl http://127.0.0.1:8080/v1/models
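
If the raw JSON is hard to read, the id can be pulled out with a couple of standard tools. This is only a rough sketch: `list_model_ids` and `to_model_refs` are made-up helper names, not OpenClaw or llama.cpp commands, and it assumes the usual OpenAI-style response where each model sits in a `data` array with an `id` field.

```shell
# Hypothetical helper (not part of OpenClaw or llama.cpp): reads an
# OpenAI-style /v1/models JSON response on stdin and prints one id per line.
# Crude text matching; assumes ids contain no escaped double quotes.
list_model_ids() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed 's/^"id"[[:space:]]*:[[:space:]]*"//; s/"$//'
}

# Prefix each id with a provider key to get a provider/model ref.
# $1 is the key used under models.providers (assumed not to contain "|").
to_model_refs() {
  sed "s|^|$1/|"
}

# Usage (assumes llama-server listens on 127.0.0.1:8080):
#   curl -s http://127.0.0.1:8080/v1/models | list_model_ids | to_model_refs llamacpp
```

If jq is installed, `jq -r '.data[].id'` is the sturdier way to do the first step.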

If your backend only accepts string content, add:

compat: { requiresStringContent: true }

inside that model entry.

Also, what does openclaw --version show? If you paste that plus your llama-server port/model id, I can give you the exact snippet.

#

Two things:

  1. Your gateway token is visible in that paste. Please rotate it after this.
  2. primary: "llama-server" is not enough. OpenClaw wants a provider/model ref, plus a models.providers entry that points to your local server.

Use this shape instead:

{
  agents: {
    defaults: {
      workspace: "/home/matias/.openclaw/workspace",
      model: {
        primary: "llama-server/your-model-id"
      }
      // remove agents.defaults.models for now
    }
  },

  models: {
    mode: "merge",
    providers: {
      "llama-server": {
        baseUrl: "http://127.0.0.1:8080/v1",
        api: "openai-completions",
        models: [
          {
            id: "your-model-id",
            name: "Local llama.cpp model",
            reasoning: false,
            input: ["text"],
            contextWindow: 32768,
            maxTokens: 4096
          }
        ]
      }
    }
  }
}

Then restart + verify:

openclaw gateway restart
openclaw models status
openclaw status

And get the real model id from your server with:

curl http://127.0.0.1:8080/v1/models

So if that returns something like qwen2.5-coder-7b-instruct, your primary becomes:

"primary": "llama-server/qwen2.5-coder-7b-instruct"

If llama-server is picky about content format, add compat: { requiresStringContent: true } inside the model entry.

kind lintel
#

openclaw command not found

void vesselBOT
#

I’m checking the install docs so I don’t send you down the wrong path.

#

Ah, that usually means the npm global bin dir isn’t on your PATH.

Try this on Ubuntu:

npm prefix -g
echo "$PATH"
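
Comparing those two outputs by eye is easy to get wrong, so here is a minimal sketch of the same check as a function. `path_contains` is a hypothetical helper name, not an npm or OpenClaw command.

```shell
# Hypothetical helper: returns 0 if directory $1 is an entry in the
# colon-separated list $2, and 1 otherwise.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   if path_contains "$(npm prefix -g)/bin" "$PATH"; then
#     echo "npm global bin dir is on PATH"
#   else
#     echo "npm global bin dir is missing from PATH"
#   fi
```

Wrapping both sides in colons makes the match exact, so /usr/local/bin does not accidentally match /usr/local/bin-old.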

If the prefix’s bin folder is missing from PATH, add it to ~/.bashrc:

echo 'export PATH="$(npm prefix -g)/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
hash -r
#

Then check:

openclaw --version

If you hit permission errors on global npm installs, use a user-writable prefix:

mkdir -p "$HOME/.npm-global"
npm config set prefix "$HOME/.npm-global"
echo 'export PATH="$HOME/.npm-global/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
hash -r

Then reinstall if needed:

npm install -g openclaw@latest

If you installed from a repo checkout instead of globally, use:

pnpm openclaw ...

instead of openclaw ....

Docs: https://docs.openclaw.ai/install/node

Also, small security thing, you pasted your gateway token above in public, so please rotate that after you’re back in.

kind lintel
#

⚠️ Context limit exceeded. I've reset our conversation to start fresh - please try again.

To prevent this, increase your compaction buffer by setting agents.defaults.compaction.reserveTokensFloor to 20000 or higher in your config.

void vesselBOT
#

I’m checking the compaction docs so I can give you the exact config shape.

#

Yep, that means auto-compaction didn’t have enough headroom.

Add this to ~/.openclaw/openclaw.json:

{
  agents: {
    defaults: {
      compaction: {
        mode: "safeguard",
        reserveTokensFloor: 20000,
        notifyUser: true
      }
    }
  }
}
#

Then restart the gateway.

Also, since you’re using llama-server, make sure your model entry has a realistic context size, for example:

contextWindow: 32768,
maxTokens: 4096

If the local model is only 8k or 16k context, OpenClaw will hit this pretty easily, especially with tools.

Quick recovery in chat:

  • /compact to force compaction now
  • /new for a clean session

kind lintel
#

I am running qwen3.6-35b at the maximum 262k context. How much should I set for contextWindow, maxTokens, and reserveTokensFloor?

void vesselBOT
#

Let me check the token docs for the exact meaning of each field before I give you a number.

#

For that setup, I’d start with:

  • contextWindow: 262144
  • maxTokens: 8192
  • reserveTokensFloor: 24000

And, if you want a safer runtime cap, also add:

  • contextTokens: 220000

So the config would look like:

{
  agents: {
    defaults: {
      compaction: {
        mode: "safeguard",
        reserveTokensFloor: 24000
      }
    }
  },
  models: {
    providers: {
      "llama-server": {
        baseUrl: "http://127.0.0.1:8080/v1",
        api: "openai-completions",
        models: [
          {
            id: "qwen3.6-35b",
            name: "Qwen 3.6 35B",
            reasoning: false,
            input: ["text"],
            contextWindow: 262144,
            contextTokens: 220000, // optional, recommended
            maxTokens: 8192
          }
        ]
      }
    }
  }
}

Why:

  • contextWindow should match the model/server’s real max context.
  • maxTokens is output cap, not context size. Setting it to 262k would be kind of insane for normal agent turns.
  • reserveTokensFloor at 24000 gives compaction some breathing room. 20000 is the minimum I’d use, 24000 is a nicer default.

If it still feels unstable on long threads, drop contextTokens further, like 200000.
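
To get a feel for how these numbers interact, here is a back-of-the-envelope check. It is plain arithmetic under a simplifying assumption (the output cap and the compaction reserve both come out of the runtime context budget), not OpenClaw's actual compaction logic, and `history_budget` is a hypothetical helper name.

```shell
# Hypothetical sanity check, NOT OpenClaw's real compaction algorithm.
# Args: contextTokens maxTokens reserveTokensFloor
# Prints the tokens left for conversation history after setting aside the
# output cap and the compaction reserve; returns 1 if nothing is left.
history_budget() {
  local left=$(( $1 - $2 - $3 ))
  echo "$left"
  [ "$left" -gt 0 ]
}

# Usage:
#   history_budget 220000 8192 24000   # prints 187808
#   history_budget 8192 4096 20000     # prints -15904 (an 8k context cannot fit the reserve)
```

The second call shows why a small local context trips the limit so quickly: the reserve alone is larger than the whole window.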

If you want, I can turn your whole config into a ready-to-paste final version.

kind lintel
#

how do I set embedding for openclaw on llama-server? I am running qwen3.6-35B as the main model, how do I add a qwen3-embedding model?

void vesselBOT
#

I’m checking the memory docs for the exact embeddings config shape so I don’t invent keys.

#

Yep, embeddings are configured separately from your main chat model.

For llama-server, use agents.defaults.memorySearch, not models.providers:

{
  agents: {
    defaults: {
      memorySearch: {
        enabled: true,
        provider: "openai",
        model: "qwen3-embedding", // exact model id exposed by your llama-server
        fallback: "none",
        remote: {
          baseUrl: "http://127.0.0.1:8080/v1"
          // apiKey: "..." only if your llama-server requires one
        }
      }
    }
  }
}

A few important bits:

  • Keep your main model as qwen3.6-35b in models.providers...
  • Set the embedding model here under memorySearch.model
  • provider: "openai" is correct for a local OpenAI-compatible /v1/embeddings endpoint
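
Before pointing OpenClaw at it, it can help to hit the embeddings endpoint directly. A rough sketch, assuming an embedding-capable server is listening on 127.0.0.1:8080 and exposes the model as qwen3-embedding; `embedding_request` is a hypothetical helper name that only builds the request body.

```shell
# Hypothetical helper: builds an OpenAI-style embeddings request body.
# $1 = model id, $2 = input text. No JSON escaping is done, so the text
# must not contain double quotes or backslashes.
embedding_request() {
  printf '{"model":"%s","input":"%s"}\n' "$1" "$2"
}

# Usage (assumes the embedding model is served at this URL):
#   curl -s http://127.0.0.1:8080/v1/embeddings \
#     -H 'Content-Type: application/json' \
#     -d "$(embedding_request qwen3-embedding 'hello world')"
```

A JSON reply containing an embedding array means the endpoint is usable; an error here usually means the embedding model is not being served at that URL.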

Then verify / rebuild the memory index:

openclaw memory status --deep
openclaw memory index --force

or:

openclaw memory status --deep --index

If you change the embedding model later, reindex again.

So the split is basically:

  • chat/completions → models.providers["llama-server"]
  • embeddings → agents.defaults.memorySearch