#rtx 4060 gpu intel i7 cpu and 16gb ram - local model possibility

1 messages · Page 1 of 1 (latest)

earnest aspen
#

After some experience with a cloud model on RPI 5 , I wanted to try out a locally running model.
So I installed Ubuntu (on WSL) on a laptop with RTX 4060 GPU and intel i7 CPU with 16GB ram.
Ollama can run a 3b or 4b model (qwen3, qwen3.5) quite well on this laptop.
But when I run OpenClaw with same model it can never answer questions. I believe this is because of the context amount that OpenClaw requires. However, I read many people running similar installations without an issue. There are also people running this on a very low setup like Jetson Orin nano and Apple Mac Mini.
So I believe I am doing something wrong , using a wrong coniguration or having optimization issues. Can anyone with a similar setup (especially 16GB ram and on WSL) share their experience ? Maybe a config file that I can compare with mine ?

If I turn to a cloud model, OpenClaw can answer on the same laptop. So this is not an issue of OpenClaw accessing to Ollama. Both OpenClaw and Ollama run on the same WSL instance.
If this is not possible with WSL , I can also try a native linux installation on the same laptop. But before trying that , I really would like to hear some experiences.

Btw, I sometimes see the following line when starting up through CLI.
"model requires more system memory than is available (8.6 GiB)"

free -h shows:
12GB RAM available in WSL.

modest patrolBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> âś… Mark Solution

atomic capeBOT
#

WSL + RTX 4060 + Ollama can work, but the failure mode you describe usually comes down to one (or both) of these:

  1. OpenClaw is pointed at Ollama’s /v1 (OpenAI-compatible) endpoint
    That mode is notorious for “no reply / raw tool JSON / weird hangs”, because tool-calling isn’t reliable there. OpenClaw’s Ollama integration is meant to hit the native API.

  2. Context/KV-cache memory blow-up (and/or WSL memory cap)
    That model requires more system memory than is available (8.6 GiB) line is basically Ollama saying “with the ctx size I’m trying to allocate, I can’t fit”. OpenClaw often drives a larger effective context than a quick ollama run … test.

Quick checks / fixes

A) Make sure you’re using native Ollama API (no /v1)
In config, the good shape is:

  • baseUrl: "http://127.0.0.1:11434" (no /v1)
  • api: "ollama"

Docs: https://docs.openclaw.ai/providers/ollama (see the big warning at the top)

#

B) Force a smaller context window for the model (to reduce RAM/KV cache)
If your model advertises a huge context length, OpenClaw may request it and Ollama will try to allocate a big KV cache. Try overriding to something like 4096 or 8192 in your OpenClaw model entry (contextWindow), and keep maxTokens modest too.

C) Verify WSL isn’t capped lower than you think
It’s very common for WSL to be memory-limited via Windows .wslconfig, even if free -h looks “kinda ok”. If Ollama thinks only ~8.6GiB is available, that’s a big hint.

What I need from you to diagnose precisely

Paste these outputs (redact tokens if any):

openclaw --version
openclaw models status
openclaw status --all

And then run:

#
openclaw logs --follow

…send one message that “never gets answered”, and paste the relevant log chunk around the Ollama request/response (that will tell us immediately whether it’s /v1 mode, ctx-size allocation, timeout, tool-call parsing, etc.).

If you also paste just your models.providers.ollama block (and the agents.defaults.model part), I can point out the exact misconfig.

earnest aspen
#

ok. thanks for the long explanation. let me start it up again and then I'll send what you asked for

atomic capeBOT
#

Cool — when you restart, the fastest “one paste that contains almost everything” is:

openclaw status --all
openclaw models status

Then:

  1. start a log tail:
openclaw logs --follow
  1. send one message that normally gets “no reply”
  2. paste the log section starting a few lines before the Ollama request and ending at the first error/timeout (or the end of the run)
#

status --all already redacts secrets, so it’s generally safe to share.

earnest aspen
#

let me first clarify; I am not using the v1 endpoint. As you'll see in the config: "models": {
"providers": {
"ollama": {
"api": "ollama",
"apiKey": "ollama-local",
"baseUrl": "http://127.0.0.1:11434",
"models": [
{
"contextWindow": 262144,
"cost": {
"cacheRead": 0,
"cacheWrite": 0,
"input": 0,
"output": 0
},
"id": "qwen3.5:4b",
"input": [
"text",
"image"
],
"maxTokens": 128000,
"name": "qwen3.5:4b",
"reasoning": true ----------- but context window is too high. maybe I should share all of my config so you can tell me which parts should be corrected.

atomic capeBOT
#

Yeah, that config snippet explains the whole thing: you’re telling OpenClaw/Ollama that this 4B model should run with a 262,144 token context (contextWindow: 262144). In Ollama that effectively drives options.num_ctx, which explodes the KV cache size → RAM allocation failure → the “requires more system memory than is available (8.6 GiB)” message → and OpenClaw looks like it “never answers”.

On a 16GB machine (and WSL often effectively less), you generally need to start way lower.

Fix (try this first)

Drop to something realistic, e.g.:

{
  models: {
    providers: {
      ollama: {
        api: "ollama",
        baseUrl: "http://127.0.0.1:11434",
        apiKey: "ollama-local",
        models: [
#
          {
            id: "qwen3.5:4b",
            name: "qwen3.5:4b",
            input: ["text"],        // only add "image" if you truly have a vision-capable model
            reasoning: false,       // optional: start simple
            contextWindow: 4096,    // try 4096 first, then 8192 if stable
            maxTokens: 1024         // or 2048; 128000 is not realistic here
          }
        ]
      }
    }
  }
}

Then restart the gateway (openclaw gateway restart) and retry.

#

Notes

  • maxTokens: 128000 is also a bad idea here (even if it doesn’t allocate KV cache, it invites super long generations/timeouts).
  • input: ["text","image"] should only be set if the model actually supports images; otherwise you can get confusing behavior later.

If it still misbehaves after lowering ctx

Paste:

openclaw --version
openclaw models status

and the relevant chunk from openclaw logs --follow when you send a test prompt.

You don’t need to paste your whole config yet — this one change is the big smoking gun.

earnest aspen
#

ok but when I set context window to 4096 I get an error when sending anything to the agent: Agent failed before reply: Model context window too small (4096 tokens) Minimum is 16000.

atomic capeBOT
#

That error is expected: OpenClaw has a runtime hard minimum context window of 16,000 tokens. Anything below that gets blocked before it even tries to call Ollama.

So the “working range” for your config is basically:

  • contextWindow >= 16000 (practically: 16384)
  • and keep maxTokens sane (otherwise you’ll just trigger long generations / timeouts)

Try this next:

{
  models: {
    providers: {
      ollama: {
        api: "ollama",
        baseUrl: "http://127.0.0.1:11434",
#
        apiKey: "ollama-local",
        models: [
          {
            id: "qwen3.5:4b",
            name: "qwen3.5:4b",
            input: ["text"],
            reasoning: false,
            contextWindow: 16384,
            maxTokens: 1024
          }
        ]
      }
    }
  }
}
#

Restart the gateway after editing.

If Ollama still throws the “requires more system memory…” message at 16k ctx, then you’re simply hitting RAM limits (WSL often effectively caps memory). In that case the two viable options are:

  1. Increase WSL memory/swap via Windows .wslconfig, or
  2. Use a model/settings that can actually run at 16k ctx on your available RAM (or use cloud/hybrid).

If you paste openclaw --version + the log chunk from openclaw logs --follow for one failed prompt, I can tell you which of those two you’re hitting.

earnest aspen
#

ok. after these adjustments. it seems to work

atomic capeBOT
#

Nice — that lines up with it being the ctx/KV-cache sizing issue.

If this is stable now, please mark the solution so it gets indexed on Answer Overflow: right‑click the solution message → Apps → “Mark Solution”.

Also (optional but useful for future folks): can you confirm what ended up working for you (e.g. contextWindow: 16384 + maxTokens: 1024, and whether you kept reasoning off)?