#Trying to disable security measures


clever gulch
#

I can't execute any kind of command or do anything anymore. Before, I had OpenClaw connecting to other servers over SSH, and that isn't possible anymore.

viscid glenBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

willow lanternBOT
#

OpenClaw can’t “SSH into other servers” as a special feature — the only way it does that is by running your local ssh binary via the exec tool. If exec got locked down (tool policy, sandboxing, or approvals), then everything that looks like “run a command” will stop working.

A few quick clarifiers so we can pinpoint which gate is blocking you:

  1. Are you talking about your own self-hosted Gateway, or the bot in this Discord?
  • The Discord bot here won’t run arbitrary commands / SSH for you. For that you need your own Gateway.
  2. What’s the exact error you see when you ask it to run something? (copy/paste)
    Common ones map directly to fixes:
  • “Tool exec is not allowed” → tool policy blocked it (tools.allow/deny, tool profile)
  • “host=sandbox … sandboxing is off / not available” → exec is pointed at sandbox, but sandboxing isn’t enabled
  • “approval-pending/denied” → exec approvals are set to deny / can’t prompt
  3. What version? Please run and paste:
openclaw --version
openclaw status --all

(redact tokens)


If your goal is “let my agent run ssh to machines I own”

You generally want exec to run on the gateway host or a paired node, not in the sandbox:

On the machine that should execute ssh, check your approvals and allowlist ssh (example):

which ssh
openclaw approvals get
openclaw approvals allowlist add --agent main "/usr/bin/ssh"   # adjust path to your `which ssh`

I’m not going to help “disable security measures” globally, but I can help you restore the legit/secure flow (gateway vs node vs sandbox + allowlist) once you paste the exact error + openclaw --version.

clever gulch
#

It's the latest version. I still can't ask my OpenClaw to connect to another VPS to execute things.

willow lanternBOT
#

“Latest version” helps, but I still need the exact version + the exact error text — otherwise we’re guessing.

1) First: what does it say when it fails?

When you ask it to do the SSH action, do you see something like:

  • Tool exec is not allowed
  • approval-pending / denied
  • ssh: not found
  • host=sandbox ... sandboxing is off
  • something else?

Copy/paste that message.

2) Paste these outputs (from the machine running your Gateway)

openclaw --version
openclaw status --all

(redact tokens)

3) Important detail: OpenClaw doesn’t “connect to another VPS” magically

It can only do it by running your local ssh via the exec tool. So to SSH to a second VPS:

  • exec must be allowed by tool policy (tools.allow/deny)
  • exec must run on the gateway host (or a paired node), not the sandbox
  • the execution host must have ssh available and access to the SSH keys
  • if approvals are enabled, ssh often needs to be allowlisted

A very common fix is: allowlist the ssh binary in exec approvals on the execution host:

which ssh
openclaw approvals get
openclaw approvals allowlist add --agent main "/usr/bin/ssh"   # use the path from `which ssh`
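
The same allowlist step can be sketched as a small script, so the path from `which ssh` is filled in rather than typed by hand. This is only an illustration: it reuses the `openclaw approvals allowlist add --agent main <path>` command quoted above, and the helper name `allowlist_cmd` plus its dry-run behaviour are assumptions of this sketch, not OpenClaw features.

```shell
# Hypothetical helper: builds the allowlist command for a resolved binary path.
# It only PRINTS the command (dry run); nothing changes until you run the output yourself.
allowlist_cmd() {
  printf 'openclaw approvals allowlist add --agent main "%s"\n' "$1"
}

# Resolve ssh on PATH (same idea as `which ssh`); stay quiet if it is missing.
ssh_path="$(command -v ssh || true)"
if [ -n "$ssh_path" ]; then
  allowlist_cmd "$ssh_path"
else
  echo "ssh not found on PATH; install an SSH client first" >&2
fi
```

Review the printed command, then run it on the machine that actually executes exec.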

And in chat, make sure you’re not running inside the sandbox:

  • run /exec to see current settings
  • if it’s host=sandbox, switch to gateway for that session: /exec host=gateway

Send the error text + openclaw --version and I’ll tell you exactly which of these gates is blocking you.

clever gulch
#

It's not sandbox mode, it's the normal one. How can I enable the tool policy if I have the tools profile set to full?

willow lanternBOT
#

If tools.profile is full, then tool policy is already “enabled” in the sense that the model is allowed to see and request all tools. You only “restrict” tools further with tools.allow / tools.deny (deny wins).

What usually trips people up here is this separate thing:

Tool policy ≠ where exec runs

Even with tools enabled, exec defaults to host=sandbox (see docs), and if you’re not running with sandboxing enabled, OpenClaw will fail closed instead of silently running on your host.

So if you’re “normal mode / not sandboxed”, you almost certainly need to set exec’s default host to the gateway (or a node).

Fix (config): make exec run on the gateway host

In ~/.openclaw/openclaw.json:

{
  tools: {
    exec: {
      host: "gateway",
      security: "allowlist",
      ask: "on-miss",
    },
  },
}


Next gate: exec approvals (allowlisting ssh)

If you’re trying to run ssh, host-exec may still be blocked by exec approvals on that machine. Check + allowlist:

which ssh
openclaw approvals get
openclaw approvals allowlist add --agent main "/usr/bin/ssh"   # use the path from `which ssh`


Quick sanity checks

Please paste the exact error text you get when it tries to run a command, and:

openclaw --version

Also: confirm where your Gateway runs (VPS? your laptop?) — because ssh will only work from the machine that’s actually executing exec and has the right SSH keys.

clever gulch
#

On a Windows VPS

willow lanternBOT
#

Got it — Windows VPS changes the likely failure mode a lot:

  • there’s no companion “approvals UI” like on macOS
  • so anything that would require an approval prompt will usually end up denied (askFallback defaults to deny)
  • and exec also defaults to host=sandbox, which won’t work if you didn’t enable sandboxing

Here’s the clean/safe setup for “let OpenClaw run ssh from my Windows VPS”.

1) Make exec run on the gateway host (Windows VPS), not sandbox

In %USERPROFILE%\.openclaw\openclaw.json (the Windows equivalent of the ~/.openclaw/openclaw.json path used in the Linux docs):

{
  tools: {
    exec: {
      host: "gateway",
      security: "allowlist",
      ask: "on-miss",
      // optional: if ssh isn't found
      pathPrepend: ["C:\\Windows\\System32\\OpenSSH"],
    },
  },
}

Docs: https://docs.openclaw.ai/tools/exec

2) Ensure Windows actually has ssh.exe

In PowerShell on the VPS:


4) If it still fails, I need the exact error text

Please paste:

openclaw --version
openclaw status --all

and the exact error message OpenClaw returns when it tries to run a command.

Also: confirm whether you’re trying to SSH from that Windows VPS to other servers you own (that’s the supported pattern), vs expecting OpenClaw to “remote into” another VPS without ssh (it can’t).

clever gulch
#

How can I completely delete/uninstall OpenClaw on Windows and WSL?

willow lanternBOT
#

Use the built-in uninstaller if you still have the CLI — it’s the cleanest way and removes the gateway service + state for you.

Docs: https://docs.openclaw.ai/install/uninstall

Windows (PowerShell)

Run as the same Windows user that installed it:

openclaw uninstall --all --yes --non-interactive

If you want to do it manually (or the CLI is already broken), remove the Windows service task + state:

# stop/remove the Scheduled Task (gateway service)
schtasks /End /TN "OpenClaw Gateway" 2>$null
schtasks /Delete /F /TN "OpenClaw Gateway"

# delete state/config/workspace (THIS DELETES EVERYTHING OpenClaw stored)
Remove-Item -Recurse -Force "$env:USERPROFILE\.openclaw" -ErrorAction SilentlyContinue
Remove-Item -Recurse -Force "$env:USERPROFILE\.openclaw-workspace" -ErrorAction SilentlyContinue  # only if you used a custom dir

# remove the global CLI (pick what you used)
npm rm -g openclaw
# or: pnpm remove -g openclaw

Notes:

  • If you used an OpenClaw profile (OPENCLAW_PROFILE), you may also have C:\Users\<you>\.openclaw-<profile> to delete, and a task named OpenClaw Gateway (<profile>).

WSL (run inside your WSL distro)

In WSL, it’s a completely separate install/state from Windows. Run:

openclaw uninstall --all --yes --non-interactive

Manual wipe (if needed):

openclaw gateway stop || true
openclaw gateway uninstall || true

rm -rf ~/.openclaw ~/.openclaw-*/ ~/.openclaw/workspace

npm rm -g openclaw   # or pnpm remove -g openclaw

If you tell me how you installed on each side (install.sh / npm / git checkout) and whether you used profiles, I can tailor it so you don’t miss any leftover service/task.

clever gulch
#

How can I install it back?

willow lanternBOT
#

Reinstall is straightforward — use the installer script (it installs Node if needed and runs onboarding).

Docs: https://docs.openclaw.ai/install

Option A (recommended on Windows): install inside WSL2

In your WSL terminal:

curl -fsSL https://openclaw.ai/install.sh | bash

Option B: install on native Windows (PowerShell)

In an elevated PowerShell (or normal PS if your npm global install doesn’t need admin):

iwr -useb https://openclaw.ai/install.ps1 | iex

After install (either way)

Run:

openclaw doctor
openclaw status

If your end goal is “SSH to other VPSes from OpenClaw”, after reinstall you’ll still want to set exec to run on the gateway host and allowlist ssh.exe (we can do that once it’s back up).

clever gulch
#

It's done, but openclaw now uses RAM and not VRAM

willow lanternBOT
#

OpenClaw itself is just a Node.js “control plane” process — it will use normal system RAM, not GPU VRAM. GPU/VRAM only comes into play if you’re running a local model server (Ollama / vLLM / etc.) on that machine and OpenClaw is configured to call it.

So usually when someone says “after reinstall it uses RAM not VRAM”, it’s one of these:

  1. Your VPS doesn’t actually have a GPU → you’ll never see VRAM usage there.
  2. You do have a GPU, but after reinstall you’re now using a cloud provider model again → inference happens remotely, so your GPU stays idle.
  3. You’re using a local runner, but it’s running in CPU mode (driver/CUDA/WSL GPU passthrough issue).

Quick check

On the VPS, run:

openclaw models status

and tell me what provider it says you’re using for the active model (don’t paste tokens).


If you want VRAM usage

You need:

  • a GPU on that VPS
  • a GPU-backed local inference server running (that’s what consumes VRAM)
  • OpenClaw configured to point at that server

If you tell me:

  • does the VPS have an NVIDIA GPU? (yes/no)
  • are you running Windows-native OpenClaw or WSL2 OpenClaw?
  • what does openclaw models status show?

…I can point you to the exact provider doc for your setup.

clever gulch
#

It runs on WSL2 and it has an NVIDIA GPU

willow lanternBOT
#

Cool — WSL2 + NVIDIA GPU is exactly the case where this is possible, but the key point remains:

OpenClaw will still use system RAM.
VRAM usage only happens in the model server you run locally (Ollama / vLLM / etc.). OpenClaw just sends HTTP requests to it.

So the question becomes: are you currently using a local GPU-backed provider, or did you fall back to a cloud provider after reinstall?

1) Verify the GPU is visible inside WSL2

In WSL:

nvidia-smi

If that doesn’t work, your WSL2 GPU passthrough/driver stack isn’t set up yet (Windows NVIDIA driver needs WSL support).

2) Check what model/provider OpenClaw is actually using


On the WSL2 box:

openclaw models status

If it shows OpenAI/Anthropic/etc, then inference is remote → no VRAM usage locally is expected.

3) If you want VRAM usage: point OpenClaw at a local model server

The most common local option is Ollama (runs inside WSL2 and can use CUDA if your setup is right).

OpenClaw docs for that:
https://docs.openclaw.ai/providers/ollama

High level steps:

  • install/run Ollama in WSL2
  • pull a model in Ollama
  • set OLLAMA_API_KEY (any value) so OpenClaw enables the provider
  • set your agent’s default model to ollama/<your-model-id>

Then you’ll see VRAM usage when the model is actively generating (watch nvidia-smi while you send a prompt).

If you paste the output of openclaw models status (redact anything sensitive), I can tell you exactly what OpenClaw is currently routing to and what to change.

clever gulch
#

It gets detected but it still uses RAM and not VRAM

willow lanternBOT
#

If you mean “the OpenClaw gateway process is using RAM”: that’s normal and expected. The gateway is a Node.js service — it won’t ever “move into VRAM”.

VRAM is only used by the local model runtime (Ollama / vLLM / etc.) while it is generating tokens.

So the real question is: is your model runtime actually using the GPU, or is it running CPU-only / spilling to RAM?

1) Confirm you’re looking at the right thing

While you send a prompt (so it’s actively generating), run this in another WSL terminal:

nvidia-smi -l 1

  • If you never see an ollama / python / similar process show up in the GPU process list, then no GPU inference is happening.
  • If you do see a process and some VRAM usage, then everything is fine — OpenClaw still uses RAM, but inference is on GPU.

2) Confirm OpenClaw is actually using a local provider (not cloud)


Run:

openclaw models status

If it shows you’re using OpenAI/Anthropic/etc, then inference is remote → 0 VRAM usage locally.

If you want VRAM usage locally, openclaw models status should show a local provider (commonly ollama or vllm) and your active model should be something like ollama/<id> or vllm/<id>.

3) If you are using Ollama and still no VRAM

That’s then an Ollama/GPU setup issue (CPU-only install, CUDA/WSL driver mismatch, or the model is forced to run mostly on CPU / doesn’t fit VRAM).

At that point, paste:

  • output of openclaw models status
  • output of nvidia-smi (just the first screen)

…and I can tell you whether OpenClaw is still routing to cloud, or whether it’s local-but-CPU.

clever gulch
#

Wed Mar  4 02:42:40 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 590.57                 Driver Version: 591.86         CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        On  |   00000000:01:00.0  On |                  N/A |
|  0%   47C    P8             13W /  170W |     833MiB /  12288MiB |     11%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

willow lanternBOT
#

That nvidia-smi output is the smoking gun:

  • GPU is visible to WSL2 (good)
  • no running GPU processes (so nothing in WSL is doing CUDA inference right now)
  • the ~833MiB “used” is basically baseline/driver/display overhead — not model inference

So this isn’t “OpenClaw chose RAM over VRAM” — it’s either:

  1. OpenClaw is still using a cloud model (so your GPU will stay idle), or
  2. you intended to use a local model server (Ollama/vLLM) but it’s not running / not being used / running CPU-only, or
  3. your model server is running on Windows side, and WSL’s nvidia-smi won’t show its processes (check Windows nvidia-smi in that case).

Step 1: confirm what OpenClaw is actually routing to

In WSL:

openclaw models status

If it shows OpenAI/Anthropic/etc as the active provider, that explains it: inference is remote → no VRAM use locally.

Step 2: if you want VRAM usage, you need a local GPU model server + point OpenClaw at it

Most people use Ollama for this. OpenClaw docs:
https://docs.openclaw.ai/providers/ollama

Key gotcha on reinstalls/daemons: setting OLLAMA_API_KEY=... in your interactive shell often doesn’t reach the Gateway service. Put it in one of these instead:

  • ~/.openclaw/.env (WSL):
    OLLAMA_API_KEY=ollama-local
  • or config (~/.openclaw/openclaw.json):
{
  models: {
    providers: {
      ollama: { apiKey: "ollama-local", baseUrl: "http://127.0.0.1:11434", api: "ollama" }
    }
  },
  agents: {
    defaults: { model: { primary: "ollama/<your-model-id>" } }
  }
}

(Replace <your-model-id> with whatever your Ollama server exposes.)

Step 3: prove GPU inference is happening

While a local model is generating, run:

nvidia-smi -l 1

You should see a compute process show up (often ollama or python) and VRAM climb.
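
If you'd rather watch a single number than the full table, here is a rough sketch that turns nvidia-smi's CSV query output into a VRAM percentage. The helper name `vram_pct` is made up for this example; the sample line is the 833MiB / 12288MiB reading from the output pasted earlier.

```shell
# Hypothetical helper: turn a "used, total" line (MiB) into an integer percentage.
vram_pct() {
  echo "$1" | awk -F', *' '{ printf "%d\n", ($1 * 100) / $2 }'
}

# Sample line in the format produced by:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
vram_pct "833, 12288"   # prints 6 (idle baseline)

# Live polling, only where nvidia-smi is installed:
# while true; do
#   vram_pct "$(nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits)"
#   sleep 1
# done
```

A value that stays near the idle baseline while a prompt is generating suggests the model is not being loaded onto the GPU.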

If you paste your openclaw models status output, I can tell you immediately whether you’re still on cloud routing or just missing the Ollama/vLLM wiring.

clever gulch
#

🦞 OpenClaw 2026.3.2 (85377a2)
If it's repetitive, I'll automate it; if it's hard, I'll bring jokes and a rollback plan.

Config : ~/.openclaw/openclaw.json
Agent dir : ~/.openclaw/agents/main/agent
Default : ollama/qwen3.5:9b
Fallbacks (0) : -
Image model : -
Image fallbacks (0): -
Aliases (0) : -
Configured models (0): all

Auth overview
Auth store : ~/.openclaw/agents/main/agent/auth-profiles.json
Shell env : off
Providers w/ OAuth/tokens (0): -

  • ollama effective=env:ol...al | env=ol...al | source=env: OLLAMA_API_KEY | models.json=ol...al | source=models.json: ~/.openclaw/agents/main/agent/models.json
  • vllm effective=profiles:~/.openclaw/agents/main/agent/auth-profiles.json | profiles=1 (oauth=0, token=0, api_key=1) | vllm:default=no...ed

OAuth/token status

  • none
01:44:27 [agents/model-providers] Failed to discover vLLM models: TypeError: fetch failed

willow lanternBOT
#

Your openclaw models status confirms OpenClaw is routing to Ollama (Default: ollama/...), so the OpenClaw side is fine.

Now look at what your nvidia-smi screenshot actually says:

  • GPU is visible (good)
  • Processes: “No running processes found” → at the moment you ran it, nothing in WSL was doing GPU compute

Two important gotchas here:

1) Ollama/GPU only shows up while it’s actively generating

Run this in one WSL terminal:

nvidia-smi -l 1

Then in another terminal, trigger a long generation (either via ollama run ... or by sending a prompt to OpenClaw that takes a while). You should see a process appear and VRAM climb during generation.


If you only check when it’s idle, it’ll look like “no VRAM usage”.

2) You might be hitting Windows↔WSL split

If Ollama is actually running on the Windows side (and WSL is just connecting to 127.0.0.1:11434 via localhost forwarding), then WSL’s nvidia-smi may show no processes even though Windows is using the GPU.

To confirm where Ollama is running, inside WSL run:

ss -ltnp | grep 11434 || true
ps aux | grep -i ollama | head

  • If you see an ollama process in WSL bound to :11434, it’s running in WSL.
  • If you don’t see it, but OpenClaw still works, you’re probably talking to Windows Ollama.

3) Also: “RAM usage” doesn’t mean GPU isn’t used

Even with GPU acceleration, Ollama will still use a bunch of system RAM for caching/overhead. The real signal is: do you see a GPU process + rising VRAM while generating.


If you paste the output of those two commands (ss ... + ps ...), I can tell you immediately whether you should be checking Windows nvidia-smi instead, or whether Ollama-in-WSL is running CPU-only.

clever gulch
#

It is running in WSL, but before, on Windows, it wasn't taking up this RAM

willow lanternBOT
#

Yep — if you moved from “native Windows” to WSL2, it’ll look like it suddenly “eats RAM”, even if GPU is working.

Two separate things are happening:

1) OpenClaw won’t use VRAM, and it will always use some RAM

The Gateway is a Node.js service. It’s not the heavy part.

The heavy RAM usage is almost always Ollama + the loaded model.

You can confirm in WSL:

ps aux --sort=-rss | head -n 15

You’ll likely see ollama near the top.


2) WSL2 memory behavior: Windows shows it as “vmmem” RAM

WSL2 is a VM. When Linux allocates memory (for Ollama/model cache), Windows shows that as RAM use, and WSL often doesn’t give it back quickly even after load drops.

If your goal is “keep WSL from ballooning”, set a cap + enable memory reclaim in %UserProfile%\.wslconfig (Windows side):

[wsl2]
# pick a safe cap for your VPS
memory=8GB
processors=4

[experimental]
autoMemoryReclaim=gradual

Then apply it:

wsl --shutdown

Microsoft docs: https://learn.microsoft.com/en-us/windows/wsl/wsl-config

3) Even with GPU, Ollama can still use a lot of RAM

GPU acceleration doesn’t mean “no RAM”: weights/caches/overhead still live in system RAM (and in WSL that shows up as VM RAM).

If you tell me roughly how much RAM it’s using and whether that’s acceptable, I can suggest the least painful knob to turn (WSL cap vs switching back to native Windows vs using a cloud provider).

clever gulch
#

Is there a way to use native Ollama on Windows and pick it up from Linux WSL?

willow lanternBOT
#

Yes — you can run Ollama natively on Windows and have OpenClaw (in WSL2) use it, but two things must be true:

  1. WSL must be able to reach the Windows Ollama port, and
  2. OpenClaw must point its Ollama baseUrl at that reachable address.

Step 1: Make Windows Ollama reachable from WSL

By default, Windows Ollama usually binds to 127.0.0.1:11434 (Windows-local only). WSL2 typically cannot reach Windows’ 127.0.0.1 unless you’re using newer “mirrored networking”.

Reliable approach (works on classic WSL2 NAT)

From WSL, the Windows host is usually the DNS “nameserver” IP:

WIN_HOST=$(awk '/nameserver/ {print $2; exit}' /etc/resolv.conf)
echo "$WIN_HOST"
curl "http://$WIN_HOST:11434/api/version"

  • If that curl fails, Windows Ollama is not listening on an address WSL can reach.
  • Fix is on the Windows side: make Ollama listen on an interface reachable from WSL (often by changing Ollama’s bind address). Ollama’s docs mention this via OLLAMA_HOST (be careful: binding to 0.0.0.0 can expose it unless firewall rules restrict it). Official FAQ: https://docs.ollama.com/faq

If you’re on a recent WSL build (optional)

You can also try enabling WSL “mirrored networking” in %UserProfile%\.wslconfig and restarting WSL; that can make Windows localhost more seamless. Microsoft docs: https://learn.microsoft.com/en-us/windows/wsl/wsl-config

Step 2: Point OpenClaw (WSL) at the Windows Ollama URL

Once curl http://$WIN_HOST:11434/api/version works from WSL, configure OpenClaw’s Ollama provider baseUrl to http://<WIN_HOST>:11434 (no /v1).

OpenClaw Ollama docs: https://docs.openclaw.ai/providers/ollama

Important: once you set models.providers.ollama explicitly, OpenClaw stops auto-discovery and you’ll need to define the model(s) you want in config.


If you tell me what you get from this in WSL:

WIN_HOST=$(awk '/nameserver/ {print $2; exit}' /etc/resolv.conf)
curl -v "http://$WIN_HOST:11434/api/version"

…I can tell you whether you’re blocked at “Windows Ollama not reachable” vs “OpenClaw config”.

clever gulch
#

On WSL it loads into RAM and not into VRAM; it looks like another model is running on the same machine

willow lanternBOT
#

What you’re describing (“it feels like another model on the same machine” + “WSL eats RAM and no VRAM”) is exactly what happens when you accidentally have two Ollama instances:

  • one running in Windows
  • one running in WSL
  • and OpenClaw (running in WSL) is talking to the WSL one at http://127.0.0.1:11434 (WSL-local), not the Windows one.

Because WSL’s 127.0.0.1 is not Windows’ 127.0.0.1, both can exist and look “local”, but they’re different servers with different model caches and different GPU behavior.

1) Prove which Ollama OpenClaw is hitting (WSL vs Windows)

In WSL, run:

curl -s http://127.0.0.1:11434/api/tags | head
ps aux | grep -i ollama | head

On Windows (PowerShell), run:

(iwr -useb http://127.0.0.1:11434/api/tags).Content

If the model lists differ (or WSL shows an ollama process but Windows also has one), you’ve got two.

2) About “RAM vs VRAM”

Even when GPU is used, Ollama will still use a bunch of system RAM (model cache, KV cache, overhead). VRAM is additional, not a replacement.

But you should at least see a GPU process while generating. In WSL:

nvidia-smi -l 1

…and at the same time trigger a long response. If the process list stays empty during generation, that Ollama instance is CPU-only.


3) If your goal is “use Windows-native Ollama (GPU) and keep WSL RAM low”

That’s doable, but you need OpenClaw-in-WSL to target the Windows Ollama base URL (not WSL localhost). The clean way is:

  • stop/disable the WSL Ollama so there’s no ambiguity
  • make Windows Ollama reachable from WSL (via Windows-host IP / binding)
  • set OpenClaw’s models.providers.ollama.baseUrl to that Windows host address

Before we change anything, paste the outputs of the 3 commands above (WSL tags + WSL process + Windows tags). That will tell us in 30 seconds whether you’re running two Ollamas and which one is CPU-only.

clever gulch
#

NAME          ID              SIZE     PROCESSOR          CONTEXT    UNTIL
qwen3.5:9b    6488c96fa5fa    21 GB    54%/46% CPU/GPU    262144     4 minutes from now

willow lanternBOT
#

That output actually explains everything:

  • SIZE 21 GB on an RTX 3060 12GB
  • CONTEXT 262144 (262k!)
  • PROCESSOR 54%/46% CPU/GPU → Ollama is offloading a big chunk to CPU/RAM because it can’t keep it all in VRAM.

So it’s not “not using VRAM” — it is (46%). It’s just also forced to use a ton of RAM.

Why RAM is huge

That 262k context is insanely memory-hungry. Ollama’s own docs say context length directly increases memory requirements, and 256k context is basically meant for very large VRAM boxes.

Ollama context docs: https://docs.ollama.com/context-length
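
To get a feel for the scaling, here is a back-of-the-envelope sketch of KV cache size for a standard transformer with grouped-query attention: one key and one value vector per layer, per KV head, per token. The architecture numbers below are hypothetical placeholders, not the real qwen3.5:9b values, and models that use sliding-window or linear-attention layers will need less than this estimate.

```shell
# kv_cache_mib LAYERS KV_HEADS HEAD_DIM CONTEXT BYTES_PER_VALUE
# 2 (key + value) * layers * kv_heads * head_dim * context * bytes, reported in MiB.
kv_cache_mib() {
  echo $(( 2 * $1 * $2 * $3 * $4 * $5 / 1048576 ))
}

# Hypothetical architecture (NOT the real qwen3.5:9b values):
# 32 layers, 8 KV heads, head dimension 128, 16-bit (2-byte) cache entries.
for ctx in 4096 8192 65536 262144; do
  echo "context=$ctx kv_cache=$(kv_cache_mib 32 8 128 "$ctx" 2) MiB"
done
```

With these placeholder numbers the cache grows linearly from 512 MiB at 4k to 32768 MiB at 262k, on top of the weights, which is why a huge context spills out of a 12GB card.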

How to make it mostly VRAM (and use far less RAM)

You need to lower the context length and/or use a smaller model/quantization so it fits in 12GB VRAM.

Start by lowering context (WSL example from Ollama docs):

# stop whatever is currently running ollama serve, then:
OLLAMA_CONTEXT_LENGTH=8192 ollama serve

Then run a prompt again and check:

ollama ps

You’re aiming for something like 100% GPU and a much smaller CONTEXT value.
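
If you want to check the split from a script, this sketch pulls the GPU share out of an `ollama ps` row. The helper name `gpu_share` is made up, and the single-device forms it handles ("100% GPU", "100% CPU") are an assumption about how Ollama prints that column; the mixed form is the one from the output pasted above.

```shell
# gpu_share: read `ollama ps` rows on stdin, print the GPU percentage of the first model row.
gpu_share() {
  awk '{
    for (i = 1; i < NF; i++) {
      # Mixed placement, e.g. "54%/46% CPU/GPU": the second number is the GPU share.
      if ($i ~ /^[0-9]+%\/[0-9]+%$/ && $(i + 1) == "CPU/GPU") {
        split($i, parts, "/")
        sub(/%/, "", parts[2])
        print parts[2]
        exit
      }
      # Assumed single-device forms: "100% GPU" or "100% CPU".
      if ($i ~ /^[0-9]+%$/ && $(i + 1) == "GPU") {
        sub(/%/, "", $i)
        print $i
        exit
      }
      if ($i ~ /^[0-9]+%$/ && $(i + 1) == "CPU") {
        print 0
        exit
      }
    }
  }'
}

# Row from the earlier output; prints 46.
printf '%s\n' 'qwen3.5:9b 6488c96fa5fa 21 GB 54%/46% CPU/GPU 262144 4 minutes from now' | gpu_share
```

On a live system you would pipe the real output instead: `ollama ps | gpu_share`.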

If you truly need 262k context, you’ll basically need a much bigger GPU (or accept CPU/RAM offload).

If you tell me what RAM usage you’re seeing (and whether you actually need >8k/16k context), I can suggest a sane OLLAMA_CONTEXT_LENGTH target for your 3060.

clever gulch
#

OK, what is the maximum context that I could configure with an RTX 3060 and 24GB RAM?

willow lanternBOT
#

There isn’t a single fixed “max context” number for an RTX 3060 + 24GB RAM, because it depends heavily on:

  • the model size / quantization (your ollama ps showed 21GB, which already almost fills your 24GB RAM by itself),
  • how much of the model is on GPU vs CPU offload,
  • and how much memory WSL + everything else is using.

But I can give a very practical answer:

Practical max (what’s likely to work well)

For an RTX 3060 (12GB VRAM):

  • 4096 tokens: “always safe” baseline (this is also Ollama’s default for <24GiB VRAM)
  • 8192 tokens: usually the sweet spot if you want a bit more room
  • 16384 tokens: sometimes works, but often starts forcing CPU/RAM offload depending on the model
  • 32768+ tokens: on a 12GB card, expect heavy offload and big RAM usage (usually not worth it unless you accept slowdown)

Given your specific case (a model showing 21GB), I’d start with 4096 or 8192. With only 24GB RAM total, 16k+ is very likely to push you into swap / OOM territory.


Ollama docs (matches this VRAM-based default guidance): https://docs.ollama.com/context-length

How to find your real max in 5 minutes

Increase until you see CPU offload or RAM pressure:

  1. Set and restart Ollama:
OLLAMA_CONTEXT_LENGTH=8192 ollama serve
  2. Run one longer prompt, then check:
ollama ps
free -h
nvidia-smi
  3. If it’s still stable and PROCESSOR is close to 100% GPU, try 16384.
    If you see CPU% rising a lot (or RAM getting tight), back down.

If you tell me what ollama ps shows at 8192 (especially the CPU/GPU split) I can tell you whether 16384 is realistic on your setup.
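
The step-up/step-down rule above can be written down as a tiny helper, purely as a sketch. The name `next_context` and the 95% cutoff are arbitrary choices for illustration; the GPU percentage is whatever you read from the PROCESSOR column of `ollama ps`.

```shell
# next_context CURRENT_CONTEXT GPU_PERCENT
# Doubles the context while the model stays (almost) fully on the GPU, otherwise halves it.
# The 95% cutoff is an arbitrary illustrative threshold, not an Ollama setting.
next_context() {
  if [ "$2" -ge 95 ]; then
    echo $(( $1 * 2 ))
  else
    echo $(( $1 / 2 ))
  fi
}

next_context 8192 100   # mostly on GPU: suggests trying 16384
next_context 16384 60   # heavy CPU offload: suggests backing down to 8192
```

Restart `ollama serve` with the suggested OLLAMA_CONTEXT_LENGTH and re-check after each change.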

clever gulch
#

Is a 64k context window okay?

willow lanternBOT
#

On an RTX 3060 (12GB VRAM) + 24GB RAM, 64k context is generally not “okay” in the sense of “fast + stable”.

Per Ollama’s own guidance, <24GiB VRAM defaults to 4k context, and higher context lengths quickly blow up memory usage (KV cache) and force CPU/RAM offload: https://docs.ollama.com/context-length

You might be able to force 64k, but expect one or more of:

  • heavy CPU offload (your ollama ps will show CPU% climbing)
  • huge RAM usage / swapping
  • slow responses or outright OOM crashes

With the model you showed earlier (~21GB), 64k is basically a non-starter on 24GB RAM.

If you want a sane setup on that box:

  • try 8k first, then 16k if it stays mostly GPU
  • for true 64k+ agent/coding workflows, it’s usually better to use a cloud model (or a GPU with a lot more VRAM).

clever gulch
#

How much VRAM is recommended for Qwen3.5:9B?

willow lanternBOT
#

It depends a lot on quantization and the context length you run with, but for a 9B model the “GPU-only (no CPU offload)” rule of thumb looks like this:

Rough VRAM targets for Qwen3.5:9B

  • ~8GB VRAM: can work if you’re using a heavy quant (e.g. ~4-bit) and keeping context modest (4k–8k).
  • ~12GB VRAM (your RTX 3060): often OK for 4–5 bit at 4k–8k context, but you may still see some CPU offload depending on the exact build/options.
  • ~16GB VRAM: comfortable for higher quality quants (5–6 bit) and/or 8k–16k context.
  • ~24GB VRAM: comfortable for 8-bit or “almost full” runs, and for larger contexts without falling back to CPU so quickly.
  • 64k+ context: typically wants a lot more VRAM (or you accept heavy CPU/RAM offload).
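
As a rough cross-check on those tiers, weight memory is approximately parameters × bits per weight ÷ 8. The sketch below covers weights only: KV cache and runtime overhead come on top, and real quant formats usually spend slightly more than their nominal bits per weight. The helper name `weights_mb` is made up for this example.

```shell
# weights_mb PARAMS_IN_MILLIONS BITS_PER_WEIGHT
# params * bits / 8 bytes; with params given in millions the result is in decimal megabytes.
weights_mb() {
  echo $(( $1 * $2 / 8 ))
}

for bits in 4 5 8 16; do
  echo "9B params at ${bits}-bit: ~$(weights_mb 9000 "$bits") MB of weights"
done
```

That gives roughly 4.5 GB at 4-bit, 9 GB at 8-bit and 18 GB at 16-bit for a 9B model, before any context is allocated.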

How to tell on your machine

ollama ps is the truth. If it shows 100% GPU you “fit”. If it shows a split like CPU/GPU, you don’t.

In your earlier output you had SIZE ~21GB and a huge context, so on a 12GB card it’s expected you’ll see CPU offload. If you want that model to stay mostly on GPU on a 3060, you’ll need to lower context and/or use a smaller quant so the “SIZE” comes down.

If you paste ollama list (just the size column + model name), I can tell you whether you’re running a quant that can realistically fit in 12GB.

clever gulch
#

If I go with 128GB RAM and no VRAM, how will it run?

willow lanternBOT
#

If you have 128GB system RAM but no GPU (no VRAM), the model will run CPU-only.

What that means in practice:

  • It will run (RAM is enough to load big weights and big KV caches).
  • It will be much slower than GPU, because the bottleneck becomes CPU compute and memory bandwidth rather than memory capacity.
  • You’ll also see latency spike hard as you increase context length (KV cache grows; attention gets expensive).

For something around 9B parameters:

  • On a strong server CPU you might get “usable but not snappy” speeds.
  • On a typical VPS CPU, it often feels sluggish (especially for agent/tool workflows).

So: 128GB RAM helps you not crash, but it doesn’t magically replace VRAM performance.

If your goal is OpenClaw doing tool-y things (exec/ssh, browsing, etc.) with decent responsiveness, CPU-only local inference is usually frustrating. The common practical setups are:

  • keep local models for “cheap chat”, and
  • use a cloud model (or a GPU box) when you need fast/strong agent behavior.