#Trying to disable security measures
OpenClaw can’t “SSH into other servers” as a special feature — the only way it does that is by running your local ssh binary via the exec tool. If exec got locked down (tool policy, sandboxing, or approvals), then everything that looks like “run a command” will stop working.
A few quick clarifiers so we can pinpoint which gate is blocking you:
- Are you talking about your own self-hosted Gateway, or the bot in this Discord?
- The Discord bot here won’t run arbitrary commands / SSH for you. For that you need your own Gateway.
- What’s the exact error you see when you ask it to run something? (copy/paste)
Common ones map directly to fixes:
- “Tool exec is not allowed” → tool policy blocked it (tools.allow/deny, tool profile)
- “host=sandbox … sandboxing is off / not available” → exec is pointed at sandbox, but sandboxing isn’t enabled
- “approval-pending/denied” → exec approvals are set to deny / can’t prompt
- What version? Please run and paste:
openclaw --version
openclaw status --all
(redact tokens)
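The error-to-fix mapping above can be sketched as a small lookup. This is only an illustration: `classify_exec_error` is a hypothetical helper, and the substrings it matches (including `approval-denied`) are assumptions derived from the example messages quoted above, not OpenClaw's actual error catalog.

```python
# Hypothetical helper: guess which gate is blocking exec from the error text.
# Assumption: real error messages contain these substrings verbatim.
GATES = [
    ("tool exec is not allowed", "tool policy (tools.allow/deny, tool profile)"),
    ("host=sandbox", "exec host is sandbox, but sandboxing is not enabled"),
    ("approval-pending", "exec approvals (cannot prompt)"),
    ("approval-denied", "exec approvals (set to deny)"),
]


def classify_exec_error(message: str) -> str:
    """Return a short description of the gate that most likely blocked exec."""
    text = message.lower()
    for needle, gate in GATES:
        if needle in text:
            return gate
    return "unknown: paste the exact error text"


if __name__ == "__main__":
    print(classify_exec_error("Tool exec is not allowed"))
```

Anything that falls through to "unknown" still needs the exact error text plus the version output to diagnose.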
If your goal is “let my agent run ssh to machines I own”
You generally want exec to run on the gateway host or a paired node, not in the sandbox:
- Exec tool docs: https://docs.openclaw.ai/tools/exec
- Exec approvals (allowlist / prompts): https://docs.openclaw.ai/tools/exec-approvals
- Approvals CLI: https://docs.openclaw.ai/cli/approvals
- Nodes overview: https://docs.openclaw.ai/nodes
On the machine that should execute ssh, check your approvals and allowlist ssh (example):
which ssh
openclaw approvals get
openclaw approvals allowlist add --agent main "/usr/bin/ssh" # adjust path to your `which ssh`
I’m not going to help “disable security measures” globally, but I can help you restore the legit/secure flow (gateway vs node vs sandbox + allowlist) once you paste the exact error + openclaw --version.
It's the latest version, but I still can't ask my OpenClaw to connect to another VPS to execute things.
“Latest version” helps, but I still need the exact version + the exact error text — otherwise we’re guessing.
1) First: what does it say when it fails?
When you ask it to do the SSH action, do you see something like:
- Tool exec is not allowed
- approval-pending/denied
- ssh: not found
- host=sandbox ... sandboxing is off
- something else?
Copy/paste that message.
2) Paste these outputs (from the machine running your Gateway)
openclaw --version
openclaw status --all
(redact tokens)
3) Important detail: OpenClaw doesn’t “connect to another VPS” magically
It can only do it by running your local ssh via the exec tool. So to SSH to a second VPS:
- exec must be allowed by tool policy (tools.allow/deny)
- exec must run on the gateway host (or a paired node), not the sandbox
- the execution host must have ssh available and access to the SSH keys
- if approvals are enabled, ssh often needs to be allowlisted
A very common fix is: allowlist the ssh binary in exec approvals on the execution host:
which ssh
openclaw approvals get
openclaw approvals allowlist add --agent main "/usr/bin/ssh" # use the path from `which ssh`
And in chat, make sure you’re not running inside the sandbox:
- run /exec to see current settings
- if it’s host=sandbox, switch to gateway for that session: /exec host=gateway
Send the error text + openclaw --version and I’ll tell you exactly which of these gates is blocking you.
It's not sandbox mode, it's the normal one. How can I enable the tool policy if my tools profile is already set to full?
If tools.profile is full, then tool policy is already “enabled” in the sense that the model is allowed to see and request all tools. You only “restrict” tools further with tools.allow / tools.deny (deny wins).
What usually trips people up here is this separate thing:
Tool policy ≠ where exec runs
Even with tools enabled, exec defaults to host=sandbox (see docs), and if you’re not running with sandboxing enabled, OpenClaw will fail closed instead of silently running on your host.
So if you’re “normal mode / not sandboxed”, you almost certainly need to set exec’s default host to the gateway (or a node).
Fix (config): make exec run on the gateway host
In ~/.openclaw/openclaw.json:
{
  tools: {
    exec: {
      host: "gateway",
      security: "allowlist",
      ask: "on-miss",
    },
  },
}
Docs:
- Exec tool: https://docs.openclaw.ai/tools/exec
Next gate: exec approvals (allowlisting ssh)
If you’re trying to run ssh, host-exec may still be blocked by exec approvals on that machine. Check + allowlist:
which ssh
openclaw approvals get
openclaw approvals allowlist add --agent main "/usr/bin/ssh" # use the path from `which ssh`
Docs:
- Exec approvals: https://docs.openclaw.ai/tools/exec-approvals
- Approvals CLI: https://docs.openclaw.ai/cli/approvals
Quick sanity checks
Please paste the exact error text you get when it tries to run a command, and:
openclaw --version
Also: confirm where your Gateway runs (VPS? your laptop?) — because ssh will only work from the machine that’s actually executing exec and has the right SSH keys.
On a Windows VPS
Got it — Windows VPS changes the likely failure mode a lot:
- there’s no companion “approvals UI” like on macOS
- so anything that would require an approval prompt will usually end up denied (askFallback defaults to deny)
- and exec also defaults to host=sandbox, which won’t work if you didn’t enable sandboxing
Here’s the clean/safe setup for “let OpenClaw run ssh from my Windows VPS”.
1) Make exec run on the gateway host (Windows VPS), not sandbox
In %USERPROFILE%\.openclaw\openclaw.json (same as ~/.openclaw/openclaw.json on Linux docs):
{
  tools: {
    exec: {
      host: "gateway",
      security: "allowlist",
      ask: "on-miss",
      // optional: if ssh isn't found
      pathPrepend: ["C:\\Windows\\System32\\OpenSSH"],
    },
  },
}
Docs: https://docs.openclaw.ai/tools/exec
2) Ensure Windows actually has ssh.exe
In PowerShell on the VPS:
ssh -V
where.exe ssh
Typical path is:
C:\Windows\System32\OpenSSH\ssh.exe
3) Allowlist ssh.exe in exec approvals (this is the big one on headless VPS)
openclaw approvals get
openclaw approvals allowlist add --agent main "C:\Windows\System32\OpenSSH\ssh.exe" # use the path from `where.exe ssh`
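Docs:
- Exec approvals: https://docs.openclaw.ai/tools/exec-approvals
- Approvals CLI: https://docs.openclaw.ai/cli/approvals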
4) If it still fails, I need the exact error text
Please paste:
openclaw --version
openclaw status --all
and the exact error message OpenClaw returns when it tries to run a command.
Also: confirm whether you’re trying to SSH from that Windows VPS to other servers you own (that’s the supported pattern), vs expecting OpenClaw to “remote into” another VPS without ssh (it can’t).
How can I completely delete/uninstall OpenClaw on Windows and WSL?
Use the built-in uninstaller if you still have the CLI — it’s the cleanest way and removes the gateway service + state for you.
Docs: https://docs.openclaw.ai/install/uninstall
Windows (PowerShell)
Run as the same Windows user that installed it:
openclaw uninstall --all --yes --non-interactive
If you want to do it manually (or the CLI is already broken), remove the Windows service task + state:
# stop/remove the Scheduled Task (gateway service)
schtasks /End /TN "OpenClaw Gateway" 2>$null
schtasks /Delete /F /TN "OpenClaw Gateway"
# delete state/config/workspace (THIS DELETES EVERYTHING OpenClaw stored)
Remove-Item -Recurse -Force "$env:USERPROFILE\.openclaw" -ErrorAction SilentlyContinue
Remove-Item -Recurse -Force "$env:USERPROFILE\.openclaw-workspace" -ErrorAction SilentlyContinue # only if you used a custom dir
# remove the global CLI (pick what you used)
npm rm -g openclaw
# or: pnpm remove -g openclaw
Notes:
- If you used an OpenClaw profile (OPENCLAW_PROFILE), you may also have C:\Users\<you>\.openclaw-<profile> to delete, and a task named OpenClaw Gateway (<profile>).
WSL (run inside your WSL distro)
In WSL, it’s a completely separate install/state from Windows. Run:
openclaw uninstall --all --yes --non-interactive
Manual wipe (if needed):
openclaw gateway stop || true
openclaw gateway uninstall || true
rm -rf ~/.openclaw ~/.openclaw-*/ ~/.openclaw/workspace
npm rm -g openclaw # or pnpm remove -g openclaw
If you tell me how you installed on each side (install.sh / npm / git checkout) and whether you used profiles, I can tailor it so you don’t miss any leftover service/task.
How can I install it back?
Reinstall is straightforward — use the installer script (it installs Node if needed and runs onboarding).
Docs: https://docs.openclaw.ai/install
Option A (recommended on Windows): install inside WSL2
In your WSL terminal:
curl -fsSL https://openclaw.ai/install.sh | bash
Option B: install on native Windows (PowerShell)
In an elevated PowerShell (or normal PS if your npm global install doesn’t need admin):
iwr -useb https://openclaw.ai/install.ps1 | iex
After install (either way)
Run:
openclaw doctor
openclaw status
If your end goal is “SSH to other VPSes from OpenClaw”, after reinstall you’ll still want to set exec to run on the gateway host and allowlist ssh.exe (we can do that once it’s back up).
It's done, but openclaw now uses RAM and not VRAM
OpenClaw itself is just a Node.js “control plane” process — it will use normal system RAM, not GPU VRAM. GPU/VRAM only comes into play if you’re running a local model server (Ollama / vLLM / etc.) on that machine and OpenClaw is configured to call it.
So usually when someone says “after reinstall it uses RAM not VRAM”, it’s one of these:
- Your VPS doesn’t actually have a GPU → you’ll never see VRAM usage there.
- You do have a GPU, but after reinstall you’re now using a cloud provider model again → inference happens remotely, so your GPU stays idle.
- You’re using a local runner, but it’s running in CPU mode (driver/CUDA/WSL GPU passthrough issue).
Quick check
On the VPS, run:
openclaw models status
and tell me what provider it says you’re using for the active model (don’t paste tokens).
If you want VRAM usage
You need:
- a GPU on that VPS
- a GPU-backed local inference server running (that’s what consumes VRAM)
- OpenClaw configured to point at that server
If you tell me:
- does the VPS have an NVIDIA GPU? (yes/no)
- are you running Windows-native OpenClaw or WSL2 OpenClaw?
- what does openclaw models status show?
…I can point you to the exact provider doc for your setup.
It runs on WSL2 and it has a NVIDIA GPU
Cool — WSL2 + NVIDIA GPU is exactly the case where this is possible, but the key point remains:
OpenClaw will still use system RAM.
VRAM usage only happens in the model server you run locally (Ollama / vLLM / etc.). OpenClaw just sends HTTP requests to it.
So the question becomes: are you currently using a local GPU-backed provider, or did you fall back to a cloud provider after reinstall?
1) Verify the GPU is visible inside WSL2
In WSL:
nvidia-smi
If that doesn’t work, your WSL2 GPU passthrough/driver stack isn’t set up yet (Windows NVIDIA driver needs WSL support).
2) Check what model/provider OpenClaw is actually using
On the WSL2 box:
openclaw models status
If it shows OpenAI/Anthropic/etc, then inference is remote → no VRAM usage locally is expected.
3) If you want VRAM usage: point OpenClaw at a local model server
The most common local option is Ollama (runs inside WSL2 and can use CUDA if your setup is right).
OpenClaw docs for that:
https://docs.openclaw.ai/providers/ollama
High level steps:
- install/run Ollama in WSL2
- pull a model in Ollama
- set OLLAMA_API_KEY (any value) so OpenClaw enables the provider
- set your agent’s default model to ollama/<your-model-id>
Then you’ll see VRAM usage when the model is actively generating (watch nvidia-smi while you send a prompt).
If you paste the output of openclaw models status (redact anything sensitive), I can tell you exactly what OpenClaw is currently routing to and what to change.
It gets detected but it still uses RAM and not VRAM
If you mean “the OpenClaw gateway process is using RAM”: that’s normal and expected. The gateway is a Node.js service — it won’t ever “move into VRAM”.
VRAM is only used by the local model runtime (Ollama / vLLM / etc.) while it is generating tokens.
So the real question is: is your model runtime actually using the GPU, or is it running CPU-only / spilling to RAM?
1) Confirm you’re looking at the right thing
While you send a prompt (so it’s actively generating), run this in another WSL terminal:
nvidia-smi -l 1
- If you never see an ollama / python / similar process show up in the GPU process list, then no GPU inference is happening.
- If you do see a process and some VRAM usage, then everything is fine: OpenClaw still uses RAM, but inference is on GPU.
2) Confirm OpenClaw is actually using a local provider (not cloud)
Run:
openclaw models status
If it shows you’re using OpenAI/Anthropic/etc, then inference is remote → 0 VRAM usage locally.
If you want VRAM usage locally, openclaw models status should show a local provider (commonly ollama or vllm) and your active model should be something like ollama/<id> or vllm/<id>.
3) If you are using Ollama and still no VRAM
That’s then an Ollama/GPU setup issue (CPU-only install, CUDA/WSL driver mismatch, or the model is forced to run mostly on CPU / doesn’t fit VRAM).
At that point, paste:
- output of openclaw models status
- output of nvidia-smi (just the first screen)
…and I can tell you whether OpenClaw is still routing to cloud, or whether it’s local-but-CPU.
Wed Mar 4 02:42:40 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 590.57 Driver Version: 591.86 CUDA Version: 13.1 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3060 On | 00000000:01:00.0 On | N/A |
| 0% 47C P8 13W / 170W | 833MiB / 12288MiB | 11% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
That nvidia-smi output is the smoking gun:
- GPU is visible to WSL2 (good)
- no running GPU processes (so nothing in WSL is doing CUDA inference right now)
- the ~833MiB “used” is basically baseline/driver/display overhead — not model inference
So this isn’t “OpenClaw chose RAM over VRAM” — it’s either:
- OpenClaw is still using a cloud model (so your GPU will stay idle), or
- you intended to use a local model server (Ollama/vLLM) but it’s not running / not being used / running CPU-only, or
- your model server is running on the Windows side, and WSL’s nvidia-smi won’t show its processes (check Windows nvidia-smi in that case).
Step 1: confirm what OpenClaw is actually routing to
In WSL:
openclaw models status
If it shows OpenAI/Anthropic/etc as the active provider, that explains it: inference is remote → no VRAM use locally.
Step 2: if you want VRAM usage, you need a local GPU model server + point OpenClaw at it
Most people use Ollama for this. OpenClaw docs:
https://docs.openclaw.ai/providers/ollama
Key gotcha on reinstalls/daemons: setting OLLAMA_API_KEY=... in your interactive shell often doesn’t reach the Gateway service. Put it in one of these instead:
- ~/.openclaw/.env (WSL):
OLLAMA_API_KEY=ollama-local
- or config (~/.openclaw/openclaw.json):
{
  models: {
    providers: {
      ollama: { apiKey: "ollama-local", baseUrl: "http://127.0.0.1:11434", api: "ollama" }
    }
  },
  agents: {
    defaults: { model: { primary: "ollama/<your-model-id>" } }
  }
}
(Replace <your-model-id> with whatever your Ollama server exposes.)
Step 3: prove GPU inference is happening
While a local model is generating, run:
nvidia-smi -l 1
You should see a compute process show up (often ollama or python) and VRAM climb.
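If you'd rather compare numbers than eyeball the table, the same check can be sketched in Python. This assumes `nvidia-smi` is on the PATH and that its `--query-gpu` CSV output is available in your environment; the 1024 MiB threshold is an arbitrary assumption, not an Ollama or OpenClaw value.

```python
import subprocess

# CSV query mode of nvidia-smi: one line per GPU, values in MiB, no header/units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_line(line: str) -> tuple[int, int]:
    """Parse one CSV line such as '833, 12288' into (used_mib, total_mib)."""
    used, total = (int(part.strip()) for part in line.split(","))
    return used, total


def read_gpu_memory() -> list[tuple[int, int]]:
    """Run nvidia-smi once and return (used_mib, total_mib) for each GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_memory_line(line) for line in out.splitlines() if line.strip()]


def vram_climbed(idle_mib: int, busy_mib: int, threshold_mib: int = 1024) -> bool:
    """True if usage rose by at least threshold_mib between idle and generating."""
    return busy_mib - idle_mib >= threshold_mib
```

Call `read_gpu_memory()` once while idle and once mid-generation, then pass the two "used" values to `vram_climbed`. If it stays False, inference is probably not on the GPU.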
If you paste your openclaw models status output, I can tell you immediately whether you’re still on cloud routing or just missing the Ollama/vLLM wiring.
🦞 OpenClaw 2026.3.2 (85377a2)
If it's repetitive, I'll automate it; if it's hard, I'll bring jokes and a rollback plan.
Config : ~/.openclaw/openclaw.json
Agent dir : ~/.openclaw/agents/main/agent
Default : ollama/qwen3.5:9b
Fallbacks (0) : -
Image model : -
Image fallbacks (0): -
Aliases (0) : -
Configured models (0): all
Auth overview
Auth store : ~/.openclaw/agents/main/agent/auth-profiles.json
Shell env : off
Providers w/ OAuth/tokens (0): -
- ollama effective=env:ol...al | env=ol...al | source=env: OLLAMA_API_KEY | models.json=ol...al | source=models.json: ~/.openclaw/agents/main/agent/models.json
- vllm effective=profiles:~/.openclaw/agents/main/agent/auth-profiles.json | profiles=1 (oauth=0, token=0, api_key=1) | vllm:default=no...ed
OAuth/token status
- none
01:44:27 [agents/model-providers] Failed to discover vLLM models: TypeError: fetch failed
Your openclaw models status confirms OpenClaw is routing to Ollama (Default: ollama/...), so the OpenClaw side is fine.
Now look at what your nvidia-smi output actually says:
- GPU is visible (good)
- Processes: “No running processes found” → at the moment you ran it, nothing in WSL was doing GPU compute
Three important gotchas here:
1) Ollama/GPU only shows up while it’s actively generating
Run this in one WSL terminal:
nvidia-smi -l 1
Then in another terminal, trigger a long generation (either via ollama run ... or by sending a prompt to OpenClaw that takes a while). You should see a process appear and VRAM climb during generation.
If you only check when it’s idle, it’ll look like “no VRAM usage”.
2) You might be hitting Windows↔WSL split
If Ollama is actually running on the Windows side (and WSL is just connecting to 127.0.0.1:11434 via localhost forwarding), then WSL’s nvidia-smi may show no processes even though Windows is using the GPU.
To confirm where Ollama is running, inside WSL run:
ss -ltnp | grep 11434 || true
ps aux | grep -i ollama | head
- If you see an ollama process in WSL bound to :11434, it’s running in WSL.
- If you don’t see it, but OpenClaw still works, you’re probably talking to Windows Ollama.
3) Also: “RAM usage” doesn’t mean GPU isn’t used
Even with GPU acceleration, Ollama will still use a bunch of system RAM for caching/overhead. The real signal is: do you see a GPU process + rising VRAM while generating.
If you paste the output of those two commands (ss ... + ps ...), I can tell you immediately whether you should be checking Windows nvidia-smi instead, or whether Ollama-in-WSL is running CPU-only.
It is running in WSL, but before on windows it wasn't taking the RAM
Yep — if you moved from “native Windows” to WSL2, it’ll look like it suddenly “eats RAM”, even if GPU is working.
Three separate things are happening:
1) OpenClaw won’t use VRAM, and it will always use some RAM
The Gateway is a Node.js service. It’s not the heavy part.
The heavy RAM usage is almost always Ollama + the loaded model.
You can confirm in WSL:
ps aux --sort=-rss | head -n 15
You’ll likely see ollama near the top.
2) WSL2 memory behavior: Windows shows it as “vmmem” RAM
WSL2 is a VM. When Linux allocates memory (for Ollama/model cache), Windows shows that as RAM use, and WSL often doesn’t give it back quickly even after load drops.
If your goal is “keep WSL from ballooning”, set a cap + enable memory reclaim in %UserProfile%\.wslconfig (Windows side):
[wsl2]
# pick a safe cap for your VPS
memory=8GB
processors=4
[experimental]
autoMemoryReclaim=gradual
Then apply it:
wsl --shutdown
Microsoft docs: https://learn.microsoft.com/en-us/windows/wsl/wsl-config
3) Even with GPU, Ollama can still use a lot of RAM
GPU acceleration doesn’t mean “no RAM”: weights/caches/overhead still live in system RAM (and in WSL that shows up as VM RAM).
If you tell me roughly how much RAM it’s using and whether that’s acceptable, I can suggest the least painful knob to turn (WSL cap vs switching back to native Windows vs using a cloud provider).
Is there a way to run native Ollama on Windows and use it from Linux in WSL?
Yes — you can run Ollama natively on Windows and have OpenClaw (in WSL2) use it, but two things must be true:
- WSL must be able to reach the Windows Ollama port, and
- OpenClaw must point its Ollama baseUrl at that reachable address.
Step 1: Make Windows Ollama reachable from WSL
By default, Windows Ollama usually binds to 127.0.0.1:11434 (Windows-local only). WSL2 typically cannot reach Windows’ 127.0.0.1 unless you’re using newer “mirrored networking”.
Reliable approach (works on classic WSL2 NAT)
From WSL, the Windows host is usually the DNS “nameserver” IP:
WIN_HOST=$(awk '/nameserver/ {print $2; exit}' /etc/resolv.conf)
echo $WIN_HOST
curl "http://$WIN_HOST:11434/api/version"
- If that curl fails, Windows Ollama is not listening on an address WSL can reach.
- Fix is on the Windows side: make Ollama listen on an interface reachable from WSL (often by changing Ollama’s bind address). Ollama’s docs mention this via OLLAMA_HOST (be careful: binding to 0.0.0.0 can expose it unless firewall rules restrict it). Official FAQ: https://docs.ollama.com/faq
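The same host lookup can be sketched in Python, for example if you want to generate the base URL from a script. This is a minimal sketch under the same assumption as the awk one-liner (classic WSL2 NAT, where the first `nameserver` entry is the Windows host); the function names are hypothetical and the sample IP is made up.

```python
from typing import Optional


def windows_host_from_resolv(text: str) -> Optional[str]:
    """Return the first nameserver address found in resolv.conf-style text."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            return parts[1]
    return None


def ollama_base_url(host: str, port: int = 11434) -> str:
    """Build the Ollama base URL (no /v1 suffix) for the given host."""
    return f"http://{host}:{port}"


if __name__ == "__main__":
    sample = "# generated by WSL\nnameserver 172.20.0.1\n"  # made-up example address
    host = windows_host_from_resolv(sample)
    print(ollama_base_url(host))  # prints http://172.20.0.1:11434
```

On a real system you would pass the contents of /etc/resolv.conf instead of the sample string, and still confirm reachability with the curl check.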
If you’re on a recent WSL build (optional)
You can also try enabling WSL “mirrored networking” in %UserProfile%\.wslconfig and restarting WSL; that can make Windows localhost more seamless. Microsoft docs: https://learn.microsoft.com/en-us/windows/wsl/wsl-config
Step 2: Point OpenClaw (WSL) at the Windows Ollama URL
Once curl http://$WIN_HOST:11434/api/version works from WSL, configure OpenClaw’s Ollama provider baseUrl to http://<WIN_HOST>:11434 (no /v1).
OpenClaw Ollama docs: https://docs.openclaw.ai/providers/ollama
Important: once you set models.providers.ollama explicitly, OpenClaw stops auto-discovery and you’ll need to define the model(s) you want in config.
If you tell me what you get from this in WSL:
WIN_HOST=$(awk '/nameserver/ {print $2; exit}' /etc/resolv.conf)
curl -v "http://$WIN_HOST:11434/api/version"
…I can tell you whether you’re blocked at “Windows Ollama not reachable” vs “OpenClaw config”.
On WSL it loads RAM and it doesn't load VRAM, it looks like it's another model running on the same machine
What you’re describing (“it feels like another model on the same machine” + “WSL eats RAM and no VRAM”) is exactly what happens when you accidentally have two Ollama instances:
- one running in Windows
- one running in WSL
- and OpenClaw (running in WSL) is talking to the WSL one at http://127.0.0.1:11434 (WSL-local), not the Windows one.
Because WSL’s 127.0.0.1 is not Windows’ 127.0.0.1, both can exist and look “local”, but they’re different servers with different model caches and different GPU behavior.
1) Prove which Ollama OpenClaw is hitting (WSL vs Windows)
In WSL, run:
curl -s http://127.0.0.1:11434/api/tags | head
ps aux | grep -i ollama | head
On Windows (PowerShell), run:
(iwr -useb http://127.0.0.1:11434/api/tags).Content
If the model lists differ (or WSL shows an ollama process but Windows also has one), you’ve got two.
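To compare the two lists without reading raw JSON, here is a minimal sketch. It relies on Ollama's `/api/tags` response having a `models` array whose entries carry a `name` field; the helper names are hypothetical, and `example:1b` in the usage is a made-up model name.

```python
import json
from urllib.request import urlopen


def fetch_tags(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/api/tags and return the decoded JSON payload."""
    with urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def model_names(tags_payload: dict) -> set[str]:
    """Collect the model names from an /api/tags payload."""
    return {model["name"] for model in tags_payload.get("models", [])}


def compare_instances(wsl_tags: dict, windows_tags: dict) -> dict:
    """Show which models exist only in WSL, only in Windows, or in both."""
    wsl, win = model_names(wsl_tags), model_names(windows_tags)
    return {
        "only_wsl": sorted(wsl - win),
        "only_windows": sorted(win - wsl),
        "shared": sorted(wsl & win),
    }


if __name__ == "__main__":
    wsl = {"models": [{"name": "qwen3.5:9b"}]}
    windows = {"models": [{"name": "qwen3.5:9b"}, {"name": "example:1b"}]}
    print(compare_instances(wsl, windows))
```

Because each side's 127.0.0.1 is different, save the JSON from each side (or call `fetch_tags` with an address that is reachable from where the script runs) before comparing. Identical lists do not prove there is only one server, so the process check still matters.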
2) About “RAM vs VRAM”
Even when GPU is used, Ollama will still use a bunch of system RAM (model cache, KV cache, overhead). VRAM is additional, not a replacement.
But you should at least see a GPU process while generating. In WSL:
nvidia-smi -l 1
…and at the same time trigger a long response. If the process list stays empty during generation, that Ollama instance is CPU-only.
3) If your goal is “use Windows-native Ollama (GPU) and keep WSL RAM low”
That’s doable, but you need OpenClaw-in-WSL to target the Windows Ollama base URL (not WSL localhost). The clean way is:
- stop/disable the WSL Ollama so there’s no ambiguity
- make Windows Ollama reachable from WSL (via Windows-host IP / binding)
- set OpenClaw’s models.providers.ollama.baseUrl to that Windows host address
Before we change anything, paste the outputs of the 3 commands above (WSL tags + WSL process + Windows tags). That will tell us in 30 seconds whether you’re running two Ollamas and which one is CPU-only.
NAME ID SIZE PROCESSOR CONTEXT UNTIL
qwen3.5:9b 6488c96fa5fa 21 GB 54%/46% CPU/GPU 262144 4 minutes from now
That output actually explains everything:
- SIZE 21 GB on an RTX 3060 12GB
- CONTEXT 262144 (262k!)
- PROCESSOR 54%/46% CPU/GPU → Ollama is offloading a big chunk to CPU/RAM because it can’t keep it all in VRAM.
So it’s not “not using VRAM” — it is (46%). It’s just also forced to use a ton of RAM.
Why RAM is huge
That 262k context is insanely memory-hungry. Ollama’s own docs say context length directly increases memory requirements, and 256k context is basically meant for very large VRAM boxes.
Ollama context docs: https://docs.ollama.com/context-length
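To see why context dominates, here is a rough KV-cache estimate for a standard transformer: two tensors (keys and values) per layer, each sized by KV heads × head dimension × tokens. The layer and head counts below are hypothetical placeholders, not Qwen3.5's real architecture, and real runtimes add overhead or may quantize the cache, so treat the numbers as illustrative only.

```python
def kv_cache_bytes(context_tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Estimate KV-cache size: 2 (K and V) x layers x kv_heads x head_dim x tokens."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical architecture: 32 layers, 4 KV heads, head_dim 128, fp16 cache.
    for ctx in (8192, 262144):
        size = to_gib(kv_cache_bytes(ctx, n_layers=32, n_kv_heads=4, head_dim=128))
        print(f"{ctx:>7} tokens -> {size:.1f} GiB")
    # prints:
    #    8192 tokens -> 0.5 GiB
    #  262144 tokens -> 16.0 GiB
```

The cache grows linearly with context, so going from 8k to 262k multiplies it by 32 regardless of the exact architecture.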
How to make it mostly VRAM (and use far less RAM)
You need to lower the context length and/or use a smaller model/quantization so it fits in 12GB VRAM.
Start by lowering context (WSL example from Ollama docs):
# stop whatever is currently running ollama serve, then:
OLLAMA_CONTEXT_LENGTH=8192 ollama serve
Then run a prompt again and check:
ollama ps
You’re aiming for something like 100% GPU and a much smaller CONTEXT value.
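If you want to check the split from a script, the PROCESSOR cell can be parsed like this. It's a minimal sketch that assumes the cell looks like either `54%/46% CPU/GPU` (as in the output above) or a single value such as `100% GPU`; `parse_processor` is a hypothetical helper, not part of Ollama.

```python
import re


def parse_processor(value: str) -> tuple[int, int]:
    """Return (cpu_percent, gpu_percent) from an `ollama ps` PROCESSOR cell."""
    value = value.strip()
    split = re.fullmatch(r"(\d+)%/(\d+)%\s+CPU/GPU", value)
    if split:
        return int(split.group(1)), int(split.group(2))
    single = re.fullmatch(r"(\d+)%\s+(CPU|GPU)", value)
    if single:
        percent = int(single.group(1))
        return (percent, 0) if single.group(2) == "CPU" else (0, percent)
    raise ValueError(f"unrecognized PROCESSOR value: {value!r}")


if __name__ == "__main__":
    print(parse_processor("54%/46% CPU/GPU"))  # prints (54, 46)
    print(parse_processor("100% GPU"))         # prints (0, 100)
```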
If you truly need 262k context, you’ll basically need a much bigger GPU (or accept CPU/RAM offload).
If you tell me what RAM usage you’re seeing (and whether you actually need >8k/16k context), I can suggest a sane OLLAMA_CONTEXT_LENGTH target for your 3060.
OK, what is the maximum context I could configure with an RTX 3060 and 24GB RAM?
There isn’t a single fixed “max context” number for an RTX 3060 + 24GB RAM, because it depends heavily on:
- the model size / quantization (your ollama ps showed 21GB, which already almost fills your 24GB RAM by itself),
- how much of the model is on GPU vs CPU offload,
- and how much memory WSL + everything else is using.
But I can give a very practical answer:
Practical max (what’s likely to work well)
For an RTX 3060 (12GB VRAM):
- 4096 tokens: “always safe” baseline (this is also Ollama’s default for <24GiB VRAM)
- 8192 tokens: usually the sweet spot if you want a bit more room
- 16384 tokens: sometimes works, but often starts forcing CPU/RAM offload depending on the model
- 32768+ tokens: on a 12GB card, expect heavy offload and big RAM usage (usually not worth it unless you accept slowdown)
Given your specific case (a model showing 21GB), I’d start with 4096 or 8192. With only 24GB RAM total, 16k+ is very likely to push you into swap / OOM territory.
Ollama docs (matches this VRAM-based default guidance): https://docs.ollama.com/context-length
How to find your real max in 5 minutes
Increase until you see CPU offload or RAM pressure:
- Set and restart Ollama:
OLLAMA_CONTEXT_LENGTH=8192 ollama serve
- Run one longer prompt, then check:
ollama ps
free -h
nvidia-smi
- If it’s still stable and PROCESSOR is close to 100% GPU, try 16384.
If you see CPU% rising a lot (or RAM getting tight), back down.
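The step-up procedure above amounts to a simple search: try candidates from small to large and keep the last one that stayed stable. A minimal sketch, where `is_stable` is a hypothetical callback standing in for "restart Ollama at this context, run a prompt, and check ollama ps / free -h / nvidia-smi":

```python
from typing import Callable, Iterable, Optional


def largest_stable_context(candidates: Iterable[int],
                           is_stable: Callable[[int], bool]) -> Optional[int]:
    """Walk candidates in ascending order; return the last one that was stable."""
    best = None
    for ctx in sorted(candidates):
        if not is_stable(ctx):
            break  # larger contexts only need more memory, so stop here
        best = ctx
    return best


if __name__ == "__main__":
    # Stand-in check for illustration: pretend anything up to 16384 stays on GPU.
    fake_check = lambda ctx: ctx <= 16384
    print(largest_stable_context([4096, 8192, 16384, 32768], fake_check))  # prints 16384
```

In practice the stability check is manual. The point is just to stop at the first size that shows CPU offload or RAM pressure rather than testing every value.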
If you tell me what ollama ps shows at 8192 (especially the CPU/GPU split) I can tell you whether 16384 is realistic on your setup.
64k context window is okay?
On an RTX 3060 (12GB VRAM) + 24GB RAM, 64k context is generally not “okay” in the sense of “fast + stable”.
Per Ollama’s own guidance, <24GiB VRAM defaults to 4k context, and higher context lengths quickly blow up memory usage (KV cache) and force CPU/RAM offload: https://docs.ollama.com/context-length
You might be able to force 64k, but expect one or more of:
- heavy CPU offload (your ollama ps will show CPU% climbing)
- huge RAM usage / swapping
- slow responses or outright OOM crashes
With the model you showed earlier (~21GB), 64k is basically a non-starter on 24GB RAM.
If you want a sane setup on that box:
- try 8k first, then 16k if it stays mostly GPU
- for true 64k+ agent/coding workflows, it’s usually better to use a cloud model (or a GPU with a lot more VRAM).
How much VRAM is recommended on a Qwen3.5:9B?
It depends a lot on quantization and the context length you run with, but for a 9B model the “GPU-only (no CPU offload)” rule of thumb looks like this:
Rough VRAM targets for Qwen3.5:9B
- ~8GB VRAM: can work if you’re using a heavy quant (e.g. ~4-bit) and keeping context modest (4k–8k).
- ~12GB VRAM (your RTX 3060): often OK for 4–5 bit at 4k–8k context, but you may still see some CPU offload depending on the exact build/options.
- ~16GB VRAM: comfortable for higher quality quants (5–6 bit) and/or 8k–16k context.
- ~24GB VRAM: comfortable for 8-bit or “almost full” runs, and for larger contexts without falling back to CPU so quickly.
- 64k+ context: typically wants a lot more VRAM (or you accept heavy CPU/RAM offload).
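The tiers above follow from simple arithmetic: weight size is roughly parameters × bits per weight / 8. A minimal sketch, assuming a flat bits-per-weight figure (real quant formats carry some extra overhead, and the KV cache plus runtime buffers come on top):

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: billions of params x bits / 8 bits per byte."""
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    for bits in (4, 5, 8, 16):
        print(f"9B at {bits}-bit: about {weight_gb(9, bits)} GB of weights")
    # prints:
    # 9B at 4-bit: about 4.5 GB of weights
    # 9B at 5-bit: about 5.625 GB of weights
    # 9B at 8-bit: about 9.0 GB of weights
    # 9B at 16-bit: about 18.0 GB of weights
```

Whatever VRAM is left after the weights is what the context (KV cache) has to fit into, which is why a 12GB card handles 4-5 bit quants at modest context but not much more.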
How to tell on your machine
ollama ps is the truth. If it shows 100% GPU you “fit”. If it shows a split like CPU/GPU, you don’t.
In your earlier output you had SIZE ~21GB and a huge context, so on a 12GB card it’s expected you’ll see CPU offload. If you want that model to stay mostly on GPU on a 3060, you’ll need to lower context and/or use a smaller quant so the “SIZE” comes down.
If you paste ollama list (just the size column + model name), I can tell you whether you’re running a quant that can realistically fit in 12GB.
If I go with 128GB RAM and no VRAM, how will it run?
If you have 128GB system RAM but no GPU (no VRAM), the model will run CPU-only.
What that means in practice:
- It will run (RAM is enough to load big weights and big KV caches).
- It will be much slower than GPU, because the bottleneck becomes CPU compute and memory bandwidth, not memory capacity.
- You’ll also see latency spike hard as you increase context length (KV cache grows; attention gets expensive).
For something around 9B parameters:
- On a strong server CPU you might get “usable but not snappy” speeds.
- On a typical VPS CPU, it often feels sluggish (especially for agent/tool workflows).
So: 128GB RAM helps you not crash, but it doesn’t magically replace VRAM performance.
If your goal is OpenClaw doing tool-y things (exec/ssh, browsing, etc.) with decent responsiveness, CPU-only local inference is usually frustrating. The common practical setups are:
- keep local models for “cheap chat”, and
- use a cloud model (or a GPU box) when you need fast/strong agent behavior.