#How do I download an open-source model to a Hetzner VPS (8GB)?
If you mean “download + run an open‑source LLM on my Hetzner VPS (8GB RAM)”, the practical answer is: you can, but you’re limited to small, heavily‑quantized models, and it’ll be slow on CPU.
0) Sanity checks (so you don’t waste time)
On the VPS:
free -h
df -h
You generally want several GB free RAM and 10–30GB free disk (models are big files).
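The two checks above can be folded into one small script. This is a minimal sketch, assuming a Linux box where /proc/meminfo exists; the function name mem_available_gib is a hypothetical helper, not part of any tool mentioned here.

```shell
#!/bin/sh
# Print MemAvailable in GiB from /proc/meminfo-style text on stdin.
# (mem_available_gib is a hypothetical helper name.)
mem_available_gib() {
  awk '/^MemAvailable:/ { printf "%.1f\n", $2 / 1048576 }'
}

# Only read the real file if it exists (Linux).
if [ -r /proc/meminfo ]; then
  echo "Available RAM: $(mem_available_gib < /proc/meminfo) GiB"
fi

# Free space on the root filesystem, human-readable (-P keeps one line per filesystem).
df -hP / | awk 'NR == 2 { print "Free disk on /: " $4 }'
```

The "available" figure is the one that matters for model loading, since it accounts for memory the kernel can reclaim from buff/cache.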
Option A (easiest): Ollama (downloads models for you)
On Ubuntu/Debian:
curl -fsSL https://ollama.com/install.sh | sh
Then pull a model:
ollama pull <model_name>
ollama run <model_name>
I’m not going to guess model names in here (they change constantly), but for 8GB you usually want something in the “small (a few billion params) / 4‑bit quantized” range. Ask in #models for a current recommendation that fits 8GB.
Option B (manual): download from Hugging Face (GGUF) for llama.cpp
Install git-lfs:
sudo apt-get update
sudo apt-get install -y git-lfs
git lfs install
Then clone a model repo (example pattern):
git clone https://huggingface.co/<org>/<repo>
(You’ll want a GGUF file if you plan to run with llama.cpp.)
If your goal is OpenClaw specifically
OpenClaw can use local models if they expose an API (often OpenAI-compatible). Typical flow is:
- Run Ollama on the VPS
- Point OpenClaw at http://127.0.0.1:11434/v1 (OpenAI-compatible endpoint)
Docs to start: https://docs.openclaw.ai/concepts/models
For “what model fits 8GB”, best place is #models (people have up-to-date picks).
If you tell me your VPS OS (Ubuntu 22.04? 24.04?) and whether you want Ollama or llama.cpp, I’ll give you the exact commands for that path.
thoma@snapshot-362436766-ubuntu-8gb-nbg1-1:~$ free -h
df -h
               total        used        free      shared  buff/cache   available
Mem:           7.6Gi       2.6Gi       2.2Gi       5.0Mi       3.0Gi       4.9Gi
Swap:             0B          0B          0B
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           776M  1.1M  775M   1% /run
/dev/sda1        75G   14G   59G  19% /
tmpfs           3.8G     0  3.8G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sda15      253M  146K  252M   1% /boot/efi
tmpfs           776M   12K  776M   1% /run/user/0
tmpfs           776M   12K  776M   1% /run/user/1000
Nice, disk is plenty (59GB free). RAM is the limiting factor: you’ve got ~4.9GB available and no swap, so any model whose weights plus context need more than that will get OOM-killed. You’ll want a small model, and ideally swap as well so loading and running it doesn’t crash.
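To judge whether a model is "small" enough, a back-of-the-envelope estimate of the weight size helps. This is a rough sketch: it assumes weights take exactly parameters × bits / 8 bytes, ignores the KV cache and runtime overhead, and uses a hypothetical helper name (approx_weights_gib). Real quantized files are usually somewhat larger, so treat the result as a lower bound.

```shell
#!/bin/sh
# Rough size of a model's weights in GiB: parameters * bits / 8 bytes.
# Excludes KV cache and runtime overhead, so this is a lower bound.
# (approx_weights_gib is a hypothetical helper name.)
approx_weights_gib() {
  # $1 = parameters in billions, $2 = bits per weight
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * 1e9 * b / 8 / 1073741824 }'
}

approx_weights_gib 3 4    # 3B parameters at 4-bit
approx_weights_gib 7 4    # 7B parameters at 4-bit
```

By this estimate a 3B model at 4-bit needs about 1.4 GiB for weights and a 7B model about 3.3 GiB, which is why the few-billion-parameter range is the comfortable one with ~4.9GB available.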
Recommended: Ollama (simplest “download and run”)
1) Install
curl -fsSL https://ollama.com/install.sh | sh
2) Download (“pull”) a model
Pick something small + 4-bit quantized (8GB VPS-friendly) from Ollama’s library, then:
ollama pull <model_name>
3) Run it
ollama run <model_name>
4) Quick API test (optional)
curl http://127.0.0.1:11434/api/generate \
-d '{"model":"<model_name>","prompt":"Say hello in one sentence."}'
Strongly suggested on your box: add swap (prevents random crashes)
This won’t make it fast, but it helps avoid OOM kills.
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
free -h
To make it permanent:
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
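Running the tee -a line twice leaves a duplicate entry in /etc/fstab. One way to avoid that is to append only when the exact line is missing. This is a minimal sketch with a hypothetical helper name (append_once); the demo writes to a scratch file, and using it on the real /etc/fstab would require running it as root.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already there.
# (append_once is a hypothetical helper name.)
append_once() {
  # $1 = line, $2 = file
  grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Demo on a scratch file so nothing under /etc is touched.
demo=$(mktemp)
append_once '/swapfile none swap sw 0 0' "$demo"
append_once '/swapfile none swap sw 0 0' "$demo"   # second call is a no-op
cat "$demo"
rm -f "$demo"
```

grep -x matches whole lines only and -F treats the entry as a fixed string, so a commented-out or slightly different swap line does not count as a match.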
If your goal is using it with OpenClaw
Once Ollama runs, you typically connect via its local HTTP endpoint (127.0.0.1:11434). OpenClaw model wiring depends on your config, so I’ll point you at the doc first:
https://docs.openclaw.ai/concepts/models
Tell me: are you trying to run the model directly in a terminal, or use it from OpenClaw? And which install method do you want: Ollama or llama.cpp?
sudo swapon /swapfile
free -h
Setting up swapspace version 1, size = 8 GiB (8589930496 bytes)
no label, UUID=5ba18755-37b7-4c68-8aab-2cc160c6afed
               total        used        free      shared  buff/cache   available
Mem:           7.6Gi       5.2Gi       391Mi       5.0Mi       2.3Gi       2.4Gi
Swap:          8.0Gi          0B       8.0Gi
thoma@snapshot-362436766-ubuntu-8gb-nbg1-1:~$
thoma@snapshot-362436766-ubuntu-8gb-nbg1-1:~$ echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
/swapfile none swap sw 0 0
Good — swap is enabled and active now (Swap: 8.0Gi). 👍
Two quick follow-ups, then you can download/run a model:
1) Make sure you didn’t add /swapfile twice
(Not fatal, just messy.)
grep swapfile /etc/fstab
swapon --show
2) Download + run an open-source model (easiest: Ollama)
Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
Pick a small model (8GB VPS friendly) and pull it
Go to https://ollama.com/library and choose something in the “small / a few billion params / 4-bit” range, then:
ollama pull <model_name>
ollama run <model_name>
Quick test (optional)
curl http://127.0.0.1:11434/api/generate \
-d '{"model":"<model_name>","prompt":"Write one sentence about Hetzner."}'
If you tell me what you’re trying to do next:
- just run it in terminal, or
- connect it to OpenClaw
…I’ll give you the exact next steps for that path.
thoma@snapshot-362436766-ubuntu-8gb-nbg1-1:~$ grep swapfile /etc/fstab
swapon --show
/swapfile none swap sw 0 0
NAME      TYPE SIZE USED PRIO
/swapfile file   8G   0B   -2
Perfect — swap is set up correctly and active (only one /swapfile entry, and swapon sees it).
Next step: actually install a runner and pull a model. Easiest is Ollama.
Install Ollama (Ubuntu)
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl status ollama --no-pager
Download (“pull”) a model
Browse https://ollama.com/library and pick something small (on 8GB RAM, look for tags like :1b / :3b if available), then:
ollama pull <model_name>
ollama run <model_name>
Quick sanity test via HTTP
curl http://127.0.0.1:11434/api/generate \
-d '{"model":"<model_name>","prompt":"Say hi in one short sentence."}'
If you want to use it from your laptop / OpenClaw
Don’t expose port 11434 to the internet. Use an SSH tunnel instead:
ssh -L 11434:127.0.0.1:11434 thoma@<your_vps_ip>
Then on your laptop, http://127.0.0.1:11434 reaches the VPS Ollama safely.
If you paste what you get from ollama pull ... (success or error), I’ll help you tune it for the 8GB box.
{"model":"phi4-mini","created_at":"2026-03-07T09:45:42.439983189Z","response":"Hi","done":false}
{"model":"phi4-mini","created_at":"2026-03-07T09:45:42.537943854Z","response":" there","done":false}
{"model":"phi4-mini","created_at":"2026-03-07T09:45:42.613845609Z","response":"!","done":false}
{"model":"phi4-mini","created_at":"2026-03-07T09:45:42.68662278Z","response":"","done":true,"done_reason":"stop","context":[200021,62316,5911,306,1001,4022,21872,13,200020,200019,12194,1354,0],"total_duration":807897845,"load_duration":223866447,"prompt_eval_count":10,"prompt_eval_duration":333307632,"eval_count":4,"eval_duration":245668885}
thoma@snapshot-362436766-ubuntu-8gb-nbg1-1:~$
Yep — that output means it’s working. You successfully downloaded the model and generated a response via Ollama’s HTTP API (the streamed done:false chunks ending with done:true is exactly what you want).
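The timing fields in the final chunk also give a rough throughput figure. This sketch assumes, as Ollama's API documentation describes, that the *_duration fields are in nanoseconds; tokens_per_sec is a hypothetical helper name.

```shell
#!/bin/sh
# Tokens per second from a token count and a duration in nanoseconds.
# (tokens_per_sec is a hypothetical helper name.)
tokens_per_sec() {
  # $1 = token count, $2 = duration in nanoseconds
  awk -v n="$1" -v ns="$2" 'BEGIN { printf "%.1f\n", n / (ns / 1e9) }'
}

# Values copied from the final chunk above.
tokens_per_sec 4 245668885     # eval_count / eval_duration (generation)
tokens_per_sec 10 333307632    # prompt_eval_count / prompt_eval_duration
```

With only four generated tokens the number is noisy, so a longer prompt gives a more representative figure for this 8GB box.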
From here you can:
Use it interactively on the VPS
ollama run phi4-mini
See what you’ve downloaded
ollama list
Use it from another machine (recommended: SSH tunnel, don’t expose the port)
On your laptop:
ssh -L 11434:127.0.0.1:11434 thoma@<your_vps_ip>
Then your laptop can call:
- http://127.0.0.1:11434/api/generate
- Ollama’s OpenAI-compatible endpoint at http://127.0.0.1:11434/v1
I have a Tailscale IP
If you have a Tailscale IP, you’ve got two good (safe) ways to reach Ollama from your other devices:
Option A (safest): keep Ollama on localhost, use an SSH tunnel over Tailscale
From your laptop (or wherever OpenClaw runs):
ssh -L 11434:127.0.0.1:11434 thoma@<your_tailscale_ip>
Then on your laptop you can call:
- http://127.0.0.1:11434/api/generate
- http://127.0.0.1:11434/v1/... (OpenAI-compatible)
Option B (nice + still safe): bind Ollama to the Tailscale IP only (not public internet)
On the VPS, get the Tailscale IP:
tailscale ip -4
Then set Ollama to listen on that IP via systemd (this is from Ollama’s official docs):
sudo systemctl edit ollama.service
Add:
[Service]
Environment="OLLAMA_HOST=<your_tailscale_ip>:11434"
Restart:
sudo systemctl daemon-reload
sudo systemctl restart ollama
ss -ltnp | grep 11434
Now from another device on your tailnet:
curl http://<your_tailscale_ip>:11434/api/generate \
-d '{"model":"phi4-mini","prompt":"Hello from Tailscale"}'
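For reference, the drop-in from Option B can also be generated from the IP rather than typed into the editor. This is a sketch only: ollama_dropin is a hypothetical helper name, and 100.64.0.1 is a placeholder address standing in for whatever tailscale ip -4 prints.

```shell
#!/bin/sh
# Print the drop-in contents that Option B adds via `systemctl edit`.
# (ollama_dropin is a hypothetical helper name.)
ollama_dropin() {
  # $1 = IP address to bind, e.g. the output of `tailscale ip -4`
  printf '[Service]\nEnvironment="OLLAMA_HOST=%s:11434"\n' "$1"
}

# 100.64.0.1 is a placeholder, not a real tailnet address.
ollama_dropin 100.64.0.1
```

systemctl edit stores such overrides under /etc/systemd/system/ollama.service.d/, so the printed text should match what ends up there after the manual edit.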
If you tell me where OpenClaw is running (same VPS vs your laptop), I’ll tell you which option is simpler for that setup.
What about this: ssh -N -L 11434:127.0.0.1:11434 -o ServerAliveInterval=30 -o ExitOnForwardFailure=yes thoma@<tailscale_ip>
Yes — that command is exactly the “keep Ollama bound to localhost, tunnel over Tailscale” approach, and it’s a good one.
ssh -N \
-L 11434:127.0.0.1:11434 \
-o ServerAliveInterval=30 \
-o ExitOnForwardFailure=yes \
thoma@<tailscale_ip>
What it does:
- -L 11434:127.0.0.1:11434 makes your local http://127.0.0.1:11434 forward to the VPS’s localhost Ollama.
- -N = don’t run a remote shell/command (tunnel only)
- ServerAliveInterval=30 keeps the tunnel from silently dying
- ExitOnForwardFailure=yes fails fast if the port forward can’t be set up
After it’s running, on your local machine:
curl http://127.0.0.1:11434/api/generate \
-d '{"model":"phi4-mini","prompt":"hello"}'
Two small tips:
- If 11434 is already used on your laptop, pick another local port: -L 11435:127.0.0.1:11434 and then use http://127.0.0.1:11435.
- If you want it to run in the background, add -f (but only after you’ve verified it connects cleanly).
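The local-port tip can be captured in a small wrapper that prints the tunnel command instead of running it, which makes it easy to check before connecting. This is a sketch; tunnel_cmd is a hypothetical helper name and the host is left as the same placeholder used above.

```shell
#!/bin/sh
# Print (not run) the tunnel command for a given host and local port.
# (tunnel_cmd is a hypothetical helper name.)
tunnel_cmd() {
  # $1 = user@host, $2 = local port (defaults to 11434)
  printf 'ssh -N -L %s:127.0.0.1:11434 -o ServerAliveInterval=30 -o ExitOnForwardFailure=yes %s\n' \
    "${2:-11434}" "$1"
}

# Use local port 11435 because 11434 is assumed to be taken on the laptop.
tunnel_cmd 'thoma@<tailscale_ip>' 11435
```

Only the left-hand port changes; the right-hand 127.0.0.1:11434 is always Ollama's address as seen from the VPS.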
Before I do this, let me know the methods for changing from the Claude model to an open-source model when I'm using OpenClaw.
To switch OpenClaw from Claude to an open‑source/local model (like the one you’re running in Ollama), there are basically 3 methods—pick the one that matches how permanent you want the change to be:
Method 1: Switch just for this chat (Discord)
Use the in-chat model picker:
- /model → pick an Ollama model for this session
If you ever get “Model … is not allowed”, it means you have an allowlist set (agents.defaults.models) and need to add the Ollama model there (or remove the allowlist). Docs: https://docs.openclaw.ai/concepts/models
Method 2: Make it the default (recommended)
Set the gateway’s default model to an Ollama model in config:
agents.defaults.model.primary: "ollama/<your_model>"
Ollama provider docs (important): https://docs.openclaw.ai/providers/ollama
Important gotcha (this corrects a common mistake)
For OpenClaw + tools, do not point OpenClaw at Ollama’s OpenAI-compatible /v1 URL.
Use the native Ollama base URL:
baseUrl: "http://127.0.0.1:11434" (no /v1)
(OpenClaw uses Ollama’s /api/chat for reliable tool calling.)
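If a /v1 URL has already been copied into a config or env file, stripping the suffix is mechanical. This is a minimal sketch with a hypothetical helper name (native_base_url); it is plain string handling, not an OpenClaw or Ollama command.

```shell
#!/bin/sh
# Strip a trailing "/v1" (with or without a final slash) from a base URL.
# (native_base_url is a hypothetical helper name, not an OpenClaw command.)
native_base_url() {
  printf '%s\n' "$1" | sed -e 's:/*$::' -e 's:/v1$::'
}

native_base_url 'http://127.0.0.1:11434/v1'
native_base_url 'http://127.0.0.1:11434'
```

Both calls should end up at the same native base URL, which is the form the gotcha above asks for.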
Method 3: Use CLI helpers (no hand-editing)
On the machine running the OpenClaw gateway:
openclaw models list
openclaw models set ollama/<your_model>
(See also: https://docs.openclaw.ai/concepts/models)
Where should OpenClaw run relative to Ollama?
- If OpenClaw runs on the same VPS as Ollama: easiest, keep Ollama on 127.0.0.1:11434.
- If OpenClaw runs on your laptop but Ollama is on the VPS: use your SSH tunnel so Ollama still appears at 127.0.0.1:11434 from OpenClaw’s point of view.
One question so I can give the exact steps: is your OpenClaw gateway running on the VPS, or on your local machine?
OpenClaw gateway running on the VPS
Cool — if the OpenClaw gateway is running on the same VPS as Ollama, switching to an open‑source model is straightforward.
The “clean” way (keeps tool-calling working)
OpenClaw has a native Ollama provider. For that you want Ollama’s base URL like:
http://127.0.0.1:11434 (no /v1)
/v1 is the OpenAI-compatible mode and OpenClaw’s docs explicitly warn it can break tool calling / cause raw tool JSON output.
Docs: https://docs.openclaw.ai/providers/ollama
Step 1) Enable Ollama provider for OpenClaw
Easiest is to set an env var (Ollama doesn’t need a real key; any string works). Put this in OpenClaw’s global env file:
mkdir -p ~/.openclaw
echo 'OLLAMA_API_KEY=ollama-local' >> ~/.openclaw/.env
(Env var loading rules: https://docs.openclaw.ai/help/environment)
Then restart the gateway:
openclaw gateway restart
Step 2) Confirm OpenClaw sees local Ollama models
openclaw models list --local
You should see models with the ollama/... provider prefix.
Step 3) Switch from Claude → Ollama
Temporary (just this Discord chat)
Use:
- /model and pick an ollama/... model
Make it default
Use the CLI helper: