How do I download an open-source model to a Hetzner VPS (8GB)? | Friends of the Crustacean 🦞🤝

royal basin Mar 7, 2026, 9:28 AM

How do I download an open-source model to a Hetzner VPS?

oblique capeBOT Mar 7, 2026, 9:28 AM

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

bold narwhalBOT Mar 7, 2026, 9:28 AM

If you mean “download + run an open‑source LLM on my Hetzner VPS (8GB RAM)”, the practical answer is: you can, but you’re limited to small, heavily‑quantized models, and it’ll be slow on CPU.

0) Sanity checks (so you don’t waste time)

On the VPS:

free -h
df -h

You generally want several GB of free RAM and 10–30GB of free disk (model files are large).
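
If you want the two numbers that matter without reading the full tables, something like this works. It is only a sketch: the helper names mem_available_gib and disk_free_gib are made up for illustration, and it assumes a Linux box with /proc/meminfo.

```shell
# Print MemAvailable in GiB, read from a meminfo-style file (default: /proc/meminfo).
mem_available_gib() {
  LC_ALL=C awk '/^MemAvailable:/ { printf "%.1f\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

# Print free space in GiB for the filesystem that holds a path (default: /).
disk_free_gib() {
  LC_ALL=C df -Pk "${1:-/}" | awk 'NR == 2 { printf "%.1f\n", $4 / 1048576 }'
}

echo "RAM available: $(mem_available_gib) GiB"
echo "Disk free:     $(disk_free_gib) GiB"
```

MemAvailable is the same figure that free -h shows in its "available" column, which is the one to compare against a model's size.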

Option A (easiest): Ollama (downloads models for you)

On Ubuntu/Debian:

curl -fsSL https://ollama.com/install.sh | sh

Then pull a model:

ollama pull <model_name>
ollama run <model_name>

I’m not going to guess model names in here (they change constantly), but for 8GB you usually want something in the “small (a few billion params) / 4‑bit quantized” range. Ask in #models for a current recommendation that fits 8GB.
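
To see why "a few billion params at 4-bit" is the right range, a rough rule is: weight size ≈ parameters × bits per weight / 8. The sketch below does that arithmetic; weights_gb is a hypothetical helper, and the result is a lower bound because it ignores the context cache, runtime overhead, and the extra metadata that real 4-bit formats store.

```shell
# Rough lower bound on weight size in GB: parameters (in billions) * bits per weight / 8.
weights_gb() {
  LC_ALL=C awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

weights_gb 3 4    # 3B parameters at 4-bit  -> prints 1.5
weights_gb 7 4    # 7B parameters at 4-bit  -> prints 3.5
weights_gb 7 16   # 7B parameters at 16-bit -> prints 14.0
```

So a 7B model at 4-bit already takes about half of an 8GB machine before the OS and the context cache are counted, and an unquantized 7B model does not fit at all.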

Option B (manual): download from Hugging Face (GGUF) for llama.cpp

Install git-lfs:

sudo apt-get update
sudo apt-get install -y git-lfs
git lfs install

Then clone a model repo (example pattern):

git clone https://huggingface.co/<org>/<repo>

(You’ll want a GGUF file if you plan to run with llama.cpp.)

If your goal is OpenClaw specifically

OpenClaw can use local models if they expose an API (often OpenAI-compatible). Typical flow is:

Run Ollama on the VPS
Point OpenClaw at http://127.0.0.1:11434 (Ollama’s native API; the OpenAI-compatible /v1 endpoint can break tool calling in OpenClaw)

Docs to start: https://docs.openclaw.ai/concepts/models
For “what model fits 8GB”, best place is #models (people have up-to-date picks).

If you tell me your VPS OS (Ubuntu 22.04? 24.04?) and whether you want Ollama or llama.cpp, I’ll give you the exact commands for that path.

royal basin Mar 7, 2026, 9:29 AM

thoma@snapshot-362436766-ubuntu-8gb-nbg1-1:~$ free -h
df -h
               total        used        free      shared  buff/cache   available
Mem:           7.6Gi       2.6Gi       2.2Gi       5.0Mi       3.0Gi       4.9Gi
Swap:             0B          0B          0B
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           776M  1.1M  775M   1% /run
/dev/sda1        75G   14G   59G  19% /
tmpfs           3.8G     0  3.8G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sda15      253M  146K  252M   1% /boot/efi
tmpfs           776M   12K  776M   1% /run/user/0
tmpfs           776M   12K  776M   1% /run/user/1000

bold narwhalBOT Mar 7, 2026, 9:29 AM

Nice, disk is plenty (59GB free). RAM is the limiting factor: you’ve got ~4.9GB available and no swap, so anything “medium” will OOM. You’ll want (a) a small model and, ideally, (b) some swap so runs don’t get OOM-killed.

Recommended: Ollama (simplest “download and run”)

1) Install

curl -fsSL https://ollama.com/install.sh | sh

2) Download (“pull”) a model

Pick something small + 4-bit quantized (8GB VPS-friendly) from Ollama’s library, then:

ollama pull <model_name>

3) Run it

ollama run <model_name>

4) Quick API test (optional)

curl http://127.0.0.1:11434/api/generate \
  -d '{"model":"<model_name>","prompt":"Say hello in one sentence."}'
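
By default /api/generate streams its answer as one JSON object per line. For a quick test it can be easier to read a single object, which Ollama returns when the request sets "stream": false. A small sketch: generate_payload is a made-up helper name, and it only escapes backslashes and double quotes, so it assumes a single-line prompt.

```shell
# Build a non-streaming /api/generate request body for a model and a one-line prompt.
generate_payload() {
  model=$1
  # Escape backslashes and double quotes so the prompt stays valid inside a JSON string.
  prompt=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"model":"%s","prompt":"%s","stream":false}\n' "$model" "$prompt"
}

# Example (needs a running Ollama and a pulled model):
#   curl -s http://127.0.0.1:11434/api/generate \
#     -d "$(generate_payload '<model_name>' 'Say hello in one sentence.')"
```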

Strongly suggested on your box: add swap (prevents random crashes)

This won’t make it fast, but it helps avoid OOM kills.

sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile

sudo mkswap /swapfile
sudo swapon /swapfile
free -h

To make it permanent:

echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
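
Running that echo line twice leaves a duplicate entry in /etc/fstab. One way to make the step safe to repeat is to append only when the exact line is missing. A sketch, with add_line_once as a hypothetical helper name; it has to run as root to modify /etc/fstab.

```shell
# Append a line to a file only if that exact line is not already present.
add_line_once() {
  file=$1
  line=$2
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (as root): add_line_once /etc/fstab '/swapfile none swap sw 0 0'
```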

If your goal is using it with OpenClaw

Once Ollama runs, you typically connect via its local HTTP endpoint (127.0.0.1:11434). OpenClaw model wiring depends on your config, so I’ll point you at the doc first:
https://docs.openclaw.ai/concepts/models

Tell me: are you trying to run the model directly in a terminal, or use it from OpenClaw? And which install method do you want: Ollama or llama.cpp?

royal basin Mar 7, 2026, 9:38 AM

sudo swapon /swapfile
free -h
Setting up swapspace version 1, size = 8 GiB (8589930496 bytes)
no label, UUID=5ba18755-37b7-4c68-8aab-2cc160c6afed
               total        used        free      shared  buff/cache   available
Mem:           7.6Gi       5.2Gi       391Mi       5.0Mi       2.3Gi       2.4Gi
Swap:          8.0Gi          0B       8.0Gi
thoma@snapshot-362436766-ubuntu-8gb-nbg1-1:~$

thoma@snapshot-362436766-ubuntu-8gb-nbg1-1:~$ echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
/swapfile none swap sw 0 0

bold narwhalBOT Mar 7, 2026, 9:39 AM

Good — swap is enabled and active now (Swap: 8.0Gi). 👍

Two quick follow-ups, then you can download/run a model:

1) Make sure you didn’t add `/swapfile` twice

(Not fatal, just messy.)

grep swapfile /etc/fstab
swapon --show

2) Download + run an open-source model (easiest: Ollama)

Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

Pick a small model (8GB VPS friendly) and pull it

Go to https://ollama.com/library and choose something in the “small / a few billion params / 4-bit” range, then:

ollama pull <model_name>
ollama run <model_name>

Quick test (optional)

curl http://127.0.0.1:11434/api/generate \
  -d '{"model":"<model_name>","prompt":"Write one sentence about Hetzner."}'

If you tell me what you’re trying to do next:

just run it in terminal, or
connect it to OpenClaw
…I’ll give you the exact next steps for that path.

royal basin Mar 7, 2026, 9:39 AM

thoma@snapshot-362436766-ubuntu-8gb-nbg1-1:~$ grep swapfile /etc/fstab
swapon --show
/swapfile none swap sw 0 0
NAME      TYPE SIZE USED PRIO
/swapfile file   8G   0B   -2

bold narwhalBOT Mar 7, 2026, 9:40 AM

Perfect — swap is set up correctly and active (only one /swapfile entry, and swapon sees it).

Next step: actually install a runner and pull a model. Easiest is Ollama.

Install Ollama (Ubuntu)

curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl status ollama --no-pager

Download (“pull”) a model

Browse https://ollama.com/library and pick something small (on 8GB RAM, look for tags like :1b / :3b if available), then:

ollama pull <model_name>
ollama run <model_name>

Quick sanity test via HTTP

curl http://127.0.0.1:11434/api/generate \
  -d '{"model":"<model_name>","prompt":"Say hi in one short sentence."}'

If you want to use it from your laptop / OpenClaw

Don’t expose port 11434 to the internet. Use an SSH tunnel instead:

ssh -L 11434:127.0.0.1:11434 thoma@<your_vps_ip>

Then on your laptop, http://127.0.0.1:11434 reaches the VPS Ollama safely.

If you paste what you get from ollama pull ... (success or error), I’ll help you tune it for the 8GB box.

royal basin Mar 7, 2026, 9:46 AM

{"model":"phi4-mini","created_at":"2026-03-07T09:45:42.439983189Z","response":"Hi","done":false}
{"model":"phi4-mini","created_at":"2026-03-07T09:45:42.537943854Z","response":" there","done":false}
{"model":"phi4-mini","created_at":"2026-03-07T09:45:42.613845609Z","response":"!","done":false}
{"model":"phi4-mini","created_at":"2026-03-07T09:45:42.68662278Z","response":"","done":true,"done_reason":"stop","context":[200021,62316,5911,306,1001,4022,21872,13,200020,200019,12194,1354,0],"total_duration":807897845,"load_duration":223866447,"prompt_eval_count":10,"prompt_eval_duration":333307632,"eval_count":4,"eval_duration":245668885}
thoma@snapshot-362436766-ubuntu-8gb-nbg1-1:~$

bold narwhalBOT Mar 7, 2026, 9:46 AM

Yep — that output means it’s working. You successfully downloaded the model and generated a response via Ollama’s HTTP API (the streamed done:false chunks ending with done:true is exactly what you want).
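
Each line of that output is one JSON object. The "response" pieces join up into the reply ("Hi there!"), and the timing fields on the final line give a rough speed figure: eval_count tokens generated over eval_duration nanoseconds. A minimal sketch using only awk; tokens_per_second is a hypothetical helper, and the pattern matching assumes the compact JSON shown above (a proper JSON tool would be more robust).

```shell
# Read Ollama's streamed JSON lines on stdin and print generation speed in tokens/s.
# Uses the last eval_count / eval_duration pair seen; eval_duration is in nanoseconds.
# The leading quote in each pattern keeps prompt_eval_* fields from matching.
tokens_per_second() {
  LC_ALL=C awk '
    match($0, /"eval_count":[0-9]+/)    { count = substr($0, RSTART + 13, RLENGTH - 13) + 0 }
    match($0, /"eval_duration":[0-9]+/) { nanos = substr($0, RSTART + 16, RLENGTH - 16) + 0 }
    END { if (nanos > 0) printf "%.1f\n", count / (nanos / 1e9) }
  '
}

# Final line from your run, trimmed to the timing fields (prints 16.3).
echo '{"model":"phi4-mini","done":true,"prompt_eval_count":10,"eval_count":4,"eval_duration":245668885}' | tokens_per_second
```

A four-token reply is too short to say much, so treat that figure as indicative and re-measure with a longer answer.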

From here you can:

Use it interactively on the VPS

ollama run phi4-mini

See what you’ve downloaded

ollama list

Use it from another machine (recommended: SSH tunnel, don’t expose the port)

On your laptop:

ssh -L 11434:127.0.0.1:11434 thoma@<your_vps_ip>

Then your laptop can call:

http://127.0.0.1:11434/api/generate
and Ollama’s OpenAI-compatible endpoint at http://127.0.0.1:11434/v1

If this solved your original question, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow indexes it for the next person.

royal basin Mar 7, 2026, 10:03 AM

I have a Tailscale IP.

bold narwhalBOT Mar 7, 2026, 10:04 AM

If you have a Tailscale IP, you’ve got two good (safe) ways to reach Ollama from your other devices:

Option A (safest): keep Ollama on localhost, use an SSH tunnel over Tailscale

From your laptop (or wherever OpenClaw runs):

ssh -L 11434:127.0.0.1:11434 thoma@<your_tailscale_ip>

Then on your laptop you can call:

http://127.0.0.1:11434/api/generate
http://127.0.0.1:11434/v1/... (OpenAI-compatible)

Option B (nice + still safe): bind Ollama to the Tailscale IP only (not public internet)

On the VPS, get the Tailscale IP:

tailscale ip -4

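Then set Ollama to listen on that IP via a systemd override (this is from Ollama’s official docs). Note that Ollama then no longer listens on 127.0.0.1, so anything on the VPS itself (including the ollama CLI) has to use the Tailscale address as well: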

sudo systemctl edit ollama.service

Add:

[Service]
Environment="OLLAMA_HOST=<your_tailscale_ip>:11434"

Restart:

sudo systemctl daemon-reload
sudo systemctl restart ollama
ss -ltnp | grep 11434
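
The ss line is there to confirm where Ollama is actually listening. What you want to see is the Tailscale IP as the local address (Tailscale assigns addresses from 100.64.0.0/10), not a wildcard such as 0.0.0.0, which would also accept connections on the public interface. A sketch that automates the check; wildcard_listeners is a hypothetical helper name.

```shell
# Read `ss -ltn` output on stdin and print any wildcard listeners on the given port.
# Column 4 of the ss output is "Local Address:Port".
wildcard_listeners() {
  awk -v port="$1" '
    $1 == "LISTEN" && ($4 == ("0.0.0.0:" port) || $4 == ("*:" port) || $4 == ("[::]:" port)) { print $4 }
  '
}

# Example: ss -ltn | wildcard_listeners 11434   (no output means no wildcard bind)
```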

Now from another device on your tailnet:

curl http://<your_tailscale_ip>:11434/api/generate \
  -d '{"model":"phi4-mini","prompt":"Hello from Tailscale"}'

If you tell me where OpenClaw is running (same VPS vs your laptop), I’ll tell you which option is simpler for that setup.

royal basin Mar 7, 2026, 10:06 AM

What about this: ssh -N -L 11434:127.0.0.1:11434 -o ServerAliveInterval=30 -o ExitOnForwardFailure=yes thoma@<tailscale_ip>

bold narwhalBOT Mar 7, 2026, 10:06 AM

Yes — that command is exactly the “keep Ollama bound to localhost, tunnel over Tailscale” approach, and it’s a good one.

ssh -N \
  -L 11434:127.0.0.1:11434 \
  -o ServerAliveInterval=30 \
  -o ExitOnForwardFailure=yes \
  thoma@<tailscale_ip>

What it does:

-L 11434:127.0.0.1:11434 makes your local http://127.0.0.1:11434 forward to the VPS’s localhost Ollama.
-N = don’t run a remote shell/command (tunnel only)
ServerAliveInterval=30 sends a keepalive probe every 30 seconds, so a dead connection gets detected and closed instead of hanging silently
ExitOnForwardFailure=yes fails fast if the port forward can’t be set up
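
The same options can live in ~/.ssh/config so the tunnel becomes a short command. A sketch under assumptions: the host alias ollama-vps and the helper write_tunnel_config are made up for illustration, and the address is whatever tailscale ip -4 prints on the VPS (100.64.0.1 below is a placeholder).

```shell
# Append an SSH host entry that opens the Ollama tunnel with the same options as above.
write_tunnel_config() {
  config_file=$1   # usually ~/.ssh/config
  vps_addr=$2      # the Tailscale IP of the VPS
  cat >> "$config_file" <<EOF

Host ollama-vps
    HostName $vps_addr
    User thoma
    LocalForward 11434 127.0.0.1:11434
    ServerAliveInterval 30
    ExitOnForwardFailure yes
EOF
}

# Example: write_tunnel_config ~/.ssh/config 100.64.0.1 && ssh -N ollama-vps
```

Calling the function twice appends the entry twice, so check the file before re-running it.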

After it’s running, on your local machine:

curl http://127.0.0.1:11434/api/generate \
  -d '{"model":"phi4-mini","prompt":"hello"}'

Two small tips:

If 11434 is already used on your laptop, pick another local port: -L 11435:127.0.0.1:11434 and then use http://127.0.0.1:11435.
If you want it to run in the background, add -f (but only after you’ve verified it connects cleanly).

royal basin Mar 7, 2026, 10:15 AM

Before I do this, let me know the methods for changing from the Claude model to an open-source model when I'm using OpenClaw.

bold narwhalBOT Mar 7, 2026, 10:17 AM

To switch OpenClaw from Claude to an open‑source/local model (like the one you’re running in Ollama), there are basically 3 methods—pick the one that matches how permanent you want the change to be:

Method 1: Switch just for this chat (Discord)

Use the in-chat model picker:

/model → pick an Ollama model for this session

If you ever get “Model … is not allowed”, it means you have an allowlist set (agents.defaults.models) and need to add the Ollama model there (or remove the allowlist). Docs: https://docs.openclaw.ai/concepts/models

Method 2: Make it the default (recommended)

Set the gateway’s default model to an Ollama model in config:

agents.defaults.model.primary: "ollama/<your_model>"

Ollama provider docs (important): https://docs.openclaw.ai/providers/ollama

Important gotcha (this corrects a common mistake)

For OpenClaw + tools, do not point OpenClaw at Ollama’s OpenAI-compatible /v1 URL.
Use the native Ollama base URL:

baseUrl: "http://127.0.0.1:11434" (no /v1)

(OpenClaw uses Ollama’s /api/chat for reliable tool calling.)

Method 3: Use CLI helpers (no hand-editing)

On the machine running the OpenClaw gateway:

openclaw models list
openclaw models set ollama/<your_model>

Where should OpenClaw run relative to Ollama?

If OpenClaw runs on the same VPS as Ollama: easiest—keep Ollama on 127.0.0.1:11434.
If OpenClaw runs on your laptop but Ollama is on the VPS: use your SSH tunnel so Ollama still appears at 127.0.0.1:11434 from OpenClaw’s point of view.

One question so I can give the exact steps: is your OpenClaw gateway running on the VPS, or on your local machine?

royal basin Mar 7, 2026, 10:19 AM

OpenClaw gateway running on the VPS

bold narwhalBOT Mar 7, 2026, 10:20 AM

Cool — if the OpenClaw gateway is running on the same VPS as Ollama, switching to an open‑source model is straightforward.

The “clean” way (keeps tool-calling working)

OpenClaw has a native Ollama provider. For that you want Ollama’s base URL like:

http://127.0.0.1:11434 (no /v1)

/v1 is the OpenAI-compatible mode and OpenClaw’s docs explicitly warn it can break tool calling / cause raw tool JSON output.

Docs: https://docs.openclaw.ai/providers/ollama

Step 1) Enable Ollama provider for OpenClaw

Easiest is to set an env var (Ollama doesn’t need a real key; any string works). Put this in OpenClaw’s global env file:

mkdir -p ~/.openclaw
echo 'OLLAMA_API_KEY=ollama-local' >> ~/.openclaw/.env

(Env var loading rules: https://docs.openclaw.ai/help/environment)

Then restart the gateway:

openclaw gateway restart

Step 2) Confirm OpenClaw sees local Ollama models

openclaw models list --local

You should see models with the ollama/... provider prefix.

Step 3) Switch from Claude → Ollama

Temporary (just this Discord chat)

Use:

/model and pick an ollama/... model

Make it default

Use the CLI helper:
