#How do I download an open-source model to a Hetzner VPS (8GB)


royal basin
#

How do I download an open-source model to a Hetzner VPS?

oblique capeBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

bold narwhalBOT
#

If you mean “download + run an open‑source LLM on my Hetzner VPS (8GB RAM)”, the practical answer is: you can, but you’re limited to small, heavily‑quantized models, and it’ll be slow on CPU.

0) Sanity checks (so you don’t waste time)

On the VPS:

free -h
df -h

You generally want several GB of free RAM and 10–30GB of free disk (model files are typically multiple gigabytes each).
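
As a rough sketch of that check (an illustration, not something Ollama or llama.cpp ships): the helpers mem_available_mib and model_fits below are hypothetical names, and the 1024 MiB headroom is an assumed margin for context and the OS, not a documented rule.

```shell
# Hypothetical helper: print MemAvailable in MiB from a meminfo-style file.
# Defaults to /proc/meminfo (Linux); the path is an argument so it can be tried on a sample file.
mem_available_mib() {
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Hypothetical helper: succeed if a model of SIZE_MIB fits into AVAIL_MIB
# while leaving HEADROOM_MIB free (assumed default: 1024 MiB).
model_fits() {
  size_mib=$1
  avail_mib=$2
  headroom_mib=${3:-1024}
  [ $((size_mib + headroom_mib)) -le "$avail_mib" ]
}
```

For example, model_fits 2500 "$(mem_available_mib)" && echo "should fit" checks a model file of about 2.5 GB against the memory currently available.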

Option A (easiest): Ollama (downloads models for you)

On Ubuntu/Debian:

curl -fsSL https://ollama.com/install.sh | sh

Then pull a model:

ollama pull <model_name>
ollama run <model_name>

I’m not going to guess model names in here (they change constantly), but for 8GB you usually want something in the “small (a few billion params) / 4‑bit quantized” range. Ask in #models for a current recommendation that fits 8GB.

Option B (manual): download from Hugging Face (GGUF) for llama.cpp

Install git-lfs:

sudo apt-get update
sudo apt-get install -y git-lfs
git lfs install

Then clone a model repo (example pattern):

git clone https://huggingface.co/<org>/<repo>

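(You’ll want a GGUF file if you plan to run with llama.cpp. Note that cloning pulls every file in the repo, which for GGUF repos often means several quantizations; downloading only the one file you need saves disk.)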

If your goal is OpenClaw specifically

OpenClaw can use local models if they expose an API (often OpenAI-compatible). Typical flow is:

  1. Run Ollama on the VPS
  2. Point OpenClaw at http://127.0.0.1:11434 (Ollama’s native API, no /v1; OpenClaw’s Ollama provider docs warn that the OpenAI-compatible /v1 URL can break tool calling)

Docs to start: https://docs.openclaw.ai/concepts/models
For “what model fits 8GB”, best place is #models (people have up-to-date picks).

If you tell me your VPS OS (Ubuntu 22.04? 24.04?) and whether you want Ollama or llama.cpp, I’ll give you the exact commands for that path.

royal basin
#

thoma@snapshot-362436766-ubuntu-8gb-nbg1-1:~$ free -h
df -h
total used free shared buff/cache available
Mem: 7.6Gi 2.6Gi 2.2Gi 5.0Mi 3.0Gi 4.9Gi
Swap: 0B 0B 0B
Filesystem Size Used Avail Use% Mounted on
tmpfs 776M 1.1M 775M 1% /run
/dev/sda1 75G 14G 59G 19% /
tmpfs 3.8G 0 3.8G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/sda15 253M 146K 252M 1% /boot/efi
tmpfs 776M 12K 776M 1% /run/user/0
tmpfs 776M 12K 776M 1% /run/user/1000

bold narwhalBOT
#

Nice, disk is plenty (59GB free). RAM is the limiting factor: you’ve got ~4.9GB available and no swap, so any model whose weights plus context need more than that will get OOM-killed. You’ll want (a) a small model and, ideally, (b) swap as a safety net so runs don’t crash.

Recommended: Ollama (simplest “download and run”)

1) Install

curl -fsSL https://ollama.com/install.sh | sh

2) Download (“pull”) a model

Pick something small + 4-bit quantized (8GB VPS-friendly) from Ollama’s library, then:

ollama pull <model_name>

3) Run it

ollama run <model_name>

4) Quick API test (optional)

curl http://127.0.0.1:11434/api/generate \
  -d '{"model":"<model_name>","prompt":"Say hello in one sentence."}'

Strongly suggested on your box: add swap (prevents random crashes)

This won’t make it fast, but it helps avoid OOM kills.

sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
free -h

To make it permanent:

echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
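
Running that line twice appends the entry twice. A minimal sketch of an idempotent variant, assuming a hypothetical helper named add_swap_entry; it takes the fstab path as an argument so you can try it on a copy first, and writing to the real /etc/fstab needs root.

```shell
# Hypothetical helper: append the swap entry only if no /swapfile line exists yet.
# The fstab path is an argument (default /etc/fstab) so the logic can be tested on a copy.
add_swap_entry() {
  fstab=${1:-/etc/fstab}
  entry='/swapfile none swap sw 0 0'
  if grep -q '^/swapfile[[:space:]]' "$fstab"; then
    echo "swap entry already present in $fstab"
  else
    echo "$entry" >> "$fstab"
  fi
}
```

Trying it against a copy (for example cp /etc/fstab /tmp/fstab.test, then add_swap_entry /tmp/fstab.test twice) should leave exactly one /swapfile line.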

If your goal is using it with OpenClaw

Once Ollama runs, you typically connect via its local HTTP endpoint (127.0.0.1:11434). OpenClaw model wiring depends on your config, so I’ll point you at the doc first:
https://docs.openclaw.ai/concepts/models

Tell me: are you trying to run the model directly in a terminal, or use it from OpenClaw? And which install method do you want: Ollama or llama.cpp?

royal basin
#

sudo swapon /swapfile
free -h
Setting up swapspace version 1, size = 8 GiB (8589930496 bytes)
no label, UUID=5ba18755-37b7-4c68-8aab-2cc160c6afed
total used free shared buff/cache available
Mem: 7.6Gi 5.2Gi 391Mi 5.0Mi 2.3Gi 2.4Gi
Swap: 8.0Gi 0B 8.0Gi
thoma@snapshot-362436766-ubuntu-8gb-nbg1-1:~$

thoma@snapshot-362436766-ubuntu-8gb-nbg1-1:~$ echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
/swapfile none swap sw 0 0

bold narwhalBOT
#

Good — swap is enabled and active now (Swap: 8.0Gi). 👍

Two quick follow-ups, then you can download/run a model:

1) Make sure you didn’t add /swapfile twice

(Not fatal, just messy.)

grep swapfile /etc/fstab
swapon --show

2) Download + run an open-source model (easiest: Ollama)

Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

Pick a small model (8GB VPS friendly) and pull it

Go to https://ollama.com/library and choose something in the “small / a few billion params / 4-bit” range, then:

ollama pull <model_name>
ollama run <model_name>

Quick test (optional)

curl http://127.0.0.1:11434/api/generate \
  -d '{"model":"<model_name>","prompt":"Write one sentence about Hetzner."}'

If you tell me what you’re trying to do next:

  • just run it in terminal, or
  • connect it to OpenClaw
    …I’ll give you the exact next steps for that path.

royal basin
#

thoma@snapshot-362436766-ubuntu-8gb-nbg1-1:~$ grep swapfile /etc/fstab
swapon --show
/swapfile none swap sw 0 0
NAME TYPE SIZE USED PRIO
/swapfile file 8G 0B -2

bold narwhalBOT
#

Perfect — swap is set up correctly and active (only one /swapfile entry, and swapon sees it).

Next step: actually install a runner and pull a model. Easiest is Ollama.

Install Ollama (Ubuntu)

curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl status ollama --no-pager

Download (“pull”) a model

Browse https://ollama.com/library and pick something small (on 8GB RAM, look for tags like :1b / :3b if available), then:

ollama pull <model_name>
ollama run <model_name>

Quick sanity test via HTTP

curl http://127.0.0.1:11434/api/generate \
  -d '{"model":"<model_name>","prompt":"Say hi in one short sentence."}'

If you want to use it from your laptop / OpenClaw

Don’t expose port 11434 to the internet. Use an SSH tunnel instead:

ssh -L 11434:127.0.0.1:11434 thoma@<your_vps_ip>

Then on your laptop, http://127.0.0.1:11434 reaches the VPS Ollama safely.

If you paste what you get from ollama pull ... (success or error), I’ll help you tune it for the 8GB box.

royal basin
#

{"model":"phi4-mini","created_at":"2026-03-07T09:45:42.439983189Z","response":"Hi","done":false}
{"model":"phi4-mini","created_at":"2026-03-07T09:45:42.537943854Z","response":" there","done":false}
{"model":"phi4-mini","created_at":"2026-03-07T09:45:42.613845609Z","response":"!","done":false}
{"model":"phi4-mini","created_at":"2026-03-07T09:45:42.68662278Z","response":"","done":true,"done_reason":"stop","context":[200021,62316,5911,306,1001,4022,21872,13,200020,200019,12194,1354,0],"total_duration":807897845,"load_duration":223866447,"prompt_eval_count":10,"prompt_eval_duration":333307632,"eval_count":4,"eval_duration":245668885}
thoma@snapshot-362436766-ubuntu-8gb-nbg1-1:~$

bold narwhalBOT
#

Yep, that output means it’s working. You successfully downloaded the model and generated a response via Ollama’s HTTP API (the streamed done:false chunks ending with a final done:true chunk are exactly what you want).
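
If you want a rough speed figure for this box, the final chunk has what you need: Ollama reports eval_duration in nanoseconds, so tokens per second is eval_count divided by eval_duration in seconds. A small sketch, where tokens_per_second is a hypothetical helper name:

```shell
# Hypothetical helper: tokens per second from Ollama's eval_count and
# eval_duration (the latter is reported in nanoseconds).
tokens_per_second() {
  awk -v count="$1" -v ns="$2" 'BEGIN { printf "%.1f\n", count / (ns / 1e9) }'
}

# Values copied from the final chunk of the response above.
tokens_per_second 4 245668885   # prints 16.3
```

With only 4 generated tokens this is a noisy number; a longer prompt gives a more representative figure.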

From here you can:

Use it interactively on the VPS

ollama run phi4-mini

See what you’ve downloaded

ollama list

Use it from another machine (recommended: SSH tunnel, don’t expose the port)

On your laptop:

ssh -L 11434:127.0.0.1:11434 thoma@<your_vps_ip>

Then your laptop can call:

  • http://127.0.0.1:11434/api/generate
  • and Ollama’s OpenAI-compatible endpoint at http://127.0.0.1:11434/v1

If this solved your original question, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow indexes it for the next person.

royal basin
#

I have a Tailscale IP.

bold narwhalBOT
#

If you have a Tailscale IP, you’ve got two good (safe) ways to reach Ollama from your other devices:

Option A (safest): keep Ollama on localhost, use an SSH tunnel over Tailscale

From your laptop (or wherever OpenClaw runs):

ssh -L 11434:127.0.0.1:11434 thoma@<your_tailscale_ip>

Then on your laptop you can call:

  • http://127.0.0.1:11434/api/generate
  • http://127.0.0.1:11434/v1/... (OpenAI-compatible)

Option B (nice + still safe): bind Ollama to the Tailscale IP only (not public internet)

On the VPS, get the Tailscale IP:

tailscale ip -4

Then set Ollama to listen on that IP via systemd (this is from Ollama’s official docs):

sudo systemctl edit ollama.service

Add:

[Service]
Environment="OLLAMA_HOST=<your_tailscale_ip>:11434"

Restart:

sudo systemctl daemon-reload
sudo systemctl restart ollama
ss -ltnp | grep 11434

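Now from another device on your tailnet (note that Ollama then listens only on the Tailscale IP, so 127.0.0.1:11434 on the VPS itself will no longer answer):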

curl http://<your_tailscale_ip>:11434/api/generate \
  -d '{"model":"phi4-mini","prompt":"Hello from Tailscale"}'

If you tell me where OpenClaw is running (same VPS vs your laptop), I’ll tell you which option is simpler for that setup.

royal basin
#

what about this ssh -N -L 11434:127.0.0.1:11434 -o ServerAliveInterval=30 -o ExitOnForwardFailure=yes thoma@<tailscale_ip>

bold narwhalBOT
#

Yes — that command is exactly the “keep Ollama bound to localhost, tunnel over Tailscale” approach, and it’s a good one.

ssh -N \
  -L 11434:127.0.0.1:11434 \
  -o ServerAliveInterval=30 \
  -o ExitOnForwardFailure=yes \
  thoma@<tailscale_ip>

What it does:

  • -L 11434:127.0.0.1:11434 makes your local http://127.0.0.1:11434 forward to the VPS’s localhost Ollama.
  • -N = don’t run a remote shell/command (tunnel only)
  • ServerAliveInterval=30 sends a keepalive after 30 seconds of silence, so a dead connection is detected and closed instead of hanging silently
  • ExitOnForwardFailure=yes fails fast if the port forward can’t be set up

After it’s running, on your local machine:

curl http://127.0.0.1:11434/api/generate \
  -d '{"model":"phi4-mini","prompt":"hello"}'
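
By default /api/generate streams one JSON object per chunk. If you’d rather get a single JSON object back, Ollama’s API accepts "stream": false in the request body. A minimal sketch, where generate_payload is a hypothetical helper and the prompt is assumed to contain no double quotes or backslashes (it is not JSON-escaped):

```shell
# Hypothetical helper: build a non-streaming /api/generate request body.
# Assumes model and prompt contain no characters that need JSON escaping.
generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}\n' "$1" "$2"
}

# Print the request body that would be sent.
generate_payload phi4-mini "hello"

# With the tunnel up, it could be sent like this:
# curl -s http://127.0.0.1:11434/api/generate -d "$(generate_payload phi4-mini "hello")"
```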

Two small tips:

  • If 11434 is already used on your laptop, pick another local port: -L 11435:127.0.0.1:11434 and then use http://127.0.0.1:11435.
  • If you want it to run in the background, add -f (but only after you’ve verified it connects cleanly).

royal basin
#

Before I do this, let me know the methods for changing from the Claude model to an open-source model when I’m using OpenClaw.

bold narwhalBOT
#

To switch OpenClaw from Claude to an open‑source/local model (like the one you’re running in Ollama), there are basically 3 methods—pick the one that matches how permanent you want the change to be:

Method 1: Switch just for this chat (Discord)

Use the in-chat model picker:

  • /model → pick an Ollama model for this session

If you ever get “Model … is not allowed”, it means you have an allowlist set (agents.defaults.models) and need to add the Ollama model there (or remove the allowlist). Docs: https://docs.openclaw.ai/concepts/models

Method 2: Make it the default (recommended)

Set the gateway’s default model to an Ollama model in config:

  • agents.defaults.model.primary: "ollama/<your_model>"

Ollama provider docs (important): https://docs.openclaw.ai/providers/ollama

Important gotcha (this corrects a common mistake)

For OpenClaw + tools, do not point OpenClaw at Ollama’s OpenAI-compatible /v1 URL.
Use the native Ollama base URL:

  • baseUrl: "http://127.0.0.1:11434" (no /v1)

(OpenClaw uses Ollama’s /api/chat for reliable tool calling.)

Method 3: Use CLI helpers (no hand-editing)

On the machine running the OpenClaw gateway:

  • openclaw models list
  • openclaw models set ollama/<your_model>

(See also: https://docs.openclaw.ai/concepts/models)


Where should OpenClaw run relative to Ollama?

  • If OpenClaw runs on the same VPS as Ollama: easiest—keep Ollama on 127.0.0.1:11434.
  • If OpenClaw runs on your laptop but Ollama is on the VPS: use your SSH tunnel so Ollama still appears at 127.0.0.1:11434 from OpenClaw’s point of view.

One question so I can give the exact steps: is your OpenClaw gateway running on the VPS, or on your local machine?

royal basin
#

OpenClaw gateway running on the VPS

bold narwhalBOT
#

Cool — if the OpenClaw gateway is running on the same VPS as Ollama, switching to an open‑source model is straightforward.

The “clean” way (keeps tool-calling working)

OpenClaw has a native Ollama provider. For that you want Ollama’s base URL like:

  • http://127.0.0.1:11434 (no /v1)

/v1 is the OpenAI-compatible mode and OpenClaw’s docs explicitly warn it can break tool calling / cause raw tool JSON output.

Docs: https://docs.openclaw.ai/providers/ollama


Step 1) Enable Ollama provider for OpenClaw

Easiest is to set an env var (Ollama doesn’t need a real key; any string works). Put this in OpenClaw’s global env file:

mkdir -p ~/.openclaw
echo 'OLLAMA_API_KEY=ollama-local' >> ~/.openclaw/.env

(Env var loading rules: https://docs.openclaw.ai/help/environment)

Then restart the gateway:

openclaw gateway restart

Step 2) Confirm OpenClaw sees local Ollama models

openclaw models list --local

You should see models with the ollama/... provider prefix.


Step 3) Switch from Claude → Ollama

Temporary (just this Discord chat)

Use:

  • /model and pick an ollama/... model

Make it default

Use the CLI helper: