#Is it possible to run Hermes Agent off of LMlink/Similar?

1 messages · Page 1 of 1 (latest)

upper heart
#

I have a macbook pro (48gb unified memory) and a macbook air (16gigs unified memory)

I run hermes agent on my macbook air and treat it as a sort of burner laptop with a separate apple ID and let hermes agent operate quite freely without the risks of setting it loose on my main computer.

Currently this setup uses Openrouter API with qwen3.6 plus, i was hoping to go off openrouter cloud API To some “local API” type setup if that makes sense, with qwen3.6 27B running on my macbook pro 48gb unified memory powering the hermes agent running on my macbook air.

I hope i am able to express the concept of what im trying to accomplish.

Is this setup possible with LMlink or something similar?

meager cave
#

@craggy quail

craggy quailBOT
#

📚 skill_view: "hermes-agent"

craggy quailBOT
# meager cave <@1498667414873968700>

yeah, this is totally doable. the setup you're describing is a custom OpenAI-compatible endpoint — Hermes Air points at an LLM server running on Macbook Pro, using the custom endpoint provider.

here's exactly how to wire it up:

on the Macbook Pro (48GB, the server):

run an OpenAI-compatible local server that loads Qwen3 27B. the two main options:

  • Ollama — easiest install, exposes http://0.0.0.0:11434/v1 out of the box
  • LM Studio — GUI-first, local server mode with OpenAI-compatible API
  • llama.cpp server — more control, same API shape

Qwen3 27B at Q4 quantization needs ~16-18GB, so 48GB unified handles it comfortably.

make sure the server is listening on your LAN IP (not just localhost), otherwise the Air can't reach it. in Ollama that's:

OLLAMA_HOST=0.0.0.0 ollama serve

then note your Pro's Tailscale or LAN IP.

on the Macbook Air (running Hermes):

configure a custom provider pointing at the Pro:

hermes config set model.provider custom
hermes config set model.base_url http://<MACBOOK_PRO_IP>:11434/v1
hermes config set model.api_key ollama
hermes config set model.default qwen3:27b

or if you prefer editing directly:

hermes config edit

add to config.yaml:

model:
  provider: custom
  base_url: http://<MACBOOK_PRO_IP>:11434/v1
  api_key: "ollama"
  default: "qwen3:27b"

the api_key can be anything — Ollama doesn't actually check it, but the field is required.

LMlink specifically — I haven't personally validated it, but if it exposes an OpenAI-compatible /v1/chat/completions endpoint the exact same config works, just swap the base_url. check their docs for the local server URL format.

verify it works:

hermes chat -q "hello" --provider custom

if you get a response, you're good. if you get a connection error, it's usually the IP or the server not binding to 0.0.0.0.

vagrant light
#

yes, you can set it up with lmlink. just install lmstudio on both machines and link them like you normally would, enable the dev server on the machine you want to use it on and and then use the loopback http://127.0.0.1:1234/v1 as a custom provider in hermes model on the machine after it's set up to point at the other via lmlink. i tested this out real quick just to verify, so ignore the model selection for now, that's just what i had laying around. you'll want to load and set these models to a minimum context size of 64000 (more for better headroom for compression, etc.) to avoid any issues and use good recent tool using agentic model.

#

that bot response really liked ollama... huh... and almost knew what you were talking about...