no response | Nous Research | Page 1

#

Can you clarify the specs of the machine you're running Ollama on?

#

A 4B model on four ARM cores can't run an agent loop. ollama run is a single chat turn — stateless, one-shot. Hermes chains 5–20 reasoning steps per task, each with tool selection, error handling, and self-correction. On CPU that's minutes per step, not seconds. Three hours to process "hi" is the expected behavior, not a glitch.
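To put rough numbers on that, here is a back-of-the-envelope sketch. Every figure in it is an illustrative assumption, not a measurement from this machine: the prompt size per step (system prompt plus tool schemas plus history), the output length, and the CPU prefill and decode speeds are all placeholders, and `estimate_task_seconds` is a hypothetical helper, not part of Hermes or Ollama.

```python
def estimate_task_seconds(steps, prompt_tokens, output_tokens,
                          prefill_tps, decode_tps):
    """Rough wall-clock estimate for an agent task whose steps run sequentially.

    Each step pays for reading the prompt (prefill) and for generating
    the reply (decode). Speeds are in tokens per second.
    """
    per_step = prompt_tokens / prefill_tps + output_tokens / decode_tps
    return steps * per_step


# Assumed values for a small CPU-only box; swap in your own measurements.
ASSUMED = dict(prompt_tokens=8000, output_tokens=300,
               prefill_tps=20.0, decode_tps=3.0)

low = estimate_task_seconds(steps=5, **ASSUMED)    # short task
high = estimate_task_seconds(steps=20, **ASSUMED)  # long task

print(f"per step: {low / 5 / 60:.1f} min")
print(f"whole task: {low / 60:.0f} to {high / 60:.0f} min")
```

Under those assumptions a single step already costs several minutes, and a 20-step task lands in the multi-hour range, which is the same order of magnitude as what you saw.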

The progress bar is stuck at 2% because the model is still 推理ing (reasoning) through the first planning turn. It hasn't hung; it's just that slow. The timeouts in the logs (title generation, vision fallback) are the system correctly giving up on requests that the CPU can't serve in time.

There's no fix for this on that hardware. An external API for the primary model, or a GPU, are the only paths. Running Hermes on a 4B CPU model is like trying to tow a boat with a scooter — technically the engine is running, but it's not going anywhere useful.

#

My Kimi dropped some Chinese in there for you lmao

#

But yeah, I’d have expected this to not work at all.

#

You’d be better off using free models.
