#ollama (llama3.2:1b) is really slow in HA?

1 messages ยท Page 1 of 1 (latest)

orchid wind
#

so when i use HA to call ollama, it says the /api/chat calls take 3-5 seconds, but when i call it directly via docker exec, its super fast?

its a GTX 1650, i am confused

orchid wind
#

it always hangs for like 2 seconds on something like

time=2026-02-19T00:34:15.938Z level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=2589 prompt=2344 used=1559 remaining=785```
#

can i change the massive block of text its sending to ollama i think its causing the issue

#

it sends this "static context" that has so much info and i want to Change it

orchid wind
#

seems the llama models just sug and qwen is a bit better

#

intriguing

orchid wind
tawny inlet
# orchid wind so when i use HA to call ollama, it says the /api/chat calls take 3-5 seconds, b...

The prompt is there to tell the LLM how to interact with HA in order to be able to call tools etc...
It is needed for things to function.

You may be able to improve speed by reducing the number of entities exposed to the LLM in the voice assistant settings but the instructions prompt should not be changed.

Qwen3 is generally the recommended option and is now the default with the Ollama integration.

orchid wind
#

this stupid thing is what i want to edit

#

so at the very least its encoded a bit more efficiently

#

but if i unexpose the entities then i dont think it can even modify them

#

like why isnt it a template clueless

tawny inlet
tawny inlet
orchid wind
#

-# i mean isnt the entire philosiphy of HA that you can do that if you really want

tawny inlet
orchid wind
#

that is what i am going to look into

#

just trying to figure out how this code is even structured to begin with

tawny inlet