#Ollama doesn't respond when "Control Home Assistant " set to "assist"

1 messages Β· Page 1 of 1 (latest)

icy forge
#

I'v ebeen trying to get a local Ollama server (v0.5.11, which is working fine on it's own with various models) connected to a local HA server (v2025.2.3) on the same local network.
I've added the Ollama integration and chosen a model, and when I set "Control Home Assistant" to "no control", it works. My assist conversations go straight to Ollama and I get the answers back.
But when I set it to "assist" instead, the request gets sent and it usually just times out.
(although, the last time I ran it I got this long json string instead for some reason...
'''
{"type":"function","function":{"name": "HassTurnOn", "parameters": {"area": "all devices of type media_player", "device_class": "["media_player"]", "domain": "all", "floor": "", "name": ""}}; {"type":"function","function":{"name": "HassLightSet", "parameters": {"area": "Kitchen",
'''

Can someone please point me in the direction of where to troubleshoot this? Is it an Ollama server setting that I've got wrong? I've installed the "Home Assistant Tool" in Ollama for this model, and I thought I configured it right... but did I need to? Or is there a recommended one?
The debugging on the ollama server usualy just shows it stalling at "ollama[14485]: llama_model_load: vocab only - skipping tensors"
But when I try to watch the debug from the HA end, it just shows a spinning icon while it's waiting for a response, I guess...

Any help would be appreciated!
Thanks πŸ™‚
David

odd dust
#

Some questions:

  • what model do you use?
  • how many entities are exposed to Assist?
  • what is the hardware for Ollama?
  • how fast model responds without control?
icy forge
#

πŸ™‚
I'm just testing with the llama3.2:1b model. It responds quite quickly (3-5 seconds) within the Ollama interface, and also when reached via HA-assist with 'no control' selected. I've also tried the llama3.2:latest model (3b) and the results are slower (10 sec?) but consistently successful when 'no control' is selected. It also doesn't work when the 'assist' is turned on.
The hardware isn't too great: 8GB ram on 6cores, but the monitoring of that doesn't look like it's maxing out either CPU or RAM.
I just checked and there's 30 entities exposed at the moment. I'll turn them all off but a couple and see if it makes any different.

#

Thanks!

#

some progress, maybe πŸ˜‰
with only one entity exposed, when I enable 'assist' I am now getting json responses back, like this:
{"type":"function","function":"HassTurnOff","parameters":{"area":"kitchen","device_class":["ovens","electric"],"domain":["home", "energy", "efficiency"],"floor":"utility", "name": "boil an egg"}}
to my query "how do I boil an egg" (it's what I've been using to test if Ollama is answering general knowledge queries).

But also sometimes real text responses, but no actual HA actions happen...
Here's a sample of a recent conversation:
'''
how do I boil an egg?
{"type":"function","function":"HassTurnOff","parameters":{"area":"kitchen","device_class":["ovens","electric"],"domain":["home", "energy", "efficiency"],"floor":"utility", "name": "boil an egg"}}

turn off the lights in the den
I can't provide instructions on how to boil an egg. Is there anything else I can help you with?

turn off the lights in the den
{"type":"function", "parameters": {"area": "Den Lights", "domain": "["home", "energy", "efficiency", "utility"]"}}again{"type":"function", "parameters": {"area": "Den Lights", "domain": "["home", "energy", "efficiency", "utility"]"}}

turn off the lights in the den
You can't turn off all devices of a specific type. Would you like to specify an area instead?
'''

Each response comes back in about 8 seconds...
Any ideas?
Thanks!

odd dust
#

The main trouble here is the model performance. I don't think it will be able to work out correctly. 1b and even 7b models aren't capable of providing consistent results with tooling. And your hardware is limited. You need GPU to run that at least somewhat faster. It's easier with chat and "no control" requests - but Assist option makes huge amount of info transferred to model context, and that will make model think too long (and hallucinate even more).

icy forge
#

I'll try to test with a larger model and see if it at least functions. Theoretically, though, should I not at least be getting something consistent back from similar prompts? and what was that json string supposed to be?
Anyway - I appreciate your help. I'll try a big model and see if I can make it responds even slowly.

odd dust
#

You'll hit timeout.
Also put -1 to keep alive field, so model doesn't gets unloaded.

late cloak
#

Might also need to increase context size if you expose allot of scripts and entities

odd dust
icy forge
#

You'll be pleased to know that with the quen2.5:7b model, it functions as expected. The only problem is I have to wait about 45 seconds for the model to process and react :))
So... while I suppose we could call that working, it's not really gonna do what I'd hoped.
Thanks again for the help!

late cloak
#

I've run that model on a 4060ti and it responds in a few seconds. If you want those smart models to work at a reasonable speed GPU is a must unfortunately πŸ˜…

solid snow
#

What models is everyone using, I find anything under 14b parameters is too unreliable

late cloak
#

the models I have seen do the best are qwen2.5:7b and 14b. Quantization is a factor too. In general what I have observed/heard is if you can, run the highest parameters you can without going below q4 quantization. So theoretically qwen2.5:14b at q4 would be a little smarter than qwen2.5:7b at q8. If you can tun qwen2.5:14b at q8 that's probably about the best local model for HA control you can get now, based on the leaderboards. Unless you are super GPU rich and can run 70b models πŸ˜„

#

Also to add to the above, it also depends on how many entities you have exposed. If you have more than 20-30 entities, you will probably need to bump up your context window. For me with over 160 entities, I had to bump my context window to 32768 (from the default of 8192) for the model to not lose it's mind from the context getting overflowed/truncated.

solid snow
#

Thanks for the info.

unkempt ice
# late cloak the models I have seen do the best are qwen2.5:7b and 14b. Quantization is a fac...

I'm using Granite3.1 dense 2b (which is 2.5b in reality) and it's better than Llama3.2:3b on all fronts. But I'm still getting json responses when I try to turn off lights or set colors on them. If I type the simple request to turn off the light it doesn't get passed on to the LLM and acts quickly.
Do you have any suggestions for me? Maybe increasing the context lenght might help? I typed in the prompt to not answer me with HA internal commands but it still does

late cloak
#

These smaller models tend to struggle with this kind of thing, but it could be the context window is too small depending on how many entities and scripts you have exposed

unkempt ice
#

Unfortunately I don't have the firepower for a bigger model

late cloak
#

With that many entities and scripts you probably need to double the context window from the default of 8192, maybe even more depending on how many messages you have it set to remember

#

The model itself may also just struggle with long context

unkempt ice
solid snow
#

Updating the context size helped me a lot. Also I’ve found qwen2.5:7b at Q8 to be pretty good.

unkempt ice
solid snow
#

Basic Ryzen 5 system with an NVIDIA 3060 12gb. I find single commands are fine and reliable. If you ask it to do two things like, turn off this light and turn on the fan. It’s hit and miss.

unkempt ice
raw pollen
late cloak
#

Would want to check your ollama logs, probably with debug on, and see how much it's using. But it's a bit of trial and error. Basically I'd set the context to as high as you can comfortably fit into gpu memory i guess, more context allows for longer memory in conversations πŸ™‚

#

There's not really context size for x entities because entities vary from one environment to the next

#

And can take up different amounts of context depending on how long their names are, what data about them has to be stored, etc

raw pollen
#

Im using qwen2.5:3b for homeassist. I didn’t see a relation between Ram amount and context size, yet. I thought Ram size is fixed to chosen model.

raw pollen
odd dust
late cloak
#

Prompt is the same size regardless of the model

#

HA sends the same data in the prompt, plus whatever custom prompt directives you add

solid snow
#

I find no matter what model it’s pretty unreliable. Hopefully newer models get better

odd dust
solid snow
#

On my 3060 12gb, a 14b model won’t fit if I have kokoro tts and whisper going lol. May have to get another one

odd dust
solid snow
raw pollen
raw pollen
late cloak
#

That might be the template prompt or initial prompt that gets loaded in before the llm is interacted with

#

But that's usually not relatively big compared to the HA prompt

raw pollen
#

Yes, template is what I meant.

#

In my testing (2month ago), the long llama3.1 template was interpret at each interaction.
OR the template is too complicated for the llama3.1:3B version.
One way or another: the stuff (token) thrown into the model can differ a lot depending on the model.
I can send screenshots tomorrow if someone want to see.

#

I did have trouble with many models for HA control and ended up with qwen.
I did not try to apply the qwen template on another model, maybe that would do the trick. πŸ€·β€β™‚οΈ
Or my model qantization,size,…/hardware simply is too bad of course.

median estuary
#

Hi. Does it make sense to test some LLM using CPU, yhen figure how much VRAM to chose the right GPU.
I mean en output should be the same! no? ( πŸ™‚ )
On my side i'm always falling back to ollama3.2 for the "overall":

  • multi language ( en + fr)
  • output quality
  • Time and space accuracy ( current time at xxx including DST and timezone)
  • global knowledge when answerring cooking o
odd dust
unkempt ice
unkempt ice
unkempt ice
unkempt ice
unkempt ice
raw pollen
#

Yes. My assumption is that the template works better with HA tools at small models.
But im no expert.

odd dust
unkempt ice
odd dust