I've been trying to get a voice assistant based on llama3.1 to work but I can't get it to control any of my devices. I have HA running on a NAS and an ollama server running on a rig with RTX3090. The model is running on <IP>:11434 and when assist is ticked off I get a response I just the way I expect from a conversation agent. However when I turn assist on the speech reponse I get is something like the following.
Query: "I'd like to watch tv"
response: {"name": "MediaTurnOn", "parameters": {"area": "Woonkamer, Living room", "domain": ["media_player"], "name": "TV Woonkamer"}}.
It clearly has information about the entities I exposed but I expect a response more in the lines of: "I've turned on the TV for you" and it then actually happening.
After looking in the debug log I see that the speech indeed has the mediaplayer turn on command but the data target is empty. Is this correct behaviour and am I missing something obvious?
Raw debug response is attached to the post.
I've already tried the following:
I picked the llama3.1 model because it is the one that is suggested on the HA website but I have similar behaviour with other models that support tools.
I've reduced the amount of exposed entities to 25 but it makes no difference. Even with 100 entities exposed the response I get is snappy and from the information I do get back it seems to pick a correct action (or at least the right intent to said action). I've also tried to increase the context window size but that made no difference either.
I've also removed my custom sentences and tried with prefer local processing off and on. The local processing seems to work but I've turned it off for debugging purposes right now.
Hopefully someone can point me in the correct direction. I feel like I'm 95% of the way there.