#Looking for consistent and explainable behavior

1 messages · Page 1 of 1 (latest)

exotic gale
#

I have a fresh HA install with Voice PE sattelite, Ollama and Whisper hosted on a local GPU enabled server. Fully local voice assistant is configured and working.
I pulled a bunch of models such as qwen2.5, llama3.1, mistral-small, of different sizes and quantizations, but all of them are tools/instruct versions as required by HA.
They all generally work as expected, but I'm observing some quirks I can't explain. I'm a newbie to all of this, but have a decent technical background.
I'm hoping to pick some brain here from more experienced folks.
I've currently settled on model llama3.1:8b-instruct-q8_0 as it seems to be a good balance in speed and accuracy for my system.
Larger models are a bit slower, but don't seem to offer any obvious benefit. All my models run fully in GPUs, but larger ones are split between 2 GPUs, which I believe makes it a bit slower.

Question 1. I swear at some point the assistant was often asking follow up questions and was waiting for my next command without having to start with a wake word again, which was very convenient and natural. Somewhere during swapping models and tweaking prompts this behavior changed and now it stops after each command and I have to wake it up again for follow ups. I tried adding this line to the prompt "Wait for follow up commands or questions", but it doesn't do anything. What controls the follow up behavior?

Question 2. I added this line to the default prompt "Be sassy, but brief and to the point", just to add some fun to the conversation. However, I noted sometimes there are some sassy remarks, but other times there isn't and I can't explain why.

My current HA Ollama settings are as follows:
System Prompt:
You are a voice assistant for Home Assistant.
Answer questions about the world truthfully.
Answer in plain text.
Be sassy, but brief and to the point.
Wait for follow up commands or questions.

Context Window = 16384
Max History = 20
Keep alive = -1

I only have 7 entities exposed to assistant.

topaz aurora
# exotic gale I have a fresh HA install with Voice PE sattelite, Ollama and Whisper hosted on ...

to Q1, look at the the debug and see what the message is when its asking you a question and if there is a question mark at the end of the message?

to Q2, when assist is on a lot of models do seem to struggle with keeping their "personality" my experiments with qwen3 over 2.5 (14b/Q4_K_M/32k context) seemed to be a lot better though. although its kinda hacky to get it working currently because of the think stuff. however this issue will be resolved with new options in 2025.7. so waiting a couple of days updating and then trying qwen3 is something i would recommend on this front.

exotic gale
#

How do I enable debug? I asked in another thread and was told to load a forked integration with debug settings, which is above my head. How come Ollama integration doesn't have a UI enabled debug feature?

topaz aurora
#

In the voice assistant settings on the pipeline settings you can go to debug and see the trace of a command and see the pipeline activity

exotic gale
#

Any advice on my prompt content? Any useful tips on how to make it better or more interesting?

topaz aurora
#

i have found that its mostly a matter of trial and error. just gotta play with stuff a bit until you get something you are happy with

exotic gale
#

After updating HA to 2025.7 and Voice PE firmware it started saying the word "parameters" at the beginning of some responses, but not all. The word "parameters" in not listed in the debug session.

exotic gale
#

found this in Ollama integration

topaz aurora
#

interesting, its reporting its process in seperate messages. i thoght VA was only supposed to announce the last one. i havent had a chance so mess too much with it since update yet.

exotic gale
#

I noted that 2025.7 update added debug logging option to Ollama integration, but I can't find how to access the log. Please help a newbie 🥺 . I started playing with qwen3 model per your advice.

topaz aurora
exotic gale
#

actually, qwen3_14b is working perfectly for me so far, at least in the first 10 minutes after setting it up, responds quickly, asks follow up questions, the issue with "parameters" from yesterday is gone too. I'll keep playing with it. Let me know if there are any specific test scenarios I can help with.

topaz aurora
topaz aurora
#

turned out that the modelfile template from ollama pulled models (which needed removing and repulling if had old version) works great. but the template modelfile on huggingface models doesnt work correctly.
this can be gotten around by customising the template and spinning up a custom one that pulls the modelfile you want from hf without the template.

eternal raven
#

I might try my hand at 8b

eternal raven
#

I know disabling the thinking option in ha speeds it up 2 fold easily. which sucks cause i wanted to try that

topaz aurora
eternal raven
#

action: conversation.process metadata: {} data: agent_id: conversation.qwen3_8b text: >- Rephrase the following text and mix it up a little [a person has been spotted on the front yard security camera.] response_variable: response

#

llama3.2 would get too inventive but qwen3 seems to be a little too accurate and not very inventive

topaz aurora
#

yeah i have noticed that with qwen, i get it to add jokes and stuff and it does tend to repeat itself a bit.
for notification generation i have a seperate conversation agent which i set to 0 message history. this helps a little but you do usuaally get very simalar things

meager pond
#

I've been running QWEN3 8B Q8 with great success for a few weeks ( Since release actually)

I get it to play Plex shows on my TVs (By calling a script) and to start music through MA.

IT is also able to call Qwen 2.5VL through scripting. Overall satisfied with it

I DID pimp me prompt tho:

#

I basically ingest a lil intent documentation in each call.

Total Context size is about 11,000

topaz aurora
meager pond
#

The exact Vision model I use is qwen2.5vl:3b-q8_0

topaz aurora
#

cool, even at 3b its works decently?

meager pond
#

I feel like since it's just describing stuff, it doesn't need as much parameters?

#

Vram usage is high tho:

qwen2.5vl:3b-q8_0 21858d9f5230 7.9 GB 100% GPU Forever
qwen3:8b-q8_0 e56358ca25dd 11 GB 100% GPU Forever

topaz aurora
#

interesting

meager pond
#

Here are examples of descriptions

#

from frigate:

#
The sequence of images shows a person walking down a set of stairs in an outdoor setting, likely a residential area. The person is wearing a green t-shirt, gray pants, and a backpack. The individual appears to be moving steadily and purposefully, suggesting they might be heading to or from a specific destination.

The person's actions and movement indicate a routine or familiar activity, possibly related to commuting or running errands. The setting, with its greenery and residential architecture, suggests a calm and peaceful environment. The person's steady pace and focused demeanor imply they are comfortable and familiar with the area.

Given the context, the person might be:

1. **Commuting**: Walking to or from work or school.
2. **Running Errands**: Going to or from a store, post office, or another nearby location.
3. **Walking to a Meeting**: Heading to a scheduled appointment or gathering.
4. **Visiting a Neighbor**: Walking to visit a friend or family member.

The person's intent or behavior is likely to be moving towards a specific destination, possibly related to daily activities or errands. The environment and the person's demeanor suggest a routine and purposeful movement.
#
The sequence of images shows a person walking down a set of stairs. The person is wearing a black dress and carrying a striped bag. The setting appears to be an outdoor area with a brick wall, a metal railing, and some greenery in the background. The person is moving steadily down the stairs, and the camera captures their movement from a fixed position.

Based on the actions and movement observed, the person seems to be in a casual, everyday scenario, possibly heading to or from a destination. The person's steady pace and the way they carry their bag suggest they are comfortable and familiar with the environment. The presence of the metal railing and the brick wall indicates that this is likely a residential or urban area.

Given the context, the person might be going to or from a nearby building, possibly their home or an office. The striped bag suggests they might be carrying personal items or groceries. The overall behavior indicates a routine activity, likely related to daily commuting or errands.
topaz aurora
#

cool

meager pond
topaz aurora
#

ah cool, i used "ollama vision" for my first test but llm vision has been on the list of things to try

meager pond
#

is 'ollama vision' an integration?

topaz aurora
#

yeah its on hacs

meager pond
#

Does not seem to be as active

topaz aurora
#

yeah perhaps not, tbh i didnt spend a whole lot of time on the "project" was just experimenting with something a bit

#

i dont have a huge use for it currently but maybe some stuff for the future...

meager pond
#

For me it's really just a timer/media querier

eternal raven
#

I tried vision for recognizing who is at the camera and gave it a few pics per person but honestly it never got anyone right. Was just trying it out though. I cant really run larger then 8b otherwise the lag doesnt make sense to use it

#

Thats off of one 2060 super with 12gb vram

#

Id suspect to have timely responses with that youd have to have multiple gpus

meager pond
#

It's meant to describe images

#

What you want is something like Compreface