I have read the documentation and can't find anything about this, but when I query through 'Assist' using my local Ollama agent or OpenAI, Home Assistant uses an astronomical number of tokens. For example, asking "What is the date?" used 7,000 tokens. I tried to check the logs for this on OpenAI, but they aren't available. Likewise, when I use Ollama, it practically shuts down. Does Home Assistant open 'threads' for every single device/entity, or something like that, on each request?
#Huge ticket count
How many entities are exposed to Assist? This can be found on the Voice Assistants page at the bottom of the Assist card.
237
HA recommends far fewer exposed entities for local models (definitely fewer than 100) unless you have some really beefy GPUs. For each exposed entity, HA sends much more information than just the name and the state. Also note that this context takes up memory in addition to the model itself. So if you have a 10 GB model and a 12 GB card, the context could add another 1–5 GB and push part of the work into slower system memory.
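To put rough numbers on this, here is a back-of-envelope sketch. It divides the ~7,000 tokens from the question by the 237 exposed entities (an upper bound, since that total also includes the system prompt and the question), then estimates the KV-cache memory a given context length needs. The model shape (32 layers, 8 KV heads, head dim 128, an 8B-class layout similar to Llama 3 8B), the fp16 cache, and the 10 GB model on a 12 GB card are assumptions for illustration, not values read from Home Assistant or Ollama.

```python
# Rough sketch, not Home Assistant or Ollama code. The 7,000-token and
# 237-entity figures come from this thread; the model shape and the
# 10 GB model / 12 GB card are illustrative assumptions.

OBSERVED_TOKENS = 7_000
EXPOSED_ENTITIES = 237


def tokens_per_entity(total_tokens: int, entities: int, fixed_overhead: int = 0) -> float:
    """Average prompt tokens per exposed entity.

    fixed_overhead is a hypothetical allowance for the system prompt and the
    question itself; with the default of 0 the result is an upper bound.
    """
    return (total_tokens - fixed_overhead) / entities


def kv_cache_bytes(n_tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Approximate KV-cache size for a decoder-only transformer.

    Every token keeps one key and one value vector (the factor of 2) per
    layer per KV head. bytes_per_value=2 assumes an fp16/bf16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens


if __name__ == "__main__":
    per_entity = tokens_per_entity(OBSERVED_TOKENS, EXPOSED_ENTITIES)
    print(f"~{per_entity:.1f} tokens per entity (upper bound)")

    MODEL_GIB = 10.0  # assumed size of the model weights
    VRAM_GIB = 12.0   # assumed GPU memory
    for ctx in (2_048, 8_192, 32_768):
        # Assumed 8B-class shape: 32 layers, 8 KV heads, head_dim 128.
        kv_gib = kv_cache_bytes(ctx, n_layers=32, n_kv_heads=8, head_dim=128) / 2**30
        total = MODEL_GIB + kv_gib
        # Ignores runtime buffers and scratch space, so real usage is higher.
        verdict = "fits" if total <= VRAM_GIB else "spills into system RAM"
        print(f"{ctx:>6}-token context: +{kv_gib:.2f} GiB -> {total:.2f} GiB, {verdict}")
```

On these assumptions an 8k-token context already takes the card to about 11 GiB before runtime overhead, and a 32k context pushes past 12 GiB. Check your own model's layer and head counts, and whether your runtime quantizes the KV cache, before trusting the exact figures.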