# Best and fastest model for Ollama
There is no other choice, as qwen2.5 is currently the most accurate and fastest open-source LLM. It may not give good answers to general questions, but it is better at controlling devices.
Thanks... I was hoping there were some more responsive models out in the wild world 😄
Greetings, folks. I'm tagging onto this thread because I'm running into some Ollama issues. I have a dedicated Ollama Linux server, and it works and is reasonably quick when using the CLI. With HA, however, it is ungodly slow: it takes 45-120 seconds to turn a switch on or off. If I use the built-in HA assistant, it's nearly instantaneous. I was originally running 3.2; after the comments here I created a new VA in HA for 2.5, but it didn't really change anything. I can see the server chugging along at 100% CPU the entire time it's trying to turn on a switch, but I don't feel it should be that intensive. This is all through the chat popup in HA, not even through voice-to-text yet.
How many entities are exposed to Assist? How many intent scripts, automations, scripts, etc. do you have? These all add to the context going to the LLM, and the larger the context, the more work the LLM has to do to process the request, which would explain the longer time.
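A rough sketch of why the entity count matters, assuming each exposed entity adds a line of text to the prompt. The prompt layout, the `build_prompt`, `estimate_tokens`, and `make_entities` helpers, and the 4-characters-per-token rule of thumb are all illustrative assumptions, not how Home Assistant actually formats its prompt.

```python
# Hypothetical illustration: how exposed entities grow the prompt sent to the LLM.
# The prompt layout and the ~4 characters/token heuristic are assumptions.

def build_prompt(entities: list[str], request: str) -> str:
    """Return a made-up system prompt that lists every exposed entity."""
    lines = ["You control a smart home. Available devices:"]
    lines += [f"- {name}" for name in entities]
    lines.append(f"User request: {request}")
    return "\n".join(lines)


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; real tokenizers will differ."""
    return max(1, round(len(text) / chars_per_token))


def make_entities(count: int) -> list[str]:
    """Generate placeholder entity names such as 'switch.device_07'."""
    return [f"switch.device_{i:02d}" for i in range(count)]


if __name__ == "__main__":
    # Compare the two entity counts mentioned in this thread.
    for count in (19, 60):
        prompt = build_prompt(make_entities(count), "turn on the porch light")
        print(count, "entities ->", estimate_tokens(prompt), "tokens (approx.)")
```

The absolute numbers mean little; the point is only that the prompt grows with every exposed entity, and on a CPU-only server the time to process that prompt grows with it.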
Yup. @timber quail, if you set it to "No control" in the configuration, you will see it working well, but then without exposed entities and such.
So there were 60 devices exposed, all by default I assume. I am in the YoLink ecosystem, so my temp monitors and water leak sensors can be pulled from there, but I feel like 60 "isn't that much". Is that naive?
OK, I pruned out the unnecessary devices, down to 19 now. It seems to be picking up a bit. It was 30-40 seconds for the first command, but it gets faster the more it's used: 20 seconds to undo the first turn-on, then 12 seconds to do another turn-on, and 10 seconds to undo that.
That's likely due to context caching
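One way to check where the time goes is to look at the timing fields Ollama returns with a non-streaming response. A minimal sketch, assuming the response carries `load_duration`, `prompt_eval_duration`, and `eval_duration` in nanoseconds plus a `prompt_eval_count`, as the Ollama API documentation describes; the `summarize_timings` helper and the sample numbers are made up for illustration.

```python
# Sketch: split an Ollama response's timing fields into readable seconds.
# Field names are assumed to match the Ollama REST API (durations in nanoseconds).
# The sample values below are invented, not measurements.

NS_PER_S = 1_000_000_000


def summarize_timings(response: dict) -> dict:
    """Convert nanosecond duration fields to seconds; missing fields count as 0."""
    return {
        "load_s": response.get("load_duration", 0) / NS_PER_S,
        "prompt_s": response.get("prompt_eval_duration", 0) / NS_PER_S,
        "generate_s": response.get("eval_duration", 0) / NS_PER_S,
        "prompt_tokens": response.get("prompt_eval_count", 0),
    }


if __name__ == "__main__":
    # Invented example of a slow first request.
    sample = {
        "load_duration": 8_000_000_000,
        "prompt_eval_count": 2000,
        "prompt_eval_duration": 20_000_000_000,
        "eval_duration": 4_000_000_000,
    }
    print(summarize_timings(sample))
```

If `prompt_s` dominates the first request and shrinks on later ones, that would be consistent with the prompt being cached; a large `load_s` on the first request only would point at model loading instead.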
My Ollama server is no slouch though: it has 16 GB of RAM and a 16-core CPU, and it responds pretty quickly on the CLI. I guess I just expected a little better, but that was with zero experience with it. I may just stick with the HA VA; it works for my needs, generally speaking, I think.
That's what I do: I kept the LLM for questions that HA Assist cannot understand and turned off "Assist" in Ollama. I created a bunch of intent scripts instead to meet my needs.
I think it's still not time to use a local LLM for home management. The models you can run at home, even with a 24 GB 4090, will hallucinate because of quantization, and those that don't hallucinate require 80+ GB of VRAM...
I don't want to spend a ton of cash on electricity for a PC that's always running, just in case I want to talk to it. It seems like madness.
I've mentioned this in other threads, but the GPU only draws significant power while inferencing, and even then it's a quick spike for a second or two before going right back to idle. Most of the time my GPU is sitting at around 5 W.
Also, a GPU will always beat the pants off a CPU. Even a Threadripper can be bested by something as simple as a GTX 1070 or a good RTX 3xxx or 4xxx series GPU.
I don't know where you get a computer to idle at 5 W. Maybe a GPU, but a NUC uses 15 W doing nothing.
I am using a 40 TOPS Jetson Nano, and HA takes about 20 seconds to react to voice. I'm using VOSK (as Whisper is not great).
I'm talking about the GPU itself, not the other components 🙂
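For a sense of scale, the idle figures quoted above can be turned into yearly energy use: a constant 5 W works out to about 43.8 kWh per year and 15 W to about 131.4 kWh. A minimal sketch of that arithmetic; the `annual_kwh` and `annual_cost` helpers and the electricity price are placeholder assumptions, so substitute your own tariff.

```python
# Back-of-the-envelope energy use for an always-on device at a constant draw.
# The electricity price is a placeholder assumption, not a quoted figure.

HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def annual_kwh(watts: float) -> float:
    """Energy in kWh used over one year at a constant power draw."""
    return watts * HOURS_PER_YEAR / 1000


def annual_cost(watts: float, price_per_kwh: float) -> float:
    """Yearly cost at the given price per kWh."""
    return annual_kwh(watts) * price_per_kwh


if __name__ == "__main__":
    price = 0.30  # assumed price per kWh in your local currency
    for label, watts in (("GPU idle", 5), ("NUC idle", 15)):
        kwh = annual_kwh(watts)
        cost = annual_cost(watts, price)
        print(f"{label}: {kwh:.1f} kWh/year, about {cost:.2f} per year")
```

This only covers idle draw; the short spikes during inference and the rest of the machine's components would come on top.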