#Best and fastest model for Ollama

digital rapids
#

Yo all, I am using Ollama on a Jetson Orin and it's working -adequately-. What is the recommended model to use? I get horrible results with Llama 3.2 but good results (albeit slower) from Qwen2.5.
Is there a good fast model?

solid pasture
#

There is no other choice, as Qwen2.5 is currently the most accurate and fastest open-source LLM. It may not give good answers to general questions, but it is better at controlling devices.

digital rapids
#

Thanks. I was hoping there were some more responsive models out in the wild world 😄

timber quail
#

Greetings folks. I'm tagging onto this thread because I'm running into some Ollama issues. I have a dedicated Ollama Linux server, and it's reasonably quick when using the CLI. With HA, however, it is ungodly slow: it takes 45-120 seconds to turn a switch on or off. If I use the built-in HA assistant, it's nearly instantaneous. I was originally running Llama 3.2, but after the comments here I created a new voice assistant (VA) in HA for Qwen2.5. It didn't really change anything. I can see the server chugging along at 100% CPU the entire time it's trying to turn on a switch, but I don't feel it should be that intensive. This is all through the chat popup in HA, not even through voice-to-text just yet.

hybrid compass
#

How many entities are exposed to Assist? How many intent scripts, automations, scripts, etc. do you have? These all add to the context going to the LLM, and the larger the context, the more work the LLM has to do to process the request, which would explain the longer time.
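One way to check whether prompt size is the bottleneck is to look at the timing fields Ollama returns with a completed `/api/generate` response (`load_duration`, `prompt_eval_count`, `prompt_eval_duration`, `eval_count`, `eval_duration`, all durations in nanoseconds). The sketch below only does the arithmetic on such a response; the helper name `summarize_timings` and the sample numbers are made up for illustration and are not measurements from this thread.

```python
def summarize_timings(resp: dict) -> dict:
    """Convert an Ollama response's nanosecond timings to seconds and tokens/s."""
    ns = 1e9  # Ollama reports durations in nanoseconds
    prompt_s = resp.get("prompt_eval_duration", 0) / ns
    gen_s = resp.get("eval_duration", 0) / ns
    prompt_tokens = resp.get("prompt_eval_count", 0)
    gen_tokens = resp.get("eval_count", 0)
    return {
        "load_s": resp.get("load_duration", 0) / ns,
        "prompt_tokens": prompt_tokens,
        "prompt_s": prompt_s,
        "prompt_tok_per_s": prompt_tokens / prompt_s if prompt_s else 0.0,
        "gen_tokens": gen_tokens,
        "gen_s": gen_s,
        "gen_tok_per_s": gen_tokens / gen_s if gen_s else 0.0,
    }


# Hypothetical response: a large prompt (many exposed entities), short answer.
sample = {
    "load_duration": 2_000_000_000,
    "prompt_eval_count": 3000,
    "prompt_eval_duration": 30_000_000_000,
    "eval_count": 40,
    "eval_duration": 4_000_000_000,
}

summary = summarize_timings(sample)
for key, value in summary.items():
    print(f"{key}: {value}")
```

If `prompt_s` dominates `gen_s`, as in this made-up sample, the time is going into reading the context rather than writing the answer, and trimming exposed entities is the likelier fix.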

formal lance
#

Yup. @timber quail if you set it to "No control" in the configuration, you will see it working well. But then there are no exposed entities and it can't control anything.

timber quail
#

OK, I pruned out the unnecessary devices, down to 19 now. It seems to be picking up a bit. It was 30-40 seconds for the first command, but it gets faster the more it's used: 20 seconds to undo the first turn-on, then 12 seconds to do another turn-on, and 10 seconds to undo that.

hybrid compass
#

That's likely due to context caching
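A toy sketch of why a cached context helps, assuming the server can reuse work for the part of the prompt that matches the previous request (whether that is what happens in this particular setup is an assumption; the model staying loaded in memory after the first request may also contribute). Whitespace-split words stand in for real tokens, and `tokens_to_reprocess` is a hypothetical helper, not an Ollama function.

```python
def tokens_to_reprocess(cached: list[str], new_prompt: list[str]) -> int:
    """Count the tokens in new_prompt that come after the prefix shared with cached."""
    shared = 0
    for old, new in zip(cached, new_prompt):
        if old != new:
            break
        shared += 1
    return len(new_prompt) - shared


# Hypothetical prompts: the device list is identical, only the command changes.
system = "You control these devices: lamp fan heater".split()
first = system + "turn on the lamp".split()
second = system + "turn off the lamp".split()

print("cold cache:", tokens_to_reprocess([], first))
print("warm cache:", tokens_to_reprocess(first, second))
```

With a real entity list, the shared prefix would be far longer than the command itself, which is consistent with later requests finishing faster than the first.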

timber quail
#

My Ollama server is no slouch though: it has 16 GB of RAM and a 16-core CPU, and it responds pretty quickly on the CLI. I guess I just expected a little better, but that was with zero experience. I may just stick with the HA VA. It works for my needs, generally speaking, I think.

digital rapids
#

I don't want to spend a ton of cash on electricity to keep a PC spinning just in case I want to talk to it. It seems like madness.

hybrid compass
#

I've mentioned this in other threads, but a GPU only draws significant power while inferencing, and even then it's a quick spike for a second or two before going right back to idle. Most of the time my GPU is sitting at around 5 W.

#

Also, a GPU will always beat the pants off a CPU. Even a Threadripper can be bested by something as simple as a GTX 1070 or a good RTX 3xxx or 4xxx series GPU.

digital rapids
#

I don't know where you get a computer to idle at 5 W. Maybe the GPU alone, but a NUC uses 15 W doing nothing.

digital rapids
#

I am using a 40 TOPS Jetson Orin Nano and HA takes about 20 seconds to react to voice. I'm using Vosk (as Whisper is not great).

hybrid compass
#

I'm talking about the GPU itself, not the other components 🙂
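For scale, the two idle figures quoted in this thread (about 5 W for an idle GPU, about 15 W for an idle NUC) can be turned into energy per day with simple arithmetic. The electricity price below is a placeholder assumption, not a figure from the thread, and inference spikes are ignored.

```python
def daily_kwh(idle_watts: float, hours: float = 24.0) -> float:
    """Energy used in one day at a constant draw, in kilowatt-hours."""
    return idle_watts * hours / 1000.0


PRICE_PER_KWH = 0.30  # hypothetical price per kWh; substitute your own tariff

gpu_idle = daily_kwh(5)   # idle GPU figure quoted in the thread
nuc_idle = daily_kwh(15)  # idle NUC figure quoted in the thread

for name, kwh in (("GPU idle", gpu_idle), ("NUC idle", nuc_idle)):
    yearly_cost = kwh * 365 * PRICE_PER_KWH
    print(f"{name}: {kwh:.2f} kWh/day, roughly {yearly_cost:.2f} per year")
```

The GPU figure adds to whatever the host machine already draws at idle, so the two numbers are not alternatives for the same box.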