#Ollama model recommendations
1 messages · Page 1 of 1 (latest)
I'd give the Qwen models a try. Don't expect too much from small models or your hardware.
I can't remember the specs offhand, but maybe it's time to dust off my old laptop instead...
I tried I think Qwen 2.5 and it was trying to use more memory than llama 3 was
Which tags?
No clue. I didn't see an option to select a tag. I guess "latest". I set everything up from the UI.
See here: https://ollama.com/library
Llama3.2's default is 3b while Qwen2.5's default is 7b.
You can use qwen2.5:3b for example.
I don't see these tags in this drop down on the integration. I have ollana running as an add-on. I figured I could exec into the docker container, but when I run login, in the terminal addon, any command I run was asking me for a password. This is where I got stuck. I was thinking that sshing into it from another computer might help get around this, but I didn't have a chance to try that yet.
I don't use the addon but try to just type the exact model identifier from the website just like my example.
If you really want to fiddle with containers and the OS and such give this a look: https://gist.github.com/Impact123/e9a4a07b184eb393d2ff762e3b1b0a05*
I didn't think just typing it would work 🤣 I guess the drop down is just a list of suggestions then.
I got it to do something, albeit the responses seem to vary even if I give it the same prompt. I guess I just need to work on the prompting or perhaps adjust the template.
Pretty cool though. Thanks for the tip!
I currently use Qwen2.5:14b with 32K context and even that is not always that smart. Llama often had the issue that it said it did x but never did.
Now it won't do anything 🥹 this hardware just ain't cutting it
I installed Ubuntu on my laptop with an i5 processor, 8 cores, 16 GB RAM. No GPU. There's some latency, but it is doing something
I’ve basically given up with a 8GB M1 Mac mini, so I don’t think this is gonna fly. It’s frustrating as the mini running llama:3.1 8B is a fine conversationalist and even helped with some coding stuff, but it’s useless at controlling HA. I think I’d need a model 10 times as big to stand much chance. It’s a shame.
What's the context size you use? Qwen2.5 works better for me than llama.
Normally 32768. I’ll give Qwen a go, thanks. One I haven’t looked at.
I finally switched to running it on my PC that actually has GPU, and the difference is insane in terms of speed. About a minute per request instead of 5-8. I also have found with llama that after a while it stops controlling devices, no matter how hard I press it to do so.
I’ve been pondering a 32GB Mac mini just for running ollama. Seems the most cost effective approach.
I only use llms for conversational response now due to the long times. If it takes longer then 6 seconds to respond to a command..i don't need it.
Everything that I was trying to accomplish I finally just gave up on and wrote out the automations for without it
I set it up just for the experience of working with an LLM not expecting too much. I ended up using it mostly for making old school telephones ring and tell jokes or ghost stories when answered.
It seems that turning lights on and off should be easier the generating me C code, so I’ll continue on with HA+ollama. It’s kinda fun trying to get it working. It’s annoying how no one has the magic bullet for why it works for them though. And in the meantime ollama on my old mini is fantastic for pretty much everything else I’d need it for.
What I hate most is the videos on YouTube showing them working perfectly like clock work and somehow responding instantly
Yup. I can find very little information supporting the idea that HA+ollama is anything other then a total PITA. Compared to openAI at least, which is trivial in comparison. Perhaps these folks with quick responses have a basement full of Mac minis.
I run qwne3:4b as my voice assistant. Works pretty well albeit most of my request are 'prefer handle local' but when it fallsback to qwen it does a good job just slowly. Running an M4 mac mini with 32gb.
it also runs llava-phi3 for image description requests on frigate snapshots which are almost instant
For conversation response I'm using Mistral Nemo 11 or 14b cant remember
Responses are within 1.5seconds but that's without ha control
You’ll be able to run qwen3:32b on that machine. I’m running qwen3:8b on my 8GB Mini. I’d say it’s useable but not quick. 🙂 Still, try a bit bigger. I would I have had more RAM in my mini…
I’m sure I can but its goes much slower.