#Ollama model recommendations

1 messages · Page 1 of 1 (latest)

dire granite
#

I want to play around with Ollama. The best piece of hardware I have is the same RPi I'm running Home Assistant on. It's an RPi 5, 4 GB RAM (of which I currently use about 1.5).

Does anyone know of an ollama model that supports Tools that has a small enough memory footprint for me to use?

kindred stag
#

I'd give the Qwen models a try. Don't expect too much from small models or your hardware.

dire granite
#

I can't remember the specs offhand, but maybe it's time to dust off my old laptop instead...

#

I tried I think Qwen 2.5 and it was trying to use more memory than llama 3 was

kindred stag
#

Which tags?

dire granite
#

No clue. I didn't see an option to select a tag. I guess "latest". I set everything up from the UI.

kindred stag
dire granite
#

I don't see these tags in this drop down on the integration. I have ollana running as an add-on. I figured I could exec into the docker container, but when I run login, in the terminal addon, any command I run was asking me for a password. This is where I got stuck. I was thinking that sshing into it from another computer might help get around this, but I didn't have a chance to try that yet.

kindred stag
#

I don't use the addon but try to just type the exact model identifier from the website just like my example.

dire granite
#

I didn't think just typing it would work 🤣 I guess the drop down is just a list of suggestions then.

#

I got it to do something, albeit the responses seem to vary even if I give it the same prompt. I guess I just need to work on the prompting or perhaps adjust the template.

Pretty cool though. Thanks for the tip!

kindred stag
#

I currently use Qwen2.5:14b with 32K context and even that is not always that smart. Llama often had the issue that it said it did x but never did.

dire granite
#

Now it won't do anything 🥹 this hardware just ain't cutting it

dire granite
#

I installed Ubuntu on my laptop with an i5 processor, 8 cores, 16 GB RAM. No GPU. There's some latency, but it is doing something

vestal grotto
#

I’ve basically given up with a 8GB M1 Mac mini, so I don’t think this is gonna fly. It’s frustrating as the mini running llama:3.1 8B is a fine conversationalist and even helped with some coding stuff, but it’s useless at controlling HA. I think I’d need a model 10 times as big to stand much chance. It’s a shame.

kindred stag
#

What's the context size you use? Qwen2.5 works better for me than llama.

vestal grotto
#

Normally 32768. I’ll give Qwen a go, thanks. One I haven’t looked at.

dire granite
#

I finally switched to running it on my PC that actually has GPU, and the difference is insane in terms of speed. About a minute per request instead of 5-8. I also have found with llama that after a while it stops controlling devices, no matter how hard I press it to do so.

vestal grotto
#

I’ve been pondering a 32GB Mac mini just for running ollama. Seems the most cost effective approach.

fickle parrot
#

I only use llms for conversational response now due to the long times. If it takes longer then 6 seconds to respond to a command..i don't need it.

dire granite
#

Everything that I was trying to accomplish I finally just gave up on and wrote out the automations for without it

shut moon
vestal grotto
#

It seems that turning lights on and off should be easier the generating me C code, so I’ll continue on with HA+ollama. It’s kinda fun trying to get it working. It’s annoying how no one has the magic bullet for why it works for them though. And in the meantime ollama on my old mini is fantastic for pretty much everything else I’d need it for.

fickle parrot
#

What I hate most is the videos on YouTube showing them working perfectly like clock work and somehow responding instantly

vestal grotto
#

Yup. I can find very little information supporting the idea that HA+ollama is anything other then a total PITA. Compared to openAI at least, which is trivial in comparison. Perhaps these folks with quick responses have a basement full of Mac minis.

fallow slate
#

I run qwne3:4b as my voice assistant. Works pretty well albeit most of my request are 'prefer handle local' but when it fallsback to qwen it does a good job just slowly. Running an M4 mac mini with 32gb.

#

it also runs llava-phi3 for image description requests on frigate snapshots which are almost instant

fickle parrot
#

For conversation response I'm using Mistral Nemo 11 or 14b cant remember

#

Responses are within 1.5seconds but that's without ha control

vestal grotto
fallow slate
#

I’m sure I can but its goes much slower.