#Ollama Help!

1 messages · Page 1 of 1 (latest)

slim lodge
#

I have been trying for a few days now to configure Ollama in Home Assistant. I have done everything according to the documentation. When I go to configure and enter the URL no matter what I try it won’t work. I have no clue what I’m doing wrong as I have read the documentation, watched videos, and read the documentation some more. I just can’t figure out what’s wrong. Home assistant is on on a HA Green I am running Ollama in a docker container and have open we up set up as well. Can someone help me?

eager cradle
#

Did you check that Ollama API is accessible via HTTP? You can do it with example cURL from Ollama setup page.

slim lodge
old fulcrum
#

Will also throw it out there, if you have ollama on a separate VM, you need to allow access from 0.0.0.0 I had this when I first set it up (Proxmox LXC running ollama, separate device running HA) and it was a PITA to figure out the first time around.

Added this to /etc/systemd/system/ollama.service and it ran fine after.

[Service]
Environment="OLLAMA_HOST=0.0.0.0"

Probably setting OLLAMA_HOST in your docker config will do the trick

slim lodge
eager cradle
#

yes

#

But not localhost, try it from other computer, from terminal, using LAN IP address of Ollama PC

slim lodge
#

I can confirm that I can access it from anotther computer using the IP address and proper port of the machine the server is on

eager cradle
#

If it gives you back correct response from model, then you can connect it to HA.

thick bison
#

if you just enter in the ip address:port in a browser, it'll give you back a message to indicate connectivity

#

would be a simple page that has the message "Ollama is running"

#

I'd also make sure the HA server and the ollama server are on the same subnet, or are on subnets that are allowed to talk to eachother. 🙂

ancient token
#

Install Namap and scan all port numbers on the Olama server to see if 11434 is open. I just didn't add Environment="OLLAMA_HOST=0.0.0". Resulting in inability to connect. Normal after adding

slim lodge
#

guess I forgot to mention I have ollama running on windows

#

Well I think it was a simple user error. I think I forgot the / at the end of the url when configuring in home assistant

#

Thanks yall for the help

vapid sparrow
#

Hi Guys,
I setup Ollama integration with latest 3.2 1B and 3B models running on a Pi 4.
Results are really long to come up, like 30 seconds.
Do you have also this kind of performance ?

eager cradle
vapid sparrow
#

I was just being curious, no real expectations 🙂

#

I still hope we could have one day local model running without requiring a huge computing device like a graphic card or a powerfull cpu.

eager cradle
#

No we can't. Unfortunately. I want it too - but with current state of "AI", you need a lot of fast memory...

thick bison
#

Yeah it's not even just the ram, it's the parallel vector/floating point calculators needed. a GPU has tens of thousands of processors that can run these calculations, where as even the fastest threadripper CPU can't hope to keep up. For now it seems GPU is the name of the game for the consumer. On the enterprise level there are custom processors (the tensor processors Google uses, some custom ones like Groq has, and of course the entirely new idea of neromorphic processors :D)

eager cradle
dusty rune
#

It's not cheap, but it's not as bad as you say 🙂

#
  • with 4x less tensor cores (still this is a big number)
  • 5,5x slower (same as above)
  • 5,5x consume less power
eager cradle
long ibex
#

I'm not sure I'd bother at all Nic, Im trying to use it at the moment and although it's super fast (as I'm using a 16GB 4060 Ti) Home Assistant is beyond useless with it.

#

It works about 2% of the time even with the simplest of commands.

#

It doesn't matter what I ask it to do, the chance of it working is so incredibly low.

long ibex
eager cradle
long ibex
#

The videos and everything made me think now that the "year of the voice" (or whatever it was called) was done, it was all good now and a fair replacement for Alexa, when in reality it feels like the whole thing is just... unusable.

#

Am I missing something to make this not garbage?

#

The AI seems to understand what I'm asking, but something's going wrong with the tool call it looks like.

eager cradle
# long ibex The videos and everything made me think now that the "year of the voice" (or wha...

Year of the Voice was dedicated to kick-off the voice integration, not completely replace Alexa and Google. That would be pretty ambitious thing, wouldn't it? Multi-million budgets for decade versus open-source project for 1 year.
The LLM integration was done in response to custom integrations and very high user's demand. But Paulus was telling it from scratch: it will take a lot of time for LLM+HA to work good.
First, all models are different, and interacting with them will differ too. Second, implementing Tools feature is on model side, not HA side. And lastly, based on chaotic nature of text generators, there's simply no way to guarantee 100% working LLM-HA communication. Model can just start hallucinating on you every moment.
Not to mention the restrictions on operation price, of course 🙂

ancient token
fluid dock
#

Not sure if I should open another thread, but I think someone might know the answer. @eager cradle maybe?
Using Assist in browser behaves different from using Assist in iOS App.
Specifically the prompt that ollama receives is 3-4x larger (loaded with crap).
= much longer response time

eager cradle
fluid dock
#

Thanks for your response ☺️
It’s all the same settings.
Wyoming faster-whisper sst -> ollama -> Wyoming piper tts
I‘m currently pulling out my Android tablet and will test it there, too.
I do debugging/logging directly watching the ollama console logs.

#

With Android it’s identical:

  • Browser (PC) 1m10s
  • iOS App 3m30s
  • android 3m30s
#

The reason is most definitely that the prompt that ollama recieves from

  • the apps is 1272 token
  • the browser is 620 token
eager cradle
#

Yeah that's what i mean - probably in one case you're sending more context to the model. It might depend on where you're using it from (but unsure)

fluid dock
#

Ollama always gets the input from HA regardless from where I trigger Assist.
That’s what’s makes me wondering 🤷‍♂️

#

Do you have a working setup
(SST -> ollama -> TTS)?
Which LLM model do you use?

eager cradle
#

But I use it only for general info, no Assist in settings.

fluid dock
#

You mean no control of devices / Entity‘s?
I tried many llms, also llama3.2, but device control didn’t really work, even if they are capable of tools and function calls.
BUT I found it works well with ‚qwen2.5‘ because its template seems to understand the tool call functions from HA. Whereas others I tryed don’t.
If you also struggled getting device control to work, this could be worth a try.

eager cradle
#

I tried it with llama3.2, it seems to work okay-ish, but not good enough, and quickly degrades if I add more and more entities.

ancient token
fluid dock
#

So the timings are understandable and ok.
Im only wondering why the app call throws a lot of crap into ollama, which then needs very long to understand the crap.

ancient token
fluid dock
#

The logs are very helpful as you can see the whole prompt that ollama recieves and it shows timings.

#

Also helpful for selecting a working model.

fluid dock
ancient token
#

Yeah, I tried many free and open-source models, and qwen gave me a good experience. Other open-source models are really stupid😂

fluid dock
#

Honestly I found qwen not that great generally, especially it seems to have problems with grammar. For me it only works good for tools / devices.

eager cradle
ancient token
fluid dock
#

Probably true ^^
I remember seeing some Chinese symbols when trying a smaller qwen model.

eager cradle
ancient token