I have been trying for a few days now to configure Ollama in Home Assistant. I have done everything according to the documentation. When I go to configure it and enter the URL, no matter what I try it won't work. I have no clue what I'm doing wrong, as I have read the documentation, watched videos, and read the documentation some more. I just can't figure out what's wrong. Home Assistant is on an HA Green. I am running Ollama in a Docker container and have Open WebUI set up as well. Can someone help me?
#Ollama Help!
Did you check that the Ollama API is accessible via HTTP? You can do it with the example cURL command from the Ollama setup page.
I will try this when I am home and let you know what happens
I'll also throw it out there: if you have Ollama on a separate VM, you need to make it listen on 0.0.0.0 (all interfaces) rather than just localhost. I had this when I first set it up (Proxmox LXC running Ollama, separate device running HA) and it was a PITA to figure out the first time around.
Added this to /etc/systemd/system/ollama.service and it ran fine after.
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Probably setting OLLAMA_HOST in your docker config will do the trick
something like this:
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Why is the sky blue?"
}'
yes
But not localhost, try it from other computer, from terminal, using LAN IP address of Ollama PC
alright so it gave me back: {"detail":"Method Not Allowed"}curl: (3) URL rejected: Port number was not a decimal number between 0 and 65535
curl: (3) URL rejected: Bad hostname
curl: (3) URL rejected: Port number was not a decimal number between 0 and 65535
curl: (3) URL rejected: Malformed input to a URL function
curl: (3) unmatched close brace/bracket in URL position 1:
I can confirm that I can access it from another computer using the IP address and proper port of the machine the server is on
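Side note on the curl output above: those `curl: (3)` errors look like the shell splitting the single-quoted, multi-line JSON into separate arguments (that is an assumption, but it is common in Windows cmd, where single quotes don't quote anything). A minimal Python sketch of the same request sidesteps shell quoting entirely. The IP `192.168.1.50` is a placeholder and `build_generate_request` / `ask` are made-up helper names; `/api/generate`, port 11434 and the `model` / `prompt` / `stream` fields are the ones Ollama's API uses.

```python
import json
import urllib.request


def build_generate_request(host, model, prompt, port=11434):
    """Build (but do not send) a POST to Ollama's /api/generate endpoint."""
    url = f"http://{host}:{port}/api/generate"
    # "stream": False asks for one JSON object instead of a stream of chunks.
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(host, model, prompt, timeout=60):
    """Send the request and return the model's text answer."""
    req = build_generate_request(host, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example call (needs a reachable Ollama server; the IP is a placeholder):
# print(ask("192.168.1.50", "llama3.2", "Why is the sky blue?"))
```

Run it from a machine other than the Ollama host, using the host's LAN IP. If `ask` returns text, the API is reachable over the network.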
If it gives you back correct response from model, then you can connect it to HA.
if you just enter in the ip address:port in a browser, it'll give you back a message to indicate connectivity
would be a simple page that has the message "Ollama is running"
I'd also make sure the HA server and the Ollama server are on the same subnet, or are on subnets that are allowed to talk to each other. 🙂
Install Nmap and scan the ports on the Ollama server to see if 11434 is open. In my case I just hadn't added Environment="OLLAMA_HOST=0.0.0.0", which made it impossible to connect. It worked normally after adding it.
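If installing Nmap is overkill, a plain TCP connect does the same check for a single port. This is only a sketch: `port_is_open` is a made-up helper name and the IP in the example is a placeholder for the Ollama machine's LAN address.

```python
import socket


def port_is_open(host, port=11434, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example call (placeholder IP, run it from the Home Assistant side of the network):
# print(port_is_open("192.168.1.50"))
```

If this returns False from another machine but True on the Ollama host itself, Ollama is most likely still bound to localhost only, or a firewall is in the way.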
guess I forgot to mention I have ollama running on windows
Well I think it was a simple user error. I think I forgot the / at the end of the url when configuring in home assistant
Thanks yall for the help
Hi Guys,
I set up the Ollama integration with the latest Llama 3.2 1B and 3B models running on a Pi 4.
Responses take a really long time to come back, around 30 seconds.
Do you also see this kind of performance?
What do you expect from it? A Raspberry Pi doesn't have enough resources to run an LLM. To make it usable in any way, you need Nvidia VRAM, so a dedicated graphics card.
I was just being curious, no real expectations 🙂
I still hope that one day we can have a local model running without requiring a huge computing device like a graphics card or a powerful CPU.
No we can't. Unfortunately. I want it too - but with current state of "AI", you need a lot of fast memory...
Yeah, it's not even just the RAM, it's the parallel vector/floating point units needed. A GPU has thousands of cores that can run these calculations, whereas even the fastest Threadripper CPU can't hope to keep up. For now it seems GPU is the name of the game for the consumer. On the enterprise level there are custom processors (the tensor processors Google uses, some custom ones like Groq has, and of course the entirely new idea of neuromorphic processors :D)
Yup. Nvidia has Jetson Orin, "dedicated for AI", but it's 30x slower than my 3080, with 10x fewer Tensor cores, and LPDDR5 instead of GDDR6. Shame.
I'm desperately waiting for AMD to come out with some good solution for AI, breaking Nvidia's hegemony. Without competition, Nvidia won't give us affordable AI hardware at a reasonable price.
Comparing RTX 3080 with Jetson AGX Orin 64 GB: technical specs, games and benchmarks.
It's not cheap, but it's not as bad as you say 🙂
- it has 4x fewer tensor cores (still a big number)
- it is 5.5x slower (same as above)
- it consumes 5.5x less power
Right, sorry, there are different kinds of Orin. I was looking at the 8GB one, the Orin Nano.
I'm not sure I'd bother at all, Nic. I'm trying to use it at the moment, and although it's super fast (as I'm using a 16GB 4060 Ti), Home Assistant is beyond useless with it.
It works about 2% of the time even with the simplest of commands.
It doesn't matter what I ask it to do, the chance of it working is so incredibly low.
Yeah, the Nano is only 40 TOPS, so pretty low. And still nearly £500, which is insane for that.
Yup, pretty useless. GPT-4o mini is better, but still not good enough to use daily.
I prefer building my own "dumb" Assist. The only thing I lack now is a way to ask follow-up questions for some additional information (and also the things Alexa Actionable Notifications could do, like asking a question and waiting for a string/number answer).
The videos and everything made me think now that the "year of the voice" (or whatever it was called) was done, it was all good now and a fair replacement for Alexa, when in reality it feels like the whole thing is just... unusable.
Am I missing something to make this not garbage?
The AI seems to understand what I'm asking, but something's going wrong with the tool call it looks like.
Year of the Voice was dedicated to kicking off the voice integration, not to completely replacing Alexa and Google. That would be a pretty ambitious thing, wouldn't it? Multi-million budgets over a decade versus an open-source project over 1 year.
The LLM integration was done in response to custom integrations and very high user demand. But Paulus said it from the start: it will take a lot of time for LLM+HA to work well.
First, all models are different, and interacting with them differs too. Second, implementing the tools feature is on the model side, not the HA side. And lastly, given the chaotic nature of text generators, there's simply no way to guarantee 100% working LLM-HA communication. The model can just start hallucinating on you at any moment.
Not to mention the restrictions on operation price, of course 🙂
Not sure if I should open another thread, but I think someone might know the answer. @eager cradle maybe?
Using Assist in the browser behaves differently from using Assist in the iOS app.
Specifically the prompt that ollama receives is 3-4x larger (loaded with crap).
= much longer response time
Huh, is it the same pipeline?
Probably some bug with context data. Can't say, don't use iOS. Will check with Android app later today.
Where do you use Assist in each case? Launching debug, or from Dev tools, or...?
Thanks for your response ☺️
It’s all the same settings.
Wyoming faster-whisper STT -> ollama -> Wyoming piper TTS
I‘m currently pulling out my Android tablet and will test it there, too.
I do debugging/logging directly watching the ollama console logs.
With Android it’s identical:
- Browser (PC) 1m10s
- iOS App 3m30s
- android 3m30s
The reason is most definitely that the prompt that ollama receives from
- the apps is 1272 tokens
- the browser is 620 tokens
Yeah, that's what I mean. Probably in one case you're sending more context to the model. It might depend on where you're using it from (but unsure)
Ollama always gets the input from HA regardless of where I trigger Assist.
That's what makes me wonder 🤷‍♂️
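Putting the numbers from the last few messages side by side (1272 vs. 620 tokens, 3m30s vs. 1m10s), a quick calculation shows the prompt is roughly 2x larger while the response takes 3x longer. This is just arithmetic on the figures reported in this thread, and `ratio` is a throwaway helper name.

```python
def ratio(larger, smaller):
    """How many times bigger `larger` is than `smaller`."""
    return larger / smaller


# Prompt sizes reported above, in tokens.
prompt_ratio = ratio(1272, 620)

# Response times reported above, converted to seconds (3m30s vs. 1m10s).
time_ratio = ratio(3 * 60 + 30, 1 * 60 + 10)

print(f"prompt: {prompt_ratio:.2f}x, time: {time_ratio:.2f}x")
# prints "prompt: 2.05x, time: 3.00x"
```

So the extra tokens account for a good part of the slowdown, but possibly not all of it.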
Do you have a working setup
(STT -> ollama -> TTS)?
Which LLM model do you use?
I do, I use llama3.2
But I use it only for general info, no Assist in settings.
You mean no control of devices / entities?
I tried many LLMs, including llama3.2, but device control didn't really work, even though they are capable of tools and function calls.
BUT I found it works well with 'qwen2.5' because its template seems to understand the tool call functions from HA, whereas the others I tried don't.
If you also struggled getting device control to work, this could be worth a try.
No, I just don't trust it that much. I have very extensive Assist with dozens of custom intent scripts, and use LLM only for general answers, and for getting info from user input. 🙂
I tried it with llama3.2, it seems to work okay-ish, but not good enough, and quickly degrades if I add more and more entities.
I added this to the prompt: Quick device control
This had some effect for me
I am also using qwen2.5-7b
Thanks for sharing.
My long response times above are because I run the LLM in RAM, not on a graphics card.
The timings above (1min/3min) are only the first call because ollama has to understand the devices (RAG).
The following calls in the same session take 7sec.
So the timings are understandable and ok.
I'm only wondering why the app call throws a lot of crap into ollama, which then needs a very long time to understand the crap.
Oh, I use it on GPU. The browser dialog box has the fastest typing speed, followed by the HA mobile app (with little difference between typing and voice), and the slowest is ESP32. I don't know what caused this. I also did not check the Ollama logs. But it seems that the situation is the same as yours
The logs are very helpful as you can see the whole prompt that ollama receives, and it shows timings.
Also helpful for selecting a working model.
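Besides the console logs, the final JSON object that Ollama returns from `/api/generate` and `/api/chat` carries the same kind of timing data: `total_duration`, `load_duration`, `prompt_eval_count`, `prompt_eval_duration`, `eval_count` and `eval_duration`, with durations in nanoseconds. Here is a rough sketch for turning those into seconds and tokens per second. `summarize_timings` is a made-up helper name and the sample values are invented purely for illustration, not measurements from this thread.

```python
NS_PER_S = 1_000_000_000  # Ollama reports durations in nanoseconds


def summarize_timings(resp):
    """Summarize the timing fields of a final Ollama response dict."""
    def seconds(key):
        return resp.get(key, 0) / NS_PER_S

    def rate(count, secs):
        # Tokens per second, or None if no time was recorded.
        return count / secs if secs else None

    prompt_tokens = resp.get("prompt_eval_count", 0)
    output_tokens = resp.get("eval_count", 0)
    prompt_s = seconds("prompt_eval_duration")
    output_s = seconds("eval_duration")
    return {
        "load_s": seconds("load_duration"),
        "prompt_tokens": prompt_tokens,
        "prompt_s": prompt_s,
        "prompt_tok_per_s": rate(prompt_tokens, prompt_s),
        "output_tokens": output_tokens,
        "output_s": output_s,
        "output_tok_per_s": rate(output_tokens, output_s),
        "total_s": seconds("total_duration"),
    }


# Invented sample values, for illustration only.
sample = {
    "total_duration": 70_000_000_000,
    "load_duration": 5_000_000_000,
    "prompt_eval_count": 620,
    "prompt_eval_duration": 62_000_000_000,
    "eval_count": 30,
    "eval_duration": 3_000_000_000,
}
summary = summarize_timings(sample)
```

Comparing `prompt_s` between the first call and later calls in the same session would show how much of the wait is prompt processing rather than generation.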
Why did you choose qwen? Were other models buggy or slow?
Yeah, I tried many free and open-source models, and qwen gave me a good experience. Other open-source models are really stupid😂
Honestly, I found qwen not that great generally; in particular it seems to have problems with grammar. For me it only works well for tools / devices.
The slowest time for ESP32 has something to do with the fact that you're talking to a satellite, which has an area assigned. That presumably gets added to the context somehow, I guess.
Perhaps it's because we all come from China😁 . It may be more friendly to me. When I use Llama3.2 for Chinese communication, Llama3.2 occasionally responds with a mixture of Chinese and English, with all kinds of strange grammar. But ChatGPT doesn't, it's very powerful.😂
Probably true ^^
I remember seeing some Chinese symbols when trying a smaller qwen model.
Comparing llama3.2 and GPT-4o (I think that's what you're referring to, right?) makes little sense. The former has 3.21B parameters and runs on crap. 🙂 4o-mini is reportedly around 8B and is derived from GPT-4, which is rumored to have 1.8 TRILLION params (OpenAI hasn't published either number). It runs on supercomputers with thousands of Tesla GPUs.
I mean ChatGPT has enough parameters to make it less prone to hallucinations. They are not on the same level