Hello, I'm just getting into this. I have Ollama running as a service on a Linux machine, serving llama3.2:3b.
I have everything connected in Home Assistant and it works. However, responses take a long time. This seems to be because something (Home Assistant?) stops the currently running model and then restarts it. I see new ollama runner processes starting up and running on ports other than 11434, and eventually the machine starts to run out of RAM.
Any ideas why HA isn't reusing the already running model and leaving it loaded, even though it is set to stay loaded forever in the config?
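In case it helps anyone reproduce this, here is a small Python sketch for checking what Ollama actually has loaded and when each model is scheduled to be unloaded. It assumes Ollama is on the default http://localhost:11434 and uses the `/api/ps` endpoint (the same information `ollama ps` shows); the field names `name`, `expires_at` and `size_vram` are taken from Ollama's API docs, so adjust if your version differs.

```python
import json
import urllib.request

# Assumption: Ollama listens on its default address and port.
OLLAMA_URL = "http://localhost:11434"


def summarize_loaded(ps_response: dict) -> list[str]:
    """Turn an /api/ps response into one readable line per loaded model."""
    lines = []
    for m in ps_response.get("models", []):
        name = m.get("name", "?")
        expires = m.get("expires_at", "unknown")
        # size_vram is reported in bytes; convert to MiB for readability.
        vram_mib = m.get("size_vram", 0) // (1024 * 1024)
        lines.append(f"{name} expires_at={expires} vram={vram_mib}MiB")
    return lines


def fetch_loaded(base_url: str = OLLAMA_URL) -> dict:
    """Query Ollama's /api/ps endpoint for the currently loaded models."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for line in summarize_loaded(fetch_loaded()):
        print(line)
```

Running it before and after a Home Assistant request should show whether the same model entry stays loaded with a far-off `expires_at`, or whether it gets replaced by a fresh load each time.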