I run HA in a VM, and next to it I have a second VM running containers for Ollama, Whisper, Piper and Open WebUI. When I ask Ollama a question via Open WebUI, all 4 CPUs of that VM hit the roof, a few seconds later I get an answer, and CPU usage drops back to 0.
In HA I have pointed the assistant at that remote VM in the assistant settings (conversation agent, STT and TTS). When I ask the same question via the assistant, all 4 CPUs hit the roof again, but I never get an answer, and CPU usage stays high until I restart the Ollama container or wait a few minutes for the "timeout running pipeline" message in the HA assistant.
Could it be that Ollama doesn't know how to find its way back to Home Assistant?
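
One way to narrow this down might be to time a direct request to the Ollama HTTP API from the HA side, bypassing the assistant pipeline. The following is only a minimal sketch: it assumes Ollama listens on its default port 11434, and the address `192.168.1.50` and the model name `llama3` are placeholders, not values from my setup.

```python
import json
import time
import urllib.request

# Hypothetical values: replace with the real address of the Ollama VM
# and a model that is actually pulled there.
OLLAMA_URL = "http://192.168.1.50:11434/api/generate"
MODEL = "llama3"


def build_request(prompt, model=MODEL, url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, timeout=300):
    """Send the prompt and return (answer_text, elapsed_seconds).

    The answer comes back on the same HTTP connection that this client
    opened, so Ollama does not have to connect back to the caller.
    """
    req = build_request(prompt)
    start = time.monotonic()
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    elapsed = time.monotonic() - start
    return body.get("response", ""), elapsed


if __name__ == "__main__":
    answer, seconds = ask("Why is the sky blue?")
    print(f"{seconds:.1f}s: {answer}")
```

If this returns in a few seconds when run from the HA VM while the assistant still times out, the network path between the two VMs is probably not the problem.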