I run Llama 3.2 under Ollama in a separate VM (same box as HAOS). The machine is an ITX build with 64 GB of RAM and an AMD Ryzen 5 4600G CPU.
When I run it from inside the VM, it's quite responsive and I get a reply almost immediately.
However, when I run it via HA Assist, an answer takes at least 30 seconds, and I often get the error "Timeout running pipeline".
What may be the reason for this? Too many exposed entities (~200)? Anything else?
Also, I sometimes get nonsense replies: for example, I write "hello" and it replies that it could not switch off the lights.
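
To check whether the size of the prompt (and not the network or HA itself) is what slows things down, one thing I could try is timing the Ollama HTTP API directly from another machine, once with a tiny prompt and once with a prompt padded with ~200 fake entity lines. This is only a rough sketch: the VM address, the `llama3.2` model tag and the entity-list format are my assumptions, and the padded prompt is not what Assist really sends. The timing fields are the ones Ollama returns from `/api/generate`, in nanoseconds.

```python
import json
import time
import urllib.request

# Assumptions: hypothetical VM address (11434 is Ollama's default port) and model tag.
OLLAMA_URL = "http://192.168.1.50:11434/api/generate"
MODEL = "llama3.2"


def build_prompt(n_entities):
    """Build a test prompt padded with fake entity lines (made-up format)."""
    lines = [f"light.room_{i}: off" for i in range(n_entities)]
    return "Devices:\n" + "\n".join(lines) + "\n\nUser: hello"


def summarize(resp):
    """Convert the nanosecond timing fields of an Ollama response to seconds."""
    ns = 1e9
    return {
        "prompt_tokens": resp.get("prompt_eval_count", 0),
        "prompt_eval_s": resp.get("prompt_eval_duration", 0) / ns,
        "generate_s": resp.get("eval_duration", 0) / ns,
        "total_s": resp.get("total_duration", 0) / ns,
    }


def time_request(n_entities, timeout=120):
    """Send one non-streaming request and return Ollama's timings plus wall time."""
    body = json.dumps(
        {"model": MODEL, "prompt": build_prompt(n_entities), "stream": False}
    ).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.monotonic()
    with urllib.request.urlopen(req, timeout=timeout) as r:
        resp = json.load(r)
    stats = summarize(resp)
    stats["wall_s"] = time.monotonic() - start
    return stats
```

Calling `time_request(0)` and then `time_request(200)` and comparing `prompt_eval_s` should show how much of the delay comes from processing a long prompt. If `wall_s` is much larger than `total_s`, the time is being lost outside the model (network or VM overhead).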