Hi, I've just started playing with local LLMs and spun up ollama on my Jetson Nano, now trying to integrate it to HA using API. First thing I tried is official Ollama integration, but quickly noticed its deficiencies and slowness to respond: it doesn't use streaming API:
https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-chat-completion
here is the piece of code that does (or does not) that
https://github.com/home-assistant/core/blob/dev/homeassistant/components/ollama/conversation.py#L265
are there any reasons for such choice? are there any options for faster response time?