#reset conversation id
Do you have the model set to never unload in Ollama? Otherwise, after a while Ollama will unload the model until another request comes in, I think, unless you configure it not to.
The conversation ID is the memory/context of the current conversation. I think by default HA will try to remember up to the last 20 messages, as long as they fit in the context window.
But after a while of inactivity it gets reset, since it's assumed that the conversation is done.
Tip: to do that, set keep_alive to -1 in the Configure section of the model's config entry.
Yeah, I think Ollama itself has a setting for it too. I always make sure to have that one set to not unload as well 😅
By default Ollama unloads the model after 5 minutes, at least when the model is launched from the terminal.
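For anyone who wants to pin the model outside of HA, here is a minimal sketch of the same keep_alive idea sent straight to Ollama's `/api/generate` endpoint. It assumes Ollama is listening on its default local address, and the helper names (`build_keep_alive_request`, `preload`) are made up for illustration. A request without a prompt just loads the model, and `keep_alive: -1` asks Ollama to keep it loaded indefinitely.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_keep_alive_request(model: str, keep_alive=-1) -> urllib.request.Request:
    """Build a prompt-less request that only loads the model.

    keep_alive=-1 asks Ollama to keep the model in memory indefinitely;
    keep_alive=0 would unload it right away.
    """
    payload = {"model": model, "keep_alive": keep_alive, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def preload(model: str) -> dict:
    """Send the request. This needs a running Ollama server."""
    request = build_keep_alive_request(model)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())
```

Calling `preload("qwen2.5:7b")` once after Ollama starts should be enough to keep the model resident, though a later request that sends a different keep_alive value may override it.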
I think this is the reason. I did set keep_alive to -1, but after the conversation ID is reset, the response time still gets longer.
You may want to try turning on flash attention and KV cache quantization if you haven't already. That can speed things up by reducing the memory the context window uses, though you may need to play with it and see how it affects the model's performance.
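As a rough sketch of what turning those on looks like, the snippet below builds the environment for `ollama serve` using the server's `OLLAMA_FLASH_ATTENTION`, `OLLAMA_KV_CACHE_TYPE`, and `OLLAMA_KEEP_ALIVE` variables. The helper names are made up for illustration, and `q8_0` is only a starting point, not a recommendation. As far as I know, the quantized K/V cache only takes effect when flash attention is enabled, so both are set together.

```python
import os
import subprocess

# K/V cache types accepted by OLLAMA_KV_CACHE_TYPE (f16 is the unquantized default).
KV_CACHE_TYPES = {"f16", "q8_0", "q4_0"}


def ollama_env(kv_cache_type: str = "q8_0", keep_alive: str = "-1") -> dict:
    """Return a copy of the current environment with the Ollama tuning variables set."""
    if kv_cache_type not in KV_CACHE_TYPES:
        raise ValueError(f"unsupported K/V cache type: {kv_cache_type}")
    env = dict(os.environ)
    env["OLLAMA_FLASH_ATTENTION"] = "1"          # enable flash attention
    env["OLLAMA_KV_CACHE_TYPE"] = kv_cache_type  # quantize the K/V cache
    env["OLLAMA_KEEP_ALIVE"] = keep_alive        # "-1" keeps models loaded
    return env


def start_server(kv_cache_type: str = "q8_0") -> subprocess.Popen:
    """Start `ollama serve` with those settings (assumes `ollama` is on PATH)."""
    return subprocess.Popen(["ollama", "serve"], env=ollama_env(kv_cache_type))
```

If Ollama runs as a system service or in a container, the same three variables would go into the service or container environment instead of being passed from a script.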
Thanks, Nick. I will give it a try.
Hi @cloud patio, have you added both of these environment variables for qwen2.5:7b? What is your K/V cache quantization type? I can't use either of them, because each one increases the LLM response time by more than 10 seconds. I do see VRAM usage drop by nearly one-third, but it's almost unusable due to the long wait for results.