#Ollama, HA and RAG...

brazen yoke

As I have been working through my VA install, I have noticed (as I am sure you all have) that everything local is REALLY slow. After much reading, including comments on these forums, the core problem seems to be the context/entities, which are shared with our LLM on every prompt. This uses up memory, is slow, and is not scalable. Retrieval-Augmented Generation (RAG) would seem to be the solution to this problem. I am reading up on it to see what steps are needed to implement it. But before going too deep into the how-to, I was wondering if anyone has started looking into implementing such a function (or, even better, already has)?
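
For anyone picturing what the retrieval step might look like, here is a rough sketch, not a working HA integration. It assumes a made-up entity list (`ENTITIES`) and uses plain bag-of-words cosine similarity as a stand-in for a real embedding model, so nothing here calls Ollama or Home Assistant. The function names (`retrieve`, `build_prompt`) are hypothetical. The point is only that the prompt carries the top few matching entities instead of all of them.

```python
import math
import re
from collections import Counter

# Hypothetical entity list; a real install would pull these from Home Assistant.
ENTITIES = [
    {"entity_id": "light.kitchen", "name": "Kitchen ceiling light", "state": "off"},
    {"entity_id": "light.bedroom", "name": "Bedroom lamp", "state": "on"},
    {"entity_id": "climate.living_room", "name": "Living room thermostat", "state": "heat"},
    {"entity_id": "lock.front_door", "name": "Front door lock", "state": "locked"},
    {"entity_id": "sensor.garage_temperature", "name": "Garage temperature sensor", "state": "12"},
]


def vectorize(text):
    """Turn text into word counts (a crude stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, entities, k=2):
    """Return up to k entities whose id and name best match the query."""
    q = vectorize(query)
    scored = [
        (cosine(q, vectorize(e["entity_id"] + " " + e["name"])), e)
        for e in entities
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop entities with no overlap at all.
    return [e for score, e in scored[:k] if score > 0]


def build_prompt(query, entities, k=2):
    """Build a prompt that lists only the retrieved entities."""
    lines = [
        f'{e["entity_id"]} ({e["name"]}): {e["state"]}'
        for e in retrieve(query, entities, k)
    ]
    return "Relevant devices:\n" + "\n".join(lines) + "\n\nUser request: " + query


if __name__ == "__main__":
    print(build_prompt("turn on the kitchen light", ENTITIES))
```

With a real setup, the word counts would presumably be replaced by embeddings stored in a vector index, but the shape of the flow (score, keep the top k, build a small prompt) should be about the same.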

vague pecan