Does anyone know how to squeeze more performance out of locally hosted small language models? | AI HUB | Page 1

Hey. I'm ruzin, the lead maintainer of StenoAI, a privacy-focused AI meeting notetaker. It keeps your data fully private by using locally hosted small language models for transcription and summarisation. https://stenoai.co

Anyway, one of the issues I have really struggled with is getting more performance from my locally hosted DeepSeek 7B-parameter model. I have hit the limits of prompt engineering. At the moment, my summaries reach 60%, maybe 70%, of the quality of a large language model's, which is crazy, but there are still reliability and accuracy issues!

So if anyone has any ideas or experience, please let me know.
