Hey guys, I'm building a smart home system for the first time. I'm planning on making a smart AI assistant using a local LLM in combination with Home Assistant. I'm wondering if I should have the local LLM running on its own dedicated machine, or if I should just run Home Assistant on the beefy computer I'll be using to run the LLM. If I'm not running it on that computer, should I get the Home Assistant Green, given the prices of Raspberry Pis nowadays? All my extra Pis are busy being put to use or will become Wyoming devices (unless someone can tell me of an updated way of making a device into a DIY Alexa unit).
#Local LLM PC setup
I'd recommend you virtualize and run both on the "beefy" machine.
My LLM is running on my daily driver laptop, which is always here and on unless I take it on a trip or something. It has a 4060 in it that I don't game with, I just edit videos, so it's a perfect use in my case.
Why virtualize vs. 2 separate machines?
Because virtualization is awesome: https://gist.github.com/Impact123/c23c36eafe1672ec056233e450a86ae2
And pis/greens are not: https://gist.github.com/Impact123/6ee130240bdc6a7fed2d5224616544a0
valid
do most local llms need windows anyways?
or is linux preferred ?
using nvidia hardware*
Ollama seems to be the most popular/trivial way to run LLMs and it works fine on Linux.
I see no point in using Windows for running pretty much anything server related.
I was looking at possibly running the DeepSeek R1 70B model myself.
What hardware does your beefy machine have?
Haven't bought anything yet, but I'm planning on putting some old Tesla GPUs in it to add VRAM to a system running a 2060,
getting it up to 50+ GB of VRAM.
From what I've heard, they need very good/strong case cooling and don't support some of the features of certain AI tools.
good to know
And lots of memory in your GPU...
(Or multiple GPUs)
70B is bigger than I can use at a decent speed with mine.
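For a rough sense of whether a 70B model fits in 50 GB of VRAM, here is a back-of-the-envelope sketch. It only counts the weights and then pads the result; the 4-bit quantization and the 20% overhead for KV cache and runtime buffers are assumptions for illustration, not measured numbers, and the function names are made up.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_fraction: float = 0.2) -> bool:
    """Rough check: weights plus an assumed overhead must fit in VRAM."""
    # overhead_fraction is a guess covering KV cache and runtime buffers
    needed = estimate_weight_gb(params_billion, bits_per_weight) * (1 + overhead_fraction)
    return needed <= vram_gb


# 70B at an assumed 4-bit quantization vs. the planned 50 GB of VRAM
print(estimate_weight_gb(70, 4))   # weights only
print(fits_in_vram(70, 4, 50))     # with assumed overhead
print(fits_in_vram(70, 16, 50))    # same model unquantized (16-bit)
```

Long context windows grow the KV cache, so the real headroom can be smaller than this suggests.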
Do you think it's too lofty to attempt to make a "Jarvis"? It seems like Home Assistant has added lots of APIs and support for LLMs to do things.
If I go beyond a 9B model I get about 2 words per second on replies.
My GPU runs out of RAM, I think; it runs, but really slowly, with a 14B kind of model.
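If you want a number instead of eyeballing "words per second", Ollama's generate API reports `eval_count` (tokens generated) and `eval_duration` (nanoseconds) in its final response. A minimal sketch of turning those into tokens per second; the sample values below are made up, and tokens are not quite the same thing as words.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed from the timing fields of an Ollama generate response."""
    # eval_duration is reported in nanoseconds
    seconds = response["eval_duration"] / 1e9
    return response["eval_count"] / seconds


# Hypothetical values, shaped like the final JSON object returned by
# POST http://localhost:11434/api/generate with "stream": false
sample = {"eval_count": 120, "eval_duration": 60_000_000_000}
print(tokens_per_second(sample))
```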
You're using RAM and not VRAM though? I plan on using all VRAM.
That chip is on the laptop board; I don't know about the memory in reality.
What do you use to run your model?
I have Ollama running natively, no VM.
I couldn't figure out the GPU passthrough stuff, and I finally said there's no point, why torture myself, and it just sits there and runs on my Debian 12.
Then check ollama ps. It tells you the memory usage and where/how it's split among the RAM/GPU.
Try enabling flash attention and KV cache quantization to save some memory. I'd probably recommend no higher than an 8B model for your GPU.
NAME               ID              SIZE      PROCESSOR    UNTIL
llama3.2:latest    a80c4f17acd5    4.0 GB    100% GPU     4 minutes from now
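If you'd rather check this from a script than read it by eye, here is a rough sketch that parses output shaped like the above. It assumes the columns are separated by two or more spaces, which may not hold for every Ollama version, and the function names are made up.

```python
import re

# Sample text shaped like the `ollama ps` output above
SAMPLE = """\
NAME               ID              SIZE      PROCESSOR    UNTIL
llama3.2:latest    a80c4f17acd5    4.0 GB    100% GPU     4 minutes from now
"""


def parse_ollama_ps(text: str) -> list[dict]:
    """Split each row on runs of 2+ spaces and pair the cells with the header."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = re.split(r"\s{2,}", lines[0].strip())
    return [dict(zip(header, re.split(r"\s{2,}", line.strip())))
            for line in lines[1:]]


def fully_on_gpu(row: dict) -> bool:
    """True when the PROCESSOR column reports the model entirely on the GPU."""
    return row["PROCESSOR"] == "100% GPU"


# Real output could be captured with:
#   subprocess.run(["ollama", "ps"], capture_output=True, text=True).stdout
for row in parse_ollama_ps(SAMPLE):
    print(row["NAME"], row["SIZE"], fully_on_gpu(row))
```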
This should run relatively fast.
I think it's a 9B and it does well.
3B actually.
I tried one of the bigger ones and it wasn't usable in real time.