#Local LLM PC setup

gentle pendant
#

Hey guys, I'm building a smart home system for the first time, and I'm planning to make a smart AI assistant using a local LLM in combination with Home Assistant. I'm wondering whether the local LLM should run on its own dedicated machine, or whether I should just run Home Assistant on the beefy computer I'll be using for the LLM. If I don't run it on that computer, should I get the Home Assistant Green, given the price of Raspberry Pis nowadays? All my extra Pis are already in use or will become Wyoming devices (unless someone can tell me of a more up-to-date way to turn a device into a DIY Alexa unit).

lime gust
#

I'd recommend you virtualize and run both on the "beefy" machine.

rain hamlet
#

My LLM is running on my daily driver laptop, which is always here and on unless I take it on a trip or something. It has a 4060 in it that I don't game with, I just edit videos, so it's a perfect use in my case.

gentle pendant
#

valid

#

Do most local LLMs need Windows anyway?

#

Or is Linux preferred?

#

Using NVIDIA hardware*

lime gust
#

Ollama seems to be the most popular and easiest way to run LLMs, and it works fine on Linux.
I see no point in using Windows for pretty much anything server-related.

lime gust
#

What hardware does your beefy machine have?

gentle pendant
#

Haven't bought anything yet, but I'm planning on putting some old Tesla GPUs in it to add VRAM to a system running a 2060.

#

Getting it up to 50+ GB of VRAM.

lime gust
#

From what I've heard, they need very strong case cooling and don't support some of the features of certain AI tools.

gentle pendant
#

good to know

rain hamlet
#

And lots of memory in your GPU...

#

(Or multiple GPUs)

#

70 is bigger than I can use at a decent speed with mine.

gentle pendant
#

Do you think it's too lofty to attempt to make a "Jarvis"? It seems like Home Assistant has added lots of APIs and support for LLMs to do things.

rain hamlet
#

If I go beyond a 9B model, I get about 2 words per second in replies.

#

I think my GPU runs out of RAM; a 14B-ish model runs, but really slowly.
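
A rough back-of-the-envelope check for why a 14B model gets slow here, as a sketch rather than an exact rule: the weights alone take about (parameters × bits per weight ÷ 8) bytes. The 4.5 bits per weight figure is an assumption for a typical 4-bit quantization, and the helper name `estimate_weights_gb` is made up for illustration. KV cache and runtime buffers come on top, so real usage is higher.

```shell
# Hypothetical helper: estimate the size of the model weights in GB.
#   $1 = parameter count in billions
#   $2 = average bits per weight (assumed ~4.5 for a 4-bit quant)
estimate_weights_gb() {
  LC_ALL=C awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_weights_gb 9 4.5    # prints 5.1
estimate_weights_gb 14 4.5   # prints 7.9
```

For a 14B model this gives roughly 7.9 GB before any context, which would already crowd an 8 GB card (the usual VRAM size on a laptop 4060, assumed here), so part of the model would likely spill over to system RAM.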

gentle pendant
#

You're using RAM and not VRAM though? I plan on using all VRAM.

rain hamlet
#

That chip is on the laptop board; I don't really know about its memory.

lime gust
#

What do you use to run your model?

rain hamlet
#

I have Ollama running natively, no VM.
I couldn't figure out the GPU passthrough stuff and finally said there's no point, why torture myself, so it just sits there and runs on my Debian 12.

lime gust
#

Then check ollama ps. It tells you the memory usage and how it's split between RAM and GPU.
Try enabling flash attention and KV cache quantization to save some memory. I'd probably recommend no higher than an 8B model for your GPU.
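
For reference, a minimal sketch of how those two settings might be turned on, assuming a recent Ollama version that reads the `OLLAMA_FLASH_ATTENTION` and `OLLAMA_KV_CACHE_TYPE` environment variables. The `q8_0` value is just one possible choice, not something recommended in this thread.

```shell
# These must be set in the environment of the Ollama *server* process,
# and the server restarted afterwards.
export OLLAMA_FLASH_ATTENTION=1      # enable flash attention
export OLLAMA_KV_CACHE_TYPE=q8_0     # quantize the KV cache (default is f16)
```

If Ollama runs as a systemd service, which is likely on a Debian install, exporting in an interactive shell probably won't reach it; the variables would instead go into the service's environment (for example via `systemctl edit ollama.service`), followed by a restart of the service.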

rain hamlet
#

NAME               ID              SIZE      PROCESSOR    UNTIL
llama3.2:latest    a80c4f17acd5    4.0 GB    100% GPU     4 minutes from now

lime gust
#

This should run relatively fast.

rain hamlet
#

I think it's a 9B and it does well.

lime gust
#

3B actually. llama3.2:latest is the 3B one.

rain hamlet
#

I tried one of the bigger ones and it wasn't usable in real time.

lime gust
#

I always use the full tag so I can easily tell, e.g.:

ollama run --verbose llama3.2:3b-instruct-q4_K_M
#

Check ollama ps again when running a larger one.
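
A small sketch of how that check could be scripted, assuming the PROCESSOR column looks like the output pasted earlier ("100% GPU") or like a split such as "48%/52% CPU/GPU". The function name `check_offload` is hypothetical and the parsing is deliberately naive.

```shell
# Hypothetical helper: read `ollama ps` output on stdin and report, per model,
# whether it is fully on the GPU, fully on the CPU, or split between the two.
check_offload() {
  awk 'NR > 1 {
    if ($0 ~ /100% GPU/)       status = "fully on GPU"
    else if ($0 ~ /100% CPU/)  status = "fully on CPU"
    else if ($0 ~ /CPU\/GPU/)  status = "split between CPU and GPU"
    else                       status = "unknown"
    print $1 ": " status
  }'
}

# Usage, with a model loaded:
#   ollama ps | check_offload
```

If a larger model shows up as split, that would match the slowdown described above, since some of its layers are then running from system RAM on the CPU.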