#Local LLM


fluid osprey
#

Would a machine like the one below be suitable for running a local LLM for Home Assistant?

celest sphinx
#

A 3060 is not the best. It will do, but don't expect miracles 🙂

umbral lance
celest sphinx
umbral lance
#

It sounds like I'm probably not going to be able to shove one of those into a 1L PC 😅

celest sphinx
#

Unfortunately....

That's my problem as well. I'm thinking of buying a Jetson Orin Nano Super when it's available.

hasty pasture
#

One thing to think about with the Nano Super is that it's just 60 or so TOPS, and that 8 GB of RAM is shared memory, so the OS is going to use some of it. You may really only have 6-7 GB available for the LLM, depending on the OS.
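A rough way to sanity-check that memory budget is sketched below. It is only a back-of-the-envelope estimate: the 6.5 GB usable figure is the midpoint of the range above, and the 1 GB allowance for context and runtime overhead and the bits-per-weight values are assumptions, not measurements.

```python
def model_fits(params_billion, bits_per_weight, usable_gb, overhead_gb=1.0):
    """Rough check: do quantized weights plus a fixed overhead fit in memory?

    overhead_gb is an assumed allowance for context (KV cache) and runtime.
    """
    # 1e9 parameters * bits per weight / 8 bits per byte = size in GB
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb <= usable_gb, weights_gb


# Assumed bits per weight: 16 (unquantized), 8 and 4.5 (typical quantizations)
for bits in (16, 8, 4.5):
    fits, gb = model_fits(8, bits, usable_gb=6.5)
    print(f"8B model at {bits} bits/weight: ~{gb:.1f} GB of weights, fits: {fits}")
```

Under these assumptions an 8B model only fits at roughly 4-bit quantization, which matches the "enough for tinkering" impression.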

celest sphinx
tame zodiac
# celest sphinx Exactly. Enough for tinkering, but not enough for using tools on Home Assistant ...

But that should still work if you only ask "enclosed" questions without using tools, right? If I feed all my sensor data into the prompt, it would still generate usable output. It just can't retrieve information or perform actions on its own.

I really hope a consumer-friendly machine that can do all this becomes available. I ordered an NVIDIA Jetson Orin Nano Super for a local LLM... I didn't have tools in mind when I ordered it....

For running HA and a local LLM, I really wouldn't want to buy a loud and power-hungry graphics card :/
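The "enclosed question" idea above (put all the sensor data into the prompt, no tool calls) could look roughly like the sketch below. The entity names, readings, and the `build_prompt` helper are hypothetical, and how the resulting text is sent to a model depends on whichever local runtime is used.

```python
def build_prompt(sensors, question):
    """Embed all sensor readings in the prompt so the model needs no tools."""
    lines = [f"- {name}: {value}" for name, value in sorted(sensors.items())]
    return (
        "Answer using only the sensor readings below.\n"
        "Sensor readings:\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}"
    )


# Hypothetical entity names and values, for illustration only
sensors = {
    "sensor.living_room_temperature": "21.5 °C",
    "binary_sensor.front_door": "off",
}
prompt = build_prompt(sensors, "Is the front door closed?")
print(prompt)
```

The trade-off is that every reading costs prompt tokens, so a large sensor list eats into the limited context and memory of a small device.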

sturdy cedar
#

For running a local LLM you have very few choices and they are all power hungry

celest sphinx
celest sphinx
crisp citrus
#

But yes, at idle it's more power hungry than a Jetson Nano or similar

sturdy cedar
#

True, but a lack of power when needed will ruin the experience

fluid osprey
#

Well, I ended up ordering 2 x 4090s

opaque plover
#

Remember that you can also run on both CPU and GPU. I run a Granite model with a 50/50 split between my 1080 and Ryzen 3600. It works fine and my desktop is still responsive while interacting with the LLM, but the fans do turn on...
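In llama.cpp-based runners, a CPU/GPU split like this is typically expressed as the number of model layers offloaded to the GPU (llama.cpp's `--n-gpu-layers` option). A minimal sketch of turning a fraction into a layer count follows; the `gpu_layers` helper is hypothetical and the 40-layer total is an assumed figure, not one taken from the model above.

```python
def gpu_layers(total_layers, gpu_fraction):
    """Number of layers to offload to the GPU for a given split fraction."""
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be between 0 and 1")
    # Round down so the GPU is never asked to hold more than its share
    return int(total_layers * gpu_fraction)


total = 40  # assumed layer count, for illustration only
print(f"50/50 split: offload {gpu_layers(total, 0.5)} of {total} layers to the GPU")
```

Layers left on the CPU run from system RAM, which is why a split setup trades speed for the ability to load a larger model.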

hasty pasture
#

What's the t/s on that kind of setup though?

opaque plover
#

Right now I am running a model that is too big. 😄 bartowski/granite-3.0-8b-instruct-GGUF (Q8_0) in LM Studio. My GPU also drives my 49" ultrawide screen and I have a ton of browser tabs open, so it's not exactly idle. I tried the query "make a list of 100 countries and their currencies in MD table use a column for numbering" as a kind of benchmark and I get "4.45 tok/sec • 1485 tokens • 2.61s to first token • Stop: eosFound"

#

that model seems really good for coding advice though

#

With Q6_K instead, I get 8.23 tokens/second

#

And with Q4_K_M I get 26.11 tokens/second
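To put the three numbers above side by side, here is a small sketch that converts them into wall-clock time and speedup. It assumes the same 1485-token response for every quantization, which was only reported for the Q8_0 run.

```python
# Tokens per second as reported above for each quantization
results = {"Q8_0": 4.45, "Q6_K": 8.23, "Q4_K_M": 26.11}
tokens = 1485  # reported for the Q8_0 run; assumed identical for the others
baseline = results["Q8_0"]

for quant, tps in results.items():
    seconds = tokens / tps      # generation time, excluding time to first token
    speedup = tps / baseline    # relative to the Q8_0 run
    print(f"{quant}: {tps:.2f} tok/s, ~{seconds:.0f} s for {tokens} tokens, "
          f"{speedup:.1f}x vs Q8_0")
```

On these figures the Q8_0 answer takes over five minutes while Q4_K_M finishes in about a minute, which is the practical difference for a voice assistant.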

#

Btw, Granite models are from IBM/Watson. They are licensed under the Apache 2.0 license. IBM also discloses their training methodology and training data set.