# Is it possible to run an LLM on a NUC with 16 GB DDR5 and an N150 CPU?

languid rover
#

So I want to run a local LLM because I want to keep my privacy, but I read that running an LLM isn't practical because it's resource intensive. Can I do it anyway? Any suggestions?

valid radish
#

I'm running a small LLM without device control, as a fallback agent and for some manual use in automations. It works fine. With device control enabled it would be painfully slow, if it worked at all.

languid rover
#

I'm not understanding, could you simplify?

coarse quarry
#

With a single memory channel, as specified on the Intel website, the throughput will be around 30 GB/s.
You can try running 4B models, but the result will still be less than 10 tokens per second.
Ollama added support for Gemma 3n yesterday, so you can try the 2B variant.
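
As a rough sanity check on those numbers: on a CPU, token generation is mostly limited by memory bandwidth, because producing each token reads roughly all of the model weights once. Below is a minimal sketch, assuming 4-bit quantization (about 0.5 bytes per parameter, which is an assumption and not something stated in this thread) and the ~30 GB/s figure above. The function name is made up for illustration.

```python
def max_tokens_per_second(params_billion: float,
                          bandwidth_gb_s: float,
                          bytes_per_param: float = 0.5) -> float:
    """Upper bound on decode speed when generation is memory-bandwidth bound.

    Assumes every generated token streams all weights from RAM once and
    ignores the KV cache, compute limits, and runtime overhead, so real
    speeds will be lower than this ceiling.
    """
    model_size_gb = params_billion * bytes_per_param  # assumed 4-bit weights
    return bandwidth_gb_s / model_size_gb


for size in (2, 4):
    ceiling = max_tokens_per_second(size, 30)
    print(f"{size}B @ 4-bit on 30 GB/s: <= {ceiling:.1f} tokens/s")
```

Under these assumptions the ceiling for a 4B model is 15 tokens/s, so a real-world result below 10 tokens/s is plausible.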

languid rover
#

So it is possible if I run a small model like the 2B or the 4B model?

languid rover
#

what would then be ideal?

charred lodge
#

Running on something with a GPU, for example.

coarse quarry
# languid rover what would then be ideal?

Actually, since HA added streaming TTS, you don't need many tokens per second for a conversation with an LLM. A 14B model on a Mac mini M4 (120 GB/s, and I like its idle power consumption) is fast enough for this purpose. It's a different case if you want to control your home with the LLM: the requirements increase significantly, and a GPU is the only option for that. Alternatively, you can use cloud solutions.
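
To put a number on "not many tokens per second": with streaming TTS the model only has to generate text faster than it is spoken. Here is a back-of-the-envelope sketch; the speaking rate (150 words per minute) and the tokens-per-word ratio (1.3) are rough assumptions, not measurements from this thread, and the function names are hypothetical.

```python
def tokens_per_second_for_speech(words_per_minute: float = 150.0,
                                 tokens_per_word: float = 1.3) -> float:
    """Approximate token rate at which spoken output is consumed.

    Both defaults are assumed ballpark values; adjust them for your
    TTS voice and tokenizer.
    """
    return words_per_minute / 60.0 * tokens_per_word


def keeps_up_with_speech(generation_tokens_per_s: float) -> bool:
    """True if the model generates at least as fast as the TTS speaks."""
    return generation_tokens_per_s >= tokens_per_second_for_speech()


print(f"needed for speech: ~{tokens_per_second_for_speech():.2f} tokens/s")
for rate in (2.0, 10.0):
    print(f"{rate} tokens/s keeps up: {keeps_up_with_speech(rate)}")
```

This only covers steady-state speaking. The delay before the first token (prompt processing) is separate and grows with the prompt, which is one reason device control, with its long list of exposed entities, is much more demanding.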

languid rover
#

Okay, so either I use an API of some sort, or I have to buy a GPU for the machine, which will probably consume a lot more energy, right?

valid radish
#

(and have feasible response time) 🙂

languid rover
#

ok thanks

#

So either I buy a PC with a GPU, which is probably going to cost a bit in energy but gives me privacy, or I use an API and don't have to spend money but also don't have privacy.

valid radish
#

It's just cheaper in the short term.

languid rover
#

Don't Gemini and ChatGPT have free versions of the API with like 15 tokens a minute?

languid rover
#

wait how do you do it then?

valid radish
# languid rover wait how do you do it then?

I have some OpenAI credit, 10 bucks. But since I have gemma3 running locally, I don't use that anymore. My LLM doesn't control anything; it's a fallback for general questions and is used directly in some automations (for example, to pull the artist and song from a free-form spoken user request via an intent script).
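
The thread doesn't show how the artist-and-song extraction is wired up. As a rough illustration only, here is how such a request could be sent to a local Ollama server through its `/api/generate` endpoint. The prompt wording, the `gemma3` model tag, the helper names, and the default `localhost:11434` address are all assumptions, and this is plain Python rather than the HA intent script mentioned above.

```python
import json
import urllib.request

# Assumed default address of a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(user_request: str, model: str = "gemma3") -> dict:
    """Build a non-streaming request asking the model for JSON output."""
    prompt = (
        "Extract the artist and song from this request. "
        'Reply with JSON only, like {"artist": "...", "song": "..."}.\n'
        f"Request: {user_request}"
    )
    return {"model": model, "prompt": prompt, "stream": False, "format": "json"}


def parse_reply(body: str) -> dict:
    """Ollama puts the model's text in the 'response' field; decode it as JSON."""
    data = json.loads(json.loads(body)["response"])
    return {"artist": data.get("artist"), "song": data.get("song")}


def extract_track(user_request: str) -> dict:
    """Send the request to the local server and return the parsed result."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(user_request)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return parse_reply(resp.read().decode("utf-8"))


# Example (needs a running Ollama server with the model pulled):
#   extract_track("play Yesterday by The Beatles")
```

A short, single-purpose prompt like this keeps the work small enough for a 2B to 4B model on modest hardware, unlike full device control.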