# Is it possible to run an LLM on a NUC with 16 GB DDR5 and an N150 CPU?
I'm running a small LLM without device control, as a fallback and for some manual use in automations. It's fine. With control enabled it will be painfully slow, if it works at all.
I'm not following, could you simplify?
in theory, yes but it will be super slow.
so for practical purposes... no
If you have a single memory channel, as specified on the Intel website, memory bandwidth will be around 30 GB/s.
You can try running 4B models, but the result will still be fewer than 10 tokens per second.
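As a rough sanity check on those numbers: token generation on a CPU is mostly limited by memory bandwidth, because each generated token has to read roughly all of the model weights from RAM once. A minimal sketch of that back-of-envelope estimate is below. The function name, the bits-per-weight figures, and the `efficiency` factor are illustrative assumptions, not measurements.

```python
def estimate_tokens_per_second(bandwidth_gb_s, params_billion, bits_per_weight, efficiency=1.0):
    """Bandwidth-bound estimate: every token reads all weights from RAM once.

    efficiency is a hypothetical fudge factor (0..1) for real-world overhead.
    """
    if bandwidth_gb_s <= 0 or params_billion <= 0 or bits_per_weight <= 0:
        raise ValueError("all inputs must be positive")
    # billions of parameters * bytes per parameter = model size in GB
    model_size_gb = params_billion * bits_per_weight / 8
    return efficiency * bandwidth_gb_s / model_size_gb


# Assumed quantization levels for illustration; 30 GB/s is the figure quoted above.
for label, params, bits in [("4B @ 8-bit", 4, 8), ("4B @ 4-bit", 4, 4), ("2B @ 4-bit", 2, 4)]:
    tps = estimate_tokens_per_second(30, params, bits)
    print(f"{label}: at most ~{tps:.1f} tokens/s at 30 GB/s")
```

These are upper bounds only. Compute limits, context length, and prompt processing all push real throughput lower.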
Ollama added support for Gemma 3n yesterday, so you can try the 2B variant.
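If you want to measure this on your own box, here is a minimal sketch that sends one prompt to a local Ollama server and computes tokens per second from the timing fields in the reply. It assumes Ollama is listening on its default port 11434. The model tag `gemma3n:e2b` in the example is my assumption for the 2B variant, so check `ollama list` for the tag you actually pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint (assumption)


def build_payload(model, prompt):
    """Non-streaming request body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}


def tokens_per_second(reply):
    """eval_count is the number of generated tokens; eval_duration is in nanoseconds."""
    return reply["eval_count"] / (reply["eval_duration"] / 1e9)


def generate(model, prompt, url=OLLAMA_URL, timeout=300):
    """Send one prompt and return the parsed JSON reply."""
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example (needs a running Ollama server with the model already pulled):
# reply = generate("gemma3n:e2b", "Why is the sky blue?")
# print(reply["response"], f"{tokens_per_second(reply):.1f} tokens/s")
```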
So it is possible if I run a small model like the 2B or the 4B?
"possible"... yes
what would then be ideal?
running on something with a gpu for example
In practice, since HA added streaming TTS, you don't need a high tokens-per-second rate to hold a conversation with an LLM. A 14B model on a Mac mini M4 (120 GB/s memory bandwidth, and I like its idle power consumption) is fast enough for this. Controlling your home with an LLM is a different case: the requirements go up significantly, and a GPU is the only practical local option. Alternatively, you can use cloud solutions.
Okay, so either I use an API of some sort or I have to buy a GPU for the machine, which will probably consume a lot more energy, right?
If you want it to control your HA, then yes.
(and have feasible response time) 🙂
ok thanks
So either I buy a PC with a GPU, which will probably cost a bit in energy but gives me privacy, or I use an API, where I don't spend money but also don't have privacy.
You will use money for cloud solutions too. They require subscription or tokens to access API.
It's just cheaper in short term.
Don't Gemini and ChatGPT have free versions of the API with like 15 tokens a minute?
Honestly I have no idea, probably. My trial ended long ago.
wait how do you do it then?
I think Gemini still offers free tokens. OpenAI doesn't, but it's dirt cheap: I use the gpt-4o-mini model as a fallback for voice assist and some general questions, and I've spent $0.27 this month for 484 requests / 2.4k tokens 🤷‍♂️
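To put that in perspective, a quick sketch of the arithmetic, using only the figures quoted above ($0.27 for 484 requests). The function names and the 50-requests-per-day figure are hypothetical, for illustration only.

```python
def cost_per_request(total_cost_usd, request_count):
    """Average cost of one request, given a billing total and a request count."""
    if request_count <= 0:
        raise ValueError("request_count must be positive")
    return total_cost_usd / request_count


def projected_monthly_cost(total_cost_usd, request_count, requests_per_day, days=30):
    """Scale the observed average cost to a hypothetical daily request volume."""
    return cost_per_request(total_cost_usd, request_count) * requests_per_day * days


per_request = cost_per_request(0.27, 484)
print(f"~${per_request:.5f} per request")
# 50 requests/day is a made-up usage level, not a figure from this thread.
print(f"~${projected_monthly_cost(0.27, 484, 50):.2f} per month at 50 requests/day")
```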
I have some credit with OpenAI, 10 bucks. But since I have gemma3 running locally, I don't use it anymore. My LLM doesn't control anything; it's a fallback for general questions and is used directly in some automations (for example, to pull the artist and song out of a free-form spoken request via an intent script).