#Recommendations for a mini pc able to run HA with a local AI and Music Assistant


pure widget
#

I have an Intel NUC 11 (N6005) with a 1 TB SSD and 8 GB of RAM. I am thinking about upgrading so I can run a local LLM (Ollama?) and Music Assistant as add-ons in HA. And I want some room for future projects...

I did some research, but ended up being overwhelmed. I came up with something like this:

Asus NUC 15 Pro (Slim), Intel Core Ultra 7 255H, with a 2 TB SSD and 32 GB of RAM

Any recommendations or opinions?

chrome peak
#

Ish, I'll be direct and honest. Running an LLM requires a GPU, which is something most NUCs can't take because of their small form factor.

Is running a remote LLM on an online service acceptable for you?

#

I personally run a 3090 for my LLM. I grabbed it for the large amount of VRAM (24 GB), but you should be able to get away with 12 GB.

An NVIDIA GPU from the RTX 40 or 50 series with 12 GB of VRAM or more would be ideal for a small model, with Whisper/Piper running on the GPU alongside the LLM.

I use Qwen 3 myself and this is my VRAM usage:

#
Sun Nov 23 15:05:08 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        Off |   00000000:01:00.0 Off |                  N/A |
| 69%   56C    P2            132W /  370W |    9525MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1739      C   python3                                4052MiB |
|    0   N/A  N/A          131117      C   /usr/local/bin/ollama                  5458MiB |
+-----------------------------------------------------------------------------------------+

Using qwen3:4b-instruct-2507-q4_K_M
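
For anyone who wants the same per-process numbers without reading the table by eye, here is a minimal Python sketch. It assumes `nvidia-smi` is on the PATH and uses its CSV query mode; the function names and the `SAMPLE` text (which just mirrors the two processes in the table above) are made up for illustration.

```python
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' CSV rows into (pid, name, MiB) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split from both ends so a comma inside the process name cannot break parsing.
        pid, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)
        rows.append((int(pid), name.strip(), int(used)))
    return rows


def query_compute_apps():
    """Ask nvidia-smi for per-process VRAM usage (needs an NVIDIA driver installed)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=pid,process_name,used_memory",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_compute_apps(result.stdout)


# Illustrative sample mirroring the two processes shown in the table above.
SAMPLE = "1739, python3, 4052\n131117, /usr/local/bin/ollama, 5458\n"

if __name__ == "__main__":
    apps = parse_compute_apps(SAMPLE)  # swap in query_compute_apps() on a real machine
    for pid, name, mib in apps:
        print(f"{pid:>7}  {name:<24} {mib} MiB")
    print("total:", sum(mib for _, _, mib in apps), "MiB")
```

The two processes add up to 9510 MiB; the 9525 MiB in the header is the card-wide total, so it is slightly higher.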

#

That python3 process is the speech-to-text container.

#

You COULD get an 8 GB GPU and use a smaller speech-to-text model and an even smaller LLM.

But I feel like that would bring diminishing returns for the saved cash.

I am using WhisperX with the large-v3 model, which does take more VRAM than the smaller ones. But those are English-only, which is a drawback when asking for music with other languages in the title!
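
To put rough numbers on the 8 GB vs 12 GB question, here is a small Python sketch that subtracts the two measured process sizes from the nominal card capacity. The figures come straight from the nvidia-smi output earlier in the thread; the helper name and labels are illustrative, and real headroom will be a bit lower because the driver reserves some memory and the LLM grows with context length.

```python
MIB_PER_GIB = 1024


def headroom_mib(card_gib, workloads_mib):
    """Nominal VRAM left on a card of `card_gib` GiB after loading all workloads."""
    return card_gib * MIB_PER_GIB - sum(workloads_mib.values())


# Per-process usage as reported by nvidia-smi in this thread.
MEASURED = {
    "speech-to-text (WhisperX large-v3)": 4052,
    "ollama (qwen3:4b-instruct-2507-q4_K_M)": 5458,
}

if __name__ == "__main__":
    for card in (8, 12, 24):
        spare = headroom_mib(card, MEASURED)
        verdict = "fits" if spare >= 0 else "does not fit"
        print(f"{card:>2} GB card: {verdict}, headroom {spare} MiB")
```

With this exact pair of models an 8 GB card comes up about 1.3 GiB short, which is why both models would have to shrink to fit it.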

novel junco
#

Why is your power consumption so high at idle? You should enable the persistence service.
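
A quick way to check this is to query the persistence mode, performance state and power draw together. The Python sketch below assumes `nvidia-smi` is on the PATH; the function names and the sample line (taken from the table above: persistence off, P2, 132 W) are illustrative. Persistence mode itself is usually turned on with `sudo nvidia-smi -pm 1` or via the `nvidia-persistenced` daemon; how that daemon is packaged varies by distro, so treat that part as an assumption.

```python
import subprocess


def parse_gpu_state(csv_line):
    """Parse one 'persistence_mode, pstate, power.draw' CSV row into a dict."""
    mode, pstate, power = [field.strip() for field in csv_line.split(",")]
    try:
        power_w = float(power)
    except ValueError:
        power_w = None  # some GPUs report power draw as not available
    return {"persistence": mode == "Enabled", "pstate": pstate, "power_w": power_w}


def query_gpu_state():
    """Read the state of the first GPU from nvidia-smi (needs an NVIDIA driver)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=persistence_mode,pstate,power.draw",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_state(result.stdout.splitlines()[0])


# Illustrative sample matching the nvidia-smi table earlier in the thread.
SAMPLE_LINE = "Disabled, P2, 132.00"

if __name__ == "__main__":
    state = parse_gpu_state(SAMPLE_LINE)  # swap in query_gpu_state() on a real machine
    print(state)
```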

chrome peak