#Recommended hardware for local AI


high flame
#

Okay so I am trying again,

Basically, Siri is pissing me off. I will say "Hey Siri, set bedlamp to 10%" and Siri responds "Okay, for what time?" (this is the set-a-timer response, NOT the set-a-light response), and then Siri stays stuck in that.

So....
I want to replace Siri with something that is ideally local but also low cost. What is the best hardware for running local AI that is fast and accurate?

fallen owl
high flame
#

HA is running on a Yellow,
I am considering a Mac

I currently have a PC with an NVIDIA Quadro P2200

fallen owl
#

ok so the Yellow is not running any AI business because it's underpowered.

you can probably run a smaller model on the P2200. maybe qwen3:4b (with Q4_K_M quantisation) with ollama, and a GPU-accelerated whisper too, but it might be tight on VRAM with only 5GB available.
how quick this would be is anyone's guess.
getting a newer gpu with more vram would help open up some options.
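
To make the "tight on VRAM" point concrete, here is a rough back-of-envelope sketch. It only estimates the size of the quantised weights plus a fixed allowance for everything else. The figures are assumptions and not measurements: roughly 4.0 billion parameters for qwen3:4b, roughly 4.8 bits per weight for Q4_K_M, and a guessed 1.5 GB for KV cache and runtime overhead. The function names are made up for illustration.

```python
def estimate_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantised weights in GB (1 GB = 1e9 bytes)."""
    # params (in billions) * bits per weight / 8 bits per byte = GB of weights
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float) -> bool:
    """True if weights plus an assumed overhead fit in the given VRAM."""
    needed = estimate_weights_gb(params_billion, bits_per_weight) + overhead_gb
    return needed <= vram_gb


if __name__ == "__main__":
    # Assumed values, see the note above; adjust for the real model and card.
    weights = estimate_weights_gb(4.0, 4.8)
    print(f"weights: ~{weights:.1f} GB")
    print("fits in 5 GB:", fits_in_vram(4.0, 4.8, vram_gb=5.0, overhead_gb=1.5))
```

Anything else sharing the card (a GPU-accelerated whisper, a longer context window) eats into the remaining headroom, so treat the result as a sanity check rather than a guarantee.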

as for using a Mac as an AI box, it's probably not the worst idea in the world. the Mini's AI-power-to-cost ratio is actually reasonable. however I dislike Apple so would definitely go for a Linux box with an NVIDIA GPU instead...

remember "fast" is going to be down to interpretation. it's realistically not going to be as fast as mainstream products with datacentres powering them.

then use a Voice-PE to interact with it.

high flame
#

For me fast = low response time

fallen owl
high flame
#

Less than a second.

fallen owl
#

you would be looking at potentially multiple top-of-the-line GPUs

#

you could use a smaller model to help with speed but then you would lose accuracy.

#

would you be considering industrial GPUs or just consumer grade?

#

although maybe the new NVIDIA DGX Spark might be a good option at that point, but I haven't seen many real-world applications of it yet.

lavish sonnet
#

The RTX 5090 will be faster than the Mac Mini M4 Pro and NVIDIA DGX Spark for running LLMs under 32B, but it cannot run large models (e.g. 120B) like the Mac and Spark can. Usually, though, the larger the LLM, the slower the response. There are many AI/LLM comparisons on the Internet. Here is a link to one: https://lmsys.org/blog/2025-10-13-nvidia-dgx-spark/
I am personally running an RTX 3090 that I purchased as a refurbished unit about 1.5 years ago. I am running the gpt-oss:20b model and I get response times of 1.5 to 5 seconds, with STT taking <0.2 seconds using Parakeet STT. For faster response times and more accurate responses I have created Sentence trigger automations.

high flame
#

Is there any way to get HA to start talking before the response is fully generated?

I.e. when using ollama locally and typing the request, it starts responding fairly quickly, but HA waits for the entire response to generate before the TTS starts
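
For illustration only, the general idea behind "speak before the response is finished" can be sketched as buffering streamed text and releasing each sentence as soon as it is complete, so a TTS engine could start on the first sentence while the rest is still generating. This is not Home Assistant's implementation and says nothing about whether HA supports it. `sentences_from_chunks` is a hypothetical helper, and the chunks are assumed to be the text fragments a streaming LLM API (such as Ollama with streaming enabled) hands back one at a time.

```python
import re
from typing import Iterable, Iterator

# A sentence boundary here is '.', '!' or '?' followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentences_from_chunks(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _BOUNDARY.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be mid-sentence, so keep it buffered.
        buffer = parts[-1]
    # Flush whatever is left once the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    streamed = ["Okay, set", "ting bedlamp to 10%. ", "Anything", " else?"]
    for sentence in sentences_from_chunks(streamed):
        # In a real pipeline this is where the sentence would be handed to TTS.
        print(sentence)
```

The trade-off is that a naive boundary rule can split on abbreviations, so a real pipeline would likely need something more careful than this regex.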