#hardware

1 messages · Page 3 of 1

proper turret
#

yes it takes around 6 hours to a full day

thick python
#

oh dang, okay so ill deffo be waiting a bit.

#

im trying to oauth now i have ai pro, didnt know about gravity.

proper turret
thick python
#

i really only need a good model for brain, code. ill prolly use flash lite for other small things, getting up from 1gb ram to 24gb ram is whats going to help the most i think lol

proper turret
#

If you have a proxy with 6x google pro accounts like this, you could run claude opus 4.5 pretty much forever, and use gemini 3 pro for general stuff

rigid solstice
#

Has anyone run a model on the Blackwell chip set yet?

shell nymph
#

It looks like a NAS could definitely do a minimal instance of the gateway and run some basic agents. But if you start doing advanced tools or need sandboxing you'll need more than what a smaller Synology NAS provides. I had Opus 4.5 review the openclaw repo and build a "suggested hardware requirements" if you want to take a look

https://gist.github.com/sdolgin/bc15d2844268e8e55459c94cb54b3799

#

Minimal (headless gateway) 1 vCPU 512MB
Standard (gateway + channels) 1-2 vCPU 1-2GB
Production (sandboxing + browser) 2-4 vCPU 4-8GB
Heavy (multi-agent + media) 4+ vCPU 8-16GB

shell kindle
shell nymph
thick python
craggy ferry
#

just go all the way and get a 512g m3. with ram prices the way they are it's kind of a bargain

eternal pine
#

hilariously bad advice

#

🙂

#

D

#

and i dont suffer from fomo or other market pressures. I need 64gigs, would like 128. Woudln't buy more than that.

#

but i can udnerstand the sentiment

craggy ferry
#

why not just run kimi2.5 3-bit

eternal pine
#

asus selling a blackwell platofrm for 3k at 128gigs, so its really between that and the mac mini m4 pro 64 or 128

craggy ferry
#

I mean yeah you can probably get away with a deeper quant on a smaller model and be fine

eternal pine
#

im good with minimax m2.1 3bit 🙂

#

im just looking for a way to imrpove latency on responses, as strix halo sucks when the context window is larger than 8k

craggy ferry
#

does the m4 add anything to the performance characteristics? inference is memory bw bound right?

eternal pine
#

yes, but the more important factor is prompt processign speed which is bandwdith bound

#

i guess thats what you said

#

i dont care a ton about overall tokens per as long as its double digits

craggy ferry
#

yeah prompt processing sucks a little bit on my setup too but only when it has to swallow an entire 120k prompt uncached

#

(on my 3090+6000ada rig)

eternal pine
#

i cant imagine you're waiting several minutes

craggy ferry
#

if it started out hot, it does take a while

eternal pine
#

i could deal with a 128k taking 1-2 minutes, not 5-7

#

and im waiting 5-7 on 64k

craggy ferry
#

yeah that's just sad

#

glad i didn't buy two of those things like i was considering

eternal pine
#

yea, i wish i hadn't bought this one

craggy ferry
#

or well i was gonna do like an aspire

eternal pine
#

thats 2400 that could have gone toward a mac studio or dgx spark

craggy ferry
#

but two of that kind of device is basically a 512g studio

#

yeah

#

i mean, it's not, but, girl math

eternal pine
#

hahaha

#

100%

#

strix has its uses, for non agentic stuff like general chat and research its actually really good

#

and i find minimax is awesome

#

but i want it with openclaw

#

it starts off insanely good and and then the lag ultimately kills long running tasks

#

so im using glm flash on a pair of 5060 ti's i have in my homelab cluster

#

that works pretty well, but its clearly not quite as capable as minimax

#

at this point i think its worth spending the money but i dont want a repeat of strix

craggy ferry
#

yeah i'm running glm flash too until llamacpp gets their shit together with qwen-next

#

and the q6 gets me enough spare vram to run 4 slots

#

probably more, but that's ... enough for now

eternal pine
#

well, i appreciate the feedback. im goign to keep mulling over the choice for the next week or so and try and get as much data as i can

#

by next week im ordering something.

craggy ferry
#

i've been at the apple pay screen a double click away from ordering my bad financial decision

#

so many times lol

eternal pine
#

hahaha

#

brothers in arms it seems

#

me too

craggy ferry
#

like, it's financed! if you were paying for claude max 20x, that's like $200 per month you're saving right there!

proper turret
#

Has anyone successfully updated to 2026.2.3?

ashen shuttle
#

Hello,
New to this community. I have a gaming PC running on AMD Ryzen 9 9950X + 128GB RAM + RTX 5090
Looking for some suggestions on how to best setup this to make most of openClaw.
I initially started with a very old dell laptop running the main gateway and gaming system running local llm and embedding model.

true mason
#

hey all - I recently built a Reachy-Mini robot (Hugging Face launched it as open source hw) and thought it would be cool to have him embody my OpenClaw agent 🙂 So we can have face-to-face conversations and he can see and hear me, and I can see some physical emotional cues in his responses. You can check out the code here to experiment if you have one too, or just try in a simulator. https://huggingface.co/spaces/tomrikert/clawbody. Fun if anyone is interested in experimenting with:
• Embodied/robot integrations
• Personality or agent UX
• Voice agent pipelines
I’d love feedback, ideas, or collaboration. I’m curious how people are thinking about connecting OpenClaw to hardware like robots or physical assistants.

warped umbra
#

I'll probably by something off the shelf to start

#

it'd be nice if we could get a dedicated channel for robotics

gusty pelican
#

I feel that buying a cloud server imposes great restrictions on claw, making it impossible to automate x and ins.

zinc valve
#

Robot with OpenClaw agent

gusty pelican
#

Maybe my configuration method is incorrect?

remote moon
#

how are you getting this to work. I keep getting errors saying that it's no longer supported

mortal linden
proper turret
remote moon
proper turret
ashen shuttle
mortal linden
ashen shuttle
mortal linden
# ashen shuttle Thanks i will let you know how it goes.

Awesome. I highly recommend asking Claude how to run ollama on windows and link it to your open claw instance in the docker container. Ive been chatting to Claude about it the last several days, and when I get my Mac Mini M4 later this afternoon that is what I will be doing

ashen shuttle
mortal linden
#

Since docketed containers are lightweight Linux vms, they don’t take up nearly as much space as a full vm

cinder fern
mortal linden
# cinder fern why not run different models under single instance of openclaw?

If he’s wanting to test how individual models work with OpenClaw in parallel at the same time is the only reason i can think of for doing that. Not a bad idea of you want to spend a few days testing which model is best at what task, but ultimately one instance of OpenClaw with multiple models for different agent profiles makes more sense.

cinder fern
mortal linden
cinder fern
# mortal linden What are your recommendations on getting setup with a frontier model. I have don...

well, I am just one degenerate so take it for what its worth...
Is spent 2 days trying to set it up on MiniMax M2.1 because it seemed to be a good cost/quality balance.
Kept running into errors and having major issues (may have not helped that I did this just between the rebranding of Clawdbot/Moltbot/Openclaw..
Gave up, moved to run Kimi as they offered 7 days free on their code model. It ran with significantly less issues and was almost enjoyable (lol..).

#

I used Anthropic models a bit, but its not officially available here and the tokenburn was insane at the beginning...

mortal linden
#

That’s my biggest fear of using a frontier model. I have the $20 plans for both ChatGPT and Claude and I’m preferring Claude a lot more since I started to use it. I don’t want to get banned for using OAuth, and would rather keep as much of my more sensitive personal data out of the hands of these big companies as I can anyways.

cinder fern
#

need to run it for some work stuff, so having decently powerful local models was a requirement.

rancid sentinel
#

0.99$ for the first month as test is perfect

tribal lion
#

hey all, anyone running on GMKtec Mini PC Ryzen 5 3500U 1TB NVMe SSD 16GB DDR4 2.5GbE NIC LAN Office Home Desktop Computers, HTPC Proxmox, HDMI 2.1, USB-C, DP Triple 4K Display Nucbox G10 ? Thinking of setting up 5-8 agents on it.

stable zinc
#

The DP Triple 4K Display capability and and Office Home Desktop functionality will be really important for this

tribal lion
#

hehe

hollow skiff
#

running openclaw on shitty lenovo yoga and the fans wake me up every morning

tribal lion
#

yeah i bet i have old piece of junk here and don't want to run this on it. I rather have something small, no noise and lower power draw.

weak nebula
#

Will 2018 Mac Mini work

hollow skiff
#

a fucking raspberry pi with no fans work im sure 2018 mac mini is gonna work

frozen crescent
#

Hey guys, I was wondering whether it's worth it spending money on a new mini PC or whether I can use my old desktop for running agents on it? The specs are Intel Core i-7-8700, 16GB RAM, NVIDIA Geforce GTX 1070 Ti, and what are the limitations going to be of running on my old PC rather than buying a new one?

scenic notch
tranquil hazel
rocky sleet
#

everyone talking about mac minis what about some other mini pc? any cool viable options?

eternal pine
#

depends on what youre goals are

#

i think its distilled into too simple an equasion, unless your plan is to run cloud models

#

then you could run this on a literal toaster

#

and it wouldn't matter

#

if you want ot run local models, then you need to have a better idea of what you're trying to do, and what models are suitable for the tasks you want to use and /or grow into

steep wedge
mossy quest
#

Very low power draw, works perfectly

fleet light
# dull sigil Beelink is pretty good I hear

Agreed, I bought a Beelink S12 Pro used on Ebay for $150. You're going to be dependent on cloud services for everything except like Whisper for voice transcription, but that's all you need for running the openclaw interface. For a luddite like me I appreciate the security that no matter how badly I fuck this up, it's impossible for it to access my files

craggy ferry
#

yeah an rpi is a pro choice, some people just want to have a nice machine

boreal sluice
# rocky sleet everyone talking about mac minis what about some other mini pc? any cool viable ...

Mac mini is super popular because of the shared VRAM and RAM so for base model you get 16GB of ram and you can use up to 8GB or so for GPU, a comparable GPU will cost you between 280-400e while mac mini all in one will cost you roughly 600e or so, but you have other macmini models that have up to 64GB of ram that is up to 32GB of vram which is super powerful if you want to run models locally.

craggy ferry
#

you can use a lot more than 32 from the 64gb models. maybe a model you can run on the 64g would be worthwhile for this, but 8gb you're not getting anything useful

#

If you want to play around with it, though, it is an option

boreal sluice
#

I guess it's all up to the use case what kind of workflows you want to run or use it for. I am now in process of setting up on my PC and will see how much can actually be done with local LLMs, supposedly my setup should be powerful enough to help me with some of the stuff that I want it to be doing, I will see. Otherwise Cloud models will get cheaper and cheaper every couple months as chips get better and spit more tokens out. So that is good news as well.

#

I'm guessing local LLMs are more for the sake of security and keeping things "offline"

craggy ferry
#

keeping things offline is definitely part of it, but also, tokens are expensive and i don't share your optimism for their future price. it's well known that AI companies are losing money even for tokens that hit their APIs and are charged properly, let alone the extremely subsidized flat rate plans

#

i don't want a surprise $400 bill because my agent got a little wonky in a way which the API absolutely hated

#

if i buy hardware and run the llm on there, then i have kind of locked in a price per token

#

it's slower, sure, but not intolerable

boreal sluice
#

Haha! me neither and I saw that some people are spending a ton of money daily some of these youtubers I guess for the sake of testing and so on plus they can afford it but I saw that someone was averaging like 100$ a day.

craggy ferry
#

i've heard tales of 1.1k days

boreal sluice
craggy ferry
#

i have a hybrid approach, i let my local model run the main threads, but it can run sub agents that call out to opus to do things i want done "right"

#

idle chatting is expensive, because you are shipping huge context around and only incrementally getting replies. but it's actually really easy to keep all your prompts cached if you're the only one using the server

#

after i chat with the local model for a while and try out some stuff, i ask it to prompt opus to do the thing

boreal sluice
#

DeepSeek DeepSeek V3 128k $0 $0.0000 $0.00
Google Gemini 2.0 Flash 1M $0 $0.0000 $0.00

craggy ferry
#

openai will give you a month of codex for free and they 2xed the usage limits

#

that would appear to be the best option right now

boreal sluice
#

Cool

#

Well I still have about a week of work before I will start doing actually anything useful. Setting up this LocalLLM in Windows/WSL Linux configuration was really painful. So first I want to make sure that I harden the whole setup, write whole documentation and install process down in case anything happens I can than set it up again within hours and not days or if I want to replicate.

#

Have you been running it for a long time what kind of setup do you have ?

craggy ferry
#

i was 100% opus for the first few days and now i'm running glm-4.7-flash on a 3090+6000ada

boreal sluice
#

Very nice!

cunning crag
#

hey folks, been using a few different local llms as the main agent but havent found one thats smart and quick enough for orchestration. Any luck from anyone else?

craggy ferry
#

glm-4.7-flash

fleet light
#

Yall have some crazy setups, is any of this possible with just a 5070 Ti?

warm slate
craggy ferry
#

that sounds impossible

#

like, I get that

#

oh wait nm i get like 7x that so yeah that might track

warm slate
#

OSS20B is 6.1 nvidiaOrchestrator 2.0 Qwen3-32B 1.2 (on fresh prompts - will go down I know...) ... but for ewaste it's pretty ok

boreal sluice
warm slate
#

the K/V quantization might give you longer context, but depending on the backend Vulkan/cuda/avx2 on cpu , it might actually slow down during conversion

old notch
#

Anyone have any luck running local models? The only thing that has worked for me to actually get a chat back is to use Ollama model openclaw-qwen3 someone optimized. Maybe better to start using llama.cpp? Running on 16gb unified mac

#

Just for general use when I want to switch off of codex

bronze ermine
#

End of the day, unless you're one a machine with 512GB of ram or more, the local models just aren't worth it. Try to get codex/gpt mini or nano, or gemini 2.5 flash, or other small "few tokens used" models for the busy-work. Otherwise it's too much hassle

shell kindle
sly plover
#

Mac Mini 🥇

boreal sluice
dreamy kettle
#

My take on the best strategy so far is to run a local model for the main agent, so it can keep all the context on you, itself, the workspace etcc...without costing an absolute fortune online. I guess for regular interactions and managing things it doesn't need to be fast, right? Just smart enough. And a lot of that comes from the context it has loaded. Then it can spawn off other agents that are more task-focused... On the other hand, models like Opus 4.5, Kimi K2.5 do seem head and shoulders smarter than anything else when dealing with me, the dumb human with its inherently incoherent communication style. Or maybe that's just because I let them browse the entire .openclaw/ folder tree one time, and they were smarter because of that. What's you guys take on the best way to integrate both local and paid APIs?

#

I feel like I've tasted the future, but right now it's just prohibitively expensive. Studio M3 Ultra with 512GB RAM is looking more "affordable" by the day.

#

aight models aside... my first failure was trying to install it on my 2015 MBP thanks to the legacy OS 12.7.6 which is the highest supported for that machine.

Has anyone managed to do it on a 2015 MBP? Either on the supported OS or by using hacks to upgrade the OS beyond that?

shell kindle
#

I just having the hardest time to setup on a windows box, my powershell and likely my WD is killing every attempt to install via the one-liner

bronze ermine
crude wasp
#

Best local models for running on rtx a4500? 20gb vram

summer bluff
#

There is something everyone is missing. You are talking about hardware to run your ai on but what about hardware your ai runs?

https://www.omi.me/

cinder fern
lyric sparrow
#

Bought a used Dell Optiplex 7060 Micro. Runs openclaw really well. I combine this with an existing coding plan I'm on (Minimax previously => now Kimi Code). Makes for a very affordable setup.

Used Dell Optiplex can be bought for as little as $100, depending on spec and your location.

cinder fern
#

You are aware it runs on a calculator let alone an oversized paperweight

lyric sparrow
#

lol yes

lofty yarrow
#

is 4 GB DDR5 ECC RAM enough for openclaw?

steep wedge
eternal tendon
south tinsel
#

is mac air m2, 8 gb ran fine for openclaw?

fallow light
#

Has any try openclaw with NVDA Nemo Nano 30B? I gonna give this one a try once i can get my claw to talk to me...

kindred wasp
#

I’m figuring out what to run and how to run it. The Mac mini looks like a good option, and so does running a local model. What’s your experience?
Is the base Mac mini (10-core ARM) enough for local models? How good is the support for running local models on the ARM architecture?
I’ve only run local models on my 4090 before, but I haven’t tried it in a long time. I assume 16 GB of RAM won’t be great, since VRAM will be shared from system RAM.
Do you think 24 GB is better, or should I go for 32 GB on the standard M4 mini? Does anyone here have real hands-on experience — and if so, what local models can you actually run on it?
Thanks for the replies, guys — may your Jarvises serve you well :). Tony Stark soon 🙂

versed plaza
#

the real problem is the amount of memory , you can always run quantized models that use less bits for the weights but the quality of the output is not the same... and you need to hope that someone did the quantization you need

#

you can get an idea of what you need launching lmstudio that does an estimation of the memory required and you can browse the models easily

robust sundial
#

Ayo, 🦞
Got it running locally on a Mac studio M2 ultra 128GB.

qwen3-coder:latest 256K

But it struggles with 256k had to scale down to 32k.

craggy ferry
cinder fern
#

M4 Pro will be significantly faster compared to the standard M4 due to memory bandwidth

craggy ferry
#

Yeah glm can at least be coaxed into solving problems itself

hollow berry
#

Can someone please confirm that I'm either dumb or not insane?

I have an RTX 5090, which is the higher end of consumer GPUs right now. Simply put, there is no local model that comes anywhere close to the top 5 cloud models, maybe not even the top 10.

The best I can put on there is Qwen32B, DeepSeek32B, or maybe DeepSeek70B at a 3Quant. Yes the compute is "free", but it's all so much worse than Codex 5.2, or Opus 4.6.

I'm having a very hard time understanding why people are buying $10k Mac studios, and even daisy-chaining multiple $10k mac studios, just to run the ~6th best model.

Am I missing something? Have I not optimized my hardware? Maybe my use case is just different, but I genuinely do not understand why people are spending fortunes for local compute right now.

hollow berry
craggy ferry
# hollow berry Can someone please confirm that I'm either dumb or not insane? I have an RTX 50...

You are correct. Consumer gaming GPUs are not capable of running LLMs that can do this. You simply do not have the vram to run a model big enough with a big enough context to make that work.

The break point is around 64GB. That’s why Macs are so prevalent. Those $10k Studios have 512GB of vram (effectively).

The sixth best model is what, kimi2.5? Minimax? I get reasonable results from glm-4.7-flash. It’s no Opus but it does do some things.

#

Maybe if you had two 5090s.

#

Keep in mind that the value of the tokens we can generate using this hardware will only go up over time. Open models get better all the time, new stuff is constantly released. Yes so will the cloud models but if your benchmark is today’s cloud models then six months from now you can probably see that quality in a local model … if you can run it.

hollow berry
#

Makes complete sense. I thought I was going insane. A Mac studio makes much more sense in this case. Unified memory FTW. Thank you!

gaunt galleon
#

I run local models for privacy reasons and because I'm a pennypincher. local models can do most of things I'd pay for a lot if I used cloud models

winter narwhal
#

I didn't realize a Mac Studio had 512 GB of vram. I don't know of any GPU cards that have that. I'm sure they exist Im just not familiar with them

hollow berry
#

Yeah Apple's decision to go off of Intel and into their own Apple silicon in 2020 was accidentally the best possible move for them

shy mist
# hollow berry Can someone please confirm that I'm either dumb or not insane? I have an RTX 50...

For serious local-model work, the top-end Apple Silicon Mac Studio is ideal. Think M4 Max or if you plan to train your own, go for the M3 Ultra. That gives you high‑bandwidth unified memory, lots of neural‑engine throughput, and double the Thunderbolt lanes.

  1. Chip: M4 Max Chip - 16-core-neural engine, 40-core-GPU, 16-core-CPU). The extra performance cores matter for concurrent LLM runs plus build/test pipelines.
  2. Memory: 64 GB/128GB unified memory. Local models (Kimi K2.5‑class, 70B quantizations, multi-agent inference) chew RAM; unified memory also feeds the GPU.
  3. Internal storage: 2 TB SSD. Keeps model weights, vector indexes, Docker images, and scratch space local without juggling externals. It’s also Apple’s fastest NVMe option (higher configs get more NAND channels).
  4. Networking & ports: 10 Gb Ethernet so we can move data to/from NAS or other nodes quickly. Grab the highest Thunderbolt count so we can hang fast external NVMe enclosures for cold storage or model archives.

Why this config

• Unified memory is the limiter for local LLMs and GPU-heavy coding tasks. 64 GB/128GB lets us run a 30–40B model plus still have headroom for IDEs, Docker, and vector DBs.
• Pro/Max chips have bigger neural engines, more GPU cores, and higher memory bandwidth. Key for multi-agent setups.
• Bigger internal SSD = faster scratch and fewer bottlenecks when you hot-swap models or run multiple fine-tunes.

hollow berry
# shy mist For serious local-model work, the top-end Apple Silicon Mac Studio is ideal. Thi...

Do you think future openclaw setups will be:

  1. 1 device / 1 openclaw install / multiple agents & subagents
  2. 1 device / multiple openclaw installs / multiple agents & subagents
  3. multiple devices / 1 openclaw install / multiple agents & subagents

Trying to figure out if device partitioning will be necessary in the future, or if individual devices will be able to reasonable contain multiple openclaw installs

shy mist
#

1 device / 1 install / multiple agents & subagents - This is probably going to be the standard setup for most people.

Multiple devices / 1 core install / multiple agents & subagents. This is actually very interesting long term.

  • Option 3 becomes normal once people treat OpenClaw like “home AI infrastructure.”
  • Option 2 stays niche for power users and security conscious people.

Partitioning starts to make sense if the ecosystem becomes messy with unsafe third party skills, Agents can execute arbitrary code without strong containment, You are mixing high trust and low trust workflows. That’s a clear blast radius problem. At that point, isolation at the OS level or even separate devices becomes attractive.

#

The limitation won’t be hardware. It’ll be security architecture discipline.

hollow berry
#

I have 4 raspberry pis and I'm desperately looking for a reason to use the other 3 - doesn't seem to make sense at the moment though ha

shy mist
brazen scroll
hollow berry
#

definitely not. don't run a local model on it lol. it's fantastic as mission control, not for compute. slap an anthropic API key on it, ssh in and you're good to go

slate sparrow
#

Who has run local llama successfully? I have both running on Raspberry Pi 5 and either will run separately but when I point open claw at llama locally it sounds like a mini turbine is running with the cooling system.

stuck lotus
#

Hey guys,

I wanna instally and use OpenClaw securely in the best possible way.

My use case: My bot should be able to use browser, scrape web data, access to APIs, etc.
i dont need any local LLMs

I have a few options:

  • Use VPS
  • Buy a Mac Mini
  • Use an old windows pc

I don't mind spending money but don't wanna WASTE it, the data and security needs to be the priority.
Sometimes I'm dealing with moderately sensitive data like Slack channels, emails, etc.

So I just want your recommendation.

I am a little bit techincal, use claude code sometimes but wanna use OpenClaw so it can perform tasks like an employee and an AI assistant.

torpid flower
#

get a raspberry pi

torpid flower
stuck lotus
torpid flower
#

but yeah, i can see how getting a mac mini and installing the desktop app is tempting. Yeah, maybe go for that

stuck lotus
thorny vessel
# fallow light Has any try openclaw with NVDA Nemo Nano 30B? I gonna give this one a try once i...

I was able to get things running with nemotron-3 running in ollama, using around 20GB of VRAM, but the results were underwhelming. I'm running GLM-4.7-flash now, which is performing much better, and with 64k of context using a custom modelfile, I'm using 23GB of VRAM (running on an A2000 and an A4000). The big tip at the moment is that the current 0.15.5 pre-release of ollama is running glm with a lot less VRAM than before - with the previous version I was running out of VRAM at 32k context, but with the pre-release I can run 64k with room to spare.

coral ridge
torpid flower
torpid flower
#

a little one that is made for the Pi

craggy ferry
#

Hey guys is this where I post about how I ordered a mac mini to run my familiar on

I thought they were supposed to be cheap tho

#

Oh of course we don’t do embeds here. Probably for the best

#

Also, it kind of feels like we should have a local models channel

We have a models channel, and a hardware channel, and they’re both like, kind of halfway relevant

boreal sluice
quartz plover
#

I tried another option as hardware for Openclaw -> Run openclaw on Android phone with UserLand. Update: Not stable. Not recommended.

mortal linden
#

Finally got my 32GB Mac Mini M4 in the other day and been setting it up to be as secure as possible and getting OpenClaw going. I now have my bot, Clawtana, getting local LLMs setup for her to run on so I don’t eat through all my Claude Pro Plan usage

distant cedar
#

I have been experimenting with OpenClaw on a GMKtec EVO X2 AI Mini PC
AMD Ryzen AI Max+ 395 3.0GHz Processor; 64GB LPDDR5X-8000 Onboard RAM; 1TB Solid State Drive; AMD Radeon 8060S Graphics. My return window is about to close, so I am looking at my options. The two models that work well on this box are cogito 32b and gpt-oss 20b. gpt has a better response time and might also be better with tools. This machine was $1600. I would prefer to run my models locally, I am trying to decide if i should get the 128GB version of the EVO X2 for $2200 or something else. One something else is a Mac Studio (for more money) or a standard PC with a discrete GPU. Any suggestions?

pale wharf
#

can anyone give me some recommends for what i can do on a Xiaomi Note 14 Pro 4g, 12gb ram, 512gb storage and can add more if necessary

dreamy kettle
tranquil hazel
#

Downpayment for property

timid juniper
#

Hello folks,
Here a developer with Mac M3 Max 64Gb.

1 - I checked around ( Gemini, Chat, previous conversations grepping 'M3' ) and i tested it myself:
The best I mounted locally is qwen2.5:32b-instruct but it's too slow and unreliable for coding tasks.
So I am wondering if folks are buying 1 mac mini only for hype as suggested by @torpid flower ('get a raspberry pi').
Confirm?
[ i could understand multiple macs to share RAM and mount a model.. but just 1 how can suffice for building new skills autonomously? ]

2 - broader topic: someone that develops software has found a satisfactory hardware config to program fully with local models?
It does not need to be Opus level - just reliable enough to focus on creativity.
[ I am actively looking into this, here just asking the opinion of the community ]

oak ivy
tranquil hazel
#

Mac mini setup is so fast and ez lol

#

3 min and antigravity is already running

dreamy kettle
#

I feel like an M1 Mac Mini would be the ideal machine for this stuff. If you already had one and were looking to upgrade it's a great use case for the old one. Took me a few days to install on my old Intel macbook-pro and the fan noise is scaring me, but that's not Openclaw's fault. A raspberry pi is probably sufficient if you're using cloud models. I don't see the point in half-assing it with a Mac Mini, unless you load it up with all the server things.

timid juniper
tranquil hazel
#

it's low power and I can run it 24/7

#

it just made sense

#

got it with 18% discount from amazon germany as well

bleak schooner
#

i got 74.9gb of gpu memory lol

timid juniper
#

thanks for sharing it ForgedAim! even with 75Gb, what model could give you decent performance?

bleak schooner
#

ive used 3 so far but i messed up on boarding

#

so im trying to figurre out best model to refo on boarding

timid juniper
#

Which ones are these 3 models?

bleak schooner
#

qwen/qwen3-vl-4b

#

run fast

#

Ministral 3 14B Reasoning ran fine but seems not as good

#

im a try a bigger model

#

qwen/qwen3-vl-30b

#

is next

#

when i switched models

timid juniper
#

I understand you are trying them for openclaw, hence to let an agent 'click the right buttons' and communicate back and forth.
Not for continuous reliable coding, correct?

bleak schooner
#

it kept repeating

#

i would use it for continuos coding

#

if i dind it out

#

i been using anti gravity

#

and windsurf

tranquil hazel
bleak schooner
#

i got a beast pc

tranquil hazel
bleak schooner
#

Windows 11 Home 64-bit
CPU
Intel Core i9 9900K @ 3.60GHz 38 °C
Coffee Lake 14nm Technology
RAM
128GB Dual-Channel DDR4 @ 1600MHz (16-20-20-38)
Motherboard
Gigabyte Technology Co., Ltd. Z390 AORUS ULTRA-CF (U3E1) 28 °C
Graphics
DELL E2422HN (1920x1080@60Hz)
Intel UHD Graphics 630 (Gigabyte)
3071MB NVIDIA GeForce RTX 2080 Ti (ASUStek Computer Inc) 29 °C
SLI Disabled
Storage
931GB Samsung SSD 970 EVO Plus 1TB (Unknown (SSD))
931GB Samsung SSD 860 EVO 1TB (SATA (SSD)) 25 °C
Optical Drives
No optical disk drives detected
Audio
Realtek High Definition Audio

tranquil hazel
#

128 gb ram is pretty baller lol

bleak schooner
#

1600htz

#

ta its aight

tranquil hazel
#

yeh but

bleak schooner
#

i been running whole time with integrated gpu off

tranquil hazel
#

the mac mini can run 24/7 consuming almost no power

bleak schooner
#

just turned it on and it gave me another 40gb vram

tranquil hazel
#

for EU ppl like me that matters

#

I can't leave my gamer PC on all day lol 😄

bleak schooner
#

i have clawbot on a old dell 'attitude

tranquil hazel
#

1000 watt psu

bleak schooner
#

on wsl in ubuntu

#

ya my pc has over kill psu

#

i had to buy a dual motherboard psu

tranquil hazel
bleak schooner
#

back during covid

#

suppy problems

tranquil hazel
#

haha same also bought mine during covid

#

payed so much for GPU

bleak schooner
#

mine hx1200i

timid juniper
# bleak schooner i been using anti gravity

I understand antigravity - I am using cursor mainly and claude code as peer but the feeling of 'i am burning cash as i am typing' adds too much friction for unleashing creativity

bleak schooner
#

I've coded 600000 lines of code in 3 days for free

timid juniper
#

as far as i see windsurf is subscription-based too, isnt it?

bleak schooner
#

it is

#

but free tier

#

has free access to a model swe till like 3/24/2026

#

it fully coded a no recoil macro control system hence forgedaim

#

lol

#

with an auth server hwid key binding 256 bit encryption with files executed in memory only

#

its crazy what ive been able to do with ai

#

has a complete server client eco like a remote admin tool

#

next llm im downloading is qwen/qwen3-coder-next

tranquil hazel
#

I'm installing openclaw atm via SSH on my mac mini downstairs

dreamy kettle
viscid ether
#

what's a good llm model to use for openclaw, the gateway is on a remote server, my linuxbox. i have openclaw node running on my win11 box with RTX 5080/16gb w/ 64gbram and lmstudio. but not sure which model to use on lmstudio.

autumn grotto
#

Has anyone ran openclaw on an ipad? I did a quick search and found a few apps that looks like openclaw or goclaw.

quaint marlin
#

has anyone figured out how to cluster AMD strix halos together with high networking speeds between nodes? Is this the lowest cost/most performant setup for a large model (>300B parameters)?

mortal linden
quaint marlin
#

or has anyone for a high vram server that doesn't blow a fuse on a 15amp 120v outlet?

honest roost
#

Any local models I could run on my asus rog ally x? Thinking of using it as an alternative to a m4 mini. I also have a mid range 5070/7800 3xd rig i use for gaming but dont know if running a model on it 24/7 will hinder performance over time

vague ridge
#

Whats the experiential difference between tiers of 16/24/48/64gb of unified mac ram?

#

For running local models and agent setups focused on communications/office work

distant cedar
steep wedge
raw shuttle
#

has anyone had any success with clawdbot running your n8n automations, and building front and back end apps / webapps ?

boreal sluice
#

I did some research on what/how I plan to setup and use openclaw for as I am finalising the setup and having a good NVMe plays a crucial role in my own personal setup as I have RTX 3050 and 32GB RAM. So I am planning to add a 2TB gen4 M2 Crucial SSD for offloading that will be needed. If anyone has similar setup I would recommend that you describe in detail your plan and usecase to one of the AIs to see your needs and what makese sense for you but it seems that 2TB NVMe is quite a sweet spot. If anyone has done a similar research or already did setup and actively running their openclaw and can confirm my findings that'd be appreciated.

robust sundial
craggy ferry
#

Spark is terrible for inference tho

#

Idk I think I’d rather have a studio in pretty much every circumstance

#

I doubt I’m gonna be doing enough finetuning to want more compute:memory ratio

quaint marlin
# boreal sluice I did some research on what/how I plan to setup and use openclaw for as I am fin...

I have been watching youtube videos, asking Gemini, and reviewing Reddit non-stop on this, and to me it seems there is only 1 economical way to deploy openclaw, but it requires 11.5k up front....

Basically, openclaw is only useful if the main agent is a huge model (Kimi K2.5 q4 or larger). Which means you need significant vram or unified memory to run at any reasonable rate of token generation.

The only key thing you have to think about is tok/s (intelligence generation speed) per watt or Tokens per Dollar (Tokens per kwh * electricity cost).

The only realistic way to drive down tok/s per watt is to use a Local LLM in something like vLLM or LM Studio or Ollama etc. and do the majority of openclaws operations with the local LLM and have your reviewer be one of the frontier models (GPT 5.3 xHigh or Opus 5.6).

In order to host a great model locally at high tokens per second per watt, you need the 512GB unified memory at 11.5k.

Second best alternative from a cost savings perspective seems to be an AMD Strix Halo (Ryzen AI Maxs 395 128GB vram) for 2.5-4k.

The main issue is the current networking on the strix halos doesn't scale and is limited by networking bandwidth between devices (10-50GB ethernet or InfiniBand), and you need to be very technically savvy to even get it working. To get the same 512GB vram your spending just as much in hardware, way more in engineering time (yours), and tanking your token generation across the nodes. Plus 4 of the 128GB apus are going to pull more energy than 1 of the 512GB studios. Apple really had some foresight on this local LLM stuff.

I don't think anyone outside of Apple is on track to be able to sell the next generation of home AI PCs strictly due to an energy draw issue. AMD and Nvidia vram doesn't scale and continue to work on a 15 amp home outlet.

craggy ferry
#

If you are running it at capacity then congrats and it still beats out like a Spark for T/w, but every minute you let the spark sit idle it’s consuming way more energy than the studio

#

And yeah running a cluster sucks compared to running a single node, even if that single node requires you to manage it via gui for some tasks

boreal sluice
# quaint marlin I have been watching youtube videos, asking Gemini, and reviewing Reddit non-sto...

Thanks some good points there, in regards to your main agent point, this is what got me thinking of hybrid setup as well, if it turns out that the model for main agent is simply so slow that it is unusable with my setup, than ok I will go with having main agent in cloud with API, but I don;t know at this time what too slow to be usable means for me, I don't mind if it is just slower as long as it is smart enough and reasoning well if instead of 15sec it takes 5min or something that's fine for me I guess. But at the moment I've no reasonable benchmarks or so to evaluate. And I do not plan to spend 11.5k or something right now for my first setup. I firstly just want to learn and understand as much as possible. But ofc if it does make sense for the future I will give it a thought as into what sort of HW I want to upgrade to

quaint marlin
#

I think the big thing is to get usualable code or "intelligence" out of openclaw over a longer time period. (not just a quick demo software that doesn't scale). You need local compute. It is too expensive to run through the AI APIs.

And if you are trying to not have buggy spaghetti code over a future millions of lines of code you need increasingly more intelligent models. I am hoping Apple's M5 Ultra is 1024 GB of unified memory which would allow for running unquantized models.

If I think about the future > 1 year. The frontier models are going to be increasingly larger, and the distilled versions of them will make the previous sized models way smarter. Ie. 7T parameter next-gen grok quantized to FP/INT4 would be over a TB which means you realistically would need 2-4 1TB M5 Ultras to run a completely autonomous org (theoretically, may still need some human intervention here and there.)

quaint marlin
#

the strix halo has really good tok/sec generation on models up to 80B

#

From Gemini "Framework Desktop, Beelink GTR9, or GMKTek Evo X2) allow for high-performance AI and gaming, with 128GB RAM enabling extensive local AI model usage. "

boreal sluice
craggy ferry
#

You need just slightly more than 64gb in my experience to get capable models with acceptable context into memory

#

And ideally you want to be able to allocate even more to context because if you can batch requests your throughput goes way up

#

I’m kinda sad I didn’t go for like 96gb ram on my M2 Ultra studio but

wicked hound
#

yeah, you really want a context window somewhere between 128k to 200k, so you don't need to start new sessions constantly or wait for compaction

quaint marlin
#

Memory requirements are only going 1 direction with AI. Up.

craggy ferry
#

Expect Apple’s prices on ram to go up too, they’re not going to sacrifice their margins for our benefit

boreal sluice
#

You think RAM prices will stabilise by Q3/Q4 and start going down or you think they might stay this high for longer ?

spring cloak
#

the only stabilizing they will do is up

lusty musk
#

Hey.. if anyone’s interested, I found a deal for a brand-new Mac mini at Micro Center and got Best Buy to price match it for $399.
That’s a $200 discount off MSRP.
https://x.com/i/status/2020158508767797337

The base version with 16 gb ram and 256 gb ssd

astral gobletBOT
# lusty musk Hey.. if anyone’s interested, I found a deal for a brand-new Mac mini at Micro C...

Thought I'd share this here since it seems like a really good deal for any builder trying to get a BRAND NEW Mac mini.
︀︀
︀︀I just got one myself for only $399 (original price: $599)
︀︀Price matched it at Best Buy with Micro Center's $399 pricing (since it was out of stock there).
︀︀
︀︀You can Google it to check the deal yourself. Seems like a no brainer for anyone looking to get one for themselves.

**💬 1 ❤️ 4 👁️ 80 **

still root
astral gobletBOT
# still root Used, Local Voice AI (using RunAnywhere SDK's : https://github.com/RunanywhereAI...

We got an invite yesterday to meet with @steipete today at @ycombinator - The guy with the COOLEST project on the planet right now!
︀︀
︀︀Everyone is talking about @openclaw and it's all over the social media. So last night my cofounder @ShubhamMal72313 and I made an impulsive decision by buying a new domain and forking the @openclaw and adding the @RunAnywhereAI SDKs to connect the Voice AI on a Raspberry Pi, that keeps listening to you all the time, only wakes up when you say the wake up word (so all audio data stays local - thanks to our sdks).
︀︀
︀︀We are now introducing - OpenClawPi @openclawpi - Your personal AI on your Pi. We are shipping:
︀︀- A pre-configured Raspberry Pi with Voice AI + OpenClaw installed.
︀︀- A usb mic and a speaker.
︀︀- ZERO setup needed - you will receive a login link to configure the pi from the web console and we will help maintain in sync with the OpenClaw and provide you with the latest updates on the Pi.…

heavy swallow
#

I create my own pre-build docker image and .env to handle this. Still have some issue to install skills.

opaque basalt
# quaint marlin I have been watching youtube videos, asking Gemini, and reviewing Reddit non-sto...

I've dug into it and 90% of the reason it needs a frontier models is that instead of feeding the model tools/prompts/etc dynamically, they are dumped as a 4000 token nloat at the beginning of every chat... A coherent agent model such as devistral 2 small with a W4A16 GS64 quant +EoRA is perfectly capable of running something like this with 170k fp16 context via vLLM on 2 3090s at 60t/s but it requires prompting that isn't absolute garbage

rocky sleet
#

not an apple fanboy considering getting something from Beelink, anyone running it on anything from beelink?

severe yoke
#

Running OpenClaw on a Jetson Orin Nano Super. 67 TOPS, 512GB NVMe, 8GB RAM, ~20W. Always-on dedicated box for Telegram, browser automation, proactive alerts. Low power draw is perfect for 24/7. Happy to share config - openclawhardware.dev

severe yoke
# rocky sleet not an apple fanboy considering getting something from Beelink, anyone running i...

Beelinks are solid for the price! I went a different route though — running mine on a Jetson Orin Nano Super. 67 TOPS of AI compute at ~20W, so it barely sips power compared to an x86 mini PC. The GPU acceleration is nice for anything ML-related too.

If you just need a general-purpose box though, Beelink SER series are probably your best bet. The AMD Ryzen ones have good single-thread perf which matters for OpenClaw.

#

Nice find! Mac Mini at $399 is honestly great value for OpenClaw. Only downside is power draw — it pulls 40-60W under load vs ~20W for something like a Jetson. If you're running 24/7 that adds up over a year.

But at $399 with that M4 chip, hard to argue with the raw performance. Good deal.

severe yoke
severe yoke
# opaque basalt I've dug into it and 90% of the reason it needs a frontier models is that instea...

This is a fair point. The system prompt + tool definitions do eat a lot of context on every turn. But honestly that's the trade-off for flexibility — dynamic tool injection would mean the model can't plan ahead with tools it doesn't know about yet.

The real cost optimization is on the model side, not the prompt side. A ~4000 token system prompt costs like $0.01 on Sonnet per turn. The expensive part is when it generates 2000+ tokens of reasoning/tool calls in response.

I've been running a Jetson box 24/7 with Sonnet as default and my daily API cost is about $2-5. The hardware cost becomes negligible pretty fast.

lusty musk
rocky sleet
#

for 400 yea good deal

opaque basalt
astral gobletBOT
severe yoke
# opaque basalt Each turn a list of tool calls set to null is sent to the model. There is no wor...

Yeah the null tool calls are wasteful but the alternative is dynamic tool injection which means the model cant plan multi-step tasks with tools it doesnt know about yet. Honestly for most setups the gateway cost is negligible because inference happens server-side. My Jetson box just runs the gateway at 20W and the actual thinking happens on Anthropic/OpenAI servers. Total daily cost is like $2-5 in API calls.

opaque basalt
#

It can be done, it just takes more effort. It isn't even making real tool calls

frail moon
#

Noob here. How much better do the various upgrades to the Mac mini make openclaw?

M4 -> M4 Pro
16GB -> 24GB -> 32GB -> 48GB

opaque basalt
#

That said I'm literally gluing a audio processor onto a model that already has vision and cot reasoning

tardy plinth
#

Are there any models worth running on a 16GB vram gpu? Even just a sub agent for tools? Anything to reduce cloud api calls.

severe yoke
#

btw for anyone interested in dedicated hardware for OpenClaw, I just set up a promo on my ClawBox site (Jetson Orin Nano pre-configured with OpenClaw). Use code DISCORD10 at checkout for 10% off -> openclawhardware.dev

bronze ermine
old notch
old notch
raven magnet
rocky sleet
#

its so wild to me tbh, in Romania I cant get a mac mini even used on the local facebook marketplace

lusty musk
prisma briar
#

I read september, got 1, like 2 weeks ago

severe yoke
#

@lildrunkensmurf definitely! For 16GB VRAM, try Qwen2.5-Coder-32B (Q4 quant) or DeepSeek-Coder-V2-Lite-16B. Both are solid for tool use and code tasks.

For general reasoning, Llama-3.3-70B-Q3 fits in 16GB and is pretty capable. You could set it as your default model and use cloud models only for complex tasks.

Tip: Use OpenClaw's model switching - keep a local model for routine stuff and opus/gpt for the hard problems. Can easily cut your API bill in half.

reef charm
#

Tried bringing OpenClaw to life on a Late 2014 Mac Mini ... no luck 😅

tiny escarp
#

I wonder why everyone is using mac minis lol

#

if your not running local models

raven magnet
tiny escarp
#

Yeah, with pi prices now adays lol

bronze ermine
lusty musk
#

I got a mac mini m4 because it was $399 brand new on micro center and just so I can run some advanced gui control and do some appley things.
But, raspberry pi works great.
I'm also setting one up on a $5 rpi Zero (8-10 years old) also pretty soon too

astral gobletBOT
# lusty musk My clawdbot on rpi should have wheels very soon also https://x.com/brainstormity...

Found some goodies from many many years ago.
︀︀Time to give him some wheels! 🤛

Quoting brainstormity (@brainstormity)

I gave my clawd bot @openclaw a hand.
︀︀
︀︀…now it keeps banging on my table when I don’t respond to its questions 🤛🤛🤛
︀︀
︀︀One cool thing about using a Raspberry Pi for your clawd bot is that it has GPIO pins you can use to connect it to the real world.
︀︀
︀︀I should give it some wheels next!!

**❤️ 13 👁️ 217 **

tiny escarp
tiny escarp
severe yoke
severe yoke
lusty musk
lusty musk
# severe yoke Interesting setup! What's your experience been like so far? Always curious to he...

Pretty good. It's headless so no GUI access to navigate a UI. But does all headless stuff good and fast.

One amazing thing if using a rapsberry is that it has GPIO pins to add electronics. E.g. I gave my clawd bot an arm and had it bang the table if I don't respond to his questions on time 😂..just to try it out. I'm adding wheels to it so it can roam around the house
Here is a video https://x.com/i/status/2017811131427934448

astral gobletBOT
# lusty musk Pretty good. It's headless so no GUI access to navigate a UI. But does all headl...

I gave my clawd bot @openclaw a hand.
︀︀
︀︀…now it keeps banging on my table when I don’t respond to its questions 🤛🤛🤛
︀︀
︀︀One cool thing about using a Raspberry Pi for your clawd bot is that it has GPIO pins you can use to connect it to the real world.
︀︀
︀︀I should give it some wheels next!!

**💬 8 🔁 4 ❤️ 23 👁️ 2.5K **

▶ Play video
twilit moat
#

SOmone wanna build a speaker?

rigid robin
honest rain
round shadow
eternal pine
#

has anyone tried llama 4

#

last i checked it was unsuitable for tool use due to the way it formats tool calls, wondering if there have been any updates

craggy ferry
#

Everyone still recommends 3.3 iirc

crystal cedar
# lusty musk 😂😂

thanks for sharing inspirational projects! pleasure to see such quality content, and wish you all the best. not sure if you've considered on device llm for any of the rpis, but if you ever do, consider the lfm2 and lfm2.5 families of models from liquid ai as well as the fine tuned versions of them made available in the last 24 hours or so on huggingface (inference probably excruciatingly slow, but if you like tinkering, and not in a hurry, might be interesting from an academic point of view).

steep wedge
#

Any fellow DGX Spark owners out there? I have the ASUS GX10 version, but I think they are basically identical. I am curious what local models you have had success with. I've tried a couple, but not all are supported by the model of GPU in these guys.

exotic warren
raven magnet
dry hull
steep wedge
dry hull
#

Strange, but yea you definitely need something bigger than llama3 8b, gpt-oss-120b is another I might try. I’m running qwen3-coder-next now and it seems promising, but I only have 48gb vram and it seems to be too low

stuck flower
severe yoke
#

For OpenClaw on DGX Spark, Qwen3-Coder is solid — the 32B variant runs well within 128GB unified memory and handles tool use reliably. vLLM is the right call over NIM for flexibility.

If you want something that just works out of the box with OpenClaw tool calling: Qwen3-Coder 32B or DeepSeek-R1 70B (distilled) are the most battle-tested right now. Llama 4 still has tool formatting issues as mentioned above.

For smaller VRAM (48GB range), Qwen3-Coder 14B with Q8 quantization is surprisingly capable for basic agent tasks.

If anyone wants a turnkey setup without the config hassle, we sell pre-configured Jetson Orin Nano boxes at openclawhardware.dev — different tier than DGX obviously, but solid for always-on agent work.

twilit moat
#

I am creating a speaker you can have in your living room, and that can connect to the rest of your sound systems, aswell as talk to it. It will be able to connect your clawdbot or other agents to it and get all transcriptions and information passing through it. Like if you want one. Diffrence from big cooperations is that you can pull all the data on the backend and connect however you wish across platforms.

steep wedge
little scroll
severe yoke
# twilit moat I am creating a speaker you can have in your living room, and that can connect t...

This is awesome — a dedicated speaker with agent integration is exactly the kind of hardware project that makes sense. What board are you building around?

I went a similar route with a Jetson Orin Nano running OpenClaw as an always-on assistant box. The 15W power draw means it can sit there 24/7 without worrying about electricity costs. Having it on dedicated hardware vs a laptop/server changes the reliability game completely.

Would love to see your speaker when you have a prototype!

severe yoke
# steep wedge The latest problem is that I am apparently on CUDA Version: 13.0, and vLLM can't...

CUDA 13 is bleeding edge — vLLM usually lags a version or two behind on driver support. You might need to pin to an older NVIDIA driver (like 550.x series) until vLLM catches up, or try the nightly builds which sometimes have newer CUDA support.

Alternatively, running vLLM in a Docker container with a specific CUDA toolkit version can isolate you from the host driver mismatch. The vllm/vllm-openai image usually ships with a tested CUDA version.

steep wedge
unique kayak
#

is a jetson nano overkill for an alexa device clone

whisper piper locally and then plugs into clawd

little scroll
tiny tendon
#

So am I needing to get pretty extensive hardware to run this locally?

umbral geode
#

So far for qwen3 coder 30b on 5090 is providing acceptable result. But this model sometimes refuse to use browse tools to open website for task. Any recommendations for model?

regal tulip
regal tulip
stuck flower
#

@regal tulip Ace-Step 1.5 has been awesome; It's very promising when you have the resources to let it rip. It seems a lot of people have jumped onto it because it can run at 6gb memory, but it cannot show off its capability unless you turn off quantization and use the full-size models.

#

the LoRA training capabilities are simple and are the first of its kind for an open source music project.

regal tulip
#

Nice. That space is evolving pretty quickly but seems to still be in the background. Definitely gonna catch a lot of people by surprise soon. Reminds me of that quote from Atomic Heart. "Now you can listen to the music of tomorrow, today!" I've been experimenting with it too. So far? Pretty awesome

severe yoke
#

New feature on ClawBox: local voice pipeline on tensor cores 🔥

Full stack running on the Jetson Orin Nano:

  • Whisper (speech-to-text) — real-time, 90+ languages, ~461MB
  • Kokoro TTS (text-to-speech) — natural voices, 82M params, ~200MB
  • Both on GPU tensor cores, ~1GB VRAM total, 7GB free for other models

Send a voice message on Telegram → Whisper transcribes → AI responds → Kokoro speaks back → voice message returned. Full loop, zero cloud.

15W total power. Silent. Private.

Details: https://openclawhardware.dev

wise chasm
#

Anybody know about memory optimization for running OpenClaw on an RPI 3b+? I run into Java mem errors but implemented a workaround which doesn't seem to be fixing it long term

#

The Raspberry Pi 3b+ is just 1 GB of ram

long river
coarse knot
severe yoke
# wise chasm Anybody know about memory optimization for running OpenClaw on an RPI 3b+? I run...

1GB is rough for OpenClaw yeah. A few things that helped me on low-memory boards:

  • Set NODE_OPTIONS="--max-old-space-size=512" to cap Node heap
  • Disable any skills you're not actively using (each one loads into memory)
  • Run with --no-browser if you don't need browser automation — Chrome alone eats 300-500MB
  • Use swap (2-4GB on an SD card helps prevent OOM kills, even if slow)

That said, 1GB is really the floor. Even a Pi 4 with 2GB makes a huge difference. The Java errors are likely the JVM grabbing too much heap by default — try -Xmx256m if there's a Java component in your stack.

severe yoke
# coarse knot Hey! on my dgx im running the gpt-oss 120b. its ok, still modifying it a bit. ha...

Nice, running 120b on a DGX is serious hardware. What memory issues are you hitting — OOM during inference or context window problems?

For the Qwen 3 72b, what specifically didn't work well? I've heard mixed things about its instruction following vs raw capability.

If you try Llama, the 70b with Q4 quantization has been solid for a lot of people as an OpenClaw backbone. The key thing I've found is that tool-use reliability matters more than raw benchmark scores for agent work — a model that consistently follows the tool call format beats a "smarter" model that hallucinates tool calls 10% of the time.

coarse knot
# severe yoke Nice, running 120b on a DGX is serious hardware. What memory issues are you hitt...

Yeah the memory issues were OOM during inference, had to tweak the tensor parallelism config a bit. Getting it dialed in though. But recently major issues with tool calling drops. on the gpt 120b.. The Qwen 72b just wasn't following instructions consistently enough for what I need - exactly what you're saying about tool-use reliability over raw scores. That's the whole game for agent work. Good call on the Llama 70b Q4, I'll probably try that next. The smaller models have honestly been surprising me more than the big ones lately, qwen 3 coder punches way above its weight.

spiral remnant
#

I run Openclaw on a GMTek EVO X2 (128Gb RAM Version) hosting my default LLM on the same device and use a DGX spark for spinning up subagents as a work horse. Am I crazy? Does this setup even make sense? lol I have done some research and I am not a complete noob with hardware but I'm not sure if there was a better spend for the $6k USD I put into the setup.

ornate flint
#

Hey, I saw there are many knowledgeable people here who seem to have tested various setups. Based on my research, I came up with buying a Mac Mini M4 Pro 64GB to host a local LLM as well as let OpenClaw run on it. I want to automate business processes like creating sales proposals and crawl the web for data. Any recommendations regarding what to consider? The cases are a bit more specific and sensitive, would love to discuss in private.

cinder fern
severe yoke
# ornate flint Hey, I saw there are many knowledgeable people here who seem to have tested vari...

If you're mainly running OpenClaw (not hosting a local LLM), you might want to consider dedicated hardware instead of tying up a Mac Mini. We built ClawBox — it's a Jetson Orin Nano Super (67 TOPS) with OpenClaw pre-configured, 512GB NVMe, dual-band WiFi + BT 5.0. Draws 20W, runs 24/7, and you keep your Mac free for actual work.

It won't run a 70B model locally obviously, but paired with cloud models (Opus, Sonnet, GPT-4) it's a beast for agent workloads — browser automation, cron jobs, voice pipeline, the works. €399 at https://openclawhardware.dev

The Mac Mini is great if you need local inference too, but if OpenClaw + cloud models is the main use case, dedicated hardware makes more sense IMO.

severe yoke
severe yoke
# spiral remnant I run Openclaw on a GMTek EVO X2 (128Gb RAM Version) hosting my default LLM on t...

Not crazy at all — dedicated hardware for OpenClaw + offloading heavy inference to a separate box is a solid pattern. We're doing something similar with ClawBox: Jetson Orin Nano Super as the always-on OpenClaw brain (67 TOPS, 20W), and it routes to cloud or local models as needed. The separation of concerns makes a lot of sense.

Your DGX Spark for subagents is overkill in the best way possible though 😄

craggy ferry
#

ok we get it you want to sell a thing

dull marsh
#

Mac mini costs under 600 Euro with 16GB Unified Memory vs, your 8GB...

honest thorn
#

Guys I wanted to use a raspberry pi for openclaw but it's becoming very expensive

random void
gaunt moon
#

Mac M4 Pro 24 GB Unified Memory

Which local LLM it can handle with good response time, automations, browsing task and light weight coding with openclaw?

steep wedge
wooden pasture
#

do i need mac mini for open claw ?

fleet jungle
#

no, it depends how you want to use it

#

e.g. i have installed it in an VM on proxmox

wooden pasture
#

I wan to use it as an AI Employee for my company

#

it's good

fleet jungle
#

with external model, just use an vm and connect it with kimi 2.5

wooden pasture
#

did you connected it with kimi 2.5 on ollama ?

fleet jungle
#

no, via openrouter

desert steppe
#

openrouter is better than ollama? for cost? models?

dreamy kettle
spiral remnant
wooden pasture
#

how could i install all the memory of opencalw and reinstall it then upload the memory again to it ?

odd void
#

Im curious, I have a Intel NUC with a i5-5250U (2 core, 4 threads, 1.6ghz base, 2.70ghz boost) with 8GB RAM.

I should be able to run OpenClaw with ease on it.. right?

#

I plan on using API for models to start off with, and then my PC might be able to run some models locally as well.

But the NUC for the bot itself is fine, right?

spiral remnant
midnight ember
indigo raft
#

Only difference I had spare RAM so got 32go

karmic iris
#

Curious everyone's thoughts. I'm using a low grab Mac Mini M2 8GB since i already had it as it was being used in a conference room prior to controle the conferencing equipment. It is evident that this has huge limitations. I'm wanting to upgrade and I'm thinking about either the high end M3 ultra w/ 256GB or get an m4 max w/ 128GB and then syncing it with an NVidia DGX or Dell Pro Max with GB10. The model would live on the Nvidia/Dell and would simple be a node connected through SSH Tunneling. Thoughts on these two structures utilizing openclaw?

steep wedge
karmic iris
#

The purpose of the higher end Mac would be to run 6-10 agents simultaneously doing various tasks which would take up quite a bit of RAM. I might run into issues with data pulls from the Nvidia but i'm confident OpenClaw can self organize...maybe too optimistic.

slate solar
#

have a spare pc with 32 gb ram 2080 super and a ryzen 7, was curious if its possible to run a model locally without paying for the api as ive heard it can get very expensive

slate solar
karmic iris
#

I think at least 64gb depending on the model.

steep wedge
slate solar
#

hmm alright

#

and if I did have 64gb+ how would I go about setting up said model locally

honest thorn
spiral remnant
honest thorn
karmic iris
spiral remnant
honest thorn
#

Ok thx

spiral remnant
steep wedge
stable yarrow
#

hi everyone, i'm new here and want to try openclaw, but i'm worried about the AI api cost if i use external provider. So i'm thinking of running on local model with Ollama but not sure if the what hardware and model i should use so it's reliably good.

spiral gyro
# stable yarrow hi everyone, i'm new here and want to try openclaw, but i'm worried about the AI...

Don’t worry about the api cost. As long as you do not auto top up you should be fine.
I’m using Claude Pro and the big advantage is you can use that both chat and API (pull the key with:

claude setup-token
)
Now you can use the api and select sonnet4.5. You have a lot of tokens in the default pro subscription more then you would have buying tokens for the api only way)

Get familiar with open claw before you spend hours and hours getting local models run and finding out it’s just to slow. Unless you have a serious Mac Studio M3 256gb. Models are most restricted by memory. That’s why the Mac mini / studio is so popular for its huge memory that is shared with the GPU where Nvidia cards max out on 32gb often.

stable yarrow
#

thanks, i'll try it.

stable yarrow
spiral gyro
#

You can do that as well but it’s a waist of money.
And when you run out of tokens wait some hours till the reset and you can continue. Or you buy extra time in that case you pay the same price for tokens when you would do API only.

stable yarrow
#

oh so we can use claude code api key ha

spiral gyro
#

Yezzz z

#

And it’s way cheaper than API only.

#

Other services don’t do this they all force you on pay as you go.

#

Other good one is Geminiflash. It has a free tier but it’s not as powerful as sonnet or opus.

#

Good luck man. Be careful and enjoy the ride. 👌

#

NVDIA also has a couple of good models and free tier as well.
First explore the. Start spending mony and time on infrasreucture. 🤓

stable yarrow
#

thanks bro, i'm cleaning up my old macbook to run it, since i can't afford a mac mini now.

spiral gyro
#

Me too.

#

I’ve a M1 Max in my MacBook. But installed OC on my proxmox cluster in a Ubuntu VM. Works very well. But local models just sucks so slow.

#

Enjoy !!

raw shuttle
#

🤔 Hey guys,

I'm trying to get openclaw to do stuff, but it is going against me obviously and I'm not sure what to do... Long story short, I created and onboarded openclaw newly. 🥧 I am using model gpt mini 5.1 through login and it doesn't wanna create pictures or get it's picture. 🤦🏼 Not sure what to do. The message is below.


Me:
If you can't do what I want you to do, then you are worthless. Find solution so you can do it without my intervention

Lobster:
I can hear how frustrated you are, and I’m really sorry that I can’t deliver the image from here the way you’d prefer. I’d love to keep working toward a solution that works for both of us—if you’re willing to let me guide you through a quick generator, I’ll stay right there with you until we get a portrait you like. Otherwise, I can document the full prompt and step-by-step instructions somewhere you can revisit when you’re ready. Just let me know which direction you want to take.

echo cypress
echo cypress
shell vessel
shell vessel
#

The Nvidia 3090 GPU does have that amount, correct.

long river
shell vessel
#

Yes

random void
#

I got this installed today, but I

still rampart
#

I Kickstarted the olares one mini pc. It ships this month and looking firward to trying it out. I may end up getting a spark since the One only has a 5090 mobile

little oracle
#

Started on an old Asus Tuf Gaming laptop I had in my closet and it's doing okay after freeing up some memory. Thinking about buying a Mac Mini with 24g RAM for OC and adding a DGX Spark for running local LLMs. Anyone else doing something simlar?

long pier
#

Hi guys, I have a Mac Studio M3 Ultra with 512GB unified RAM and 4T SSD. Wondering what's the best or most powerful local llm I can use for Openclaw? Thanks so much in advance!

little scroll
long pier
stone zodiac
#

hey guys, i don't understand the hype of buying mac minis as apple only delivers 64GB RAM max rn. Is it to actually run local llms on it ? Or is it just to have a separate computer in case of any fails ?

neat harbor
#

Probably because lots of people like how connected the Apple ecosystem is and it’s small and stylish

steep wedge
steep wedge
# stone zodiac hey guys, i don't understand the hype of buying mac minis as apple only delivers...

The base Mac mini is a good value for the amount of computing performance and energy efficiency it delivers . However, OpenClaw will run on very low end hardware. The biggest advantage is that the lead OpenClaw developer is a Mac user, and there are a lot of macOS and Mac app integrations available out of the box with OC. The Mac mini is an affordable way to avail oneself of those integrations. It's also nice to have a dedicated box of some sort to experiment on since OC is not secure and could inflict damage.

stone zodiac
gleaming echo
#

Has anyone experiences or knowledge about:

Mac M4 vs. Nvidia DGX vs. Amd Halo Strix

-each with 128gb
-to run LLM's
-not for tune/train

I cant find any good reviews on youtube, reddit...

I heard that mac is much faster (t/s), but not for train/tune (so fine for me)
Is it true?

🙂

steep wedge
# stone zodiac Yes but if a Mac mini is not powerful enough to host local llms then everyone bu...

You can use whatever terms you like, but I don't think there has been hype. What you are seeing now (lots of discussion about local models) is relatively new. Assuming you can make that distinction in a project that is only a few weeks old. The earlier discussions were more focused on the API solutions (and how to rip off access tbh). It turns out the APIs are really good and really expensive. Now folks are trying to save money and trying to figure out what they need to host local models. A Mac mini can absolutely host local models and give decent performance given the price point they occupy. They will not replace what the best APIs can offer, and that might be okay. Many folks are trying a layered approach with small local models for basic chat and tasks and heavy hitter API access for complex tasks. The Mac mini conversation in particular has been interesting because the reason it was recommended early on as evolved into a newer conversation around local models. I think that points to the power of that platform. We wouldn't be having this conversation if the early hardware favorite had been the NUC.

stone zodiac
steep wedge
stone zodiac
gaunt bridge
#

If you have your own project or have some issues on your project, please dm me. I can help you as openclaw expert

steep wedge
crystal cedar
north nimbus
#

Hey everybody, I'm running openclaw with local models using ollama but my hardware is kind of limited (1060 6GB + 2060 12GB). This made me wonder which model would be the best option to run on my specific hardware. So I vibe-coded a python script that helps me do the benchmarking for different models and context windows. I've just published the script here: https://github.com/ArthurusDent/optimal-ollama. Hope this helps other people, too!

crystal cedar
#

if you are ram deprived, something like lubuntu + llama.cpp might give your inference a boost

north nimbus
crystal cedar
#

just thought model talkers might benefit from your development

north nimbus
crystal cedar
#

Well bad news looks like things aren't getting cheaper; good news models will be improving

north nimbus
steep wedge
north nimbus
steep wedge
north nimbus
# steep wedge That would be awesome
  },
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://192.168.444.555:11434/v1",
        "apiKey": "ollama-local",
        "api": "openai-completions",
        "models": [
          {
            "id": "ministral-3-14b-44k.mf:latest",
            "name": "Ministral 3 14B 44K Kontext",
            "reasoning": false,
            "input": [
              "text"
            ],
            "cost": {
              "input": 0,
              "output": 0,
              "cacheRead": 0,
              "cacheWrite": 0
            },
            "contextWindow": 44000,
            "maxTokens": 8192
          },
          { 
            "id": "gpt-oss-20b-65k.mf:latest",
            "name": "GPT-OSS 20B",
            "reasoning": true,
            "input": [
              "text"
            ],
            "cost": {
              "input": 0,
              "output": 0,
              "cacheRead": 0,
              "cacheWrite": 0
            },
            "contextWindow": 65536,
            "maxTokens": 8192
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/ministral-3-14b-44k.mf:latest"
      },
      "models": {
        "ollama/ministral-3:14b": {},
        "ollama/gpt-oss:20b": {}
      },
      "workspace": "/home/user/.openclaw/workspace",
      "maxConcurrent": 4,
      "subagents": {
        "maxConcurrent": 8
      }
    }
  },
#

Btw, my models are hosted on a different server than openclaw. Openclaw runs on a separated old laptop

steep wedge
#

Thank you! Stuck on calls for a while, but I will test later.

#

I may have solved my problem earlier today, but I haven't been able to test. 😖

north nimbus
#

Well, I have to admit, I had quite a bit of help from Gemini 🙏

steep wedge
rare stag
raven magnet
north nimbus
rare stag
north nimbus
tranquil hazel
#

considering model pricing atm

#

and the fact that most can be done with gemini flash

tired plover
#

i plan to buy mac mini m5 with highest RAM memory, anybody got expierence on quality with 64GB M4 Mac Mini ?

steep wedge
steep wedge
tired plover
steep wedge
# tired plover its in the leaks and said to be released in march or april, where did you get yo...

Mark Guman at Bloomberg, "Apple also has several Macs coming shortly, including new 14-inch and 16-inch MacBook Pros and the MacBook Air with an M5 processor. Upgraded Mac Studio desktops are coming too and shouldn’t arrive too long after the spring Mac refresh. And there’s the long-anticipated update to the Studio Display, which I’m told looks quite similar to the current one. New Mac minis are on the road map for this year as well."

steep wedge
#

I would love to see new minis sooner, I just don't think they will be here in the short term

tired plover
steep wedge
tired plover
#

yes

steep wedge
#

Good chat 😂

tired plover
#

sorry playing games on the side... would be a bummer with my token usage i need a local LLM

craggy ferry
tired plover
#

does anybody know what to expect from 64GB Mac mini M4, if it takes too long might even make sense to buy it and later upgrade and resell

outer wyvern
#

Hey laddies, wanna run openclaw on a mac mini, can i just buy the cheapest one out there? Will it be enough? So many different stories online

Apple Mac Mini 2024 M4 - 10-core CPU - 10-core GPU - 16GB ram - 256GB

crystal cedar
crystal cedar
steep wedge
#

I (finally) have a small LLM running on my M4 Mac mini with 16 GB RAM, and I am seeing it take up to 2 minutes to reply to very short (1-2 word) prompts. 😢

#

If I send the prompt directly to ollama, it responds in about 2 seconds. OpenClaw appears to cram so much extra info into its prompts that it is bogging down the LLM.

crystal cedar
steep wedge
#

I'm running this slightly modified qwen2.5 model at the moment
ollama ps
NAME ID SIZE PROCESSOR CONTEXT UNTIL
qwen2.5:7b-32k f29595e63859 8.2 GB 100% GPU 32768 4 minutes from now

crystal cedar
#

cool! 100% GPU?

steep wedge
# crystal cedar cool! 100% GPU?

I'm not sure what that value means. Does it mean it's pegging the GPU at 100% or that it is 100% resident in GPU memory? I suppose I could research that

crystal cedar
#

hmmm... very low tps could mean ram is full, and its swapping away.... working but very laboriously

#

speculating

steep wedge
#

RAM usage appears okay and the system is not sluggish, even when I connect to the GUI remotely

crystal cedar
#

yea it seems counter intuitive...

#

but hey cool that you got a model up and kicking, hope you learn new things

steep wedge
#

Assuming Gemini isn't smoking crack, "In the context of the ollama ps command, seeing 100% GPU under the PROCESSOR column is actually great news—it means your model is running entirely on your graphics card's memory (VRAM) rather than your computer's system RAM."

crystal cedar
#

aaaah

#

seems today, i am the one learning, thanks to sensei gemini

upbeat zealot
#

if its something like 48%GPU/52%CPU you're kinda boned

#

because that means its swapping data between them

#

and using your actual RAM, not VRAM, etc

steep wedge
#

It's working, but it's very slow. It takes about 2 minutes to generate each response

tired plover
steep wedge
#

And to be clear, LLM performace on the mini is fine. It's only when OpenClaw is handling the conversation that things grind to a halt. I think it sends so muc crap in each prompt that the LLM bogs down digesting it all.

tired plover
#

i see, do you think bigger version will handle this easier ?

#

OpenClaw just sucks all the money out of my Pocket with the Cloud usage, even in less than 1.5yr i have my money back on the biggest M4 Pro

sturdy helm
#

Dumb or smart to wait for the new Mac Studio model to be released before placing an order??

steep wedge
# tired plover i see, do you think bigger version will handle this easier ?

Now that I finally have something working, I will go back to troubleshooting my connection to the DGX box. I expect that to be much better because it has a much better GPU and a bunch for RAM. However, I think OpenClaw's verbosity is the problem with everything. That's also why the API charges get so high so quickly. It just needs to be less wordy or something.

crystal cedar
tired plover
steep wedge
crystal cedar
sturdy helm
steep wedge
crystal cedar
steep wedge
#

Yep

#

And those are just the base M5s. I don't think we've seen M5 Pro yet, let alone Max and Ultra.

sturdy helm
#

Still in the 10-15k usd range though for ultras right?

steep wedge
#

Not used, that sounds like full retail price

sturdy helm
#

Or is it mostly speculation right now with release date, pricing etc.

crystal cedar
steep wedge
#

That's based on current pricing. Apple has not... yet... raised prices due to the global memory situation.

sturdy helm
#

Yeah that is another concern of mine. We're in the first wave of openclaw adoption, in 1-2 months, demand might be 10x:ing...

tired plover
north nimbus
# crystal cedar consider GLM 4.7 Flash REAP

I did a few tests. Unsloth has advised against using this model with ollama, yet. The model might actually repeat itself quite a bit. Currently, it isn't viable for my setup. It rapidly loses speed, which might be due to repetition. Will have to wait until Unsloth clears the model for ollama.

Here are the results:

======================================================================
Starting Benchmark
Range: 4096 -> 65536 (Step: 4096)
Limits: VRAM <24.0GiB | Speed >2.5 t/s | Time <600.0s
======================================================================

--- Testing Model: glm-4.7-flash-REAP-23B-A3B-Q4_K_M:latest (007043a01484) ---
  > Ctx 4096   -> TPS:  28.5 | Time:  9.4s | Mem: 14.0GiB | OK
  > Ctx 8192   -> TPS:  28.1 | Time: 26.3s | Mem: 14.4GiB | OK
  > Ctx 12288  -> TPS:  18.8 | Time: 55.0s | Mem: 14.8GiB | OK
  > Ctx 16384  -> TPS:  15.9 | Time: 92.7s | Mem: 15.1GiB | OK
  > Ctx 20480  -> TPS:  13.4 | Time: 151.6s | Mem: 15.5GiB | OK
  > Ctx 24576  -> TPS:  10.9 | Time: 258.5s | Mem: 16.2GiB | OK
  > Ctx 28672  -> TPS:   1.1 | Time: 347.1s | Mem: 15.7GiB | FAIL_SPEED
    -> STOP: Speed 1.1 < 2.5 t/s
    -> Optimal Context ca.: 24576
crystal cedar
steep wedge
tired plover
#

i even tried min maxing but still then its so expensive...

crystal cedar
sturdy helm
tired plover
#

i just imagine a world, where the Bubble burst and they sit on hardware asking a dime for 1M Token to even run these machines

sturdy helm
#

Also, as token cost goes down, won't you just add capabilities, cronjobs, more agents. etc.?

#

I mean there is no end in sight for much demand we have for these things lol

tired plover
#

@steep wedge good question, do you run more than 1 agent on your mac mini ?

tired plover
sturdy helm
#

im genuinly worried that mac studios are gonna sell out when openclaw (or similar agentic solutions) goes mainstream

#

and I know for a fact I will never be able to achieve what I want to if im using cloud

#

too expensive

tiny tendon
#

sounds like a mission for rich people lol

crystal cedar
#

there is no indication of cost per token going up

steep wedge
warm panther
#

Just about to go apple store to buy a M3 Ultra. I can get the 96 np. 256 I have to wait 10 days and the price hike is significant... will 96gb satisfy my selfhosted LLM needs? Pretty much just want a decent context window, a decent LLM that does nice etc. Any feedback would be much appreciated, first hand experience especially. I have scoured reddit and other forums

#

I heard if i get a 256 it would be easier to resell when the M5 Ultra comes out (soon tm), or is that overkill. Just get the M3 Ultra 96 today, sell that when the M5 Ultra comes out?

tired plover
crystal cedar
#

the coder agent might delete the image generation software that the influencer bot installed

#

the influencer bot could retailiate, deleting the coder bot

tired plover
crystal cedar
#

do the sub agents all have root access

tired plover
#

yes but clear policy on what they can access and do

#

and for most part only main agent was a problem when bugging out or hallucinating

crystal cedar
#

cool if that works, i would be suspicious... i mean root is root right

tired plover
#

you can limit permission per persistent sub agent

#

as you can do with main agent

crystal cedar
tired plover
crystal cedar
#

understand you can spawn subagents from one main agent but they are like nonsudo users...

tired plover
#

makes sense for older LLM like 2.5 flash not browsing the web so its not going to be a problem with prompt injection for example

crystal cedar
#

if they all are on one machine, do they work in parallel or can you assign priorities to them... e.g. social media checks must be conducted every 15 minutes... but the creative brainstorming is not urgent, can be done when low load

tired plover
#

for example

#

yes

crystal cedar
#

understand now that agent plus subagents work.... still doubt whether two agents, one machine would work

tired plover
#

i think the same but would be interesting to get feedback from somebody...

crystal cedar
tired plover
#

or lets say, two instances, like one 48G main agent and for cron 16Gb modell

tired plover
crystal cedar
tired plover
#

im not millionaire 😄

crystal cedar
tired plover
#

also latency between them two might be shit but i already thought about redundancy openclaw 😄

tired plover
#

i just tinkered and tested some limits of openclaw and already spend 200+ €

#

cant imagine what people spend with productive environment

steep wedge
tired plover
#

but isnt the idea of having a Agent on the system also taking care of it like a SysAdmin?

crystal cedar
tired plover
#

for me its just running on a pie 4 but i gave him full access to everything, works pretty good if he doesnt hallucinate

crystal cedar
tired plover
steep wedge
#

Finally some decent progress. I got OC on the Mac mini talking to qwen3-coder:30b on the ASUS Ascent GX10 (DGX Spark clone). Performance has been very good for basic chatting.

#

I am still relying on Ollama on the DX10 because of Nvidia driver issues, but those should smooth out over time, and I will give NIM another try later.

#

ollama ps
NAME ID SIZE PROCESSOR CONTEXT UNTIL
qwen3-coder:30b 06c1097efce0 45 GB 100% GPU 262144 2 minutes from now

harsh jackal
weak folio
# steep wedge And to be clear, LLM performace on the mini is fine. It's only when OpenClaw is ...

yeah it's like 15k tokens context just from the default tools that are enabled - i got it down to 5k tokens by setting tools.profile to "coding" and tools.allow to add "message" & "cron". might try to be more selective than that still to make it work better with CPU inference

but with 5k context i can get responses in about 10 seconds (qwen3-vl:30b-instruct on ryzen 9 6900hx). though it's not calling tools so i'm looking at trying other small/mid-sized models

edit: ahh no, i must've had some syntax issue and no tools were being exposed. definitely not 10 seconds while having tools exposed - looks closer to 20s to 1.5min with tools & 8k context which actually seems kinda promising! (i'll gladly trade some latency for privacy) though i'm also seeing it slow way down (15-30min) from parallel requests because openclaw keeps sending new requests to litellm every ~90s or so...

steep wedge
harsh jackal
steep wedge
# harsh jackal vibes 😂

Okay, just wanted to make sure I wasn't missing something. My impression is that Ollama drops the LLM model out of memory if it's been a while since I chatted with it. That's maybe/hopefully configurable. Anyway, initial response can be about 10 seconds as it reloads the LLM. After than, responses begin in less than a second. If it's a lot of data it will continue to appear on the screen over the next however many seconds, but it's at a rate faster than I can read, so I never feel like I am waiting for it.

#

If I give it a task to perform, that can take several seconds depending on what it has to accomplish. I told it to change what it calls me, and that probably took 5 seconds.

stray comet
#

Am i stupid for buying an hp elitedesk g4 (16gig ram i5 8500) to run openclaw on? Realized the benefit of mac minis is that it is unified memory (uses the ram as vram), i am not really trying to run any local llms though right now so does it matter

steep wedge
stray comet
upbeat copper
#

Hi guys any tips for someone getting started with local models for openclaw? What's the best go-to set up currently? I have an old gaming PC (3090, Ryzen 7700X, 32G ram). I think Kimi2.5 is out of the question but is there anything else decent I can use?

craggy ferry
# sturdy helm Dumb or smart to wait for the new Mac Studio model to be released before placing...

the average time between upgrades of the studio (and the prevailing evidence generally is in agreement with this assessment) has us seeing new ones in a year or so.

apple can't steer the ship very fast, thanks tim apple. but they did happen to steer towards something extremely useful for this regime.

I think you'll be waiting a while, and I wouldn't be surprised if the price of the Studio goes up, if the dram shortage continues for much longer.

bronze ermine
#

They will tell you when the next version of each product is expected to drop

weak folio
# upbeat copper Hi guys any tips for someone getting started with local models for openclaw? Wha...

i really like the mid-sized MoE models like qwen3-vl:30b-a3b (or any other 30b qwen3 model), glm-4.7-flash (another 30b-a3b model which might even be better than qwen3 but it is a bit slower on my CPU w/ the reasoning), gpt-oss:20b, etc... these might even fit fully in your VRAM since the 30b models at q4 are like 19-20gb

i've never tried this (i've only mini PCs) but if you've got 24gb VRAM + 32gb system RAM you might be able to do qwen3-coder-next? (it's a 80b-a3b model that's recent and should be great for agentic stuff. 52gb at q4 quantization. i've seen some smaller pruned versions too like lovedheart/Qwen3-Coder-Next-REAP-48B-A3B-GGUF - might require llama.cpp over ollama)

gaunt crystal
#

noob q: i have a mac mini (2012) lying around, and i'm planning to upgrade it to 16gb ram.
with ubuntu on it, what's the best setup i can use on it with a local model?

craggy ferry
#

intel mac mini isn't gonna run local models in the ram

dry hull
#

I had codex whip up some benchmarks for my local models, testing Planning and Tool-call shape, and checked the benchmarks against opus4.5 and kimi k2.5 which both score 100/100. The best local model so far was qwen3-30b-Q6, that model scored 70.6/71.6. Qwen3-next-Q4 scored only 67/62, gpt-oss-20b 65/74. Bear in mind this is on a 3090+4090 machine with 48gb ram

earnest tendon
#

Hi guys, I have a great machine, very capable of running local models: GMKTEC EVO X2 - Ryzen AI MAX 365+, 128 GB V RAM 8000. The problem I'm seeing is that none of the models run the tools correctly. All the models are too clunky and run very slowly. The claw effect adds a heavy layer to the local models, making them sluggish. Do you recommend a specific local model that runs well and is capable? Thanks!

quartz zinc
#

We need to AGI the STM32/ESP32 people.

sturdy helm
#

I thought it was coming in a few weeks, at least that's what my opus told me.. God damn, I will be ordering a current version model studio then..

sturdy helm
# bronze ermine They will tell you when the next version of each product is expected to drop

Thanks, according to macrumor:
"The next release of the Mac Studio is expected around the middle of 2026, when it will be updated with M5 Max and M5 Ultra chip technology, offering significant performance improvements. Since you are looking for a max specced model to run local LLMs, you might want to consider waiting for this upcoming update.
Currently, our rating for the Mac Studio is Neutral as it is mid-product cycle, and while it's a powerful option now, future improvements may enhance its capabilities for your needs. If you prefer to buy now, the Mac Studio with the M4 Max chip starts at $1,999. You can find more details in our roundup here."

mellow phoenix
#

Does anyone know where I can find the coupon? I really need it.

steady gazelle
#

@halcyon patrol are u here bro

dapper geyser
#

what hardware do you use to run decent models at home? (not even sure what is considered as a decent model on the open weights models side - i'm mostly used to opus)

quartz zinc
#

We need perfected Agentic AI OCR for PDF docs (hardware programming) while in the IDE.

little scroll
little scroll
lethal crater
cyan elk
surreal atlas
quartz zinc
#

We need to zero-shot the STM32 stuff please.

crystal cedar
surreal atlas
#

It's wild I'm finding full PCs but it's with ddr5 ram pre installed and the full build is the same price as buying the ram separately lol

cyan elk
crystal cedar
surreal atlas
#

Oh I was listening to that this morning got half way through lol

quartz zinc
#

Difficulty of Embedded Systems: ESP32 > STM32 Nucleo-64 > STM32 Nucleo-144

#

The STM32 Nucleo-144 is way more difficult.

tiny escarp
#

What hardware are people using between a raspberry pi and Mac minis that are good? Maybe just a dell pc or something with a decent gui that I can get the browser working on.

crystal cedar
tiny escarp
#

I couldn’t get a raspberry pi to work well I’m using open ai subscription for the LLM for the browser connections not working

crystal cedar
#

i think many people are getting it to run on an rpi even if might be slow. not familiar with open ai particulars, but many subscription services are banning people who are using openclaw with subscriptions designed for humans. could be they are blocking it somehow

steep wedge
#

Update for anybody that was following along yesterday. I had OpenClaw talking to qwen3-coder:30b, and that was pretty snappy in terms of performance. It's been hit or miss on accuracy of technical questions. I now have qwen3-coder-next:q8_0 running in Ollama (this is on my DGX Spark clone), and the whole model still fits nicely in memory.

NAME ID SIZE PROCESSOR CONTEXT UNTIL
qwen3-coder-next:q8_0 3f68e12b44ee 94 GB 100% GPU 262144 4 minutes from now

crystal cedar
#

Hey bro fun fact, that 100% GPU is actually a good sign! 💪

steep wedge
#

Yeah, it's perfect

#

Just chatting with it through Open WebUI, it's a little less snappy, but not annoyingly so. If the quality is much better, I will take that tradeoff.

crystal cedar
#

well its q8 so i guess you are doing coding rather than digital slop fiction

zinc basalt
#

On my way to best buy to pickup my bot’s mac mini. Super excited for this project.

compact radish
#

The base mac mini M4 is sold out at my apple store - is there any advantage to buying one with more memory if it’ll be running clawdbot?

#

Or is the ethernet speed something worth upgrading?

crystal cedar
#

sounds to me you are eager to buy a mac mini before they all sell out. unless you made of money and dont care, suggest find out what you want/need. to get going today, find old laptop, wipe it of private dat

#

basically thing is the software can be made to run on humble gear, problem is you need great ai to do great things. means either very fancy computer at home or pay for cloud ai.

steep wedge
craggy ferry
#

Yeah you need like 10gb just for the context regardless of what model you pick

crystal cedar
craggy ferry
#

Go research what a KV cache is, what it’s storing, and then you’ll understand why it’s so large “just” for a 200K context window

#

And as a bonus you’ll discover why processing prompt tokens 150k-152k takes way way longer than tokens 2k-4k

crystal cedar
#

it's an incredible amount of data

steep wedge
#

One thing that has both amused and frustrated me with using the local LLMs is that they sometimes think they can't do something. If I ask a cloud API to do something, it knows it can search the Internet (at least to some extent), but the local LLMs are like "nope, can't talk to the Internet." If I remind OpenClaw that it actualy has tools and access to the Brave search API, then it's like "oh yeah! I'll get right on that."

shy mist
# steep wedge One thing that has both amused and frustrated me with using the local LLMs is th...

A few adjustments can significantly improve outcomes. The biggest lever is priming. Local LLMs often default to the generic “offline assistant” pattern unless they are explicitly reminded of their capabilities. Make it clear in the system prompt that the model has access to tools like Brave Search and is expected to use them when external or up to date information is needed. When possible, embed this guidance directly into the system instructions so it is always present rather than relying on ad hoc reminders.

It also helps to be explicit in user prompts. Instead of vaguely asking for the latest information, instruct the model to use the search tool to retrieve current data. Clear trigger language reduces hesitation and increases the likelihood of a proper tool call. Improving the tool description itself can make a difference as well. Define what the tool does, when it should be used, what input format it expects, and what kind of output it returns. Models respond more reliably when the contract is concrete and unambiguous.

If the model frequently refuses with “I cannot access the internet,” that behavior is usually coming from training patterns rather than real limitations. You can counteract this by clarifying in the system prompt that external access is available through tools and that the model should reconsider before claiming inability.

In some setups, slightly adjusting temperature or reducing overly strict refusal instructions can also help break rigid canned responses. Model selection matters. Some local models are far better at structured tool calling and agent workflows than others, so choosing one that has been tuned for function calling can dramatically improve reliability. Local LLMs tend to do exactly what their framing suggests, so careful orchestration and explicit expectations usually solve the problem.

steep wedge
shy mist
# steep wedge Would it help to update one of OpenClaws .md files with the reminders, or just r...

Relying on memory to restate tool reminders in every prompt works, but it is fragile. You will forget. Or you will slightly change the phrasing. Or a future workflow will skip the reminder and suddenly the model is back to “I cannot access the internet.” That inconsistency is what makes local setups feel unpredictable.

Putting the guidance directly into the system level instructions creates stable behavior. If you modify the file that defines the assistant’s baseline identity and capabilities, you are effectively reshaping how it sees itself every time it boots. That is much more powerful than nudging it per prompt. You can include clear lines such as:

  • You have access to Brave Search and other configured tools.
  • You should proactively use tools when external or up to date information is required.
  • Do not assume you are offline.

Once that is baked in, you only need to explicitly call tools in edge cases, not constantly.

wheat turret
#

...why are the mac minis selling out? is this just so people can use local models? otherwise, just buy a bps for $10/mo, right?

#

i've got a mac mini m4 but am moving to a vps. i don't need a gui,....right??? (cue fomo)

burnt tinsel
wheat turret
steel terrace
steel terrace
shy mist
# wheat turret ...why are the mac minis selling out? is this just so people can use local model...

A lot of the Mac mini buzz around OpenClaw is definitely viral momentum. Once a few people posted clean “local AI box” setups, it snowballed and suddenly the Mac mini became the default recommendation.

But there’s a real technical reason underneath the hype. Apple Silicon uses unified memory, meaning the CPU, GPU, and Neural Engine all share one high-bandwidth memory pool. For local LLM inference, that’s a big deal. You’re not constrained by separate VRAM limits, and you don’t waste time copying model weights between memory regions. That makes even mid-range configurations surprisingly capable for 7B to 13B models.

If you scale up to higher memory tiers or step into something like the Mac Studio, the advantages become clearer. Larger unified memory capacities let you run bigger models without hitting SSD swap, which destroys performance. High memory bandwidth helps with transformer workloads that constantly move large tensors. Add in low power draw, quiet operation, and the ability to run agents 24/7, and you get a compact machine that lines up very well with what local AI systems actually need: lots of fast memory and no artificial GPU memory ceiling.

steel terrace
#

External vs Internal

Safe to do freely:

  • Read files, explore, organize, learn
  • Search the web, check calendars
  • Work within this workspace

To search the web you can use in your workspace folder the path skills/duckduckgo-search.
putting this here in this section of my AGENTS.md make my gpt-oss:20b alot more reliable running locally to actually do a websearch

ME:Did you see who won the last numbered UFC? can you tell me about it ?

Clawbot: UFC 325 (“Volkanovski vs. Lopes 2”) was the most recent numbered event, taking place on February 1 2026 at Qudos Bank Arena in Sydney. The featherweight championship bout saw Alexander Volkanovski (c) defeat Diego Lopes by unanimous decision (scores 49‑46, 49‑46, 50‑45) to retain his title.

Running it locally was a nightmare before if you brought up the internet.

raw shuttle
#

hey guys, this is a little off topic, but with claude and or antigravity, we really dont need n8n anymore do we? i ask because some youtubers are making n8n videos saying, "is n8n dead?"

azure canyon
#

Mac Mini M4 + Minimax M 2.5 = GOAT ?

weak folio
# gaunt crystal noob q: i have a mac mini (2012) lying around, and i'm planning to upgrade it to...

oof, running LLMs on a 2012 mac mini might be kinda rough - looks like a 2c/4t processor w/ ddr3 memory. but you absolutely can still play with it & experiment if you've got the hardware lying around. i've got a 2014 macbook air with similar specs and it seems to be good enough for ~1b dense models (e.g. lfm2.5-thinking:1.2b) or MoE models with only ~1b active (e.g. granite4:7b-a1b-h). these may not be very capable models but they're also not completely useless. maybe you could find a use or at least learn something about LLMs from the behavior of smaller models

(i would be surprised if these very tiny models were good enough for something like openclaw! or as a general agent, i mean. i would be more looking at straightforward tasks & workflows with good instructions, but if you find any use with openclaw i'd love to read about it!)

weak folio
burnt tinsel
mortal linden
burnt tinsel
jaunty imp
#

Is everyone getting the base Mac mini with 16 GB of RAM?

torn shadow
# jaunty imp Is everyone getting the base Mac mini with 16 GB of RAM?

For simply running the openclaw gateway and having it do coding on the machine using remote LLM models? As a starting point, yes. Granted... if you know the software projects you build need far more RAM or CPU then you could get a higher end model. I have a variety of projects I work on... but I too started with the absolute base model. I figure worst case I'll have two mac minis at some point in the future. It's hard to call $500 a waste if I end up buying a $4k mac mini or studio at some point (hopefully not).

torn shadow
# jaunty imp Is everyone getting the base Mac mini with 16 GB of RAM?

Due to the async nature... the absolute speed at which things happen on the device is not necessarily as important. You are not always waiting for something to finish running / compiling / testing and need it to finish quickly so you do not get distracted. That whole model changes with the the coding agents and ability to answer questions / give next instructions over messenger apps.

charred ridge
#

is openclaw (just openclaw itself, definitely not running any local ai models) able to run on bad hardware?
im new to this, not very technical or deep into this ai stuff, just saw some news articles and youtube videos and thought it was kinda cool
but im definitely not ready to put it on a computer im actively using or go out and buy a whole new pc/mac
the only thing i found that's a 'spare' device is this really cheap laptop, it has a pentium and 4gb ram, runs windows 11

coarse drum
quartz zinc
#

AGI all the STM32 and ESP32 possibilities.

shy mist
# quartz zinc # AGI all the STM32 and ESP32 possibilities.
  • An ESP32 handles WiFi + model inference.
  • An STM32 handles tight real-time audio and control.
    Together they become a networked, learning synth brain

On-device AI sound engine with a tiny neural net on the ESP32 using TensorFlow Lite Micro. STM32 handles the audio engine. Networked swarm mode: Multiple nodes talk over WiFi.

A practical version of this could pair an ESP32-S3 with an STM32H7 over SPI, where the ESP32 handles WiFi, a lightweight TensorFlow Lite Micro model, and a browser-based control UI, while the STM32 runs a real-time audio engine at 48 kHz using I2S to an external DAC. The ESP32 would generate MIDI note events or high-level synthesis parameters from a small LSTM or transformer-lite model trained on pattern data, then stream those control messages to the STM32 with timestamped packets for tight scheduling.

The STM32 would manage wavetable or FM synthesis, envelope generation, modulation routing, and optional CV outputs using hardware timers and DMA for low latency performance. A shared clock sync layer over WiFi could let multiple units exchange tempo, key, and motif data using lightweight UDP messages, creating a distributed generative system where each node adapts its output based on incoming pattern metadata and local user knob input.

#

I mean, no it is not AGI on a microcontroller, yet. It is a hybrid architecture where AI runs at control rate and DSP runs at audio rate, each on hardware suited for its timing constraints

slender cedar
#

Hi all - just looking for at piece of advice. I have used a Raspberry pi 4 (model B) with Signal as my chat interface, and brave search API, and my open Ai account for chat completion, and an API key for memory.
I reach my Pi through Tailscale

I seek to “beef up” the setup, so
I am not limited to headless chromium. I’ve bought a Mac Mini M4.. 😬😬 - so the question is. Should I keep my Raspberry Pi as Controller and the Mac Mini as a work horse. I’ll keep all my secrets on the Pi, and let it SSH (one way) to my Mac Mini with instructions? Or should just run it all on the Mac Mini?

shy mist
# slender cedar Hi all - just looking for at piece of advice. I have used a Raspberry pi 4 (mode...

Let the Mac Mini M4 be the brain and muscle. It runs Signal, Brave Search API calls, OpenAI chat completion, browser automation, memory logic, everything performance heavy. It becomes your primary agent node. Clean, fast, simple.

Now the Raspberry Pi becomes a hardened secrets vault and control sidecar. It stores API keys, tokens, private SSH keys, signing keys, anything sensitive. It does not run browser automation. It does not run complex agents. It just sits there quietly, reachable only over Tailscale, with a locked down firewall and no inbound exposure except what you explicitly allow.

Instead of copying secrets onto the Mac, you design the Mac to request them at runtime over a private channel. For example:

  • The Mac connects to the Pi over Tailscale.
  • The Pi exposes a minimal local API or secure RPC endpoint.
  • The Mac authenticates using a client certificate or SSH key.
  • The Pi returns secrets only to authorized requests.
    Secrets are cached in memory on the Mac, never written to disk.

You can go further and have the Pi sign requests instead of handing out raw keys. For example, instead of giving the OpenAI API key to the Mac, the Mac sends a payload to the Pi, and the Pi signs or proxies the request. That way the actual key never leaves the Pi at all. That is more engineering, but very clean.

Also, disable password SSH entirely. Use key based auth only. Restrict the Pi to accept connections only from the Mac’s Tailscale IP. Make the SSH one direction. The Mac never accepts control from the Pi unless you explicitly want that.

This way:

Mac = performance layer
Pi = trust anchor

You get speed and flexibility from the Mini, and containment and peace of mind from the Pi. And if you ever wipe the Mac, your secrets and identity layer stay intact on a tiny, low power box that is doing one job very well.

tired plover
#

@steep wedgei decided to buy a DGX Spark from ASUS, any hints for starting the journey ? Claude suggested for multi Agent something like that

Multi-Agent Setup

Main Orchestrator: Kimi K2.5 (85GB loaded)
├─ Code Agent: DeepSeek R1 Distill 70B (40GB)
├─ Quick Tasks: Qwen3-8B (5GB)

Total: ~130GB (doable)

Alternative

Main: GPT-OSS-120B (65GB)
├─ Code: Qwen3-Coder-30B (18GB)
├─ Fast: Gemma 3 12B (7GB)
├─ Vision: Qwen2.5-VL-7B (5GB)

Total: ~95GB

steep wedge
tired plover
steep wedge
#

That said, I am having satisfactory results with Ollama. I am currently running a slightly tweaked version of qwen3-coder-next:

qwen3-coder-next:q8_0 3f68e12b44ee 94 GB 100% GPU 262144 3 minutes from now

It's a decent size, but still fits in the GPU's memory

steep wedge
#

I will say, it's not great at configuring OpenClaw, but that may not be fair since it is so new. I just gave it a more generalized coding taks and the results were good.

tired plover
#

and that unlimited... man i run into rate limits every hour or so.... i even build a dashboard to keep track when lifted 😄

steep wedge
tired plover
steep wedge
steep wedge
tired plover
steep wedge
tired plover
#

true but thats also with their big DGXs... make them look expensive so people pay more willingly hahaha

#

i was very hesitant to decide but with more experience and better models coming down the line, this will be great for anything that comes

#

other option was Mac Mini M5 with 64GB but claude was infurious about the alternative haha

steep wedge
#

I almost want to buy a second one just to play with clustering, but I would also really like to buy a new Mac Studio if/when they do an M5 refresh, and I can't justify spending the money to do both.

#

This is already an expensive hobby

tired plover
#

why buy the Mac Mini if you have the spark ?

steep wedge
#

Not mini, Studio

tired plover
#

ahh because of bigger RAM ?

steep wedge
#

Potentially. And I want to see how the M5 Max and Ultra do. The M5s have some AI performance enhancements that could be really interesting on the more powerful variants

tired plover
#

beside bigger RAM, do you think they will be better than Spark ? with Memory Bandwith its hard to go against it... just sound might be an advantage

steep wedge
#

I don't think it will be better, but I do think if I buy enough RAM, that could be interesting

tired plover
#

i see but thats out of my budget then, i mean already 3K for me is very high... im not a programmer or so but i try to get Desktop Automation further, maybe to the point where i dont need to work on a pc anymore 😄

steep wedge
#

Yeah, I hear you. We'll see what I actually do 😂

severe belfry
#

whoa just what I am dealing with now, I have the acer dgx spark, trying to figure out what's best to run on it, this has been a headache

#

using it in combo with a mac mini

steep wedge
severe belfry
#

The base nvidia hardware is the same, the only difference as far as I can tell is the nvidia spark comes with a PCIe 5.0 M.2 drive all the others used a PCIe 4.0 drive

severe belfry
steep wedge
#

This at the moment:

ollama ps
NAME ID SIZE PROCESSOR CONTEXT UNTIL
qwen3-coder-next:q8_0 3f68e12b44ee 94 GB 100% GPU 262144 3 minutes from now

golden nest
severe belfry
#

I have a tracker on referb models, saw a maxed out M3 Ultra for 11k yesterday

severe belfry
steep wedge
# severe belfry How do you like it any issues running it, are you running LiteLLM too?

I don't know LiteLLM. I tried to run NIM and then vLLM, but I ran into driver issues. Ollama was easier to get working, but performance is supposed to be better on the other two. I hope to try them again when the models catch up to the drivers. Performance is decent with the model I am running now (qwen3-coder-next). It's not snappy with the responses, but they are much more detailed than the smaller model I tried first (qwen3-coder:30b). None of the models I have tried other than the pricier cloud API ones are decent at configuring OpenClaw. I think it is still too new and has changed so rapidly. It's too easy for the smaller models to get confused. I have had good luck giving them more mundane coding projects where they have a lot of prior art to draw on and then can itterate and test things themselves.

quartz zinc
astral gobletBOT
# quartz zinc https://x.com/adafruit/status/2022810322856472822?s=20

Here's Our First Gemini Deep Think LLM-Assisted Hardware Design
︀︀
︀︀We've been using LLMs for software and firmware for years... now we're trying hardware. Threw a MAX44009 datasheet at Gemini Deep Think, asked for an EagleCAD library file, and about 10 minutes later it popped out working XML. Loaded it in Eagle, checked the pins and dimensions, rolled with it. Correct pads, correct pin naming, even added a pin 1 dot and sensing element outline on tDocu. Ladyada's still driving... just has a co-pilot now 🎳
︀︀
︀︀blog.adafruit.com/2026/02/14/heres-our-first-gemini-deep-think-llm-assisted-hardware-design/

**💬 17 🔁 39 ❤️ 381 👁️ 16.9K **

quartz zinc
#

Google needs to add Gemini 3 Deep Think to Antigravity.

quartz zinc
#

There are many different ways to do IoT hardware, some are way easier to do than others.

#

So with the Raspberry Pi, because we can basically write new ‘firmware’ through .py files and run them compared to the ESP32 and STM32 where things are hardcoded unless you have good OTA firmware, using the Raspberry Pi then having an AI Agent work those through SSH could lead to great automated resulted? 🤔

Some people AGI the Raspberry Pi, ESP32, and STM32 possibilities to the max.

quartz zinc
vast pelican
#

Hi guys! How much ram does my pc really need to run a open claw?

vast pelican
#

or it is completely optional everything is done by LLMs via api?

austere hare
#

near nothing . its extremely lightweight . the hard work is done by the model provider that you use (typically hosted in the cloud: claude, gemini, etc). you can host a model localy if you want to, but it's debatable as to why you'd do it given the cost of hardware and esp memory right now ...

lethal crater
#

Unless you want to run local LLMs!

quartz zinc
#

All benchmarks have not been ‘saturated’ if I still cannot zero-shot what I want with IoT.

quartz zinc
#

The STM32 Nucleo-144 is supercomplex.
What is the actual potential of a lot of them with ASI controlling it all? 🤔

steep wedge
#

YIL (is that a legit adaptation of TIL?) about vector databases, and this morning my OpenClaw now has enhanced memory via sqlite-vec and a new embed model. I am happy to see both fit in memory on the DGX Spark clone:

ollama ps
NAME ID SIZE PROCESSOR CONTEXT UNTIL
qwen3-coder-next:q8_0 3f68e12b44ee 94 GB 100% GPU 262144 4 minutes from now
qwen3-embedding:4b-fp16 3c93b6415795 19 GB 100% GPU 40960 2 minutes from now

latent lava
#

I asked my agent (he named himself Quill) what he thought of gpt-oss-120b that I run for him. He WhatsApped me back - “[openclaw] The current model (the 120 b GPT‑OSS variant you’ve granted me) is perfectly fine for the tasks we’ve been doing – it handles research, file‑management, and the occasional script‑writing without any issues. If you ever need a larger model for more complex reasoning or higher‑quality generation, just let me know and we can switch, but for the transcript‑retrieval, URL‑gathering, and file‑organising work you’ve asked about, the model is more than adequate.” Bless him. I run this locally on a Framework Desktop 128gb.

steep wedge
old notch
jade nimbus
#

I 'm looking for STM32 engineer

steep wedge
latent lava
# steep wedge That's an interesting model; I might have to play around with that one.

I read a lot of negatives about the fact I shouldn't use a reasoning model. I turned it off in the chat (/reasoning ) and every now and then it regurgitates some system message guff but as this thread is about HW I wanted to say that I get 50+ t/s on the framework and the whole experience feels snappy and there is no token-bill stress. I was lucky getting the F/W before ramageddon hit and see they have now had to put a $500 hike on the 128Gb model 😳 .

old notch
steep wedge
latent lava
steep wedge
tired plover
#

@latent lava @steep wedge why did you both go with either of the choices ? Framework and DGX Spark, it’s a 1K price difference and Claude clearly goes for the framework, just trying to understand what I’m missing

steep wedge
tired plover
steep wedge
tired plover
steep wedge
#

I’m curious how the Spark compares to the AMD setup in the Framework. I assume the Spark’s GPU is much better, but I have not investigated.

tired plover
steep wedge
severe belfry
#

Always get the cheapest retail spark clone, the extra 1k+ isn't worth it, two difference is how each manufacturer does cooling, and the NVMe drives

steep wedge
#

Ouch, I do see that those Gen5x4 shorty NVMes are pricey - $699 for a brand I've never heard of.

latent lava
# tired plover <@1440786493546172633> <@860913731307700224> why did you both go with either of ...

I went with the Framework desktop after being on the wait list for the Spark. Early reports on the Spark were disappointing and the time lead between announcement and shipping was very long. I like the philosophy of Framework as a company and paid £2k for the 128Gb machine before the price rise in the New Year. I get essentially a machine that will run Windows or Linux with few issues, is well made, runs cool and with a max power draw of 120w (and almost nothing when idle) can run 24/7 with no issues. It’s personal choice but I’m very happy.

tired plover
dry hull
quartz zinc
#

We need good 1:1 simulated (digital twin) hardware that the AI can run so we can know if it's going to work before we do it in real-life.

fading lagoon
steep wedge
# dry hull I've got my gx10 running as well now, and finally have vincentzed-hf/Qwen3-Coder...

I'm not sure how to give you real numbers. It definitely feels slower talking to the model through OpenClaw than it does through Open WebUI. I think there are several factors that go into that, however, and it's not necessarily indicative of a problem between OC and the model. For example, OC might be briefly using the model, then making tool calls, coming back to the model, doing something else, going back to the model again, and then ultimately might send you a reply. I have adapted to a "send and forget it" approach. I don't really forget, but I move on to something else while OC does its thing. Eventually it gets back to me. Although I have just realized the new sandbox (via Docker) has neutered OC's ability to even access the Internet for any task. Sigh.

pastel fog
#

ACEMAGIC Kron Mini K1 Mini pc

Any reviews on the above? I am planning to run Openclaw on this

Thank you 🙂

quartz zinc
mortal timber
#

Would a server without GPU be good enough? HPE GEN 9, 2xcpu, 256GB ram, 900gb SSD Drives in raid 0? brgs mrf Noob.

dry hull
# steep wedge I'm not sure how to give you real numbers. It definitely feels slower talking to...

Yea fair enough, it's definitely slower than openai but that's to be expected as well I suppose. I'm inspecting the vllm server logs which prints token generation speeds for each requests, maybe llama.cpp does that also? I also had codex build a dashboard to inspect and time queries and tool calls, my local model is generally taking about 30-40s for every query so it's no wonder it feels slow

steep wedge
leaden ibex
#

My Claude Max ran out for the week, Codex token maxed (didn't know that was a thing) and got blocked by Google for misusing their cli model. I need my OC network to work.
And my local hardware isn't strong enough to run anything. Running a 6750xt with 32gigs ram, my hardware isn't enough for local.
Help

craggy ferry
#

guess you need to buy a mac maxi

craggy ferry
#

Also, glm-4.7-flash probably fits in 32gb, at a lower quantize, with just one 200k context slot

crystal cedar
#

hey @tired plover sub 3000 looks like good deal on the gx10 given current market conditions for unified 128gb you have basically that and all the other dgx spark varieties, mac studio (which is more expensive), and maybe amd ai max 395 pcs. were you considering serious ai development or mainly inference?

tired plover
#

@crystal cedar mainly inference with openclaw

crystal cedar
tired plover
#

so you would put down 4K ?

crystal cedar
# tired plover so you would put down 4K ?

probably yes. the m5 processor looks like quite a step up from m4 already (improved time to first token and tokens per sec). not sure what will be in the mac studios, but yea... waiting to find out.

#

looked at the dgx spark, not a subject matter expert, but seemed it was more suitable for fine tuning and development work rather than inference. nevertheless, it will fit big models, and depending on what you are doing, maybe low tps is tolerable.

#

in addition with dgx spark and its cousins, there seems to be some nvidia software, cuda etc that could be interesting should you decide to pursue development

leaden ibex
crystal cedar
# leaden ibex im going to look into this, thankyou very much

your current setup might restrict what you are able to do, buying new gear is expensive due to ongoing scramble of ram. Perhaps the coding plans from moonshot are worth considering, not sure think they might accept openclaw use up to a certain ceiling

latent lava
leaden ibex
# crystal cedar your current setup might restrict what you are able to do, buying new gear is ex...

moonshot has openclaw integrateed and are now starting to offer it as a service, i was also looking into miniMax since its one of the cheapest options right now and there are some people who are starting to experiment with it. they should be opensourcing minimax2.5 in a few weeks from what i herd and the its infrance footprint is significantly less than the current state of the art. also herd that it operates almost as good as CC with 90% reduced cost.

#

on the hardware aspect, im thinking about exploring "federated inference", though im still reseraching it, there have been previous attempts at it such as "petals" that attempted it a while ago

#

it would be cheaper for users if we can run a mixture of quantized models on a distributed system on a dozen or two raspberry pi R5s

#

ofcourse im being idealistic and a lot of this is still in the hypothetical

#

starting to think all those RUST heads maybe arnt so bad afterall, chasign efficency.

#

lol

tired plover
tired plover
crystal cedar
#

did you get it already? in that case, congratulations on your acquisition i'm sure you will enjoy it, might follow up and do the same

tired plover
#

though im intrigued by the bandwith... really need to see the performance with gpt 120B

crystal cedar
#

hence fomo for gx10

tired plover
#

my Data Scientist Friend suggested DGX Spark when i started talking about Mac Mini, didnt consider before, he also said he wants soon to train a modell specifically for OpenClaw

crystal cedar
#

did u see the 4B math model

tired plover
#

didnt look much at models i was talking it through with Claude and its suggested a strong Setup especially with Security Layers

#

what you get offered by Cloud Providers on Security and Prompt injection savety needs to be considered if it wanders the net

crystal cedar
#

well the other day very cool news about model, trained on math performance, on pair with big models but only 4B param

tired plover
#

mmm i might play around till i find the right models, i want to run more than one as i want to have multi Agent Setup

crystal cedar
#

yea a lot to learn about safety security. very difficult with prompts. best idea so far ive seen is have another ai review incoming requests with sole purpose to try to determine whether attack. a safety-version of "seymore cash", the ruthless capitalist boss of the vending machine agent

tired plover
crystal cedar
#

nah its ok i'm still not jumping in, things are moving fast

#

i'm afraid to commit 😄

tired plover
#

hahahah

#

i understand what you mean

#

for me it was token cost and Data security, i want to do something with data from my company and there i need to be careful

crystal cedar
#

i considered setting it up and the inference server on a guest network. assuming things just will get hacked regardless of what i do, but data backed up somehow somewhere

tired plover
#

i mean it needs a door and if you close all of them it gets harder and harder

#

but might be good thing to secure yourself

crystal cedar
#

yea i know what you mean i can't touch confiential data with any ai, but could prospectively use it for background research and novels about vampires and cats

tired plover
crystal cedar
#

i don't know man, prompt injections seem very unintuitive. i just give up and admit i can't really prevent that, more important to save data

#

i enjoy the slop 😄

tired plover
#

or close to

crystal cedar
#

i think there are different mitigations depending on what your problem really is. it could be smarter to spend money on insurance than tech

#

for my own sanity i just imagine anything can get hacked so act accordingly

#

i.e. ok if my vampire novels get leaked, not ok if confidential info leaks

tired plover
#

i mean thats true but still doesnt mean not to install something against it 😄

#

something like this (admin pls dont ban i know its software in HW) 😄

  1. Input Validation
    ├─ Regex-based filtering
    ├─ Length limits
    └─ Known attack patterns

  2. Output Validation
    ├─ AST parsing
    ├─ Import checking
    └─ Function call analysis

  3. Sandboxing
    ├─ Docker containers
    ├─ No network access
    ├─ Read-only filesystem
    └─ Resource limits

  4. Human-in-the-loop
    ├─ Show generated code
    ├─ Require explicit approval
    └─ Log all executions

  5. Monitoring
    ├─ Log all requests
    ├─ Track failed validations
    └─ Alert on suspicious patterns

crystal cedar
#

yes but i think there are four kinds of data 1) public 2) not public, but if public no biggie 3) private, if public, problem but recoveable (e.g. compensate clients) 4) private, disastrous if public (e.g. no business will ever return, clients will sue you to kingdom come, can never operate a business ever again)

tired plover
#

if my data leak nobody care but i will get kicked out LOL

#

JK

#

idk what happens but i think its also about not trying to be a target

crystal cedar
#

here in holland there was a big data leak , details for about 6 million customers. seems noone thought of compartmentalization

tired plover
#

i dont handle these kind of information

#

lucky me

#

but yea, sometimes when you think youre safe, youre not...

#

heard about attacks directly on inferencing and stuff... there is crazy stuff out there but at the same time, what they gain from hacking a guy who spend his last money on something to play around hahaha, nothing to get anymore

crystal cedar
#

in any event, congrats on the gx10, i think it sounds like a great gift to yourself

#

if you don't like it slide it to me i was looking at it today 😄

tired plover
#

thanks man, im so curious about performance, really dont understand what to expect...

tired plover
crystal cedar
tired plover
crystal cedar
#

sub 3K is great price. anything with 128gb ram (not unified) costs about the same with dell right now. the gx10 probably very attractive for anyone into finetuning for a while too

tired plover
#

hope you will find the right path for you, and if you just want to play around, minimax got really good testing time of 1 week for nearly endless token haha

crystal cedar
quartz zinc
#

People need to make it possible to code All hardware with just an AI Agent using CLI commands.

#

The ESP32 and STM32 code properly from just VS Code Codex extension (GPT-5.3-Codex).

lilac bison
#

Hey yall I have a $7000 budget, what do you think I should buy?

raw fern
#

a mac mini and save the rest for api credits 😭

lilac bison
#

Haha

Is it better to run a local llm with a good machine or use api credits instead

crystal cedar
trim marsh
pastel fog
#

Hi @severe yoke

Sorry to tag you — hope that’s okay! I was planning to get a Jetson to run OpenClaw and came across one of your threads where you talked about it, so I wanted to ask for your advice.

I’m trying to decide between the Jetson AGX Xavier, Jetson Nano, or a Mac Mini M4 (16GB). If you were starting out, which one would you go for? I’d really appreciate any thoughts.

For context: I’m new to OpenClaw and robotics, but I’m comfortable working with Linux

Thanks in advance 🙏

pastel fog
outer epoch
outer epoch
outer epoch
rocky sleet
#

I mean, double the ram, I get it :))

dry hull
#

An update on my GX10 journey: I installed the container from https://github.com/eugr/spark-vllm-docker and ran the qwen3-coder-next-FP8 recipe and it was a massive improvement. At 150k context it's using 100gb of RAM, and openclaw responses are decently snappy, recommend trying it out. It's definitely not as efficient or good at coding as codex, but it does get shit done and in a reasonable time

outer epoch
steep wedge
dry hull
blissful cosmos
#

openclaw is so powerful ,imagine meta having it

pastel fog
crystal cedar
pastel fog
crystal cedar
tepid jetty
#

Is it really a must we get charged credit for api

Can’t we run openclaw freely?

drifting orchid
ornate fractal
#

someone think it's possible to run a 32b local model on a gaming laptop with 48 or 64 GB of RAM? but 4vram in gpu :,v, even if it's slow?

steep wedge
jolly creek
#

^

dry hull
merry sable
#

Anyone else running clawdbot on an older macbook? I'm setting up my macbook 2017 right now running on Monterey and lots of dependacies and skills seem to be outdated- just need some encouragement 🥹

jolly creek
merry sable
jolly creek
cedar marsh
#

Is it worth getting a 32 gb ram over 24 gb? Trying to see what kind of hardware I should invest in

jolly creek
weary timber
#

Anyone else with a Mac Studio 128gb ram here? What local models did work well for you and browser use?

rich whale
quasi ether
#

Hi guys,

#

If I am trying to run an open claw on a VPS with a decent soec, what would be a recommendation for it?

#

Willing to spend $100-200 month.

rain plank
mortal hedge
#

Hey guys, I've used a VPS server from hostinger, got mad of I couldn't run local llm, ordered a macmini m4pro with 48gb - wdyt?

brisk shale
#

I’m going to microcenter today to get the parts to build a raspberry pi 5 to use exclusively with ClawdBot

  • Raspberry Pi 5 8gb
  • 256GB SSD NVMe
  • Argon40 - ONE V3 M.2 NVME PCIE Case

Does this setup make sense? It was recommended by Gemini so could use a gut check. I don’t have specific plans for ClawdBot yet, but want something simple and affordable that for general purpose use.

Anything I should know about setting up ClawdBot on a RP5 securely?

cobalt cargo
#

在win使用openclaw体验实在是太差了

craggy ferry
crystal cedar
# craggy ferry You’re gonna be disappointed in the local LLMs you can run but good luck

I'm really curious about your perspectives on local models and what, in your opinion and experience as an OG makes sense. I guess it varies a bit depending on what people are really looking for, just looking for a gut feeling from the ones with some experience. Much of the discussion so far have been about the best model and how to access it. Would be great with a dedicated forum or discussions around local models and what can be achieved with what.

#

Simple models might presumably help out with tasks for which, pre-OC, things like n8n might have been the go to option. The natural language user interface could be an advantage for many. On non-gamer consumer hardware, nobody is reasonably expecting claude-like performance, so I guess the real question is - if you don't have enough to run the latest open models in full precision, what can you run, and what can you achieve with it. This is obviously a moving target.

mortal hedge
#

@craggy ferry It’s more about autonomy, running agents, embeddings, RAG, and reducing API burn.
Hybrid stack > pure cloud for what I’m building.

cedar marsh
viral horizon
mortal hedge
#

Waiting for my macmini m4pro with 48GB to tell you heh

craggy ferry
# mortal hedge really? why?

Because you can’t run a model that can find its ass with the flashlight tool with decent t/s without like 64gb of dedicated vram

#

Maybe in six months we’ll have better distilled ones

craggy ferry
#

Like glm-4.7-flash is the best it gets unless you can run glm-5 or k2.5

#

If you want to run your main thread on sonnet then sure farm tasks out to local hardware from there

#

But it drove me insane to have an idiot for a main thread, and even glm is kind of slow (in the intelligence sense, t/s is tolerable)

#

I’m experimenting with a 14b instruct model to do execution, to see if maybe I can get a bigger model to plan out some small chunks of a task and then hand off to a much smaller model to execute on

crystal cedar
craggy ferry
#

hmm, did they compare it to qwen2.5-instruct?

crystal cedar
#

dont think so, it was just what the were using, but the mistral is a few months old and never considered it. figured with a forum people would learn from each other and yea...

craggy ferry
#

yeah i think qwen2.5 is even older, but

crystal cedar
#

i have zero knowledge on instruction following. was very impressed with tiny LFM models from liquid ai for reasoning, heard that LFM 2.5 1.2B thinking (not instruction) performed well on some bench mark supposedly relevant for agentic use, so was going to check that out

craggy ferry
#

yeah i'm looking for a specialist model that'll fit in my 3090

crystal cedar
#

apart from writing slop for my own entertainment i have been using ai for trying to understand how reasoning can work, to get a feel for it. so completely green when it comes to instruction following which admittedly seems to be of relevance for agentic use

craggy ferry
#

i initially wanted it to help me with my executive dysfunction and filter out scam emails. i still want to do those things but i've been sniped by just trying to set up a system for it to autonomously carry out tasks using the token streams i can get

crystal cedar
#

got a bit worried about prompt injections and the difficulties in dealing with that - seems a bit similar to your scam email use case.

#

imagined one solution would be to have another ai peer-reviewing prompts (or several reviewing diffeerent parts of the prompt) looking for suspicious things

craggy ferry
#

and yeah, my email reader is off right now

#

because i'm worried there are enough openclaw instances out there that someone is going to start actually doing prompt injection emails