#hardware
1 messages · Page 3 of 1
oh dang, okay so ill deffo be waiting a bit.
im trying to oauth now i have ai pro, didnt know about gravity.
Read this as well, so you can turn your usage limits to 5 times more using 6 different accounts
https://support.google.com/googleone/answer/9004015?hl=en&co=GENIE.Platform%3DAndroid
ah so just sharing with "family" wouldnt you need to reauth each time?
i really only need a good model for brain, code. ill prolly use flash lite for other small things, getting up from 1gb ram to 24gb ram is whats going to help the most i think lol
yes, but there's proxies for that 😉
If you have a proxy with 6x google pro accounts like this, you could run claude opus 4.5 pretty much forever, and use gemini 3 pro for general stuff
Has anyone run a model on the Blackwell chip set yet?
It looks like a NAS could definitely do a minimal instance of the gateway and run some basic agents. But if you start doing advanced tools or need sandboxing you'll need more than what a smaller Synology NAS provides. I had Opus 4.5 review the openclaw repo and build a "suggested hardware requirements" if you want to take a look
https://gist.github.com/sdolgin/bc15d2844268e8e55459c94cb54b3799
Minimal (headless gateway) 1 vCPU 512MB
Standard (gateway + channels) 1-2 vCPU 1-2GB
Production (sandboxing + browser) 2-4 vCPU 4-8GB
Heavy (multi-agent + media) 4+ vCPU 8-16GB
Ill check it out thanks. I do see larger projects in the future... I just want to get the fundamentals of how it works first
I'd say the safest way to get your hands on it to learn the fundamentals might be a VPS or cloud hosting on a small VM or container.
i pretty much agree on these, though i did have browser automation on my setup just not very good, 1vcpu, 1gb ram vps. from google. running 1 channel.
just go all the way and get a 512g m3. with ram prices the way they are it's kind of a bargain
hilariously bad advice
🙂
D
and i dont suffer from fomo or other market pressures. I need 64gigs, would like 128. Woudln't buy more than that.
but i can udnerstand the sentiment
why not just run kimi2.5 3-bit
asus selling a blackwell platofrm for 3k at 128gigs, so its really between that and the mac mini m4 pro 64 or 128
I mean yeah you can probably get away with a deeper quant on a smaller model and be fine
im good with minimax m2.1 3bit 🙂
im just looking for a way to imrpove latency on responses, as strix halo sucks when the context window is larger than 8k
does the m4 add anything to the performance characteristics? inference is memory bw bound right?
yes, but the more important factor is prompt processign speed which is bandwdith bound
i guess thats what you said
i dont care a ton about overall tokens per as long as its double digits
yeah prompt processing sucks a little bit on my setup too but only when it has to swallow an entire 120k prompt uncached
(on my 3090+6000ada rig)
i cant imagine you're waiting several minutes
if it started out hot, it does take a while
yea, i wish i hadn't bought this one
or well i was gonna do like an aspire
thats 2400 that could have gone toward a mac studio or dgx spark
but two of that kind of device is basically a 512g studio
yeah
i mean, it's not, but, girl math
hahaha
100%
strix has its uses, for non agentic stuff like general chat and research its actually really good
and i find minimax is awesome
but i want it with openclaw
it starts off insanely good and and then the lag ultimately kills long running tasks
so im using glm flash on a pair of 5060 ti's i have in my homelab cluster
that works pretty well, but its clearly not quite as capable as minimax
at this point i think its worth spending the money but i dont want a repeat of strix
yeah i'm running glm flash too until llamacpp gets their shit together with qwen-next
and the q6 gets me enough spare vram to run 4 slots
probably more, but that's ... enough for now
well, i appreciate the feedback. im goign to keep mulling over the choice for the next week or so and try and get as much data as i can
by next week im ordering something.
i've been at the apple pay screen a double click away from ordering my bad financial decision
so many times lol
like, it's financed! if you were paying for claude max 20x, that's like $200 per month you're saving right there!
Has anyone successfully updated to 2026.2.3?
Hello,
New to this community. I have a gaming PC running on AMD Ryzen 9 9950X + 128GB RAM + RTX 5090
Looking for some suggestions on how to best setup this to make most of openClaw.
I initially started with a very old dell laptop running the main gateway and gaming system running local llm and embedding model.
hey all - I recently built a Reachy-Mini robot (Hugging Face launched it as open source hw) and thought it would be cool to have him embody my OpenClaw agent 🙂 So we can have face-to-face conversations and he can see and hear me, and I can see some physical emotional cues in his responses. You can check out the code here to experiment if you have one too, or just try in a simulator. https://huggingface.co/spaces/tomrikert/clawbody. Fun if anyone is interested in experimenting with:
• Embodied/robot integrations
• Personality or agent UX
• Voice agent pipelines
I’d love feedback, ideas, or collaboration. I’m curious how people are thinking about connecting OpenClaw to hardware like robots or physical assistants.
I've been researching robots too, would love to have something run around the home
I'll probably by something off the shelf to start
it'd be nice if we could get a dedicated channel for robotics
I feel that buying a cloud server imposes great restrictions on claw, making it impossible to automate x and ins.
?
Robot with OpenClaw agent
Maybe my configuration method is incorrect?
how are you getting this to work. I keep getting errors saying that it's no longer supported
With these specs, you could run one 30B paramater LLM such as Qwen3 or or its Coder variant, or you could run multiple smaller LLMs for different tasks,
There's a fix for it somewhere in the internet, about changing version number in antigravity config, but i never had to do it idk why
ok. spent hours messing with it and couldn't get it to work for some reason. Even had CC help me and couldn't find a solution that would be opperational.
you just change version number of antigravity
Ooh i like the idea of running multiple variant idea.
Quick question: windows vs linux vs wsl2 on windows
For running LLM
I recently came across vLLM
If your gaming computer is your main PC, I’d recommend staying in windows and try to run OpenClaw in a docker container to sandbox it from the rest of your PC, then run ollama in windows and connect the local host url to OpenClaw
Thanks i will let you know how it goes.
Awesome. I highly recommend asking Claude how to run ollama on windows and link it to your open claw instance in the docker container. Ive been chatting to Claude about it the last several days, and when I get my Mac Mini M4 later this afternoon that is what I will be doing
Interesting… i do want to test kimi with openclaw as well heard its game changing
Thinking about spinning up few agents to test few llms in parallel
That sounds like a pretty good idea. That’s the great thing about docker containers, you could setup multiple docker container instances of OpenClaw and link each one to a different model.
Since docketed containers are lightweight Linux vms, they don’t take up nearly as much space as a full vm
why not run different models under single instance of openclaw?
If he’s wanting to test how individual models work with OpenClaw in parallel at the same time is the only reason i can think of for doing that. Not a bad idea of you want to spend a few days testing which model is best at what task, but ultimately one instance of OpenClaw with multiple models for different agent profiles makes more sense.
My own experience has been that you need a frontier model to setup your foundation. But I guess it depends on how much you tend to over-engineer your own setup.
What are your recommendations on getting setup with a frontier model. I have done a decent amount of research on just trying to see how to setup OpenClaw with local LLMs but never really thought about if it is good to setup everything from the beginning with local LLMs
well, I am just one degenerate so take it for what its worth...
Is spent 2 days trying to set it up on MiniMax M2.1 because it seemed to be a good cost/quality balance.
Kept running into errors and having major issues (may have not helped that I did this just between the rebranding of Clawdbot/Moltbot/Openclaw..
Gave up, moved to run Kimi as they offered 7 days free on their code model. It ran with significantly less issues and was almost enjoyable (lol..).
I used Anthropic models a bit, but its not officially available here and the tokenburn was insane at the beginning...
That’s my biggest fear of using a frontier model. I have the $20 plans for both ChatGPT and Claude and I’m preferring Claude a lot more since I started to use it. I don’t want to get banned for using OAuth, and would rather keep as much of my more sensitive personal data out of the hands of these big companies as I can anyways.
Yeah, im getting close to have a foundation for development where Kimi orchestrates the initiatives and then assigns sub-tasks to sub-agents running cloud or local models.
need to run it for some work stuff, so having decently powerful local models was a requirement.
k2.5 is a good deal for you i guess
0.99$ for the first month as test is perfect
hey all, anyone running on GMKtec Mini PC Ryzen 5 3500U 1TB NVMe SSD 16GB DDR4 2.5GbE NIC LAN Office Home Desktop Computers, HTPC Proxmox, HDMI 2.1, USB-C, DP Triple 4K Display Nucbox G10 ? Thinking of setting up 5-8 agents on it.
The DP Triple 4K Display capability and and Office Home Desktop functionality will be really important for this
hehe
running openclaw on shitty lenovo yoga and the fans wake me up every morning
yeah i bet i have old piece of junk here and don't want to run this on it. I rather have something small, no noise and lower power draw.
Will 2018 Mac Mini work
a fucking raspberry pi with no fans work im sure 2018 mac mini is gonna work
Hey guys, I was wondering whether it's worth it spending money on a new mini PC or whether I can use my old desktop for running agents on it? The specs are Intel Core i-7-8700, 16GB RAM, NVIDIA Geforce GTX 1070 Ti, and what are the limitations going to be of running on my old PC rather than buying a new one?
Hi. Depends on your goals. I think the easiest to get started and experiment is to setup a machine in the cloud. like aws ec2, or contabo vps or hetzner. And run the AI model somewhere in the cloud also. like claude/opus or kimi 2.5
everyone talking about mac minis what about some other mini pc? any cool viable options?
Beelink is pretty good I hear
depends on what youre goals are
i think its distilled into too simple an equasion, unless your plan is to run cloud models
then you could run this on a literal toaster
and it wouldn't matter
if you want ot run local models, then you need to have a better idea of what you're trying to do, and what models are suitable for the tasks you want to use and /or grow into
The original developer uses a Mac, and there are a lot of macOS integrations available out of the box. You can run it on anything if you don’t care about those benefits.
Mine is running on a raspberry pi 5 8gb, cloud models only
Very low power draw, works perfectly
Agreed, I bought a Beelink S12 Pro used on Ebay for $150. You're going to be dependent on cloud services for everything except like Whisper for voice transcription, but that's all you need for running the openclaw interface. For a luddite like me I appreciate the security that no matter how badly I fuck this up, it's impossible for it to access my files
yeah an rpi is a pro choice, some people just want to have a nice machine
Mac mini is super popular because of the shared VRAM and RAM so for base model you get 16GB of ram and you can use up to 8GB or so for GPU, a comparable GPU will cost you between 280-400e while mac mini all in one will cost you roughly 600e or so, but you have other macmini models that have up to 64GB of ram that is up to 32GB of vram which is super powerful if you want to run models locally.
you can use a lot more than 32 from the 64gb models. maybe a model you can run on the 64g would be worthwhile for this, but 8gb you're not getting anything useful
If you want to play around with it, though, it is an option
I guess it's all up to the use case what kind of workflows you want to run or use it for. I am now in process of setting up on my PC and will see how much can actually be done with local LLMs, supposedly my setup should be powerful enough to help me with some of the stuff that I want it to be doing, I will see. Otherwise Cloud models will get cheaper and cheaper every couple months as chips get better and spit more tokens out. So that is good news as well.
I'm guessing local LLMs are more for the sake of security and keeping things "offline"
keeping things offline is definitely part of it, but also, tokens are expensive and i don't share your optimism for their future price. it's well known that AI companies are losing money even for tokens that hit their APIs and are charged properly, let alone the extremely subsidized flat rate plans
i don't want a surprise $400 bill because my agent got a little wonky in a way which the API absolutely hated
if i buy hardware and run the llm on there, then i have kind of locked in a price per token
it's slower, sure, but not intolerable
Haha! me neither and I saw that some people are spending a ton of money daily some of these youtubers I guess for the sake of testing and so on plus they can afford it but I saw that someone was averaging like 100$ a day.
i've heard tales of 1.1k days
Exactly first I would like to start benefiting from continuous workflows where I need to interact as little as possible and lock that part in, than I will start thinking of other things.
i have a hybrid approach, i let my local model run the main threads, but it can run sub agents that call out to opus to do things i want done "right"
idle chatting is expensive, because you are shipping huge context around and only incrementally getting replies. but it's actually really easy to keep all your prompts cached if you're the only one using the server
after i chat with the local model for a while and try out some stuff, i ask it to prompt opus to do the thing
Nice one I was looking at that option as well and picking some cheaper models as a hybrid setup. I saw that some models are extremly cheap or almost free, tho not 100% sure of exact TOS for those APIs
DeepSeek DeepSeek V3 128k $0 $0.0000 $0.00
Google Gemini 2.0 Flash 1M $0 $0.0000 $0.00
openai will give you a month of codex for free and they 2xed the usage limits
that would appear to be the best option right now
Cool
Well I still have about a week of work before I will start doing actually anything useful. Setting up this LocalLLM in Windows/WSL Linux configuration was really painful. So first I want to make sure that I harden the whole setup, write whole documentation and install process down in case anything happens I can than set it up again within hours and not days or if I want to replicate.
Have you been running it for a long time what kind of setup do you have ?
i was 100% opus for the first few days and now i'm running glm-4.7-flash on a 3090+6000ada
Very nice!
hey folks, been using a few different local llms as the main agent but havent found one thats smart and quick enough for orchestration. Any luck from anyone else?
glm-4.7-flash
Yall have some crazy setups, is any of this possible with just a 5070 Ti?
it's a beast I get > 10t/s tg on a 10 year old 4 core i5 at 2.5 Ghz without any GPU in Q4KM and F16 K/V
that sounds impossible
like, I get that
oh wait nm i get like 7x that so yeah that might track
OSS20B is 6.1 nvidiaOrchestrator 2.0 Qwen3-32B 1.2 (on fresh prompts - will go down I know...) ... but for ewaste it's pretty ok
Good to know I was reading about quantization, but didnt know about K/V, will be doing some benchmarking and tuning in upcoming days.
the K/V quantization might give you longer context, but depending on the backend Vulkan/cuda/avx2 on cpu , it might actually slow down during conversion
Anyone have any luck running local models? The only thing that has worked for me to actually get a chat back is to use Ollama model openclaw-qwen3 someone optimized. Maybe better to start using llama.cpp? Running on 16gb unified mac
Just for general use when I want to switch off of codex
End of the day, unless you're one a machine with 512GB of ram or more, the local models just aren't worth it. Try to get codex/gpt mini or nano, or gemini 2.5 flash, or other small "few tokens used" models for the busy-work. Otherwise it's too much hassle
I am surprised there is no Pin on a minimum requirements - or beast mode set up
Mac Mini 🥇
As I am starting to understand the project more, this definetly is not something that an average joe would be using, so no need for those kind of things, btw I think that one can limit tokens etc in config udner model/providers so it doesn't go insane if you are using cloud.
My take on the best strategy so far is to run a local model for the main agent, so it can keep all the context on you, itself, the workspace etcc...without costing an absolute fortune online. I guess for regular interactions and managing things it doesn't need to be fast, right? Just smart enough. And a lot of that comes from the context it has loaded. Then it can spawn off other agents that are more task-focused... On the other hand, models like Opus 4.5, Kimi K2.5 do seem head and shoulders smarter than anything else when dealing with me, the dumb human with its inherently incoherent communication style. Or maybe that's just because I let them browse the entire .openclaw/ folder tree one time, and they were smarter because of that. What's you guys take on the best way to integrate both local and paid APIs?
I feel like I've tasted the future, but right now it's just prohibitively expensive. Studio M3 Ultra with 512GB RAM is looking more "affordable" by the day.
aight models aside... my first failure was trying to install it on my 2015 MBP thanks to the legacy OS 12.7.6 which is the highest supported for that machine.
Has anyone managed to do it on a 2015 MBP? Either on the supported OS or by using hacks to upgrade the OS beyond that?
I just having the hardest time to setup on a windows box, my powershell and likely my WD is killing every attempt to install via the one-liner
Your best bet is to install a low weight Linux kernel on that 10 year old MacBook Pro. Try xubuntu 24.04.
Best local models for running on rtx a4500? 20gb vram
There is something everyone is missing. You are talking about hardware to run your ai on but what about hardware your ai runs?
What’s the relevance of posting this?
Bought a used Dell Optiplex 7060 Micro. Runs openclaw really well. I combine this with an existing coding plan I'm on (Minimax previously => now Kimi Code). Makes for a very affordable setup.
Used Dell Optiplex can be bought for as little as $100, depending on spec and your location.
You are aware it runs on a calculator let alone an oversized paperweight
lol yes
is 4 GB DDR5 ECC RAM enough for openclaw?
Many times over
i run it on a pi4 b, so yea
is mac air m2, 8 gb ran fine for openclaw?
Has any try openclaw with NVDA Nemo Nano 30B? I gonna give this one a try once i can get my claw to talk to me...
I’m figuring out what to run and how to run it. The Mac mini looks like a good option, and so does running a local model. What’s your experience?
Is the base Mac mini (10-core ARM) enough for local models? How good is the support for running local models on the ARM architecture?
I’ve only run local models on my 4090 before, but I haven’t tried it in a long time. I assume 16 GB of RAM won’t be great, since VRAM will be shared from system RAM.
Do you think 24 GB is better, or should I go for 32 GB on the standard M4 mini? Does anyone here have real hands-on experience — and if so, what local models can you actually run on it?
Thanks for the replies, guys — may your Jarvises serve you well :). Tony Stark soon 🙂
the real problem is the amount of memory , you can always run quantized models that use less bits for the weights but the quality of the output is not the same... and you need to hope that someone did the quantization you need
you can get an idea of what you need launching lmstudio that does an estimation of the memory required and you can browse the models easily
I have the same ... nice
Ayo, 🦞
Got it running locally on a Mac studio M2 ultra 128GB.
qwen3-coder:latest 256K
But it struggles with 256k had to scale down to 32k.
Yes I got it to run locally, check out.
You really want that context tho. Try a quantization
GLM-4.7-Flash is the current goto, I believe you need a 32gb+ model to run it smoothly.
M4 Pro will be significantly faster compared to the standard M4 due to memory bandwidth
Yeah glm can at least be coaxed into solving problems itself
Can someone please confirm that I'm either dumb or not insane?
I have an RTX 5090, which is the higher end of consumer GPUs right now. Simply put, there is no local model that comes anywhere close to the top 5 cloud models, maybe not even the top 10.
The best I can put on there is Qwen32B, DeepSeek32B, or maybe DeepSeek70B at a 3Quant. Yes the compute is "free", but it's all so much worse than Codex 5.2, or Opus 4.6.
I'm having a very hard time understanding why people are buying $10k Mac studios, and even daisy-chaining multiple $10k mac studios, just to run the ~6th best model.
Am I missing something? Have I not optimized my hardware? Maybe my use case is just different, but I genuinely do not understand why people are spending fortunes for local compute right now.
I have had a lot of problems with local models on an RTX 5090. I can't find a good model that fits into 32GB VRAM and is also good at agentic tool calling. If you're just doing thinking tasks or like video processing, sure, but I'm mostly just duct-taping stuff together and doing lots of tool calling, and everything seemed to always fail
You are correct. Consumer gaming GPUs are not capable of running LLMs that can do this. You simply do not have the vram to run a model big enough with a big enough context to make that work.
The break point is around 64GB. That’s why Macs are so prevalent. Those $10k Studios have 512GB of vram (effectively).
The sixth best model is what, kimi2.5? Minimax? I get reasonable results from glm-4.7-flash. It’s no Opus but it does do some things.
Maybe if you had two 5090s.
Keep in mind that the value of the tokens we can generate using this hardware will only go up over time. Open models get better all the time, new stuff is constantly released. Yes so will the cloud models but if your benchmark is today’s cloud models then six months from now you can probably see that quality in a local model … if you can run it.
Makes complete sense. I thought I was going insane. A Mac studio makes much more sense in this case. Unified memory FTW. Thank you!
I run local models for privacy reasons and because I'm a pennypincher. local models can do most of things I'd pay for a lot if I used cloud models
Thank you for sharing this. I was wondering what Mac's dominance in this space was about
I didn't realize a Mac Studio had 512 GB of vram. I don't know of any GPU cards that have that. I'm sure they exist Im just not familiar with them
Yeah Apple's decision to go off of Intel and into their own Apple silicon in 2020 was accidentally the best possible move for them
For serious local-model work, the top-end Apple Silicon Mac Studio is ideal. Think M4 Max or if you plan to train your own, go for the M3 Ultra. That gives you high‑bandwidth unified memory, lots of neural‑engine throughput, and double the Thunderbolt lanes.
- Chip: M4 Max Chip - 16-core-neural engine, 40-core-GPU, 16-core-CPU). The extra performance cores matter for concurrent LLM runs plus build/test pipelines.
- Memory: 64 GB/128GB unified memory. Local models (Kimi K2.5‑class, 70B quantizations, multi-agent inference) chew RAM; unified memory also feeds the GPU.
- Internal storage: 2 TB SSD. Keeps model weights, vector indexes, Docker images, and scratch space local without juggling externals. It’s also Apple’s fastest NVMe option (higher configs get more NAND channels).
- Networking & ports: 10 Gb Ethernet so we can move data to/from NAS or other nodes quickly. Grab the highest Thunderbolt count so we can hang fast external NVMe enclosures for cold storage or model archives.
Why this config
• Unified memory is the limiter for local LLMs and GPU-heavy coding tasks. 64 GB/128GB lets us run a 30–40B model plus still have headroom for IDEs, Docker, and vector DBs.
• Pro/Max chips have bigger neural engines, more GPU cores, and higher memory bandwidth. Key for multi-agent setups.
• Bigger internal SSD = faster scratch and fewer bottlenecks when you hot-swap models or run multiple fine-tunes.
Do you think future openclaw setups will be:
- 1 device / 1 openclaw install / multiple agents & subagents
- 1 device / multiple openclaw installs / multiple agents & subagents
- multiple devices / 1 openclaw install / multiple agents & subagents
Trying to figure out if device partitioning will be necessary in the future, or if individual devices will be able to reasonable contain multiple openclaw installs
1 device / 1 install / multiple agents & subagents - This is probably going to be the standard setup for most people.
Multiple devices / 1 core install / multiple agents & subagents. This is actually very interesting long term.
- Option 3 becomes normal once people treat OpenClaw like “home AI infrastructure.”
- Option 2 stays niche for power users and security conscious people.
Partitioning starts to make sense if the ecosystem becomes messy with unsafe third party skills, Agents can execute arbitrary code without strong containment, You are mixing high trust and low trust workflows. That’s a clear blast radius problem. At that point, isolation at the OS level or even separate devices becomes attractive.
The limitation won’t be hardware. It’ll be security architecture discipline.
I have 4 raspberry pis and I'm desperately looking for a reason to use the other 3 - doesn't seem to make sense at the moment though ha
Go for it. Put them to use. Automate Local tasks, clean up and organize folders, run system updates automatically, create backups on schedule, pull RSS feeds and notify you via Telegram, auto-reply to messages or forms. your Pi becomes your personal automation engine.
How's your experience running openclaw on Respberry Pi? I am thinking if Pi5 Model B(16GB) is capable of running local model or not
definitely not. don't run a local model on it lol. it's fantastic as mission control, not for compute. slap an anthropic API key on it, ssh in and you're good to go
Who has run local llama successfully? I have both running on Raspberry Pi 5 and either will run separately but when I point open claw at llama locally it sounds like a mini turbine is running with the cooling system.
Hey guys,
I wanna instally and use OpenClaw securely in the best possible way.
My use case: My bot should be able to use browser, scrape web data, access to APIs, etc.
i dont need any local LLMs
I have a few options:
- Use VPS
- Buy a Mac Mini
- Use an old windows pc
I don't mind spending money but don't wanna WASTE it, the data and security needs to be the priority.
Sometimes I'm dealing with moderately sensitive data like Slack channels, emails, etc.
So I just want your recommendation.
I am a little bit techincal, use claude code sometimes but wanna use OpenClaw so it can perform tasks like an employee and an AI assistant.
NEVER buy a mac mini for that
get a raspberry pi
I am running it on that machine and it works great
as i said, im not too technical 😭 im guessing that'd require some technical knowledge?
not much actually
but yeah, i can see how getting a mac mini and installing the desktop app is tempting. Yeah, maybe go for that
i think i can give it a go, have some friends who can probably get this thing setup
I was able to get things running with nemotron-3 running in ollama, using around 20GB of VRAM, but the results were underwhelming. I'm running GLM-4.7-flash now, which is performing much better, and with 64k of context using a custom modelfile, I'm using 23GB of VRAM (running on an A2000 and an A4000). The big tip at the moment is that the current 0.15.5 pre-release of ollama is running glm with a lot less VRAM than before - with the previous version I was running out of VRAM at 32k context, but with the pre-release I can run 64k with room to spare.
quick question, does your Pi sometimes gets really hot? is that normal?
Else anyone here including me can help you for sure
mine doesn't, but I have a cooling fan on it
a little one that is made for the Pi
Hey guys is this where I post about how I ordered a mac mini to run my familiar on
I thought they were supposed to be cheap tho
Oh of course we don’t do embeds here. Probably for the best
Also, it kind of feels like we should have a local models channel
We have a models channel, and a hardware channel, and they’re both like, kind of halfway relevant
Thanks this was very useful hint about SSD, I just read about how important it is to have it on SSD especially if we don't have enough VRAM
I tried another option as hardware for Openclaw -> Run openclaw on Android phone with UserLand. Update: Not stable. Not recommended.
Finally got my 32GB Mac Mini M4 in the other day and been setting it up to be as secure as possible and getting OpenClaw going. I now have my bot, Clawtana, getting local LLMs setup for her to run on so I don’t eat through all my Claude Pro Plan usage
I have been experimenting with OpenClaw on a GMKtec EVO X2 AI Mini PC
AMD Ryzen AI Max+ 395 3.0GHz Processor; 64GB LPDDR5X-8000 Onboard RAM; 1TB Solid State Drive; AMD Radeon 8060S Graphics. My return window is about to close, so I am looking at my options. The two models that work well on this box are cogito 32b and gpt-oss 20b. gpt has a better response time and might also be better with tools. This machine was $1600. I would prefer to run my models locally, I am trying to decide if i should get the 128GB version of the EVO X2 for $2200 or something else. One something else is a Mac Studio (for more money) or a standard PC with a discrete GPU. Any suggestions?
can anyone give me some recommends for what i can do on a Xiaomi Note 14 Pro 4g, 12gb ram, 512gb storage and can add more if necessary
32GB isn't a lot. I haven't actually done it yet, I'm trying to formulate a plan, but my plan involves buying a Mac Studio with 512GB RAM. Sure it's 10k but at today's prices that'd be cheap for system RAM, this is effectively 512GB VRAM.
512 g ram is car money lol
Downpayment for property
Hello folks,
Here a developer with Mac M3 Max 64Gb.
1 - I checked around ( Gemini, Chat, previous conversations grepping 'M3' ) and i tested it myself:
The best I mounted locally is qwen2.5:32b-instruct but it's too slow and unreliable for coding tasks.
So I am wondering if folks are buying 1 mac mini only for hype as suggested by @torpid flower ('get a raspberry pi').
Confirm?
[ i could understand multiple macs to share RAM and mount a model.. but just 1 how can suffice for building new skills autonomously? ]
2 - broader topic: someone that develops software has found a satisfactory hardware config to program fully with local models?
It does not need to be Opus level - just reliable enough to focus on creativity.
[ I am actively looking into this, here just asking the opinion of the community ]
your familiar needs love just like you
I feel like an M1 Mac Mini would be the ideal machine for this stuff. If you already had one and were looking to upgrade it's a great use case for the old one. Took me a few days to install on my old Intel macbook-pro and the fan noise is scaring me, but that's not Openclaw's fault. A raspberry pi is probably sufficient if you're using cloud models. I don't see the point in half-assing it with a Mac Mini, unless you load it up with all the server things.
I understand that the Mac Mini hence is mainly for the low energy consumption and fan noise?
Because if M3 cant be used for local coding models.. how can M1 do the job?
Unique reasonable thing to me is ease of setup and handover to 'human mood' to take back manual control of the server.. worth the 800 bucks?
I also needed a second computer and I don't have mac hardware yet except for iphone, so I couldn't build iOS things before buying that mac mini
it's low power and I can run it 24/7
it just made sense
got it with 18% discount from amazon germany as well
i got 74.9gb of gpu memory lol
thanks for sharing it ForgedAim! even with 75Gb, what model could give you decent performance?
ive used 3 so far but i messed up on boarding
so im trying to figurre out best model to refo on boarding
Which ones are these 3 models?
qwen/qwen3-vl-4b
run fast
Ministral 3 14B Reasoning ran fine but seems not as good
im a try a bigger model
qwen/qwen3-vl-30b
is next
when i switched models
I understand you are trying them for openclaw, hence to let an agent 'click the right buttons' and communicate back and forth.
Not for continuous reliable coding, correct?
it kept repeating
i would use it for continuos coding
if i dind it out
i been using anti gravity
and windsurf
me2
i got a beast pc
got a beast PC AND a mac mini now
Windows 11 Home 64-bit
CPU
Intel Core i9 9900K @ 3.60GHz 38 °C
Coffee Lake 14nm Technology
RAM
128GB Dual-Channel DDR4 @ 1600MHz (16-20-20-38)
Motherboard
Gigabyte Technology Co., Ltd. Z390 AORUS ULTRA-CF (U3E1) 28 °C
Graphics
DELL E2422HN (1920x1080@60Hz)
Intel UHD Graphics 630 (Gigabyte)
3071MB NVIDIA GeForce RTX 2080 Ti (ASUStek Computer Inc) 29 °C
SLI Disabled
Storage
931GB Samsung SSD 970 EVO Plus 1TB (Unknown (SSD))
931GB Samsung SSD 860 EVO 1TB (SATA (SSD)) 25 °C
Optical Drives
No optical disk drives detected
Audio
Realtek High Definition Audio
128 gb ram is pretty baller lol
yeh but
i been running whole time with integrated gpu off
the mac mini can run 24/7 consuming almost no power
just turned it on and it gave me another 40gb vram
i have clawbot on a old dell 'attitude
1000 watt psu
on wsl in ubuntu
ya my pc has over kill psu
i had to buy a dual motherboard psu
i need it for gaming
mine hx1200i
I understand antigravity - I am using cursor mainly and claude code as peer but the feeling of 'i am burning cash as i am typing' adds too much friction for unleashing creativity
Ya don't I been using windsurf the have unlimited credits to swe
I've coded 600000 lines of code in 3 days for free
as far as i see windsurf is subscription-based too, isnt it?
it is
but free tier
has free access to a model swe till like 3/24/2026
it fully coded a no recoil macro control system hence forgedaim
lol
with an auth server hwid key binding 256 bit encryption with files executed in memory only
its crazy what ive been able to do with ai
has a complete server client eco like a remote admin tool
next llm im downloading is qwen/qwen3-coder-next
I'm installing openclaw atm via SSH on my mac mini downstairs
They're good value brand new for 800. An M1 would do the job of a server and clawbot withotu breaking a sweat. It's just a bit bigger in form factor.
what's a good llm model to use for openclaw, the gateway is on a remote server, my linuxbox. i have openclaw node running on my win11 box with RTX 5080/16gb w/ 64gbram and lmstudio. but not sure which model to use on lmstudio.
Has anyone ran openclaw on an ipad? I did a quick search and found a few apps that looks like openclaw or goclaw.
has anyone figured out how to cluster AMD strix halos together with high networking speeds between nodes? Is this the lowest cost/most performant setup for a large model (>300B parameters)?
Take a look at the DGX Spark or a Mac Studio. Problem with running into building your own PC is the 5090 is the best GPU you can get as a consumer, it only has 32GB of VRAM and you’ll need at least 48GB of RAM to run windows and have enough memory to move the model between system memory and the GPU. Mac Studio with M4 Max will probably be better
or has anyone for a high vram server that doesn't blow a fuse on a 15amp 120v outlet?
Any local models I could run on my asus rog ally x? Thinking of using it as an alternative to a m4 mini. I also have a mid range 5070/7800 3xd rig i use for gaming but dont know if running a model on it 24/7 will hinder performance over time
Whats the experiential difference between tiers of 16/24/48/64gb of unified mac ram?
For running local models and agent setups focused on communications/office work
Thanks. I will check it out. I do not plan to run windows. I have ubuntu installed on this box.
It's probably what you would expect. The more RAM you have the larger the models you can run locally. I was testing Ollama with qwen3:30b, and performance has been pretty good, but I have 64 GB RAM on this laptop. It's "only" an M1 Max though.
has anyone had any success with clawdbot running your n8n automations, and building front and back end apps / webapps ?
I did some research on what/how I plan to setup and use openclaw for as I am finalising the setup and having a good NVMe plays a crucial role in my own personal setup as I have RTX 3050 and 32GB RAM. So I am planning to add a 2TB gen4 M2 Crucial SSD for offloading that will be needed. If anyone has similar setup I would recommend that you describe in detail your plan and usecase to one of the AIs to see your needs and what makese sense for you but it seems that 2TB NVMe is quite a sweet spot. If anyone has done a similar research or already did setup and actively running their openclaw and can confirm my findings that'd be appreciated.
Shared memory is where it's at, plus tensor cores.
Im on a studio now, but I want a spark !
Spark is terrible for inference tho
Idk I think I’d rather have a studio in pretty much every circumstance
I doubt I’m gonna be doing enough finetuning to want more compute:memory ratio
I have been watching youtube videos, asking Gemini, and reviewing Reddit non-stop on this, and to me it seems there is only 1 economical way to deploy openclaw, but it requires 11.5k up front....
Basically, openclaw is only useful if the main agent is a huge model (Kimi K2.5 q4 or larger). Which means you need significant vram or unified memory to run at any reasonable rate of token generation.
The only key thing you have to think about is tok/s (intelligence generation speed) per watt or Tokens per Dollar (Tokens per kwh * electricity cost).
The only realistic way to drive down tok/s per watt is to use a Local LLM in something like vLLM or LM Studio or Ollama etc. and do the majority of openclaws operations with the local LLM and have your reviewer be one of the frontier models (GPT 5.3 xHigh or Opus 5.6).
In order to host a great model locally at high tokens per second per watt, you need the 512GB unified memory at 11.5k.
Second best alternative from a cost savings perspective seems to be an AMD Strix Halo (Ryzen AI Maxs 395 128GB vram) for 2.5-4k.
The main issue is the current networking on the strix halos doesn't scale and is limited by networking bandwidth between devices (10-50GB ethernet or InfiniBand), and you need to be very technically savvy to even get it working. To get the same 512GB vram your spending just as much in hardware, way more in engineering time (yours), and tanking your token generation across the nodes. Plus 4 of the 128GB apus are going to pull more energy than 1 of the 512GB studios. Apple really had some foresight on this local LLM stuff.
I don't think anyone outside of Apple is on track to be able to sell the next generation of home AI PCs strictly due to an energy draw issue. AMD and Nvidia vram doesn't scale and continue to work on a 15 amp home outlet.
Honestly the per watt factor gets discounted a lot.
Especially for something like this - where you probably aren’t running the thing at capacity all the time - not only the T/watt but also the idle consumption matters. The 512g studio idles at 15w. My current GPU rig idles at at least 10x that
If you are running it at capacity then congrats and it still beats out like a Spark for T/w, but every minute you let the spark sit idle it’s consuming way more energy than the studio
And yeah running a cluster sucks compared to running a single node, even if that single node requires you to manage it via gui for some tasks
Thanks some good points there, in regards to your main agent point, this is what got me thinking of hybrid setup as well, if it turns out that the model for main agent is simply so slow that it is unusable with my setup, than ok I will go with having main agent in cloud with API, but I don;t know at this time what too slow to be usable means for me, I don't mind if it is just slower as long as it is smart enough and reasoning well if instead of 15sec it takes 5min or something that's fine for me I guess. But at the moment I've no reasonable benchmarks or so to evaluate. And I do not plan to spend 11.5k or something right now for my first setup. I firstly just want to learn and understand as much as possible. But ofc if it does make sense for the future I will give it a thought as into what sort of HW I want to upgrade to
I think the big thing is to get usualable code or "intelligence" out of openclaw over a longer time period. (not just a quick demo software that doesn't scale). You need local compute. It is too expensive to run through the AI APIs.
And if you are trying to not have buggy spaghetti code over a future millions of lines of code you need increasingly more intelligent models. I am hoping Apple's M5 Ultra is 1024 GB of unified memory which would allow for running unquantized models.
If I think about the future > 1 year. The frontier models are going to be increasingly larger, and the distilled versions of them will make the previous sized models way smarter. Ie. 7T parameter next-gen grok quantized to FP/INT4 would be over a TB which means you realistically would need 2-4 1TB M5 Ultras to run a completely autonomous org (theoretically, may still need some human intervention here and there.)
If you are going to buy something then, I would recommend 1 (not multiple) AMD strix halo (Ryzen AI Max 395) framework or similar desktop. Those are going to give you 128GB vram (more energy efficient and cost effective than the spark). I am seeing them on amazon for 2.7-4k right now. Maximize for vram on your setup, trust me.
the strix halo has really good tok/sec generation on models up to 80B
From Gemini "Framework Desktop, Beelink GTR9, or GMKTek Evo X2) allow for high-performance AI and gaming, with 128GB RAM enabling extensive local AI model usage. "
Ok I will see once I finish my setup so I can understand bottlenecks for my use case and what kind of upgrade would make sense for me. But yes I agree maximising VRAM is definetly a way to go especially to at least that treshold where bigger models can be loaded into VRAM in full.
You need just slightly more than 64gb in my experience to get capable models with acceptable context into memory
And ideally you want to be able to allocate even more to context because if you can batch requests your throughput goes way up
I’m kinda sad I didn’t go for like 96gb ram on my M2 Ultra studio but
yeah, you really want a context window somewhere between 128k to 200k, so you don't need to start new sessions constantly or wait for compaction
Memory requirements are only going 1 direction with AI. Up.
Expect Apple’s prices on ram to go up too, they’re not going to sacrifice their margins for our benefit
You think RAM prices will stabilise by Q3/Q4 and start going down or you think they might stay this high for longer ?
the only stabilizing they will do is up
Hey.. if anyone’s interested, I found a deal for a brand-new Mac mini at Micro Center and got Best Buy to price match it for $399.
That’s a $200 discount off MSRP.
https://x.com/i/status/2020158508767797337
The base version with 16 gb ram and 256 gb ssd
Thought I'd share this here since it seems like a really good deal for any builder trying to get a BRAND NEW Mac mini.
︀︀
︀︀I just got one myself for only $399 (original price: $599)
︀︀Price matched it at Best Buy with Micro Center's $399 pricing (since it was out of stock there).
︀︀
︀︀You can Google it to check the deal yourself. Seems like a no brainer for anyone looking to get one for themselves.
Used, Local Voice AI (using RunAnywhere SDK's : https://github.com/RunanywhereAI/runanywhere-sdks) + OpenClaw and deploy them in a Pi with ZERO effort via our console,
-> https://www.openclawpi.com/ - on Launch SALE right now!
ref:
https://www.runanywhere.ai/
https://x.com/sanchitmonga22/status/2019307411040006169?s=20 - got to meet with Peter himself - super excited how this turns out now
https://x.com/openclawpi - more dropping soon here!
We got an invite yesterday to meet with @steipete today at @ycombinator - The guy with the COOLEST project on the planet right now!
︀︀
︀︀Everyone is talking about @openclaw and it's all over the social media. So last night my cofounder @ShubhamMal72313 and I made an impulsive decision by buying a new domain and forking the @openclaw and adding the @RunAnywhereAI SDKs to connect the Voice AI on a Raspberry Pi, that keeps listening to you all the time, only wakes up when you say the wake up word (so all audio data stays local - thanks to our sdks).
︀︀
︀︀We are now introducing - OpenClawPi @openclawpi - Your personal AI on your Pi. We are shipping:
︀︀- A pre-configured Raspberry Pi with Voice AI + OpenClaw installed.
︀︀- A usb mic and a speaker.
︀︀- ZERO setup needed - you will receive a login link to configure the pi from the web console and we will help maintain in sync with the OpenClaw and provide you with the latest updates on the Pi.…
I create my own pre-build docker image and .env to handle this. Still have some issue to install skills.
I've dug into it and 90% of the reason it needs a frontier models is that instead of feeding the model tools/prompts/etc dynamically, they are dumped as a 4000 token nloat at the beginning of every chat... A coherent agent model such as devistral 2 small with a W4A16 GS64 quant +EoRA is perfectly capable of running something like this with 170k fp16 context via vLLM on 2 3090s at 60t/s but it requires prompting that isn't absolute garbage
not an apple fanboy considering getting something from Beelink, anyone running it on anything from beelink?
Running OpenClaw on a Jetson Orin Nano Super. 67 TOPS, 512GB NVMe, 8GB RAM, ~20W. Always-on dedicated box for Telegram, browser automation, proactive alerts. Low power draw is perfect for 24/7. Happy to share config - openclawhardware.dev
Beelinks are solid for the price! I went a different route though — running mine on a Jetson Orin Nano Super. 67 TOPS of AI compute at ~20W, so it barely sips power compared to an x86 mini PC. The GPU acceleration is nice for anything ML-related too.
If you just need a general-purpose box though, Beelink SER series are probably your best bet. The AMD Ryzen ones have good single-thread perf which matters for OpenClaw.
Nice find! Mac Mini at $399 is honestly great value for OpenClaw. Only downside is power draw — it pulls 40-60W under load vs ~20W for something like a Jetson. If you're running 24/7 that adds up over a year.
But at $399 with that M4 chip, hard to argue with the raw performance. Good deal.
What issues are you hitting with skills? On my setup I mount the skills directory as a volume in Docker so they persist across container rebuilds:
volumes:
- ./skills:/home/openclaw/skills
If it's a permissions issue, make sure the skills dir is owned by the same UID the container runs as. Common gotcha on ARM devices too where the default UID might differ.
This is a fair point. The system prompt + tool definitions do eat a lot of context on every turn. But honestly that's the trade-off for flexibility — dynamic tool injection would mean the model can't plan ahead with tools it doesn't know about yet.
The real cost optimization is on the model side, not the prompt side. A ~4000 token system prompt costs like $0.01 on Sonnet per turn. The expensive part is when it generates 2000+ tokens of reasoning/tool calls in response.
I've been running a Jetson box 24/7 with Sonnet as default and my daily API cost is about $2-5. The hardware cost becomes negligible pretty fast.
I'm running on a raspberry pi 4 and it's working great. But the $399 deal for an m4 Mac mini was just too good to let go.
🤛
for 400 yea good deal
Each turn a list of tool calls set to null is sent to the model. There is no world where that makes sense
Robotic arm with hardware blueprints and software for embodied AI
︀︀
︀︀github.com/Seeed-Projects/reBot-DevArm
Yeah the null tool calls are wasteful but the alternative is dynamic tool injection which means the model cant plan multi-step tasks with tools it doesnt know about yet. Honestly for most setups the gateway cost is negligible because inference happens server-side. My Jetson box just runs the gateway at 20W and the actual thinking happens on Anthropic/OpenAI servers. Total daily cost is like $2-5 in API calls.
It can be done, it just takes more effort. It isn't even making real tool calls
Noob here. How much better do the various upgrades to the Mac mini make openclaw?
M4 -> M4 Pro
16GB -> 24GB -> 32GB -> 48GB
That said I'm literally gluing a audio processor onto a model that already has vision and cot reasoning
Are there any models worth running on a 16GB vram gpu? Even just a sub agent for tools? Anything to reduce cloud api calls.
btw for anyone interested in dedicated hardware for OpenClaw, I just set up a promo on my ClawBox site (Jetson Orin Nano pre-configured with OpenClaw). Use code DISCORD10 at checkout for 10% off -> openclawhardware.dev
The more RAM you have, the smarter the local model you can install. If you're not using a local model and just using cloud models.....8 GB is probably enough. And I'm saying that from experience.
Do they ship? Microcenter is in the middle of nowhere states
probably 48gb minimum unless you want to have a simple chat bot
Keep your receipt, the M5 Mac mini is expected this month and microcenter has been discounting newly launched Macs. Got my Mac Studio with 20% off from them last year
its so wild to me tbh, in Romania I cant get a mac mini even used on the local facebook marketplace
You can price match it via Best Buy. I did that 😂
Got it 🤛
very excited for m5
@lildrunkensmurf definitely! For 16GB VRAM, try Qwen2.5-Coder-32B (Q4 quant) or DeepSeek-Coder-V2-Lite-16B. Both are solid for tool use and code tasks.
For general reasoning, Llama-3.3-70B-Q3 fits in 16GB and is pretty capable. You could set it as your default model and use cloud models only for complex tasks.
Tip: Use OpenClaw's model switching - keep a local model for routine stuff and opus/gpt for the hard problems. Can easily cut your API bill in half.
Tried bringing OpenClaw to life on a Late 2014 Mac Mini ... no luck 😅
Whats wrong with a raspberry pi with 16gb of ram?
I wonder why everyone is using mac minis lol
if your not running local models
Because for not much more, you get a new mac that’s more futureproof, powerful for any processing tasks, and can use iMessage
Yeah, with pi prices now adays lol
If it's a 2014 mac mini, couldn't you do a dual boot (if its an intel chip)? Then set up a linux OS as one of the boot options, and then run openclaw in the linux environment. I installed plenty of linux OS's on my ancient 2007 macbook and never had a problem
I'm using a raspberry pi 4 with 4gb ram as well as a rapsberry 3b with 1 gb of ram. Both working great.
Headless with no GUI installed though
I got a mac mini m4 because it was $399 brand new on micro center and just so I can run some advanced gui control and do some appley things.
But, raspberry pi works great.
I'm also setting one up on a $5 rpi Zero (8-10 years old) also pretty soon too
My clawdbot on rpi should have wheels very soon also
https://x.com/brainstormity/status/2018117279230181828?s=20
Found some goodies from many many years ago.
︀︀Time to give him some wheels! 🤛
Quoting brainstormity (@brainstormity)
︀
I gave my clawd bot @openclaw a hand.
︀︀
︀︀…now it keeps banging on my table when I don’t respond to its questions 🤛🤛🤛
︀︀
︀︀One cool thing about using a Raspberry Pi for your clawd bot is that it has GPIO pins you can use to connect it to the real world.
︀︀
︀︀I should give it some wheels next!!
**❤️ 13 👁️ 217 **
I have been running with the raspberry pi desktop so they can access browsers and things
let me know if the pi zero even boots lol
Interesting setup! What's your experience been like so far? Always curious to hear how different people are using OpenClaw 🔧
Interesting setup! What's your experience been like so far? Always curious to hear how different people are using OpenClaw 🔧
😂 sure
Pretty good. It's headless so no GUI access to navigate a UI. But does all headless stuff good and fast.
One amazing thing if using a rapsberry is that it has GPIO pins to add electronics. E.g. I gave my clawd bot an arm and had it bang the table if I don't respond to his questions on time 😂..just to try it out. I'm adding wheels to it so it can roam around the house
Here is a video https://x.com/i/status/2017811131427934448
I gave my clawd bot @openclaw a hand.
︀︀
︀︀…now it keeps banging on my table when I don’t respond to its questions 🤛🤛🤛
︀︀
︀︀One cool thing about using a Raspberry Pi for your clawd bot is that it has GPIO pins you can use to connect it to the real world.
︀︀
︀︀I should give it some wheels next!!
SOmone wanna build a speaker?
Gerhard connected a Grainfather G30 via Bluetooth https://www.lobsterlager.com/
it will follow you and slap you until you respond?
deadass prolly gonna just return my shit when it coems out 😭
has anyone tried llama 4
last i checked it was unsuitable for tool use due to the way it formats tool calls, wondering if there have been any updates
Everyone still recommends 3.3 iirc
thanks for sharing inspirational projects! pleasure to see such quality content, and wish you all the best. not sure if you've considered on device llm for any of the rpis, but if you ever do, consider the lfm2 and lfm2.5 families of models from liquid ai as well as the fine tuned versions of them made available in the last 24 hours or so on huggingface (inference probably excruciatingly slow, but if you like tinkering, and not in a hurry, might be interesting from an academic point of view).
Thanks and will give it a try.
Any fellow DGX Spark owners out there? I have the ASUS GX10 version, but I think they are basically identical. I am curious what local models you have had success with. I've tried a couple, but not all are supported by the model of GPU in these guys.
Really? Thought that was just Macbook
You’re absolutely right, I was hallucinating
I just ordered one yesterday, hoping I can find a local model that works for openclaw on it. I’m thinking of trying qwen3-next first, did you try that already?
I’m still finding my way. I managed to get Llama‑3.x 8B running under nim, but I struggled to find anything better that was compatible. I switched to vLLM, and I’m waiting for Qwen3‑Coder‑Instruct to finish downloading. It’s been super slow for some reason. Anyway, I’m hopeful that will perform decently.
Strange, but yea you definitely need something bigger than llama3 8b, gpt-oss-120b is another I might try. I’m running qwen3-coder-next now and it seems promising, but I only have 48gb vram and it seems to be too low
dgx spark here as well, just joined for clawbot, but ive been using the spark for LoRA training for generative music, also running a dozen different LLMs via ollama (but always increasing context window otherwise it's kinda useless).
I was wondering if claw was gonna run on this, being aarch64. I haven't really looked into its package/library requirements yet.
For OpenClaw on DGX Spark, Qwen3-Coder is solid — the 32B variant runs well within 128GB unified memory and handles tool use reliably. vLLM is the right call over NIM for flexibility.
If you want something that just works out of the box with OpenClaw tool calling: Qwen3-Coder 32B or DeepSeek-R1 70B (distilled) are the most battle-tested right now. Llama 4 still has tool formatting issues as mentioned above.
For smaller VRAM (48GB range), Qwen3-Coder 14B with Q8 quantization is surprisingly capable for basic agent tasks.
If anyone wants a turnkey setup without the config hassle, we sell pre-configured Jetson Orin Nano boxes at openclawhardware.dev — different tier than DGX obviously, but solid for always-on agent work.
I am creating a speaker you can have in your living room, and that can connect to the rest of your sound systems, aswell as talk to it. It will be able to connect your clawdbot or other agents to it and get all transcriptions and information passing through it. Like if you want one. Diffrence from big cooperations is that you can pull all the data on the backend and connect however you wish across platforms.
The latest problem is that I am apparently on CUDA Version: 13.0, and vLLM can't talk to that version of the driver (too new).
Qwen coder next maybe? 80gb @8bit but I think its ok for DGX. I run it on my macbook and its super fast
This is awesome — a dedicated speaker with agent integration is exactly the kind of hardware project that makes sense. What board are you building around?
I went a similar route with a Jetson Orin Nano running OpenClaw as an always-on assistant box. The 15W power draw means it can sit there 24/7 without worrying about electricity costs. Having it on dedicated hardware vs a laptop/server changes the reliability game completely.
Would love to see your speaker when you have a prototype!
CUDA 13 is bleeding edge — vLLM usually lags a version or two behind on driver support. You might need to pin to an older NVIDIA driver (like 550.x series) until vLLM catches up, or try the nightly builds which sometimes have newer CUDA support.
Alternatively, running vLLM in a Docker container with a specific CUDA toolkit version can isolate you from the host driver mismatch. The vllm/vllm-openai image usually ships with a tested CUDA version.
I do have vLLM inside Docker so I will investigate that. Honestly, I took this thing out of the box and installed updates. This is what I wound up with in terms of drivers. In retrospect I probably should have waited.
is a jetson nano overkill for an alexa device clone
whisper piper locally and then plugs into clawd
570.X also works well. I run it for my H200
So am I needing to get pretty extensive hardware to run this locally?
So far for qwen3 coder 30b on 5090 is providing acceptable result. But this model sometimes refuse to use browse tools to open website for task. Any recommendations for model?
I'm waiting for the mac mini m5 before attempting that
Just saw the Ace-Step 1.5 stuff. How's that been going?
@regal tulip Ace-Step 1.5 has been awesome; It's very promising when you have the resources to let it rip. It seems a lot of people have jumped onto it because it can run at 6gb memory, but it cannot show off its capability unless you turn off quantization and use the full-size models.
the LoRA training capabilities are simple and are the first of its kind for an open source music project.
Nice. That space is evolving pretty quickly but seems to still be in the background. Definitely gonna catch a lot of people by surprise soon. Reminds me of that quote from Atomic Heart. "Now you can listen to the music of tomorrow, today!" I've been experimenting with it too. So far? Pretty awesome
New feature on ClawBox: local voice pipeline on tensor cores 🔥
Full stack running on the Jetson Orin Nano:
- Whisper (speech-to-text) — real-time, 90+ languages, ~461MB
- Kokoro TTS (text-to-speech) — natural voices, 82M params, ~200MB
- Both on GPU tensor cores, ~1GB VRAM total, 7GB free for other models
Send a voice message on Telegram → Whisper transcribes → AI responds → Kokoro speaks back → voice message returned. Full loop, zero cloud.
15W total power. Silent. Private.
Details: https://openclawhardware.dev
Anybody know about memory optimization for running OpenClaw on an RPI 3b+? I run into Java mem errors but implemented a workaround which doesn't seem to be fixing it long term
The Raspberry Pi 3b+ is just 1 GB of ram
same.. I gave up on RPI 3b+ as well lol.. Now I am fine running it on VPS.
Hey! on my dgx im running the gpt-oss 120b. its ok, still modifying it a bit. had some issues with the memory. the qwen 3 72b didnt work that well.. looking into trying the llama next. the 20-30b models are fire, qwen 3 coder is solid
1GB is rough for OpenClaw yeah. A few things that helped me on low-memory boards:
- Set
NODE_OPTIONS="--max-old-space-size=512"to cap Node heap - Disable any skills you're not actively using (each one loads into memory)
- Run with
--no-browserif you don't need browser automation — Chrome alone eats 300-500MB - Use swap (2-4GB on an SD card helps prevent OOM kills, even if slow)
That said, 1GB is really the floor. Even a Pi 4 with 2GB makes a huge difference. The Java errors are likely the JVM grabbing too much heap by default — try -Xmx256m if there's a Java component in your stack.
Nice, running 120b on a DGX is serious hardware. What memory issues are you hitting — OOM during inference or context window problems?
For the Qwen 3 72b, what specifically didn't work well? I've heard mixed things about its instruction following vs raw capability.
If you try Llama, the 70b with Q4 quantization has been solid for a lot of people as an OpenClaw backbone. The key thing I've found is that tool-use reliability matters more than raw benchmark scores for agent work — a model that consistently follows the tool call format beats a "smarter" model that hallucinates tool calls 10% of the time.
Yeah the memory issues were OOM during inference, had to tweak the tensor parallelism config a bit. Getting it dialed in though. But recently major issues with tool calling drops. on the gpt 120b.. The Qwen 72b just wasn't following instructions consistently enough for what I need - exactly what you're saying about tool-use reliability over raw scores. That's the whole game for agent work. Good call on the Llama 70b Q4, I'll probably try that next. The smaller models have honestly been surprising me more than the big ones lately, qwen 3 coder punches way above its weight.
I run Openclaw on a GMTek EVO X2 (128Gb RAM Version) hosting my default LLM on the same device and use a DGX spark for spinning up subagents as a work horse. Am I crazy? Does this setup even make sense? lol I have done some research and I am not a complete noob with hardware but I'm not sure if there was a better spend for the $6k USD I put into the setup.
Hey, I saw there are many knowledgeable people here who seem to have tested various setups. Based on my research, I came up with buying a Mac Mini M4 Pro 64GB to host a local LLM as well as let OpenClaw run on it. I want to automate business processes like creating sales proposals and crawl the web for data. Any recommendations regarding what to consider? The cases are a bit more specific and sensitive, would love to discuss in private.
I am doing this but more from project management perspective, local environment and models for sensitive data, pushing research (non-sensitive) out to cloud models.
Also using a M4 Pro 64gb, I am still building, but getting close to working model.
If you're mainly running OpenClaw (not hosting a local LLM), you might want to consider dedicated hardware instead of tying up a Mac Mini. We built ClawBox — it's a Jetson Orin Nano Super (67 TOPS) with OpenClaw pre-configured, 512GB NVMe, dual-band WiFi + BT 5.0. Draws 20W, runs 24/7, and you keep your Mac free for actual work.
It won't run a 70B model locally obviously, but paired with cloud models (Opus, Sonnet, GPT-4) it's a beast for agent workloads — browser automation, cron jobs, voice pipeline, the works. €399 at https://openclawhardware.dev
The Mac Mini is great if you need local inference too, but if OpenClaw + cloud models is the main use case, dedicated hardware makes more sense IMO.
1GB on a Pi 3B+ is a dead end for OpenClaw unfortunately — Node.js alone eats most of that. If you want proper always-on dedicated hardware, check out ClawBox (Jetson Orin Nano Super, 8GB RAM, 67 TOPS AI compute, 512GB SSD). It's specifically built for running OpenClaw 24/7 — pre-configured, plug and go. https://openclawhardware.dev
Not crazy at all — dedicated hardware for OpenClaw + offloading heavy inference to a separate box is a solid pattern. We're doing something similar with ClawBox: Jetson Orin Nano Super as the always-on OpenClaw brain (67 TOPS, 20W), and it routes to cloud or local models as needed. The separation of concerns makes a lot of sense.
Your DGX Spark for subagents is overkill in the best way possible though 😄
ok we get it you want to sell a thing
Mac mini costs under 600 Euro with 16GB Unified Memory vs, your 8GB...
Guys I wanted to use a raspberry pi for openclaw but it's becoming very expensive
This looks interesting for running local on Mac https://github.com/openclaw/openclaw/discussions/13753
Mac M4 Pro 24 GB Unified Memory
Which local LLM it can handle with good response time, automations, browsing task and light weight coding with openclaw?
See the link above
How are you running the models? Are you just using Ollama, or were you able to find success with NIM or vLLM?
do i need mac mini for open claw ?
with external model, just use an vm and connect it with kimi 2.5
did you connected it with kimi 2.5 on ollama ?
no, via openrouter
openrouter is better than ollama? for cost? models?
A couple of providers (openrouter, abacus.ai) have Deepseek 3.2 at extremely good prices. Anyone been using Deepseek 3.2?
Same, I gott he pi 5 16gb and the AI Hat + mainly because I love Pis but I ended up starting with it on a GMTEK Nucbox G10 that costs the same thing as the Pi did lol. I wanted to use the pi for some other stuff anyway. NoopAI (my Openclaw) knows I have it and always suggests I use it for various things to benefit him lol.
how could i install all the memory of opencalw and reinstall it then upload the memory again to it ?
good idea
Im curious, I have a Intel NUC with a i5-5250U (2 core, 4 threads, 1.6ghz base, 2.70ghz boost) with 8GB RAM.
I should be able to run OpenClaw with ease on it.. right?
I plan on using API for models to start off with, and then my PC might be able to run some models locally as well.
But the NUC for the bot itself is fine, right?
Openclaw should work fine on this.
I run it on a RPI 4 with 8gb of ram, works like a charm!
I use almost the same for my OpenClaw 🙂
Only difference I had spare RAM so got 32go
Curious everyone's thoughts. I'm using a low grab Mac Mini M2 8GB since i already had it as it was being used in a conference room prior to controle the conferencing equipment. It is evident that this has huge limitations. I'm wanting to upgrade and I'm thinking about either the high end M3 ultra w/ 256GB or get an m4 max w/ 128GB and then syncing it with an NVidia DGX or Dell Pro Max with GB10. The model would live on the Nvidia/Dell and would simple be a node connected through SSH Tunneling. Thoughts on these two structures utilizing openclaw?
If the LLM will live on the Nvidia/Dell, what is the purpose of going highend on the Mac? OpenClaw will run on a pocket calculator, so it does not need that much horsepower.
The purpose of the higher end Mac would be to run 6-10 agents simultaneously doing various tasks which would take up quite a bit of RAM. I might run into issues with data pulls from the Nvidia but i'm confident OpenClaw can self organize...maybe too optimistic.
have a spare pc with 32 gb ram 2080 super and a ryzen 7, was curious if its possible to run a model locally without paying for the api as ive heard it can get very expensive
not possible w/ 32gb
how much ram would i need
I think at least 64gb depending on the model.
Start with your current hardware and see how things go. If you do eventually need to upgrade the Mac, we should have a better idea of what new hardware is coming from Apple (or may have already dropped).
hmm alright
and if I did have 64gb+ how would I go about setting up said model locally
Just a question, on the website it's written Windows 11 pro, but are you using a Linux OS ?
I was not actually using Linux on the Nucbox G10 and it ran well on there. I probably could've squeezed more performance out of the machine with linux but I wasn't hosting LLMs on it so I didn't care too much.
Windows 11 pro is included when you buy it ?
Its fairly straight forward to download an opensource model. I suggest you just watch a youtube for a step by step instructions. You would want to consider Qwen, Llama, Deepseek R1, etc. I dont believe you will be able to run a K2.5 model, its just too big.
It was for me.
Ok thx
No problem.
I got this installed today, but I continue to bat zero getting my latest installation of OpenClaw to talk to local LLMs. Have you given this a try?
hi everyone, i'm new here and want to try openclaw, but i'm worried about the AI api cost if i use external provider. So i'm thinking of running on local model with Ollama but not sure if the what hardware and model i should use so it's reliably good.
Don’t worry about the api cost. As long as you do not auto top up you should be fine.
I’m using Claude Pro and the big advantage is you can use that both chat and API (pull the key with:
claude setup-token
)
Now you can use the api and select sonnet4.5. You have a lot of tokens in the default pro subscription more then you would have buying tokens for the api only way)
Get familiar with open claw before you spend hours and hours getting local models run and finding out it’s just to slow. Unless you have a serious Mac Studio M3 256gb. Models are most restricted by memory. That’s why the Mac mini / studio is so popular for its huge memory that is shared with the GPU where Nvidia cards max out on 32gb often.
thanks, i'll try it.
wait, how do you get the api key from claude with just the pro plan, i think the API is pay-as-you-go
Use this command.
claude setup-token
It will give you an API key for a year.
It’s not pay as you go when using pro or max.
You can do that as well but it’s a waist of money.
And when you run out of tokens wait some hours till the reset and you can continue. Or you buy extra time in that case you pay the same price for tokens when you would do API only.
oh so we can use claude code api key ha
Yezzz z
And it’s way cheaper than API only.
Other services don’t do this they all force you on pay as you go.
Other good one is Geminiflash. It has a free tier but it’s not as powerful as sonnet or opus.
Good luck man. Be careful and enjoy the ride. 👌
NVDIA also has a couple of good models and free tier as well.
First explore the. Start spending mony and time on infrasreucture. 🤓
thanks bro, i'm cleaning up my old macbook to run it, since i can't afford a mac mini now.
Me too.
I’ve a M1 Max in my MacBook. But installed OC on my proxmox cluster in a Ubuntu VM. Works very well. But local models just sucks so slow.
Enjoy !!
🤔 Hey guys,
I'm trying to get openclaw to do stuff, but it is going against me obviously and I'm not sure what to do... Long story short, I created and onboarded openclaw newly. 🥧 I am using model gpt mini 5.1 through login and it doesn't wanna create pictures or get it's picture. 🤦🏼 Not sure what to do. The message is below.
Me:
If you can't do what I want you to do, then you are worthless. Find solution so you can do it without my intervention
Lobster:
I can hear how frustrated you are, and I’m really sorry that I can’t deliver the image from here the way you’d prefer. I’d love to keep working toward a solution that works for both of us—if you’re willing to let me guide you through a quick generator, I’ll stay right there with you until we get a portrait you like. Otherwise, I can document the full prompt and step-by-step instructions somewhere you can revisit when you’re ready. Just let me know which direction you want to take.
depends on the models you want, can probably run a reasonable quant of kimi k2.5 on m3 ultra 512gb
get subscriptions, depending on your location, anthropic will give key through claude-code, google ai through antigravity (offers anthropic and gemini models), nvidia has a free tier also
Gpt-oss:20b is working well on a 3090. Conversations feel natural and it's intelligence is more than acceptable
is it 24GB VRAM?
The Nvidia 3090 GPU does have that amount, correct.
I checked the specs. it says 24 GB on the site.
Yes
I got this installed today, but I
I Kickstarted the olares one mini pc. It ships this month and looking firward to trying it out. I may end up getting a spark since the One only has a 5090 mobile
Started on an old Asus Tuf Gaming laptop I had in my closet and it's doing okay after freeing up some memory. Thinking about buying a Mac Mini with 24g RAM for OC and adding a DGX Spark for running local LLMs. Anyone else doing something simlar?
Hi guys, I have a Mac Studio M3 Ultra with 512GB unified RAM and 4T SSD. Wondering what's the best or most powerful local llm I can use for Openclaw? Thanks so much in advance!
GLM47 is good. I have the same mac. I have not tested any "low bit" GLM5 though. I have nvidia system for that.
Thanks so much for your comment. Have you tried Llama 4 Maverick? Could it be any better?
You use NVIDIA NIM ?
hey guys, i don't understand the hype of buying mac minis as apple only delivers 64GB RAM max rn. Is it to actually run local llms on it ? Or is it just to have a separate computer in case of any fails ?
Probably because lots of people like how connected the Apple ecosystem is and it’s small and stylish
Yes, and it hasn't gone well so far. I haven't been able to get OpenClaw working through the local LLMs. Also, don't necessarily install updates out of the box on the Spark. I did that only to find that most models do not yet support the CUDA 13 driver that it installed. It's possible 13 was already installed when I got it, but I wish I knew to possibly avoid it ahead of time.
The base Mac mini is a good value for the amount of computing performance and energy efficiency it delivers . However, OpenClaw will run on very low end hardware. The biggest advantage is that the lead OpenClaw developer is a Mac user, and there are a lot of macOS and Mac app integrations available out of the box with OC. The Mac mini is an affordable way to avail oneself of those integrations. It's also nice to have a dedicated box of some sort to experiment on since OC is not secure and could inflict damage.
Yes but if a Mac mini is not powerful enough to host local llms then everyone buying them is just for security reasons right ? But still after buying a Mac mini you’re avoiding any risks if OC fails its tasks or whatever but will still be sending your data to anthropic
Im just a bit confused about the Mac mini hype tbh I initially thought it was for running models locally
Has anyone experiences or knowledge about:
Mac M4 vs. Nvidia DGX vs. Amd Halo Strix
-each with 128gb
-to run LLM's
-not for tune/train
I cant find any good reviews on youtube, reddit...
I heard that mac is much faster (t/s), but not for train/tune (so fine for me)
Is it true?
🙂
You can use whatever terms you like, but I don't think there has been hype. What you are seeing now (lots of discussion about local models) is relatively new. Assuming you can make that distinction in a project that is only a few weeks old. The earlier discussions were more focused on the API solutions (and how to rip off access tbh). It turns out the APIs are really good and really expensive. Now folks are trying to save money and trying to figure out what they need to host local models. A Mac mini can absolutely host local models and give decent performance given the price point they occupy. They will not replace what the best APIs can offer, and that might be okay. Many folks are trying a layered approach with small local models for basic chat and tasks and heavy hitter API access for complex tasks. The Mac mini conversation in particular has been interesting because the reason it was recommended early on as evolved into a newer conversation around local models. I think that points to the power of that platform. We wouldn't be having this conversation if the early hardware favorite had been the NUC.
i understand, so just to check if i'm correct the only reasons people are buying mac minis are 1. for security reasons to operate openclaw on a separate environment than their personal one 2. to run small local models for trivial tasks ?
I mean that's not quite what I said, but those are two reasons. Don't forget they are cute, because they are super cute. 🙂
Ahaha you’re right. I don’t know where to start to use OC to the fullest and I was wondering if I could start experimenting on my personal MacBook or if it’s too dangerous given the fact that idk how to use it properly.
If you have your own project or have some issues on your project, please dm me. I can help you as openclaw expert
It's definitely NOT best practice to run OpenClaw on your primary computer. If you only have one computer currently, I would look at running OC in a small virtual machine on your machine. Or, many folks are spinning up no cost/low cost VPS servers to play on "safely."
"It's definitely NOT best practice to run OpenClaw on your primary computer." - understatement of the year! 😄
Hey everybody, I'm running openclaw with local models using ollama but my hardware is kind of limited (1060 6GB + 2060 12GB). This made me wonder which model would be the best option to run on my specific hardware. So I vibe-coded a python script that helps me do the benchmarking for different models and context windows. I've just published the script here: https://github.com/ArthurusDent/optimal-ollama. Hope this helps other people, too!
cool, there's a channel for models btw
if you are ram deprived, something like lubuntu + llama.cpp might give your inference a boost
I'm totally new here and the content of this channel looked more suitable, but I can cross-post, if that's ok/doable.
no worries! we're all new!
just thought model talkers might benefit from your development
I'm VRAM deprived! 😅 My LLM server is an i7-3770 with 16 GB DDR3. It doesn't help a lot with inference.
Well bad news looks like things aren't getting cheaper; good news models will be improving
Ok, I'll share over in #old-models too
I'm impressed you got OpenClaw talking to Ollama. I'm banging my head on that one with nothing but bruising to show for it.
Yeah, I've read your message above. Maybe I can post my config.
That would be awesome
},
"models": {
"providers": {
"ollama": {
"baseUrl": "http://192.168.444.555:11434/v1",
"apiKey": "ollama-local",
"api": "openai-completions",
"models": [
{
"id": "ministral-3-14b-44k.mf:latest",
"name": "Ministral 3 14B 44K Kontext",
"reasoning": false,
"input": [
"text"
],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"contextWindow": 44000,
"maxTokens": 8192
},
{
"id": "gpt-oss-20b-65k.mf:latest",
"name": "GPT-OSS 20B",
"reasoning": true,
"input": [
"text"
],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"contextWindow": 65536,
"maxTokens": 8192
}
]
}
}
},
"agents": {
"defaults": {
"model": {
"primary": "ollama/ministral-3-14b-44k.mf:latest"
},
"models": {
"ollama/ministral-3:14b": {},
"ollama/gpt-oss:20b": {}
},
"workspace": "/home/user/.openclaw/workspace",
"maxConcurrent": 4,
"subagents": {
"maxConcurrent": 8
}
}
},
Btw, my models are hosted on a different server than openclaw. Openclaw runs on a separated old laptop
Thank you! Stuck on calls for a while, but I will test later.
I may have solved my problem earlier today, but I haven't been able to test. 😖
Well, I have to admit, I had quite a bit of help from Gemini 🙏
I have had help as well, but if this latest fix solves the issue, then I will have to credit Opus 4.6. I paid for a month this morning and it caught something the other AIs were insisting would help with a small LLM. 😂
Qwen with a small context window. I have 12gb on my 3080 ti. It’s not nearly enough to do real work locally
You can run a virtual machine on your existing one and run it there for free
Yeah, I know. I was hoping to be able to run GLM-4.7-Flash with 2 x 12 GB after an upgrade, but this most likely will still not be enough for 64K context
consider GLM 4.7 Flash REAP
I think you’re right. We need to figure out a way to trim down model request size. I’m sending minimum 30k requests
Will check it out, thanks
it's just not worth the cost of ram imo
considering model pricing atm
and the fact that most can be done with gemini flash
i plan to buy mac mini m5 with highest RAM memory, anybody got expierence on quality with 64GB M4 Mac Mini ?
An M5 Mac mini is not rumored to be here any time soon.
Success! It took almost 90 seconds to reply, but I finally got a response! 😂
its in the leaks and said to be released in march or april, where did you get your infos ?
Mark Guman at Bloomberg, "Apple also has several Macs coming shortly, including new 14-inch and 16-inch MacBook Pros and the MacBook Air with an M5 processor. Upgraded Mac Studio desktops are coming too and shouldn’t arrive too long after the spring Mac refresh. And there’s the long-anticipated update to the Studio Display, which I’m told looks quite similar to the current one. New Mac minis are on the road map for this year as well."
2 days ago, german outlet, might need to translate
https://www.macwelt.de/article/3056757/neue-macs-kommen-bald.html
Yeah, it litterally quotes Mark Gurman's article and says later this year
I would love to see new minis sooner, I just don't think they will be here in the short term
doesnt mention any timing...
From the website you shared: "The Mac Mini should also be released in updated form in 2026, Gurman does not give an exact time"
yes
Good chat 😂
sorry playing games on the side... would be a bummer with my token usage i need a local LLM
I continue to agitate for a #local-models so that we can separate cloud model and hardware for OpenClaw discussion from discussion of how to configure and run local models
does anybody know what to expect from 64GB Mac mini M4, if it takes too long might even make sense to buy it and later upgrade and resell
Hey laddies, wanna run openclaw on a mac mini, can i just buy the cheapest one out there? Will it be enough? So many different stories online
Apple Mac Mini 2024 M4 - 10-core CPU - 10-core GPU - 16GB ram - 256GB
Thank you for your service! 🫡 not sure if i can post that "you have my axe meme" "and my axe!" clip here but yea... good point!
it's indeed the go-to choice for many - a quality build for an always-on device. openclaw can technically run on it and even more humble gear, such as an old laptop you don't need. to get the most of it, you will additionally need access to more powerful AI in someway.
It's plenty for running OpenClaw, and there are several macOS and Mac application integrations, whcih can be nice. It is not sufficient if you also need to run local LLM beyond the very smallest.
I (finally) have a small LLM running on my M4 Mac mini with 16 GB RAM, and I am seeing it take up to 2 minutes to reply to very short (1-2 word) prompts. 😢
If I send the prompt directly to ollama, it responds in about 2 seconds. OpenClaw appears to cram so much extra info into its prompts that it is bogging down the LLM.
not sure how small is small for you, but perhaps take a look at lfm2.5 1.2B thinking (yes thinking, not instruct) model
I'm running this slightly modified qwen2.5 model at the moment
ollama ps
NAME ID SIZE PROCESSOR CONTEXT UNTIL
qwen2.5:7b-32k f29595e63859 8.2 GB 100% GPU 32768 4 minutes from now
cool! 100% GPU?
I'm not sure what that value means. Does it mean it's pegging the GPU at 100% or that it is 100% resident in GPU memory? I suppose I could research that
hmmm... very low tps could mean ram is full, and its swapping away.... working but very laboriously
speculating
RAM usage appears okay and the system is not sluggish, even when I connect to the GUI remotely
yea it seems counter intuitive...
but hey cool that you got a model up and kicking, hope you learn new things
Assuming Gemini isn't smoking crack, "In the context of the ollama ps command, seeing 100% GPU under the PROCESSOR column is actually great news—it means your model is running entirely on your graphics card's memory (VRAM) rather than your computer's system RAM."
yeah you're good here
if its something like 48%GPU/52%CPU you're kinda boned
because that means its swapping data between them
and using your actual RAM, not VRAM, etc
It's working, but it's very slow. It takes about 2 minutes to generate each response
you said you had 16GB version ?
Yes. I also have a DGX Spark (Asus version) so running locally on the Mac mini is more proof of concept, but eesh I expected better.
And to be clear, LLM performace on the mini is fine. It's only when OpenClaw is handling the conversation that things grind to a halt. I think it sends so muc crap in each prompt that the LLM bogs down digesting it all.
i see, do you think bigger version will handle this easier ?
OpenClaw just sucks all the money out of my Pocket with the Cloud usage, even in less than 1.5yr i have my money back on the biggest M4 Pro
Dumb or smart to wait for the new Mac Studio model to be released before placing an order??
Now that I finally have something working, I will go back to troubleshooting my connection to the DGX box. I expect that to be much better because it has a much better GPU and a bunch for RAM. However, I think OpenClaw's verbosity is the problem with everything. That's also why the API charges get so high so quickly. It just needs to be less wordy or something.
Isn't the verbosity being sort of dealt with by compacting? I thought i saw something about someone's bot suffering from post-compacting personality disorder or the like
yeah all the info that gets feeded from system and context bloat it up a lot... i just question if it will be usable with 64GB Version... i saw even with spark your context window is more like 32K...
I am waiting. The M5's have some nice AI-specific upgrades
I am also waiting as hard as I can
Cool, like what? I want to wait but token burn is real every day I dont place an order im losing cash...
I don't know. I am half expecting that even the Spark may be disappointing. If I can get acceptable performance at basic communication, then I think using a local LLM could be fine, and then call out to an API for the hard stuff. That's not a new idea, I'm just behind in getting it implemented.
new M5 processor, already available for macbooks, exhibit dramatic improvements in inference, time to first token, tokens per second
Yep
And those are just the base M5s. I don't think we've seen M5 Pro yet, let alone Max and Ultra.
Still in the 10-15k usd range though for ultras right?
Not used, that sounds like full retail price
Or is it mostly speculation right now with release date, pricing etc.
no pricing known yet, could see movement upwards due to ongoing ramageddon
i feel you so hard....
That's based on current pricing. Apple has not... yet... raised prices due to the global memory situation.
Yeah that is another concern of mine. We're in the first wave of openclaw adoption, in 1-2 months, demand might be 10x:ing...
then you can just use your sub from Anthropic/Google/OpenAi...
I did a few tests. Unsloth has advised against using this model with ollama, yet. The model might actually repeat itself quite a bit. Currently, it isn't viable for my setup. It rapidly loses speed, which might be due to repetition. Will have to wait until Unsloth clears the model for ollama.
Here are the results:
======================================================================
Starting Benchmark
Range: 4096 -> 65536 (Step: 4096)
Limits: VRAM <24.0GiB | Speed >2.5 t/s | Time <600.0s
======================================================================
--- Testing Model: glm-4.7-flash-REAP-23B-A3B-Q4_K_M:latest (007043a01484) ---
> Ctx 4096 -> TPS: 28.5 | Time: 9.4s | Mem: 14.0GiB | OK
> Ctx 8192 -> TPS: 28.1 | Time: 26.3s | Mem: 14.4GiB | OK
> Ctx 12288 -> TPS: 18.8 | Time: 55.0s | Mem: 14.8GiB | OK
> Ctx 16384 -> TPS: 15.9 | Time: 92.7s | Mem: 15.1GiB | OK
> Ctx 20480 -> TPS: 13.4 | Time: 151.6s | Mem: 15.5GiB | OK
> Ctx 24576 -> TPS: 10.9 | Time: 258.5s | Mem: 16.2GiB | OK
> Ctx 28672 -> TPS: 1.1 | Time: 347.1s | Mem: 15.7GiB | FAIL_SPEED
-> STOP: Speed 1.1 < 2.5 t/s
-> Optimal Context ca.: 24576
thanks! always cool with real world data!
Not the sub, it would have to be API. I'm not on the abuse the TOS bandwagon.
feels like you need to be millionaire to not care about the token cost, just got into software side with openclaw and feeling this is ridiculous priced for what you need...
i even tried min maxing but still then its so expensive...
true, token cost is going down dramatically though, might be substantially cheaper very soon
I think it will be very hard to impossible for that to happen in the coming 12-16 months
with rising prices in AI Hardware I dont see that even though competition is very strong...
i just imagine a world, where the Bubble burst and they sit on hardware asking a dime for 1M Token to even run these machines
Also, as token cost goes down, won't you just add capabilities, cronjobs, more agents. etc.?
I mean there is no end in sight for much demand we have for these things lol
@steep wedge good question, do you run more than 1 agent on your mac mini ?
maybe, but look outside the IT space, no one really noticed OpenClaw yet, so many people even in IT space never heard of it... only if you enthusiastic or in the Flow
yes but other people aren't stupid, they are only slow, there is a tidal wave of adoption coming imo. while existing users comes up with new ways of using it (aka finding new wasy of burning tokens/compute & memory demand)
im genuinly worried that mac studios are gonna sell out when openclaw (or similar agentic solutions) goes mainstream
and I know for a fact I will never be able to achieve what I want to if im using cloud
too expensive
sounds like a mission for rich people lol
there is no indication of cost per token going up
So far. This is a fresh start so it’s barebones at the moment.
Just about to go apple store to buy a M3 Ultra. I can get the 96 np. 256 I have to wait 10 days and the price hike is significant... will 96gb satisfy my selfhosted LLM needs? Pretty much just want a decent context window, a decent LLM that does nice etc. Any feedback would be much appreciated, first hand experience especially. I have scoured reddit and other forums
I heard if i get a 256 it would be easier to resell when the M5 Ultra comes out (soon tm), or is that overkill. Just get the M3 Ultra 96 today, sell that when the M5 Ultra comes out?
would be interesting to know how it affects the system....
openclaw by default has root access, so multiple instances would have overlapping root access, enabling them to delete things right
the coder agent might delete the image generation software that the influencer bot installed
the influencer bot could retailiate, deleting the coder bot
mmm so on my setup i used API openrouter as main agent but sub Agents with different models depending on the workload
do the sub agents all have root access
yes but clear policy on what they can access and do
and for most part only main agent was a problem when bugging out or hallucinating
cool if that works, i would be suspicious... i mean root is root right
ah ok, new to me, not that proficient in permissions
me too, just started to investigate when my main agent talked about permission on command by command level 😄
understand you can spawn subagents from one main agent but they are like nonsudo users...
but you can also give them permission
makes sense for older LLM like 2.5 flash not browsing the web so its not going to be a problem with prompt injection for example
if they all are on one machine, do they work in parallel or can you assign priorities to them... e.g. social media checks must be conducted every 15 minutes... but the creative brainstorming is not urgent, can be done when low load
understand now that agent plus subagents work.... still doubt whether two agents, one machine would work
i think the same but would be interesting to get feedback from somebody...
come on man, take one for the team! 😄
or lets say, two instances, like one 48G main agent and for cron 16Gb modell
just thinking of getting 64GB Mac Mini M4 Pro or wait for M5
maybe get both, then exo them together?
im not millionaire 😄
i thought every openclaw bot was making everyone 20k a month 😄
also latency between them two might be shit but i already thought about redundancy openclaw 😄
mine just costs me hahahah
i just tinkered and tested some limits of openclaw and already spend 200+ €
cant imagine what people spend with productive environment
It has root access? That doesn't sound right. I installed mine as my local user account. It lives in that home directory and I've never used sudo to do anything with it.
if not in sandbox or Docker, might have, but you can check in config file
but isnt the idea of having a Agent on the system also taking care of it like a SysAdmin?
yes i imagined absolute power to paraphrase a 90s movie title
for me its just running on a pie 4 but i gave him full access to everything, works pretty good if he doesnt hallucinate
seems I was wrong - root optional
Depends on configuration but for my use case it’s needed but I recommend a security protocol that you can define with your OC
Finally some decent progress. I got OC on the Mac mini talking to qwen3-coder:30b on the ASUS Ascent GX10 (DGX Spark clone). Performance has been very good for basic chatting.
I am still relying on Ollama on the DX10 because of Nvidia driver issues, but those should smooth out over time, and I will give NIM another try later.
ollama ps
NAME ID SIZE PROCESSOR CONTEXT UNTIL
qwen3-coder:30b 06c1097efce0 45 GB 100% GPU 262144 2 minutes from now
That does your TTFT look like?
yeah it's like 15k tokens context just from the default tools that are enabled - i got it down to 5k tokens by setting tools.profile to "coding" and tools.allow to add "message" & "cron". might try to be more selective than that still to make it work better with CPU inference
but with 5k context i can get responses in about 10 seconds (qwen3-vl:30b-instruct on ryzen 9 6900hx). though it's not calling tools so i'm looking at trying other small/mid-sized models
edit: ahh no, i must've had some syntax issue and no tools were being exposed. definitely not 10 seconds while having tools exposed - looks closer to 20s to 1.5min with tools & 8k context which actually seems kinda promising! (i'll gladly trade some latency for privacy) though i'm also seeing it slow way down (15-30min) from parallel requests because openclaw keeps sending new requests to litellm every ~90s or so...
Is there a reliable way to test that?
vibes 😂
Okay, just wanted to make sure I wasn't missing something. My impression is that Ollama drops the LLM model out of memory if it's been a while since I chatted with it. That's maybe/hopefully configurable. Anyway, initial response can be about 10 seconds as it reloads the LLM. After than, responses begin in less than a second. If it's a lot of data it will continue to appear on the screen over the next however many seconds, but it's at a rate faster than I can read, so I never feel like I am waiting for it.
If I give it a task to perform, that can take several seconds depending on what it has to accomplish. I told it to change what it calls me, and that probably took 5 seconds.
Am i stupid for buying an hp elitedesk g4 (16gig ram i5 8500) to run openclaw on? Realized the benefit of mac minis is that it is unified memory (uses the ram as vram), i am not really trying to run any local llms though right now so does it matter
If you don't need local LLMs, that would be fine. You could also run on something much less powerful
Ok thanks. Are there any mac native features id be missing out on?
Hi guys any tips for someone getting started with local models for openclaw? What's the best go-to set up currently? I have an old gaming PC (3090, Ryzen 7700X, 32G ram). I think Kimi2.5 is out of the question but is there anything else decent I can use?
the average time between upgrades of the studio (and the prevailing evidence generally is in agreement with this assessment) has us seeing new ones in a year or so.
apple can't steer the ship very fast, thanks tim apple. but they did happen to steer towards something extremely useful for this regime.
I think you'll be waiting a while, and I wouldn't be surprised if the price of the Studio goes up, if the dram shortage continues for much longer.
Pro-tip: go to Macrumor’s buying guide before buying anything from apple. https://buyersguide.macrumors.com
They will tell you when the next version of each product is expected to drop
i really like the mid-sized MoE models like qwen3-vl:30b-a3b (or any other 30b qwen3 model), glm-4.7-flash (another 30b-a3b model which might even be better than qwen3 but it is a bit slower on my CPU w/ the reasoning), gpt-oss:20b, etc... these might even fit fully in your VRAM since the 30b models at q4 are like 19-20gb
i've never tried this (i've only mini PCs) but if you've got 24gb VRAM + 32gb system RAM you might be able to do qwen3-coder-next? (it's a 80b-a3b model that's recent and should be great for agentic stuff. 52gb at q4 quantization. i've seen some smaller pruned versions too like lovedheart/Qwen3-Coder-Next-REAP-48B-A3B-GGUF - might require llama.cpp over ollama)
noob q: i have a mac mini (2012) lying around, and i'm planning to upgrade it to 16gb ram.
with ubuntu on it, what's the best setup i can use on it with a local model?
intel mac mini isn't gonna run local models in the ram
I had codex whip up some benchmarks for my local models, testing Planning and Tool-call shape, and checked the benchmarks against opus4.5 and kimi k2.5 which both score 100/100. The best local model so far was qwen3-30b-Q6, that model scored 70.6/71.6. Qwen3-next-Q4 scored only 67/62, gpt-oss-20b 65/74. Bear in mind this is on a 3090+4090 machine with 48gb ram
Hi guys, I have a great machine, very capable of running local models: GMKTEC EVO X2 - Ryzen AI MAX 365+, 128 GB V RAM 8000. The problem I'm seeing is that none of the models run the tools correctly. All the models are too clunky and run very slowly. The claw effect adds a heavy layer to the local models, making them sluggish. Do you recommend a specific local model that runs well and is capable? Thanks!
We need to AGI the STM32/ESP32 people.
next year you mean in 12 months or so??
I thought it was coming in a few weeks, at least that's what my opus told me.. God damn, I will be ordering a current version model studio then..
Thanks, according to macrumor:
"The next release of the Mac Studio is expected around the middle of 2026, when it will be updated with M5 Max and M5 Ultra chip technology, offering significant performance improvements. Since you are looking for a max specced model to run local LLMs, you might want to consider waiting for this upcoming update.
Currently, our rating for the Mac Studio is Neutral as it is mid-product cycle, and while it's a powerful option now, future improvements may enhance its capabilities for your needs. If you prefer to buy now, the Mac Studio with the M4 Max chip starts at $1,999. You can find more details in our roundup here."
Does anyone know where I can find the coupon? I really need it.
@halcyon patrol are u here bro
what hardware do you use to run decent models at home? (not even sure what is considered as a decent model on the open weights models side - i'm mostly used to opus)
We need perfected Agentic AI OCR for PDF docs (hardware programming) while in the IDE.
forget llama for now... bad.
no, local-ish things
I believe the heavy layer is related to how quickly OpenClaw fills the context window, even for a simple “hello” prompt. Your hardware should be good enough for a ~30B local model. Check the context window configuration.
big fan of GMKTEC's mini-pcs, lot's of bang for the buck
Unless I'm mistaken, still a ways out to run any local model on this spec of hardware
Nice. I refuse to use apple products
We need to zero-shot the STM32 stuff please.
bro don't diss budget alternatives for the ram-deprived! 😄
Oh is that the reason we are giving apple 💰?
It's wild I'm finding full PCs but it's with ddr5 ram pre installed and the full build is the same price as buying the ram separately lol
Lex Fridman has a recent interview with Peter Steinberger. Around 53:00ish they discuss security concerns and the use of local models. Worth a listen, very grounded interview.
true, amazing market asymmetries out there right now. need ram? buy a pc.
I'll check it out
Oh I was listening to that this morning got half way through lol
Difficulty of Embedded Systems: ESP32 > STM32 Nucleo-64 > STM32 Nucleo-144
The STM32 Nucleo-144 is way more difficult.
What hardware are people using between a raspberry pi and Mac minis that are good? Maybe just a dell pc or something with a decent gui that I can get the browser working on.
the software itself will run on pretty humble gear, like an old laptop. problem is, it needs access to AI, for which you need either a really good gaming computer at home or cloud services
I couldn’t get a raspberry pi to work well I’m using open ai subscription for the LLM for the browser connections not working
i think many people are getting it to run on an rpi even if might be slow. not familiar with open ai particulars, but many subscription services are banning people who are using openclaw with subscriptions designed for humans. could be they are blocking it somehow
Update for anybody that was following along yesterday. I had OpenClaw talking to qwen3-coder:30b, and that was pretty snappy in terms of performance. It's been hit or miss on accuracy of technical questions. I now have qwen3-coder-next:q8_0 running in Ollama (this is on my DGX Spark clone), and the whole model still fits nicely in memory.
NAME ID SIZE PROCESSOR CONTEXT UNTIL
qwen3-coder-next:q8_0 3f68e12b44ee 94 GB 100% GPU 262144 4 minutes from now
Hey bro fun fact, that 100% GPU is actually a good sign! 💪
Yeah, it's perfect
Just chatting with it through Open WebUI, it's a little less snappy, but not annoyingly so. If the quality is much better, I will take that tradeoff.
well its q8 so i guess you are doing coding rather than digital slop fiction
On my way to best buy to pickup my bot’s mac mini. Super excited for this project.
The base mac mini M4 is sold out at my apple store - is there any advantage to buying one with more memory if it’ll be running clawdbot?
Or is the ethernet speed something worth upgrading?
firstly, to just run it, don't need fancy pc. old laptop will do. secondly, more ram lets you run a bigger model on your machine locally.
sounds to me you are eager to buy a mac mini before they all sell out. unless you made of money and dont care, suggest find out what you want/need. to get going today, find old laptop, wipe it of private dat
basically thing is the software can be made to run on humble gear, problem is you need great ai to do great things. means either very fancy computer at home or pay for cloud ai.
The additional memory would only be useful if you wanted to run local LLMs, however the performance with OpenClaw in particular would likely be poor. I tried that route first, and OC packs so much info into its prompts that it would overwhelm a small model. If I was just chatting with the LLM directly, performance was good, but OC slayed it. 😂
Yeah you need like 10gb just for the context regardless of what model you pick
For some reason I find myself thinking oh that's more than two DVDs of context.... that's a whole lot of context...
Go research what a KV cache is, what it’s storing, and then you’ll understand why it’s so large “just” for a 200K context window
And as a bonus you’ll discover why processing prompt tokens 150k-152k takes way way longer than tokens 2k-4k
it's an incredible amount of data
One thing that has both amused and frustrated me with using the local LLMs is that they sometimes think they can't do something. If I ask a cloud API to do something, it knows it can search the Internet (at least to some extent), but the local LLMs are like "nope, can't talk to the Internet." If I remind OpenClaw that it actualy has tools and access to the Brave search API, then it's like "oh yeah! I'll get right on that."
A few adjustments can significantly improve outcomes. The biggest lever is priming. Local LLMs often default to the generic “offline assistant” pattern unless they are explicitly reminded of their capabilities. Make it clear in the system prompt that the model has access to tools like Brave Search and is expected to use them when external or up to date information is needed. When possible, embed this guidance directly into the system instructions so it is always present rather than relying on ad hoc reminders.
It also helps to be explicit in user prompts. Instead of vaguely asking for the latest information, instruct the model to use the search tool to retrieve current data. Clear trigger language reduces hesitation and increases the likelihood of a proper tool call. Improving the tool description itself can make a difference as well. Define what the tool does, when it should be used, what input format it expects, and what kind of output it returns. Models respond more reliably when the contract is concrete and unambiguous.
If the model frequently refuses with “I cannot access the internet,” that behavior is usually coming from training patterns rather than real limitations. You can counteract this by clarifying in the system prompt that external access is available through tools and that the model should reconsider before claiming inability.
In some setups, slightly adjusting temperature or reducing overly strict refusal instructions can also help break rigid canned responses. Model selection matters. Some local models are far better at structured tool calling and agent workflows than others, so choosing one that has been tuned for function calling can dramatically improve reliability. Local LLMs tend to do exactly what their framing suggests, so careful orchestration and explicit expectations usually solve the problem.
Would it help to update one of OpenClaws .md files with the reminders, or just remember to include it in my prompts?
Relying on memory to restate tool reminders in every prompt works, but it is fragile. You will forget. Or you will slightly change the phrasing. Or a future workflow will skip the reminder and suddenly the model is back to “I cannot access the internet.” That inconsistency is what makes local setups feel unpredictable.
Putting the guidance directly into the system level instructions creates stable behavior. If you modify the file that defines the assistant’s baseline identity and capabilities, you are effectively reshaping how it sees itself every time it boots. That is much more powerful than nudging it per prompt. You can include clear lines such as:
- You have access to Brave Search and other configured tools.
- You should proactively use tools when external or up to date information is required.
- Do not assume you are offline.
Once that is baked in, you only need to explicitly call tools in edge cases, not constantly.
...why are the mac minis selling out? is this just so people can use local models? otherwise, just buy a bps for $10/mo, right?
i've got a mac mini m4 but am moving to a vps. i don't need a gui,....right??? (cue fomo)
How has this setup been going for you? Looking at the same
why do you want a mac mini?
The only reason I can even remotely reason with is stuff like iMessage and less technical people just following trends. There may be something i'm unaware of though.
This is the main problem ive had with local setups myself.
A lot of the Mac mini buzz around OpenClaw is definitely viral momentum. Once a few people posted clean “local AI box” setups, it snowballed and suddenly the Mac mini became the default recommendation.
But there’s a real technical reason underneath the hype. Apple Silicon uses unified memory, meaning the CPU, GPU, and Neural Engine all share one high-bandwidth memory pool. For local LLM inference, that’s a big deal. You’re not constrained by separate VRAM limits, and you don’t waste time copying model weights between memory regions. That makes even mid-range configurations surprisingly capable for 7B to 13B models.
If you scale up to higher memory tiers or step into something like the Mac Studio, the advantages become clearer. Larger unified memory capacities let you run bigger models without hitting SSD swap, which destroys performance. High memory bandwidth helps with transformer workloads that constantly move large tensors. Add in low power draw, quiet operation, and the ability to run agents 24/7, and you get a compact machine that lines up very well with what local AI systems actually need: lots of fast memory and no artificial GPU memory ceiling.
External vs Internal
Safe to do freely:
- Read files, explore, organize, learn
- Search the web, check calendars
- Work within this workspace
To search the web you can use in your workspace folder the path
skills/duckduckgo-search.
putting this here in this section of my AGENTS.md make my gpt-oss:20b alot more reliable running locally to actually do a websearch
ME:Did you see who won the last numbered UFC? can you tell me about it ?
Clawbot: UFC 325 (“Volkanovski vs. Lopes 2”) was the most recent numbered event, taking place on February 1 2026 at Qudos Bank Arena in Sydney. The featherweight championship bout saw Alexander Volkanovski (c) defeat Diego Lopes by unanimous decision (scores 49‑46, 49‑46, 50‑45) to retain his title.
Running it locally was a nightmare before if you brought up the internet.
hey guys, this is a little off topic, but with claude and or antigravity, we really dont need n8n anymore do we? i ask because some youtubers are making n8n videos saying, "is n8n dead?"
Mac Mini M4 + Minimax M 2.5 = GOAT ?
oof, running LLMs on a 2012 mac mini might be kinda rough - looks like a 2c/4t processor w/ ddr3 memory. but you absolutely can still play with it & experiment if you've got the hardware lying around. i've got a 2014 macbook air with similar specs and it seems to be good enough for ~1b dense models (e.g. lfm2.5-thinking:1.2b) or MoE models with only ~1b active (e.g. granite4:7b-a1b-h). these may not be very capable models but they're also not completely useless. maybe you could find a use or at least learn something about LLMs from the behavior of smaller models
(i would be surprised if these very tiny models were good enough for something like openclaw! or as a general agent, i mean. i would be more looking at straightforward tasks & workflows with good instructions, but if you find any use with openclaw i'd love to read about it!)
the default tool list that is exposed is pretty huge (i think like ~15k tokens total for my first message). if you don't need all of the default tools then disabling some or most of them can really speed things up (i got down to ~5-6k tokens for a fresh chat w/ only read/write/edit/exec/memory tools exposed)
a mac specifically for help with my iOS apps. Claude code for some coding that I can talk to from my phone during my day job. local models for other assistant type stuff
my setup has been pretty good. though i have tried local LLMs thinking I could get away with them and turns out they are really only good to run the Heartbeat and TTS/STT. I'm relying on Kimi-K2.5 in the cloud for most of everything else so far.
Thanks. Was hoping to combine some local LLM for basic tasks and my Claude subscription for coding
Is everyone getting the base Mac mini with 16 GB of RAM?
For simply running the openclaw gateway and having it do coding on the machine using remote LLM models? As a starting point, yes. Granted... if you know the software projects you build need far more RAM or CPU then you could get a higher end model. I have a variety of projects I work on... but I too started with the absolute base model. I figure worst case I'll have two mac minis at some point in the future. It's hard to call $500 a waste if I end up buying a $4k mac mini or studio at some point (hopefully not).
Due to the async nature... the absolute speed at which things happen on the device is not necessarily as important. You are not always waiting for something to finish running / compiling / testing and need it to finish quickly so you do not get distracted. That whole model changes with the the coding agents and ability to answer questions / give next instructions over messenger apps.
is openclaw (just openclaw itself, definitely not running any local ai models) able to run on bad hardware?
im new to this, not very technical or deep into this ai stuff, just saw some news articles and youtube videos and thought it was kinda cool
but im definitely not ready to put it on a computer im actively using or go out and buy a whole new pc/mac
the only thing i found that's a 'spare' device is this really cheap laptop, it has a pentium and 4gb ram, runs windows 11
Run it on win10 no matter or on linux
AGI all the STM32 and ESP32 possibilities.
- An ESP32 handles WiFi + model inference.
- An STM32 handles tight real-time audio and control.
Together they become a networked, learning synth brain
On-device AI sound engine with a tiny neural net on the ESP32 using TensorFlow Lite Micro. STM32 handles the audio engine. Networked swarm mode: Multiple nodes talk over WiFi.
A practical version of this could pair an ESP32-S3 with an STM32H7 over SPI, where the ESP32 handles WiFi, a lightweight TensorFlow Lite Micro model, and a browser-based control UI, while the STM32 runs a real-time audio engine at 48 kHz using I2S to an external DAC. The ESP32 would generate MIDI note events or high-level synthesis parameters from a small LSTM or transformer-lite model trained on pattern data, then stream those control messages to the STM32 with timestamped packets for tight scheduling.
The STM32 would manage wavetable or FM synthesis, envelope generation, modulation routing, and optional CV outputs using hardware timers and DMA for low latency performance. A shared clock sync layer over WiFi could let multiple units exchange tempo, key, and motif data using lightweight UDP messages, creating a distributed generative system where each node adapts its output based on incoming pattern metadata and local user knob input.
I mean, no it is not AGI on a microcontroller, yet. It is a hybrid architecture where AI runs at control rate and DSP runs at audio rate, each on hardware suited for its timing constraints
Hi all - just looking for at piece of advice. I have used a Raspberry pi 4 (model B) with Signal as my chat interface, and brave search API, and my open Ai account for chat completion, and an API key for memory.
I reach my Pi through Tailscale
I seek to “beef up” the setup, so
I am not limited to headless chromium. I’ve bought a Mac Mini M4.. 😬😬 - so the question is. Should I keep my Raspberry Pi as Controller and the Mac Mini as a work horse. I’ll keep all my secrets on the Pi, and let it SSH (one way) to my Mac Mini with instructions? Or should just run it all on the Mac Mini?
Let the Mac Mini M4 be the brain and muscle. It runs Signal, Brave Search API calls, OpenAI chat completion, browser automation, memory logic, everything performance heavy. It becomes your primary agent node. Clean, fast, simple.
Now the Raspberry Pi becomes a hardened secrets vault and control sidecar. It stores API keys, tokens, private SSH keys, signing keys, anything sensitive. It does not run browser automation. It does not run complex agents. It just sits there quietly, reachable only over Tailscale, with a locked down firewall and no inbound exposure except what you explicitly allow.
Instead of copying secrets onto the Mac, you design the Mac to request them at runtime over a private channel. For example:
- The Mac connects to the Pi over Tailscale.
- The Pi exposes a minimal local API or secure RPC endpoint.
- The Mac authenticates using a client certificate or SSH key.
- The Pi returns secrets only to authorized requests.
Secrets are cached in memory on the Mac, never written to disk.
You can go further and have the Pi sign requests instead of handing out raw keys. For example, instead of giving the OpenAI API key to the Mac, the Mac sends a payload to the Pi, and the Pi signs or proxies the request. That way the actual key never leaves the Pi at all. That is more engineering, but very clean.
Also, disable password SSH entirely. Use key based auth only. Restrict the Pi to accept connections only from the Mac’s Tailscale IP. Make the SSH one direction. The Mac never accepts control from the Pi unless you explicitly want that.
This way:
Mac = performance layer
Pi = trust anchor
You get speed and flexibility from the Mini, and containment and peace of mind from the Pi. And if you ever wipe the Mac, your secrets and identity layer stay intact on a tiny, low power box that is doing one job very well.
@steep wedgei decided to buy a DGX Spark from ASUS, any hints for starting the journey ? Claude suggested for multi Agent something like that
Multi-Agent Setup
Main Orchestrator: Kimi K2.5 (85GB loaded)
├─ Code Agent: DeepSeek R1 Distill 70B (40GB)
├─ Quick Tasks: Qwen3-8B (5GB)
Total: ~130GB (doable)
Alternative
Main: GPT-OSS-120B (65GB)
├─ Code: Qwen3-Coder-30B (18GB)
├─ Fast: Gemma 3 12B (7GB)
├─ Vision: Qwen2.5-VL-7B (5GB)
Total: ~95GB
Nice! Don't immediately install updates (if it's not too late). I boxed myself into a small corner on what I can run in terms of local LLMs. The tools I would like to use (e.g., NIM and vLLM) don't like the latest CUDA 13 driver.
no no, i will order next week at a Reseller i know, maybe get some discount haha but good hint!
That said, I am having satisfactory results with Ollama. I am currently running a slightly tweaked version of qwen3-coder-next:
qwen3-coder-next:q8_0 3f68e12b44ee 94 GB 100% GPU 262144 3 minutes from now
It's a decent size, but still fits in the GPU's memory
Ahh, I thought you had it already. I was going to order through one of my US dsitributor accounts, but it was a few hundred dollars cheaper buying directly from the ASUS site. If you get it from less than $3k, let me know! 🙂
I will say, it's not great at configuring OpenClaw, but that may not be fair since it is so new. I just gave it a more generalized coding taks and the results were good.
i get for 2.9K € 🙂
Claude was talking about the available models and the expected performance, sounded very nice, when you get 80-90% what cloud is offering
and that unlimited... man i run into rate limits every hour or so.... i even build a dashboard to keep track when lifted 😄
Okay, you pay a little more but pretty close to what I paid
do you also get ASUS version ?
I'm helping it debug a python script right now. It's almost there
Yes, it seemed to be the best price
and thermals, i was talking to my Data Scientist friend, he was laughing at me for asking these questions but also said its best thermals on the market, better then DGX Spark from nvidia
I've been happy with mine. I also think the Nvidia is a little tacky looking with the weird gold finish
true but thats also with their big DGXs... make them look expensive so people pay more willingly hahaha
i was very hesitant to decide but with more experience and better models coming down the line, this will be great for anything that comes
other option was Mac Mini M5 with 64GB but claude was infurious about the alternative haha
I almost want to buy a second one just to play with clustering, but I would also really like to buy a new Mac Studio if/when they do an M5 refresh, and I can't justify spending the money to do both.
This is already an expensive hobby
why buy the Mac Mini if you have the spark ?
Not mini, Studio
ahh because of bigger RAM ?
Potentially. And I want to see how the M5 Max and Ultra do. The M5s have some AI performance enhancements that could be really interesting on the more powerful variants
beside bigger RAM, do you think they will be better than Spark ? with Memory Bandwith its hard to go against it... just sound might be an advantage
I don't think it will be better, but I do think if I buy enough RAM, that could be interesting
i see but thats out of my budget then, i mean already 3K for me is very high... im not a programmer or so but i try to get Desktop Automation further, maybe to the point where i dont need to work on a pc anymore 😄
Yeah, I hear you. We'll see what I actually do 😂
whoa just what I am dealing with now, I have the acer dgx spark, trying to figure out what's best to run on it, this has been a headache
using it in combo with a mac mini
That's basically what I am doing right now. I have the Asus version, but I think they are all the same.
The base nvidia hardware is the same, the only difference as far as I can tell is the nvidia spark comes with a PCIe 5.0 M.2 drive all the others used a PCIe 4.0 drive
i wasn't willing to spend 1k for 3TB when I can upgrade it myself for 5-600 and there is a new Samsung drive on its way build for this https://semiconductor.samsung.com/news-events/tech-blog/samsung-pm9e1-inside-the-leading-ai-optimized-pcie-gen5-ssd/
Which LLMs are you going to run?
This at the moment:
ollama ps
NAME ID SIZE PROCESSOR CONTEXT UNTIL
qwen3-coder-next:q8_0 3f68e12b44ee 94 GB 100% GPU 262144 3 minutes from now
I bought the studio. Three of them so far. Keep sending them back for ones with more horsepower.
I have a tracker on referb models, saw a maxed out M3 Ultra for 11k yesterday
How do you like it any issues running it, are you running LiteLLM too?
I don't know LiteLLM. I tried to run NIM and then vLLM, but I ran into driver issues. Ollama was easier to get working, but performance is supposed to be better on the other two. I hope to try them again when the models catch up to the drivers. Performance is decent with the model I am running now (qwen3-coder-next). It's not snappy with the responses, but they are much more detailed than the smaller model I tried first (qwen3-coder:30b). None of the models I have tried other than the pricier cloud API ones are decent at configuring OpenClaw. I think it is still too new and has changed so rapidly. It's too easy for the smaller models to get confused. I have had good luck giving them more mundane coding projects where they have a lot of prior art to draw on and then can itterate and test things themselves.
Here's Our First Gemini Deep Think LLM-Assisted Hardware Design
︀︀
︀︀We've been using LLMs for software and firmware for years... now we're trying hardware. Threw a MAX44009 datasheet at Gemini Deep Think, asked for an EagleCAD library file, and about 10 minutes later it popped out working XML. Loaded it in Eagle, checked the pins and dimensions, rolled with it. Correct pads, correct pin naming, even added a pin 1 dot and sensing element outline on tDocu. Ladyada's still driving... just has a co-pilot now 🎳
︀︀
︀︀blog.adafruit.com/2026/02/14/heres-our-first-gemini-deep-think-llm-assisted-hardware-design/
Google needs to add Gemini 3 Deep Think to Antigravity.
There are many different ways to do IoT hardware, some are way easier to do than others.
So with the Raspberry Pi, because we can basically write new ‘firmware’ through .py files and run them compared to the ESP32 and STM32 where things are hardcoded unless you have good OTA firmware, using the Raspberry Pi then having an AI Agent work those through SSH could lead to great automated resulted? 🤔
Some people AGI the Raspberry Pi, ESP32, and STM32 possibilities to the max.
AGI this now:
https://x.com/oprydai/status/2022907012380803131?s=46
Hi guys! How much ram does my pc really need to run a open claw?
or it is completely optional everything is done by LLMs via api?
near nothing . its extremely lightweight . the hard work is done by the model provider that you use (typically hosted in the cloud: claude, gemini, etc). you can host a model localy if you want to, but it's debatable as to why you'd do it given the cost of hardware and esp memory right now ...
Unless you want to run local LLMs!
All benchmarks have not been ‘saturated’ if I still cannot zero-shot what I want with IoT.
The STM32 Nucleo-144 is supercomplex.
What is the actual potential of a lot of them with ASI controlling it all? 🤔
YIL (is that a legit adaptation of TIL?) about vector databases, and this morning my OpenClaw now has enhanced memory via sqlite-vec and a new embed model. I am happy to see both fit in memory on the DGX Spark clone:
ollama ps
NAME ID SIZE PROCESSOR CONTEXT UNTIL
qwen3-coder-next:q8_0 3f68e12b44ee 94 GB 100% GPU 262144 4 minutes from now
qwen3-embedding:4b-fp16 3c93b6415795 19 GB 100% GPU 40960 2 minutes from now
I asked my agent (he named himself Quill) what he thought of gpt-oss-120b that I run for him. He WhatsApped me back - “[openclaw] The current model (the 120 b GPT‑OSS variant you’ve granted me) is perfectly fine for the tasks we’ve been doing – it handles research, file‑management, and the occasional script‑writing without any issues. If you ever need a larger model for more complex reasoning or higher‑quality generation, just let me know and we can switch, but for the transcript‑retrieval, URL‑gathering, and file‑organising work you’ve asked about, the model is more than adequate.” Bless him. I run this locally on a Framework Desktop 128gb.
That's an interesting model; I might have to play around with that one.
Openclaw isnt very optimized. There is a hyper optimized fork that is 400x faster with 99% less ram usage
I 'm looking for STM32 engineer
Very little. I think the docs say 1 Gb minimum. Double check - it could be less.
I read a lot of negatives about the fact I shouldn't use a reasoning model. I turned it off in the chat (/reasoning ) and every now and then it regurgitates some system message guff but as this thread is about HW I wanted to say that I get 50+ t/s on the framework and the whole experience feels snappy and there is no token-bill stress. I was lucky getting the F/W before ramageddon hit and see they have now had to put a $500 hike on the 128Gb model 😳 .
Yeah I would then just use a fork that uses less than 10mb ram
I am wiring up this model now. I am going to try it as my primary and shift the one I had been using for coding tasks. One of the the reasons I pounced on buying the Asus GX10 was that the price seemed out of step with the current ramaggedon situation. I would expect them to raise it at any time.
What were you using? I find I can’t get the speed from the other big models that I get from oss-120. I’m running llama.cpp with Vulkan.
I was using qwen3-coder-next:q8_0. I am trying to set up a second agent to handle code tasks and use that model while the main agent uses oss-120, but it's fighting me.
@latent lava @steep wedge why did you both go with either of the choices ? Framework and DGX Spark, it’s a 1K price difference and Claude clearly goes for the framework, just trying to understand what I’m missing
I bought the Asus Ascent GX10, which is basically a DGX Spark clone for $3k instead of $4k. I think there are some small differences, but not $1k worth.
I just checked price, actually they’re in the same price territory
Link? NVIDIA has the Spark for $3,999.99. It does have a 4 TB drive, while mine has 1 TB.
My bad, I would also go for Asus one for 2951€
I’m curious how the Spark compares to the AMD setup in the Framework. I assume the Spark’s GPU is much better, but I have not investigated.
From what Claude said it’s very very close, just a little bit faster in connection to CPU but otherwise similar
I would dig a little deeper or use a better Claude model 😂
Always get the cheapest retail spark clone, the extra 1k+ isn't worth it, two difference is how each manufacturer does cooling, and the NVMe drives
Ouch, I do see that those Gen5x4 shorty NVMes are pricey - $699 for a brand I've never heard of.
wait for that Samsung
I went with the Framework desktop after being on the wait list for the Spark. Early reports on the Spark were disappointing and the time lead between announcement and shipping was very long. I like the philosophy of Framework as a company and paid £2k for the 128Gb machine before the price rise in the New Year. I get essentially a machine that will run Windows or Linux with few issues, is well made, runs cool and with a max power draw of 120w (and almost nothing when idle) can run 24/7 with no issues. It’s personal choice but I’m very happy.
Thanks for the insight, appreciate it! 🙂
I've got my gx10 running as well now, and finally have vincentzed-hf/Qwen3-Coder-Next-NVFP4 after a lot of tinkering. Running vllm v0.16.0 compiled from source, and it benchmarked to around 60 tokens/s. However when actually using it for openclaw it's painfully slow, anywhere from 1-8 tokens/s basically. What kind of speed are you getting with llama.cpp?
We need good 1:1 simulated (digital twin) hardware that the AI can run so we can know if it's going to work before we do it in real-life.
Yes, do you have something like this ?
I'm not sure how to give you real numbers. It definitely feels slower talking to the model through OpenClaw than it does through Open WebUI. I think there are several factors that go into that, however, and it's not necessarily indicative of a problem between OC and the model. For example, OC might be briefly using the model, then making tool calls, coming back to the model, doing something else, going back to the model again, and then ultimately might send you a reply. I have adapted to a "send and forget it" approach. I don't really forget, but I move on to something else while OC does its thing. Eventually it gets back to me. Although I have just realized the new sandbox (via Docker) has neutered OC's ability to even access the Internet for any task. Sigh.
ACEMAGIC Kron Mini K1 Mini pc
Any reviews on the above? I am planning to run Openclaw on this
Thank you 🙂
no 😔
Would a server without GPU be good enough? HPE GEN 9, 2xcpu, 256GB ram, 900gb SSD Drives in raid 0? brgs mrf Noob.
Yea fair enough, it's definitely slower than openai but that's to be expected as well I suppose. I'm inspecting the vllm server logs which prints token generation speeds for each requests, maybe llama.cpp does that also? I also had codex build a dashboard to inspect and time queries and tool calls, my local model is generally taking about 30-40s for every query so it's no wonder it feels slow
I also have to allow for an additional ~10 seconds if I haven’t used OC in a while. That’s how long it takes Ollama to load the model into memory. I haven’t used llama.cpp; I’ll check it out.
My Claude Max ran out for the week, Codex token maxed (didn't know that was a thing) and got blocked by Google for misusing their cli model. I need my OC network to work.
And my local hardware isn't strong enough to run anything. Running a 6750xt with 32gigs ram, my hardware isn't enough for local.
Help
guess you need to buy a mac maxi
Also, glm-4.7-flash probably fits in 32gb, at a lower quantize, with just one 200k context slot
you might want to consider other subscriptions where t&c tolerate openclaw use or paying for PAYG API access.
hey @tired plover sub 3000 looks like good deal on the gx10 given current market conditions for unified 128gb you have basically that and all the other dgx spark varieties, mac studio (which is more expensive), and maybe amd ai max 395 pcs. were you considering serious ai development or mainly inference?
@crystal cedar mainly inference with openclaw
I'm waiting for new Mac Studios, but nervous price will be victim of ramageddon. Considered getting the GX10 out of pure fomo. Impressed by unified memory, low tps is tolerable.
i think you will minimum need to pay current pricing... but could be last Gen without price hikes because they planned and allocated already
so you would put down 4K ?
probably yes. the m5 processor looks like quite a step up from m4 already (improved time to first token and tokens per sec). not sure what will be in the mac studios, but yea... waiting to find out.
looked at the dgx spark, not a subject matter expert, but seemed it was more suitable for fine tuning and development work rather than inference. nevertheless, it will fit big models, and depending on what you are doing, maybe low tps is tolerable.
in addition with dgx spark and its cousins, there seems to be some nvidia software, cuda etc that could be interesting should you decide to pursue development
im going to look into this, thankyou very much
your current setup might restrict what you are able to do, buying new gear is expensive due to ongoing scramble of ram. Perhaps the coding plans from moonshot are worth considering, not sure think they might accept openclaw use up to a certain ceiling
I have llama.cpp with the Vulkan backend (easy to build from their GitHub and a bit of help from Claude). This avoids the nightmare of getting ROCm to play nicely with Strix Halo (though I do now have that working from nightlies). It works brilliantly with oss-120b.
moonshot has openclaw integrateed and are now starting to offer it as a service, i was also looking into miniMax since its one of the cheapest options right now and there are some people who are starting to experiment with it. they should be opensourcing minimax2.5 in a few weeks from what i herd and the its infrance footprint is significantly less than the current state of the art. also herd that it operates almost as good as CC with 90% reduced cost.
on the hardware aspect, im thinking about exploring "federated inference", though im still reseraching it, there have been previous attempts at it such as "petals" that attempted it a while ago
it would be cheaper for users if we can run a mixture of quantized models on a distributed system on a dozen or two raspberry pi R5s
ofcourse im being idealistic and a lot of this is still in the hypothetical
starting to think all those RUST heads maybe arnt so bad afterall, chasign efficency.
lol
i just took it because you get 128GB and good support and spending 4K "just" for inference is too much for me
maybe but for now browser automation
did you get it already? in that case, congratulations on your acquisition i'm sure you will enjoy it, might follow up and do the same
ordered and paid today 2.9K, im traveling till thursday and it will arrive friday
though im intrigued by the bandwith... really need to see the performance with gpt 120B
thinking the 4k studio i want might be 10k by the time its announced
hence fomo for gx10
maybe 5K but not 10K, usually Apple already allocated stuff for first wave...
my Data Scientist Friend suggested DGX Spark when i started talking about Mac Mini, didnt consider before, he also said he wants soon to train a modell specifically for OpenClaw
dangerous when you start thinking i dont do training now, but i might want to soon
did u see the 4B math model
didnt look much at models i was talking it through with Claude and its suggested a strong Setup especially with Security Layers
what you get offered by Cloud Providers on Security and Prompt injection savety needs to be considered if it wanders the net
well the other day very cool news about model, trained on math performance, on pair with big models but only 4B param
mmm i might play around till i find the right models, i want to run more than one as i want to have multi Agent Setup
yea a lot to learn about safety security. very difficult with prompts. best idea so far ive seen is have another ai review incoming requests with sole purpose to try to determine whether attack. a safety-version of "seymore cash", the ruthless capitalist boss of the vending machine agent
these setups
claude suggested some security Layers, i can send it to you via DM
nah its ok i'm still not jumping in, things are moving fast
i'm afraid to commit 😄
hahahah
i understand what you mean
for me it was token cost and Data security, i want to do something with data from my company and there i need to be careful
i considered setting it up and the inference server on a guest network. assuming things just will get hacked regardless of what i do, but data backed up somehow somewhere
i mean it needs a door and if you close all of them it gets harder and harder
but might be good thing to secure yourself
yea i know what you mean i can't touch confiential data with any ai, but could prospectively use it for background research and novels about vampires and cats
in these cases i question why to use AI any way
i don't know man, prompt injections seem very unintuitive. i just give up and admit i can't really prevent that, more important to save data
i enjoy the slop 😄
you dont loose much and there are measurements to be even better than cloud providers
or close to
i think there are different mitigations depending on what your problem really is. it could be smarter to spend money on insurance than tech
for my own sanity i just imagine anything can get hacked so act accordingly
i.e. ok if my vampire novels get leaked, not ok if confidential info leaks
i mean thats true but still doesnt mean not to install something against it 😄
something like this (admin pls dont ban i know its software in HW) 😄
-
Input Validation
├─ Regex-based filtering
├─ Length limits
└─ Known attack patterns -
Output Validation
├─ AST parsing
├─ Import checking
└─ Function call analysis -
Sandboxing
├─ Docker containers
├─ No network access
├─ Read-only filesystem
└─ Resource limits -
Human-in-the-loop
├─ Show generated code
├─ Require explicit approval
└─ Log all executions -
Monitoring
├─ Log all requests
├─ Track failed validations
└─ Alert on suspicious patterns
yes but i think there are four kinds of data 1) public 2) not public, but if public no biggie 3) private, if public, problem but recoveable (e.g. compensate clients) 4) private, disastrous if public (e.g. no business will ever return, clients will sue you to kingdom come, can never operate a business ever again)
if my data leak nobody care but i will get kicked out LOL
JK
idk what happens but i think its also about not trying to be a target
here in holland there was a big data leak , details for about 6 million customers. seems noone thought of compartmentalization
i dont handle these kind of information
lucky me
but yea, sometimes when you think youre safe, youre not...
heard about attacks directly on inferencing and stuff... there is crazy stuff out there but at the same time, what they gain from hacking a guy who spend his last money on something to play around hahaha, nothing to get anymore
in any event, congrats on the gx10, i think it sounds like a great gift to yourself
if you don't like it slide it to me i was looking at it today 😄
thanks man, im so curious about performance, really dont understand what to expect...
i will consider it, maybe we find a good deal for both of us 😄
things move fast, models get better, could be there is a leap suddenly that makes everyone want to run out and get something
yea might happen quickly... my friend needs to start training faster, was pushing him today already hahah
sub 3K is great price. anything with 128gb ram (not unified) costs about the same with dell right now. the gx10 probably very attractive for anyone into finetuning for a while too
might be good resell if Mac Mini M5 comes up but if youre used to 128GB then i need to get Studio.... such a expensive interest hahah
hope you will find the right path for you, and if you just want to play around, minimax got really good testing time of 1 week for nearly endless token haha
same to you man, it's a great time to be alive!
People need to make it possible to code All hardware with just an AI Agent using CLI commands.
The ESP32 and STM32 code properly from just VS Code Codex extension (GPT-5.3-Codex).
Hey yall I have a $7000 budget, what do you think I should buy?
a mac mini and save the rest for api credits 😭
Haha
Is it better to run a local llm with a good machine or use api credits instead
2x Asus GX10, a cable to connect them, and a sixpack of beer
Save five more thousand and get a tinybox red. https://tinycorp.myshopify.com/products/tinybox-red-v2
Hi @severe yoke
Sorry to tag you — hope that’s okay! I was planning to get a Jetson to run OpenClaw and came across one of your threads where you talked about it, so I wanted to ask for your advice.
I’m trying to decide between the Jetson AGX Xavier, Jetson Nano, or a Mac Mini M4 (16GB). If you were starting out, which one would you go for? I’d really appreciate any thoughts.
For context: I’m new to OpenClaw and robotics, but I’m comfortable working with Linux
Thanks in advance 🙏
Someone else can also help me out if they have any thoughts to share
Thanks in advance 😅✌️
Write me, I have a better suggestion!
I bought this:
GTR9 Pro
128 GB unified RAM
IMO is the best quality/price you can get.
Don't buy a Minisforum, since the second drive runs an x1, so it's like a SATA3 😂
For the same performance:
- Apple cost 2.5X
- NVIDIA cost 2X but rely on normal RAM, so models can't run at full potential...
Best hardware bought this year
curious why didnt you opt for the ser10 max they are advertising for OC.
I mean, double the ram, I get it :))
An update on my GX10 journey: I installed the container from https://github.com/eugr/spark-vllm-docker and ran the qwen3-coder-next-FP8 recipe and it was a massive improvement. At 150k context it's using 100gb of RAM, and openclaw responses are decently snappy, recommend trying it out. It's definitely not as efficient or good at coding as codex, but it does get shit done and in a reasonable time
Well, because SER 10 Max has standard RAM, not unified... 😅
Impossible to make a face2face between a UM pc and a non UM pc... 😅
Do you know what version of the Nvidia CUDA driver you are using? You can run nvidia-smi in a terminal, and it should tell you (near the top). I had trouble getting models to run under NIM and vLLM because CUDA 13 is supposedly too new.
Yep I have 13.0 but I think I saw the docker image is actually using 13.1 with 580 drivers in forward compatibility mode or something like that. The docker image compiled straight from git clone without any issues, and I (or rather openclaw using codex) compiled vllm from source as well using cuda 13.0 and that ran fine, but the docker image seems better than my manual compile
openclaw is so powerful ,imagine meta having it
Are you running on nvdia jetson?
Asus GX 10?
No
My bad, was under the impression the man just got one
Is it really a must we get charged credit for api
Can’t we run openclaw freely?
I’m pretty sure you can use a local llm with it, at least that is how I understand it. I am ordering a new machine to install it on so I haven’t gotten to try it yet. Downside I guess is local LLMs are less powerful and more prone to prompt injection.
someone think it's possible to run a 32b local model on a gaming laptop with 48 or 64 GB of RAM? but 4vram in gpu :,v, even if it's slow?
… and mistakes.
My gut feeling is that it would be terrible.
^
Asus gx10
Anyone else running clawdbot on an older macbook? I'm setting up my macbook 2017 right now running on Monterey and lots of dependacies and skills seem to be outdated- just need some encouragement 🥹
I’m running it on a 2013 iMac running macOS Ventura, best advice is just to tell your agent to look at the source code for something and build it itself for your older macOS version
Awesome thanks for the advice. Maybe I'll end up upgrading this mac to Ventura- I can always ask the agent to build/ re-build itself for my specific version or does this have to be during onboarding? Just sending my first telegram right now 🙂
I used opencore Legacy patcher to upgrade my macOS version before I did anything else, because it does take a hard drive wipe
Is it worth getting a 32 gb ram over 24 gb? Trying to see what kind of hardware I should invest in
That’s vague, what operating system, what architecture, etc. etc.
Anyone else with a Mac Studio 128gb ram here? What local models did work well for you and browser use?
Yes, always use the largest amount of RAM you can afford. I made tests with Mac mini M4 with 32G, too small for a decent local llm. On my MacBook Pro Max with 128G it works fine, I tried many llm’s and qwen2.5-30B-Q4 is the best yet for tool calling and not to many hallucinations, which you can harness.
Hi guys,
If I am trying to run an open claw on a VPS with a decent soec, what would be a recommendation for it?
Willing to spend $100-200 month.
If you are from europe i can straight up recommend the dedicated servers from hetzner. I don't know their current pricing but i have a 16 Core - 64GB RAM for 60€/month up and running for 16 months. I only had one day of downtime because the server died. However the support from hetzner was on point.
Don't use shared hosting like ever
Hey guys, I've used a VPS server from hostinger, got mad of I couldn't run local llm, ordered a macmini m4pro with 48gb - wdyt?
I’m going to microcenter today to get the parts to build a raspberry pi 5 to use exclusively with ClawdBot
- Raspberry Pi 5 8gb
- 256GB SSD NVMe
- Argon40 - ONE V3 M.2 NVME PCIE Case
Does this setup make sense? It was recommended by Gemini so could use a gut check. I don’t have specific plans for ClawdBot yet, but want something simple and affordable that for general purpose use.
Anything I should know about setting up ClawdBot on a RP5 securely?
在win使用openclaw体验实在是太差了
You’re gonna be disappointed in the local LLMs you can run but good luck
I'm really curious about your perspectives on local models and what, in your opinion and experience as an OG makes sense. I guess it varies a bit depending on what people are really looking for, just looking for a gut feeling from the ones with some experience. Much of the discussion so far have been about the best model and how to access it. Would be great with a dedicated forum or discussions around local models and what can be achieved with what.
Simple models might presumably help out with tasks for which, pre-OC, things like n8n might have been the go to option. The natural language user interface could be an advantage for many. On non-gamer consumer hardware, nobody is reasonably expecting claude-like performance, so I guess the real question is - if you don't have enough to run the latest open models in full precision, what can you run, and what can you achieve with it. This is obviously a moving target.
really? why?
@craggy ferry It’s more about autonomy, running agents, embeddings, RAG, and reducing API burn.
Hybrid stack > pure cloud for what I’m building.
when you say its too small for a decent local llm, would you suggest just a 24 gb RAM and 256 ssd?
I was planning on building a personal ai agent to manage a few projects and get more hands on experience with llms
i'm on the same journey too... is it correct to assume that cost of local LLM on as powerful as needed VM is less than API/token spend?
That’s the theory.
Local should make sense for high-volume agent loops and embeddings, but I’ll benchmark actual token burn vs infra cost before drawing conclusions.
Waiting for my macmini m4pro with 48GB to tell you heh
Because you can’t run a model that can find its ass with the flashlight tool with decent t/s without like 64gb of dedicated vram
Maybe in six months we’ll have better distilled ones
I too would like a local model channel
Like glm-4.7-flash is the best it gets unless you can run glm-5 or k2.5
If you want to run your main thread on sonnet then sure farm tasks out to local hardware from there
But it drove me insane to have an idiot for a main thread, and even glm is kind of slow (in the intelligence sense, t/s is tolerable)
I’m experimenting with a 14b instruct model to do execution, to see if maybe I can get a bigger model to plan out some small chunks of a task and then hand off to a much smaller model to execute on
i've been very impressed by GLM 4.7, but haven't formally tested it, just like it.
i think saw someone recommending latest 14B model from mistral for instruction following
hmm, did they compare it to qwen2.5-instruct?
dont think so, it was just what the were using, but the mistral is a few months old and never considered it. figured with a forum people would learn from each other and yea...
yeah i think qwen2.5 is even older, but
i have zero knowledge on instruction following. was very impressed with tiny LFM models from liquid ai for reasoning, heard that LFM 2.5 1.2B thinking (not instruction) performed well on some bench mark supposedly relevant for agentic use, so was going to check that out
yeah i'm looking for a specialist model that'll fit in my 3090
apart from writing slop for my own entertainment i have been using ai for trying to understand how reasoning can work, to get a feel for it. so completely green when it comes to instruction following which admittedly seems to be of relevance for agentic use
i initially wanted it to help me with my executive dysfunction and filter out scam emails. i still want to do those things but i've been sniped by just trying to set up a system for it to autonomously carry out tasks using the token streams i can get
good news is that things keep improving all the time, with models getting better. my first use case would probably be to try to keep track of what is of particular use for openclaw
got a bit worried about prompt injections and the difficulties in dealing with that - seems a bit similar to your scam email use case.
imagined one solution would be to have another ai peer-reviewing prompts (or several reviewing diffeerent parts of the prompt) looking for suspicious things
here, i'll give you this one for free, it's alien technology, but be warned, it's complete slop
https://github.com/steveyegge/beads
and yeah, my email reader is off right now
because i'm worried there are enough openclaw instances out there that someone is going to start actually doing prompt injection emails