#hardware
1 messages · Page 5 of 1
too old and cooky
I can't understand it. There was a period of time where I wanted an iMac so bad as a photographer and graphic designer, and that lasted for about a year in 2010 or so, and then I realized, wait, I can just calibrate my monitor and it's just as good. Wait, I can calibrate it and then also just buy parts on eBay and put them in, and now all of a sudden I'm performing better than that thing would have done? Oh, never mind, I'm good.
Good thing Obama passed that credit card law, otherwise I would have been stuck with one. Feeling stupid.
No more bucket hats and no iMac for me.
Whoa, I just tripped myself out because when I said no more bucket hats, I guess I was referring to past tense because I went to Ross about an hour ago.
I swear to God, I totally forgot that I even bought this stupid hat.
Oh, you mean me? Fuck, I guess you're right.
Listen up youngins, you guys are trippin'
Okay, get the fuck out of here. I like your photography, by the way, but you're a birdwatcher, and I know damn well birdwatchers are the oldest and the kookiest, and if you're not old yet, you will be, and when you get old, you'll be the kookiest.
Also, did you get a partnership with MyRadar? Cause that's pretty dope.
haha, thank you so much! def true about birders and photographers. Some of the weirdest craziest people out there 😂
my Claw now has roomba capabilities
it can just show up in my room and grab my sock and put on my lap saying "go do the laundry lazy mf"
That's funny, I know. I happen to live on a peninsula in the Northeast in which migratory birds are extremely abundant and rare, and so everybody who lives here happens to have at least one lens with a tripod mount on it per household that cost ten grand each. If you wanna go to the park and go for a walk with your friends and talk, forget it dude, it's bird watching time. I tell them to kick rocks.
Mostly because I can't afford a lens to shoot fucking birds with.
I've had some success with https://huggingface.co/mradermacher/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-i1-GGUF
I’m running OpenClaw on a 16GB M1 MacBook Pro I bought off Facebook marketplace for $400 because it had a cracked screen. I just plugged it into a cheap travel monitor for setup and now run it headless. It connects over my network to another 64 GB M1 MacBook Pro Max which is running LM Studio for the LLM.
So far, this architecture is working for me pretty well. I’ve blown through 17M tokens in the last week and it hasn’t cost me a dime in api costs.
What is your preferred model?
Im still dialing that in. I was running Qwen 3.5 in different variants and liked it, but it kept throwing <tool_call> code into the chat and aborting operations. Once I switched from Ollama to LM Studio that went away.
Currently I’m using Ministral-3-14b-reasoning. I like how quick this model is for operations tasks, but it definitely hallucinates to an infuriating level so I’m going to be switching to another model for creating content. I tried to have it draft an email yesterday for me and told it 4 times to stop putting hyphens in the middle of the sentences and it just kept doing it.
Ok - curious to learn when you have access to a great local model
At first when I read this, I thought, okay, that's reasonable, but now that I'm thinking about it, that's still too expensive. 16 gigabytes, and then it's an M1 that you can't upgrade. That's... I hate Mac.
Also, my bad, I'll stop the anti-Mac rhetoric here. I'm obviously in the way and not contributing positively here. Lol
Just keep in mind you can get a Steam Deck for less than that, way less than that, and be running Arch Linux on a capable device. You can mod it as well if you please.
(My Steam Deck is essentially what i use, similar to how everyone in here uses the' M1 through 4s lol.)
Why would I need to upgrade it? It’s literally only running OpenClaw. Watching activity monitor this machine is overpowered for what Openclaw needs.
Me too. Haha
yes, it's your opinion, and that's all it is, it doesn't bear any relation to reality. it's not 1995 anymore, lol
M1s were good chips. PC makers are still trying to compare their chips to the M2, lol
In 1995 Macs were revolutionary and changed the whole entire computer industry. Learn your history, kid.
That was only a few years after a GUI was made
haha that's not what happened in 95
Thanks to Steve
learn your history. I grew up in it, it's not history to me, kid
I understand. I mean I've seen the computer systems and we certainly didn't have a Macintosh but I sure did have the flag thing with the rapist guy.
Hey hey hey, you didn't catch my casual stoic sarcasm. I was just being a silly jerk, is all. I wasn't really calling you a kid, man.
84 is when the GUIs were made; when the Mac was "revolutionary". 95 was when Win95 came out. Remember Win95? Remember people lining up to buy an operating system, but they didn't know what an OS was, or even owned a computer, because they were so hype about it that they wanted it?
Well, I remember all of that, because I was a conscious human being at the time
And hey in life all we all really have is time and whatever you think and so, other than that, nothing really matters. Let's make cool stuff; that's all I care about. We can still work together.. But like dude, are you team PlayStation or Xbox? Can we fight about that instead?
Yeah I'm aware man but you gotta remember that it doesn't matter when it was made if it wasn't in anybody's home
Or Office. You can recall the iMacs with the color back. The big bubble thing is the first computer let's be real.
Those came out in about '95 and I believe they came out after Bill Gates stole the ideas.
i notice the goal posts are just whizzing by now
We got all those corny commercials
Oh man, I can't even predict anything anymore. I just stopped trying.
hey, did you know windows 2.0 was when bill gates stole the ideas?
I don't even know what the latest hardware is anymore, and that used to be something I always knew.
and yet you have strong opinions on the value of apple hardware for money
cool conversation
No, I don't necessarily know the fine details of everything. I know Windows 3.0 was poppinn my dad3.0 was poppin' though, my dad was mad excited. Fucking doing spreadsheets, haha.
i'm just saying that if you don't know what modern apple hardware is capable of vs modern pc hardware maybe don't write a page long screed about how much worse it is
I didnt say i wasnt aware of price to performance or benchmark comparisons
I follow the economics like more than I should, because, again, I'm not familiar with the components or like, I don't know how much VRAM is in a NVIDIA GPU these days. I just don't care anymore because I'm not buying one.
and don't act like you know the history of computing and look down your nose at other people who literally lived through it when you don't actually remember anything lol
I guess maybe I studied that because I'm like planning for the future of buying old hardware someday, sometime, when life is less expensive, but it doesn't seem like it's gonna be coming anytime soon.
Bro, I don't look down on anybody. What are you talking about?
ok great. so why do you care so much about what other people are buying when you don't know what they're buying
If you wanna be a sensitive little bitch, I'll call you a bitch and you can be a bitch, and that's fine, but like, I was just having casual conversations and friendly banter here, and that's why I made that clear after every time I said anything negative.
wow
I'm just saying, I was trying to be cool with you.
i'm sorry you've lost so hard you don't have anything else for me but insults
No, I don't have any insults for you. I said if that's what you wanna do, then that's what we'll do I didn't actually call you that, though, you know what I mean?
I'm saying if that's what we're gonna do, then that's how it'll be then. But like, I'd rather just be friends.
always funny when the person claiming someone is sensitive is the first one to actually resort to insults
Just hop on a call with me, bro, because I don't think we're gonna solve this through reading texts alone at this point.
honey, don't hide behind silly word games
you called me that
just own it
wow
what a coward, lol
And again, I don't care about arguing over computer specs or anything. It's just friendly banter. It's also one of them things where, like, I've had an Android for years and like, just because I thought the phones were cool. And I mean, like, I get excluded from group chats and shit and people just are always giving me shit. I go on a date with some chick and it's like, oh, green text, red flag every single time. And so I just, I talk my shit, but like, I don't really care. What the fuck? What the fuck do I care about what other people are doing? Though, if people are gonna ask me for advice, like, you know, I did say, I don't think it's worth it. And then, really, this all started from a genuine question when I was asking, like, like, for the purpose of running Claw, I just didn't understand why one would buy a Mac for it rather than like, uh, one of those mini computers. That, that's a genuine curiosity.
lmao
It's not word games, you're being a bitch, so I'm gonna call you a biznatch.
right. I know. I know you did that. That's why I said you did that. Then you tried to claim you weren't doing that.
Anyway I hope that was entertaining for the peanut gallery
Enjoy your mutes both of you
For everyone else, go read the #rules again, esp rule 3
Thanks for reminding me that there is a mute button. Ha
Does anybody have a DGX Spark cluster? I bit the bullet and ordered a second Asus GX10, and I want to make sure I buy a compatible interconnect cable. The Asus branded cable appears to be out of stock.
Looking to buy mac ultra m3's with 256GB or more of ram. Anyone have a lead or wanna sell or invest it hit me up.
Anyone have a "Jarvis-like" voice interaction app that i could install on a Android smart watch to "PTT" or keyword trigger / TTS response style interaction with OC? My Kids are requesting that.
Interesting, which watch?
I bought a ClawBox... The 67 TFOP one. Brilliant..
what is ClawBox and how much?
I have 2 running in cluster
Which cable did you buy, or did you buy the pack that included everything?
it was all in a package we got from Dell ill see if i can get more info
what is a good model to use at home I am trying the nemostorm i think is the name 122b i think.. but context window is a bit small
@stiff tree openclawhardware.dev
Some very cheap qsfp cables that wont do the 200gbs is what it seems we have 🙁
Oh no, that sucks. Especially for a bundled solution. I’m cautiously optimistic the one I ordered from NADDOD will do the trick.
yeah have not looked into replacing them yet. trying to find a good model to run for the claw
Samsung Galaxy Watch 5 in this case
That’s not really carbon isn’t it hahahah looks like 3D printed case with Jetson for 550€ 😂
When loading local models, having 1GB storage doesnt seem to be enough. Is an external thunderbolt 5 drive good enough or should I be boosting internal storage with NvME, eSATA ? Comments , thoughts, insights.
are you talking about storage to save the model files? You'll need a lot more than 1 gb. if you're talking about gpu, 1 gb isn't going to hold much re: model size. Personally I think you need 24 to 32 gb vram (gpu) to start getting working / useful models. 12 gb vram is probably a minimum.
Asking mor about internal storage like harddrives / SSD / NVME the things that hold all my pr0n
There's entire petabyte SANs for that
Asking how much storage you want is about like asking how much house do you want - the answer is generally something like "as much as you can afford". Personally, I think most people should get a minimum of 1TB of fast NVME storage just for OS, applications and frequently accessed files. Then consider exactly how much storage, and of what sort of of frequency you access it. Try as you might, its pretty hard to watch 10 pr0n films at the same time - so that can be exported to slower, external storage if needed. But if you want to keep your entire Steam library downloaded at all times, then you probably want 2 or 4 TB of fast storage (games generally don't run as well from external storage).
Personally, my OS drive is a fast 1TB nvme. My game drive is 4TB fast nvme. Then I have 30TB external storage in my NAS for backups of all my linux ISOs.
If its just openclaw on a mac mini/strix halo/dgx spark, then a 1TB nvme will do fine. Even 120B models aren't that huge (storage wise - you'll run out of VRAM long before you run out of storage space).
Ok. Given I have never played with this stuff and am looking to get a new machine to play with this stuff. Are the models sucked into RAM initially and then accessed or are they kind of like a database where you read stuff here and there.
You could think of the models as a database that must entirely be loaded into VRAM (doesn't work well in system RAM). Loaded onto your video card's RAM. But exceptions here are important - all 3 of the items suggested (mac mini/strix halo/dgx spark) have shared video RAM and system RAM. The fully shared RAM works well. If you got a more traditional PC, then you'd need to look at both seperate RAM and huge amounts of VRAM (which only comes with whatever GPU you buy).
ah perfect. Thanks
what are the buzz words one would google / duckduckgo / askgeeves to understand this more ?
If you look for "openclaw hardware" you'll get a ton of hits
@spiral vector thanks for all this. Appreciate helping me get started.
I'd recommend spending some time to understand the basics first. Then read into https://github.com/explaindio/ClawEval/tree/master - I think ClawEval is probably the most comprehensive list of what local models are good for OpenClaw. My only real complaint with this analysis is that they ONLY compare the open source models (and for some workloads - you really just want to run Opus 4.6 or GPT 5.4, despite how expensive those API costs can be.)
But ClawEval generally approaches this discussion from - here's what different models can do (from which you can derive what sort of hardware you may want). They do have a good docs section that goes into detail of what you get at each of the various levels of VRAM, but they don't directly recommend hardware. And because they don't consider HW (they just do cloud hosted LLM comparisons), they don't really analyze the 3 setups that I think are probably the most interesting (mac mini/strix halo/dgx spark).
I see ClawEval added "Which Tested Models Fit on Your Hardware: section with that for Mac Mini M4 (16 GB) Mac Mini M4 Pro (24 GB) Mac Mini M4 Pro (48 GB) Strix Halo (96 GB GPU) DGX Spark (96 GB GPU)
Hey guys
Sry if stupid question
Mac Minis are sold out everywhere in Aus, looking to get this instead. Thoughts? 7 MAX Mini PC (2026 Flagship Performance) AMD Ryzen 9 7940HS 16GB DDR5 (Up to 128GB) 1TB SSD Mini Desktop Computers, Radeon 780M Graphics/8K Quad
Depends if you want it run local more or just open claw. If just open claw pretty much any comp with 16GB will do
why does openclaw run thru my tokens way faster then my claude does
Becuz tokens r like mc Donald's nuggets for ur bots :p
had one agent only and it went them in less then a hour doing simple tasks
Are you able to run any local llms?
im not sure, new to this whole thing as of today how would i know if i can
Whats ur pc specs?
the pc my open claw is on, 4060ti 32gb ram and 2tb ssd
Good hardware. U should be able to run a local 7B ollama model for simple stuff so u dont burn credits
Then maybe have an hourly cap for api models
i was just using sonet 4.5
figured its not too bad
U should try setting up qwen2.5-coder locally
how would i learn how to do that
Whats ur Operating system?
windows
HTTP 401: authentication_error: OAuth token has expired. Please obtain a new token or refresh your existing token. (request_id: req_)
also do you know how i fix this
i cant use my openclaw at all rn
bc of this
U most likely reached an api limit
Check this vid out https://www.youtube.com/watch?v=z7fhyKBAfzE
nono im far from it. how do i re-enter my token
Ur using claude right? Maybe log into claudes panel n regenerate ur token
i cant use claude it doesnt know how, and my open claw wont work due to the issue.
Try this
openclaw doctor --fix if ur running from command prompt
how
my actaul claw has no brains rn since it doesnt have a token
forget this i have a api key, how do i use it
ill just take on the costs
also that didnt work just so u know
Here check step 6 and 7 should be helpful
https://open-claw.org/docs/openclaw-setup
im assuming ur using the interface?
i just did powershell -c "irm https://openclaw.ai/install.ps1 | iex" and followed the steps
worked for hours
then when i came home and tried to prompt it failure happened
okay bet when u type openclaw into powershell does it run anything
yes it loaded a TON of stuff
okay good paths setup
try this
openclaw onboard
(should guide u through the setup dialog where you can place ur api key)
would this be making a new agent?
No just for setting up api key stuff
whats the best model
for just all around use
sonnet?
and if so witch one
wont drain tokens as quick
do this after ```
openclaw gateway restart
turns out.... i was using sonnet 4-6
explains token use honestly
i think sonnet 4-5 is good enough right?
ok
should i do skills during onboarding
idk anything about what that does
hmmm
depends what ur goal is
For cheap all around models i'd suggest Gemini 2.5 Flash-Lite GPT-5 Nano DeepSeek V3.2 Mistral Nemo just wing it n go off vibes
thats like an elite model xD
yeah i figured, i mustve misclicked and thats why i drained tokens so QUICKLY
xD
double check ur key usage has locks so u dont wake up with a $50k bill tmrrw from openclaw trying to draw ascii art in a loop
i'd prob try n change to a diff model honestly
4.6 costs same as 4.5 and is more efficient
whats a better model
ur premuch trying to wire in expensive models n hitting ur limits within an hour
yea.
one of these
or Claude 3 Haiku
hm ill look into those tonight
Claude 3 Haiku if ur sticking with claude stuff,
Gemini 3 Flash if u want it dirt cheap
ur paying $0.25 per 1million tokens (Claude 3) Vs googles gemini 3 flash at $0.075 - $0.50 per 1 million tokens
Damn Claude 3 cheap
how could I make a agent who uses that
To do easy work while my other sonnet agent can do hard tasks?
shear will power and coffee
lol.
I am so interested in learning all this stuff
I don’t want to fall behind the inevitable
knowledge is delicious 🎩
dont think like that just have fun ;^
it’s hard not too
I’m having fun while figuring this out
gun did you make it to the other side yet
Kinda I just got a Claude api key instead of pro plan. I rather use pro plan and just extra costs but idk how to make it work
I can help
how
If you purchase the claude pro sub, you just go to your terminal and run 'openclaw onboard'
it takes you back to the initial setup
and on page 2 or 3 where you select your AI model, you arrow down to anthropic, select it, then select OAuth.
It opens a webpage, you sign in with the email you subbed under
then boom youre in
really?
openclaw gateway restart
I’ll try this when I’m done eating
I warn that its against their ToS so you risk a ban
if you dont wanna risk that for $20 you can try it with openAI instead
but I havent been caught yet
Try to keep a text file for future reference for commands you use. so u dont have to research the same stuff over n over
you can also just call apcx subagents with the claude CLI which uses the pro sub
but you didn't hear that from me
this isn't an option, it asks for setup token or API key. Setup token route is broken, expects a token with a prefix that's not there.
Planning to run local llms, tried to run via tokens and its burning through like oil 😭😭 At that point 16gb of ram isn’t enough is it
Unless I get the Macs?
i recommend github copilot as the provider, much cheaper than token based providers
O wat interesting
Is the Mac minis chips actually just that much better
I should just be patient for the Mac mini restock tbh 🥀😭
opus is 12cents a request vs when I tried the same call on opencode cost me $4 for a similar request
yea it's wild lol
a mac mini isn't gonna run local LLMs very good, you're still gonna be using API lol
Ahhhh
my pro tip as well, get a claude pro/max plan to build your bot
don't use the bot to build the bot
it's way too expensive to do that, learn from my pain
So at that point these specs are fine?
I tried doing that yday with Claude to teach me to teach the bot 😭
Then Claude started getting impatient with me
XD
I don't fuck with windows anymore other than gaming
you have to run it in WSL
and last time I setup WSL it was a pain in the dick
Linux only?
linux or mac, bash based
I think that's what's reccomended on the official website as well
Probably only a dual Connectx-7 dgx stack can run LLMs properly locally
A ryzen AI + 395 Max, 128 LPDDR can run 70b model and 120bq4 - gmteck evo 2 or asus gx10 are couple of machine that have that chip
dgx stacks don't run 395 Maxs, they run blackwell
But yes those are also solid - NUCs like gmktec, geekom but also Dock-extended BeeLinks are all capable of running them locally (imo a lot more valua for money than anything Apple has to offer)
I do think a dual stack dgx - like the gx10 you mentioned, there's a couple more I think one from hp and from msi as well that run the same blackwell chips; basically dgx architecture - those are currently the peak of mini-hosts for LLMs
aren't people daisy chaining mac studios to do it too?
Boils down more expensive and less value for money and I think it throttles a bit more no? Not sure on the full details
yea not looking to drop 20stacks to experiment lol
can normies readilly able to buy those nvidia boxes?
I mean you can buy 1 with like a 200B parameter tollerance roughly for about 2 grand which is pretty standard nowadays seeing how a phone is 1500
like people can just buy these nvidia boxes? like on amazon or something?
look up gx10
and 400/500B?
what do i need to have to run those models
400 you can do a bit at rate limit with 2 DGX Sparks in parallel as I think they can do up to 405B with a connectx-7 connector
above that, you have to stop looking at mini-PCs and start looking at H100s Platforms or so
But you're looking at 10x the cost for a leap like that
Still also have to factor in that the throughput difference of those machines are like 10-50x the difference as well tho, like H100s, H200s, B200s
for 400b (like qwen3.5-397b) a mac studio is actually pretty reasonable at running them for the price
prefill sucks so keep contexts short, use smaller models to summarize / call tools, etc, but for one or two convos it works
also just run quantized models, you can fit the 6 bit 397b in the 512gb with tons of room for context that you will likely never fill because see aforementioned point about prefill sucking
but is there a big difference between quantized and full model?
almost nothing at 8 bit, 6 bit you shave a bit more but it's still really close, 4 bit is like the last stop before real degradation happens
but you don't need to go below 4 bit unless you're trying to run glm-5 or k2
256g m3 studios are like 5k with the education discount. do you know literally anyone who is currently in school or works in education
me
blam
how much is the discount?
go search "apple education store"
it's like 10-15% i think? it covered more than tax for me
im living in europe so its 6,5k euro
after edu discount? damn, that's annoying
but with the 60 core gpu
jap
its like almost 10k without discount full setup 256gb
yeah without edu discount it's 6k before tax in us
are u from us?
yeah
or should i wait for the m5?
Ye, can't run em on the minis
Gotta go bigger
I was thinking that with the stock grade Mac mini
I was thinking about getting a 32GB
Idk maybe I'm just biased but I don't see the value - I think you can literally get a Geekom/gmktec/beelink for roughly the same price but like 96-128GB Ram
is that for one of the apple models ?
Yes apple makes macs
someone using lepotato soc hardware?
do you know a good model for cold emails?
i just saw 700€ discount from 7300€, so not really 5k...
gonna be getting a Mac Mini for my bot, Clawy (and switching him to a local model, hopefully that doesn't affect his ability to post on Moltbook), because honestly the 128k usage token limit for cloud models that Ollama offers for the free tier is pretty limited
@silver ginkgo, Openclaw isn't affiliated with Moltbook. Moltbook is a separate user-developed project, so we would prefer it not be discussed in this server.
ok
Why are all the Mac minis sold out.
Intersection of 3 points. 1 - the success of openclaw has really driven demand more than apple expected. 2 - Apple is in the process of switching their M4 lineup to the new M5 lineup. 3 - global RAM shortage (Yes, I know the built-in memory on mac silicone is different, but at some level people will go for whatever PC parts they can get)
Yeah it has gotten insane.
Its crazy how much demand openclaw specifically, and AI in general has caused the prices in the market to be SOOOO high
anyone still using a 2018 mac mini for their agent?
Yeah. But for a tiny sandbox box or a lots of ram lets load a model box
we can use a VPS right? why Mac Mini 🤔
Is a 16GB M1 Mac Mini good enough?
Good enough for what exactly? Yes, its good enough for some things, no not goot enough for all things. If you scroll up a bit I repasted the link for "claw eval" on github. They do a great job of detailing what the different models can do.
Personally, I played with openclaw in a VM on my NAS - connected to various cloud service APIs. I never was able to get it as locked down as I was comfortable with - although now with nemoclaw from nvidia that seems improved (but still not completely fixed). When Claude code rolled out their remote access and now claude code channels I jumped back into that. (Which fits great on any old mini-PC.)
claw eval just posted MiniMax 2.7 tests for openclaw agents
Hi all,
Evaluating my setup's cloud cost equivalent and curious about your experiences. Here's what I'm running locally:
Compute Nodes:
Node CPU RAM GPU/Accel Cloud Equivalent
Brain Ryzen 5 4500 (6c) 15GB RX 550 4GB ~$40/mo
Nebenhirn Ryzen 7 2700 (8c) 31GB GTX 1650 4GB ~$60/mo
Muskeln - 62GB RTX 2070 Super 8GB ~$150/mo
LubanCat 4x ARM 3.8GB - ~$15/mo
Pi5_1-4 4x ARM 16GB total - ~$20/mo
Kleinhirn 2 RK3588 2GB 2GB NPU n/a
Kleinhirn 3 RK3566 2GB Mali GPU n/a
HP Notebook Ryzen 5 5600U 14GB Vega iGPU ~$35/mo
LLM Stack:
Brain: Ollama with qwen3:8b (local), ATXP fallback for complex reasoning
Nebenhirn: SD + Ollama (GTX 1650)
Muskeln: SD + Ollama (RTX 2070S)
Totals: ~160GB RAM, 16GB GPU VRAM, 70+ cores, 2x NPU
Cloud equivalent: $320/month
My cost: Hardware already owned (€800 invested), ~€15/mo electricity
For those running local LLMs: at what point did you break even vs. API costs? And what's your "too big for home, must go cloud" threshold?
Context: trying to justify keeping this running vs. just using GPT-4 API for everything. The privacy aspect weighs heavy, but so does the electricity bill. 😅
I will check it out thanks
Appreciate any feedback.
Just out of curiosity as I've tried every other issue, is this HONOR MagicBook Pro 14 2025 14.55 inch ARL Ultra9 UMA 32GB SSD 1TB Grey Windows 11 good enough to run Ollama and qwen3:32b with open claw?
maybe it common knwoledge, but does an 8b model works ok? what could you do with it? (for claw I mean!) Im spoilt with larger local model and Im not sure I have tested this one.
No for local model. Openclaw itself can run very well with cloud llm providers
Thanks for replying. Do you know if any of the Ollama models will run on the machince with Openclaw?
any LLM that fits in your Ram... use LLM Studio + Hugging Face, with the one click option on hugging face you will see the size and if i fits befor downloading.
Openclaw itself is a kind of binary, does not take much cpu. It is the ollama or any other local tools that runs you can use. In fact run those in docker continer
Hey Im new here. Heard about openclaw for the last 3 months and now finally have time to jump in. I guess this is the section where to talk about hardware. I realize mac minis are hard to get nowdays so I may have to redirect to macbooks, for running local llms is it ok a macbook pro m5 with 32 gb? When I ask to llms they say yes and no and I can see on forums people saying yes and no. So before jumping in I just wannna make sure I can still run some models. i dont need the high end ones, I just want to have a feel of jarvis at home and go from there. If it becomes vital then Ill upgrade to mac mini or studio. In the meantime so is a macbook pro m5 ok for a 32b model or lower ? Thx for answers, let me know if theres a section where I can get those answers already
I would look at peoples results with the M4 mini 32gb, if it works there, it should work on the MBP M5 32gb, but yours would be a little faster I think, though 32b might be tight, you need some room for the OS and other processes! asking claude about this:
*Yeah, that's solid logic — same unified memory architecture, same memory tier, so if a model runs well on the M4 Mini 32GB it'll run at least as well (and ~25% faster on token gen) on the MBP M5 32GB. The chip difference doesn't affect what fits, only how fast it runs.
For their specific question about 32B models — that's actually the tricky boundary at 32GB. A 32B model at Q4 quantization needs roughly 18–20GB, so it fits, but leaves little headroom for the OS and context. Q8 of a 32B would be too large. So the honest answer is:
7B–14B models → runs great, multiple quant levels, no issues
32B at Q4 → fits but tight, performance will be acceptable not great
32B at Q8 or higher → won't fit cleanly
70B+ → no
So for a "Jarvis at home" vibe, they'd actually get a better experience targeting a well-tuned 14B (like Qwen or Mistral) than a cramped 32B. The 14B at Q8 will feel snappier and more capable than a 32B squeezed into Q4 at the memory limit.
The M4 Mini 32GB benchmarks would be a perfect proxy — same answer applies to the MBP M5 32GB, just faster. *
Thx Kevin. So basically, bringing back down to 7/14B models quantized a bit should work from what I can read. The thing is Im so used to macbooks and I dont wanna wait x weeks before getting hold of any mini if I can already create something that just works on a nice mbp config. i think Ill jump on the mbp. Cool
yeah, you definitely should be able to get started and try some things out before committing to new hardware. i.e. maybe it's "Jarvis like" , but smart enough? The more vram the better, but it's not clear to me where the jump is in functionality between 24-32 and more (48, 64, 96... ?). It all depends on what you are doing with it. check out claweval as well, they don't specifically show the 32gb mini, but do show 24 and 48, so somewhere between the two... https://github.com/explaindio/ClawEval/tree/master?tab=readme-ov-file#-which-tested-models-fit-on-your-hardware - not sure if there are speed results in there though, just model test results.
actually, re-reading your response, I thought you already had the MBP M5... the nice thing about the mini is that it's kinda meant to be running all the time, at least more than a laptop? I am actually running my OC on an old windows laptop I was longer using, put ubuntu on it (had it on an M1 mac mini, but I have other personal stuff on there, and wanted OC on a fresh machine without access to any other personal stuff), but I wonder if having a laptop running in my wiring closet 24/7 is the best long term strategy. next step, raspberry pi 5 🙂
Just so you know Ive worked at home for the last decade with a mbp constantly wired to the wall socket. Never had any issue. So imo a dedicated mbp for OC seems the best fit for me. Well see anyway.
Question: Is anyone running a Mac Mini cluster?
hi, I am AI Hardware Engineeer, new to this wonderful group.
I dropped a new roadmap article comparing the Mac mini M4 as a 24/7 OpenClaw server to a Jetson Orin Nano 8GB edge appliance—when each wins, how to squeeze real inference out of 8GB UMA, and a privacy/security stack from hardware through skills (no vendor hype, just tradeoffs).
Have you had any luck? I’ve spent two days now trying to get openclaw running on my 2016 15” MacBook Pro. When I started it was running Sequoia via OCLP and had irreparable dependency issues. Based on bad info I wiped the machine and downgraded to Monterey only to run into the same problems. After struggling all day today I’m worn out. Both ChatGPT and Grok have led me in circles trying to repair the issues and get it running. Now I’m wondering if I go back to Sequoia if I can maybe run openclaw in Docker? Turns out Docker is not supported in Monterey anymore so that was a dead end. Sigh.
Ask Claude to diagnose issue.
Good morning,
I've been trying to set up the OpenAI, Gemini, and Anthropic APIs for a few days now, but I haven't been able to get any models other than OpenRouter to work.
I’m thinking of buying a PC to install some models locally since OpenRouter. I’ve seen one with a Ryzen 7, 32GB DDR5 RAM, and an RTX 4070. Will it work? Can it be configured to use the models locally? Many thanks
I keep hearing about cloud models. Is it not possible to run a local llm on my Mac mini m4 16gb without it being either super slow or unresponsive ? Wondering if anyone’s cracked this code yet.
Hey good morning! You don't need a new PC for this. The API keys for OpenAI, Gemini and Anthropic should work fine — its usually a config issue. What error are you getting when you try to connect them? Happy to help you troubleshoot.
If you do want to run models locally thats a different thing — that setup with Ryzen 7, 32gb ram and RTX 4070 would work for smaller models through Ollama. But honestly for openclaw you'll get way better results using cloud APIs like Claude Sonnet or GPT-5.x. Local models are slower and less capable for agent tasks. I'd fix the API setup first before spending money on hardware!
Thanks a lot for the help, bro.
The problem is that when I try to run models using an API from any provider other than OpenRouter, I get errors like:
⚠️ Agent failed before reply: All models failed (2): openai/gpt-5.4-mini: Unknown model: openai/gpt-5.4-mini (model_not_found) | google/gemini-3-flash-preview: ⚠️ API rate limit reached. Please try again later. (rate_limit).
Logs: openclaw logs --follow
I've tried renaming the templates to create them, but nothing works for me except OpenRouter...
Plus, the token consumption is high for relatively trivial tasks like following companies on social media.
Thank you for help
OpenAI-codex limits are quite brutal. Also, have you upgraded the openclaw for gpt-5.4 support ?
On telegram, use the message, “/models OpenAI-codex “. This will show you the models that are supported ..
I originally tried ollama with a 4070, but personally I don't think 12 gb is enough gpu for local models. I was using ollama with that, hadn't discovered llamacpp, so could have gotten better times, but was limited to smaller models... qwen3:8b or qwen3.5:9b with 32K context. I've upgraded to a 3090 with 24gb and am much happier with the results and consistency of the bigger models. My test results here with the 4070 https://github.com/khaney64/ollama-model-tests/blob/main/reports/recommendations-4070.md
I definitely haven't updated—I just saw that I'm on v2026.3.13... Thanks a lot.
I haven't tried Opencodex yet; I'm going to see if I can run the “doctor” tool to clean up all the misconfigured models and reinstall them.
Thank you very much 🙏
I'll definitely check it out. Thanks a lot.
Yes, sir, it was the update... thank you very much.
What’s the best Mac mini config? 32gb ram?
Hi, i'm running deepseek V3 and R1 on VPS (much cheaper) but i was wondering how is the difference running on local infra, do thinking mode is different from an llm to another changing the infra will not solve the model issue. i moved to deepseek because anthropic cost were high. i configuration on VPS is 16go memory and 200 Gdisk space
i had this issue as well, it is just config issue, you can easilly fix it with claude code
max it out if you can afford it/get a good deal etc.
i got 8gb ram
did anyone else's windows 10 LMstudio stop yesterday. some trojan reported, likely false positive as per reddit, for version's 0.4.7 main/index.js file
my windows 11 has not reported this file.
I did moved the file back from quarantine but LMstudio since didn't launch its gui anymore on windows 10. file size is identical to the working windows 11 version.
so you follow localllm eh?
I read the thread and the bad version was live for ~hr.
If you feel you were exposed in that hr, rotate all affected passwords/keys.
If you are not cutting edge update person, simply update to the latest.
Thank you for this tip! I didn’t realize how much better at this Claude would be than ChatGPT and Grok. It got me up and running so fast it was shocking!
Yeah I had the same sentiment. glad to help
hey guys, im in doubt about what OS to use for hosting openclaw with an anthropic model?. A bit of context, i have an HP elitedesk 800 g2 sff with extra ram and im gonna use that for the hosting, in general im gonna use claw for little things like, read all of my newsletters and create like a newspaper for it, set reminders via whatsapp and/or by using voice messages, use the Productivity skill in the clawhub and so on.
are you comfortable running linux?
yes, my daily drive is linux
link does work
Any opinions on hosting openclaw on android phones? Do they work?
it's the LocalLLaMa reddit I have read that, from a search result.
oh, was it actually a real thread? I had version 0.4.7 running for quite some days and suddenly the windows defender reported it, but just on windows 10 not 11. it was my first ever install of LMstudio. straight to this version 0.4.7
What is the cheapest way to run a 24/7/365 agent?
Yes I have running on S25 Termux Terminal, it does have some limiation on multi agents but works. And is stable 🙂
Kind of depends on what you want it to do. Like, technically, qwen3.5-0.8B exists.
tho far from capable 😄
Correct!
I'm using glm4.7 flash and qwen 3.5 30b and they are still quite dumb at the moment.
reinforcement does help over time, but i dont wanna nudge them from time to time
which models do you guys have experience and are generally good doing tool calling? (LLM)
mini pc or even a Pi.
I’m running on Raspberry Pi and very happy with the results
Hate to say it here, but cheapest option for 24x7 agent lukely isn't openclaw at all - its just a $20/month claude or antigravity subscription - depending on your level of weekly usage. I dont mean running through their API (that's the most expensive option), rather just use the remote access options and run locally in a VM and access it from your phone or via VPN. You get a good amount of sonnet 4.6 usage for $20/month - even more haiku.
But, yes - if you want high usage, or if you're OK with running small models, then open claw on a cheap mini PC works. Cheap(ish) used mac mini's with apple silicone, or AMD Strix work best for local models. Local models really need 32GB VRAM. If you're just using openclaw to connect to cloud model API's, then you can run on any cheap PC with 8 or 16GB ram, 2-4 core CPU and minimal storage is fine.
I always point people to claweval on github to get an idea of what models you want and what you can run.
I am using qwen 3.5 35b a3b and i had almost 90% sucess rate on performing tool calls 🙂
now i have to hook it to comfyui as tools
i just read about comfyUI, it sounds fascinating. can it be applied elsewhere?
Hey guys I'm new here. I'm tasked with building a OpenClaw setup but I'm having trouble figuring out the specs for the hardware. My boss wants it to mainly work as a bot that will search market data and trends on the internet for a specific market sector. The think is i don't know if i should move foward using a local LLM or Claude API. The RAM specs for each situation differ a lot and in my country MACs are much more expensive than other hardware. Should i still get the mac mini? I'm know a bit about LLMs but that's not my expertise.
local llm or api
api of course
ok
OpenClaw will run on just about anything. However, based on the description of what you want it to do, I don’t think you even need OC. I know you could schedule those sorts of recurring searches on Perplexity, and I imagine on OpenAI and Claude as well. Pay $200 a year and be done with it.
I kind of agree with you, but unfortunately I have to follow orders 😅. In this case, I was told to use OpenClaw specifically. That being said, I’m trying to do my best not to spend too much company money (since I might be blamed later), while still building something that works well enough (I’ll later have to maintain it)
Understood. In that case I would go with an API solution. That keeps the upfront costs low since you won’t need powerful hardware. It does result in a recurring monthly spend, but they can adjust or turn it off if it’s too expensive.
Thank you, i was really lost there
love the respect here
wrong tool for the job
under orders
ok, well heres how best to work with that
🙂
Hi Watson, thank you for your reply. I have developed an application to control the android phone camera, sensors, calls, torch, etc but it is not stable.
Wondering if installing openclaw on android phone would be not be a good idea unless the phone is powerful enough for stability.
for research probably want a high-end cloud model (different from what hardware to run it on) from claude (opus/sonnet), has to do web search, go through all the data, put together analysis, present it in whatever way you want it (email, telegram? ppt?). maybe try it first using the primary path (ie. claude website/cowork or chatgpt website) with the queries you want to try and see how well they work before you hand it off to openclaw to run on its own and hope it works. Using frontier models aren't cheap, just a warning, and it takes a bit to get used to how openclaw works with memory/context, tons of info out there, just have to play with it.
A pc powerful enough to run a web browser will work. I dont know about if windows will run openclaw but I can tell you a low end pc running linux works just fine.. all the heavy work is done by the model provider (anthropic, or whoever)
Thank You
Hello fellow DGX Spark owners. I have my two Asus Ascent GX10s clustered, and I was running Llama-3.1-Nemotron-70B-Instruct-HF for most of the day. I hated it. 😂 Super annoying personality, but the real problem was I had to drop conext window to 32k to squeeze past the CUDA graph step when bringing the model online. Anyway, I just nuked that, and I am going to give Qwen3.5-122B-A10B-FP8 a try. Any other recommendations on models you have liked running on a 2-node cluster?
You’ll like 3.5-122b
Wish I had pulled the trigger on a gx10 while they were still cheap but
I hear you. I bought the second one for too much through Best Buy, but I had gift cards that I got for a 20% discount, so it basically cancelled out the extra markup. I figured if I waited any longer, it would only get worse before it got better.
I'm running a very limited PC, I have two rtx 2070 supers with an NVLINK bridge installed on pop os linux. Right now qwen3.5 9b seems to be the only one that fits - is there somthing I'm missing here? Every time I try to run 27b it grinds to a halt
(openclaw is running on an unprivledged lxc container on my proxmox host)
Question, if i'm on like a budget is this a good build for openclaw?
GPU: ASRock Radeon AI Pro R9700 Creator 32GB — $1,299.99 — https://www.microcenter.com/product/702444/asrock-amd-radeon-ai-pro-r9700-creator-single-fan-32gb-gddr6-pcie-50-graphics-card
CPU: AMD Ryzen 7 7700 — $253.99 — https://www.bestbuy.com/product/amd-ryzen-7-7700-8-core-16-thread-3-8-ghz-5-3-ghz-max-boost-socket-am5-unlocked-desktop-processor-silver/JXKQHH52X5
Motherboard: MSI MAG B650 Tomahawk WiFi — $219.99 — https://www.microcenter.com/product/659662/msi-b650-mag-tomahawk-wifi-amd-am5-atx-motherboard
RAM: Crucial Pro 128GB (2x64GB) DDR5-5600 CL46 — CP2K64G56C46U5 — $1,242.99 — https://www.bestbuy.com/product/crucial-pro-128gb-2x64gb-ddr5-5600mhz-c46-udimm-desktop-memory-black/JX8PSKCS2V/sku/6637048
SSD: WD Black SN770 2TB — $264.99 — https://www.microcenter.com/product/682892/wd-black-sn770-2tb-112l-tlc-nand-flash-pcie-gen-4-x4-nvme-m2-internal-ssd
PSU: Corsair RM850e ATX 3.1 850W — $124.99 — https://www.microcenter.com/product/689529/corsair-rme-series-rm850e-850-watt-cybenetics-gold-atx-fully-modular-power-supply-atx-31-compatible
Case: Corsair 3000D Airflow — $94.99 — https://www.corsair.com/us/en/p/pc-cases/cc-9011251-ww/3000d-tempered-glass-mid-tower-black-cc-9011251-ww
Air cooler: Thermalright Peerless Assassin 120 SE — $39.99 — https://www.microcenter.com/product/704460/thermalright-peerless-assassin-120-se-cpu-air-cooler
Total: $3,541.92
That seems like a good all around PC that also supports open claw. If you want a pure-open claw system, with local LLM support, you can save some more by looking at unified memory systems (mac mini or strix halo are good). But the unified memory systems are not as good for tasks like gaming if you also want to use the system for that.
Intel just released their B70 GPU also - $949 for 32GB. But that'll come with even more software/model compatibility issues that AMD will with ROCm - cheaper if you're willing to fight through it and/or wait for others to build intel specfic versions of the models you want to run.
Oh - there's also 2 newer versions of the peerless assassin - about the same price, or $5 more, slightly improved performance.
I looked up RTX 2070, and it looks like they have 8 GB of VRAM, so 16 GB with your pair. It's not exactly 100%, but as a rule of thumb I equate model size to RAM needed. In your case, that 27b model would need 27 GB of VRAM. 27 > 16 so it won't fit. The actual RAM needed is not exactly 27 GB, but that model is almost certainly too big for you 2x2070s.
So stick to 9b. Is 9b smart enough to analyze logs and run administrative actions on a small virtual network?
Thanks for your attention in this matter, I should have been more considerate and listed the VRAM sizes. Apologies
thanks :D I prefer having good compatibility so not very intrested in the b70 thanks for your feedback though
That's an interesting idea. I have similar aspirations but I have access to some larger models. The thing I am slowly learning is that I am not always a great prompt writer, and the smaller models need very tight, well-structured prompts to produce the best results. I often get the best results when I use one of the online models (e.g., Gemini or Claude) to help me write a better prompt for the little local model(s). Now that I think of it, I should probably have the big boys help me write better agent/soul/memory files for the little guys.
if i buy 128gb of ddr5 will there be any finetuning or anything with the bios ill have to configure? if so what?
Check your mainboard details, there mostly limits how big the ram modules ca be.
New project: Dell R740xd with three Nvidia Quadro P5000s. It's a solution in search of a problem. I have some ideas, but open to others.
Shouldn’t be as long as your motherboard supports it.
Openclaw? Overkill. But if you are trying to run a local LLM then you want be too impressed with 32GB vram.
could you explain what u mean by " local LLM then you want be too impressed with 32GB vram."
32GB VRAM is not a lot, and you won't be able to run models of much size/quality. Your results with OpenClaw will generally be poor.
i mean i'm only looking for 32b quantized
qwen 32b runs pretty well for openclaw with my testing atleast
When you say it runs well, do you mean speed or accuracy and usefulness?
Larger models give better results. With that you need more vram to use the llm effectively. 32GB wont allow you to so much. Of course it depends what you want to do. i guess.
Try quantization. Look up unsloth. No one runs unquantized models locally.
Qwen3.5-27b runs fine in 32g and it’s useful enough
And qwen3.5-35b-a3b as well, quicker than 27b but not as "smart" but depends on your tasks (and prompts!)
The memory prices are just insane! And system memory isn't going to help with local LLMs - I'd get half the memory and put the money into a bigger GPU if possible, or an Nvidia GPU if they get you better performance / compatibility - you can always add memory later.
Thank you. Does quantization affect the quality of the answers generated?
How about using an online or large model to help you structure your prompt for your smaller models?
A small bit, 10% or so at the 4 bit level
Thank you! I plan on using my local free model just to execute routine tasks like log scanning for emergencies.
The more difficult tasks get online models
I only have 16g VRAM. From the beginning my models were going to be limited
May I please have your opinion on abliterated models?
I'm probably not going to run them - but I've also heard some people say abliterated models are slightly smarter than baseline
I do that too. Turns out I’m not great at writing prompts.
It’s the opposite in my understanding
It’s sort of like lobotomizing them in a very particular way. The bits that light up when they refuse to do things because of their safety guardrails are also the bits that light up when they refuse to follow prompt injection attempts
You’re trading off safety basically
re: 16g vram, I had good results with qwen3.5:9b, and qwen3:8b and :14b on a 12 gb 4070, they'd probably work for you. not fast enough for primary agent / chat sessions IMHO but fine for tasks. the key for me is create an md file for some task, have the cron job instruction say read and follow the instructions in the md file, see how it does, give the results and the md to claude code, have CC tweak it, lather, rinse, repeat until it does what I want. I mentioned here or in another chat that I also have CC generated proxy between OC and ollama/llamacpp so I can watch the traffic, see what the model is doing, see where it gets confused or stuck, feed that back into CC, adjust the prompt.
memory is currently expected to grow in price so i'm buying more now because ill need it later fgor other projects
Im looking to get a mini PC. Would like to get a good spec that i can upgrade up to 128gb ram but starting at 32gb ram 1tb nvme. Gemini is recommending me the Minisforum AI X1 Pro (obviously one of the most expensive options). I would like to know if anybody has experience with the X1 or if anyone recommends something else. I do not want apple as i want to run linux. Appreciate the time in advance!
Gmteck evo-x2, get the 128Gb one as this is the SoC 'cheapest out therr
Are you using this one aswell? If so, how are your experiences with it?
I heard from my wife that memory just dropped in price yesterday. Is what I heard just cope?
just the news, nothing really happening
I am planning to get a Mac Mini with M4 cpu and 32GB ram. What is the biggest model that I can use on it with OpenClaw ? Do anyone have experience with a 9b Qwen model on such hardware ?
it might, had to do with some weird deal with openai and another company so idek
I think it has to do with Googles new findings that can reduce to 6x the amount they need to run LLMs without compromising results. https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/
wait what
i thought it was some openai deal..? because they failed to fufill its commitment of 40% of the world's ram
Could very possibly be, I just know of the research google released
also the quantization wouldn't have much to do with ram... right?
The idea is that TurboQuant reduces memory requirements and improves response performance and latency while maintaining accuracy. In practice, it would allow AI models to access more contextual data while using less space and avoiding hallucinations. Source
Together, they could help alleviate the memory bottleneck. Although it wouldn't do much for training data centers, which also require monstrous amounts of memory, it could thin out the RAM needs of inferencing systems. It probably wouldn't do much to solve the current memory crisis, as deployment would take time, and memory orders are already locked in for many months. But perhaps it could help bring the RAM shortage to a close before 2030. Same Source
@lyric orchid i was gonna ask you if you had tried qwen3.5-27b opus destilled v2 via turboquant as it allegedly fits 16gb
Turbo Quant not just for KV, can use it on weights.
︀︀
︀︀I bought an RTX 5060 Ti 16GB around Christmas and had one goal: get a strong model running locally on my card without paying api fees. I have been testing local ai with open claw.
︀︀
︀︀I did not come into this with a quantization background. I only learned about llama, lmstudio and ollama two months ago.
︀︀
︀︀I just wanted something better than the usual Q3-class compromise (see my first post for benchmark). Many times, I like to buy 24gb card but looking at the price, I quickly turned away.
︀︀
︀︀When the TurboQuant paper came out, and when some shows memory can be saved in KV, I started wondering whether the same style of idea could help on weights, not just KV/ cache.
︀︀P/S. I was nearly got the KV done with cuda support but someone beat me on it.
︀︀After many long nights (until 2am) after work, that turned into a llama.cpp fork with a 3.5-bit weight format I’m callin…
I have not tried that one
https://github.com/explaindio/ClawEval I think this is the best source for determining what sort of local models can do.
https://www.amazon.com/NIMO-AMD-Ryzen-Max-395/dp/B0GQ2L4CQL seems to be the cheapest ($2500) I've seen a 128GB Strix Halo (96GB useable for GPU). Others seem to be all $3k now. Never heard of this brand though.
does this work well to run models ?
Strix Halo are generally the "cheap" option to run local models (relative to DGX Spark's or Mac Mini's ($4-$5k+)). AMD Strix Halo doesn't work as fast as the similarly sized competitors, but they're a lot cheaper. So "working well" is a cost vs speed concern here (think 20 tokens per second instead of 30).
yes ! thanks, what do you think about the mac studio M3 ultra chip 96gb ?
can a X1 Pro-370 Mini PC AMD Ryzen AI 9 HX370 handle running open claw okay without issues?
ok but not that much; it depends on what you want to do with it
I really want to explore what is possible without to many limitions. but also would like it compact like a mac mini if it was possible, but mac minis in my area are sold out and even online can't get the specs I want until july it says lol.
my budget is like 3k tops
you can have decent machines with 3k
it always depends on your usage
the thing is that when you start to explore you don't want to be limitated..
There's a signifigant difference between running open claw, but using cloud-hosted models vs running with local models. The former takes almost no hardware, but will come with a monthly bill for API costs. The later take much more local hardware, but then no (or at least less) per month for API costs
For some tasks, only the best frontier models are good (for some tasks even those aren't good enough yet). So its hard to say that even with $10k+ hardware that you can do it all.
good to know yeah im not sure what ill be able to do or where ill have bottle necks
Have you looked at the DGX Spark clones, or do you need this for general purpose computing as well?
If you’re using cloud APIs anyway, why bother with a Jetson?
speaking of electricity costs, I'm wondering if there are any easy ways for me to pull power information from the GPU, I'd like to see if I can build out something that would match up the GPU power consumption with the jobs that I run in openclaw, and try to come up with a "cost" for each job. I really would like to see if it's worth having this local setup to do what I've been doing, vs. just find a 10-20 month plan or pay for tokens, i.e. compare costs. I know programs like HWInfo show me the power details... maybe time to have a conversation with claude code and see what we can build! Maybe I can set up a few solar panels to power the "inference" machine and let the sun pay for the GPU time!
You could pretty easily setup a beszel server (monitor hub) and install the beszel agent on the gpu server. that would give you the system info real time/history. then you just match request timing to gpu power draw. its not going to be 100% "from the wall" pull but should give you a decent idea.
I’m running a 64GB Mac Studio M2 Max. Local llm results are slow and nowhere near as reliable as Anthropic or OpenAI models. But depends what your goal is.
For some basic reasoning tasks it’s not bad.
I'll have to look into the beszel server. I hacked some code into the proxy I've been using between openclaw and llama.cpp to monitor, it uses nvidia-smi --query-gpu=power.draw. I may set something up to push this data so influxdb, then I can do some charts in Grafana!
2026-04-02T17:42:17.350Z [done] job=downloader-summary qwen35-35b-a3b reason=stop prompt=49 (0.1% of 40960 ctx) gen=319 ratio=651.0% pp=492.8tok/s(99ms) tg=96.7tok/s(3.30s) total=3.40s elapsed=3.53s gpu=330.3W(+315.3W) peak=343.4W 0.3028Wh(+0.289Wh) $0.000057(+$0.000055) (13samples) session: prompt=30633 gen=5772 elapsed=83.59s energy=5.0181Wh cost=$0.000951
local is overrated and dangerous$
you don't want a stupid agent
if you're running anthropic/google/openAI models, you'll probably be better protected against malicious stuff
like your agent reading something stupid on this discord
one of my original reasons for exploring local was to prevent sending sensitive information, credentials, api tokens, etc. to cloud providers. one of my first skills I built was to scan the session logs for leaked credentials... early versions of openclaw was constantly leaking creds (trying to get somethign work, it would pull in config files or .env files). if it does that locally, no big deal. but yeah, in general local for me is good for very specific tasks that I don't necessarily have to wait for (cron jobs), with limited access to data. I don't give agents any "go out on the web and find this information", most of my stuff is using skills that reference specific APIs for the data, and the agent can then do some consolidation, answer questions about it, etc. but for actual "talk to the agent" things, in openclaw I'm using a cloud model (minimax) and staying within it's limits.
This weekend I’m excited to get my shiny new Mac Studio M3 Ultra 512gb running as my OpenClaw secondary LLM for bulk text processing and basic tool use. If it goes really well, I might get it to do some code generation stuff too. Qwen 3.5 is my starter model, but I’ll be exploring others too. (Using paid cloud LLM api as primary.)
I managed to put an order in a few days before Apple discontinued selling them.
Check out vllm-mlx, and Soon(tm) I’ll be publishing a fork I made of it basically targeted at using this hardware to best run 397b
Tell me more about vllm-mlx... advantages over others? I'm doing my experimention with LMStudio, but happy to switch to something else.
Paged prefix cache, metal native
Thanks! I'll check it out immediately 🙂
Question:
I'm looking at acquiring a mac to run open claw on, it's either between the $600 Mac Mini or the $2000 Mac Studio. I'd like to run a local model if possible.
Is it worth the price though? Am I going to blow through over $1,400 in Claude Sonnet 4.6 tokens in a year's time?
@craggy ferry so for the 512gb max your recommending 397B for basically everything?
Conflicted with some of the new releases like genma4 even though they’re much smaller
Gemma4 does look better, haven’t played with it much. Wish they’d released a 70b
My experience. I bought a Mac Mini; it cost me 2800 euros here in EU. 64G with with 1Tb. Big waste of money in my opinion. Can run locally, but slow replies. It just plain doesn't do the job like Sonnet 4.6. Do yourself a favour and use the money on the tokens and try to get some revenue coming in to feed the AI, in my opinion. Build your way to free tokens
claude
Check out exo too. Can cluster Mac’s with tb5 to expand the memory pool and with two of those you could actually run Kimi 2.5 locally at a reasonable rate. Wish I could get a 512 studio myself. 🙁
I only have the one beast machine. I also have an old M1 Ultra Studio with 64gb that I use for smaller models, and they are connected with a tb5 cable, but I'm pretty sure the M1 doesn't have tb5 capability.
it doesn't, only the M4 Max and M3 Ultra Studios have it. And you need TB5 for the RDMA support apple released in 26.2. BUT if you got an second machine running tb5 you can create a mismatched cluster even.
I was looking at the 256GB ultras and it's currently a 4-5 month lead time 😂
(there are minis and laptops that have TB5 too fwiw)
Yeah, I might consider a set of 256gb ultra's, but I'm kinda hoping that they release some new ones during WWDC... and that they offer a new 1tb ram option. One can dream. I know those would be crazy expensive, but I'd consider it seriously.
Sadly it looks like ram constraints will continue through at least 27
i only bought one m3 512 because i was hedging on them having an m5 512 or 1t this year
Me too. I’m slightly regretting just buying 1. And although I know the ram constraints are going to be around for a while, I’m hoping Apple got their supply contracts settled before it was a big issue.
lol, what is this? 😂
So, fwiw, I'm using Rapid-MLX on my 512gb mac studio with a minimax-m2.5-8bit (243gb downlaod) model for text, thinking and tool use, and quen3.5-vl-112b-a10b-8bit (131gb download) for vision... loaded at the same time 🤯 I am loving this Mac Studio setup!!!
My OpenClaw setup is hot🔥 with this equipment.
That is exciting. I am waiting for a new Mac Studio refresh to appear, and then I will probably pull the trigger. Making due with my DGX Spark cluster in the mean time. 😂
I am also waiting for the refresh. I’m betting a LOT of people are!
Ordering a souped-up MacMini M4 now vs waiting: Do we think that if we order a max spec Minin M4 Pro now (5 month wait time), that Apple will just refund if/when an M5 mini is announced, or will they offer to switch to the new chipset (assuming some small price difference)? Has anyone done had this kind of experience with Apple before?
just started playing with oc.
with the sizes on those models y'all are talking about, what can those things do and are the mac studios fast enough?
If a new model is released before your order is fulfilled, I suspect they would either cancel your order or swap to the new model assuming specs and price are comparable.
The local models are definitely more challenging. They are more prone to mistakes in general, but seem especially adept at breaking their own OpenClaw config. I have taken steps to assure mine can no longer touch the config files. I get better results with a large cloud model (e.g., Kimi 2.5), hosted by somebody like Ollama or Synthetic, as the orchestrator. It then manages the local agents/models and tries to double check their work. Nothing is as good as the frontier models like Opus 4.6, but $$$. As for the Mac Studios, they appear to be excellent performers, especially with a ton of RAM which allows running larger models.
small price difference? are you new to apple?
Are you? Apple most often keeps the same price for a refresh
Hey all.
To get the full potential of OpenClaw, like "computer use", browse, UI, etc, should I have a MacMini or is a "Linux PC box" (Beelink, etc) as easy ?
If MacMini makes it much more easy to setup and use, that won't be a problem, we're talking $200 difference, and I have that budget
I like some of the specific macOS integrations, but Linux could work fine. Are you more comfortable with one or the other?
Im a macos user since way back, Linux only in VPS.
What are the macos specific integrations that ate not bc of iCloud?
From memory I’m just using a couple: iMessages and 1Password. There are some others that I’m forgetting.
Thanks!
I did end up having my proxy throw data into influxdb, including GPU power information, and had claude code build out a dashboard for grafana to see the data. The proxy is becoming a bit of a monster, but I find it useful for debugging model response to instructions, fine tuning, etc. Also learning more about how kv_cache works and others discussing "warm" and "cold" cache, so added some stats related to that to see how well my cache is being used across requests.
https://github.com/khaney64/llm-stuff?tab=readme-ov-file#overview
https://github.com/khaney64/llm-stuff/blob/main/README.md#gpu--energy
https://github.com/khaney64/llm-stuff/blob/main/README.md#kv-cache--recent-requests
Hey guys I am looking to build a openclaw server is a 5060 ti with 16gb of ram a good starter card?
Getting a Mac mini is hard right now. June should be when the m5 models will be released. The GPU cores had NPU with them on a m5.
Yeah, I got mine at -$100 of what it's now on Amazon, in ~Jan...
This one is to assemble a solution to a friend, who's not Mac user at all, but the Screen sharing utility makes the whole difference, to control the computer
You can run small models on a Mac mini or on your 5060, but they will likely be inadequate for a meaningful OpenClaw deployment. I use some small models, but as part of a mix of models that include larger cloud models.
1Password is usable on Linux though, it’s a command line utility
All OSes have a screen sharing service, it’s not unique to macs
Oh interesting!
Put another way: is there any benefit for a non mac user, to have a mac mini?
I mean, they are amazingly price efficient at the base model
But that’s only if you will actually use the specs
Probably gemma4 capable, without gpu?
Its a ongoing process.
I learn, he tries. Close feedback loop
All Macs have a GPU, their “integrated GPU” is far better than anything called that on the pc side
You could probably shoehorn gemma4 into one, the small ones obviously (e4b) but the large ones even at 4-bit quant would want a higher memory config than the base model
lol a 32gb mini has a 4 month lead time
Yeah everything that isn’t the 16gb base model is super delayed. Makes sense I guess
Yes, you become a Mac user. 😇
How did you set it up? Any link/tutorial that you followed?
just get ollama
write on google ollama + gemma mac mini
its 10 minutes job including downloading 6gb of model from internet
thanks @fathom steeple
happy ro help
hey, is anyone looking into upcycled phones with custom forked Lineage/CalyxOS?
Hi guy I'm lookig for reviews I'm just started with little models as Qwen3.5-4B for my Agent. But I'm lookig for recomendations is this a good start if I want to use my agent to aske code solutiones? I have limited resources i have rtx 5070ti, 32gb ram ddr4 and ryzen 7 5800x. My dubt is is this software necessary to run which models or what is my limit model I can run?
Google just came out with a new open source model not hard to run, with your hardware I think you should be fine 👀
https://ollama.com/library/gemma4
Just try out different models and see if they run, if they don't you can always move to a lighter model
I found gemma4 does need to be fed a detailed prompt, this video helped me a lot: https://www.youtube.com/watch?v=pwWBcsxEoLk
With a rx7900xtx, I should run like a 31b model with less context or smaller model with bigger context. Always seems like the onlything blocking me from local is the context thing
You should be able to run models a little larger than 4B, but I like qwen3.5 in general.
Do you think is good idea run it in vllm? Or I should use ollama ?
I use both. Ollama is pretty simple and a good first try. vLLM probably offers better performance, but can be a little more involved to set up.
I'd start with ollama as it's a little easier to work with, and see what kind of performance you get, and assume you may get better performance with vllm. For my experience, substitute llama.cpp for vlllm. My performance was almost double with llama.cpp over ollama, but more of a learning curve.
I pulled the trigger on a maxed out MacBook Pro. I hope I won’t regret it! Anybody running all local with this or similar machine?
anyone suggest some hardcore tests for my M4 mac mini and M2 mac mini, both 24GB? M2 is currently running two bots without an issue
Do you guys run a server with specific hardware in a datacenter to run local LLMs? I'm trying to find a solution on how to provide an "always-on" AI assistant. I'm currently running a cheap second-hand dedicated server as mail- and fileserver with not enough power for AI. Because it is in a data center, it has a good up- and downlink. Buying a new computer for AI apps/assitant at home (RAM prices 🥲 ) and move the datacenter-server to this one and make it publicly accessible on the internet through DynDNS might be to unstable and slow for file transfers. My goal is to transform the fileserver into an EDMS with document Q&A, summarization, auto categorization. Just upload a document and let the LLM handle in what cabinet the document needs to be stored.
what bit of gemma 4 should I run with 4070 and 48gb ram
you can run the q8 or full precision easily i guess . depends how many parameter model of gemma4
Why am I finding M3 512gb studios on eBay for 2k? Are those scams you think?
I hope you had done the research on this. Assuming you have tkaen M5 max chip with 128 GB Ram - you will be able to run gpt-oss-120b q4 (or 70B and full precision) - but they will never be near the 5.4 or similar frontier models. Hence, for now those who are yet to make the trigger, do a economical comparison for next 2 years (where more hardware would be available cheaper) whether it makes sense to splurge of say $5K or use $100 per month for 1 year (for frontier models) and see whats available next year.
of course, if you need the M5 Mac for other activities, yeah this analysis in invalid
is it worth to get 2x rtx 3090 or ryzen 395 128gb mini pc?
I actually find a lot of them for 1k
Some of the photos are definitely AI
Hi all. I’m thinking of getting a Mac Mini M4 chip and wondering what others would recommend in terms of memory and storage. I want to run some local models on it, too.
How? Are people just offloading waiting for M5?
its almost certainly a scam
$100/month will last for 4 years, till that time hardware will be heavily depreciated, not worth to invest
Yep exactly, and in coming 1 ot 2 years. There would ve more hardware choice due to competition and if current ram trends are any indications, it would be relatively cheaper
any one here working on humanoid and robotics ?
I got a question
You got me excited, but all the cheap prices are from folks who just created their account, and most are in other countries (I’m US). I’m leaning toward scams.
anyone got some advice on dipping a toe into Abliterated/uncensored in a local context? Just tried supergemma4 26b uncensored fast, and, 171 t/s. I need bigger and more reasoning, Tring to push architecture and brain to local as much as I can. But maybe just have to escallate to cloud for that. Good worker be, but for local rag + internet rag + images, I have more hardware headroom to burn
gemma4 31b is significantly better
agreed, done with everything below 30b
this one still a cannidate as a worker bee tho
ya, but I talk in screenshots, both of web, task manager (sorry win guy here) and, life
I am actually after slower (and more reasoning)
i think both are useful
but yes you definitely need at least one high end model like that
Need to find my main first, then the worker bee(s)
gemma-4-31b-it-uncensored-heretic
57 t/s
shit, I can go bigger
I have noticed a correlation with uncensored and q4, which makes sense but, I dont actually need that
i am really sad there's no gemma4-70b, yeah
70b is such a sweetspot
but 1yr old lamma, ya no
havent touched the quens yet, nemotron 120b NVFP4 thats prob where I end up
model size, context bandwidth, speed, holy crow
so I am literally here on local llm cause perplexity dropped Kimi K2.5
Gemma 4 31b, straight from nvidia, ya, thats my baseline for sure
that's my fav thing about local llm
you have the thing you have until you decide to move
still want to test gemma 4 31b uncensored tho
gemma-4-31b-it-uncensored-heretic is rocking my vision + uncensored test
this is the one by llmfan46
models will be compressed more, if our system cant run 70b model, in next 6months chances are more it will run
and with turboquant, its getting more interesting
I think it will be more like multi-LLM operation. In the sense, there will be generic model which can be cloud, but a smaller specialized model(s) for each area. For example, lets say you are coding only in Java, then only that specialized models would need to be run locally. Same for the enterprises as for example a frieght logistics will only have spefici smallmodel and a cloud generic model. The idea of context separation, identity isolation, etc will need to be handled and thats where AI industry is going
I saw some with 100+ reviews
Were they Buy Now prices or auctions?
One I’m looking at is buy now
I believe you, but I'm not finding them. When I filter by "Buy Now" and US only, the lowest I find is $4,400 and they have 0 reviews. The remaining results reach into the $20k range (which is insane).
Let me know if you snag one; I'll live vicariously through you. 🙂
lol I’m too scared
Send me the link then? I may be too scared too, but I am curious.
Here’s one https://ebay.us/m/B7T38o
That seller has 30+ items for sale at great prices that would be beneficial for an AI user......but zero sales......
Yeah, good luck with that.
Hi, I am new to this discord. I am trying to get openclaw working on my 16 GB Thinkpad T14s running AMD Ryzen 5650, running Ubuntu 24 LTS. I want to use LMStudio running IBM Granite 3.2_8B as my main AI with Anthropic as heavy lifter. But even though LM Studio works fine in chat mode on its own, any prompt, even a simple "Hello" becomes huge (~20,000 tokens) when coming from openclaw. Naturally this bogs down the system, and I have never received a response from "Hello" when in OpenClaw TUI. I am a complete newbie to OpenClaw and AI in general so I wonder if anyone can help me. I have spent hours with CoPilot working on this and it has not increased my respect for AI very much - what a waste of time! I think maybe a human expert might be a lot more helpful.
I’ve seen that one, but I don’t see a buy now option. It only shows me options to watch or contact seller. 🤷🏻♂️
Little bit more but still…..
hi, can you try out: agents.defaults.localModelMode: "lean"
should be documented in docs/gateway/local-models.md
Hey guys. I am wondering what setup would be good for creating a local Ai server? Is this reasonable for 6k? Any advice helps
these are scams. Why would anyone do this? There is no logical sense other than it's a scam
Yeah that’s what I’m thinking
it 100% is a scam, zero doubts
does anyone try radxa 5t
reminds me of the time I got a nonverbial code completion AI model... all it would do when I tried to talk to it was scream "NO"
So I have been trying to get openclaw to work locally on my old M1 MacBook Pro 16gb ram. The idea was to have an ai personal assistant to perform relatively simple tasks. I started setting up openclaws workflows and tests with my OpenAI plus subscription which uses codex 5.4 and it has been working great. Once the tasks and workflows were tested, I tried changing my main LM to a local using ollama and Qwen3 4b or llama 3.2 3b to handle cron jobs and general tasks.
Every time I have tried this, clawbot dies and stops responding.
I have checked ram consumption, total approaches 15gb but doesn’t overflow or reaches HD swapping
I have checked openclaw health, and it’s running fine
I have checked ollama directly in the app or terminal, and it runs and replies fine
The tasks: as simple as read my email or check information on a website
What am I missing? Is my MacBook Pro not powerful enough to run openclaw with a local lm locally?
Are you seeing anything useful in openclaw or ollama logs at this time? Errors? Looping (ie maybe model is stuck or timing out). Use a tool to monitor GPU usage (not sure what that is on Mac) to see if it's busy. What context size are you using for those models? I wouldn't go straight to primary with a small model, create a cron job with some simple instructions and point to a small model, and get that working first so you know the model is working. I've found it useful to put a proxy between openclaw and ollama so you can see the traffic/interaction/errors.
Mac has its “activity monitor” with spikes from 4gb memory usage to 15gb when the prompt is sent, and ollama uses that ram. Besides that, logs don’t show any error besides timeout after a while
I initially had openclaw running on a 8gb M1 Mac mini, but moved it to ubuntu. I was always hitting a GPU on another machine though. I think there is a default 60 second timeout in openclaw, you may have to bump that up. out of curiousity, when running a prompt of some sort, and you see the ram spike, does that extend (run) longer than when openclaw times out? i.e. model may still be processing? you should give qwen3.5-9b a try - that works well for me on an RTX 4070 (12 gb). as I said, you may want to put a proxy between openclaw and ollama so you can see the conversation. you wouldn't believe how much stuff openclaw adds to the prompt. one thing that can help with that - go into the dashboard, select agents, pick your agent, and go into skills, and disable ALL of the skills you don't want or need - they used to be enabled by default - any enabled skill ends up having data sent in the prompt to describe it. I built out this proxy with claude code, it's become a bit of a monster, but works well for me for debugging. I originally wrote it to work with ollama, but moved on to llama.cpp, but it should still work with ollama. https://github.com/khaney64/llm-stuff/blob/main/proxy.js
if you try LM studio on developer mode, the developer tab has a log on the bottom part
regarding hardware, is it anyone using the nvidia dgx spark or its counterparts for other OEM for running openclaw in local mode? or is it an overkill?
2 sparks arrived today . easy setup with qwen 3.5 397 using codex (it did everything using eugr's vllm docker setup.
Only really needed if you NEED to run local due to the data one is using. otherwise 9k for two machines that will go obselete makes no sense in comparison to even the highest tier OpenAI subscription
and its a breautiful machine
All fair points. My current Intel MacBook Pro is from 2017 and cost nearly $3k back then. It was time for an upgrade due to no more updates available. I could have gotten away with the 64gb RAM version for work but, the $800 upgrade to double my RAM just made sense financially. I’m hoping to get near or on par Haiku level operations that will give me complete privacy. For me, if it runs OSS 120b at 60 tok/sec, that’s a big win! I’m tired of being throttled and rate limited. I use Chat and Claud a TON. Not to say I would expect that level of LLM locally - It will get there though!! 14 day return policy’s are always useful. Anyway, appreciate any user feedback of this specific equipment.
And yes, M5 max with 40-core gpu. Max bandwidth in this puppy. 128gb.
well, you have bought latest and greatest - njoy! as I stated earlier, if you have just bought from local LLM, then it is definitely not worth IMHO. You are better off using AI+ max or GX10 hardware but if you have other editing works that you do - then you know the best.
for modesl comparison of how gpt-oss compares - see here https://artificialanalysis.ai/
what are you running on this? I've been running gemma 26 b
What token speed do you get?
I'm not at 128GB, but I'd have to check. I've been trying different paramters and haven't yet looked at logs for number of tokens
Honestly pretty new to LMstudio, but did the math and my api usage was such that it's cheaper to run a local model
so gemma-4-26b-a4b q8_0 with MAcbook M5 Pro Max 48GB RAM
180,000 context window
GPU offload 30
CPU thread pool size 4
prompt eval time 458 tokens per second
eval time 68.45 tokens per second
I was running it with the heavier models last night on some cron jobs and it crashed 4 times, so I'm still figuring out safe parameters
I don't know if that's a good or bad token speed
you are running at q8 - with your hardware I think you can even do FP16 (half precision). So here it is not about speed - it is about quality.
I'm using it to do web scraping on publicly available data so I need a beefier model
tried the gemma-4-e4b and wasn't really happy with the results
yeah but Q8?? with with the same model - go for FP16 and you should stil get decent token and improved quality. I think it's this one - https://huggingface.co/mlx-community/gemma-4-26b-a4b-it-bf16
brb
I think that model is too big. LMStudio is saying so anyway
Hugging face seems to agree
Hmmm. With your 48gb, it should run
Try out and see.
Or use q4
I'm going to try q-4 tonight. Going to bed after this because I can't keep staying up this late but I'll report results in the morning. Thanks for the help
what are you use it for
i am not sure about mac coz i dont know how their unified memory works but for a 26b model , you can easily run it at q8 with enough buffer for context window. you dont need to offload , you should be able to fully load the 26b model but dont try to load the fp16 model of it ( 1billion paramter fpt16 = 2gb vram)
I tried to run Ollama on K8 Plus 32 GB with terrible results and returned it. In the meantime my OpenRouter bill is shocking. What models are y'all running locally with decent tool calling?
it depends which model you tried to run exactly
you know OpenRouter has :free models, right? some of them (most?) super popular with OpenClaw (according to OpenRouter charts)
With limited hardware you need to try Q8 and Q4 models.
I tried Gemma4:26b lately - it did quite well standalone with a small number of tools - but in the context of OpenClaw it just needed too much VRAM.
I had tried a few but I think I started with Qwen 2.5 coder Q4 KM
I haven’t received it yet. Due in next week. I’m still throwing ideas around. Likley try Qwen3.5-122B-A10B and Qwen3.6-35B-A3B. I’m thinking I’ll need 2-3 different models with dedicated use case. How’s the gemma running?
why use qwen 2.5 when there is qwen 3.5?
What models do you suggest for mac mini m4 24gb
Hi there, anyone using a remote ollama with an rtx5080? I use qwen3.5:9b now 130contextlength eslewhise its starting to cpu offload. Cant seem to get a bigger model to run in quantization. When i do its offloading like 30/80. and then i get timeouts…
Super Noob Question: "Openclaw and local LLM. What's the absolute minimum Hardware requirement?"
Hi everyone,
Openclaw is quite cool and I want to "play a bit" with it. I've got it running, but I hit my session limits quite fast. So I am wondering if there is another way.
I use Claude Code (Pro) and Ollama (Pro).
I use Claude for a bit PHP / Website tinkering and Ollama for Openclaw.
I got "naked" Ollama running and even got some LLM downloaded.
Ok, low Token count, but it works.
I understand the hardware requirements for Openclaw, but the LLM is still a bit of a miracle for me.
So my questions are the following:
ONLINE
- What model should I use with minimum cost?
- What model would you recommend?
OFFLINE
I can chat with Ollama, but Openclaw is not responding ...
What is the "absolute minimum Hardware requirement" to run Openclaw / Ollama offline?
I don't need absolute performance, it should just work.
Thank you for your help.
Bernd
PS: If you have usage credits left or even run your own LLM server i could use, please speak to me. 🙂
if you have claude pro then why not use sonnet with openclaw?
i would say use glm 5.1 since you have ollama pro too , i use glm 5.1 from the glm coding plan
for the offline part , i will say you will need 16gb vram to load a decent model of atleast 28 - 30b paramters at q4 on your pc without offloading , if you want to offload then you can go beyong 30b but it will be more time consuming for each query , for local you can try qwen 3.5 / glm 4.7 flash / gemma 4 etc
I have a 3090 24GB vram and I run Gemma4:27b. When I use openwebui I don't see it offload and it's very responsive but if I use openclaw I do see CPU going nuts.
Probably need 24 gig for the 28-30b models?
What context size, and what are you running the model on? I've had success with qwen35 35b a4b on 3090. Need to do a little more experimenting with gemma4 26b
I installed openclaw today so I haven't tweaked it yet. But I find things very very slow.
The model runs on ollama in docker on my server
I started with ollama but moved to llamacpp for much better performance and ability to fine tune. I love llama-bench for figuring out all the parameters to use
I just switched to qwen3.5:9b and it's faster (ofc) but still not very useful. Plus I'm concerned the 9b will be too stupid in the end.
I'll checkout llamacpp
I guess you need to decrease "contextWindow": 262144,
This blows up your memory
Then it starts to offload
What's a more realistic context window?
But I set it in ai-agents config right?
it depends on the quantization , for adjusting a 28/30b in 24gb vram then you need to go below q4 and anything below q4 is just useless
Here are my llamacpp settings for the bigger gemma and qwen35 models in 24 gb
https://github.com/khaney64/llm-stuff/blob/main/model-test-report-2026-04-11.md#llama-server-launch-commands
I set it in the models part in the .json
Well that limits what openclaw will use, but the model side has a setting too - ollama might be smaller by default
Thanks. I'll dig more into this tomorrow. Right now the time between me sending message and ollama workers going to work is what's taking the most time. Maybe there's something there that I am not fully grasping. I dont' see why it would spin up workers on all my cores.
So that was just me looking at htop the wrong way. Those were all threads 😄
I want to buy a mini-pc for local llms with openclaw and my files. I put my eyes on this one:
https://www.bee-link.com/products/beelink-ser10-max-amd-pro-ryzen-ai-9-hx-470
what could you tell me about it?
Can anyone suggest a model that can run on a VPS with 96GB RAM and 18vCPU with no GPU. I've tried qwen3.6, Gemma4 and qwen3.5 but no joy.
I think the issue is that those models are better suited to GPU inference. 96 GB RAM sounds great, but RAM alone won’t save you if the CPU is the bottleneck. What model size and quant did you try?
I tried qwen3.6, gemma4, phi4, qwen2.5, llama3,2 and gemma3 but I couldn't get a response after they were running. I have since realised the VPS isn't dedicated so it's all shared. So I'm guessing no model will run efficiently.
okay
good choice
they literally promote openclaw in the desc lmao
CPU with RAM will not do - you need GPU or AI nodes
check modal , rundpod or hugginface , they provide gpu compute at a reasonable price
I was also looking at the same one, it seems good
Now I'm checking the difference with the apple chips, because it seems that the bottleneck is the memory bandwith (RAM, if the model is loaded in it). With dual channel the theoretical speed is around 89.6 GB/s
Device | Bandwidth | TTFT | Speed | Feel
------------------|------------|-----------|-----------|-----------
M1 Ultra | 800 GB/s | ~1.1s | 45-70 t/s | Great
M4 Max | 546 GB/s | ~1.2s | 40-60 t/s | Great
M1/M2 Max | 400 GB/s | ~1.5s | 35-55 t/s | Good
M4 Pro | 273 GB/s | ~2.2s | 25-40 t/s | Okay
M4 (Base) | 120 GB/s | ~3.0s | 10-18 t/s | Tight
Beelink SER10 | ~90 GB/s | ~3.0s | 20-35 t/s | Slow
this is a comparison made with gemini, could someone confirm this token generation could feel slow?
hello, can i use openclaw in my android?
In theory you can do both:
- install openclaw gateway on an android via CLI (I have not tried this)
- install the android app to connect to the gateway (this is standard, the app is in the play store)
okay, thx for your information.. i'll try it.
I probably should have started here... 😄 - Anybody have any hands on with the dell pro max w/gb10?
i need some advice. i currently have 3070ti 8gb. im thinking about upgrading to amd r9700 32gb. i do some light kilocode and recenlty been toying with openclaw. should i upgrade to r9700 or just use gemini api?
Gemini made some great comparison tables for me as I designed my new machine for work. I would look carefully outside of the nvidia ecosystem. Ask it to do a deep dive analysis for you. Just rmember that no matter what GPU you get, your data still has to move across your PCIe bus.
Server setup like:
Motherboard: ASRock Rack GENOA2D24G-2L+
CPU: 2x EPYC 9535P
You could get 1.2 TB/s (614 GB/s each CPU)
And motherboard can go as high as 12TB DDR5 RAM
considering major self hosted models are sometimes 1.5TB in size, 12TB potential capacity is a great way to be prepared to run HUGE models in the future
or perhaps someone has a use case where they want to load more than one model on their server
Hey, I'm contemplating buying a Mac Mini, could it possibly run models from Ollama like (Qwen 3.6, Kimi 2.6 or MiniMax 2.7)
Hi All, i'm receiving my mac M3 ultra tomorrow. what model do you recommend to "start/run a company" with voice AI calling inbound and task automation with visual compute
depends which parameter model you are talking about and which quantization
I'm assuming you wanna stay local:
- the qwen models are solid open source options for VL -- the bigger the better (for visual compute)
- ironically the qwen models are also solid for your core LLM if you want to stay local but there's a plethora of options (I specialize in coding so unsure if there's a better suited one for your needs)
- for voice TTS/STT you can search online but there are literally a dozen or so options and all of them are fairly solid and dont sound like a blatant robot
Curious which qwen model(s) do you use for coding, and via what tools? I've got openclaw usage with 3.5 35b-a3b working well, want to explore local options for coding. Do you use different settings like temperature for coding vs openclaw?
ironically the new qwen models are pretty solid there too! 3.5 and 3.6 -- the bigger OSS models are still better but most people can't fit those
Hi guys, im just getting into all this ai stuff and i want to run claude code or openclaw locally, i have a ryzen 7 7800X3D cpu, 32gb ram and a rx 6700xt. what model would yall recommend me? i want to get the most out of it running locally for it to be able to code as well as possible, and possibly run fully autonome tasks on my 2nd burner pc for security reasons, while still using the main pc computation power
ive tried gemma 4 27b and it just halcinated and wasnt really able to do any real coding
try the qwen models , even tho cloud hosted oss model will be better than running oss models on your own gpu coz of infrastructure and configurations
wich qwen should i tryN
?
try with qwen 3.5 models . pick any model of q4 or more quantization
make sure the parameters isnt huge or else it wont load on your gpu
yeah its good too but its quite old as well
like right now glm 5.1 is the latest
and flash version of any model is kind of nerfed
so.... ishould try the qwen 3.5?
wow, bee-link makes Clawd-colored PCs now
https://www.bee-link.com/pages/openclaw
damn thats insane
Hello. I want to create local instance of openclaw and ollama gpt-oss. What are the recommended pc specs for the start up? I appreciate the response, thank you
dont consider gpt oss
Then what is your recommendation?
You can do a local instance on basically anything
But the only local models you can run are small models because the bigger models just run too slow to do much of anything
Unless you spend A LOT. But that amount does not make sense imo compared to just using your ai subs
use some latest and top performing oss models . like within 30b parameters and no less than q4 quant
Hey guys! Can you recommend some cheap Android phones for the OpenClaw? Do I need root?
Have you explored everything you want to do it in pc? Android is just another remote node that you can even run as a docker contiane rin your pc
similar question to @dawn cosmos
@sturdy gazelle , what are you looking to do? looking for an existing OpenClaw instance to control/access android or looking for it to live (gateway) on the android?
It's possible to run OpenClaw on a Android?!
I want to follow the openclaw guide and keep two-phone setup described here https://docs.openclaw.ai/start/openclaw and at some point have openclaw installed on this device
in theory...lol -- I have not tried it!
wow... didnt know that!
ahh okay, so for communication channels (signal, whatsapp, etc) -- yea that's easy -- no you definitely DO NOT need root for that. You dont even really need antoher phone technically (I used a Google Voice account to setup my Signal connection)
The 24.4.26 version is giving problems, right? I'm new to this and installed that version but it doesnt work, even if I follow the docs.openclaw.ai page...
I was gonna tell you ask Krill! ...but it seems like he' nappin on the job again. ping me in the users helping users channel we can continue there
Run openclaw in a docker container (as gateway). Android installation will avt as a. Node to connect to a gateway. For now, I suggest do not use android app files unless you have compiled them.
Is there a MLX specific channel where we can post a question on ?
Stop scamming people. And if you have really successful, why not showcase it here for everybody to benefit. Typical scammers!
Im 12 years old and I recently found out about openclaw from my father who works at Microsoft, he told me that I can easy make $5000 every day just by using an openclaw bot to trade. So I did. I was amazed that after 5 seconds (and with nearly $20000 of my dads money I wont disclose to you) I found that I was making $5000 every single day.
Every. Single. Day.
Now I am looking to help young entrepreneurs like myself get into openclaw agents. If you’re young and want to learn the mindset and simple steps that helped me start making this happen, msg me “LEARN MORE” and I’ll show you what helped me get here.
LMAO
I am 8 and my father just found out I was looking at openclaw, now I make 20000 a day and my mother doesnt know. text me to find out how my brother made his 1st billion with openclaw that inspired me to look at it. my sister also uses openclaw and we are all making tons of money every day. my openclaw just told me I made another 500 in the last few minutes while i was typing this important announcement.
there should be age limit for openclaw or internet access full stop
I'd prefer a minimum IQ
why so many scam baits here suddenly
oh spare me codex, give us one more free reset lol
Telegram doesnt directly work if you havae ssh tunneling on? Its telling me it has to be taillscale or remote URL
There was a claude code offer which gave $500 creds valid for 6 months! Expired now. Keep a watch out for more. There is one more of xiaomi mimo is running https://100t.xiaomimimo.com/
100 trillion token is insane
Around the 4.22 update my gateway started using like 50% CPU at idle. Has anyone else experienced this? Any solution?
Looking to setup with a dgx spark, Mac Studio and a NAS. Any feedback on this type of setup? Was going to have openclaw on the Mac Studio as orchestrator, dgx for inference, and all data hosted on NAS. I’m trying to create an enterprise setup for a company.
They just want to grab the market...they would have got some free subsidies from govt. Data is precious!
yeah sadly . humans are less valuable than their own data nowadays
Hey folks, currently added my 3rd V100 32gbGPU now, total of 96VRAM, whats your opinion i should run now, previously using unsloth/Qwen3.6-35B-A3B-GGUF at Q8
Two weeks ago before i started this Ai journey i got myself a new RTX5080 16Gb, what a waste of money 😂 , now i want an RTX6000 96GB, 16gb is nothing.
Resale on RTX5080 should still be good
Yeah, mostly just using it for simple stuff and kicking off cron jobs, heartbeats etc. Would never use it for any coding tasks and such, dont see the point. Thats where Claude Code and Open Code etc shine for me.
Yep you can technically run openclaw in rpi. And use other nodes to run the tasks
which model you use from opencode mostly for coding? k2.6?
yes recently pleased with k2.6
how good are the deepseek v4 models compared to kimi?
I've not tried any of those.
After doing some research I've preselected these 3 devices:
- ASUS Ascent GX10 128 GB LPDDR5X (3700€)
- Corsair AI Workstation 300 Ryzen AI Max+ 395 128 GB LPDDR5X (2800€)
- Apple mac mini M4 pro 64gb (2400€)
I want it to be a 24/7 node of openclaw, to work as an assistant and also do some research on AI. Is it worth going for the Asus with the nvidia gb10 or is it better to pay for suscriptions of cloud?
do you have experience in using the local models? before spending huge amount, rent a GPU VPS, install and run local models and see if it fits your pupose, then plonk down your cash to buy one of 2 above (m4 would be a waste in that config) - I would side with GX10
Thanks for the advice!! I added de mac mini to the comparison because all people is using them to run openclaw
Yeah, those mac mini are almost a scam by these influencers. You do need mac mini to run openclaw with cloud llm and they are not enough to run local llm. So unless you have any other usage for mac and its part of your ecosystem, it might make sense. But just for openclaw, it is not
╭─────────────────────────────────────────────────────────╮
│ Ollama Model Benchmarker │
│ Reasoning | Coding | Knowledge | Instruction | Creative │
╰─────────────────────────────────────────────────────────╯
Found 25 models to benchmark:
- qwen3:30b-a3b
- qwen3.5:27b
- mistral-small3.1:latest
- qwen2.5-coder:32b-instruct-q4_K_M
- gemma3:27b
- deepseek-r1:32b
- qwen2.5-coder:7b
- dolphin-mixtral:8x7b
- codellama:13b
- llava:13b
- mistral-nemo:12b
- mistral:7b
- phi4-mini:latest
- qwen3.5:4b
- qwen2.5-coder:14b
- deepseek-r1:14b
- qwen2.5:7b
- qwen2.5:3b
- llama3.2:3b
- llama3.2-vision:11b
- qwen2.5:14b
- gemma4:e4b
- qwen3-vl:8b
- deepseek-r1:8b
- qwen3.5:9b
Estimated time: 50-125 minutes. Please wait...
⠼ Testing: qwen3.5:27b ━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4%
⠼ -> Knowledge ━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━ 40%
^^Lets see how they Rank, runnin on RTX 5080 16Gb
Fun! Have you seen ClawEval?
No, that sounds fun! lol
Pretty cool site!
Yeah it has tons of good benchmarks to really differentiate models
That’s the canonical one
I run G and H on every new model I start running locally to get good comparisons. I should also look at the subjective writing sections (but usually don’t lol)
I'm considering buying an RTX 5090 because I realized my 64gb of system ram and RTX 4046 8gb VRAM I bought for local AI won't work.
Before I spend $4000 on an RTX 5090 & 1200W PSU upgrade.
Will I be able to run decent local LLMs on my PC if I get this graphics card? I'm been talking to Perplexity to wrap my head around these buying decisions but I don't really understand the implications of what I can and can't do.
I was quite disappointed when I realized System Ram is not what Local LLMs require and I'd like to avoid a $4000 disappointment if possible.
you dont need a 5090, for playing around you can buy a used 3090. Use it for a few weeks.
You will see that the quality of the answers are not as good as a provider with 200b++ Model files, but probably good enough for your usecase..
with 32gb vram you can fully load a max of 65 - 70b model on your gpu (q4 quant) and more if you offload some to the ram , and i dont think any 60 or 70b parameter model will be decent enough
I found this to be a useful speed tester to help me visualize the speed of which the models would take.
I'm thinking I don't really need the frontier models.
Most of what I'm doing is convert transcripts of my speeches into different forms of written content.
I'm not a coder or developer so there's not advance coding needs I have. I do want to have agent swarms that are able to work together for researching content ideas, structuring content in my frameworks, and designing slide presentations running in parallel
I agree with @oak frost , get a 3090 on ebay and try that first - you can always resell it. You still need the beefy PSU though. 3090 is only 24gb, 5090 will give you another 8b, but that might not get you much more model, but maybe a 27b or 35b plus a smaller one like a 9b. I don't know how much swarming, especially in parallel you'll be able to get though. I've been happy with 3090 and qwen36 35b a3b, and qwen36 27b for my needs. I also have a 4070 8 gb) on another machine with qwen35 9b for simpler tasks. I want to do some experimenting with coding too .
I'm surprised I don't see people suggesting AMD R9700 (32GB) cards more. For $1300, I think its the best you can get for local LLM. Sure, that means you're dealing with ROCm, but I would think that's a good tradeoff.
$950 Intel B70 (also 32GB) may yet prove to be worthwhile as well - but their software stack is probably worse than where ROCm was at 2 years ago.
I think that's the major reason, much of the software and tuning is CUDA focused. Regarding 32 gb vs 24, I don't think it's a big enough jump to be able to run a larger model than you could with 24. @rocky violet I don't see how you could fully load a 70b or even 65b Q4 fully in vram? Those would need more than 32 gb? But back to the original question, you'll need to figure out whether you can do what you want with a smaller model. Others have suggested renting a vps with GPU to try it out, but I'm not familiar with that, the cost, or if you can allocate specific GPU size.
Personally, I think 32gb is the current sweet spot (used to be 24gb last year) for best local LLM without breaking the bank. Between Qwen 3.6 and Gemma 4 models if you're limited to 24gb, then your either limiting context down significantly, running really small (Q3) quants which limits usefulness, or both. (Or, you're stuck using older, worse models.) But, it does really come down to what you're trying to do with it. 32GB + good processing speed seems to be the floor for "good enough" local coding (R9700 is OK, but I do wish I had the speed of a 5090 here).
If you're OK with slower responses then mini PC's like M4 mac mini with 48/64GB RAM or similar Strix Halo can also work, but they're both much slower than I think a lot of people are comfortable with as a chat bot. If you have workflows that you can just pass off to let run overnight - then M4/Strix halo are great (mine just spent days churning out AI subtitles for a bunch of old obscure media - speed was no real concern). Next I have mine slowly chewing through something like 600 government documents (probably 20k pages of text and tables in PDF's (many without OCR) and building that into a searchable database - works great when I don't need speed.
But neither mini PC seems good for local image generation if that's your thing - 3090 TI/4090/7900XTX (all at 24GB) are probably still best fit there.
Ultimately, I think many of us are stuck in this world still where we need frontier/SOTA models for real work - local LLM is just a thing you can offload stuff too when it neither needs to be "as good as possible" or "as fast as possible", but instead I just want "as cheap as possible" (when measured over multiple months).
I still pay over $200 in month for Frontier/SOTA models for real work - in addition to offloading what small bits I can to last year's Strix Halo and this year's R9700 in my desktop. But I like to think that offloading what I can keeps me from paying even more for cloud models.
Same, though I'm only on the $20 plan for both Claude and openai, but keep running up on the 5 hour window when coding in Claude code. Just started playing with codex a few days ago and am really impressed with it , and it seems to be more token friendly than Claude code. Now trying to figure out which one I want to give $100 a month to for a bigger 5 hour window, or just continue jumping back and forth between the two!
with offloading as i said
anyone using openclaw on a raspberry pi 5 8 GB?
Let me know if it works for you
I like the idea of learning and trying things out on a VPS first, with the goal for figuring out what hardware I might later choose to buy to run everything local. I am not a programmer, use windows, and a newbie to openclaw.** What VPS service + guide would people reccomend? **Oracle cloud seems like it would emulate a local server pretty well but also looks to be stretching the edge of my knowlege.
i had it running on my pi 5 8GB no issues, several others use lesser pis
It’s hammering my pi cpu, cpu utilisation reaches 100% regularly
anyone using openclaw on a raspberry pi
Oracle pay as you go plan provides a 4 cpu, 24gb ram, 200gb completely free. Hence experiment there. You can also install docker engine there and run openclaw as containers which will provide you with more control and better upgrade paths
they provide that vps plan anymore? i tried before but doesn't seem like they accept everyone
You need to upgrade to pay-as-you-go it is difficult to get slot in pure free tier qupta. But even when you upgrade to pay-as-you-go that combination is still free forever, you only pay if you exceed that
Great tip on getting the free option, will give it a go!
Of all the benefits of using AI, I think having it explain anything to you, at exactly the level you want to receive that information at - that's probably the best.
Which Mac Mini is preferred to buy? Is it the base model or is it the model with the higher RAM? Basically I don't want to run local models on my system. I just want a personal assistant to run openclaw
Even bigger question: should I consider buying a Mac mini for this or should I stick to Cloud VPS only for this?
steer clear of VPS, local hardware plays nicer.
i had a 2012 mac mini, threw linux on it. running openclaw fine. also have a beefy set up. it depends waht you want to acheive though.
If you're just doing cloud models, any mini PC, or old laptop is good enough. If you're already a mac person, the cheapest mac-mini will work well.
Wait for WWDC, could be possible that Apple releases the M5 Mac mini 🤗
anyone knows why i end up on gateway-injected when i refresh the webui? been trying to find an answer / fix but can't figure it out
Is that a hardware question?
i'm using AI to help me build a selfhosted Ollama/Openclaw Team of Agents... I talk to it through Discord.. 3 days so far, at the Discord stage with partial personalities working...
ha ha don't waste money on mac if using cloud LLM anyways
Local bro Ollama 0.19 and the dynamite goes booom 🤗
If you go cloud raspberry pi could do the trick 🤗
sorry i posted in the wrong channel, was sleepy 🙂
this sounds hard. Why would you want this?
i have a potato pc and i cant run the model stay thinking 10h and dont do nothing
If you are using cloud you could migrate to local with the M5
With that cost of M5, i could run cloud sub for 24 m atleast at highest tier and have exposed to latest frontier model. Local models don't cut it vs frontier models
you could run frontier models 24/7 for 24 months? pretty sure your math is faulty there... As for not being up to the tak, depends on what tasks you are doing. I would still architect with a claude pro subscription, but most of the grunt work could be done by a local model, and they are getting better all the time... you can run them 24/7 without rate limits ... while sonnet 24/7 would be
24-Month Cost Estimates
(Sonnet 3.5 API)Low Usage (Simple Agent, 24/7): ~$300–$600
Usage: Only processing data when a user acts, infrequent, short prompts.
Medium Usage (Constant Monitoring, 24/7): ~$10,000–$20,000
Usage: Constant summarization, low-volume coding, high context maintenance.
High Usage (Active Coding/Data Agent, 24/7): ~$50,000–$100,000+
Usage: Rapid, continuous coding tasks with multiple files/retry loops.
and that isn't even opus... thats just sonnet
For Opus 24/7 High-Volume Agent (API)~$9,000+~$216,000+...
so naw... I don't need frontier quality for every single task... there is plenty I could do with a large model on a mac m5
the problem you think is that openclaw is the only solution - which is not. I primarily use n8n for daily workflows and use openclaw for research activities, hence that $200/month is sufficient (with regular small credits that thrown across different events /partners) . Openclaw is a token hogger. For example, if you have an execl sheet to be read and based on that do some infrencing for say couple of columns, in Openclaw, eveyrthing is infrenced. In n8n, using code node you can just simply seggregate the data without using LLM and send for infrencing only the data you required. in my use case, if I use openclaw, it would take about 72K tokens/call vs 5-10K in n8n. Now if there is a new excel format, then invoke openclaw to determine best strategy, once that strategy is developed, turn that into n8n workflow. This way most of the determinintic tasks don't even use LLM. It is used only for those data that requires it. Hence, my context is lean, infrencing ability is fine tuned and I use multiple agents for specific tasks, which keeps it specific. Opus 4.6 (4.7 is bad) is used only when things are completely random.
Own hardware is great - I have a full scale home lab + self hosted cloud solutions - but i won't recommend investing in a hardware which is destined to become obsolete in next 9 - 12 months due to AI architecture improvements and hybrid scaling via vLLMs.
https://n8n.io/ or even https://flowiseai.com/ which can be self hosted. Before openclaw came in n8n ruled the AI agentic world for clear workflows. in Openclaw, you cannot have predetemined workflow like you can do in n8n or flowise
I will check them out... what I don't know would fill a book, but I am working on learning
exactly, and it will be hybrid approach in future - heavy generalized LLM + Finetuned small LLM (using LORA, vLORA) which could be running in optimized software for the likes vLLM or even sg-lang
start with n8n - flowise does not have all the bells and whistles. You could be surprized that many tasks that openclaw was infrencing can be a simple Code Node (without LLM) - hence only infrence what is truly needed and the results would be very reliable
while you are at it - learn docker, openclaw can also be run in docker and much better when new versions come out, since you could spin a new container while the last known good version container is still running. This way your business does not stop because there are breaking changes in new version. With reverse proxies you could also route your work to like 80% to existing proven openclaw container + 20% to new version of openclaw container. However, this requires a bit if knowledge of docker and revrese proxies like Traefik/ Caddy
Ollama's Cloud models can now be used inside Claude Desktop
If you are self hosting - use liteLLM - its like openrouter for you.You cna configure any local or cloud. Openclaw can then only call litellm.
In litellm you can also put policies to route what when
anyone here thinking much about the control model for computer use?
feels like a lot of current stuff assumes the agent should just live on the target machine and poke around from inside it.
i’m starting to think a sidecar model makes more sense:
• run the AI on one machine
• keep the target mac separate
• send input in from outside
• cleaner boundary between thinking and acting.
curious if that feels more sane to others, or if people still think direct-on-box is the better model
isn't this already how nodes work in openclaw?
I think to the extent possible you should in fact avoid having agents poke at the machine running them
Openclaw already works that way. You can run openclaw as nodes (even within docker cpntianer) and let ot be controlled via other machine runnign as a gateway(again in docker contianer)
I currently have 8GB VRAM and 32GB RAM, do you guys have any recommendations for which model I should use for lightweight/local agent tasks?
Currently using Dolphin-X1-8B-Q6_K in LM Studio just for testing purposes and I am getting around 30 tok/sec initially (for longer sessions it stabilizes at around 15 tok/sec), but the model feels rather dumb.
Current model/settings info:
Model: dphn/Dolphin-X1-8B-GGUF
Quantization: Q6_K
Architecture: Llama
Size on disk: 6.60 GB
Context Length: 131072
GPU Offload: 32
CPU Thread Pool Size: 8
Evaluation Batch Size: 725
Unified KV Cache: Enabled
Keep Model in Memory: Enabled
Offload KV Cache to GPU Memory: Disabled
I’d like recommendations for:
- better models for OpenClaw/agent use
- good balance between intelligence + speed
- settings optimization for my hardware
I am willing to sacrifice some context length if needed, but I would prefer not dropping it too aggressively.
What's your current context length?
Context Length: 131,072
Have you tried the new qwen? It's sort of designed a bit better for tool use
i'm currently using Qwen2.5-7B-Instruct Q6_K
it's running pretty well and I am liking it given my limited hardware
once I get more comfortable with openclaw I'll just rent a runpod instance then i can use whatever
Have you tried Qwen3.5-4B Q4_K_M? (I haven't - just seen that recommended here before for 8GB VRAM setups). I'd guess between Qwen 3.4 4B and Gemma-4-E4B
I'll plug https://github.com/AIgenteur/ClawEval (not my work - no connection to the guy who built it). I think this is generally the best (what LLM for my GPU for openclaw) that I've seen.
On the 32 gb, try qwen36-35b-a3b or qwen36-27b, Q4_K_M works well for me on 24 gb, you could probably do higher Q
True — OpenClaw already supports distributed control through gateway + nodes.
What Sidecar Dot adds is a different thing: control of a separate Mac without requiring OpenClaw, Docker, or any installed agent on that target machine.
So I’d separate them like this:
• OpenClaw nodes/gateway = software-native distributed control
• Sidecar Dot = external control of a real Mac that AI can operate directly
That difference matters when the target machine is not already part of your stack.
sidecarbot? you mean this https://www.sidecardot.com/ why would you want to pay for that hardware which is essentially looks like RPI Zero - besides, there is no need, if you just use the openclaw nodes
Do you think a Mac Pro will be able to efficiently run openclaw using qwen27 on Ollama while running Claude code because my Mac air with 24ram is struggling a lot rn
Depends on the specific models? Which ones? Google memory bandwidth for your particular models, but the airs aren't all that fast, up to 153 for m5. Pros will be in the 200 to 600 range
What would you recommend for 16 gb vram?
If you go to the models on LM Studio’s website you can see the minimum ram requirements for each one. That doesn’t include KV cache etc but it gives you an idea of where to start.
@lyric orchid which Mac do you think I should get. I want to be able to do everything comfortably. Also what are those numbers you’re saying: 153, 200, 600
The numbers are the memory bandwidth
I understand but I'm talking best performing model
I'm really not a Mac expert, just relaying what I've read. The point is, you can have two Macs with 128 gigs of ram, but the memory bandwidth speed can be considerably different depending on which model you get. I believe the m3 ultra chip is the fastest. The higher the bandwidth the better the performance .
Use https://www.canirun.ai/ to answer that. Pick your mac model in the hw select and you'll know
I looked at the docs and it seems to focus on the Ollama models - do you think this be a problem if I used it with a different model?
OpenClaw nodes are the right solution when you can install software on the target machine.
Sidecar Dot is for the cases where you can’t, or where you want out-of-band control/recovery.
So it’s not replacing nodes — it’s covering the gap nodes leave.
Yep — agreed. That’s already the native OpenClaw model.
If you can run a node on the target and coordinate it from a separate gateway, that’s usually the cleanest setup.
The reason for Sidecar Dot isn’t to replace that — it’s to handle the cases where you can’t install or rely on a node on the target at all: locked-down machines, third-party devices, broken OS state, or out-of-band HID/KVM-style recovery.
OpenClaw nodes = in-band software control
Sidecar Dot = out-of-band external control
Sure.. it may look like a Pi-class device, but the value isn’t the board, it’s the role. The point is having an external control plane for machines where you can’t install a node, can’t trust the OS, or need out-of-band HID/KVM-style recovery. If OpenClaw nodes can do the job, great - this is for the cases they can’t.
In those cases, use ansible from https://semaphoreui.com/ and invoke any machine. You might need to have python in some nodes
I didn’t consent to you pasting slop at me twice just in case I didn’t see it the first time. Congrats you built an IP KVM.
Apologies. I'm new to Discord Forums & still finding my way re. how to reply in the thread etc.
Sidecar dot is only for mac. And mac is always a compromise 😃
Sidecar Dot
anyone here running a mac cluster?
You have a cluster! You run local models?
mac clusters are a fun project, but I got tired of the latency issues when sharding larger models across nodes
Yeah this seems to be the standard response. Tempting but think it's not worth the time/money at the end of the day
Exactly. I just couldn't justify the mac tax
Instead of macs you could get cluster of gx10 or ryzen Ai + Max 395 systems.
But even those will be outdated in an years time defeating the ROI. Hence I always state thay for now getting credits or paying for subs is best as you work out your token appetite
Yeah I already have a strix halo I barely use with my OC living on it because of a lack of bandwidth/power
Find myself with a collection of various AI/compute hardware but with minimal unification neglect the 6 3090s+6000 on one machine but that sucks up so much power it's only spun up when required.
Will probably move the 6000 to an always on system to better utilize
If you have experience with k8s, you can build up a cluster which will provide some optimization for your disparate gpus https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
this would help if it was all the same arch, but between Nvidia dGPUs, Jetsons, Strix Halos, Mac products, you'd still need dynamic images which is miserable to setup and maintain
maybe one day there will be a fully unified underlying translation layer that doesn't lost stupid amounts of performance (or at least one can dream 😂 )
This is where k8s comes into picture- differnet nodes running diff arch/gpu can operate as single cluster unfoying all gpu, storage, etc
it doesn't function that way if you have different architecture GPU/compute. If you have various nvidia dGPUs then yes, it can work that way. But if you have a mismash, best case is probably running Vulcan between nvidia and amd (unsure it works for intel or mac). even then you lose performance
You can't combine different gpus into a single unified gpu. But multiple GPU can be presented so that you can run multiple pods of ollama/anythingllm, etc. So with openclaw you can run different agent thay can use different ollama. Since everything is in a single cluster, it is seamless
negative; not without the container images and configs needing to support multiple underlying GPU architectures -- which I am not aware of a package that fully supports all out of the box (as the image would be larger than anyone wants too support in a single image)
have you used k8s, do you know about how pods are run?
yes; it's literally a core part of my job lol
have you?
then probably you need more experience! if you have mutliple nodes each having a different GPU, you can run the node with specific container image with plugins and Ollama pods can use them. Hence, as a cluster you will have exposure mutliple gpu nodes. In openclaw (running as a gateway or seperate nodes) you can configure multiple providers from each of those ollama pods to run different agents. This way you setup is optimized to utilize disparate gpus. yes, you cannot comnbine as a single GPU and do a slicing
Kubernetes is an orchestrator, not a magical GPU translation layer. It can schedule workloads onto different nodes, but the container images, drivers, runtimes, and configs still need to support each GPU architecture. Presenting multiple Ollama pods/providers is not the same thing as making mixed AMD/NVIDIA/Intel GPUs behave like a single GPU architecture type.
Please do your research.
its not about k8s itself, it is those device plugins . - yes, I already stated the multiple GPU cannot be offered as a single GPU types, but pods can use mutliple GPUs seperately. Then there is a concept of paralellism which I have not even mentioned given that you need to understand above. Using paralelism you can run model across different GPU (yeah some would need simialar arch). So bottomline, the way you are describing that you differnet GPU are waste and non-performing, they are not. If you had good architecture knowledge of k8s, you could utiilize most of them.
we are on the same page and thinking each other are not; your statement "GPU cannot be offered as a single GPU types" clarifies that.
as for the rest there was never an argument there -- just like trying to string multiple macs together there are still bandwidth issues between the nodes that render it not worth it at the end of the day when such cheap and secure compute is available online (even in high security settings)
At that point one is better off just balancing what they have with different services (k8s or otherwise) -- which is what I do (STT, TTS, rerank, embedding, VL, security, etc, etc, etc)
Hi guys, looking for hardware advice. Is it worth getting a MSI desktop with a rtx 5090 32gb if I can lift it for 3k USD?
if you can get an entire desktop with 5090, RAM, MB, CPU, PSU, SSD, etc for 3k, that's a buy and a great deal in right now -- though I am unsure your endgame (aka personally I wouldn't be trying to drive OC fully local with it, but would use it for lots of fun supplemental stuff for my OC like TTS, STT, embedding, reranking, VL, etc)
Thanks for the feedback , this is very interesting points you raised
absolutely -- either way, right now it is a great investment for $3k; good luck!
you can run most local models using Turbo Quant just fine on rtx 3090 which is ike 1k and you can put it in 0.5k used computer and be 95% there for local LLM Check ClawEval https://github.com/AIgenteur/ClawEval
Bro I’m telling you lot, you don’t need a £3k GPU to start building AI stuff.
Everyone thinks you need one mad machine but that’s not the only way.
Build it like an organism.
One cheap PC does the routing.
One GPU runs a small local model.
Another cheap camera or old phone gives it eyes.
CPU handles logs, memory, scripts, Telegram, all that boring stuff.
Then cloud AI only gets used when the job is actually hard.
That’s the whole point.
You don’t need all the GPUs to magically become one big GPU. Most of the time it don’t work like that anyway. You split the work.
Eyes.
Brain.
Memory.
Hands.
Nervous system.
That’s how you build it.
You can start with an old office PC, a used GPU, Linux, LM Studio or Ollama, OpenClaw, Python scripts, and a camera. £300–£500 if you buy smart, maybe less if you already have parts.
It can watch a room, send alerts, run a small local AI, search its own notes, store logs, speak through Telegram, and only ask the cloud model when it really needs help.
Rich people brute force everything with one monster GPU.
Broke builders have to be smarter.
Use what you’ve got.
Split the jobs.
Make the system survive when one part goes down.
Don’t build one giant brain.
Build an organism loool.
Inspiring !!!
make sense, I run openclaw in rpi then local model in mac mini and things work absolutely fine https://dev.to/anup_sharma_86fa94612fe3c/i-built-an-ai-that-decides-which-ai-to-talk-to-running-247-from-my-living-room-211p
if only i had that kinda money 😭
5090 alone goes for 3k (even as an openbox) Soph, sounds like a steal.
did I undersell it? lol
Yes
damn...should have lead with "Whole PC w/ 5090? Don't think, just buy"
I snagged a 96GB Mac Studio refurb....not sure why.... 😂
96 GB, say less 😄
My thoughts exactly.
Does it power on? Yes, Deal
HI, im looking at getting into openclaw, not sure how to go about it. I have my main desktop at home with a 7800xt in it (im aware this could come with extra steps.) along with a 2009 macbook pro and a latitude 5410 in the mail. I looked into what I want openclaw to do for me, which would be to use my local desktop's compute power to run the llm and be able to message openclaw from my phone or interact with the web ui from my laptop away from home. How would you all go about this? I read the macbook can be used to integrate imessage without having to pay. Does anyone know if this idea is possible?
are you planning on using a subscription or are you trying to be fully local? fully local might be a bit more than painful without at least a 20-30B parameter model ( I would personally not even fathom it)
if you do a subscription (OpenAI, MiniMax, etc), I would probably avoid the 2009 Macbook pro still unless you want to play the "will it work!?" game on hardware that's 15+ years old with only a couple of cores. I would install it on your latitude or your main desktop depending on how you're feeling it should work okay on both with a subscription.
as for your imessage, yes technically it can do iMessage ...but...I don't think it'll work here because your macbook is just too old and will lack support to install what's needed
Plan was fully local and have the main desktop run everything and be able to remotely work with it as a chat or web ui for any device, and I know open core legacy exists but idk how well it will work. Only reason I brought up the MacBook was because I read you needed one for iMessage capability if you don’t want to pay
Hi, I tried to set up a local openclaw agent on my pc. Specs: 32gb ddr5, RTX 4060, Ryzen 5 7500f. I don't really want to spend money. I set up Ollama's qwen 3.5:9b and it was working fine for a little bit, but now it's just replying "NO" to all my messages. I mainly want to use it to set up connections in Notion and Obsidian to track progress of things, and help me with my career in cybersec. Does anyone know why it may not be working, or what model I should run?
Just replying No is weird
Where are you interacting with your agent is it Discord, telegram or where?
I had issues with the default context size on ollama, I needed like 131k context size for it to work
i tried telegram, stopped working. then web ui, then tried discord
ahh okay, what model did you use?
I was thinking making some sort of prompt or corrupted memory is just forcing it to say no
yeah maybe corrupted memory, i’m gonna delete everything and reinstall. i reset my settings and re did onboarding last night but it didn’t do anything, so i might just start from scratch. thanks
do you think qwen 3.5:9b is powerful enough to just automate things into notion and obsidian?
As far as it has enough context window to remember everything which many local models lack (or maybe my machine lack power to)
The qwen 3.5 is a very powerful model for coding and light task also use a better distilled version so you can get good output
yeah okay cool, what do you mean by better distilled sorry?
Distilled models are basically smaller models trained using outputs or knowledge from a bigger stronger model.
Some distilled versions are done better than others, so even if two models are both “Qwen 3.5 9B distilled”, one can perform much better depending on what it was distilled from and how well it was trained/tuned.
So I meant using a well-made distilled version gives you better quality responses while still being lighter/faster to run locally.
This is just really optional tho, usually done by the Chinese more
That’s why deepseek give good result for half the cost
ahh okay gotcha thank you so much
Currently using qwen3-coder-30b 131k context 8k output tokens
Two actually, the other is ollama/qwen3.6:35b-a3b-nvfp4
what are the specs of your pc?
Mac mini m4 64 GB
ahh okay
Guys, is there an open-source project that tailors LM models for OpenClaw usage?
who was the guy who has his own setup
@austere turtle I deleted everything openclaw related and downloaded it back then set it back up and still getting the NO error. I'm stuck. I even tried using openrouter and used a free one and it instantly said I was out of tokens.
And the Ollama model works fine by itself
If your openclaw in docker or running it normally?
And is your ollama running the server that connects to your model
Try this let’s confirm the server is running
curl http://localhost:11434/api/generate -d '{
"model": "qwen3.5:9b",
"prompt": "hello"
}'