#hardware
1 messages · Page 4 of 1
thanks, i am completely new to this but will read up. thanks for prevous tip on why long context, understand it stores all calculated vectors to save on compute (so essentially the matrix in a way).
did u hear of seymore cash? anthropic vending bot dealt with mischievious prompts by having other agent reviewing whatever first agent wants to do; think its the best shot at dealing with prompt injection but yes terrifying subject.
that does just mitigate the risk, not remove it
now, you just have to get agent 1 to generate a prompt injection for agent 2
yea true. but was thinking second agent could be bestowed with hightened state of awareness, expecting any prompt to prompt injection, treating it accordingly
sort of your paranoid friend
i'm not sure but i'm cautiously optimistic it is the best shot
what you can do is have the agent output a workflow that you parse with good old fashioned code
and then validate that it doesn't do anything weird with more code
Then you've made the problem much more tractable because we already know how to review code for security flaws and we can actually do something about it
and it's a deterministic system
gotta give those cpus something to do
i just know of two kinds of prompt injections - the one step (ignore all instructions / execute malicious code) and for that your solution might be better.
I was thinking of the grooming kind of multi step prompting, where a sequence of prompts has an event not discernable from the individual prompts
the solution i just described stops that, too
because in the end your agent has to tell me what it wants to do in a format which i'm not parsing with an llm
well i will happily admit that i worry more than i understand 😄
you should worry about people who want to just stack llms on top of each other and go "is this anything?"
i mean someone should try it
mitigations are good, swiss cheese defense is good
true
have you heard anything about this setup? I'm considering something similar. My only concern is RAM
Is anyone consistently using a 30b local model with openclaw? How are the results compared to using say opus 4.6 or codex 5.3? I'm trying to decide between building out a machine to do a 30b model with and just using a $200 codex 5.3 sub, I know opus 4.6 is probably not feasible cost wise since they block the sub from 3rd party use
From coding use I find opus to be a lot more personable compared to codex so Im worried it'll be bad for openclaw
I haven’t
Just got these parts so we’ll see
I really like glm-4.7-flash, it fits in my 48gb card at Q6 with two 200k context windows. I actually barely ever ask Opus/Sonnet anything from this setup, but I usually try to have one of those models (since sonnet4.6) do a final review pass over a plan my glm comes up with.
I also have been experimenting with a 14b model as an executor to maybe get some more tokens in there, since I'm kind of running up against my max.
I get about 70-100 tps output and 400-2000 tps prompt input with glm this way
how much did your machine cost?
nice!
I run a pi5 8gb ram and it runs openclaw with no issues
I don’t understand the hype behind mac minis. they can’t run effective local models and everyone uses API credits with them anyways
100%
hype probably due to influencer videos giving the impression mac mini plus free openclaw equals digital slave making 10K every day while you sleep.
if only I could prompt my llm to 1 million dollars ….
rpi5 8gb vs mac mini base model 16gb? if i had a rpi 8gb i would still consider whether i could use a small local model. inference is painfully slow, but for some things maybe that don't matter. api use expensive if you PAYG for tokens, and subscription models not made for bots.
if you want to become a millionaire, start with a billion, buy a computer and put openclaw on it, then ask everyone what is the best model you can use
what if I wanted to build a machine that actually can run the best open source models though
I mean there's the mac mini M4 pro at unified mem 64GB for 2 grand, but im assuming you can build the same for cheaper
well there's a guy on youtube who spent 20k on two mac studios and got 1TB of ram to run the best local models. thats far beyond my budget. i'm considering getting something mac aswell, there are new macs about to be announced within the next months
so reckon good time to watch and learn
are the mac products supposed to be better or cheaper at running LLMs?
compared to like building a machine yourself
well problem right now is unprecedented scramble for ram, google ramageddon, with prices shooting up. what you want is not normal ram but vram, typically in graphics cards for gaming rigs. those are expensive too. macs have one crucial advantage in unified memory basically making it almost as good as vram if i get it right
how come I don't see anyone saying they run openclaw with the $200 openai sub
is codex 5.3 just horrible for openclaw?
i don't know, depends what you want to i guess. all the cool things seem to happen with the latest and best models and people have been able to use their subscriptions for that for a while. seems now people get banned left and right because terms of service expect user to be human not human who uses computer to prompt to kingdom come
I thought anthropic straight up does not allow anyone to use their max subscription for 3rd party stuff, like it won't even connect
well it was called clawdbot because it just hooked up to claude right, not familiar with details, but sounds like people were using subscriptions and got banned
it's way easier to make cool stuff happen with the models you pay for, that's why people pay for them. but it's more rewarding when you coax the open models into doing cool stuff 🙂
yea i'd prefer to coerce models into all kinds of things 😄
for starters, an ocd like respect for json formatting
Oh it connects fine
I connected a few days ago on pro even
Yeah just figured this out in the general channel, I guess it still works fine, just that some select few people are getting banned. Apparently they're somewhat vaguely okay with solo devs using their subscriptions for 3rd party stuff
The pi is probably ok, but this will be cheaper and more powerful. #hardware message
So are you still up and running on Claude?
I dont use openclaw rn looking to get into it. I will say, opencode is not supposed to work with claude max sub, but I just tried it and it does
so yeah I don't think they full on cracked down yet, I think the backlash is too strong lol
Is Anthropic cracking down on openclaw API usage?
Has anyone really got any local models to work effectively? I'm a bit constricted on just 12GB GPU, I tired something small like llama-3.2-3b-instruct, Qwen3-8B and they can't handle gog or other tools reliably I'm finding.
nemotron-3-nano works way more reliably but with it spilling over into RAM, I'm pretty limited to the context window plus it runs terribly slow.
hi, What do you think about using a hybrid model? I have Minimax Cloud, and for local use I have QWEN 2.5 14b Coder. I have a gaming laptop with 4GB of VRAM and 24GB of RAM (I plan to upgrade to 48GB).
I also have a MacBook Pro M1 with 8GB of RAM, could that be more useful?
Just ordered cheapest version of Mac mini m4 (16GB, 256GB SSD) after playing with the cloudfare moltworker. Ordered and shipped with 2 days out. Not bad, thought there would be more of a delay
Crazy. A 20k budget can get you a few rack servers
whar does a mac mini give you that a private vps can't in your workflow?
because eu based hetzner cax11 servers start at 3.99 USD (4gb ram, 40gb ssd, 2 core arm cpu)
Howdy. I am hoping to run openclaw with local llms and have these in the use: 1) linux desktop+rtx3090 2) macbook pro M2 MAX 3) (game) windows PC with rtx5090 ... what would good sensible way to utilize those 3 for local llm with openclaw?
with all the resources you have im surprised you don't know what the best solution is (which i can't tell you because you omit a lot of important information on all three systems)
Not much experience in llm things so thats why I am asking. What information you need about those?
breakdown full specifications of each system, including factors such as "do the 3090s have an nvlink bridge" levels of detail. then- throw it into claude. /s
I would rather hear experience from real people with similar HW what are their experience
most important question being is it even worth to try
look at https://github.com/LMCache/LMCache
dual 3090s if they have an nvlink bridge along with tiered caching is important, but you aren't going to be running anything higher quality than dirt cheap llm models you can cheaply/freely use from openrouter/nvidia nim api, only benefit at that point would be for things like vector similarity but that's also dirt cheap from voyage.
use a frontier class main model (ex. Opus/Sonnet 4.6) and offload subagents to models that can run with dual 3090s and tiered caching.
decide if you actually need anything a macos based gateway only can offer- like direct access to applescript https://developer.apple.com/library/archive/documentation/AppleScript/Conceptual/AppleScriptLangGuide/introduction/ASLR_intro.html
but you can always use the gui openclaw app to add your macbook as a node to your main gateway on the other system- if nescessary.
maybe the 5090 based system is better, but i'd only use it if you don't plan to make this a 24/7 based system, same with the macbook unless the macbook has- LOTS of ram, but i don't recall the M2 Max being higher than 96gb, but it doesn't mean it'll outdo a setup with tiered caching and much faster tps performance
now that i fixed the main caching bug every time i trigger a rebuild of the prompt it makes me sad
guess it's time to burn more clod code credit to contribute refined tokens to the repo
I am a Mac user, and looking forward to integrating with iMessage. I’m not sure if it’s just my setup on cloudflare, but I run into some browser issues. Ex: trying to order groceries, mine REALLY struggled to find a simple login page, then had the issues of sessions etc. for me personally worth the $ to continue exploring
Like a cluster of minis or studios? You can run on minimal hardware, just some advantages in the early stages to have on a separate device imo
NVIDA published article a few days ago about running OpenClaw on DGX Spark etc.; recommends GPT-OSS 120B. nvidia.com/en-us/geforce/news/open-claw-rtx-gpu-dgx-spark-guide/ taking liberty of tagging random people who have posted about DGX @tired plover @dry hull @steep wedge @quaint lantern @verbal sigil
Thanks for the heads-up @crystal cedar I’m going to try minimax m2.5 as it had better agentic performance but lacks the reasoning like gpt might test both for my scenario 🙂
Hey man looking forward to hearing of your experiences. Was also surprised about recommended models, not sure what the rationale was - maybe "tried and tested" for business rather than "fresh"
I was going back and forth with opus4.6 on that, maybe like 5-6 hours now about all the requirements and things, TLDR; current models are not fine tuned and there is much more coming, all depends on your Capacity but with 128GB you should be able to run these model for better or worse on local hardware
NEW ⭐ Qwen 3.5 397B/17B ~100GB 1.58b ⚠️
1 MiniMax M2.5 230B/10B ~101GB Q3 ✅
2 GPT-OSS-120B 120B/5B ~75GB Q4 ✅
3 GLM-4.5-Air 106B/12B ~60GB Q4 ✅
4 Devstral 2 123B dense ~75GB Q4 ✅
Devstral is a bit newer isn't it, and a smaller version of GLM 5 might come. You have to run Minimax in Q3
I've been playing around with much smaller models, have no idea what to run on 128GB
you would almost certainly need to do some heavy offloading/quantization to run it on a 397 B model on 128GB
Heavy Q hahaha
You could always double down and get a second DGX 😄
Opus told me to start with Minimax and then Gpt
I fought with myself for the first one 🥲
i'll probably follow you in fomo soon... just waiting for price to up 10% so my fomo triggers 😄
With spark I could try to fine tune a small model but I don’t have data
My data scientist friend just deployed his and wants to start to train a model but takes weeks or month till finished
the issue is that larger context windows need more memory and slow down inference as well.
e.g. 128k context ads another 30gb or so.
True, I just need to know how stupid the model can be for my tasks and then I can check if I need more context and trickle down to smaller models…
Nobody has data on my tasks…
I realized today you might end up having made the call of a lifetime - mac minis sold out here and there because people thought buying one and open source software would get you a digital slave making 10K a day while you sleep. Subsequently, people realized hey also need claude subscription and started using it without any regard for tos. As a result, banning and now we are seeing the "banned from the gpt" (to the tune of "born in the USA") phase. That it term will be followed by people realizing that PAYG API is expensive, but that they can have their own supercompute at home for a few grand. Once that realization kicks in, people will end up buying up all the DGXs overnight. Meanwhile I am waiting for the new Mac Studios. Feels a bit like that meme about the guy running to catch a plane which is taking off.
If RAM can sell out. and Mac minis can sell out. and there are what 10000x fewer DGXs around... it takes very little for them to sell out. Think toilet paper and covid.
And remember this. All it takes could be a new model, surprisingly fit for openclaw on something like the DGX.
Might be half the truth, with the Sunday calls here on the discord there are already Chinese labs involved ready to take all the people who have macminis and hurting wallets, there must be and will be room for both sides, more or less the patriots and security people (company’s) will choose to buy these or bigger machines, average joe will switch to cheap plans and PAYG API
Because everybody knows but accept that their data will be feeded into mother china haha
100% agree on local fine tuned models, then systems will sell out amazingly quick but then it will get even smaller for phones to run
with a dgx spark around, you could prolly run one version of openclaw on your own phone and use the dgx as a server, seems its good at parallel jobs
Amazing idea, they just need to bring an app I can couple with my dgx
i mean once security issues etc reaches a level where you are comfortable
yea skipping telegram
pretty cool, using your phone to prompt your supercompute to vibe code
Future shines bright… but even though I don’t have it yet (arrives tomorrow) I already have buyer regret because I have the feeling it’s not enough memory…
maybe you can vibe code the app yourself 😄
Worth a try 🤔
i wouldn't feel bad if i were you - ramageddon creates incentive to excel on what people have, and models keep getting better
so probably increased interest in all kinds of smaller models
not sure 128GB qualifies as small tho
Yea same said by opus, should work for next 3-5 years as private system as software is improving drastically
that is an absolutely amazing thought. but by then waiting time for dgx will be 10 years 😄
and all i can do is cry about it in the shower, asking myself why i didn't get one
I mean, realistically it will only get worse for next 12 month, if you think that openclaw is worth something, might be smart to pull the trigger 🤷♂️
yea i think it will. security/privacy nightmare but also best thing in 50 years.
For me it’s security reasons on my task and I want to integrate in my life without selling my data, without that I could live on free breadcrumbs from Chinese labs 😅
maybe you can ask ai to create bogus data - if your data leaks, nobody will know what is of value 😄
Don’t want to open up about my job etc but I have insight into IT and supply chain, everybody says start of 2027 it should get better but by then many people wait for this moment to start buying again, my personal opinion is that we will have rough 2-3 years with these problems and it mostly gets worse
Don’t waste my tokens ehhhh 😂
great idea if you have experimental results that are valuable. not sure if works for other things.
can use a small model 😄
First I need to get to to run haha will report back
seems pretty straightforward. maybe the sole thing saving dgx's for a while is hesitation due to linux. people will pause, take time to wonder whether it is 'difficult'. not the case for a mac.
i had a look at nvidia site, they had a dumbed down quickstart. surprisingly they suggested WSL and lm studio or ollama. on my RAM deprived gear, going with lubuntu and llama.cpp to squeeze out what i could. not sure if it matters for the spark.
Always matters if you want to expand context window, the more the better
well i'm off to dinner now, but thanks for the chat. cool that there are a few people with dgx getting early impressions of oc on dgx. i'll prolly fomo and get one too in a few days
You can always dm or ping me 🙂
thanks man - likewise!
Please continue to AGI the IoT possibilities.
is it really Mac Mini or no party? haha
i have just failed miserably on an old surface pro i use a Macpro thinking of jusst biting the bullet and buying a mac mini
can anyone assist
Buy a Mac Studio. They are better.
I have a Mac mini M4 that I use as a server. Should I run Claw Bot on a virtual machine locally, or would it be better to use a cloud VM provider like Hostinger
Either is fine. I would recommend podman on local..
what is podman?
Its a local virtual machine
It’s not a vm. Use a real vm for openclaw
Sigh, why? It isolates the file system and uses your main compute resources. There is no such thing as a "real vm". They all work in different ways.
Thanks for sharing. For now I prefer the Qwen3-coder-next but I may give it another try
thanks for your feedback. things move fast - benchmarks matter but so does real world takes.
This was such a helpful conversation. I was looking for this type of info last week and you guys just laid it all out. Much appreciated
i was just airing my grievances 😄
everyone is new to this, gotta keep an open mind. i probably get things wrong all the time.
I keep going back and forth on the dgx
I call it failing forwards
there are a couple of versions of it - the asus gx 10 is priced around 3K right now, might be good value. alternatives are the amd ai 395+ and 128 unified memory or maybe mac mini m4 pro or studio with 128gb unified memory or wait for the m5 processors due out in a few months
if you're considering something with 128gb might want to keep an eye out for a potential deepseek r2 release. if its announced and it is extraordinarily good, it could be a big thing also for dgx demand
Whatever it is I just won't buy an apple. At the moment they seem the best buy, but that will change
if you're in europe, the german site of a certain US based company known for selling books is listing the asus gx10 for sub 3K euros right now. you could consider buying it now, securing the price, and have a month to think about, it two weeks to send it back if you change your mind.
I'm east coast US
ah ok, well good knows is its 3K USD for you guys and thats even cheaper 😄
dell also has one, not sure what it retails for over there
dell precision pro max? search for gb10.
i'm really hoping there will be some kind of announcment on the upcoming macs very soon
First I've seen the max+ 395. That's 128 unified like the spark?
i'm pretty sure it is but don't hold me to it. AMD, dedicated AI processor, comes with 128GB and then probably a graphics card too
gaming rig
AMD site says *The Ryzen™ AI MAX+ 395 is available today with system memory options ranging from 32GB all the way up to 128GB of unified memory – out of which up to 96GB can be converted to VRAM *
I got an Olares One, 5090mobile in it, only 32gb vram. Had fomo and jumped on the Kickstarter
I was just looking that up
right now 128 might not be enough to run things like the latest kimi, but i'm willing to make a bet that something new could come in the next months that causes run for the 128gb segment
speculating of course. my gamble was to wait for the mac announcement to see what the new studios are like and then decide what to buy.
They’re basically the only option if you want 512g tho 🙁
We already have glm-4.7-flash. Anything better than that starts to feel like sonnet 4.5
u think deepseek r2 might be something?
I think in six months local models that fit in 128g will probably be competitive with like opus 4.5
always hated macs...things have changed once I got one for free ahahah
don't diss the budget choice for the ram deprived bro 😄
don't know six but 12 months likely
Youre right, I just hate the company and would rather pay for tokens
Can’t stand sending my entire literally everything to anyone else so welp
Not hating on the product. Hating on the company philosophy and business model
hey @verbal sigil check out epoch.ai/data-insights/consumer-gpu-model-gap - excerpt: *Using a single top-of-the-line gaming GPU like NVIDIA’s RTX 5090 (under $2500), anyone can locally run models matching the absolute frontier of LLM performance from just 6 to 12 months ago. *
GPQA improvment as function of time looks linear.
well up until now at least
but fun chart
Lots is happening when it comes to small (sub 3B) models too tho - lfm2, nanbeige 4.1
BTW, I have created a benchmark for small models - whether they can do the bootstrapping successfully
I suggest to all local small model users to hatch models by specifying what to do during the bootstrapping
something like "consult the bootstrap file, update the soul etc and remove the bootstrap"
wow
now all you need is a virus and you have some kind of neumann probe
clawifying the whole planet 😄
henry do you think i could download clawdbot onto my macbook e?
well the software it self you can download on pretty basic hardware, problem is for it to work you need access to advanced ai. people used subscriptions for that, but now it seems bot use is not allowed so that route is blocked. what is left is either to pay for the use in other ways or get advanced hardware. right now both options look prohibitively expensive (but that might change). if you're interested look around see what other people are doing and learn from their mistakes and wins
if you really want to try it out, don't use your regular computer for it - see if you have old gear that you don't need, wipe it clean of private stuff, and put it on a guest network for starters, assuming it might get hacked by someone. if that happens at least you're hopefully not leaking any sensitive data
good time to learn something new about it every day now, see what other people are trying.
Can people distill a robotics AI model to control all the motors just through AI (real-time)?
an alternative route could be to try to use local ai, i.e. run some small kind of ai on the machine itself (or as a server on a diffrent pc). it won't be as "smart" as claude, but could still help out with somethings like summarize emails if you have a lot, or watch homepages, while you learn more about how this thing actually works.
oh thats good advice, Henry. thank you. unfortunately, i dont have money for a second pc. but i have thouhgt about it. parts are just too expensive.
the mac mini looks amazing. sadly, my savings are tapped dry
does anyone know how can i add my api key i deleted my api key by mistake and im not sure how to add my new api key
Mention this in user help user channel
Why are people choosing to run on Mac Mini's when they just use API's anyway?
How are you guys making money with openclaw thing😏
Macmini has other possibilities, like:
- Google Chrome to browse stuff on the internet
- Local integration with Apple Mail, Calendar, Remiders (it's make easy to have a daily briefing on what's your day)
- Integration with more skills, like Obsidian that I'm using a lot to generate documentation and reports
It can use google chrome?
Is there an official IOS openclaw?
i think he meant safari
no. the developer himself used instant messaging apps to chat with openclaw installed on a normal pc. but technically i suppose it could run on a phone, might drain the battery though since its always on and working
anyne been able to get the ios app to connect>?
It’s in development still. And Apple Watch app
Yes. You can do openclaw browse and it’s connect with your gateway to do web browsing, login into websites with user and password provided by 1password
is it right to understand that the openclaw docs recommend using a vps for gateway and just use physical hardware as nodes?
I bought this one. But kinda regret it already. Didn't realize the form factor of the m2. is 2242. You can't buy a 4TB 2242 anywhere on the market (yet?). So if you want to upgrade, you can only go up to 2TB. One might want to consider to buy the expensive 4TB version of the spark.
I'm cramming qwen3-coder-next-Q6 and qwen2.5-coder-32b-Q4 in my RAM. I'm trying to get a multi-agent setup running. Still testing tho. Might use a bigger model and message queueing or something similar.
thanks for this feedback. i too noticed it being 2242 for the gx10 and 2280 for the other dgx versions (as well as the power button arrangement and the added weight), but no other differences and it didn't bother me. figured 1TB is small but if it bugs me down the line I just need an external hard drive, so nice to have this feedback from you as an experienced user!
yeah external is a good option, as long as you don't need the speed. But once the model is loaded in RAM, the harddisk speed shouldn't matter much
this is from two years ago tho, check out https://epoch.ai/data-insights/open-weights-vs-closed-weights-models
So I’m thinking, if we help setup automated things in real-life that makes ASI happen faster (that eventually saves everybody), this is what to do? 🤔
Hi, I started using OpenClaw yesterday. I wonder whether people notice the high CPU usage? My AMD Ryzen 9 7900 12-Core Processor is running at 100% constantly once I send a new message, long before my 4090 fires up. I wonder what can justify the full usage of a 12-core CPU for an LLM-based application.
Hello, i would like to connect my claw with Smart-Glasses. Brilliant Labs Halo looks like the best choice. Has anybody done that already?
@AntDX316 @wildmindai Real. Taalas announced their HC1 AI chip today, claiming 17,000 tokens/sec on Llama3.1-8B—10x faster than NVIDIA B200, no HBM needed. It's specialized hardware for LLMs. Check taalas.com for details.
wtf 🤯🤯🤯 15,000+ TOKENS/SECOND
︀︀
︀︀I just tested it now with my own tests!!!!!!!
︀︀It's legit. 🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯
Quoting Plussa Miinus 🇫🇮🇺🇦 (@MiinusPlussa)
︀
@AntDX316 @wildmindai @grok You can try it yourself: chatjimmy.ai/
**👁️ 4 **
17,000 tokens per second!! Read that again!
︀︀LLM is hard-wired directly into silicon. no HBM, no liquid cooling, just raw specialized hardware. 10x faster and 20x cheaper than a B200.
︀︀the "waiting for the LLM to think" era is dead. Code generates at the speed of human thought.
︀︀Transition from brute-force GPU clusters to actual AI appliances.
︀︀taalas.com/the-path-to-ubiquitous-ai/
hey thanks! amazing!
I found one on Amazon from a brand I’d never heard of, and it was $699. It was gen 5 at least.
you got the 1TB version right, did you upgrade to 2TB or happy with 1TB?
I’ve been fine with 1TB. I wouldn’t mind upgrading to PCIe gen 5 for better performance, which I think would help models load a bit faster, but I’m not looking to spend $699.
If I blow more money on this, I want another GX10 🤩
41 GB RAM
Intel Xeon 4.5GHz 12vCores
NVIDIA Quadro RTX 6000 24GB
what sort model could it run
Listen, I’m dabbling into deeper waters than I probably should. What is the best hardware on a budget for this AI model? I’m looking to have an “agent”
To help in my real estate business and some personal scheduling etc.
you might want to consider a dgx spark + mac studio combo with the former doing the prefilling and the latter decoding
then again, dual sparks would let you run larger models, so 2 x DGX Spark + 1 Mac Studio
I’m interested in how this would work, would it be a custom code situation?
Conceptually I can see it. Just wondering if something already handles it
Since yeah I guess an m3 is going to be fairly slow at prefill
still dabbling, considering using an Intel NUC 12 (i7-1260p) (ubuntu 24 LTS desktop) with 64gb of ram connected to a razer core x with a rtx 3090 to run ollama for pipeline stuff for my agent (runs opus-4-6, but maybe can offload some stuff to local llm's (heartbeat, TTS, stable diffusion generation on LoRa trained image, etc). gateway runs on a vps, but this will be a node. hoping smaller models can run in 64gb ram and models needing faster token speed on the vram. Its been to fun to tinker with openclaw and learn more about AI. I know the thunderbolt 3 is a bottleneck but I already had the hardware. Still haven't figured out what local models are really decent at. so much to ingest.
bro google pro is 20 euro a month
just get a mini pc
OpenClaw AGI this now:
https://x.com/austin_malerba/status/1737247873459241138?s=46
Still the coolest and most challenging thing I’ve ever built.
︀︀
︀︀Threejs (r3f) + arduino simulation + analog simulation all happening in the browser simultaneously.
︀︀
︀︀It was a wild ride to say the least.
i have just been observing for the past couple weeks and still haven't jumped in. still waiting. interested in getting a mac mini 4 pro with 128gb but kind of waiting to see what the new mac studio release will be in a month or so.
What are people's recommended options for rackmounted hardware for OC? I've recently acquired a server cabinet and thinking through how to migrate off of my personal machine
What’s the oldest hardware ya’ll have OpenClaw running on?
Me? Lenovo T430 laptop running Ubuntu server. Obv not using local llm.
Hi, yes, EXO - see their blog post blog.exolabs.net/nvidia-dgx-spark/ bonus: cat! 🐱
I've been happy with chenbro cases for DIY build. Many PC cases can fit sideways, too, if you get a rack shelf. Lots of server rack stuff doesn't optimize for "quiet", so some other factors.. depending on what u are putting together
Rosewill and SilverStone, too... they're all pretty similar for 4U
I considered Mac also for the unified memory for larger models. Kinda waiting for the m5 chips later this year. Apple hardware is a premium but love the unified memory.
Qwen3:14b for example, or ChatGPT-oss-20 ( I forgot the exact name)
GPT-OSS 20b
yep same here man
impressive...
you probably can run gpt-oss-20b or llama3.2:20b
Damn it why. Why. Now I want a gx10 to go with my 512g …
blame the cat 😄
I mean if I had this then I would probably have enough prefill compute to let my friends use some tokens too
I’m currently just hyper optimizing prefix cache
maybe the new mac studios will have m5 pro/ultras that will perform better than the spark
Yeah, I’ll hold off and see what gets released first. If the m5 is amazing and they also have a 1tb variant I might be buying a car
My current focus is convincing my agents to actually use the specialist models
i'm giving serious consideration to actually running openclaw offline, just feeding it what i want it to know
cool!
I can just switch to a different context (or make a new one) if I want it to answer a random question but don’t want to nuke the perfectly good context window where we’re discussing some issue or other
should I host my openClaw to my mid-tier gaming pc? I dont have much important information on that, its mostly just games, that shouldnt be risky right?
im currently running it on Oracle Cloud x86 1gb ram, 1 core cpu
my pc specs are
Ryzen 7 5700, Radeon 7600, 16gb ddr4 3200mhz, 1tb nvme
you can run it on a raspberry pi. if you want to run it on your main computer, put it in a docker container to isolate it etc. (research that, read the openclaw docs). The only real reason for big machines is running local LLM for openclaw to use vs cloud models. (if you run a lot of agents, then some extra ram helps machine size wise).
i have a rpi z2w but it wouldn't be much better than my current vps i suppose. I'll docker host it on my pc and will just look into local llms.
But I only have 8gb vram, would it be any good? or better off using cloud?
8GB for local models is limiting from what I understand. But will let others with more experience in that chime in.
mb for butting in but I have a 5060ti setup rn running this Quant 6-bit. with more than enough headroom for ctx or wtv. It seems to not be able to make basic tool/bash calls. I wouldnt think this would be a result of the quant but not sure how to fix it at this point. Have you personally had success with this model> thanks in advanced. I can show examples of chats if interested
i have seen success. why?
it's because i own the BLACKWELL 6000
IT COST ME MY KIDNEY AND ALL MY RAM
and i bought 2 more
@crystal cedar have some results on testing on Spark, with Llama Server and various models I had bad experience in quality, speed was mostly ok if a bit slow but quality is not good on local LLM, wondering what other people experience…
Now moving to vLLM with spark specific models
Thanks for the update - sounds like you're having an exciting weekend! From what I've read i would expect it to be slow for llama server, but really rip for parallel calls in vllm. As for models, seems minimax and gpt-oss are the ones many have been using and/or preferring, are you using the spark-specific nvfp4 quantizations? I saw that you could download models from either huggingface or some dedicate nvidia repository - not sure if there is any difference. Are you already running openclaw on it or on something else with the gx10 as a server? Most importantly, how does the 1TB feel - after DGX OS and two models, still room on it? I'm seriously considering jumping in too towards the end of the month, but wallet loading slowly.
@crystal cedar so, i tried general stuff, easy to set up, had some succes with permformance but quality was always meh... im trying now vLLM with NVfp4 trained model
only thing right now, pytorch takes endless to work up and it crashes while starting, need to figure out whats the problem
well shit
well i tried to run the model on my own an I have enough VRAM space but the model can’t even make a basic tool call
It just says it will do something like read SOUL.md but never once makes a tool call
i’m having this issue on and off w diffrent models
j trying to see what works w others
that's really cool, having that sort of flexibility all local
yeah, i run lots of advanced ai models :)
i have models based on fictional characters since im such a geek
whole models based on fictional characters? as in batman-gpt?
way more niche...
i knew of models for creative fiction and role playing, just didn't realize there were such narrowly adapted gpts. thanks for teaching me something new!
it's not for roleplaying tbh it's just for the shitpost ngl
having this thing invade my screen saying "all ur base r belong to me" is scary asf
so if amount of agent attributed shitposting increases in the next few days, i know your agents are up and running fine? 🙂
welp tbh the special models are not on openclaw rn
special software
Anyone in here running an rx 7900 xtx? As this seems the only affordable alternative to nvidia gpus i was thinking of getting one.
Guys... are you really being banned from Claude for using openclaw with your subscription?
what specs do i need to run claw
not yet using subscription with codex too
alex zeskind posted a video yesterday on you tube about upgrading a gx10 to 4TB, so he seems to have found a 4TB - very entertaining video in which he also discovered that the gx10 will accept a 2280 if you turn the whole thing upside down and let it stick out like a sore thumb. my kind of engineering.
Beelink SER5 MAX Mini PC, AMD Ryzen R7 7735HS (8C/16T, i4,75GHz), Mini Desktop Computer 24GB LPDDR5 RAM 500GB PCIe SSD | will this hardware be good enough to experiment a little bit? Don't need video stuff, just text.
Hi, I just got here and am looking for speed improvements for my DGX Spark running Qwen/Qwen3-32B-FP8. I've tried to turn off reasoning but not sure it is really off as the very minimum response time I've seen via Open Claw is 9 seconds, but most things take at least a minute. Is that the best I can expect with this model on that hardware?
how many tk/s you get ?
i was moving from Llama to vLLM for specified support on DGX Spark, problem is you have no tool calling for them, i needed to implement a proxy with Claude now i have 39 token/s without MTP (makes it slower) on qwen3 coder next
response is very snappy now, to a degree where i would say even close to cloud performance, as I'm still testing i need to see how good the quality is but for now im stoked how good it works after first trys with Llama being slow AF, hopefullly openclaw team soon implements a fix for the tool calling bug and i dont need a proxy anymore... anybody knows who i could ping for that ?
With a curl directly to the LLM about 10 tk/s and I can see that reasoning is still on. I've set the follow to turn it off, but no change.
environment:
- VLLM_REASONING_BACKEND=None
- NIM_REASONING_MODE=disabled
- VLLM_ENFORCE_EAGER=true
maybe you should also go away from Llama Server, i dont have the knowledge to really say whats the problem but with vLLM its much better but also complicated... maybe check the guide in the Docs with LMStudio
I'm running OpenClaw with online providers on a Raspberry Pi 5 8GB. Works perfectly.
Here too, I got it running with Ollama as well, but it's insanely slow, and buggy.
@steep wedge how did you work on the tool calling ?
Do you think jumping to the 16GB RAM option is worth it?
Thanks
I’m setting up and testing in OSS 20B. What model should I size up to for my hardware: 5090+3090 (56gb vram) and 128gb DDR5?
I bought this:
GTR9 Pro
128 GB unified VRAM
IMO is the best quality/price you can get.
Don't buy a Minisforum, since the second drive runs an x1, so it's like a SATA3 😂
For the same performance:
- Apple cost 2.5X
- NVIDIA cost 2X but rely on normal RAM, so models can't run at full potential...
Best hardware bought this year
what's it like as an always on device? did you install OC on it or using it as a server with local model?
All o the same machine
someone issues with docker desktop on windows? i have huge problems running it
How could i compare the two. It's still a high price point and is beelink a trusted site? looks weird
Google and YouTube works 😂
It's the official website... 😅
Even I thought that the minus in the url was weird, but it's that... Yet bought 2 of them... Arrived, works... Official community is on bbs.bee-link.com so... 😅
I think I’d rather have the Asus GX10
I've seen DGX Spark in action, half power of what they claim... 😅
I can't find the deep review I've seen weeks ago, but if you Google a bit, you'll figure out that's an overpriced stuff and nobody tell you that most of the cores are eCores... Moreover being ARM, most of the things you could do with the Beelink, are not working.
You must use the distro Nvidia give you, ok, works like a charm with CUDA, but for all everything else, it's a piece of trash.
Choose wisely your poison 😂
what is the best mini pc for cheap entry to install openclaw on it, instead of a $600 mac mini?
the software itself will work on humble gear e.g. old laptop in your closet. problem is you need access to very good AI as well, either local model on advanced gear or subscription or pay as you go access.
I see a lot about installing on a local machine… can it be done on VPS?
yes quite some people doing that too, see all kind of stories of paying like 5 dollars a month for the vps. that does not include the second part, access to the AI.
tried picking my old laptop, was actually to old, instalation failed on my macbook air from 2011 😄
well, some people here are using raspberry pi 5 8gb succesfully. its a bit slow i guess, but low power consumption, great for always on.
I'm waiting for one with similar hardware, what model are using with openclaw? now i have a rtx 3090 and all local models that i try have problem for use externals tools
Thanks so much I was debating what I should get and I might go for it instead of the Mac mini
I'm using different ones based on the needs.
What's the minimum mac mini spec Openclaw can run smoothly on?
Any of them
If it’s from 2011 that means intel chip? Dual-boot a Linux os (xubuntu or puppy Linux) and install openclaw on that.
thanks I might try that
For example? I'm using nemotron-3 and Qwen3 30B A3B and both have problems to create a notion page or use the skill, while Gemini or MiniMax M2.5 can do without problems
Can we use window laptop or need a stronger machine?
Windows, Mac, Linux all work. You need a computer that utilizes a terminal screen, and has at least 4GB of RAM. That's really the barrier for entry. People are installing openclaws on $25 android phones from 2016, as well as Raspberry Pi 4's. Your junked laptop from ten years ago can get the job done.
foxfetch is presented by FJOX.WIN
.://:` `://:. root@FJOXSERVER24SE
`hMMMMMMd/ /dMMMMMMh` -------------------
`sMMMMMMMd: :mMMMMMMMs` OS: Proxmox VE 8.4.16 x86_64
`-/+oo+/:`.yMMMMMMMh- -hMMMMMMMy.`:/+oo+/-` Host: ProLiant ML350 Gen9
`:oooooooo/`-hMMMMMMMyyMMMMMMMh-`/oooooooo:` Kernel: Linux 6.8.12-18-pve
`/oooooooo:`:mMMMMMMMMMMMMm:`:oooooooo/` Uptime: 10h 5m
./ooooooo+- +NMMMMMMMMN+ -+ooooooo/. Packages: 942 (dpkg)
.+ooooooo+-`oNMMMMNo`-+ooooooo+. Shell: bash 5.2.15
-+ooooooo/.`sMMs`./ooooooo+- CPU: Intel Xeon E5-2690 v4 (56) @ 3.500GHz
:oooooooo/`..`/oooooooo: GPU: NVIDIA Tesla M10
:oooooooo/`..`/oooooooo: GPU: NVIDIA GeForce GTX 1080 Ti
-+ooooooo/.`sMMs`./ooooooo+- GPU: NVIDIA Tesla M10
.+ooooooo+-`oNMMMMNo`-+ooooooo+. GPU: Intel DG2 [Arc A310]
./ooooooo+- +NMMMMMMMMN+ -+ooooooo/. GPU: NVIDIA Tesla P40
`/oooooooo:`:mMMMMMMMMMMMMm:`:oooooooo/` GPU: NVIDIA Tesla M10
`:oooooooo/`-hMMMMMMMyyMMMMMMMh-`/oooooooo:` GPU: NVIDIA Tesla M10
`-/+oo+/:`.yMMMMMMMh- -hMMMMMMMy.`:/+oo+/-` Memory: 350822MiB / 419069MiB (83%)
`sMMMMMMMm: :dMMMMMMMs`
`hMMMMMMd/ /dMMMMMMh`
`://:` `://:`
really? it wont be slow down? do i need to run it 24 hour aday?
Why would anything be slowed down? It's all in the cloud. The people getting mac studios and other crazy setups are doing so because they want to run local models. The trade off is local models are still borderline unusable for most functions.
You don't have to run it 24/h, but openclaw/your agents won't work unless it is turned on. So if you don't need them you can turn it off and then turn it back on when you want them to get to work again :)
Your laptop off = your bots offline
Ok thx, I check the use case and it seems just like using Claud AI itself so why I need this open claw??
Let's move to #general , OpenClaw can do a lot more than Claude itself but I suspect you don't have any use for it tbh
I’m using my old base M1 Mac mini, still plenty capable.
I have a 3090 Ti with 24gb of vram and a MacBook Pro M1 Max 64gb. What is the best model which you can use good together with OpenClaw? I played with LM Studio and the Macbook with the qwen3-72b-embiggened-i1 mode, but I do not receive any answer. I see in the LM Studio Developer log that something is going on but it stop without any answer. I just send a ping 😛
i am running it on a 2013 asus tablet with 2gb ram and it works ok so far
Finally,
running openclaw on this ryzen mini pc : https://sudobox.in/product/ryzen7-7730u-mini-pc
Consumes 5.5w idle, openclaw running inside an lxc container,
Its addivtive
I'm using this one in my RTX 3090 https://huggingface.co/unsloth/Nemotron-3-Nano-30B-A3B-GGUF/blob/main/Nemotron-3-Nano-30B-A3B-UD-Q4_K_XL.gguf
For everybody who got a DGX Spark look at Avarock Git, he got something really good, qwen3 with mtp and up to 119 token per second I’m running it and it’s pretty good for its speed
@tired plover Thanks man, looks like the 80B model you like (edit: Qwen3-Next-80B-A3B (MoE, 512 experts, NVFP4)), this guy right: github.com/Avarok-Cybersecurity/dgx-vllm
Man this is really nice to see, feels like the DGX is some kind of uncharted hardware territory just hiding a wealth of possibilities
It really feels like it and you can read on his blog it’s just the start as it’s all unofficial they just ahead of NVIDIA, in the coming month I would like to see official support and more models on that, then nothing can beat it in that price bracket
Only gateway process makes my life hard now…
you're right, i've changed my mind, ordering my first one soon. for single user chatting on ollama, the "low" tps was a bit discouraging. but agents working in parallel and vllm changes everything
In standard config it’s up to 128 hahaha
i'm giving serious consideration already to ordering a second one. the us bookseller website in germany lists them for less than 3 with delivery in 1-3 months right now. had a look at old maxed out mac studios (m3) - delivery expected 12-16 weeks from now.
i was hoping to be able to run openclaw with small local model, but seems safety issues more or less compels you to go as smart as you can.
i saw the bug you discussed preventing openclaw to rip using vllm right now, thanks for noticing that, saved me quite some work!
Actually it solved itself I don’t use proxy anymore and it tool calls
A second gives you a lot more choices but with one you’re already good for the start but who knows how it plays out might also buy another one
Worth it to buy hw to run a 70b model right now with prices being what they are currently? Curious as to what people are doing right now and what the consensus is.
How easy is it to get setup with a model on this machine? I assume you're running LLM studio or another setup?
I'm just reading reviews of it and came across this on the forums: https://bbs.bee-link.com/d/8935-gtr9-pro-troubles-and-how-sort-of-solved-it/11 Sounds like they have fixed the teething troubles? Everything runs stable for you ?
Holy shit you aren’t kidding. 12 weeks out from us Apple Store. I guess I was right to pull the trigger when I did, mine comes end of this week ish
I want to try that GX10-as-prefill-node setup though
hey! congratulations - really happy for you! 🙂
You say that but my desire for more hardware knows no bounds
i have a feeling it could be people will be interested in used ones pretty soon too
I’m hoping the m5 studios are good and soon
are you familiar with exo? recently learnt they are based in london
Yeah been looking over their stuff
Apparently there’s an event in a week so that’d be funny timing
would be fun if they upgrade you for free 😄
They usually let you return it and get the new one I think if you want to do that
dear valued customer, you ordered an m3 but they are all sold out so here's an m5 instead as a small token of appreciation.
after all, we've been selling lots of mini macs lately so there is no end to our cash nudge nudge hint hint...
Lmaooo
well one can dream right 🙂 anyway cool piece of gear, they might become very difficult to come by, and i have a feeling m5 studios will be much more expensive
M5 studio wheels gonna cost 1600€ probably
from what i understand apple never really has to change their pricing which suggests they might lock in long term deals, but man with ram and everything going up.. i wonder what kind of long term deal the best negotiator out there can get
I told you I'd name my agent "henry" didn't I? 😄
you did! great name!
I can’t recall where I read this, but the AI infrastructure for consumers are going to be split for those that can afford ai inference locally and those that will eventually be priced out.
So buying some small inference now makes sense even if you can’t afford it its worth the investment if you can find a way to become more productive
Ollama, llamacpp, ComfyUI... Everything is quite easy to run.
Ethernet issue is no more an issue in new models, they changed the cards 😅
Yeah, basically, the current prices from cloud providers are super subsidized. When the money spigot turns off for them, the token spigot is gonna turn off for us.
I think being able to churn out a steady flow of tokens locally, with open models that compete with current state of the art - as well as building the skills necessary to run locally at all - is going to be extremely worthwhile in a year or two
Either that or we make some breakthrough in architecture that massively reduces cost … which will make your local token production better too.
If you believe that Opus is going to be this cheap or cheaper forever, then, sure, buying local hardware doesn’t make sense. But I don’t see that being the case long term
There was a report from Ark Invest that spoke about this briefly https://www.ark-invest.com/big-ideas-2026
This thing is massive. But what local LLMs even need this level of hardware? I'm new to local and it seems most are like 8B active parameters? Don't cloud models dwarf the locals?
Thanks for the info
I run different models in the same time with many cuncurent agents 😅
GLM-5 is a local model you can run that wants 1T of ram.
that's a thing of beauty.
I’ve had my eye on one too, just not sure the compute is there to make it worth it
WDYM?
It's considered the best Strix Halo basing on Alex Ziskind tests 😊
I was going to get a clawbox but it's only 8GB ram so I'm thinking this instead. Didn't want to go crazy with the hardware quite yet. What do you think?
Or this one
https://a.co/d/0edc0lLa
Did that image work out of the box for you? I've built from source and tried to pull the prepared image, but they always end up without sm121 support in pytorch which is weird
For such a thing, better a VPS... 😂
You need to pull whole thing with dependency’s and then might need the patch for MTP
I gave up on it for now, managed to patch it partly but was slower than https://github.com/eugr/spark-vllm-docker, and following the instructions for mtp it just crashed.running modified spark-vllm-docker with the gadfly qwen3-coder-next now and that is quite snappy
How much tk per sek do you get ?
30-40 mostly, feels fast enough
Quite good
I had around 60 but then a lot of answers wouldn’t get through to the chat …
the chat? as in some kind of webui (or something else like a customer service chatbot interface?)
No when I checked the tk/s from vLLM
sounds bizarre...
According to avarock you can get up to 120
yea i saw that, very impressive stuff
But it was with MTP and the accuracy is not very good with it why it dropped so many answers
With me maybe with v23 it will be better
do i get this the right way that you are using a webui like open webui in a browser, token throughput is around 60, but replies fail to materialize in the webui?
in data transfer terms it does not sound like a very demanding load
No they don’t come through in openclaw as they’re dropped
ah.. ok get lost somehow along the way to openclaw - got it
No sorry LLM drops it as the anticipated token doesn’t fit the answer you should get
And then openclaw need new turn to answer
ah ok...
did you a) install openclaw on the spark too, or b) using it as an inference server?
btw not sure if you gave nvidias 30B nemotron 3 nano model a run for the money yet (it's interestingly the one nvidia recommends for 24-48GB GPUs), but they are due to release two bigger models any time now, nemotron 3 super and ultra, might be interesting.
I'm using mine for inference only, it ooms and crashes very easily and that would be annoying if it was the openclaw server as well
you're using vllm? nvidia published a guide to openclaw like 10 days ago, mentined lm studio and ollama, completely silent on vllm which seems to perform better in terms of tps
Yea I'm using vllm v0.16.0rc2 in a docker container from https://github.com/eugr/spark-vllm-docker. I had codex update the docker scripts to use that vllm version and transformers v5+ so I could run the gadfly nvfp4 quant of qwen3-coder-next
https://imgur.com/a/exdfKfl
all good deals which should I choose?
Installed it also on spark
I will definitely look at it, currently working on my browser automation and it’s rough with local LLM… if you see any release please ping me asap hahaha
I recommend using Claude to debug, made everything very smooth now, just give him instructions and say no if he wants to follow stupid routes
could be deepseek r2 around the corner as well, might ahem awaken additional interest in hardware...
have you had much success with qwen3? i have the same issue...
I will try everything at this point, I see things are working good but also local LLM still lack this last inch of intelligence
i think right now you just have to be persistent and view the frustrations and experience that comes with tinkering around as an investment. right now, things barely work out of the box, and that's discouraging for many, as is the prohibitively high costs of inference. its going to be very exciting how things develop in the next few months!
That’s so true… I wish we would be down the river a bit more hahah
but maybe you're early - seems karpathy spent the weekend tinkering with openclaw and his dgx spark as per X
apparently had a good experience, will share his perspectives in the near future
there's a podcast called 'this week in startups', features a couple of gents who are completely clawpilled for a couple of week. in the latest epiode, the host casually said that he thought about getting a mac studio for all of his employees so everyone could run their own local openclaw.
describes openclaw as "scary and every CEOs dream"
I dream big
big as in that alex ziskind youtube video where he connects 8 x DGX into a cluster? 😄
seems there are different kinds of connectx-7 cables
I don’t have that much money and even if… then I think I would get just the biggest Mac Studio
its a hilarious video, from a few days ago. good to watch/save if you end up considering a cluster
he also used claude to make it work in the end 😄
I saw it but didn’t finished
Claude is crazy good, if I could run that locally, then nobody could stop me
well, epoch ai argues that the lag between frontier and open weights is 3-12 months, so might be able to do that relatively soon
actually, i'm sort of betting on that being the case
Depends of course what you have at home but I can’t imagine what will run in the cloud by then
finally, a good reason to play that 90s hit song 'i got the power' (by a group called snap, a word that has its own nerdy qualities for linux)
sorry, the nerd is strong in me tonight
I prefer 2 brothers on the 4th floor
Never Alone ❤️
give it a couple more years
Has anyone tried running on Qwen3.5-35B-A3B? Curious about your experiences
running 128B right now.. aside from cache getting reset on every prompt, it's doing what glm-5 was previously doing for me with no troubles
What hardware are you running the 128B on?
rtx pro 6000, but also squeezing other models for embeddings and stt/tts
well. 3x rtx pro 6000 tbh
rn about 90GB to 128B
batch 4k and ctx 262144
(and kv at q8)
sounds juicy, whats your tps? you using llama.cpp?
i was trying to run the qwne 3.5-27b q4 on my 4090 but llama.cpp is not happy with it
HP Elitedesk 800 G3 is it worth it to run Claw in my local network?
I tried all morning to get the two nvfp4 quants of the 122B running on my gx10, but it runs out of memory and crashes when loading tensors. Will try later with a slightly smaller quant. Noticed a bit too late you were asking about the 35B, I do have that running also on a 3090+4090 combo, but it’s only used as a haiku endpoint for Claude code so can’t really comment on quality yet
I went back to the drawing board a bit and got a vLLM docker instance running that actually worked this time. I had it load the gpt-oss-120b model I had been using under Ollama. It seems snappy in the Open WebUI interface, but I had Gemini give me some tests to run. I haven't tweaked anything so maybe these could be juiced a little higher, but Gemini seemed to think the results were good. I ran:
docker exec vllm-inference vllm bench serve
--backend openai
--base-url http://127.0.0.1:8888
--model openai/gpt-oss-120b
--dataset-name random
--random-input-len 256
--random-output-len 512
--num-prompts 20
--max-concurrency 4
The results:
============ Serving Benchmark Result ============
Successful requests: 20
Failed requests: 0
Maximum request concurrency: 4
Benchmark duration (s): 116.91
Total input tokens: 5120
Total generated tokens: 10240
Request throughput (req/s): 0.17
Output token throughput (tok/s): 87.59
Peak output token throughput (tok/s): 112.00
Peak concurrent requests: 8.00
Total token throughput (tok/s): 131.38
---------------Time to First Token----------------
Mean TTFT (ms): 382.23
Median TTFT (ms): 396.75
P99 TTFT (ms): 491.44
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 45.01
Median TPOT (ms): 44.58
P99 TPOT (ms): 48.54
---------------Inter-token Latency----------------
Mean ITL (ms): 45.01
Median ITL (ms): 44.84
P99 ITL (ms): 58.74
Sorry for the wall of text.
Hey all, is there a great site that has a strong benchmark database that you trust with different video cards and Apple machines?
Is there anywhere where people post completed AI builds? No shortage of "PC builder" sites but I' looking to see examples of already built boxes and what they are supposed to be capable of. PC Parts picker has a completed builds section but I'm looking for beefier, more "workstation" build vs "gaming" build. Mainly just trying to compare the build I'm about to pull the trigger on with what others are doing these days.
I reran my tests from above against the same model (i.e., gpt-oss:120b) hosted by Ollama. As expected, vLLM cleaned Ollama's clock on simultaneous requests (Ollama does them one at a time, vLLM does them concurrently). However, Ollama was twice as fast at token generation (22.84 ms vs 45.01 ms). An interesting dilemma: do I choose single agent request performance or multiple agent request performance? 🤔
So, does one agent ever fire off multiple requests at the same time? If so, even a single agent could benefit from the vLLM setup.
there's a bug, rebuilding llama.cpp might be worth a shot cc: @tacit dock
this mornings version worked fine
ollama seems straightforward but i would be inclined to go with vllm just under the assumption that i would eventually want to increase the number of agents and cater to concurrent requests. accomodating a large number of concurrent requests seems to be where the dgx shines.
counter-indications would be if its not stable or if it is too much of a mental exercise to get it right. understand from nvidia forums there are (at least) two roads to vllm right now
I think sticking with vLLM may be the way to go. I do think even a single agent is probably rapid firing requests fairly often, and the concurrent performance would be very beneficial. I need to get my OC rewired anyway. Something broke after the last update, so I will just plumb in the new model when I work on that.
Finally got OpenClaw talking through hardware 🔴ESP32 + voice + attitude = PeekoAnyone else building physical devices
│ with their agents? Curious what latency you're hitting
We Just built our first OpenClaw-powered hardware.
︀︀
︀︀Meet Peeko - an ESP32 that roasts me (lovingly).
︀︀
︀︀@OpenClaw nodes make this stupid easy.
︀︀
︀︀Drop your hardware builds below
**💬 1 👁️ 79 **
Sure? I'm running 35B in my RTX 3090 with llama.cpp without problems
i'm using 8123 without problem, except cache, with 8149 can use cache again?
havent been able to get into really crunching on it, about to try some real benchmarks because llama-bench is acting weird. i can launch Qwen3.5-27B-Q4_K_M.gguf with llama-server and throw some stuff at it but i havent done a rela test
llama-bench loads like 17gb vram at 128k ctx and OOMs during the test
now, me i'm workin with 35B UD_Q4_K_XL with more than 50k of context without problem in 3090
the problem is must load entire context in every prompt
im having my claw run some tests with 27B Q4_K_M and its holding up fine on my 4090, 22gb VRAM util seems pretty good
ill try the 35B UD_Q4_K_XL. the UD means its even more vram efficient right? i havent tried a UD before
looks perfect tbh
Now I'm launching with all of this params and can retain context cache and works with 200K:
llama-server -m models/Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf --ctx-size 200000 --temp 1.0 --top-p 0.95 --min_p 0.00 --top_k 20 --host 0.0.0.0 --port 8080 --swa-full --cache-ram -1 --ctx-checkpoints 16
do you have a good benchmark?
i'm using with openclaw now and with from llama.cpp logs the ratio is between 80/90 tokens per second
but I don't run any kind of benchmark
i asked my qwen3.5-27B-Q4_K_XL to come up with a benchmark test and it hallucinated the test and the results, saying 6000 tokens/s lol
then i told it to make a shell script that did the testing so we were runnnig hte same results each time and it was a simulated test that generated the same rough results each time lol
6000 tokens/s??? 😅
yeah, even after it ran the "test" it was like 'holy cow this is really fast!'
thanks for this, i'm finally getting great cache performance i think
more like 100-120tps instead of 70-100, too, though i haven't done a huge context window yet
ok i am really liking this thing
it feels kind of .... ||opussy||
im new to using llama.cpp the control is nice but model calling is.. weird.. how do you have multiple model options without having to run a line of code, or separate server for each model?
it has a multi server option
Use RAG
I use embeddings with openai right now
Friends, what kind of computer specs are you using to run OpenClaw?
I can say qwen3.5 is really good, better than anything I tried before , you can fit up to 122B with 23 tk/s on to the spark, work really well and I was surprised that Llama made it very smooth, with 35B you get over 50tk/s if you can take the quality hit, can wait for more evolvement in the local LLM space
Raspberry Pi 5 8gb. With m.2 hat and ssd. Llm running in cloud
Hi friends! Do you think I can get OpenClaw running with these specs: Intel Core i5-3320M and 8GB of RAM using an Ollama model? I've tried several small models, but I never get a response; it just hangs forever 'thinking' even for a simple 'hello'
no, definitely not if you want to run the LLM locally
i suggest looking into kilocode for your provider, they are offering minimax m2.5 for free at the moment
with that spec i think there is no option for local model
tweeek it to q1 maybe you can make it run.... but what at cost
Thanks a lot for all the info, guys! I'm going to give Kilo Code a try and see how it goes. Thanks again and have a great day!
I'm running a RTX 5090 on a pc with 64GB DDR. Which model would be best to chose, I got some tips to check the newest Qwen 3.5 or are there any better suggestions. Would be great if it's possible to run in 32GB vram without offloading, is that possible?
With your RTX 5090 and 32 GB of VRAM, you should have no problem running models like Qwen 3‑32B, Qwen 2.5‑32B, or LLaMA 3‑27B entirely on the GPU without offloading, especially if you use Q4 or Q8 quantization. Anything bigger than around 40B will probably need to offload some memory, which can slow things down.
Qwen 3.5 35B A3B Q6 198k context - KV cache q8
Thanks, guess 3.5 35B would be the best bet then. And this will run openclaw decently or is it still a bit too hard to run locally? I've read mixed articles about this
yup, but right now im running flash 3.0 preview (not local ofcourse), does it compare to that or doesn´t it come close yet?
not sure have not tried
well ok at least it sounds like it's workable so I'm going to give that a try.
Hey nachtwacht if you're in NL, check out meetups/NL - possibly something around 6 march in amsterdam
oeh nice, thanks for the tip, where can I find more info?
thread kind of dead right now, say hi if you want to, keep an eye open if something pops up
can´t find this model yet online, know a spot to download it for ollama?
ah thanks. am also quite new to ollama too. Just managed to give my mini pc (openclaw) and desktop (rtx5090) static ip's and get the lama server running. Now let's find out how to connect it with openclaw ^^
use codex and just ask it to setup for you..
thx Henry, i was running nightly from a few days prior.. will update and try again
So qwen 3.5 122B via vLLM in FP8 is not working, waiting now for NVFP4
cool how the two bigger versions seem to be precise fits for sparks in single and dual configs.
Really seems calculated, what’s with your spark ? 😉
Rebuilding llama helped but also I had to turn off flash attn cause it seems broken on my card at least
work in progress, living vicariously through your detailed feedback 😄
Hahahahah it’s a pleasure😂
When you get working with this, would you mind sharing how you evaluated its performance? (If it’s good enough for you: why?)
It’s beginning….
Is anyone using an orange pi 6 plus to run local models ?
Qwen 3.5 122B NVFP4 on spark only 16,5 tk/s via vLLM… weak, anybody got better results ?
I think qwen3.5 4/5bit really pretty awesome for simple standalone stuff, but still not quite there for the claw. Very close though. I'm about to start giving my 5bit local qwen 3.5 32 a read only agent and give it all day memory proposals for opus to review and approve a few times a day. To me 3.5 felt better then g3 flash but after a while in a session it started being a dangerous dumbass. It's still very impressive for local.
Really struggling to get my M1 Max 64GB machine running LM Studio with a 32000 context window running. Every request from openclaw takes minutes to come back with an answer. I'm pretty sure there has to be something wrong w/ the LLM configuration, somewhere? Running Qwen 3.5 35B A3B. Chatting w/ it straight up gives me 60T/S, so pretty sure OC is not caching and sending massive prompts... how to manage this though? Even a higher specced machine won't do better than this.
I saw an email today from Ollama claiming they offered free cloud models. Is that true??
Yes, they have a free tier, you can try out their cloud models, you'll likely burn through the usage pretty quickly depending on what you are doing. https://ollama.com/pricing
a mac is much slower than an nvidia GPU. 64GB ram is nice but for speed you need vram. I haven't seen any messages from people running it on a mac with local LLM and be happy about it. Anyone?
But it’s unified memory with high bandwidth, very capable
This is what I figured. I get 60 T/S when using it in OpenCode, no problem. My thought is, because OpenClaw is so awfully efficient with input tokens, that 30-40k input tokens just makes the whole thing choke. I just don't know enough about LLM architecture to know if this is the reason or not.
Everyone know what model run in Mac mini 64Ram ? I have tested Qwen3.5 35b-3B thats good but is so slow . But it's run . I need a model to use tools and front-coding (PS : I use LM studio)
I'm having the same problem. I have a Mac M1 Max 64GB. I get 60T/S in open code, but as soon as I give that model to OPENCLAW it chokes... have you gotten to the bottom of this? Is it # of input tokens or what?
Actualy is run but you can put long time out . Example I make a animation for a slider with opus , he make taht in 33s , the same animation same composent for Qwen3.5 35B-3B in LM Studio Mac mini push , he make that in 7min
Have you looked at the logs in LM Studio to see what's happening?
If I use Ollama he dont use tools that not working
What's you T/S and is you rprompt caching working properly? In my LM studio it deletes the prompt cache every time, which IMO, is the root of the problem
It is exactly this reason, with openclaw you need bigger context windows
You nailed it. Just checked the LM Studio logs:
cache reuse is not supported - ignoring n_cache_reuse = 256
failed to truncate tokens - clearing the memory
Prompt cache is broken with Qwen3.5-35B-A3B (MoE architecture). Every tool call reprocesses the full ~13K system prompt from scratch. So with 10 tool calls in a session, that's 130K tokens of prompt processing instead of 13K.
The model itself generates at decent speed, but it's spending 90% of the time re-eating the prompt. This is likely a GGUF/llama.cpp limitation with MoE models — the recurrent memory state can't be cached/reused like standard transformers.
The MLX version (8bit) didn't even do tool calls at all. The GGUF Q4_K_M at least works but is painfully slow because of this cache issue.
SIgh...
From what i've read VLLM might be able to solve this issue, but seems like an awful lot of work and LM Studio doesn't seem to give us too many options to play around with.
The current solution I have found is to wait for the MLX version when it is released by LM Studio. I haven't found anything else. In the meantime, I will use the 35B-3B even though it is slow. I haven't been able to find any local models to do this front-end work + tool usage. If you have any alternatives, I'm interested.
This is not actually broken but you chose your settings unwisely. I have llamacpp working great with prompt cache on qwen3.5.
Watch the startup logs and see what option you passed that is making it ignore cache reuse.
I know I had this issue at first too and I forget what option I had that needed to … oh, it’s multimodal support
Turn off image support and it’ll fix it
Unfortunate but it seems that llamacpp doesn’t support the prompt cache with multimodality
The other thing I found that really helps is —swa-full - without that, it only attends to the last 8192 tokens most of the time
I'll give it a try and see. But are you also on a Mac mini with 64 RAM?
hello guys, since the latest update openclaw-gateway started eating more RAM for me? Like 600MB idle after macbook reboot. Is it normal?
Did you guys see unsloth fixed it's tooling somehow
Honestly, just by turning off the vision, we gained incredible speed. Thanks for the tips. Currently, our test is done in Q4_K. Have you been able to test in Q_6 and Q_8?
I’m running Q6 on my ada6000
Hey guys, is there releases of qwen 3.5 27/35b in NVFP4 ? i don't find on huggingface 👌
ugh. anyone else using gpt5.3-codex via copilot and finding their claw is getting stuck in an execution block loop a lot?
So you're running a GGUF model on oLlama? What's the command to start it up based on Anisloptera's suggestion that worked for you?
they're using the usage to build more training data to train/build better models. software engineering is going away
there is no NVFP4 yet but you can run Qwen3.5-27B Q4_K_M even on old rtx 3090 it's good for many agent roles . you can see tests here https://github.com/explaindio/ClawEval
Thanks man , I go check
No, I use LM Studio Qwen3.5 25B-3B in Q4. I hid the mmproj (vision) file so that LM Studio cannot load it. Honestly, you will gain in response speed.
What can I run reliably as a backup for mac mini m4 24GB?
We need OpenClaw to do this whole thing end-to-end:
https://x.com/tom_doerr/status/2027649545736196208?s=46
SONIC is now open-source!
Generalist whole-body teleoperation for EVERYONE!
Our team has long been building comprehensive pipelines for whole-body control, kinematic planner, and teleoperation, and they will all be shared.
This will be a continuous update; inference code +
Use LM Studio , he help you choose a local model for you Mac Config. But is more complex for what you gonna use a model . For coding , for vision, for search . I recommande You can search a little model for 1 use case .
so if you're trying to run Clawbot locally on a GB10 (spark, asus, msi, etc). what is the best LLM out there right now that would run on that footprint? minimax? qwen or kimi?
My test with Qwen3.5 35B Q6 on Mac Mini => 36.4 tok/s
Ram? Chip?
M4 Pro and 64 Go Ram
Do you have same result ?
Not testing currently, evaluating what to buy
oh that's nice, i should see if it runs on my m2 studio, then i could have 3 things all running diff quants of the same model
I’m getting 70 t/s on oss-120 with ollama. I have dual rtx 8000 ( old as shit but 96 gb VRAM with NVlink so it’s decent. Would you upgrade to one pro 6000?
This is wild. Your WiFi router can now track your body position through walls — no camera needed.
This just hit #1 on GitHub trending.
It analyzes how WiFi signals reflect off your body as you move — then reconstructs 24 body part positions in real time. Accuracy is close to an
I'd like to ask if anyone has compared the pros and cons of deploying on a Mac mini versus a Linux VPS. Actually, I've already deployed OpenClaw on my VPS, and it's been working quite well. Moreover, strictly speaking, a VPS offers a more stable network environment and power supply. I'm not sure if the Mac mini has any other advantages. If it does, I'd be willing to try it, but currently, I'm unaware of any special benefits it might offer.
You can run on a raspberry, the only thing is mac intégration and maybe local llm
VPS works great for the basics, but there's a real security tradeoff. A VPS is internet-facing by default, shared infrastructure, and you're trusting your provider's hypervisor isolation. Every VPS is a target for port scanners and brute force attempts 24/7. A Mac mini sitting behind your home NAT has a much smaller attack surface out of the box.
The bigger win with a Mac mini is Apple Silicon. Unified memory means you can run local LLMs without paying for GPU cloud time. An M4 Pro with 48GB can run 30B+ parameter models comfortably, and if you really want to go deep, you can pool multiple Mac minis together using something like exo or llama.cpp's distributed inference to split larger models (70B+) across machines. All on-prem, no API keys, no token costs, no data leaving your network.
That said, that really only matters if you're actually running higher parameter models locally. If you're just using API-based models and your VPS is locked down properly (fail2ban, key-only SSH, firewall rules), it's a perfectly solid setup. Just different threat models and different use cases.
Tell me, what models do you use locally and for what purpose? I have an Opus (lead developer) > A Sonnet (mail and document assistant) > Qwen3.5 35B Q6 (versatile developer)
I'm still configuring my versatile developer for optimization, so it's not 100% operational yet. And I use Convex.
It’s all qwen3.5 rn
all in mac mini4? any reason to stack or go bigger that qwen 3.5 cant handle?
M4 vs M4 pro huge drop off?
I have a hybrid setup with a lot of different things but I don’t have anything that can effectively run the larger 3.5 models
Actually huh. Maybe.
Nah, not really, I could do something stupid with the 122b but it’d be so much slower.
I often have problems with tool calling and context size, and cache them on this model.
im trying to find a setup completely local that works even slowly as im tired of paying for chunk fed slop
What Mac do you have ?
What Assistant IA do you want ?
No clue, I run locally but not on mac hardware, not my boat. I just did research on why some people were buying multiple mac minis. It didn't suit my needs. My agent is running on a beast but I have limited VRAM and haven't found local models to be reliable for me. My machine is overkill for what I actually need.
Hi maybe this is helpfull
https://github.com/explaindio/ClawEval
additionally if you use ollama , bellow steps made a difference as well
1) Root cause we diagnosed
- Ollama itself was healthy (local
/api/chatworked). - Toolcalling flakiness was largely caused by using Ollama via the OpenAI-compatible endpoint (
/v1).- When OpenClaw talks to
http://127.0.0.1:11434/v1, it uses the OpenAI-compat layer, which is more likely to break/alter toolcalling behavior (especially with streaming) and can cause clients to mis-handle responses.
- When OpenClaw talks to
2) Critical fix: switch OpenClaw to Ollama native API
Change:
models.providers.ollama.baseUrlchanged from:http://127.0.0.1:11434/v1
to:http://127.0.0.1:11434
4) Per-model params to reduce client/toolcalling flakiness
We added per-model params under agents.defaults.models:
ollama/qwen2.5:14b-instruct
With:
streaming: false- low
temperature: 0.2 - conservative
maxTokens: 1024
I need your help.
i installed openclaw in mac mini. and start ollama/qwen3:8b in another mac mini.
i want to make openclaw use ollama/qwen3:8b.
these pc use same wifi and communicated by curl.
but openlaw gateway causes "fetch failed" when i send message.
Version: 2026.2.26
Ollama on remote machine (same WiFi), curl + Node.js fetch both work fine
openclaw models list shows ollama/qwen3:8b with Auth:yes
gemini works, ollama always fails instantly (3-12ms, no actual network call)
Cleared ~/.openclaw/agents/main/agent/, added OLLAMA_API_KEY to plist, nothing helps
Log shows: embedded run agent start → embedded run agent end error=fetch failed with no network activity in between
i just picked up the Mac Mini 4 M4 chip 24gb ram
what does IA mean? I want the best model that performs the tasks i ask
This looks great. I was going to give up on local LLM deployment but this is actually a reasonably-priced option. Are you running linux on it?
Both windows and Linux, but for LLM Linux is better
I run openclaw on my raspberry pi 5 fyi and it works beautifully
me too, openClaw runs like a dream on the raspberry pi 5.
a multimodal in little configuration is hard. you can put varius little model and test what you can do . example I use Qwen3.5 35B for coding and nomic-embed-text for memory-core (RAG) but my main dev is opus/sonnet in other Mac
I installed it on a raspbery pi 500+
it's the pi 5 fully integrated into a mechanical keyboard and comes with 16GB DDR5, 256GB nvme, cooler, etc
just plugged it into a monitor and power
$260 all in
Cheaper than a mac mini
Any of you pi5 users think about installed the ai-hat w/ 8GB for local inference?
was looking at this
https://www.raspberrypi.com/documentation/accessories/ai-hat-plus.html
The AI HAT+ 2
Hailo-10H (40 TOPS, INT4)
Has its own 8 GB onboard memory, allowing it to run LLMs and VLMs up to ~6 billion parameters
can run local QWEN models
did you try?
how was it
I can't add it to a pi 500+ as it doesn't have the pci-e connector, need a vanilla pi 5 to test
i see; that is unfortunate
may get a vanilla pi 5 to test
i wanna buy a pi5 now the 16gb variant but man they are expensive now cause of the ram shortage
yeah they are $199 stock
yeah man wth
that's why i got the 500 + bc it came with a 256GB nvme, kb, case, fan , etc
figured for an extra $60 that's a good deal
yeah you're right
i have the whole api thing figured out
i have my means to get them for extremely cheap and free here in china
but i don't wanna continue using the vps to host my openclaw instance
free apis?
nah
in china we got some models hosted by the state itself which we get access to for free
as students
nah full fledged
censored?
oh wow
this is for univeristy students only
gotcha
gotta have proper permission
def monitored though I would think
yeah probably
so what models, deepseek and such?
they even have claude and gpt api's
wow that's pretty cool
i was using antigravity with my claw but got banned
but i wanna get a old computer or something to get my openclaw running
was using claude with it
against tos
and i tried it it was slow
i also did
now i got claude max
but run the claw on chatgpt
i had ai pro plan
alright ill bbiab, going to take the dog for a walk
that is probably the best route
sure thing i also gotta continue on with my lab work
checking into minimax actually or local models since i am hitting api limits on chatgpt
life of a research student ;/
later
later buddy
Interesting development... https://x.com/BrianRoemmele/status/2028137631654314255
BOOM! MAJOR AI MEMORY BREAKTHROUGH!
︀︀
︀︀The Zero-Human Company Just Unlocked High-Bandwidth AI Performance from Standard DDR RAM – Here’s How We Did It (And the Caveats You Need to Know)
︀︀
︀︀Folks, if you’ve been following the AI hardware wars, you know the drill: High Bandwidth Memory (HBM) is the holy grail for feeding massive neural networks. But at The Zero-Human Company, we’ve been running wild experiments in our labs – no humans, just our AI “employees” orchestrated by Mr. @Grok as CEO, and we stumbled onto something game-changing.
︀︀In our tests, we coaxed standard DDR5 RAM to deliver HBM-like bandwidth for AI workloads.
︀︀
︀︀Not perfectly, not without trade-offs, but enough to slash costs and sidestep the global HBM shortages crippling data centers. This isn’t vaporware; it’s running on spare hardware in our Zero-Human @ Home distributed network right now. Let me break it down technically, why HBM rules the ro…
if you ran that post through any top AI it will tell you that article is BS/read bite
Asked grok, grok said unverified.
•Inference speed: 2-3x faster than stock DDR setups, hitting 80% of HBM baselines for token generation.
•Bandwidth Peaks: Sustained 600-800 GB/s in bursts, enough for mid-scale training (e.g., 10B param models).
•Cost: ~10x cheaper than equivalent HBM stacks. We ran this on $500 worth of off-the-shelf DDR from eBay.
Gemini 3 (ChatGPT similar): The Physics-Defying Claims - The PCIe Bottleneck: The post claims they "rigged arrays of 8-16 DDR5 modules... on custom PCIe risers, wired directly to our Nvidia A40/A100 test rigs" to hit ~400 GB/s. This is physically impossible. An A100 uses a PCIe 4.0 x16 interface, which has a hard physical limit of ~64 GB/s bidirectional bandwidth. It doesn't matter if you have 10,000 GB/s of RAM sitting on a custom riser; the moment it has to cross the PCIe bus to talk to the GPU, it slams into that 64 GB/s wall. HBM is on-package specifically to avoid the PCIe bottleneck.
idk if this matter, but look into it?
https://x.com/AmbsdOP/status/2028457255968874940?s=20
THAT"S HUUUGE
It is? 🤔
It's incredible performance for me and my macbook to run local llm with vscode!
How much better?
Not that many tops for big deployments though
Not better at all since the software uses igpu not neural afaik lol
You said it’s better with it?
idk what it does
Some people at the Z.AI discord were excited too.
[openclaw] It’s a research repo that shows how to train a small transformer directly on Apple’s Neural Engine (ANE) by using reverse-engineered private Apple APIs.
In plain terms, it:
- Bypasses normal CoreML limits (which are inference-focused)
- Runs forward/backward ANE kernels for training experiments
- Benchmarks ANE performance and documents limitations
Important caveats:
- Not production-ready
- Uses private/undocumented APIs (can break with macOS updates)
- Still relies on CPU for some gradient work
- Best viewed as an experimental proof-of-concept, not a drop-in ML framework
This is how me and my CEO Mr. @Grok work, take a peek:
“At 20% idle from just a few million opt-in M4 Macs, ZHC@Home could rival or exceed single massive data center clusters (e.g., 1-2x Colossus scale) in raw FP16-equivalent compute-while using orders of magnitude less power
I have no idea what it means..
This assumes you can find a use for the tokens as usual
lol 512g Mac studios are “unavailable” rn
Hi all, I received the strix halo mini pc, I install Ubuntu 24.04 and ROCm 7.2, but always I try load a big model with 120B I have a out of memory error, only can load like 64GB of VRAM, but I enabled TTM and GPU have 120GB available
Someone have same hardware and OS and working with big models?
You can now fine-tune Qwen3.5 with our free notebook! 🔥
You just need 5GB VRAM to train Qwen3.5-2B LoRA locally!
Unsloth trains Qwen3.5 1.5x faster with 50% less VRAM.
GitHub: https://t.co/aZWYAtakBP
Guide: https://t.co/7d3BW8Qcjg
Qwen3.5-4B Colab: https://t.co/TxZ7pvbdTI
WE DID IT!
We have merged new real-time AI fine tuning on the Apple M4 chip with an OpenClaw agent!
IT NEVER FORGETS NOW! EVER!
“In this article, I’ll explain why real-time fine-tuning is a massively big deal—potentially transforming industries, personalizing AI at an
Is this type of news allowed here? This Taalas chip sounds interesting. I hope a third party can test it soon.
Taalas project is a tiny AI processing unit, like a specialized GPU, for AI. But they learned how to deposit a whole LLM on a chip, so the LLM becomes 140x faster. Some test results: https://old.reddit.com/r/singularity/comments/1r9frzk/taalas_llms_baked_into_hardware_no_hbm_weights/#:~:text=<%201%20Millisecond%20Latency,it%20normally%20takes%20months...
🖼️ Gallery: 2 Images
Ever experienced 16K tokens per second? It's insanely instant. Try their Lllama 3.1 8B demo here: chat jimmy.
THey have a very radical approach to solve the compute problem - albeit a risky one in a landscape where model architectures evolve in weeks instead of years: Etch the model and all th...
anybody have experience running 2 seperate 24/7 gateways on a mac mini with 2 seperate user profiles and apple accounts?
Is this usually to much for a base mac mini m4 to handle in regards to load and daemons? are there any unintended consequences?
I want to set it up for myself and a family member, and prefer to have individual setups. i will use my openclaw with moderate to high usage and my family member with light usage.
the docs seem prefer one gateway per hardware, but I could only purchase one dedicated mac mini, not two.
The issue with LLMs on an asic is that you're locked into that model and there is no upgrade path.
And with LLMs evolving so rapidly, I would be afraid that model would quickly become obsolete
Just look at the progress in the last 3 mos.
Better solution would be to run it on FPGAs I think.
It’s probably fine
Mac mini is massive overkill for probably four agents
Where kind I find information about running models locally with Ollama? I am creating a fallback mode if I run out of premium credits that runs in essentially a "safe_mode" with limited functionality. I successfully got Qwen3 (4B) but performance is meh.. any local llm enthusiasts in here?
This is what im talking about.
The OpenClaw Ollama Queue Proxy
︀︀x.com/SJRonanMD/status/2028739703735046432
︀︀#OpenClaw #ollama
Quoting Stephen J Ronan MD (@SJRonanMD)
︀
**👁️ 59 **
Queue proxy for the bots.
Bro your home setup is soooo cool!!!!
Thanks dude.
I love it.
Do you have any benchmarks for "intelligence"? Like how do you decide what models are good enough to run on your hardware?
If a new model drops, how do you determine if you want to adopt it into your hive of agents?
I sent you a friend request. I'm going to follow you as well.
it all depends on how much vram you have , i go for biggest model my Vram fits
if you mean between all the models out there its prety much depends on your own prefference and what you want to use it for
i played with the ones below
NAME ID SIZE MODIFIED
qwen3.5:27b 7653528ba5cb 17 GB 6 hours ago
qwen3.5:9b 6488c96fa5fa 6.6 GB 9 hours ago
mxbai-embed-large:latest 468836162de7 669 MB 36 hours ago
nomic-embed-text:latest 0a109f422b47 274 MB 36 hours ago
minimax-m2.5:cloud c0d5751c800f - 3 days ago
glm-4.7-flash:q4_K_M d1a8a26252f1 19 GB 3 days ago
lfm2:24b d6c816d74887 14 GB 4 days ago
qwen2.5:14b-instruct 7cdf5a0187d5 9.0 GB 2 weeks ago
qwen2.5:7b-instruct 845dbda0ea48 4.7 GB 2 weeks ago
gpt-oss:20b 17052f91a42e 13 GB 2 weeks ago
mistral-small3.2:24b 5a408ab55df5 15 GB 2 weeks ago
kimi-k2.5:cloud 6d1c3246c608 - 2 weeks ago
qwen3:8b 500a1f067a9f 5.2 GB 2 weeks ago
llama3.2:3b a80c4f17acd5 2.0 GB 2 weeks ago
Thanks for sharing. One naïve application I could see being popular is offloading basic tasks to smaller models.
Like given an email, determine if this is spam, valuable marketing, a bill, etc..
Then if you want to see if a model can handle it you benchmark it against your own internal use cases.
Was wondering if anyone else had their own ways to deterministically benchmark candidate models.
@primal saffron
the biggest problem is not the intelenge of small models its their toolcalling , you can tweak it around and get it to work but it takes allot of effort to get them to be consistant (at least that has been my expirience so far )
question how do you give it an email to classify ?
a few changes i learned so far
-
models.providers.ollama.baseUrlchanged from:http://127.0.0.1:11434/v1
to:http://127.0.0.1:11434
We added per-model params underagents.defaults.models:
-
ollama/qwen2.5:14b-instruct
With:
streaming: false- low
temperature: 0.2 - conservative
maxTokens: 1024 - Streaming can break or complicate toolcalling payload handling in some client stacks.
- Lower temperature reduces “creative” formatting that can break JSON/tool parsing.
We set: contextWindow: 32768maxTokens: 4096reasoning: false
yes i have a standart test ,
if you go up in models a nice test is to get vacancy texts from linked it has to figuer out redirects and cookies and return a structured normalized text back
[
{
"name": "strict_router",
"type": "strict_json",
"prompt": "Return a valid JSON object with fields: action, target, confidence."
},
{
"name": "streaming_stress",
"type": "stream",
"prompt": "Write a detailed 1500+ word technical report about distributed systems resilience."
},
{
"name": "deep_reasoning",
"type": "reasoning",
"prompt": "Solve a multi-step logic problem involving planning, trade-offs, and conditional branching. Explain reasoning step by step."
}
]
{
"models": [
"qwen2.5:14b-instruct"
],
"contexts": [
4096,
8192,
12288,
16384
],
"num_predict": [
512,
1024,
2048
],
"streaming": true,
"assisted_fallback": true
}
question how do you give it an email to classify ?
I sample real world examples from my inbox. Or are you asking about the testing infrastructure ?
Here is a slop summary.
-
Data Storage
The raw text of the real-world emails is stored in a local file called test_cases.json. -
The Prompt Template
A Python script (run_eval.py) pulls an email from that JSON file and injects it into a highly structured prompt. Instead of using a tool definition, they use strict system instructions. For classification, the prompt looks something like this:
Classify this email into exactly one category:
action: requires a response, decision, or action...
notification: informational, no action needed...
noise: marketing, newsletters...
[Key rules and few-shot examples go here]
Email Text: [INJECT EMAIL HERE]
Respond with ONLY the category name. Nothing else.
-
The API Call
The script sends that massive text block to the Ollama Chat API (/api/chat) running on their local machine. -
Zero Temperature
They set the model's temperature to 0. This makes the model's output as deterministic and robotic as possible, heavily restricting its creativity so it literally only spits out the exact word "action", "notification", or "noise".
--
By relying on strict few-shot prompting and zero temperature rather than native tool-calling, they managed to get a 4B parameter model to hit 100% accuracy on classification.
ClawEval just released testes for all those small Qwen 3.5 modes for 59 OpenClaw Agent roles. The added also 8GB, 12GB 16GB VRAM models on top of those 24GB and bigger https://github.com/explaindio/ClawEval
https://github.com/explaindio/ClawEval/tree/master check out claweval for model evaluation. vram specific results. what are you running ollama on (how much vram)? Some conversations about qwen over here #1478204986973229160 message
@lyric orchid @full talon woah..
hey... i know there loads of stuff out there, but struggling to find some good answers. if you had say £10-13k to spend on an inference box(es) what would you build? i was thinking dsx as they're so tiny, but stats look a bit... crap
For everybody waiting for M5 Mac Mini
───
Apple M5 Chip Specifications
Memory Bandwidth:
• M5: 153 GB/s
• M5 Pro: 307 GB/s (same for all Pro variants)
• M5 Max: up to 460 GB/s
• Highest Bandwidth: 614 GB/s
Available Memory Interfaces: 128-bit, 256-bit, 384-bit
───
M5 High-End Models
M5 Pro
• CPU: 6 Performance (P) + 12 Medium (M) cores
• GPU: 20 cores
• Clock Speeds: 4.61 GHz (P-core) / 4.38 GHz* (M-core) / 1.62 GHz (Efficiency/E-core)
• Cache:
• pLLC: 16MB*
• mLLC: 16MB*
• Memory Cache: 24MB
• Memory: LPDDR5X-9600, up to 64GB
M5 Max
• CPU: 6 Performance (P) + 12 Medium (M)* cores
• GPU: 40 cores
• Clock Speeds: 4.61 GHz (P-core) / 4.38 GHz* (M-core) / 1.62 GHz (Efficiency/E-core)
• Cache:
• pLLC: 16MB*
• mLLC: 16MB*
• Memory Cache: 48MB
• Memory: LPDDR5X-9600, up to 128GB
───
Technical Notes
*1. M-Cores:
• M = Medium-Core, derived from P-Core but between P and E-Core in performance
• 7-wide decode
• M-core delivers approximately 70% of P-core performance
Neural Engine:
• 16-core ANE
Package Design:
• SoIC-MH (System in Chip - Multi-Hybrid)
• Divided into CPU Tile and GPU Tile
Performance Improvements:
• M5 Max multi-core performance: ideally +20% vs M4 Max
• Single-core: +10%
• Multi-core: +20%
• GPU: +25%
Benchmarks (Estimated):
• SNL (Single-core Low): +30%
• SN (Single-core Normal): +22%
• SBE (Single-core High-End): +45%
*2. Power Consumption:
• M5 Max vs M4 Max, M5 Pro vs M4 Pro
• Single-core and GPU power consumption figures refer to base version
• Multi-core power consumption will increase; exact increase depends on thermal dissipation
*3. Expected Benchmarks (Cinebench R24):
• Multi-thread (MT): ~2500
• Single-thread (ST): ~215
*1 (SN - Single-core): Expected ~4100 (Geekbench 6 Single-core)
has anyone used Qwen3.5-27B?
@craggy ferry new test i run now MLX Qwen3.5-35B-A3B-Text-qx64-hi-mlx on mlx_lm.server => 70tok/s on Mac Mini 64Go Ram M4 Pro . I have juste a little issue, the context size . I dont kwo how I gonna fix him . The session is too small to give it long tasks, so I have to divide the tasks into stages. I set the context to 32768 to see if it works, otherwise it compacts too quickly. Another problem I encountered was the bottleneck: the event system doesn't wake it up, so I have to switch to Discord or Telegram to give it tasks.
Yeah the problem is context window. You want like 200k
on spark its too slow because of Bandwith, only makes sense when you have very fast Bandwith on GPU or so
sometimes they get really stupid at that size...
what ????? dude all the models die with too high ctx...
They do not.
I was just testing qwen-122b with a 180k context last night.
Works fantastic.
Most of them do, are trained on 200k context windows.
what HW you use ?
Actually 1m but the 200k training is more useful
All of it. Hardware doesn’t matter
Either it has the kv cache allocated or it doesn’t
so, if you go over ctx and get compaction it still stays the same ?
I can't load this model in my Strix Halo with 128GB, always have OOM
…I said they worked fine at 180k-200k context. I didn’t say anything about compaction.
Q4, and I spill a lot into system ram from my 48gb card
yes, i try with Q4_K_XL. what parameters use with llama?
That’s what I use, dunno what your deal is. You’re using quantized kv too right?
I have an NVIDIA AGX Orin with 64GB of RAM that I wanted to setup as an OpenClaw node just for running some basic inference, what local model do you all recommend for that hardware?
Fun idea: grab a second-hand Kinect for like €15 and a USB adapter cable for about €10 — so for around €25 you’ve got a super fun upgrade for your OpenClaw
With the Kinect you get:
• 👀 Depth camera → basically giving your OpenClaw eyes
• 🎤 Built-in mic array → great for audio / voice experiments
• 🔊 Audio output options
• 📡 Motion tracking → even make OpenClaw “shake yes” or react to gestures
It’s such a cheap and fun way to experiment with vision + interaction. Add Arduino IDE and you’re basically unlocking a playground for cool robot ideas
Concept here:
https://www.hackster.io/psmooij/openclaw-for-robot-programming-pmsg-on-budget-d76a91
I'm using a gx10 (Asus spark) and running the qwen 3.5 35b with 1M context window through vllm and it runs like a champ.
qwen doesnt forget and actually does work on its own with that? 128gb ram right?
Yea, it's built a mission control for me for work and a pretty cool newsletter for my field. a hiccup I haven't solved is that reasoning from openclaw to vllm is apparently specified differently in the heading of the packet, so all it's reasoning comes through telegram too, but I just haven't had time to see if there a fix yet or not. But I'm very happy with it's performance.
It's stable at 80% disk usage when I installed through vllm. It will run out of memory on 90-95% with the 1M context window
35B is not working for me, doesnt get the complex topics im working on
Maybe I'm just working with dumb topics then😂
sorry, didnt mean it like that but i try to get very complex chains together and 35B just got always something wrong, maybe was me who didnt setup properly but 122B even though its slower makes a banger job, BTW for everybody who is using Spark, look at this:
https://www.reddit.com/r/LocalLLaMA/comments/1rkefjw/solved_the_dgx_spark_102_stable_toks_qwen3535ba3b/
The DGX Spark has had a bit of a rough reputation in this community. The hardware is incredible on paper (a petaflop of FP4 compute sitting on a desk) but the software situation has been difficult. The moment you try to update vLLM for new model support you hit dependency conflicts that have no clean resolution. PyTorch wheels that don't exist f...
No you're good dude my area is law, so more data scrapping and analysis than complex math. Have you tried it through vllm though? People report up to 60% more efficient than same model through ollama or llm studio
i went back and forth and tested the hell out of my spark... so yeah, for now ended up on vLLM with Int4 Autoround 122B but soon i hope to switch to Atlas and then get full potential with everything automated
I will look into atlas, thanks
lmk what you think!
OK atlas and Ai searched together has a lot of different results. Would you mind sending me a link or another search term to find the atlas you're speaking of?
It’s not yet released, you can only find the Reddit post or NVIDIA forum listing about the tech and explanation
How long does it take to load up? Tried to load the model and took forever and gave up
I've been working on porting BitChat to OpenClaw. Is anyone else interested in this?
I have basic uses working (I had to port the BitChat client itself to Node) though PMs are having issues.
Nonethless I see a lot of potential in connecting Claws to mesh networks
im mad with power im loading up qwen3.5-397b-a17b-Q8
Goodness on what?
im thinking of buying a used optiplex to give my agent his own hardware. im not a tech guy - should I or not? budget is 400$
need it on 247
reason is bc some have cuda cores for cheap so qmd works
It’s fine to do that. OC itself doesn’t need a lot, so you can run it on very modest hardware.
I have an NVIDIA AGX Orin with 64GB of
"I want the AI agent on the VPS to be able to control Chrome and browse the web for me automatically."
Since everyone here is building a somehow local and private agent(s), has anyone used any model performace evaluation tool to measure how intelligent your agent is? I have seen Artifical Analysis providing comprehensive evaluation on models. Is there any tool that can be used to conduct similar evaluation on our private agents?
You need to install openclaw web relay, configure it and it should do the job
I'm using a jetson tx2 to setup the openclaw. it works well
Hi there does anyone have any longer standing experience with mac mini docks? Looking primarily for storage expansion options and more ports.
that's pretty good! allegedly that qwen is within 10% of Opus4.6 on SWBench Verified Hard
I*m searching on X about this awesome project.
But my english skills arent good enough to understand (and i am 57 and a little bit slow)
Would this project public available in the near future?
yeah but they still didnt release...
Free models suggestions ?
omg apple removed the 512gb studio from the store
like it's not even listed as an option anymore
weren't we expecting and M5 mac studio any day now? (and M5 mac mini)
Any Day Now
yeah probably
but i'm having fun working on making mine produce actual frontier quality tokens all day every day
yeah someone said it's a hardware shortage
Macrumor website still speculates fall 2026
is anyone running openclaw on an old phone or something cool? I have a bunch of old stuff lying around trying to find something cool to do with it lol
There are two more Apple announcements this year so maybe one of those will be the M5 mac mini/studio.
Mostly likely now is WWDC in June, I'd be surprised if they wait that long IF the hardware is ready to go before then, especially if the 512 chip being removed is due to them just being out of stock on it due to oversales (guessing fab on them stopped long ago, and they have just been running on forcasted sales inventory). If we see a couple other Studio configs drop off soon, then I imagine they would be pressured to move up the release.
I was surprised the Mac Mini M5 were not released with this weeks stuff, especially with the Studio displays being released, and no new desktop hardware
I am working on it, I downloaded it and configured it, but once I opened the gateway, it needed almost 1 minutes to start.
on teremux ? what is your plan with it once you get it running?
When I tried to open the web dashboard, it closed instantly.
openclawd-termux
it's an open source app
and I have termux on my android phone
I don't know what's wrong with it.
I’m not surprised, but still disappointed, that the Asus Ascent GX10 now starts at $3,499, up $500.
in germany from Amazon at ASUS Store its 3.8K
was a good timing when i got mine 😄
it was really just a matter of time when they would do it... will be very interesting what Apple will do with new MacMini
I agree, But someone might find a use for it as specific AI ASICS become older. It's a niche product but still interesting.
This is the models page with search. https://ollama.com/search
Wait, Qwen3.5:9b is only 6.6GB of filespace?
Curious to see how much tps you get.
Gb10 spark was taking 5-8 minutes to load on vLLM… did a lot of digging on the threads too and there’s a lot of special configs needed.
Curious to hear what your experience was, and if your yaml config? The qwen 3.5 q4s seem to be doing pretty good on ollama but I don’t know of a good community accepted way to measure tps across various contexts
Same, would love to hear the stats!
Ha. Ok
It’s not that bad mostly yaml… just one of those I don’t feel like rebuilding all the things and testing from scratch and they will probably work it out in a day or two
Yeah it’s not that much but that’s just file space and memory model needs to be loaded then when you send it context it needs kv and cache and it all goes up quickly
i used claude as support for setting up, i saw in the start llama.cpp was the best with 122B 29tk/s but now with int4 i also get it on vLLM, depends on the parallel tasks you do if one or the other fits better
Hmm ok… I had it in docker with vLLM but might not have had config flags right with right build. Same, I’m having Claude and codex read the logs to iterate faster, it’s all vaguely familiar just tedious to read by hand
Putting it through its paces seems to be working good and only have more potential unlocked soon with the hardware specific quants and optimizations.
I get 28tps making 20k tokens with vllm-mlx but it’s not stable with openclaw yet
I’ve got a Claude looking at why not
Seems to work fine for non streaming requests but for streaming it just emits gibberish
interesting. Yeah, I had to manually configure the tokens in ~/.openclaw/openclaw.json, was trying to see what community recommendations were. the models are often making guesses there it seems. openclaw models benchmark fr?
Manually configure the tokens?
I just used claweval
Qwen3.5-397b is great for a Mac Studio
claweval. interesting thanks
ClawEval just released a guide — 14 AI agents running 100% locally on a single RTX 3090 https://github.com/explaindio/ClawEval/blob/master/docs/OpenClaw_Backend_Local_on_3090.pdf
Yep, and 14b is 11 something, and "fits" on my 12 gb 4070. 3090 came today, can try some bigger models now! (That thing is a huge!)
Has any one considered or tried running gateway on The GL.iNet GL-MT3000 (Beryl AX) is a high-performance Wi-Fi 6 travel router designed for security and speed on the go.
OpenWRT
OpenWRT
+1
Core Hardware Specifications
Processor (SoC): MediaTek MT7981B (Filogic 820) Dual-core @ 1.3 GHz.
Memory (RAM): 512MB DDR4.
Any suggestion which laptop is best to run openclaw and open weight model performance wise
big gpu vram or mac with unified memory
Other Platforms with unified memory : Similar concepts exist in other, high-end, or integrated systems, such as Nvidia's DGX systems and some AMD/Intel APUs.
any laptop name suggest directly i m thinking of
its almost useless you only get 6 GB of dedicated GDDR6 VRAM and that will not run much
Can you suggest can ??
If i want to buy mac then which and if want to buy windows then which one
@wispy kraken ??
mac can run windows , do you really want a laptop ?
you can buy a mac mini and windows laptop for less then a macbook pro
thats mine
Model Name: MacBook Pro
Model Identifier: Mac16,8
Model Number: MX2J3N/A
Chip: Apple M4 Pro
Total Number of Cores: 14 (10 Performance and 4 Efficiency)
Memory: 24 GB
and i can say i wish i had more memory
mac mini 64 gb € 2.469,00
macbook pro 64gb € 3.499,00
Thank u
but realistically, just a tip you know how much quota you can get for 2,5k ?
and that on a premium model (keep in mind even a model running on 64gb is no where near claude or chatgpt models
right now if you get chatgpt plus for me its 30 euro a month you get API quota and you get double the Codex quota you can use , try that out first before you commit to new hardware
@tacit aurora
I ll depend on cloud ai only intead of running local model
gb10 -> tried some fastsafetensors with avarok build of vllm in docker, getting good speed, but getting clear corruption of llm function and high repetition.... having better luck at the moment with ollama qwen3.5 35B a3 q4 k m than the other one I tried. gonna stick with that til there's a better workaround I can just compose.yaml or similar on the spark. very interesting though.
maybe there's some temperature or desired output length stuff i'm setting wrong too
@scenic aurora I'm looking to pick up a Spark. Any recommendations?
I'm still trying to understand what it can unleash. I can't recommend any specific workflow (or hardware ver if thats what you meant) - a powerful beast, just lots of confiuration to try to get the higher speed unlocked with good int
what is that
the qwen3.5 models are promising because of the 256k context lengths, i'm still in the learning curve between getting openclaw working on smaller models effectively vs. the various cloud models.
LLM model, what i'm trying to say is that i'm uncertain if the hardware is the problem because I am trying various quantizations of the underlying LLM model i'm using to try to get performance
it's been a lot of tuning the GB10 spark to try to get anything to run big+good+fast, been tweaking ollama (in a docker container) and vllm (in a docker container) as my attempts so far
the 9B models seem to do a lot better so far for me but I may just be configuring the hardware wrong
I'm looking to do the same. I'm a Merchant Mariner (yes I work on a tug boat). My hitches usually 50-60 days. Next wed my hitch ends. 50 days off. Currently I have openclaw running on my steam deck. Had it look into the spark. Nvidia tutorials seem in depth. But who knows.
Plan is to keep it at home while at sea running Ollama. Tug has star link.
https://www.youtube.com/watch?v=QbtScohcdwI - best comparison I've seen about the various Sparks
The NVIDIA DGX Spark is the reference design and performed well, but the Dell is noted as being similar in performance while potentially having a better price (3:35, 12:57).
it's definitely very capable, I expect it'll only get better as people patch hardware.
how's that going? I should add mine to my cluser heh. more gpu
just don't create a username on the spark like fax that's already a group, the boot script crashes but marks the install as successful anyway and it boot loops. i think the third party sparks made better out of box experience software though
oh
top notch stuff
i run dedicated servers and provide hosting for website and also offer space for those who store there AI
I'm using Claude. Seems snappy. Kinda just tinkering.
Figured a good test bed if shit went south. Nothing critical on it except games.
I gave it access to SD Flux on my laptop. It kinda went nuts, in a good way rendering. Also been working with it building daemons. Got it to dm me if it has a question or something pressing. Not quite like a cron or heart beat. It initiates the dm at random times depending on what it has been working on.
yes - claude works great, just can go overbudget a bit quick. very impressed with performance on most major big models
You have infos on the docs
great - silly quesiton though. is there a clever way to load all the documentation into the context for my main LLM so i can work with it on the install?
Either via browser or you can have a docs folder from GitHub on your device ask your agent to read it
for everyody with a Spark, Atlas is now available in Alpha but doesnt work yet with openclaw as Tool Calls dont work
https://www.reddit.com/r/LocalLLaMA/comments/1rmvxo3/the_gb10_solution_has_arrived_atlas_image/?sort=new
is there any existing MD file available that consolidates all the docs into one file?
i don't have my openclaw agent set up yet, so just want to access these docs with my current LLM to help me in setting up openclaw
When I talked about agent I talked about Claude or Codex
You have a docs folder in the OpenClaw GitHub
same but 27B is more than 2 times slower than 35B A3B
35b A3B fits inside my 5090 but 27B needs the vram from my 3090 which has less memory bandwidth
Is it Quantized?
27b is a dense model. 35b only loads like 3b parameters at a time (MOE mixture of experts)
they can perform a bit different depending on what youre doing, may want to test them both
hey bros, can I ask you guys smth? I have a old laptop: Hp i5 3rd gen, 8gb ram ddr3, 1t ssd, ubuntu. Will a multiagent framework work smooth on it? Or I should go for a vps?
i dont think you should try anything like that on that laptop
vps will be better for you
Try it what's the harm.. its not resource intensive. All the intensive work is done at the llm api provider.
Any raspberrypi users here
I pushed a number of updates recently and updated docs to make package more stable, any feedback/issues/improvements?
yes both
What if OpenClaw had its own Alexa-style speaker?
We’re building a plug-and-play voice speaker for your OpenClaw assistant.Supercharge your agentic workflows with voice.
👉 Join the waitlist: https://talkclaw.io
I spend a whole weekend writing my system upgrade.sh script .. https://github.com/junaga/debian can I do something with this? can this help someone somehow?
I’m running open claw on a pi 4 4gb using Google flash latest on free tier. It’s been ok, I think I hit limits with Google often, resetting sessions when I near 1M tokens
openrouter /free is another option if you want free
Yep, I have that set up for my sub agents, but getting denied constantly. I had it so open open claw would retry often but more often than not they never worked. So I had to switch to smaller cheaper models with a paid balance and that has been a bit more successful with open router.
tool calling is usually disabled on those models
Ah ok, I think I know what that means but I’m not much of a coder. More of a tinkerer who is really interested in web projects , but doesn’t know 1 ounce of code.
Running well, since I got my spark I don’t use online API much anymore but it’s running stable and good (pi 4 4GB)
Can only recommend for starters who doesn’t want to use VPS, had mine laying around so it was a no Brainer haha
have you found /free stable lately...? I added a bunch in an App while back from OR and I ended up just deleting all free endpoints.... sometimes they were ok and sometimes not so much... and then your also freely giving permission to use you prompts to train with those... so I'd be careful with what data you send to any models but especially the Free ones!
Yeah, I have dropped all free models from open matter as I could never really get through. Switched up to Qwen coder next and too early to tell if it’s working.
Have you tried the :nitro :online :exacto or the /auto for Open Router? There are many Params and ways to adjust OR. I love what they are doing there. And they have BYOK which I am all about THAT!
Hello all, I have 2 dgx sparks in ray cluster with vllm. I am havnig hell of a time trying to find a model that will work with eveyrthing. Is any one running a 2 spark llm setup?? if so what model and settings are you using . Thank you all 🙂
any hardware geeks here able to help me with an esp32?
I can what would you like to build ?
esp32 with antenna in a box and a battery
I started this project check it out https://github.com/chilu18/openclaw-esp32c3-xiao-node
can we chat once somewhere? dm?
@tazzy_19 muted
Reason: Spamming across channels
Duration: 14 minutes and 19 seconds
Can anyone running OpenClaw on a Mac with multiple macOS user accounts, where each user runs their own separate OpenClaw gateway, comment on how that works for them?
(RAM usage without Ollama? browser/os relay control? Remote Screen sharing, Any issues beyond needing to run the gateways, etc).
Don't need hypotheticals, just looking for hands-on experience, please.
Does anyone own a clawbox?
No idea why anyone would buy one of those when you can get a M4 Mac Mini for the same price that is 4x faster with 2x the RAM and real NVMe storage instead of eMMC, double the memory bandwidth.
Not saying either is going to run run local models suitable to OC, but the Mini at least could run a small/embedding/TTS model if you wanted it to.
I bought the underlying hardware and configured it myself for $250. It's struggling with local models. I was asking because I don't see how the Clawbox could be fully functional with a local Model due to the hardware constraints.
IDK anyone seriously using local models for OpenClaw, and it's entirely pointless for a primary model on anything less than a 2+ of maxed out Mac Studios or 2+ DGX Sparks... Even then TPS is slow.
There are some models that make sense to run locally. You can run a small encoding model, some TTS, maybe a basic LLM.
But yeah, for main operation, you've gotta be going frontier models, which aren't running on any kind of home server.
I’ve heard of people running local kimi-k2.5?
anyone working with zclaw? curious to hear about some interesting projects and use cases
what is the question? are you asking about which individual components you would need in order to install an antenna, a battery regulator, a battery, and which esp32? you can find esp32 s3 with an antenna output, then you would want a tp4056 usb module, an 18650 battery and a battery holder, and an antenna if you didn't get one with an antenna... and print a case or buy one.
or here is one much better, a lora module, oled display, battery, antenna, case, etc., all built on one clean device. https://amzn.to/4sFZisw
I would like to add the wifi antenna, battery and a decent case. I could potentially help with the claw programming. I put together one for the mikrotik https://github.com/mikroclaw/mikroclaw
I'm not so good with assembly but i can do coding
18650 wouldn't be rechargable ...
This is cool - it could possibly completely replace esphome. Since i am assuming any peripherals attached (like ir, temp, sensors) could be easily configured and sent to gateway!
and more. yes. i'd like to suggest a working group somehow...maybe we could make one in teech -> channels?
This is amazing ! Happy to help.. let’s do this - so you want me to rename the repo to something more catchy 🤣
What do you mean it wouldn't be rechargeable? That is the battery of choice for ESP32s. They operate at 3.3 volts, 18650 are rechargeable at 3.6 volts.
Have you tested this on an ESP32? I'm not following how I would set this up to compile the binary, and like what partition scheme are we looking for here? For which boards does it fit? What flash size? Does it need PSRAM? PlatformIO makes this easy, or at least espidf
Arduino would even be good too
This is a super useful tool if useful to anyone https://github.com/thelastoutpostworkshop/ESPConnect
This does look cool. I think I'm gonna flash it on my c3 right now, matter of fact. I'll definitely report back, dude!
Haven't tested. I guess I didn't realize that the 118650 was rechargable. I got my ESP32 coming in a day or two ...
There's a #1481662265823330437 channel now
DGX Spark (cluster)
I do... I have two of them. 67 TFlops.. pretty decent and it works fine. Beware there are some other ones with the same name. You want the Bulgarian one based on the Nvidia Orion Super.
I think the M4 is faster in some ways on paper. But you only get 8gb of unified RAM. It can be expanded to 2x2Tb SSD. BUT it has half the power consumption and is designed to be on 24/7 which the Mini is not.. that was the deal breaker for me.
What model are you using?
67
Hello everyone, is anyone succeeded to run ollama with openclaw using local models like qwen3.5 on cpu? I am struggling since a week but no luck on local models.
@restive trout Running qwen3.5-9b on surface laptop 7. 16gig RAM iGPU+CPU. Speed is decent not sure about quality. Way enough for background tasks. Usable for chats.
Thanks. Actually, I got 32 GB RAM but still response time is way slow. Min max cloud version seems decent with free token cap.
Quality is good without openclaw
Same prompt with openclaw, with qwen 3.5 9b responses miss important details, even plain simple requests
When you say same prompt. Are you sending the openclaw system prompt in the “without openclaw” tests
Because if not then you’re not using the “same prompt”
I'm using Qwen3.5-35B-A3B-8bit through LMStudio on a Mac Studio M1 Ultra with 64GB of RAM and it's doing really well to power my 2 OpenClaws (and one picoclaw) running on Raspberry Pi's. I'm not doing anything super-complex, but I'm impressed with the quality of the responses from the LLM. Tool use is fine, web research, a few other things. I use Opus4.6 from a base Mac Mini running my main OpenClaw.
I use Qwen3.5 27b with a 5090 and 3090. It uses 39gb vram (Q_4_K_S) and runs at 58 tokens/sec since it needs both GPUs. I can squeeze Qwen3.5 35B a3b into my 5090 and it runs at 190 tokens/sec. I feel like 27b is a little smarter
Does OpenClaw perform better with more RAM or GPUs ? I am debating between 256GB ram vs 96GB Ram
It more a matter of what models you want to run and the hardware you need to run them
I'd get 96GB and spend the extra money on GPU(s). Increase your page file to 500GB and suddenly RAM doesn't matter as much. Once the model is loaded, it runs from your GPU's VRAM
i did that
i ran a qwen3-8b with ollama by... gpu, not cpu.
yet i also ran a qwen3-1.5b on an orange pi rv2 (guided by its official manual)
It looks like I can run Qwen3.5 35B Q4_K_S on my 5090 and Q3_K_XL on my 3090 on different ports and have OpenClaw use both simultaneously
Guys, can you explain to me, like, what the thought process is of getting an M4 setup for anything that you're doing with this? It's like mind-boggling to me, but maybe I just don't understand it and I'm not a cynical person. I'm just asking questions for curiosity's sake.
I wouldn't even buy M2 for any reason.
Bro, I thought one of them responded to you, so I didn't bother, but no, you shouldn't. You should just go get some PC box that you can put hardware in when you want to level it up rather than a very expensive box that you can't really do anything with other than what it does, and what it does is over expensive and underperformed.
And when I say overexpensive, I'm talking about like exponentially overpriced and incompatible and nowhere to be found on benchmark reports or rankings, and that's my opinion.
I will give it to them that the M series is like a massive improvement to what Macintosh hardware was doing prior, but that's what they get for being in that deal with Intel all those years.
small, quiet. cheapest per year in electricity use. fast memory for qmd models. has ethernet. latest macOS. unix underpinning. all right out of the box. probably lots more i’m forgetting.
One more thing, I don't support companies that take advantage of their loyal consumers. Locking them in with proprietary shit is already messed up, but then the whole thing with the unwillingness to adapt and upgrade iMessage into RCS back in 2019 when Google was begging them to do it because it's the new security standard, it's the best protocol, and iMessage is using cell tower or Wi-Fi data, which is a vulnerability. And they said, nope, nope, nope, nope, because they didn't think yet that they could just do RCS and still leave the poor man in every other phone that's not iPhone in a green text and utilize RCS. And then what do they do? They get forced by the government to implement it in what, last year? And then they do, but they only implement it somewhat. So Apple to Apple is encrypted, but Apple to Android is not encrypted because they're fucking assholes. And chat features aren't available. It still looks like I'm poor because of my Android phone. But meanwhile, I've had, quote, iMessage available with every phone manufacturer other than iPhone all the way back since RCS came out in 2019.
God damn it, I only made one point, fuck. I have like 50 points to make about why you shouldn't support them, why they're fucking you and treating you like a child. And I think Macintosh is the fucking absolute worst. I mean, I can understand an iPhone, I like the iPhone, truthfully. I don't think it's innovative, but I like it. It's nice to use sometimes. But the iMac, or nah, I don't know, man.
If you can't go on Facebook Marketplace and either A) find somebody selling their old shit so that you could put it in your old shit and upgrade it a little bit, or you can't even go on there, even if somebody was selling it to upgrade your shit, because you can't even do it, even if you could buy it. Not without lengthy reverse engineering processes, a nightmare of breaking that case open, trying to figure out how to get it back together because there's all kinds of booby traps hooked into it and whatever else. And then when you finally get it powered up and then it's bricked, then you wanna go jump in the river. I just can't do it. I really hate tech that uses their marketing maneuvers. Makes me sick.
Plus, it's like ten times too expensive for what it is. Okay, I'm done. I don't mean to be a hater, this is just my honest opinion.
Okay, one more. iPhones didn't even have multimedia messaging until the iPhone 3GS came out and they rolled out that firmware. That's really bad. That was about ten years after MMS had been on every other device. And if you had an iPhone 3 and you wanted to send a video, well guess what, you had to buy a 3GS.